Re: Number of fields in qf & fq

2015-11-20 Thread Mikhail Khludnev
Hello Steve,

debugQuery=true shows whether the time goes to facets or to the query, and whether it's query
parsing or searching (prepare vs. process); cache statistics can tell you about
their efficiency; sometimes a problem is obvious from the request parameters.
Simple sampling with jconsole or even with jstack can point to a smoking
gun.
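
For illustration, a sketch only (the collection name and query are made up):

  curl "http://localhost:8983/solr/collection1/select?q=foo&debugQuery=true&wt=json"
  jstack <solr-pid> > stacks.txt    # take a few samples; recurring frames are the smoking gun

The "timing" section of the debug output breaks QTime down per search
component, prepare vs. process.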

On Fri, Nov 20, 2015 at 4:08 PM, Steven White  wrote:

> Thanks Erick.
>
> The 1500 fields is a design that I inherited.  I'm trying to figure out why
> it was done as such and what it will take to fix it.
>
> What about my other question: how does one go about debugging performance
> issues in Solr to find out where time is mostly spent?  How do I know my
> Solr parameters, such as cache and what have you are set right?  From what
> I see, we are using the defaults off solrconfig.xml.
>
> I'm on Solr 5.2
>
> Steve
>
>
> On Thu, Nov 19, 2015 at 11:36 PM, Erick Erickson 
> wrote:
>
> > An fq is still a single entry in your filterCache so from that
> > perspective it's the same.
> >
> > And to create that entry, you're still using all the underlying fields
> > to search, so they have to be loaded just like they would be in a q
> > clause.
> >
> > But really, the fundamental question here is why your design even has
> > 1,500 fields and, more specifically, why you would want to search them
> > all at once. From a 10,000 ft. view, that's a very suspect design.
> >
> > Best,
> > Erick
> >
> > On Thu, Nov 19, 2015 at 4:06 PM, Walter Underwood  >
> > wrote:
> > > The implementation for fq has changed from 4.x to 5.x, so I’ll let
> > someone else answer that in detail.
> > >
> > > In 4.x, the result of each filter query can be cached. After that, they
> > are quite fast.
> > >
> > > wunder
> > > Walter Underwood
> > > wun...@wunderwood.org
> > > http://observer.wunderwood.org/  (my blog)
> > >
> > >
> > >> On Nov 19, 2015, at 3:59 PM, Steven White 
> wrote:
> > >>
> > >> Thanks Walter.  I see your point.  Does this apply to fq as well?
> > >>
> > >> Also, how does one go about debugging performance issues in Solr to
> find
> > >> out where time is mostly spent?
> > >>
> > >> Steve
> > >>
> > >> On Thu, Nov 19, 2015 at 6:54 PM, Walter Underwood <
> > wun...@wunderwood.org>
> > >> wrote:
> > >>
> > >>> With one field in qf for a single-term query, Solr is fetching one posting
> > >>> list. With 1500 fields, it is fetching 1500 posting lists. It could easily
> > >>> be 1500 times slower.
> > >>>
> > >>> It might be even slower than that, because we can’t guarantee that: a)
> > >>> every algorithm in Solr is linear, b) that all those lists will fit in
> > >>> memory.
> > >>>
> > >>> wunder
> > >>> Walter Underwood
> > >>> wun...@wunderwood.org
> > >>> http://observer.wunderwood.org/  (my blog)
> > >>>
> > >>>
> >  On Nov 19, 2015, at 3:46 PM, Steven White 
> > wrote:
> > 
> >  Hi everyone
> > 
> >  What is considered too many fields for qf and fq?  On average I will have
> >  1500 fields in qf and 100 in fq (all of which are OR'ed).  Assuming I can
> >  (I have to check with the design) for qf, if I cut it down to 1 field, will
> >  I see noticeable performance improvement?  It will take a lot of effort to
> >  test this which is why I'm asking first.
> > 
> >  As is, I'm seeing 2-5 sec response time for searches on an index of 1
> >  million records with total index size (on disk) of 4 GB.  I gave Solr 2 GB
> >  of RAM (also tested at 4 GB); in both cases Solr didn't use more than 1 GB.
> > 
> >  Thanks in advance
> > 
> >  Steve
> > >>>
> > >>>
> > >
> >
>



-- 
Sincerely yours
Mikhail Khludnev
Principal Engineer,
Grid Dynamics





Re: solr indexing warning

2015-11-20 Thread Shawn Heisey
On 11/20/2015 12:33 AM, Midas A wrote:
> As we are using this server as a master server, there are no queries running on
> it. In that case, should I remove this configuration from the config file?

The following cache info says that there ARE queries being run on this
server:

> QueryResultCache:
> 
> lookups:3841
> hits:0
> hitratio:0.00
> inserts:4841
> evictions:3841
> size:1000
> warmupTime:213
> cumulative_lookups:58438
> cumulative_hits:153
> cumulative_hitratio:0.00
> cumulative_inserts:58285
> cumulative_evictions:57285

These queries might be related to indexing, and not actual user
searches.  On my indexes, I query for the existence of the documents I'm
about to delete, to make sure there's actually a need to run the delete.
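
A sketch of such an existence check (the id value is made up):

  /select?q=id:12345&rows=0&wt=json

If numFound is zero, the delete for that document can be skipped.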

This is the only cache that has a nonzero warmupTime, but it only took a
fifth of a second to warm 1000 queries, so this is not a problem.  It
has a very low hit ratio, so you could disable it and not really see a
performance difference.

Emir asked how you're doing your commits.  I'd like to know the same
thing, as well as how frequently you're doing them.

This is the best guide out there regarding commits:

http://lucidworks.com/blog/2013/08/23/understanding-transaction-logs-softcommit-and-commit-in-sorlcloud/

One of the best pieces of advice on that page is this:

---
Don't listen to your product manager who says "we need no more than 1
second latency". Really.
---

Another piece of advice on that page is to set the hard commit
(autoCommit) interval to 15 seconds.  I personally think this is too
frequent, but many people are using that configuration and have reported
no problems with it.
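
As a sketch, that advice corresponds to something like the following in
solrconfig.xml (openSearcher=false is the usual companion setting so that hard
commits don't open new searchers; verify against your version's defaults):

  <autoCommit>
    <maxTime>15000</maxTime>
    <openSearcher>false</openSearcher>
  </autoCommit>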

Thanks,
Shawn



CloudSolrClient + Java8 Streams

2015-11-20 Thread Jürgen Jakobitsch
hi,

I was wondering if anyone has already created Java 8 streams
from a CloudSolrClient, à la

SolrQuery query = new SolrQuery();
query.setQuery("id:*");

java.util.stream.Stream<SolrDocument> docs = cloudSolrClient.stream(query);

any pointer greatly appreciated.

wkr j

*Jürgen Jakobitsch*
Innovation Director
Semantic Web Company GmbH
EU: +43-1-4021235-EXT
Mobile: +43-000-000
US: (415) 800-3776
http://www.semantic-web.at
http://www.poolparty.biz


PERSONAL INFORMATION
| web   : http://www.turnguard.com
| foaf  : http://www.turnguard.com/turnguard
| g+: https://plus.google.com/111233759991616358206/posts
| skype : jakobitsch-punkt
| xmlns:tg  = "http://www.turnguard.com/turnguard#"
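
A minimal sketch of such an adapter: SolrJ 5.x has no stream(query) method on
CloudSolrClient, but one page of results can already be exposed as a Java 8
stream, since SolrDocumentList extends ArrayList<SolrDocument>. The class below
is an illustration, not an existing API, and assumes a default collection has
been set on the client:

  import java.util.stream.Stream;
  import org.apache.solr.client.solrj.SolrQuery;
  import org.apache.solr.client.solrj.impl.CloudSolrClient;
  import org.apache.solr.common.SolrDocument;

  public class SolrStreams {
      // Covers one page of results only; for a full scan, loop with
      // cursorMark and flatMap the pages into one stream.
      public static Stream<SolrDocument> stream(CloudSolrClient client, SolrQuery query)
              throws Exception {
          return client.query(query).getResults().stream();
      }
  }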


Using facets and stats with solr v4

2015-11-20 Thread Ian Harrigan
Hi guys

So I have a question about using facet queries while getting stats for each
facet item; it seems this is possible on Solr v5+. Something like this:

 

q=*:*&stats=true&facet.field={!stats=t1}servicename&stats.field={!tag=t1}duration&rows=0&facet=true&wt=json&indent=true

 

It also seems this isn't available for lower versions (v4), so is there any way
to achieve something similar, as we are stuck on v4?

Any help / advice / pointers would be great! - thanks in advance

Ian

 



Re: Using facets and stats with solr v4

2015-11-20 Thread Shalin Shekhar Mangar
Hi Ian,

Yes you are right. This feature is only available in 5.3.0. It may be
possible to backport the patches on SOLR-4212 to the 4.x release
branch but I am afraid you are on your own on this one.

On Fri, Nov 20, 2015 at 7:16 PM, Ian Harrigan  wrote:
> Hi guys
>
> So i have a question about using facet queries but getting stats for each
> facet item, it seems this is possible on solr v5+. Something like this:
>
>
>
> q=*:*&stats=true&facet.field={!stats=t1}servicename&stats.field={!tag=t1}duration&rows=0&facet=true&wt=json&indent=true
>
>
>
> It also seems this isn't available for lower versions (v4), so is there any way
> to achieve something similar, as we are stuck on v4?
>
> Any help / advice / pointers would be great! - thanks in advance
>
> Ian
>
>
>



-- 
Regards,
Shalin Shekhar Mangar.


AnalyzingInfixLookupFactory, Edgengramm with multiple terms

2015-11-20 Thread Szűcs Roland
Hi all,

I have a working suggester component and request handler in my Solr 5.2.1
instance. It is working as I expected but I need a solution which handles
multiple query terms "correctly".

I have a string field, title. Let's see the following case:
title 1: Green Apple Color
title 2: Apple the master of innovation
title 3: Apple the master of presentation.
Using EdgeNGram (min gram size 3) on a copy of the string title field I get the
following:
suggest.q="Appl": all documents are matched, fine.

suggest.q="Apple inno": all documents are matched, which is wrong, as the user
expectation is to have only title 2 matched.

Is there any way to make the suggester component smarter to handle multi-term
queries as users expect? AnalyzingInfixLookupFactory was a great
improvement, handling terms not only at the beginning of an expression
but also in the middle or at the end.

I think it would help if we could apply an "AND" relationship among the terms
of a multi-term query, as in normal queries.
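
One possibly relevant knob, as a sketch only (not verified against 5.2.1): the
infix lookup factories accept an allTermsRequired option, which is meant to
AND together the terms of a multi-term suggest query:

  <searchComponent name="suggest" class="solr.SuggestComponent">
    <lst name="suggester">
      <str name="name">titleSuggester</str>
      <str name="lookupImpl">AnalyzingInfixLookupFactory</str>
      <str name="field">title_suggest</str>
      <str name="suggestAnalyzerFieldType">text_general</str>
      <str name="allTermsRequired">true</str>
    </lst>
  </searchComponent>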

Any idea is appreciated
-- 
Szűcs Roland
Let's connect on LinkedIn
Managing Director | Phone: +36 1 210 81 13 | Bookandwalk.hu



Re: RealTimeGetHandler doesn't retrieve documents

2015-11-20 Thread Shawn Heisey
On 11/20/2015 5:21 AM, Alexandre Rafalovitch wrote:
> Actually I think / is a special character as of recent versions of Solr.
> Can't remember why though.

Surrounding a query with slashes, at least when using the standard
parser, makes it a regex query.  I don't know if this happens with any
of the other parsers.
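
For example (illustration only): with the standard parser, q=field:/ABC.*/ is
a regex query, while backslash-escaping or quoting the slashes, as in
q=id:"/ABCDZ123/123456", keeps them literal.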

Thanks,
Shawn



Re: RealTimeGetHandler doesn't retrieve documents

2015-11-20 Thread Alexandre Rafalovitch
Actually I think / is a special character as of recent versions of Solr.
Can't remember why though.

This could be the kind of things that would trigger an edge case bug.

What happens if you request id3,id2,id1? In the opposite order? Are the
same documents missing? Or the same by request position? If the latter, it is
definitely something about syntax.
On 20 Nov 2015 3:27 am,  wrote:

> Thanks for your answer...
>
> I don't think it's a problem due to special characters. All my IDs have the
> same format: "/ABCDZ123/123456", with no characters that need to be escaped.
>
> And when I use a "normal" query on the key field, it works fine; SolR
> finds the document...
>
> Cordialement,
> Monsinjon Jeremie
>
> -Original Message-
> From: Jack Krupansky 
> Date: Thu, 19 Nov 2015 16:50:14
> To: 
> Reply-To: solr-user@lucene.apache.org
> Subject: Re: RealTimeGetHandler doesn't retrieve documents
>
> Do the failing IDs have any special characters that might need to be
> escaped?
>
> Can you find the documents using a normal query on the unique key field?
>
> -- Jack Krupansky
>
> On Thu, Nov 19, 2015 at 10:27 AM, Jérémie MONSINJON <
> jeremie.monsin...@gmail.com> wrote:
>
> > Hello everyone !
> >
> > I'm using SolR 5.3.1 with solrj.SolrClient.
> > My index is sliced into 3 shards, each on a different server. (No replica on
> > dev platform)
> > It has been up to date for a few days...
> >
> >
> > I'm trying to use the RealTimeGetHandler to get documents by their Id.
> > In our use case, documents are updated very frequently, so we have to
> > look in the tlog before searching the index.
> >
> > When I use the SolrClient.getById() (with a list of document Ids recently
> > extracted from the index)
> >
> > SolR doesn't return *all* the documents corresponding to these Ids.
> > So I tried to use the Solr api directly:
> >
> > http://server:port/solr/index/get?ids=id1,id2,id3
> > And this is the same. Some ids don't work.
> >
> > In my example, id1 doesn't return a document; id2 and id3 are OK.
> >
> > If I try a filtered query with the id1, it works fine, the document
> exists
> > in the index and is found by SolR
> >
> > Can anybody explain why a document, present in the index, with no
> > uncommited update or delete, is not found by the Real Time Get Handler ?
> >
> > Regards,
> > Jeremie
> >
> >
>
>


Re: Parallel SQL / calcite adapter

2015-11-20 Thread Joel Bernstein
After reading https://calcite.apache.org/docs/tutorial.html, I think it
should be possible to use Solr's JDBC driver with Calcite's JDBC adapter.

If you give it a try and run into any problems, please create a jira.

Joel Bernstein
http://joelsolr.blogspot.com/
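
A hedged sketch of what that might look like; the URL format follows the
trunk-era Parallel SQL documentation, and the host, collection, and field
names are placeholders:

  import java.sql.Connection;
  import java.sql.DriverManager;
  import java.sql.ResultSet;
  import java.sql.Statement;

  public class SolrJdbcSketch {
      public static void main(String[] args) throws Exception {
          // zkHost connection string plus target collection
          String url = "jdbc:solr://localhost:9983?collection=collection1";
          try (Connection con = DriverManager.getConnection(url);
               Statement stmt = con.createStatement();
               ResultSet rs = stmt.executeQuery("SELECT fielda, fieldb FROM collection1 LIMIT 10")) {
              while (rs.next()) {
                  System.out.println(rs.getString("fielda") + " " + rs.getString("fieldb"));
              }
          }
      }
  }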

On Thu, Nov 19, 2015 at 7:58 PM, Joel Bernstein  wrote:

> It's an interesting question. The JDBC driver is still very basic. It
> would depend on how much of the JDBC spec needs to be implemented to
> connect to Calcite/Drill.
>
> Joel Bernstein
> http://joelsolr.blogspot.com/
>
> On Thu, Nov 19, 2015 at 3:28 AM, Kai Gülzau  wrote:
>
>>
>> We are currently evaluating calcite as a SQL facade for different Data
>> Sources
>>
>> -  JDBC
>>
>> -  REST
>>
>> -  SOLR
>>
>> -  ...
>>
>> I didn't find a "native" calcite adapter for solr (
>> http://calcite.apache.org/docs/adapter.html).
>>
>> Is it a good idea to use the parallel sql feature (over jdbc) to connect
>> calcite (or apache drill) to solr?
>> Any suggestions?
>>
>>
>> Thanks,
>>
>> Kai Gülzau
>>
>
>


Re: Number of fields in qf & fq

2015-11-20 Thread Steven White
Thanks Erick.

The 1500 fields is a design that I inherited.  I'm trying to figure out why
it was done as such and what it will take to fix it.

What about my other question: how does one go about debugging performance
issues in Solr to find out where time is mostly spent?  How do I know my
Solr parameters, such as cache and what have you are set right?  From what
I see, we are using the defaults off solrconfig.xml.

I'm on Solr 5.2

Steve


On Thu, Nov 19, 2015 at 11:36 PM, Erick Erickson 
wrote:

> An fq is still a single entry in your filterCache so from that
> perspective it's the same.
>
> And to create that entry, you're still using all the underlying fields
> to search, so they have to be loaded just like they would be in a q
> clause.
>
> But really, the fundamental question here is why your design even has
> 1,500 fields and, more specifically, why you would want to search them
> all at once. From a 10,000 ft. view, that's a very suspect design.
>
> Best,
> Erick
>
> On Thu, Nov 19, 2015 at 4:06 PM, Walter Underwood 
> wrote:
> > The implementation for fq has changed from 4.x to 5.x, so I’ll let
> someone else answer that in detail.
> >
> > In 4.x, the result of each filter query can be cached. After that, they
> are quite fast.
> >
> > wunder
> > Walter Underwood
> > wun...@wunderwood.org
> > http://observer.wunderwood.org/  (my blog)
> >
> >
> >> On Nov 19, 2015, at 3:59 PM, Steven White  wrote:
> >>
> >> Thanks Walter.  I see your point.  Does this apply to fq as well?
> >>
> >> Also, how does one go about debugging performance issues in Solr to find
> >> out where time is mostly spent?
> >>
> >> Steve
> >>
> >> On Thu, Nov 19, 2015 at 6:54 PM, Walter Underwood <
> wun...@wunderwood.org>
> >> wrote:
> >>
> >>> With one field in qf for a single-term query, Solr is fetching one posting
> >>> list. With 1500 fields, it is fetching 1500 posting lists. It could easily
> >>> be 1500 times slower.
> >>>
> >>> It might be even slower than that, because we can’t guarantee that: a)
> >>> every algorithm in Solr is linear, b) that all those lists will fit in
> >>> memory.
> >>>
> >>> wunder
> >>> Walter Underwood
> >>> wun...@wunderwood.org
> >>> http://observer.wunderwood.org/  (my blog)
> >>>
> >>>
>  On Nov 19, 2015, at 3:46 PM, Steven White 
> wrote:
> 
>  Hi everyone
> 
>  What is considered too many fields for qf and fq?  On average I will have
>  1500 fields in qf and 100 in fq (all of which are OR'ed).  Assuming I can
>  (I have to check with the design) for qf, if I cut it down to 1 field, will
>  I see noticeable performance improvement?  It will take a lot of effort to
>  test this which is why I'm asking first.
> 
>  As is, I'm seeing 2-5 sec response time for searches on an index of 1
>  million records with total index size (on disk) of 4 GB.  I gave Solr 2 GB
>  of RAM (also tested at 4 GB); in both cases Solr didn't use more than 1 GB.
> 
>  Thanks in advance
> 
>  Steve
> >>>
> >>>
> >
>


Re: solr indexing warning

2015-11-20 Thread Emir Arnautovic

Hi,
Since this is a master node, and not expected to serve queries, you can
disable the caches completely. However, from the numbers, cache autowarming is
not the issue here; more likely it is the frequency of commits and/or warmup
queries. How do you do commits? Since this is master-slave, I don't see a
reason to have them too frequently. If you need NRT you should switch to
SolrCloud. Do you have warmup queries? You don't need them on a master node.


Regards,
Emir

On 20.11.2015 08:33, Midas A wrote:

thanks Shawn,

As we are using this server as a master server, there are no queries running on
it. In that case, should I remove this configuration from the config file?

Total Docs: 40 0

Stats
#

Document cache :
lookups:823
hits:4
hitratio:0.00
inserts:820
evictions:0
size:820
warmupTime:0
cumulative_lookups:24474
cumulative_hits:1746
cumulative_hitratio:0.07
cumulative_inserts:22728
cumulative_evictions:13345


fieldcache:
stats:
entries_count:2
entry#0:'SegmentCoreReader(owner=_3bph(4.2.1):C3918553)'=>'_version_',long,org.apache.lucene.search.FieldCache.NUMERIC_UTILS_LONG_PARSER=>org.apache.lucene.search.FieldCacheImpl$LongsFromArray#1919958905
entry#1:'SegmentCoreReader(owner=_3bph(4.2.1):C3918553)'=>'_version_',class
org.apache.lucene.search.FieldCacheImpl$DocsWithFieldCache,null=>org.apache.lucene.util.Bits$MatchAllBits#660036513
insanity_count:0


fieldValuecache:

lookups:0

hits:0

hitratio:0.00

inserts:0

evictions:0

size:0

warmupTime:0

cumulative_lookups:0

cumulative_hits:0

cumulative_hitratio:0.00

cumulative_inserts:0

cumulative_evictions:0


filtercache:


lookups:0

hits:0

hitratio:0.00

inserts:0

evictions:0

size:0

warmupTime:0

cumulative_lookups:0

cumulative_hits:0

cumulative_hitratio:0.00

cumulative_inserts:0

cumulative_evictions:0


QueryResultCache:

lookups:3841

hits:0

hitratio:0.00

inserts:4841

evictions:3841

size:1000

warmupTime:213

cumulative_lookups:58438

cumulative_hits:153

cumulative_hitratio:0.00

cumulative_inserts:58285

cumulative_evictions:57285



Please suggest.



On Fri, Nov 20, 2015 at 12:15 PM, Shawn Heisey  wrote:


On 11/19/2015 11:06 PM, Midas A wrote:


initialSize="1000" autowarmCount="1000"/>

Your caches are quite large.  More importantly, your autowarmCount is
very large.  How many documents are in each of your cores?  If you check
the Plugins/Stats area in the admin UI for your core(s), how many
entries are actually in each of those three caches?  Also shown there is
the number of milliseconds that it took for each cache to warm.

The documentCache cannot be autowarmed, so that config is not doing
anything.

When a cache is autowarmed, what this does is look up the key for the
top N entries in the old cache, which contains the query used to
generate that cache entry, and executes each of those queries on the new
index to populate the new cache.

This means that up to 2000 queries are being executed every time you
commit and open a new searcher.  The actual number may be less, if the
filterCache and queryResultCache are not actually reaching 1000 entries
each.  Autowarming can take a significant amount of time when the
autowarmCount is high.  It should be lowered.
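
A sketch of a lowered configuration (the class and size values here are
illustrative, not taken from your config):

  <filterCache class="solr.FastLRUCache" size="512" initialSize="512" autowarmCount="32"/>
  <queryResultCache class="solr.LRUCache" size="512" initialSize="512" autowarmCount="32"/>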

Thanks,
Shawn




--
Monitoring * Alerting * Anomaly Detection * Centralized Log Management
Solr & Elasticsearch Support * http://sematext.com/



RE: DIH Caching w/ BerkleyBackedCache

2015-11-20 Thread Dyer, James
Todd,

With the DIH request, are you specifying "cacheDeletePriorData=false"?  Looking 
at the BerkleyBackedCache code, if this is set to true, it deletes the cache and 
assumes the current update will fully repopulate it.  If you want to do an 
incremental update to the cache, it needs to be false.  You might also need to 
specify "clean=false", but I'm not sure if this is a requirement.
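
A sketch of where that attribute sits (the entity name and query are made up,
and attribute names beyond cacheDeletePriorData are from memory of the
SOLR-2382-era patch, so verify against your jar):

  <entity name="child" processor="SqlEntityProcessor"
          query="SELECT * FROM child"
          cacheImpl="BerkleyBackedCache"
          cacheDeletePriorData="false" ... />

plus clean=false on the request, as noted above.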

I've used DIH with BerkleyBackedCache for a few years and it works well for us. 
 But rather than using it inline, we have a number of DIH handlers that just 
build caches, then when they're all built, a final DIH joins data from the 
caches and indexes it to solr.  We also do it like you do, with several handlers 
running at once, each doing part of the data.

But I have to warn you this code hasn't been maintained by anyone.  I'm using 
an older DIH jar (4.6) with a newer Solr.  I think there might have been an API 
change or something that prevented the uncommitted caching code from working 
with newer versions, but I honestly forget.  This is probably a viable solution 
if you don't want to write any code, but it might take some trial and error to 
get it to work.

James Dyer
Ingram Content Group


-Original Message-
From: Todd Long [mailto:lon...@gmail.com] 
Sent: Tuesday, November 17, 2015 8:11 AM
To: solr-user@lucene.apache.org
Subject: Re: DIH Caching w/ BerkleyBackedCache

Mikhail Khludnev wrote
> It's worth to mention that for really complex relations scheme it might be
> challenging to organize all of them into parallel ordered streams.

This will most likely be the issue for us which is why I would like to have
the Berkley cache solution to fall back on, if possible. Again, I'm not sure
why but it appears that the Berkley cache is overwriting itself (i.e.
cleaning up unused data) when building the database... I've read plenty of
other threads where it appears folks are having success using that caching
solution.


Mikhail Khludnev wrote
> threads... you said? Which ones? Declarative parallelization in
> EntityProcessor worked only with certain 3.x version.

We are running multiple DIH instances which query against specific
partitions of the data (i.e. mod of the document id we're indexing).



--
View this message in context: 
http://lucene.472066.n3.nabble.com/DIH-Caching-w-BerkleyBackedCache-tp4240142p4240562.html
Sent from the Solr - User mailing list archive at Nabble.com.



Additional field in facet result

2015-11-20 Thread Florin Mandoc
Hi, I have an index that contains products with tags as nested children, 
like below:


{
"id":"413863",
"ProductID":"413863",
"type":"product",
"MainProductID":"",
"SKU":"8D68595B",
"Name":"mere",
"alltext":["mere",
  "",
  "",
  "test product"],
"CustomName":"",
"CategoryID":0,
"CategoryName":"",
"MasterCategoryID":"0",
"MasterCategoryName":"",
"ShortDescription":"test product",
"UomID":"",
"Uom":"dz",
"UomSize":"0.9",
"ImageName":"",
"AllowFractionOrdering":"false",
"DisplayOrder":0,
"InternalSku":"8D68595B",
"PrivateProduct":"true",
"GroupedProduct":"",
"ProductGroupID":"-1",
"UomSizeParentID":"",
"CustomerSiteID":"4553",
"CustomerAccountID":"2323",
"IsAccountProduct":"true",
"IsVariation":"",
"Tags":["1754ztag"],
"Vendors":["4554qw"],
"_version_":1518371006298718216,
"_childDocuments_":[
{
  "id":"413863_1754",
  "type":"tag",
  "TagID":1754,
  "TagName":"ztag",
  "ProductID":"413863"},
{
  "id":"413863_4554_qw",
  "type":"vendor",
  "VendorID":4554,
  "VendorSKU":"qw",
  "ProductID":"413863"}]}


I am using this query to get the products and facets for tags:

http://localhost:8983/solr/favsearch/select?q=*:*&wt=json&indent=true&start=0&rows=100&fl=*,[child%20parentFilter=type:product]&fq=CustomerSiteID:4553&json.facet={tags:{type:%20terms,field:TagName,domain:%20{%20blockChildren%20:%20%22type:product%22%20}},categories:{type:%20terms,field:CategoryID}}

The facets response is:

"facets":{
"count":15,
"tags":{
  "buckets":[{
  "val":"vendor 1",
  "count":3},
{
  "val":"vendor 2",
  "count":3},
{
  "val":"ztag",
  "count":2}]},
"categories":{
  "buckets":[{
  "val":-1,
  "count":5}]}}}


Is it possible to also get the TagID field in the response?
Or, if I add another facet for the tag ids, like this:

"facets":{
"count":15,
"tagNames":{
  "buckets":[{
  "val":"vendor 1",
  "count":3},
{
  "val":"vendor 2",
  "count":3},
{
  "val":"ztag",
  "count":2}]},
"tagId":{
  "buckets":[{
  "val":1752,
  "count":3},
{
  "val":1753,
  "count":3},
{
  "val":1754,
  "count":2}]},
"categories":{
  "buckets":[{
  "val":-1,
  "count":5}]}}}

can I be sure that the first bucket from tagNames always corresponds to 
the first bucket of tagId?


Thank you,
Florin



Re: Number of fields in qf & fq

2015-11-20 Thread Scott Stults
Steve,

Another thing debugQuery will give you is a breakdown of how much each
field contributed to the final score of each hit. That's going to give you
a nice shopping list of qf to weed out.


k/r,
Scott

On Fri, Nov 20, 2015 at 9:26 AM, Mikhail Khludnev <
mkhlud...@griddynamics.com> wrote:

> Hello Steve,
>
> debugQuery=true shows whether the time goes to facets or to the query, and whether it's query
> parsing or searching (prepare vs. process); cache statistics can tell you about
> their efficiency; sometimes a problem is obvious from the request parameters.
> Simple sampling with jconsole or even with jstack can point to a smoking
> gun.
>
> On Fri, Nov 20, 2015 at 4:08 PM, Steven White 
> wrote:
>
> > Thanks Erick.
> >
> > The 1500 fields is a design that I inherited.  I'm trying to figure out
> why
> > it was done as such and what it will take to fix it.
> >
> > What about my other question: how does one go about debugging performance
> > issues in Solr to find out where time is mostly spent?  How do I know my
> > Solr parameters, such as cache and what have you are set right?  From
> what
> > I see, we are using the defaults off solrconfig.xml.
> >
> > I'm on Solr 5.2
> >
> > Steve
> >
> >
> > On Thu, Nov 19, 2015 at 11:36 PM, Erick Erickson <
> erickerick...@gmail.com>
> > wrote:
> >
> > > An fq is still a single entry in your filterCache so from that
> > > perspective it's the same.
> > >
> > > And to create that entry, you're still using all the underlying fields
> > > to search, so they have to be loaded just like they would be in a q
> > > clause.
> > >
> > > But really, the fundamental question here is why your design even has
> > > 1,500 fields and, more specifically, why you would want to search them
> > > all at once. From a 10,000 ft. view, that's a very suspect design.
> > >
> > > Best,
> > > Erick
> > >
> > > On Thu, Nov 19, 2015 at 4:06 PM, Walter Underwood <
> wun...@wunderwood.org
> > >
> > > wrote:
> > > > The implementation for fq has changed from 4.x to 5.x, so I’ll let
> > > someone else answer that in detail.
> > > >
> > > > In 4.x, the result of each filter query can be cached. After that,
> they
> > > are quite fast.
> > > >
> > > > wunder
> > > > Walter Underwood
> > > > wun...@wunderwood.org
> > > > http://observer.wunderwood.org/  (my blog)
> > > >
> > > >
> > > >> On Nov 19, 2015, at 3:59 PM, Steven White 
> > wrote:
> > > >>
> > > >> Thanks Walter.  I see your point.  Does this apply to fq as well?
> > > >>
> > > >> Also, how does one go about debugging performance issues in Solr to
> > find
> > > >> out where time is mostly spent?
> > > >>
> > > >> Steve
> > > >>
> > > >> On Thu, Nov 19, 2015 at 6:54 PM, Walter Underwood <
> > > wun...@wunderwood.org>
> > > >> wrote:
> > > >>
> > > >>> With one field in qf for a single-term query, Solr is fetching one posting
> > > >>> list. With 1500 fields, it is fetching 1500 posting lists. It could easily
> > > >>> be 1500 times slower.
> > > >>>
> > > >>> It might be even slower than that, because we can’t guarantee that: a)
> > > >>> every algorithm in Solr is linear, b) that all those lists will fit in
> > > >>> memory.
> > > >>>
> > > >>> wunder
> > > >>> Walter Underwood
> > > >>> wun...@wunderwood.org
> > > >>> http://observer.wunderwood.org/  (my blog)
> > > >>>
> > > >>>
> > >  On Nov 19, 2015, at 3:46 PM, Steven White 
> > > wrote:
> > > 
> > >  Hi everyone
> > > 
> > >  What is considered too many fields for qf and fq?  On average I will have
> > >  1500 fields in qf and 100 in fq (all of which are OR'ed).  Assuming I can
> > >  (I have to check with the design) for qf, if I cut it down to 1 field, will
> > >  I see noticeable performance improvement?  It will take a lot of effort to
> > >  test this which is why I'm asking first.
> > > 
> > >  As is, I'm seeing 2-5 sec response time for searches on an index of 1
> > >  million records with total index size (on disk) of 4 GB.  I gave Solr 2 GB
> > >  of RAM (also tested at 4 GB); in both cases Solr didn't use more than 1 GB.
> > > 
> > >  Thanks in advance
> > > 
> > >  Steve
> > > >>>
> > > >>>
> > > >
> > >
> >
>
>
>
> --
> Sincerely yours
> Mikhail Khludnev
> Principal Engineer,
> Grid Dynamics
>
> 
> 
>



-- 
Scott Stults | Founder & Solutions Architect | OpenSource Connections, LLC
| 434.409.2780
http://www.opensourceconnections.com


RE: Re:Re: Implementing security.json is breaking ADDREPLICA

2015-11-20 Thread Oakley, Craig (NIH/NLM/NCBI) [C]
Thank you again for the reply.

Below is the email I was about to send prior to your reply a moment ago: shall 
I try again without "read" in the security.json?



The Collections API method was not discussed in the "Unleashed" class at the 
conference in DC in 2014 (probably because it was not yet available), so I was 
using the method I knew.

I have now tried again using admin/collections?action=CREATE (using different 
port numbers to avoid confusion from the failed previous attempts: the 
previously created nodes had been shut down and their core.properties files 
renamed so as not to be discovered), but the results are the same:
INFO  - 2015-11-20 16:56:25.283; [c:xmpl3 s:shard1 r:core_node2 
x:xmpl3_shard1_replica2] org.apache.solr.cloud.RecoveryStrategy; Starting 
Replication Recovery.
INFO  - 2015-11-20 16:56:25.284; [c:xmpl3 s:shard1 r:core_node2 
x:xmpl3_shard1_replica2] org.apache.solr.cloud.RecoveryStrategy; Begin 
buffering updates.
INFO  - 2015-11-20 16:56:25.284; [c:xmpl3 s:shard1 r:core_node2 
x:xmpl3_shard1_replica2] org.apache.solr.update.UpdateLog; Starting to buffer 
updates. FSUpdateLog{state=ACTIVE, tlog=null}
INFO  - 2015-11-20 16:56:25.284; [c:xmpl3 s:shard1 r:core_node2 
x:xmpl3_shard1_replica2] org.apache.solr.cloud.RecoveryStrategy; Attempting to 
replicate from http://{IP-address-redacted}:4685/solr/xmpl3_shard1_replica1/.
ERROR - 2015-11-20 16:56:25.292; [c:xmpl3 s:shard1 r:core_node2 
x:xmpl3_shard1_replica2] org.apache.solr.common.SolrException; Error while 
trying to 
recover:org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException: 
Error from server at 
http://{IP-address-redacted}:4685/solr/xmpl3_shard1_replica1: Expected mime 
type application/octet-stream but got text/html. 


Error 401 Unauthorized request, Response code: 401

HTTP ERROR 401
Problem accessing /solr/xmpl3_shard1_replica1/update. Reason:
Unauthorized request, Response code: 401
Powered by Jetty://




at 
org.apache.solr.client.solrj.impl.HttpSolrClient.executeMethod(HttpSolrClient.java:528)
at 
org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:234)
at 
org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:226)
at 
org.apache.solr.client.solrj.SolrRequest.process(SolrRequest.java:135)
at 
org.apache.solr.client.solrj.SolrRequest.process(SolrRequest.java:152)
at 
org.apache.solr.cloud.RecoveryStrategy.commitOnLeader(RecoveryStrategy.java:207)
at 
org.apache.solr.cloud.RecoveryStrategy.replicate(RecoveryStrategy.java:147)
at 
org.apache.solr.cloud.RecoveryStrategy.doRecovery(RecoveryStrategy.java:437)
at org.apache.solr.cloud.RecoveryStrategy.run(RecoveryStrategy.java:227)

INFO  - 2015-11-20 16:56:25.292; [c:xmpl3 s:shard1 r:core_node2 
x:xmpl3_shard1_replica2] org.apache.solr.update.UpdateLog; Dropping buffered 
updates FSUpdateLog{state=BUFFERING, tlog=null}
ERROR - 2015-11-20 16:56:25.293; [c:xmpl3 s:shard1 r:core_node2 
x:xmpl3_shard1_replica2] org.apache.solr.cloud.RecoveryStrategy; Recovery 
failed - trying again... (2)
INFO  - 2015-11-20 16:56:25.293; [c:xmpl3 s:shard1 r:core_node2 
x:xmpl3_shard1_replica2] org.apache.solr.cloud.RecoveryStrategy; Wait 8.0 
seconds before trying to recover again (3)


Below is a list of the steps I took.

./zkcli.sh --zkhost localhost:4545 -cmd makepath /solr/xmpl3
./zkcli.sh --zkhost localhost:4545/solr/xmpl3 -cmd putfile /security.json 
~/solr/security151119a.json
./zkcli.sh --zkhost localhost:4545/solr/xmpl3 -cmd upconfig -confdir 
../../solr/configsets/basic_configs/conf -confname xmpl3
cd ../../../bin/
./solr -c -p 4695 -d ~dbman/solr/straight531outofbox/solr-5.3.1/server/ -z 
localhost:4545/solr/xmpl3 -s 
~dbman/solr/straight531outofbox/solr-5.3.1/example/solr
./solr -c -p 4685 -d ~dbman/solr/straight531outofbox/solr-5.3.1/server/ -z 
localhost:4545/solr/xmpl3 -s 
~dbman/solr/straight531outofbox/solr-5.3.1/server/solr
curl -u solr:SolrRocks 
'http://nosqltest11:4685/solr/admin/collections?action=CREATE&name=xmpl3&numShards=1&replicationFactor=1&createNodeSet={IP-address-redacted}:4685_solr'
curl -u solr:SolrRocks 
'http://nosqltest11:4685/solr/admin/collections?action=ADDREPLICA&collection=xmpl3&shard=shard1&node={IP-address-redacted}:4695_solr&wt=json&indent=true'




Can you provide a list of steps to take in an out-of-the-box directory tree 
whereby ADDREPLICA _will_ work with security.json already in place?




-Original Message-
From: Anshum Gupta [mailto:ans...@anshumgupta.net] 
Sent: Thursday, November 19, 2015 3:44 PM
To: solr-user@lucene.apache.org
Subject: Re: Re:Re: Implementing security.json is breaking ADDREPLICA

I'll try out what you did later in the day, as soon as I get time but why
exactly are you creating cores manually? Seems like you manually create a
core and the try to add a replica. Can you try using the Collections API to
create a collection?

Starting Solr 5.0, the only supported way to create a new collection is via
the Collections API. Creating a core would lead to 

Re: Re:Re: Implementing security.json is breaking ADDREPLICA

2015-11-20 Thread Anshum Gupta
From my tests, it seems like the 'read' permission interferes with the
Replication and so the ADDREPLICA also fails. You're also bound to run into
issues if you have 'read' permission setup and restart your cluster,
provided you have a collection that has a replication factor > 1 for at
least one shard.

I'll create a JIRA for this and mark it to be a blocker for 5.4. Thanks for
bringing this up.


On Thu, Nov 19, 2015 at 12:43 PM, Anshum Gupta 
wrote:

> I'll try out what you did later in the day, as soon as I get time but why
> exactly are you creating cores manually? Seems like you manually create a
> core and the try to add a replica. Can you try using the Collections API to
> create a collection?
>
> Starting Solr 5.0, the only supported way to create a new collection is
> via the Collections API. Creating a core would lead to a collection
> creation but that's not really supported. It was just something that was
> done when there were no Collections API.
>
>
> On Thu, Nov 19, 2015 at 12:36 PM, Oakley, Craig (NIH/NLM/NCBI) [C] <
> craig.oak...@nih.gov> wrote:
>
>> I tried again with the following security.json, but the results were the
>> same:
>>
>> {
>>   "authentication":{
>> "class":"solr.BasicAuthPlugin",
>> "credentials":{
>>   "solr":"IV0EHq1OnNrj6gvRCwvFwTrZ1+z1oBbnQdiVC3otuq0=
>> Ndd7LKvVBAaZIF0QAVi1ekCfAJXr1GGfLtRUXhgrF8c=",
>>   "solruser":"VgZX1TAMNHT2IJikoGdKtxQdXc+MbNwfqzf89YqcLEE=
>> 37pPWQ9v4gciIKHuTmFmN0Rv66rnlMOFEWfEy9qjJfY="},
>> "":{"v":9}},
>>   "authorization":{
>> "class":"solr.RuleBasedAuthorizationPlugin",
>> "user-role":{
>>   "solr":[
>> "admin",
>> "read",
>> "xmpladmin",
>> "xmplgen",
>> "xmplsel"],
>>   "solruser":[
>> "read",
>> "xmplgen",
>> "xmplsel"]},
>> "permissions":[
>>   {
>> "name":"security-edit",
>> "role":"admin"},
>>   {
>> "name":"xmpl_admin",
>> "collection":"xmpl",
>> "path":"/admin/*",
>> "role":"xmpladmin"},
>>   {
>> "name":"xmpl_sel",
>> "collection":"xmpl",
>> "path":"/select/*",
>> "role":null},
>>   {
>>  "name":"all-admin",
>>  "collection":null,
>>  "path":"/*",
>>  "role":"xmplgen"},
>>   {
>>  "name":"all-core-handlers",
>>  "path":"/*",
>>  "role":"xmplgen"}],
>> "":{"v":42}}}
>>
>> -Original Message-
>> From: Oakley, Craig (NIH/NLM/NCBI) [C]
>> Sent: Thursday, November 19, 2015 1:46 PM
>> To: 'solr-user@lucene.apache.org' 
>> Subject: RE: Re:Re: Implementing security.json is breaking ADDREPLICA
>>
>> I note that the thread called "Security Problems" (most recent post by
>> Nobel Paul) seems like it may help with much of what I'm trying to do. I
>> will see to what extent that may help.
>>
>
>
>
> --
> Anshum Gupta
>



-- 
Anshum Gupta


Re: shard range is empty...

2015-11-20 Thread Anshum Gupta
This uses the Collections API and shouldn't have led to that state. Have
you had similar issues before?

I'm also wondering if you already had something from previous runs/installs
on the fs/zk.

On Fri, Nov 20, 2015 at 10:26 AM, Don Bosco Durai  wrote:

> Anshum,
>
> Thanks for the workaround. It resolved my issue.
>
> Here is the command I used. It is pretty standard and has worked for me
> almost all the time (so far)...
> bin/solr create -c my_collection -d
> /tmp/solr_configsets/my_collection/conf -s 3 -rf 1
>
>
> Thanks
>
> Bosco
>
>
>
>
>
> On 11/20/15, 9:56 AM, "Anshum Gupta"  wrote:
>
> >You can manually update the cluster state so that the range for shard1
> says
> >8000-d554. Also remove the "parent" tag from there.
> >
> >Can you tell me how did you create this collection ? This shouldn't really
> >happen unless you didn't use the Collections API to create the collection.
> >
> >
> >
> >
> >
> >On Fri, Nov 20, 2015 at 9:39 AM, Don Bosco Durai 
> wrote:
> >
> >> I created a 3 shard cluster, but seems for one of the shard, the range
> is
> >> empty. Anyway to fix it without deleting and recreating the collection?
> >>
> >> 2015-11-20 08:59:50,901 [solr,writer=0] ERROR
> >> apache.solr.client.solrj.impl.CloudSolrClient
> (CloudSolrClient.java:902) -
> >> Request to collection my_collection failed due to (400)
> >> org.apache.solr.common.SolrException: No active slice servicing hash
> code
> >> b637e7f1 in DocCollection(my_collection)={
> >>   "replicationFactor":"1",
> >>   "shards":{
> >> "shard2":{
> >>   "range":"d555-2aa9",
> >>   "state":"active",
> >>   "replicas":{"core_node2":{
> >>   "core":"my_collection_shard2_replica1",
> >>   "base_url":"http://172.22.64.65:8886/solr",
> >>   "node_name":"172.22.64.65:8886_solr",
> >>   "state":"active",
> >>   "leader":"true"}}},
> >> "shard3":{
> >>   "range":"2aaa-7fff",
> >>   "state":"active",
> >>   "replicas":{"core_node3":{
> >>   "core":"my_collection_shard3_replica1",
> >>   "base_url":"http://172.22.64.64:8886/solr",
> >>   "node_name":"172.22.64.64:8886_solr",
> >>   "state":"active",
> >>   "leader":"true"}}},
> >> "shard1":{
> >>   "parent":null,
> >>   "range":null,
> >>   "state":"active",
> >>   "replicas":{"core_node4":{
> >>   "core":"my_collection_shard1_replica1",
> >>   "base_url":"http://172.22.64.63:8886/solr",
> >>   "node_name":"172.22.64.63:8886_solr",
> >>   "state":"active",
> >>   "leader":"true",
> >>   "router":{"name":"compositeId"},
> >>   "maxShardsPerNode":"1",
> >>   "autoAddReplicas":"false"}, retry? 0
> >>
> >> Thanks
> >>
> >> Bosco
> >>
> >>
> >>
> >
> >
> >--
> >Anshum Gupta
>
>


-- 
Anshum Gupta


Re: shard range is empty...

2015-11-20 Thread Don Bosco Durai
That could be. I had a flaky network and I had to rerun my script twice. Maybe 
it got into some inconsistent state. I will keep an eye on this. If I am able 
to reproduce, then I will create a JIRA.

Bosco




On 11/20/15, 10:34 AM, "Anshum Gupta"  wrote:

>This uses the Collections API and shouldn't have led to that state. Have
>you had similar issues before?
>
>I'm also wondering if you already had something from previous runs/installs
>on the fs/zk.
>
>On Fri, Nov 20, 2015 at 10:26 AM, Don Bosco Durai  wrote:
>
>> Anshum,
>>
>> Thanks for the workaround. It resolved my issue.
>>
>> Here is the command I used. It is pretty standard and has worked for me
>> almost all the time (so far)...
>> bin/solr create -c my_collection -d
>> /tmp/solr_configsets/my_collection/conf -s 3 -rf 1
>>
>>
>> Thanks
>>
>> Bosco
>>
>>
>>
>>
>>
>> On 11/20/15, 9:56 AM, "Anshum Gupta"  wrote:
>>
>> >You can manually update the cluster state so that the range for shard1
>> says
>> >8000-d554. Also remove the "parent" tag from there.
>> >
>> >Can you tell me how did you create this collection ? This shouldn't really
>> >happen unless you didn't use the Collections API to create the collection.
>> >
>> >
>> >
>> >
>> >
>> >On Fri, Nov 20, 2015 at 9:39 AM, Don Bosco Durai 
>> wrote:
>> >
>> >> I created a 3 shard cluster, but seems for one of the shard, the range
>> is
>> >> empty. Anyway to fix it without deleting and recreating the collection?
>> >>
>> >> 2015-11-20 08:59:50,901 [solr,writer=0] ERROR
>> >> apache.solr.client.solrj.impl.CloudSolrClient
>> (CloudSolrClient.java:902) -
>> >> Request to collection my_collection failed due to (400)
>> >> org.apache.solr.common.SolrException: No active slice servicing hash
>> code
>> >> b637e7f1 in DocCollection(my_collection)={
>> >>   "replicationFactor":"1",
>> >>   "shards":{
>> >> "shard2":{
>> >>   "range":"d555-2aa9",
>> >>   "state":"active",
>> >>   "replicas":{"core_node2":{
>> >>   "core":"my_collection_shard2_replica1",
>> >>   "base_url":"http://172.22.64.65:8886/solr",
>> >>   "node_name":"172.22.64.65:8886_solr",
>> >>   "state":"active",
>> >>   "leader":"true"}}},
>> >> "shard3":{
>> >>   "range":"2aaa-7fff",
>> >>   "state":"active",
>> >>   "replicas":{"core_node3":{
>> >>   "core":"my_collection_shard3_replica1",
>> >>   "base_url":"http://172.22.64.64:8886/solr",
>> >>   "node_name":"172.22.64.64:8886_solr",
>> >>   "state":"active",
>> >>   "leader":"true"}}},
>> >> "shard1":{
>> >>   "parent":null,
>> >>   "range":null,
>> >>   "state":"active",
>> >>   "replicas":{"core_node4":{
>> >>   "core":"my_collection_shard1_replica1",
>> >>   "base_url":"http://172.22.64.63:8886/solr",
>> >>   "node_name":"172.22.64.63:8886_solr",
>> >>   "state":"active",
>> >>   "leader":"true",
>> >>   "router":{"name":"compositeId"},
>> >>   "maxShardsPerNode":"1",
>> >>   "autoAddReplicas":"false"}, retry? 0
>> >>
>> >> Thanks
>> >>
>> >> Bosco
>> >>
>> >>
>> >>
>> >
>> >
>> >--
>> >Anshum Gupta
>>
>>
>
>
>-- 
>Anshum Gupta



Re: shard range is empty...

2015-11-20 Thread Don Bosco Durai
Anshum,

Thanks for the workaround. It resolved my issue.

Here is the command I used. It is pretty standard and has worked for me almost 
all the time (so far)...
bin/solr create -c my_collection -d /tmp/solr_configsets/my_collection/conf -s 
3 -rf 1


Thanks

Bosco





On 11/20/15, 9:56 AM, "Anshum Gupta"  wrote:

>You can manually update the cluster state so that the range for shard1 says
>8000-d554. Also remove the "parent" tag from there.
>
>Can you tell me how did you create this collection ? This shouldn't really
>happen unless you didn't use the Collections API to create the collection.
>
>
>
>
>
>On Fri, Nov 20, 2015 at 9:39 AM, Don Bosco Durai  wrote:
>
>> I created a 3 shard cluster, but seems for one of the shard, the range is
>> empty. Anyway to fix it without deleting and recreating the collection?
>>
>> 2015-11-20 08:59:50,901 [solr,writer=0] ERROR
>> apache.solr.client.solrj.impl.CloudSolrClient (CloudSolrClient.java:902) -
>> Request to collection my_collection failed due to (400)
>> org.apache.solr.common.SolrException: No active slice servicing hash code
>> b637e7f1 in DocCollection(my_collection)={
>>   "replicationFactor":"1",
>>   "shards":{
>> "shard2":{
>>   "range":"d555-2aa9",
>>   "state":"active",
>>   "replicas":{"core_node2":{
>>   "core":"my_collection_shard2_replica1",
>>   "base_url":"http://172.22.64.65:8886/solr",
>>   "node_name":"172.22.64.65:8886_solr",
>>   "state":"active",
>>   "leader":"true"}}},
>> "shard3":{
>>   "range":"2aaa-7fff",
>>   "state":"active",
>>   "replicas":{"core_node3":{
>>   "core":"my_collection_shard3_replica1",
>>   "base_url":"http://172.22.64.64:8886/solr",
>>   "node_name":"172.22.64.64:8886_solr",
>>   "state":"active",
>>   "leader":"true"}}},
>> "shard1":{
>>   "parent":null,
>>   "range":null,
>>   "state":"active",
>>   "replicas":{"core_node4":{
>>   "core":"my_collection_shard1_replica1",
>>   "base_url":"http://172.22.64.63:8886/solr",
>>   "node_name":"172.22.64.63:8886_solr",
>>   "state":"active",
>>   "leader":"true",
>>   "router":{"name":"compositeId"},
>>   "maxShardsPerNode":"1",
>>   "autoAddReplicas":"false"}, retry? 0
>>
>> Thanks
>>
>> Bosco
>>
>>
>>
>
>
>-- 
>Anshum Gupta



shard range is empty...

2015-11-20 Thread Don Bosco Durai
I created a 3 shard cluster, but seems for one of the shard, the range is 
empty. Anyway to fix it without deleting and recreating the collection?

2015-11-20 08:59:50,901 [solr,writer=0] ERROR 
apache.solr.client.solrj.impl.CloudSolrClient (CloudSolrClient.java:902) - 
Request to collection my_collection failed due to (400) 
org.apache.solr.common.SolrException: No active slice servicing hash code 
b637e7f1 in DocCollection(my_collection)={
  "replicationFactor":"1",
  "shards":{
"shard2":{
  "range":"d555-2aa9",
  "state":"active",
  "replicas":{"core_node2":{
  "core":"my_collection_shard2_replica1",
  "base_url":"http://172.22.64.65:8886/solr",
  "node_name":"172.22.64.65:8886_solr",
  "state":"active",
  "leader":"true"}}},
"shard3":{
  "range":"2aaa-7fff",
  "state":"active",
  "replicas":{"core_node3":{
  "core":"my_collection_shard3_replica1",
  "base_url":"http://172.22.64.64:8886/solr",
  "node_name":"172.22.64.64:8886_solr",
  "state":"active",
  "leader":"true"}}},
"shard1":{
  "parent":null,
  "range":null,
  "state":"active",
  "replicas":{"core_node4":{
  "core":"my_collection_shard1_replica1",
  "base_url":"http://172.22.64.63:8886/solr",
  "node_name":"172.22.64.63:8886_solr",
  "state":"active",
  "leader":"true",
  "router":{"name":"compositeId"},
  "maxShardsPerNode":"1",
  "autoAddReplicas":"false"}, retry? 0

Thanks

Bosco




Re: Re:Re: Implementing security.json is breaking ADDREPLICA

2015-11-20 Thread Anshum Gupta
The Collections API was available before November 2014, if that is when you
took the class. However, it was only with Solr 5.0 (released in Feb 2015)
that the Collections API became the only supported mechanism for creating
a collection.

Here are the list of steps that you'd need to run to see that things are
fine for you without the read permission:
* Untar and setup Solr, don't start it yet
* Start clean zookeeper
* Put the security.json in zk, without anything other than a security-edit
permission. Find the content of the file below. Upload it using your own zk
client or through the solr script:
> solr-5.3.1/server/scripts/cloud-scripts/zkcli.sh -zkhost localhost:2181
-cmd putfile /security.json ~/security.json

security.json:
{"authentication":{"class":"solr.BasicAuthPlugin","credentials":{"solr":"IV0EHq1OnNrj6gvRCwvFwTrZ1+z1oBbnQdiVC3otuq0=
Ndd7LKvVBAaZIF0QAVi1ekCfAJXr1GGfLtRUXhgrF8c="}},"authorization":{"class":"solr.RuleBasedAuthorizationPlugin","user-role":{"solr":["admin"]},"permissions":[{"name":"security-edit","role":"admin"}]}}

* Start solr:
> solr-5.3.1/bin/solr start -e cloud -z localhost:2181

You would need to key in a few things e.g. #nodes and ports, leave them at
the default values of 2 nodes and 8983/7574, unless you want to run Solr on
a different port. Then let it create a default collection to just make sure
that everything works fine.

* Add the collection-admin-edit command:
> curl --user solr:SolrRocks http://localhost:8983/solr/admin/authorization
-H 'Content-type:application/json' -d '{"set-permission" :
{"name":"collection-admin-edit", "role":"admin"}}'

At this point, everything should be working fine. Restarting the nodes
should also work fine. You can try 2 things at this point:
1. Create a new collection with 1 shard and 1 replica and then try adding a
replica, here's how:
> curl --user solr:SolrRocks
http://localhost:8983/solr/admin/collections?action=CREATE&name=testcollection&collection.configName=gettingstarted&numShards=1

> curl --user solr:SolrRocks
http://localhost:8983/solr/admin/collections?action=ADDREPLICA&collection=testcollection&shard=shard1

This should work fine.

2. After this, try restarting the solr cluster. Here's how you can do so,
assuming you didn't change any of the defaults and you are running zk on
localhost:2181. If not, just change those values below:
> bin/solr stop -all

After this, check that Solr was actually stopped. I'd also suggest you tail
the logs on both nodes when they are coming up to see any errors, if any.
The logs would be here: example/cloud/node1/logs/solr.log
and example/cloud/node2/logs/solr.log

> bin/solr start -c -p 8983 -s "example/cloud/node1/solr" -z localhost:2181
> bin/solr start -c -p 7574 -s "example/cloud/node2/solr" -z localhost:2181

If you get to this checkpoint fine, try adding a read permission.
Add a permission:
> curl --user solr:SolrRocks http://localhost:8983/solr/admin/authorization
-H 'Content-type:application/json' -d '{"set-permission" : {"name":"read",
"role":"read"}}'

Add a user:
> curl --user solr:SolrRocks http://localhost:8983/solr/admin/authentication
-H 'Content-type:application/json' -d '{"set-user" :
{"solrread":"solrRocks"}}'

Assign a role to the user:
>curl --user solr:SolrRocks http://localhost:8983/solr/admin/authorization
-H 'Content-type:application/json' -d '{"set-user-role" :
{"solrread":["read"]}}'

After this, you should start having issues with ADDREPLICA.
Also, as you would at this point have a collection with a shard that has a
replication factor > 1 (remember the ADDREPLICA we did earlier), you would
have issues when you restart the cluster again using the steps I mentioned
above.


Can you confirm this? I guess I'll just use this text to create a new JIRA
now.


On Fri, Nov 20, 2015 at 10:04 AM, Oakley, Craig (NIH/NLM/NCBI) [C] <
craig.oak...@nih.gov> wrote:

> Thank you again for the reply.
>
> Below is the Email I was about to send prior to your reply a moment ago:
> shall I try again without "read" in the security.json?
>
>
>
> The Collections API method was not discussed in the "Unleashed" class at
> the conference in DC in 2014 (probably because it was not yet available),
> so I was using the method I knew.
>
> I have now tried again using admin/collections?action=CREATE (using
> different port numbers to avoid confusion from the failed previous
> attempts: the previously created nodes had been shutdown and their
> core.properties files renamed so as not to be discovered), but the results
> are the same:
> INFO  - 2015-11-20 16:56:25.283; [c:xmpl3 s:shard1 r:core_node2
> x:xmpl3_shard1_replica2] org.apache.solr.cloud.RecoveryStrategy; Starting
> Replication Recovery.
> INFO  - 2015-11-20 16:56:25.284; [c:xmpl3 s:shard1 r:core_node2
> x:xmpl3_shard1_replica2] org.apache.solr.cloud.RecoveryStrategy; Begin
> buffering updates.
> INFO  - 2015-11-20 16:56:25.284; [c:xmpl3 s:shard1 r:core_node2
> x:xmpl3_shard1_replica2] org.apache.solr.update.UpdateLog; Starting to
> buffer updates. FSUpdateLog{state=ACTIVE, tlog=null}
> 

Re: Parallel SQL / calcite adapter

2015-11-20 Thread William Bell
How is performance on Calcite?

On Fri, Nov 20, 2015 at 5:12 AM, Joel Bernstein  wrote:

> After reading https://calcite.apache.org/docs/tutorial.html, I think it
> should be possible to use Solr's JDBC driver with Calcite's JDBC
> adapter.
>
> If you give it a try and run into any problems, please create a jira.
>
> Joel Bernstein
> http://joelsolr.blogspot.com/
>
> On Thu, Nov 19, 2015 at 7:58 PM, Joel Bernstein 
> wrote:
>
> > It's an interesting question. The JDBC driver is still very basic. It
> > would depend on how much of the JDBC spec needs to be implemented to
> > connect to Calcite/Drill.
> >
> > Joel Bernstein
> > http://joelsolr.blogspot.com/
> >
> > On Thu, Nov 19, 2015 at 3:28 AM, Kai Gülzau 
> wrote:
> >
> >>
> >> We are currently evaluating calcite as a SQL facade for different Data
> >> Sources
> >>
> >> -  JDBC
> >>
> >> -  REST
> >>
> >> -  SOLR
> >>
> >> -  ...
> >>
> >> I didn't find a "native" calcite adapter for solr (
> >> http://calcite.apache.org/docs/adapter.html).
> >>
> >> Is it a good idea to use the parallel sql feature (over jdbc) to connect
> >> calcite (or apache drill) to solr?
> >> Any suggestions?
> >>
> >>
> >> Thanks,
> >>
> >> Kai Gülzau
> >>
> >
> >
>



-- 
Bill Bell
billnb...@gmail.com
cell 720-256-8076


Re: shard range is empty...

2015-11-20 Thread Don Bosco Durai
I am using Solr version 5.2.

Thanks

Bosco





On 11/20/15, 9:39 AM, "Don Bosco Durai"  wrote:

>I created a 3 shard cluster, but seems for one of the shard, the range is 
>empty. Anyway to fix it without deleting and recreating the collection?
>
>2015-11-20 08:59:50,901 [solr,writer=0] ERROR 
>apache.solr.client.solrj.impl.CloudSolrClient (CloudSolrClient.java:902) - 
>Request to collection my_collection failed due to (400) 
>org.apache.solr.common.SolrException: No active slice servicing hash code 
>b637e7f1 in DocCollection(my_collection)={
>  "replicationFactor":"1",
>  "shards":{
>"shard2":{
>  "range":"d555-2aa9",
>  "state":"active",
>  "replicas":{"core_node2":{
>  "core":"my_collection_shard2_replica1",
>  "base_url":"http://172.22.64.65:8886/solr",
>  "node_name":"172.22.64.65:8886_solr",
>  "state":"active",
>  "leader":"true"}}},
>"shard3":{
>  "range":"2aaa-7fff",
>  "state":"active",
>  "replicas":{"core_node3":{
>  "core":"my_collection_shard3_replica1",
>  "base_url":"http://172.22.64.64:8886/solr",
>  "node_name":"172.22.64.64:8886_solr",
>  "state":"active",
>  "leader":"true"}}},
>"shard1":{
>  "parent":null,
>  "range":null,
>  "state":"active",
>  "replicas":{"core_node4":{
>  "core":"my_collection_shard1_replica1",
>  "base_url":"http://172.22.64.63:8886/solr",
>  "node_name":"172.22.64.63:8886_solr",
>  "state":"active",
>  "leader":"true",
>  "router":{"name":"compositeId"},
>  "maxShardsPerNode":"1",
>  "autoAddReplicas":"false"}, retry? 0
>
>Thanks
>
>Bosco
>
>



Re: shard range is empty...

2015-11-20 Thread Anshum Gupta
You can manually update the cluster state so that the range for shard1 says
8000-d554. Also remove the "parent" tag from there.
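
That is, the shard1 entry would end up looking like this (a sketch, keeping
the truncated range notation from your paste):

"shard1":{
  "range":"8000-d554",
  "state":"active",
  ...}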

Can you tell me how did you create this collection ? This shouldn't really
happen unless you didn't use the Collections API to create the collection.





On Fri, Nov 20, 2015 at 9:39 AM, Don Bosco Durai  wrote:

> I created a 3 shard cluster, but seems for one of the shard, the range is
> empty. Anyway to fix it without deleting and recreating the collection?
>
> 2015-11-20 08:59:50,901 [solr,writer=0] ERROR
> apache.solr.client.solrj.impl.CloudSolrClient (CloudSolrClient.java:902) -
> Request to collection my_collection failed due to (400)
> org.apache.solr.common.SolrException: No active slice servicing hash code
> b637e7f1 in DocCollection(my_collection)={
>   "replicationFactor":"1",
>   "shards":{
> "shard2":{
>   "range":"d555-2aa9",
>   "state":"active",
>   "replicas":{"core_node2":{
>   "core":"my_collection_shard2_replica1",
>   "base_url":"http://172.22.64.65:8886/solr",
>   "node_name":"172.22.64.65:8886_solr",
>   "state":"active",
>   "leader":"true"}}},
> "shard3":{
>   "range":"2aaa-7fff",
>   "state":"active",
>   "replicas":{"core_node3":{
>   "core":"my_collection_shard3_replica1",
>   "base_url":"http://172.22.64.64:8886/solr",
>   "node_name":"172.22.64.64:8886_solr",
>   "state":"active",
>   "leader":"true"}}},
> "shard1":{
>   "parent":null,
>   "range":null,
>   "state":"active",
>   "replicas":{"core_node4":{
>   "core":"my_collection_shard1_replica1",
>   "base_url":"http://172.22.64.63:8886/solr",
>   "node_name":"172.22.64.63:8886_solr",
>   "state":"active",
>   "leader":"true",
>   "router":{"name":"compositeId"},
>   "maxShardsPerNode":"1",
>   "autoAddReplicas":"false"}, retry? 0
>
> Thanks
>
> Bosco
>
>
>


-- 
Anshum Gupta


Search with very large boolean filter

2015-11-20 Thread jichi
Hi,

I am using Solr 4.7.0 to search text with an id filter, like this:

  id:(100 OR 2 OR 5 OR 81 OR 10 ...)

The number of IDs in the boolean filter is usually less than 100, but
could sometimes be very large (around 30k IDs).

We currently set maxBooleanClauses to 1024, partitioned the IDs by every
1000, and batched the solr queries, which worked but became slow when the
total number of IDs is larger than 10k.

I am wondering what would be the best strategy to handle this kind of
problem?
Can we increase the maxBooleanClauses to reduce the number of batches?
And if possible, we prefer not to create additionally large indexes.

Thanks!


Re: Search with very large boolean filter

2015-11-20 Thread Alexandre Rafalovitch
I don't know what to do about 30K ids, but you can definitely improve
on ORing the ids with
https://cwiki.apache.org/confluence/display/solr/Other+Parsers#OtherParsers-TermsQueryParser
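
For example, a single filter of the form

  fq={!terms f=id}100,2,5,81,10

replaces the whole OR chain (syntax as on the linked page; do check that the
terms parser exists in your 4.7 build, as it is a late 4.x addition).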

Regards,
   Alex.

Newsletter and resources for Solr beginners and intermediates:
http://www.solr-start.com/


On 20 November 2015 at 16:50, jichi  wrote:
> Hi,
>
> I am using Solr 4.7.0 to search text with an id filter, like this:
>
>   id:(100 OR 2 OR 5 OR 81 OR 10 ...)
>
> The number of IDs in the boolean filter is usually less than 100, but
> could sometimes be very large (around 30k IDs).
>
> We currently set maxBooleanClauses to 1024, partitioned the IDs by every
> 1000, and batched the solr queries, which worked but became slow when the
> total number of IDs is larger than 10k.
>
> I am wondering what would be the best strategy to handle this kind of
> problem?
> Can we increase the maxBooleanClauses to reduce the number of batches?
> And if possible, we prefer not to create additionally large indexes.
>
> Thanks!


Re: Re:Re: Implementing security.json is breaking ADDREPLICA

2015-11-20 Thread Anshum Gupta
This seems unrelated and more like a user error somewhere. Can you just
follow the steps without any security settings, i.e. not even uploading
security.json, and see if you still see this? Sorry, but I don't have access
to the code right now; I'll try to look at this later tonight.

On Fri, Nov 20, 2015 at 3:07 PM, Oakley, Craig (NIH/NLM/NCBI) [C] <
craig.oak...@nih.gov> wrote:

> Thank you for opening SOLR-8326
>
> As a side note, in the procedure you listed, even before adding the
> collection-admin-edit authorization, I'm already hitting trouble: stopping
> and restarting a node results in the following
>
> INFO  - 2015-11-20 22:48:41.275; [c:solr8326 s:shard2 r:core_node4
> x:solr8326_shard2_replica1] org.apache.solr.cloud.RecoveryStrategy;
> Publishing state of core solr8326_shard2_replica1 as recovering, leader is
> http://{IP-address-redacted}:8983/solr/solr8326_shard2_replica2/ and I am
> http://{IP-address-redacted}:7574/solr/solr8326_shard2_replica1/
> INFO  - 2015-11-20 22:48:41.275; [c:solr8326 s:shard2 r:core_node4
> x:solr8326_shard2_replica1] org.apache.solr.cloud.ZkController; publishing
> state=recovering
> INFO  - 2015-11-20 22:48:41.278; [c:solr8326 s:shard1 r:core_node3
> x:solr8326_shard1_replica1] org.apache.solr.cloud.RecoveryStrategy;
> Publishing state of core solr8326_shard1_replica1 as recovering, leader is
> http://{IP-address-redacted}:8983/solr/solr8326_shard1_replica2/ and I am
> http://{IP-address-redacted}:7574/solr/solr8326_shard1_replica1/
> INFO  - 2015-11-20 22:48:41.280; [c:solr8326 s:shard1 r:core_node3
> x:solr8326_shard1_replica1] org.apache.solr.cloud.ZkController; publishing
> state=recovering
> INFO  - 2015-11-20 22:48:41.282; [c:solr8326 s:shard2 r:core_node4
> x:solr8326_shard2_replica1] org.apache.solr.cloud.RecoveryStrategy; Sending
> prep recovery command to http://{IP-address-redacted}:8983/solr;
> WaitForState:
> action=PREPRECOVERY&core=solr8326_shard2_replica2&nodeName={IP-address-redacted}%3A7574_solr&coreNodeName=core_node4&state=recovering&checkLive=true&onlyIfLeader=true&onlyIfLeaderActive=true
> INFO  - 2015-11-20 22:48:41.289; [   ]
> org.apache.solr.common.cloud.ZkStateReader$8; A cluster state change:
> WatchedEvent state:SyncConnected type:NodeDataChanged
> path:/collections/solr8326/state.json for collection solr8326 has occurred
> - updating... (live nodes size: 2)
> INFO  - 2015-11-20 22:48:41.290; [c:solr8326 s:shard1 r:core_node3
> x:solr8326_shard1_replica1] org.apache.solr.cloud.RecoveryStrategy; Sending
> prep recovery command to http://{IP-address-redacted}:8983/solr;
> WaitForState:
> action=PREPRECOVERY&core=solr8326_shard1_replica2&nodeName={IP-address-redacted}%3A7574_solr&coreNodeName=core_node3&state=recovering&checkLive=true&onlyIfLeader=true&onlyIfLeaderActive=true
> INFO  - 2015-11-20 22:48:41.291; [   ]
> org.apache.solr.common.cloud.ZkStateReader; Updating data for solr8326 to
> ver 25
> ERROR - 2015-11-20 22:48:41.298; [c:solr8326 s:shard2 r:core_node4
> x:solr8326_shard2_replica1] org.apache.solr.common.SolrException; Error
> while trying to recover.:java.util.concurrent.ExecutionException:
> org.apache.http.ParseException: Invalid content type:
> at java.util.concurrent.FutureTask.report(FutureTask.java:122)
> at java.util.concurrent.FutureTask.get(FutureTask.java:192)
> at
> org.apache.solr.cloud.RecoveryStrategy.sendPrepRecoveryCmd(RecoveryStrategy.java:598)
> at
> org.apache.solr.cloud.RecoveryStrategy.doRecovery(RecoveryStrategy.java:361)
> at
> org.apache.solr.cloud.RecoveryStrategy.run(RecoveryStrategy.java:227)
> Caused by: org.apache.http.ParseException: Invalid content type:
> at org.apache.http.entity.ContentType.parse(ContentType.java:273)
> at
> org.apache.solr.client.solrj.impl.HttpSolrClient.executeMethod(HttpSolrClient.java:512)
> at
> org.apache.solr.client.solrj.impl.HttpSolrClient$1.call(HttpSolrClient.java:270)
> at
> org.apache.solr.client.solrj.impl.HttpSolrClient$1.call(HttpSolrClient.java:266)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at
> org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor$1.run(ExecutorUtil.java:210)
> at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
>
> INFO  - 2015-11-20 22:48:41.298; [   ]
> org.apache.solr.common.cloud.ZkStateReader$8; A cluster state change:
> WatchedEvent state:SyncConnected type:NodeDataChanged
> path:/collections/solr8326/state.json for collection solr8326 has occurred
> - updating... (live nodes size: 2)
> ERROR - 2015-11-20 22:48:41.298; [c:solr8326 s:shard2 r:core_node4
> x:solr8326_shard2_replica1] org.apache.solr.cloud.RecoveryStrategy;
> Recovery failed - trying again... (4)
> INFO  - 2015-11-20 22:48:41.300; [c:solr8326 s:shard2 r:core_node4
> x:solr8326_shard2_replica1] org.apache.solr.cloud.RecoveryStrategy; Wait
> 32.0 seconds before trying to recover again (5)
> ERROR - 2015-11-20 

Re: Search with very large boolean filter

2015-11-20 Thread Jack Krupansky
1. Are you trying to retrieve a large number of documents, or simply
perform queries against a subset of the index?

2. How many unique queries are you expecting to perform against each
specific filter set of IDs?

3. How often does the set of IDs change?

4. Is there more than one filter set of IDs in use during a particular
interval of time?



-- Jack Krupansky

On Fri, Nov 20, 2015 at 4:50 PM, jichi  wrote:

> Hi,
>
> I am using Solr 4.7.0 to search text with an id filter, like this:
>
>   id:(100 OR 2 OR 5 OR 81 OR 10 ...)
>
> The number of IDs in the boolean filter is usually less than 100, but
> could sometimes be very large (around 30k IDs).
>
> We currently set maxBooleanClauses to 1024, partition the IDs into batches
> of 1000, and batch the Solr queries; this works, but becomes slow when the
> total number of IDs is larger than 10k.
>
> I am wondering what the best strategy would be to handle this kind of
> problem. Can we increase maxBooleanClauses to reduce the number of batches?
> If possible, we would prefer not to create additional large indexes.
>
> Thanks!
>


Re: Search with very large boolean filter

2015-11-20 Thread jichi
Thanks for the quick replies, Alex and Jack!

> definitely can improve on OR'ing the ids with
Going to try that! But I guess it would still hit the maxBooleanClauses=1024
threshold.

> 1. Are you trying to retrieve a large number of documents, or simply
perform queries against a subset of the index?
We would like to perform queries against a subset of the index.

> 2. How many unique queries are you expecting to perform against each
specific filter set of IDs?
There are usually only a handful (around 10) of unique queries for the same
set of IDs within a short period of time (around 1 min).

> 3. How often does the set of IDs change?
The IDs are almost always different for each query.
BTW, the total number will be less than 1k 99% of the time,
but in the rare 1% of cases it can be more than 10k.

> 4. Is there more than one filter set of IDs in use during a particular
interval of time?
No. The ID set will be the only filter applied to "id".


Thanks!


2015-11-20 14:26 GMT-08:00 Jack Krupansky :

> 1. Are you trying to retrieve a large number of documents, or simply
> perform queries against a subset of the index?
>
> 2. How many unique queries are you expecting to perform against each
> specific filter set of IDs?
>
> 3. How often does the set of IDs change?
>
> 4. Is there more than one filter set of IDs in use during a particular
> interval of time?
>
>
>
> -- Jack Krupansky
>
> On Fri, Nov 20, 2015 at 4:50 PM, jichi  wrote:
>
>> Hi,
>>
>> I am using Solr 4.7.0 to search text with an id filter, like this:
>>
>>   id:(100 OR 2 OR 5 OR 81 OR 10 ...)
>>
>> The number of IDs in the boolean filter is usually less than 100, but
>> could sometimes be very large (around 30k IDs).
>>
>> We currently set maxBooleanClauses to 1024, partition the IDs into batches
>> of 1000, and batch the Solr queries; this works, but becomes slow when the
>> total number of IDs is larger than 10k.
>>
>> I am wondering what the best strategy would be to handle this kind of
>> problem. Can we increase maxBooleanClauses to reduce the number of batches?
>> If possible, we would prefer not to create additional large indexes.
>>
>> Thanks!
>>
>
>


-- 
jichi


Re: Search with very large boolean filter

2015-11-20 Thread jichi
Hi Shawn,

We have already switched the request method to POST.
I am going to try the terms query parser soon. I will post the performance
difference against the OR'ed syntax here later.

Thanks!

2015-11-20 15:23 GMT-08:00 Shawn Heisey :

> On 11/20/2015 4:09 PM, jichi wrote:
> > Thanks for the quick replies, Alex and Jack!
> >
> >> definitely can improve on OR'ing the ids with
> > Going to try that! But I guess it would still hit the
> maxBooleanClauses=1024
> > threshold.
>
> The terms query parser does not have a limit like boolean queries do.
> This query parser was added in version 4.10, so be aware of that.
> Querying for a large number of terms with the terms query parser will
> scale a lot better than a boolean query -- better performance.
>
> The number of terms you query will affect the size of the query text.
> The query size is constrained by either the max HTTP header size if the
> request is a GET, or the max form size if it's a POST.  The max HTTP
> header size is configurable in the servlet container (jetty, tomcat,
> etc) and I would not recommend going over about 32K with it.  The max
> form size is configurable in solrconfig.xml with the
> formdataUploadLimitInKB attribute on the requestParsers element.  That
> attribute defaults to 2048, which yields a default size of 2MB.
> Switching your queries to POST requests is advisable.
>
> Thanks,
> Shawn
>
>


-- 
jichi


RE: Re:Re: Implementing security.json is breaking ADDREPLICA

2015-11-20 Thread Oakley, Craig (NIH/NLM/NCBI) [C]
Thank you for opening SOLR-8326

As a side note, in the procedure you listed, even before adding the 
collection-admin-edit authorization, I'm already hitting trouble: stopping and 
restarting a node results in the following

INFO  - 2015-11-20 22:48:41.275; [c:solr8326 s:shard2 r:core_node4 
x:solr8326_shard2_replica1] org.apache.solr.cloud.RecoveryStrategy; Publishing 
state of core solr8326_shard2_replica1 as recovering, leader is 
http://{IP-address-redacted}:8983/solr/solr8326_shard2_replica2/ and I am 
http://{IP-address-redacted}:7574/solr/solr8326_shard2_replica1/
INFO  - 2015-11-20 22:48:41.275; [c:solr8326 s:shard2 r:core_node4 
x:solr8326_shard2_replica1] org.apache.solr.cloud.ZkController; publishing 
state=recovering
INFO  - 2015-11-20 22:48:41.278; [c:solr8326 s:shard1 r:core_node3 
x:solr8326_shard1_replica1] org.apache.solr.cloud.RecoveryStrategy; Publishing 
state of core solr8326_shard1_replica1 as recovering, leader is 
http://{IP-address-redacted}:8983/solr/solr8326_shard1_replica2/ and I am 
http://{IP-address-redacted}:7574/solr/solr8326_shard1_replica1/
INFO  - 2015-11-20 22:48:41.280; [c:solr8326 s:shard1 r:core_node3 
x:solr8326_shard1_replica1] org.apache.solr.cloud.ZkController; publishing 
state=recovering
INFO  - 2015-11-20 22:48:41.282; [c:solr8326 s:shard2 r:core_node4 
x:solr8326_shard2_replica1] org.apache.solr.cloud.RecoveryStrategy; Sending 
prep recovery command to http://{IP-address-redacted}:8983/solr; WaitForState: 
action=PREPRECOVERY&core=solr8326_shard2_replica2&nodeName={IP-address-redacted}%3A7574_solr&coreNodeName=core_node4&state=recovering&checkLive=true&onlyIfLeader=true&onlyIfLeaderActive=true
INFO  - 2015-11-20 22:48:41.289; [   ] 
org.apache.solr.common.cloud.ZkStateReader$8; A cluster state change: 
WatchedEvent state:SyncConnected type:NodeDataChanged 
path:/collections/solr8326/state.json for collection solr8326 has occurred - 
updating... (live nodes size: 2)
INFO  - 2015-11-20 22:48:41.290; [c:solr8326 s:shard1 r:core_node3 
x:solr8326_shard1_replica1] org.apache.solr.cloud.RecoveryStrategy; Sending 
prep recovery command to http://{IP-address-redacted}:8983/solr; WaitForState: 
action=PREPRECOVERY&core=solr8326_shard1_replica2&nodeName={IP-address-redacted}%3A7574_solr&coreNodeName=core_node3&state=recovering&checkLive=true&onlyIfLeader=true&onlyIfLeaderActive=true
INFO  - 2015-11-20 22:48:41.291; [   ] 
org.apache.solr.common.cloud.ZkStateReader; Updating data for solr8326 to ver 
25 
ERROR - 2015-11-20 22:48:41.298; [c:solr8326 s:shard2 r:core_node4 
x:solr8326_shard2_replica1] org.apache.solr.common.SolrException; Error while 
trying to recover.:java.util.concurrent.ExecutionException: 
org.apache.http.ParseException: Invalid content type: 
at java.util.concurrent.FutureTask.report(FutureTask.java:122)
at java.util.concurrent.FutureTask.get(FutureTask.java:192)
at 
org.apache.solr.cloud.RecoveryStrategy.sendPrepRecoveryCmd(RecoveryStrategy.java:598)
at 
org.apache.solr.cloud.RecoveryStrategy.doRecovery(RecoveryStrategy.java:361)
at org.apache.solr.cloud.RecoveryStrategy.run(RecoveryStrategy.java:227)
Caused by: org.apache.http.ParseException: Invalid content type: 
at org.apache.http.entity.ContentType.parse(ContentType.java:273)
at 
org.apache.solr.client.solrj.impl.HttpSolrClient.executeMethod(HttpSolrClient.java:512)
at 
org.apache.solr.client.solrj.impl.HttpSolrClient$1.call(HttpSolrClient.java:270)
at 
org.apache.solr.client.solrj.impl.HttpSolrClient$1.call(HttpSolrClient.java:266)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at 
org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor$1.run(ExecutorUtil.java:210)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)

INFO  - 2015-11-20 22:48:41.298; [   ] 
org.apache.solr.common.cloud.ZkStateReader$8; A cluster state change: 
WatchedEvent state:SyncConnected type:NodeDataChanged 
path:/collections/solr8326/state.json for collection solr8326 has occurred - 
updating... (live nodes size: 2)
ERROR - 2015-11-20 22:48:41.298; [c:solr8326 s:shard2 r:core_node4 
x:solr8326_shard2_replica1] org.apache.solr.cloud.RecoveryStrategy; Recovery 
failed - trying again... (4)
INFO  - 2015-11-20 22:48:41.300; [c:solr8326 s:shard2 r:core_node4 
x:solr8326_shard2_replica1] org.apache.solr.cloud.RecoveryStrategy; Wait 32.0 
seconds before trying to recover again (5)
ERROR - 2015-11-20 22:48:41.300; [c:solr8326 s:shard1 r:core_node3 
x:solr8326_shard1_replica1] org.apache.solr.common.SolrException; Error while 
trying to recover.:java.util.concurrent.ExecutionException: 
org.apache.http.ParseException: Invalid content type: 
at java.util.concurrent.FutureTask.report(FutureTask.java:122)
at java.util.concurrent.FutureTask.get(FutureTask.java:192)
at 
org.apache.solr.cloud.RecoveryStrategy.sendPrepRecoveryCmd(RecoveryStrategy.java:598)
at 

Re: Search with very large boolean filter

2015-11-20 Thread Shawn Heisey
On 11/20/2015 4:09 PM, jichi wrote:
> Thanks for the quick replies, Alex and Jack!
>
>> definitely can improve on OR'ing the ids with
> Going to try that! But I guess it would still hit the maxBooleanClauses=1024
> threshold.

The terms query parser does not have a limit like boolean queries do. 
This query parser was added in version 4.10, so be aware of that. 
Querying for a large number of terms with the terms query parser will
scale a lot better than a boolean query -- better performance.

The number of terms you query will affect the size of the query text. 
The query size is constrained by either the max HTTP header size if the
request is a GET, or the max form size if it's a POST.  The max HTTP
header size is configurable in the servlet container (jetty, tomcat,
etc) and I would not recommend going over about 32K with it.  The max
form size is configurable in solrconfig.xml with the
formdataUploadLimitInKB attribute on the requestParsers element.  That
attribute defaults to 2048, which yields a default size of 2MB. 
Switching your queries to POST requests is advisable.

Thanks,
Shawn
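
Putting Shawn's two suggestions together, a minimal SolrJ sketch (assuming
SolrJ 5.x; the core URL, query, and field names are illustrative placeholders):

import java.util.Arrays;
import java.util.List;

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrRequest;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.response.QueryResponse;

public class TermsFilterExample {
  public static void main(String[] args) throws Exception {
    HttpSolrClient client =
        new HttpSolrClient("http://localhost:8983/solr/mycollection");

    // In practice this list may hold tens of thousands of IDs.
    List<String> ids = Arrays.asList("100", "2", "5", "81", "10");

    SolrQuery q = new SolrQuery("text:something");
    // One terms filter instead of thousands of OR clauses; the terms
    // query parser is not subject to maxBooleanClauses.
    q.addFilterQuery("{!terms f=id}" + String.join(",", ids));

    // POST carries the large filter in the form body (limited by
    // formdataUploadLimitInKB) instead of the URL and HTTP headers.
    QueryResponse rsp = client.query(q, SolrRequest.METHOD.POST);
    System.out.println("Found " + rsp.getResults().getNumFound());
    client.close();
  }
}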



RE: Re:Re: Implementing security.json is breaking ADDREPLICA

2015-11-20 Thread Oakley, Craig (NIH/NLM/NCBI) [C]
Thanks

It seems to work when there is no security.json, so perhaps there's some typo 
in the initial version.

I notice that the version you sent is different from the documentation at
cwiki.apache.org/confluence/display/solr/Authentication+and+Authorization+Plugins
in that the Wiki version has "permissions" before "user-role". I also notice
that (at least as of right this moment) the Wiki version has a comma at the end
of '"user-role":{"solr":"admin"},' even though it is at the end, and I notice
that the Wiki version seems to lack a comma between the "permissions" section
and the "user-role" section. I just now also noticed that the version you sent
has '"user-role":{"solr":["admin"]}' (with square brackets) whereas the Wiki
does not have square brackets.

The placement of the comma definitely looks wrong in the Wiki at the moment 
(though perhaps someone might correct the Wiki before too long). Other than 
that, I don’t know whether the order and/or the square brackets make a 
difference. I can try with different permutations.

Thanks again

P.S. for the record, the Wiki currently has
{
"authentication":{
   "class":"solr.BasicAuthPlugin",
   "credentials":{"solr":"IV0EHq1OnNrj6gvRCwvFwTrZ1+z1oBbnQdiVC3otuq0= 
Ndd7LKvVBAaZIF0QAVi1ekCfAJXr1GGfLtRUXhgrF8c="}
},
"authorization":{
   "class":"solr.RuleBasedAuthorizationPlugin",
   "permissions":[{"name":"security-edit",
  "role":"admin"}]
   "user-role":{"solr":"admin"},
}}
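
For comparison, here is the same content with those three issues fixed in
place: a comma after the "permissions" array, no trailing comma after
"user-role", and square brackets around "admin". This is only a sketch of
what the Wiki entry presumably intends; whether the relative order of
"permissions" and "user-role" matters is exactly the open question above.

{
"authentication":{
   "class":"solr.BasicAuthPlugin",
   "credentials":{"solr":"IV0EHq1OnNrj6gvRCwvFwTrZ1+z1oBbnQdiVC3otuq0= Ndd7LKvVBAaZIF0QAVi1ekCfAJXr1GGfLtRUXhgrF8c="}
},
"authorization":{
   "class":"solr.RuleBasedAuthorizationPlugin",
   "permissions":[{"name":"security-edit",
  "role":"admin"}],
   "user-role":{"solr":["admin"]}
}}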

-Original Message-
From: Anshum Gupta [mailto:ans...@anshumgupta.net] 
Sent: Friday, November 20, 2015 6:18 PM
To: solr-user@lucene.apache.org
Subject: Re: Re:Re: Implementing security.json is breaking ADDREPLICA

This seems unrelated and more like a user error somewhere. Can you just
follow the steps without any security settings, i.e. not even uploading
security.json, and see if you still see this? Sorry, but I don't have access
to the code right now; I'll try to look at this later tonight.

On Fri, Nov 20, 2015 at 3:07 PM, Oakley, Craig (NIH/NLM/NCBI) [C] <
craig.oak...@nih.gov> wrote:

> Thank you for opening SOLR-8326
>
> As a side note, in the procedure you listed, even before adding the
> collection-admin-edit authorization, I'm already hitting trouble: stopping
> and restarting a node results in the following
>
> INFO  - 2015-11-20 22:48:41.275; [c:solr8326 s:shard2 r:core_node4
> x:solr8326_shard2_replica1] org.apache.solr.cloud.RecoveryStrategy;
> Publishing state of core solr8326_shard2_replica1 as recovering, leader is
> http://{IP-address-redacted}:8983/solr/solr8326_shard2_replica2/ and I am
> http://{IP-address-redacted}:7574/solr/solr8326_shard2_replica1/
> INFO  - 2015-11-20 22:48:41.275; [c:solr8326 s:shard2 r:core_node4
> x:solr8326_shard2_replica1] org.apache.solr.cloud.ZkController; publishing
> state=recovering
> INFO  - 2015-11-20 22:48:41.278; [c:solr8326 s:shard1 r:core_node3
> x:solr8326_shard1_replica1] org.apache.solr.cloud.RecoveryStrategy;
> Publishing state of core solr8326_shard1_replica1 as recovering, leader is
> http://{IP-address-redacted}:8983/solr/solr8326_shard1_replica2/ and I am
> http://{IP-address-redacted}:7574/solr/solr8326_shard1_replica1/
> INFO  - 2015-11-20 22:48:41.280; [c:solr8326 s:shard1 r:core_node3
> x:solr8326_shard1_replica1] org.apache.solr.cloud.ZkController; publishing
> state=recovering
> INFO  - 2015-11-20 22:48:41.282; [c:solr8326 s:shard2 r:core_node4
> x:solr8326_shard2_replica1] org.apache.solr.cloud.RecoveryStrategy; Sending
> prep recovery command to http://{IP-address-redacted}:8983/solr;
> WaitForState:
> action=PREPRECOVERY&core=solr8326_shard2_replica2&nodeName={IP-address-redacted}%3A7574_solr&coreNodeName=core_node4&state=recovering&checkLive=true&onlyIfLeader=true&onlyIfLeaderActive=true
> INFO  - 2015-11-20 22:48:41.289; [   ]
> org.apache.solr.common.cloud.ZkStateReader$8; A cluster state change:
> WatchedEvent state:SyncConnected type:NodeDataChanged
> path:/collections/solr8326/state.json for collection solr8326 has occurred
> - updating... (live nodes size: 2)
> INFO  - 2015-11-20 22:48:41.290; [c:solr8326 s:shard1 r:core_node3
> x:solr8326_shard1_replica1] org.apache.solr.cloud.RecoveryStrategy; Sending
> prep recovery command to http://{IP-address-redacted}:8983/solr;
> WaitForState:
> action=PREPRECOVERY&core=solr8326_shard1_replica2&nodeName={IP-address-redacted}%3A7574_solr&coreNodeName=core_node3&state=recovering&checkLive=true&onlyIfLeader=true&onlyIfLeaderActive=true
> INFO  - 2015-11-20 22:48:41.291; [   ]
> org.apache.solr.common.cloud.ZkStateReader; Updating data for solr8326 to
> ver 25
> ERROR - 2015-11-20 22:48:41.298; [c:solr8326 s:shard2 r:core_node4
> x:solr8326_shard2_replica1] org.apache.solr.common.SolrException; Error
> while trying to recover.:java.util.concurrent.ExecutionException:
> org.apache.http.ParseException: Invalid content type:
> at java.util.concurrent.FutureTask.report(FutureTask.java:122)
> at java.util.concurrent.FutureTask.get(FutureTask.java:192)
> at
> org.apache.solr.cloud.RecoveryStrategy.sendPrepRecoveryCmd(RecoveryStrategy.java:598)
> at
> 

Re: Parallel SQL / calcite adapter

2015-11-20 Thread Tirthankar Chatterjee
We are implementing a JDBC driver on Drill with Solr as a storage plugin.
The code is here; we would welcome help from anyone who can contribute a
code review and performance testing:
https://github.com/apache/drill/pull/201/files

Thanks,
Tirthankar

On Nov 20, 2015, at 12:25 PM, William Bell wrote:

How is performance on Calcite?

On Fri, Nov 20, 2015 at 5:12 AM, Joel Bernstein wrote:

After reading https://calcite.apache.org/docs/tutorial.html, I think it
should be possible to use Solr's JDBC driver with Calcite's JDBC
adapter.

If you give it a try and run into any problems, please create a jira.

Joel Bernstein
http://joelsolr.blogspot.com/

On Thu, Nov 19, 2015 at 7:58 PM, Joel Bernstein wrote:

It's an interesting question. The JDBC driver is still very basic. It
would depend on how much of the JDBC spec needs to be implemented to
connect to Calcite/Drill.

Joel Bernstein
http://joelsolr.blogspot.com/

On Thu, Nov 19, 2015 at 3:28 AM, Kai Gülzau wrote:


We are currently evaluating Calcite as a SQL facade for different data
sources:

-  JDBC

-  REST

-  SOLR

-  ...

I didn't find a "native" Calcite adapter for Solr
(http://calcite.apache.org/docs/adapter.html).

Is it a good idea to use the Parallel SQL feature (over JDBC) to connect
Calcite (or Apache Drill) to Solr?
Any suggestions?


Thanks,

Kai Gülzau







--
Bill Bell
billnb...@gmail.com
cell 720-256-8076



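As a rough sketch of Joel's suggestion above, connecting through Solr's JDBC
driver (this assumes the driver class and jdbc:solr:// URL format from the
Parallel SQL work, which was still new at the time; the ZooKeeper address and
collection name are placeholders, and it is untested against Calcite itself):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class SolrJdbcExample {
  public static void main(String[] args) throws Exception {
    // Load the driver that ships with the Parallel SQL work.
    Class.forName("org.apache.solr.client.solrj.io.sql.DriverImpl");
    // The Solr JDBC URL points at ZooKeeper, not at a Solr node.
    String url = "jdbc:solr://localhost:9983?collection=my_collection";
    try (Connection con = DriverManager.getConnection(url);
         Statement stmt = con.createStatement();
         ResultSet rs = stmt.executeQuery(
             "SELECT id, fielda FROM my_collection LIMIT 10")) {
      while (rs.next()) {
        System.out.println(rs.getString("id") + " " + rs.getString("fielda"));
      }
    }
  }
}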

Re: RealTimeGetHandler doesn't retrieve documents

2015-11-20 Thread jeremie . monsinjon
Thanks for your answer...

I don't think it's a problem due to special characters. All my IDs have the
same format, "/ABCDZ123/123456", with no characters that need to be escaped.

And when I use a "normal" query on the key field, it works fine: Solr finds
the document...

Regards,
Monsinjon Jeremie

-Original Message-
From: Jack Krupansky 
Date: Thu, 19 Nov 2015 16:50:14 
To: 
Reply-To: solr-user@lucene.apache.org
Subject: Re: RealTimeGetHandler doesn't retrieve documents

Do the failing IDs have any special characters that might need to be
escaped?

Can you find the documents using a normal query on the unique key field?

-- Jack Krupansky

On Thu, Nov 19, 2015 at 10:27 AM, Jérémie MONSINJON <
jeremie.monsin...@gmail.com> wrote:

> Hello everyone!
>
> I'm using Solr 5.3.1 with solrj.SolrClient.
> My index is split into 3 shards, each on a different server (no replicas
> on the dev platform).
> It has been up to date for a few days...
>
> I'm trying to use the RealTimeGetHandler to get documents by their IDs.
> In our use case, documents are updated very frequently, so we have to
> look in the tlog before searching the index.
>
> When I use SolrClient.getById() (with a list of document IDs recently
> extracted from the index),
>
> Solr doesn't return *all* the documents corresponding to these IDs.
> So I tried to use the Solr API directly:
>
> http://server:port/solr/index/get?ids=id1,id2,id3
> And it is the same: some IDs don't work.
>
> In my example, id1 doesn't return a document; id2 and id3 are OK.
>
> If I try a filtered query with id1, it works fine: the document exists
> in the index and is found by Solr.
>
> Can anybody explain why a document, present in the index, with no
> uncommitted update or delete, is not found by the Real Time Get Handler?
>
> Regards,
> Jeremie
>
>
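
For reference, a minimal SolrJ sketch of the real-time get call under
discussion, equivalent to hitting /get directly (the base URL and IDs are
placeholders):

import java.util.Arrays;

import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.common.SolrDocumentList;

public class RealTimeGetExample {
  public static void main(String[] args) throws Exception {
    // Equivalent to: http://server:port/solr/index/get?ids=id1,id2
    HttpSolrClient client =
        new HttpSolrClient("http://localhost:8983/solr/index");
    SolrDocumentList docs =
        client.getById(Arrays.asList("/ABCDZ123/123456", "/ABCDZ123/123457"));
    System.out.println("Fetched " + docs.size() + " of 2 requested docs");
    client.close();
  }
}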