Query on SuggestComponent in SolrCloud(6.2.1).

2016-12-04 Thread Pavan Kumar VVR
Hi,

We are in the process of migrating our Solr environment from standalone Solr (4.9)
to SolrCloud (6.2.1).
During this process we are facing issues with solr.SuggestComponent; the equivalent
auto-complete was working fine earlier in 4.9 with solr.TermsComponent.

Please find below a detailed description of the problem; kindly assist.
Let us know if any additional details are required.

Collection Name: Al_prnt_grp
Shards: IN_39_SWG and IN_40_SWG (dynamically created using the implicit router)
Documents are indexed into the IN_39_SWG shard (which has words like company,
computer, compare with, etc.)


* Using the entries/configuration below in solrconfig.xml and
managed-schema.xml, I have uploaded the configset to ZooKeeper.

* And reloaded the collection using the Collections API.


* Finally, when I ran the query below for auto-complete, it returned 0
suggestions.
http://:/solr/Al_prnt_grp/suggesthandler?suggest.dictionary=mySuggester=true=true=comp=:/solr/IN_39_SWG=true=/suggest=false=true=json
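
Written out with the standard SuggestComponent parameter names, the request is
roughly of this form (plus a shards parameter pointing at the IN_39_SWG shard;
host and port omitted):

http://<host>:<port>/solr/Al_prnt_grp/suggesthandler?suggest=true&suggest.build=true&suggest.dictionary=mySuggester&suggest.q=comp&shards.qt=/suggest&wt=json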


* When I went through solr.log, I saw the error below.
INFO  - 2016-12-05 05:45:05.982; [c:Al_prnt_grp s:IN_40_SWG r:core_node5 
x:Al_prnt_grp_IN_40_SWG_replica3] 
org.apache.solr.spelling.suggest.SolrSuggester; SolrSuggester.build(mySuggester)
ERROR - 2016-12-05 05:45:05.984; [c:Al_prnt_grp s:IN_40_SWG r:core_node5 
x:Al_prnt_grp_IN_40_SWG_replica3] 
org.apache.solr.spelling.suggest.SolrSuggester; Store Lookup build failed
INFO  - 2016-12-05 05:45:05.984; [c:Al_prnt_grp s:IN_40_SWG r:core_node5 
x:Al_prnt_grp_IN_40_SWG_replica3] 
org.apache.solr.handler.component.SuggestComponent; SuggestComponent process 
with : shards.keys=:/solr/IN_39_SWG=false=true=true=true=comp=10==true=mySuggester=json=/suggest


* Looking through the forums, I see a similar issue was raised on
20/Jun/2016 which is still open.
https://issues.apache.org/jira/browse/SOLR-9227

Step 1: Entry in solrconfig.xml

* Added the search component and request handler in solrconfig.xml:



  
<searchComponent name="suggest" class="solr.SuggestComponent">
  <lst name="suggester">
    <str name="name">mySuggester</str>
    <str name="lookupImpl">FuzzyLookupFactory</str>
    <str name="indexPath">suggester_fuzzy_dir</str>
    <str name="dictionaryImpl">DocumentDictionaryFactory</str>
    <str name="field">content_autosuggest</str>
    <str name="suggestAnalyzerFieldType">text_auto</str>
    <str name="buildOnStartup">false</str>
    <str name="buildOnCommit">false</str>
  </lst>
</searchComponent>

<requestHandler name="/suggesthandler" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="suggest">true</str>
    <str name="suggest.count">10</str>
    <str name="suggest.dictionary">mySuggester</str>
    <str name="suggest.build">false</str> <!-- parameter name assumed -->
  </lst>
  <arr name="components">
    <str>suggest</str>
  </arr>
</requestHandler>



Step 2 : Entry in managed-schema.xml
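
* Added a text_auto field type and a stored content_autosuggest field for the
suggester. A minimal sketch of these entries, assuming a keyword tokenizer
with lower-casing (the exact analyzer chain may differ):

<fieldType name="text_auto" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<field name="content_autosuggest" type="text_auto" indexed="true" stored="true"/>

Note that the field used by DocumentDictionaryFactory has to be stored.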



Step 3 : Uploaded configsets to zookeeper for one collection

./zkcli.sh -cmd upconfig -zkhost zookeeperdev:8080,zookeeperprod:8081
 -collection Al_prnt_grp -confname IR_CORE -solrhome
/opt/solrusrcloud/solr-6.2.1/server/solr -confdir
/opt/solrcloud/solr-6.2.1/server/solr/configsets/data_driven_schema_configs/conf
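
For reference, the upconfig command documented for zkcli.sh in Solr 6.x only
needs the ZooKeeper host, a config name and a config directory, roughly:

./server/scripts/cloud-scripts/zkcli.sh -zkhost <zkhost:port> -cmd upconfig \
  -confname IR_CORE -confdir <path-to-configset>/conf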

Step 4 : Reloaded the collection

http://:/solr/admin/collections?action=RELOAD&name=Al_prnt_grp



Best Regards,
V.V.R.Pavan Kumar
Technology Lead
INDD Unit
Infosys Limited
Desk : +91 044 44462476
Mob   : +91 7823982845



Re: Memory leak in Solr

2016-12-04 Thread Jeff Wartes

Here’s an earlier post where I mentioned some GC investigation tools:
https://mail-archives.apache.org/mod_mbox/lucene-solr-user/201604.mbox/%3c8f8fa32d-ec0e-4352-86f7-4b2d8a906...@whitepages.com%3E
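
If GC logging isn't already enabled, a typical set of JDK 8 options for
capturing it on a Solr node looks something like this (the log path is only an
example), and the resulting log can be fed into most GC analysis tools:

  -verbose:gc -Xloggc:/var/log/solr/solr_gc.log
  -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+PrintGCTimeStamps
  -XX:+PrintGCApplicationStoppedTime -XX:+PrintTenuringDistribution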

In my experience, there are many aspects of the Solr/Lucene memory allocation
model that scale with things other than documents returned (such as
cardinality, or simply index size). A single query on a large index might
consume dozens of megabytes of heap to complete. But that heap should also be 
released quickly after the query finishes.
The key characteristic of a memory leak is that the software is allocating 
memory that it cannot reclaim. If it’s a leak, you ought to be able to 
reproduce it at any query rate - have you tried this? A run with, say, half the 
rate, over twice the duration?

I’m inclined to agree with others here, that although you’ve correctly 
attributed the cause to GC, it’s probably less an indication of a leak, and 
more an indication of simply allocating memory faster than it can be reclaimed, 
combined with the long pauses that are increasingly unavoidable as heap size 
goes up.
Note that in the case of a CMS allocation failure, the fallback full-GC is 
*single threaded*, which means it’ll usually take considerably longer than a 
normal GC - even for a comparable amount of garbage.

In addition to GC tuning, you can address these by sharding more, both at the 
core and jvm level.


On 12/4/16, 3:46 PM, "Shawn Heisey"  wrote:

On 12/3/2016 9:46 PM, S G wrote:
> The symptom we see is that the java clients querying Solr see response
> times in 10s of seconds (not milliseconds).

> Some numbers for the Solr Cloud:
>
> *Overall infrastructure:*
> - Only one collection
> - 16 VMs used
> - 8 shards (1 leader and 1 replica per shard - each core on separate VM)
>
> *Overview from one core:*
> - Num Docs:193,623,388
> - Max Doc:230,577,696
> - Heap Memory Usage:231,217,880
> - Deleted Docs:36,954,308
> - Version:2,357,757
> - Segment Count:37

The heap memory usage number isn't useful.  It doesn't cover all the
memory used.

> *Stats from QueryHandler/select*
> - requests:78,557
> - errors:358
> - timeouts:0
> - totalTime:1,639,975.27
> - avgRequestsPerSecond:2.62
> - 5minRateReqsPerSecond:1.39
> - 15minRateReqsPerSecond:1.64
> - avgTimePerRequest:20.87
> - medianRequestTime:0.70
> - 75thPcRequestTime:1.11
> - 95thPcRequestTime:191.76

These times are in *milliseconds*, not seconds .. and these are even
better numbers than you showed before.  Where are you seeing 10 plus
second query times?  Solr is not showing numbers like that.

If your VM host has 16 VMs on it and each one has a total memory size of
92GB, then if that machine doesn't have 1.5 terabytes of memory, you're
oversubscribed, and this is going to lead to terrible performance... but
the numbers you've shown here do not show terrible performance.

> Plus, on every server, we are seeing lots of exceptions.
> For example:
>
> Between 8:06:55 PM and 8:21:36 PM, exceptions are:
>
> 1) Request says it is coming from leader, but we are the leader:
> 
update.distrib=FROMLEADER=HOSTB_ca_1_1456430020/=javabin=2
>
> 2) org.apache.solr.common.SolrException: Request says it is coming from
> leader, but we are the leader
>
> 3) org.apache.solr.common.SolrException:
> org.apache.solr.client.solrj.SolrServerException: Tried one server for 
read
> operation and it timed out, so failing fast
>
> 4) null:org.apache.solr.common.SolrException:
> org.apache.solr.client.solrj.SolrServerException: Tried one server for 
read
> operation and it timed out, so failing fast
>
> 5) org.apache.solr.common.SolrException:
> org.apache.solr.client.solrj.SolrServerException: Tried one server for 
read
> operation and it timed out, so failing fast
>
> 6) null:org.apache.solr.common.SolrException:
> org.apache.solr.client.solrj.SolrServerException: Tried one server for 
read
> operation and it timed out, so failing fast
>
> 7) org.apache.solr.common.SolrException:
> org.apache.solr.client.solrj.SolrServerException: No live SolrServers
> available to handle this request. Zombie server list:
> [HOSTA_ca_1_1456429897]
>
> 8) null:org.apache.solr.common.SolrException:
> org.apache.solr.client.solrj.SolrServerException: No live SolrServers
> available to handle this request. Zombie server list:
> [HOSTA_ca_1_1456429897]
>
> 9) org.apache.solr.common.SolrException:
> org.apache.solr.client.solrj.SolrServerException: Tried one server for 
read
> operation and it timed out, so failing fast
>
> 10) null:org.apache.solr.common.SolrException:
> org.apache.solr.client.solrj.SolrServerException: Tried one server 

Re: Memory leak in Solr

2016-12-04 Thread Shawn Heisey
On 12/3/2016 9:46 PM, S G wrote:
> The symptom we see is that the java clients querying Solr see response
> times in 10s of seconds (not milliseconds).

> Some numbers for the Solr Cloud:
>
> *Overall infrastructure:*
> - Only one collection
> - 16 VMs used
> - 8 shards (1 leader and 1 replica per shard - each core on separate VM)
>
> *Overview from one core:*
> - Num Docs:193,623,388
> - Max Doc:230,577,696
> - Heap Memory Usage:231,217,880
> - Deleted Docs:36,954,308
> - Version:2,357,757
> - Segment Count:37

The heap memory usage number isn't useful.  It doesn't cover all the
memory used.

> *Stats from QueryHandler/select*
> - requests:78,557
> - errors:358
> - timeouts:0
> - totalTime:1,639,975.27
> - avgRequestsPerSecond:2.62
> - 5minRateReqsPerSecond:1.39
> - 15minRateReqsPerSecond:1.64
> - avgTimePerRequest:20.87
> - medianRequestTime:0.70
> - 75thPcRequestTime:1.11
> - 95thPcRequestTime:191.76

These times are in *milliseconds*, not seconds .. and these are even
better numbers than you showed before.  Where are you seeing 10 plus
second query times?  Solr is not showing numbers like that.

If your VM host has 16 VMs on it and each one has a total memory size of
92GB, then if that machine doesn't have 1.5 terabytes of memory, you're
oversubscribed, and this is going to lead to terrible performance... but
the numbers you've shown here do not show terrible performance.

> Plus, on every server, we are seeing lots of exceptions.
> For example:
>
> Between 8:06:55 PM and 8:21:36 PM, exceptions are:
>
> 1) Request says it is coming from leader, but we are the leader:
> update.distrib=FROMLEADER=HOSTB_ca_1_1456430020/=javabin=2
>
> 2) org.apache.solr.common.SolrException: Request says it is coming from
> leader, but we are the leader
>
> 3) org.apache.solr.common.SolrException:
> org.apache.solr.client.solrj.SolrServerException: Tried one server for read
> operation and it timed out, so failing fast
>
> 4) null:org.apache.solr.common.SolrException:
> org.apache.solr.client.solrj.SolrServerException: Tried one server for read
> operation and it timed out, so failing fast
>
> 5) org.apache.solr.common.SolrException:
> org.apache.solr.client.solrj.SolrServerException: Tried one server for read
> operation and it timed out, so failing fast
>
> 6) null:org.apache.solr.common.SolrException:
> org.apache.solr.client.solrj.SolrServerException: Tried one server for read
> operation and it timed out, so failing fast
>
> 7) org.apache.solr.common.SolrException:
> org.apache.solr.client.solrj.SolrServerException: No live SolrServers
> available to handle this request. Zombie server list:
> [HOSTA_ca_1_1456429897]
>
> 8) null:org.apache.solr.common.SolrException:
> org.apache.solr.client.solrj.SolrServerException: No live SolrServers
> available to handle this request. Zombie server list:
> [HOSTA_ca_1_1456429897]
>
> 9) org.apache.solr.common.SolrException:
> org.apache.solr.client.solrj.SolrServerException: Tried one server for read
> operation and it timed out, so failing fast
>
> 10) null:org.apache.solr.common.SolrException:
> org.apache.solr.client.solrj.SolrServerException: Tried one server for read
> operation and it timed out, so failing fast
>
> 11) org.apache.solr.common.SolrException:
> org.apache.solr.client.solrj.SolrServerException: Tried one server for read
> operation and it timed out, so failing fast
>
> 12) null:org.apache.solr.common.SolrException:
> org.apache.solr.client.solrj.SolrServerException: Tried one server for read
> operation and it timed out, so failing fast

These errors sound like timeouts, possibly caused by long GC pauses ...
but as already mentioned, the query handler statistics do not indicate
long query times.  If a long GC were to happen during a query, then the
query time would be long as well.

The core information above doesn't include the size of the index on
disk.  That number would be useful for telling you whether there's
enough memory.

As I said at the beginning of the thread, I haven't seen anything here
to indicate a memory leak, and others are using version 4.10 without any
problems.  If there were a memory leak in a released version of Solr,
many people would have run into problems with it.

Thanks,
Shawn



field length within BM25 score calculation in Solr 6.3

2016-12-04 Thread Sascha Szott
Hi folks,

my Solr index consists of one document with a single-valued field "title" of
type "text_general". The title field was indexed with the content: 1 2 3 4 5 6 7
8 9. The field type text_general uses a StandardTokenizer, which should result
in 9 tokens. The corresponding length of the field title in the given document is 9.

The field type is defined as follows:
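
This is essentially the stock text_general type from the Solr example schema; a
minimal sketch of it, assuming the usual StandardTokenizer, stop filter and
lower-casing chain:

<fieldType name="text_general" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>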

  

  
  
  


  
  
  
  

  


I’ve checked that none of the nine tokens (1, 2, …, 9) is a stop word.

As expected, the query title:1 returns the given document. The BM25 score of 
the document for the given query is 0.272. 

But why does Solr 6.3 state that the length of field title is 10.24?

0.27233246 = weight(title_alt:1 in 0) [SchemaSimilarity], result of:
  0.27233246 = score(doc=0,freq=1.0 = termFreq=1.0), product of:
0.2876821 = idf(docFreq=1, docCount=1)
0.94664377 = tfNorm, computed from:
  1.0 = termFreq=1.0
  1.2 = parameter k1
  0.75 = parameter b
  9.0 = avgFieldLength
  10.24 = fieldLength
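
For reference, the tfNorm line above is the standard BM25 term-frequency
normalization, and plugging in the reported values (with fieldLength = 10.24
rather than 9) does reproduce the 0.9466:

  tfNorm = (tf * (k1 + 1)) / (tf + k1 * (1 - b + b * fieldLength / avgFieldLength))
         = (1.0 * 2.2) / (1.0 + 1.2 * (0.25 + 0.75 * 10.24 / 9.0))
         ≈ 0.9466

So the 10.24, not 9, is what actually enters the score.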

In contrast, the value of avgFieldLength is correct.

The same observation can be made if the index consists of two simple documents:

doc1: title = 1 2 3 4
doc2: title = 1 2 3 4 5 6 7 8

The BM25 score calculation of doc2 is explained as:

0.14143422 = weight(title_alt:1 in 1) [SchemaSimilarity], result of:
  0.14143422 = score(doc=1,freq=1.0 = termFreq=1.0), product of:
0.18232156 = idf(docFreq=2, docCount=2)
0.7757405 = tfNorm, computed from:
  1.0 = termFreq=1.0
  1.2 = parameter k1
  0.75 = parameter b
  6.0 = avgFieldLength
  10.24 = fieldLength

The value of fieldLength does not match 8.

Is there some "magic" applied to the value of the field length that goes beyond the
standard BM25 scoring formula?

If so, what is the idea behind this modification? If not, is this a Lucene /
Solr bug?

Best regards,
Sascha






Re: Memory leak in Solr

2016-12-04 Thread Walter Underwood
That is a huge heap.

Once you have enough heap memory to hold a Java program’s working set,
more memory doesn’t make it faster. It just makes the GC take longer.

If you have GC monitoring, look at how much memory is in use after a full GC.
Add the space for new generation (eden, whatever), then a bit more for 
burst memory usage. Set the heap to that.

I recommend fairly large new generation memory allocation. An HTTP service
has a fair amount of allocation that has a lifetime of one HTTP request. Those
allocations should never be promoted to tenured space.

We run with an 8G heap and a 2G new generation with 4.10.4.

Of course, make sure you are running some sort of parallel GC. You can use
G1 or use CMS with ParNew, your choice. We are running CMS/ParNew, but
will be experimenting with G1 soon.
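
As a rough illustration of that kind of setup (the exact values will depend on
your own workload and hardware), the JVM options would look something like:

  -Xms8g -Xmx8g -Xmn2g
  -XX:+UseConcMarkSweepGC -XX:+UseParNewGC -XX:+CMSParallelRemarkEnabled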

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)


> On Dec 4, 2016, at 11:07 AM, S G  wrote:
> 
> Thank you Eric.
> Our Solr version is 4.10 and we are not doing any sorting or faceting.
> 
> I am trying to find some ways of investigating this problem.
> Hence asking a few more questions to see what are the normal steps taken in
> such situations.
> (I did search a few of them on the Internet but could not find anything
> good).
> Any pointers provided here will help us resolve a little more quickly.
> 
> 
> 1) Is there a conclusive way to know about the memory leaks?
>  How does Solr ensure with each release that there are no memory leaks?
>  With a heap 24gb (-Xmx parameter), I sometimes see GC pauses of about 1
> second now.
>  Looks like we will need to scale it down.
>  Total VM memory is 92gb and Solr is the only process running on it.
> 
> 
> 2) How can I know that the zookeeper connectivity to Solr is not good?
>  What commands/steps are normally used to resolve this?
>  Does Solr has some metrics that share the zookeeper interaction
> statistics?
> 
> 
> 3) In a span of 9 hours, I see:
>  4 times: java.net.SocketException: Connection reset
>  32 times: java.net.SocketTimeoutException: Read timed out
> 
> And several other exceptions that ultimately bring a whole shard down
> (leader is recovery-failed and replica is down).
> 
> I understand that the above information might not be sufficient to get the
> full picture.
> But just in case, someone has resolved or debugged these issues before,
> please share your experience.
> It would be of great help to me.
> 
> Thanks,
> SG
> 
> 
> 
> 
> 
> On Sun, Dec 4, 2016 at 8:59 AM, Erick Erickson 
> wrote:
> 
>> All of this is consistent with not having a properly
>> tuned Solr instance wrt # documents, usage
>> pattern, memory allocated to the JVM, GC
>> settings and the like.
>> 
>> Your leader issues can be explained by long
>> GC pauses too. Zookeeper periodically pings
>> each replica it knows about and if the response
>> times out (due to GC in this case) then Zookeeper
>> thinks the node has gone away and marks
>> it as "down". Similarly when a leader forwards
>> an update to a follower and the request times
>> out, the leader will mark the follower as down.
>> Do this enough and the state of the cluster gets
>> "interesting".
>> 
>> You still haven't told us what version of Solr
>> you're using, the "Version" you took from
>> the core stats is the version of the _index_,
>> not Solr.
>> 
>> You have almost 200M documents on
>> a single core. That's definitely on the high side,
>> although I've seen that work. Assuming
>> you aren't doing things like faceting and
>> sorting and the like on non docValues fields.
>> 
>> As others have pointed out, the link you
>> provided doesn't provide much in the way of
>> any "smoking guns" as far as a memory
>> leak is concerned.
>> 
>> I've certainly seen situations where memory
>> required by Solr is close to the total memory
>> allocated to the JVM for instance. Then the GC
>> cycle kicks in and recovers just enough to
>> go on for a very brief time before going into another
>> GC cycle resulting in very poor performance.
>> 
>> So overall this looks like you need to do some
>> serious tuning of your Solr instances, take a
>> hard look at how you're using your physical
>> machines. You specify that these are VMs,
>> but how many VMs are you running per box?
>> How much JVM have you allocated for each?
>> How much total physical memory do you have
>> to work with per box?
>> 
>> Even if you provide the answers to the above
>> questions, there's not much we can do to
>> help you resolve your issues assuming it's
>> simply inappropriate sizing. I'd really recommend
>> you create a stress environment so you can
>> test different scenarios to become confident about
>> your expected performance, here's a blog on the
>> subject:
>> 
>> https://lucidworks.com/blog/2012/07/23/sizing-hardware-in-
>> the-abstract-why-we-dont-have-a-definitive-answer/
>> 
>> Best,
>> Erick
>> 
>> On Sat, Dec 3, 2016 at 8:46 PM, S G 

Re: Memory leak in Solr

2016-12-04 Thread S G
Thank you Eric.
Our Solr version is 4.10 and we are not doing any sorting or faceting.

I am trying to find some ways of investigating this problem.
Hence I am asking a few more questions to see what the normal steps taken in
such situations are.
(I did search for a few of them on the Internet but could not find anything
good.)
Any pointers provided here will help us resolve this a little more quickly.


1) Is there a conclusive way to know about memory leaks?
   How does Solr ensure with each release that there are no memory leaks?
   With a 24 GB heap (-Xmx parameter), I sometimes see GC pauses of about 1
second now.
   Looks like we will need to scale it down.
   Total VM memory is 92 GB and Solr is the only process running on it.


2) How can I know that the ZooKeeper connectivity to Solr is not good?
   What commands/steps are normally used to resolve this?
   Does Solr have some metrics that show the ZooKeeper interaction
statistics?


3) In a span of 9 hours, I see:
  4 times: java.net.SocketException: Connection reset
  32 times: java.net.SocketTimeoutException: Read timed out

And several other exceptions that ultimately bring a whole shard down
(leader is recovery-failed and replica is down).

I understand that the above information might not be sufficient to get the
full picture.
But just in case, someone has resolved or debugged these issues before,
please share your experience.
It would be of great help to me.

Thanks,
SG





On Sun, Dec 4, 2016 at 8:59 AM, Erick Erickson 
wrote:

> All of this is consistent with not having a properly
> tuned Solr instance wrt # documents, usage
> pattern, memory allocated to the JVM, GC
> settings and the like.
>
> Your leader issues can be explained by long
> GC pauses too. Zookeeper periodically pings
> each replica it knows about and if the response
> times out (due to GC in this case) then Zookeeper
> thinks the node has gone away and marks
> it as "down". Similarly when a leader forwards
> an update to a follower and the request times
> out, the leader will mark the follower as down.
> Do this enough and the state of the cluster gets
> "interesting".
>
> You still haven't told us what version of Solr
> you're using, the "Version" you took from
> the core stats is the version of the _index_,
> not Solr.
>
> You have almost 200M documents on
> a single core. That's definitely on the high side,
> although I've seen that work. Assuming
> you aren't doing things like faceting and
> sorting and the like on non docValues fields.
>
> As others have pointed out, the link you
> provided doesn't provide much in the way of
> any "smoking guns" as far as a memory
> leak is concerned.
>
> I've certainly seen situations where memory
> required by Solr is close to the total memory
> allocated to the JVM for instance. Then the GC
> cycle kicks in and recovers just enough to
> go on for a very brief time before going into another
> GC cycle resulting in very poor performance.
>
> So overall this looks like you need to do some
> serious tuning of your Solr instances, take a
> hard look at how you're using your physical
> machines. You specify that these are VMs,
> but how many VMs are you running per box?
> How much JVM have you allocated for each?
> How much total physical memory do you have
> to work with per box?
>
> Even if you provide the answers to the above
> questions, there's not much we can do to
> help you resolve your issues assuming it's
> simply inappropriate sizing. I'd really recommend
> you create a stress environment so you can
> test different scenarios to become confident about
> your expected performance, here's a blog on the
> subject:
>
> https://lucidworks.com/blog/2012/07/23/sizing-hardware-in-
> the-abstract-why-we-dont-have-a-definitive-answer/
>
> Best,
> Erick
>
> On Sat, Dec 3, 2016 at 8:46 PM, S G  wrote:
> > The symptom we see is that the java clients querying Solr see response
> > times in 10s of seconds (not milliseconds).
> > And on the tomcat's gc.log file (where Solr is running), we see very bad
> GC
> > pauses - threads being paused for 0.5 seconds per second approximately.
> >
> > Some numbers for the Solr Cloud:
> >
> > *Overall infrastructure:*
> > - Only one collection
> > - 16 VMs used
> > - 8 shards (1 leader and 1 replica per shard - each core on separate VM)
> >
> > *Overview from one core:*
> > - Num Docs:193,623,388
> > - Max Doc:230,577,696
> > - Heap Memory Usage:231,217,880
> > - Deleted Docs:36,954,308
> > - Version:2,357,757
> > - Segment Count:37
> >
> > *Stats from QueryHandler/select*
> > - requests:78,557
> > - errors:358
> > - timeouts:0
> > - totalTime:1,639,975.27
> > - avgRequestsPerSecond:2.62
> > - 5minRateReqsPerSecond:1.39
> > - 15minRateReqsPerSecond:1.64
> > - avgTimePerRequest:20.87
> > - medianRequestTime:0.70
> > - 75thPcRequestTime:1.11
> > - 95thPcRequestTime:191.76
> >
> > *Stats from QueryHandler/update*
> > - requests:33,555
> > - errors:0
> > - timeouts:0
> > - 

Re: Memory leak in Solr

2016-12-04 Thread Erick Erickson
All of this is consistent with not having a properly
tuned Solr instance wrt # documents, usage
pattern, memory allocated to the JVM, GC
settings and the like.

Your leader issues can be explained by long
GC pauses too. Zookeeper periodically pings
each replica it knows about and if the response
times out (due to GC in this case) then Zookeeper
thinks the node has gone away and marks
it as "down". Similarly when a leader forwards
an update to a follower and the request times
out, the leader will mark the follower as down.
Do this enough and the state of the cluster gets
"interesting".

You still haven't told us what version of Solr
you're using, the "Version" you took from
the core stats is the version of the _index_,
not Solr.

You have almost 200M documents on
a single core. That's definitely on the high side,
although I've seen that work. Assuming
you aren't doing things like faceting and
sorting and the like on non docValues fields.

As others have pointed out, the link you
provided doesn't provide much in the way of
any "smoking guns" as far as a memory
leak is concerned.

I've certainly seen situations where memory
required by Solr is close to the total memory
allocated to the JVM for instance. Then the GC
cycle kicks in and recovers just enough to
go on for a very brief time before going into another
GC cycle resulting in very poor performance.

So overall this looks like you need to do some
serious tuning of your Solr instances, take a
hard look at how you're using your physical
machines. You specify that these are VMs,
but how many VMs are you running per box?
How much JVM have you allocated for each?
How much total physical memory do you have
to work with per box?

Even if you provide the answers to the above
questions, there's not much we can do to
help you resolve your issues assuming it's
simply inappropriate sizing. I'd really recommend
you create a stress environment so you can
test different scenarios to become confident about
your expected performance, here's a blog on the
subject:

https://lucidworks.com/blog/2012/07/23/sizing-hardware-in-the-abstract-why-we-dont-have-a-definitive-answer/

Best,
Erick

On Sat, Dec 3, 2016 at 8:46 PM, S G  wrote:
> The symptom we see is that the java clients querying Solr see response
> times in 10s of seconds (not milliseconds).
> And on the tomcat's gc.log file (where Solr is running), we see very bad GC
> pauses - threads being paused for 0.5 seconds per second approximately.
>
> Some numbers for the Solr Cloud:
>
> *Overall infrastructure:*
> - Only one collection
> - 16 VMs used
> - 8 shards (1 leader and 1 replica per shard - each core on separate VM)
>
> *Overview from one core:*
> - Num Docs:193,623,388
> - Max Doc:230,577,696
> - Heap Memory Usage:231,217,880
> - Deleted Docs:36,954,308
> - Version:2,357,757
> - Segment Count:37
>
> *Stats from QueryHandler/select*
> - requests:78,557
> - errors:358
> - timeouts:0
> - totalTime:1,639,975.27
> - avgRequestsPerSecond:2.62
> - 5minRateReqsPerSecond:1.39
> - 15minRateReqsPerSecond:1.64
> - avgTimePerRequest:20.87
> - medianRequestTime:0.70
> - 75thPcRequestTime:1.11
> - 95thPcRequestTime:191.76
>
> *Stats from QueryHandler/update*
> - requests:33,555
> - errors:0
> - timeouts:0
> - totalTime:227,870.58
> - avgRequestsPerSecond:1.12
> - 5minRateReqsPerSecond:1.16
> - 15minRateReqsPerSecond:1.23
> - avgTimePerRequest:6.79
> - medianRequestTime:3.16
> - 75thPcRequestTime:5.27
> - 95thPcRequestTime:9.33
>
> And yet the Solr clients are reporting timeouts and very long read times.
>
> Plus, on every server, we are seeing lots of exceptions.
> For example:
>
> Between 8:06:55 PM and 8:21:36 PM, exceptions are:
>
> 1) Request says it is coming from leader, but we are the leader:
> update.distrib=FROMLEADER=HOSTB_ca_1_1456430020/=javabin=2
>
> 2) org.apache.solr.common.SolrException: Request says it is coming from
> leader, but we are the leader
>
> 3) org.apache.solr.common.SolrException:
> org.apache.solr.client.solrj.SolrServerException: Tried one server for read
> operation and it timed out, so failing fast
>
> 4) null:org.apache.solr.common.SolrException:
> org.apache.solr.client.solrj.SolrServerException: Tried one server for read
> operation and it timed out, so failing fast
>
> 5) org.apache.solr.common.SolrException:
> org.apache.solr.client.solrj.SolrServerException: Tried one server for read
> operation and it timed out, so failing fast
>
> 6) null:org.apache.solr.common.SolrException:
> org.apache.solr.client.solrj.SolrServerException: Tried one server for read
> operation and it timed out, so failing fast
>
> 7) org.apache.solr.common.SolrException:
> org.apache.solr.client.solrj.SolrServerException: No live SolrServers
> available to handle this request. Zombie server list:
> [HOSTA_ca_1_1456429897]
>
> 8) null:org.apache.solr.common.SolrException:
> org.apache.solr.client.solrj.SolrServerException: No live SolrServers
> available to handle this request. Zombie server 

Re: Solr custom document routing

2016-12-04 Thread Erick Erickson
_Why_ do you need to do this? Frankly this sounds
like an XY problem: you think the proper solution
is partitioning the docs as you're asking about because
it'll solve some problem, but you haven't stated what
the problem you're trying to solve is.

If you're just trying to optimize querying, I _strongly_
suggest you measure the performance and see if it's worth
the effort. Premature optimization and all that.

Best,
Erick

On Sat, Dec 3, 2016 at 11:14 PM, SOLR4189  wrote:
> Let's say I have a collection with 4 shards. I need shard1 to contain all
> documents with fieldX=true and shard2-shard4 to contain all documents with
> fieldX=false. I need this to work while indexing and while querying. How can
> I do it in SOLR?
>
>
>
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Solr-custom-document-routing-tp4308432.html
> Sent from the Solr - User mailing list archive at Nabble.com.