Re: Ingest pauses
Hey Brad,

> This leads me to believe that a single merge thread is blocking indexing from occurring. When this happens our producers, which distribute their updates amongst all the shards, pile up on this shard and wait.

Which version of Solr are you using? Have you tried 4.0 beta?

* http://searchhub.org/dev/2011/04/09/solr-dev-diary-solr-and-near-real-time-search/
* https://issues.apache.org/jira/browse/SOLR-2565

Alexey
RE: Ingest pauses
Very interesting links. After much more digging yesterday, this appears to be exactly what I'm seeing. I am currently using 4.0 beta for my testing. FWIW, I've also pulled trunk from svn as of yesterday and hit the same issue.

From: Alexey Serba [ase...@gmail.com]
Sent: Wednesday, August 29, 2012 6:45 AM
To: solr-user@lucene.apache.org
Subject: Re: Ingest pauses
RE: Ingest pauses
Anyone know the actual status of SOLR-2565? It looks to be marked as resolved in 4.*, but I am still seeing long pauses during commits using 4.*. I am currently digging through the code to see what I can find, but Java not being my primary (or secondary) language, it is mostly slow going.

-----Original Message-----
From: Voth, Brad (GE Corporate)
Sent: Wednesday, August 29, 2012 10:17 AM
To: solr-user@lucene.apache.org
Subject: RE: Ingest pauses
Re: Ingest pauses
On Wed, Aug 29, 2012 at 11:58 AM, Voth, Brad (GE Corporate) brad.v...@ge.com wrote:
> Anyone know the actual status of SOLR-2565, it looks to be marked as resolved in 4.* but I am still seeing long pauses during commits using 4.*

SOLR-2565 is definitely committed; adds are no longer blocked by commits (at least at the Solr level).

-Yonik
http://lucidworks.com
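Yonik's point, that adds are no longer blocked by commits, comes down to doing the expensive flush work outside the lock that adds contend on. A minimal sketch of that pattern (illustrative only; the class and method names are hypothetical and this is not Solr's actual code):

```python
import threading
import time

class NonBlockingCommitBuffer:
    """Sketch: adds append under a short lock, while a commit swaps the
    buffer out and does the slow durable write with the lock released."""

    def __init__(self):
        self.lock = threading.Lock()
        self.pending = []    # documents added since the last commit
        self.committed = []  # documents made durable by commits

    def add(self, doc):
        # Writers hold the lock only long enough to append.
        with self.lock:
            self.pending.append(doc)

    def commit(self):
        # Swap the pending buffer under the lock, then do the slow
        # "flush to disk" outside it so concurrent adds are not blocked.
        with self.lock:
            batch, self.pending = self.pending, []
        time.sleep(0.01)  # stand-in for the expensive flush I/O
        self.committed.extend(batch)

buf = NonBlockingCommitBuffer()
for i in range(100):
    buf.add(i)
buf.commit()
print(len(buf.committed))  # 100
```

The key design point is that the only mutual exclusion between an add and a commit is the cheap buffer swap, not the flush itself.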
RE: Ingest pauses
Thanks, I'll continue with my testing and tracking down the block.

-----Original Message-----
From: ysee...@gmail.com [mailto:ysee...@gmail.com] On Behalf Of Yonik Seeley
Sent: Wednesday, August 29, 2012 12:28 PM
To: solr-user@lucene.apache.org
Subject: Re: Ingest pauses
RE: Ingest pauses
Interestingly, it is not pausing during every commit, so at least a portion of the time the async commit code is working. I'm trying to track down the case where a wait would still be issued.

-----Original Message-----
From: Voth, Brad (GE Corporate)
Sent: Wednesday, August 29, 2012 12:32 PM
To: solr-user@lucene.apache.org
Subject: RE: Ingest pauses
Re: Ingest pauses
Could you take a jstack dump when it's happening and post it here?

> Interestingly it is not pausing during every commit so at least a portion of the time the async commit code is working. Trying to track down the case where a wait would still be issued.
Re: Ingest pauses
Hello Brad,

At one point you said CPU is at 100% and there is no disk IO. Then in a separate email I think you said this happens during the RAM-to-disk flush. Isn't there a contradiction here? A few thread dumps may tell you where things are stuck.

Also, how does your JVM look while this is happening? Could this be just garbage collection? SPM (see URL in sig) may be helpful here.

Otis
Performance Monitoring for Solr / ElasticSearch / HBase - http://sematext.com/spm

----- Original Message -----
From: Voth, Brad (GE Corporate) brad.v...@ge.com
To: solr-user@lucene.apache.org
Sent: Wednesday, August 29, 2012 1:05 PM
Subject: RE: Ingest pauses
RE: Ingest pauses
I've noticed the following message in the infostream log around the times the pauses begin:

DW 0 [Tue Aug 28 13:25:29 UTC 2012; qtp435584308-969]: WARNING DocumentsWriter has stalled threads; waiting

From: Voth, Brad (GE Corporate)
Sent: Monday, August 27, 2012 4:54 PM
To: solr-user@lucene.apache.org
Subject: Ingest pauses

Hello all,

I'm working on implementing a solution in a very high index rate, lower query rate project. We have a very small record size: 2 small strings, 6 longs, 7 ints, and 2 dates, indexing on 8 of those fields. We need to be able to maintain an ingest rate of 50k records/sec on average. Through sharding and a reasonably sized cluster we've hit most of our performance goals, but have found that our producers tend to get hung on a shard that is doing a merge. I've done a bit of digging and found some tips and hints on configuring merging, but have yet to get rid of the issue.

What we see on the server hosting the shard during a problematic period is a single CPU core at 100%, very little IO activity on the disk, and merge messages in the logs. This leads me to believe that a single merge thread is blocking indexing from occurring. When this happens our producers, which distribute their updates amongst all the shards, pile up on this shard and wait. This causes our overall ingest rate to plummet and is currently keeping us from going to prod with the solution.
The relevant portion of our solrconfig.xml is:

<indexConfig>
  <ramBufferSizeMB>1024</ramBufferSizeMB>
  <mergeFactor>20</mergeFactor>
  <mergePolicy class="org.apache.lucene.index.TieredMergePolicy">
    <int name="maxMergeAtOnce">8</int>
    <int name="segmentsPerTier">20</int>
  </mergePolicy>
  <mergeScheduler class="org.apache.lucene.index.ConcurrentMergeScheduler">
    <int name="maxMergeCount">10</int>
    <int name="maxThreadCount">10</int>
  </mergeScheduler>
  <autoCommit>
    <maxTime>150</maxTime>
    <openSearcher>false</openSearcher>
  </autoCommit>
  <autoSoftCommit>
    <maxTime>1</maxTime>
  </autoSoftCommit>
</indexConfig>

Most of the settings above are the result of many trial runs, with minimal difference between each change. Any thoughts?

Brad Voth
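The "DocumentsWriter has stalled threads" warning quoted above points at flush backpressure: when flushing cannot keep up with incoming documents, indexing threads are made to wait until the backlog drains. A rough sketch of that idea (illustrative only; the class name is hypothetical and this is not Lucene's implementation):

```python
import threading

class StallControl:
    """Sketch of flush backpressure: once unflushed bytes exceed a stall
    threshold, producers block until the flusher drains the backlog."""

    def __init__(self, stall_threshold_bytes):
        self.threshold = stall_threshold_bytes
        self.pending_bytes = 0
        self.cond = threading.Condition()
        self.stalls = 0  # how many times a producer had to wait

    def on_add(self, nbytes):
        with self.cond:
            if self.pending_bytes >= self.threshold:
                self.stalls += 1  # this is the "stalled threads" situation
            while self.pending_bytes >= self.threshold:
                self.cond.wait()  # producer blocked, like the DW warning
            self.pending_bytes += nbytes

    def on_flushed(self, nbytes):
        with self.cond:
            self.pending_bytes -= nbytes
            self.cond.notify_all()  # wake any stalled producers

# Single-threaded demo: stay below the threshold, so no stall occurs.
sc = StallControl(stall_threshold_bytes=100)
sc.on_add(60)
sc.on_add(30)
sc.on_flushed(90)
sc.on_add(60)
print(sc.pending_bytes, sc.stalls)  # 60 0
```

In this model a single slow flusher stalls every producer at once, which matches the symptom Brad describes: one busy core, little disk IO, and all clients piled up waiting.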
Re: Ingest pauses
On 8/27/2012 2:54 PM, Voth, Brad (GE Corporate) wrote:
> [...] We need to be able to maintain an ingest rate of 50k records/sec average. [...] This leads me to believe that a single merge thread is blocking indexing from occurring. [...]

The mergeFactor configuration is no longer applicable if you are using TieredMergePolicy, so that element should be removed. Your maxMergeAtOnce value looks low to me; it should probably be the same as segmentsPerTier. Here is what I have currently on my Solr 3.5 setup.
This emulates what a mergeFactor of 35 would have done on Solr 3.4 or lower. The maxMergeAtOnceExplicit is three times the other values so that three levels of merging can take place simultaneously:

<mergePolicy class="org.apache.lucene.index.TieredMergePolicy">
  <int name="maxMergeAtOnce">35</int>
  <int name="segmentsPerTier">35</int>
  <int name="maxMergeAtOnceExplicit">105</int>
</mergePolicy>

With a merge and thread count of 10, this is probably what you want:

<mergePolicy class="org.apache.lucene.index.TieredMergePolicy">
  <int name="maxMergeAtOnce">20</int>
  <int name="segmentsPerTier">20</int>
  <int name="maxMergeAtOnceExplicit">200</int>
</mergePolicy>

I admit that I am only taking a shot in the dark here; I have no idea whether it will help or not.

Thanks,
Shawn
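Shawn's rule of thumb from his first example can be written as a tiny helper (the function name is hypothetical; note his second example rounds maxMergeAtOnceExplicit up to 200 rather than using exactly three times the tier size):

```python
def tiered_merge_settings(merge_factor):
    """Shawn's heuristic: set maxMergeAtOnce and segmentsPerTier to the
    old mergeFactor, and maxMergeAtOnceExplicit to three times that so
    three levels of merging can run simultaneously."""
    return {
        "maxMergeAtOnce": merge_factor,
        "segmentsPerTier": merge_factor,
        "maxMergeAtOnceExplicit": 3 * merge_factor,
    }

print(tiered_merge_settings(35))
# {'maxMergeAtOnce': 35, 'segmentsPerTier': 35, 'maxMergeAtOnceExplicit': 105}
```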
RE: Ingest pauses
Thanks for the insight on the merge config; that bit makes more sense to me now. Though in digging more, it looks like my primary issue has less to do with merging and more to do with flushes from the RAM buffer to disk. I've spent most of the morning digging through the Lucene code to work out where the flushes happen and how/why flushing blocks indexing and falls back to flushing with a single thread. If anyone has any insight there it would be much appreciated. In the meantime, while I'm doing some .java file reading, I'm working on tuning back my producers to see if the flushing is able to keep up at some point.

-----Original Message-----
From: Shawn Heisey [s...@elyograg.org]
Sent: Tuesday, August 28, 2012 1:37 PM
To: solr-user@lucene.apache.org
Subject: Re: Ingest pauses
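One simple way to "tune back" producers, as Brad describes, is a client-side token bucket that caps the submission rate so the indexer's flushing can be tested at different loads. A minimal sketch (illustrative only; the class is hypothetical and not part of Solr):

```python
import time

class TokenBucket:
    """Sketch of a client-side rate limiter: tokens refill at a fixed
    rate up to a burst cap, and each document send consumes one token."""

    def __init__(self, rate_per_sec, burst):
        self.rate = float(rate_per_sec)
        self.capacity = float(burst)
        self.tokens = float(burst)
        self.last = time.monotonic()

    def try_acquire(self, n=1):
        now = time.monotonic()
        # Refill proportionally to elapsed time, capped at the burst size.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= n:
            self.tokens -= n
            return True
        return False

# Attempt 2000 sends against a bucket holding a burst of 1000; roughly the
# burst goes through immediately and the rest is limited by the refill rate.
bucket = TokenBucket(rate_per_sec=50000, burst=1000)
sent = sum(1 for _ in range(2000) if bucket.try_acquire())
```

A producer would call try_acquire before each add and back off briefly on False, which converts a hard stall at the shard into smooth, bounded client-side throttling.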