Re: Ingest pauses

2012-08-29 Thread Alexey Serba
Hey Brad,

 This leads me to believe that a single merge thread is blocking indexing from 
 occurring.
 When this happens our producers, which distribute their updates amongst all 
 the shards, pile up on this shard and wait.
Which version of Solr are you using? Have you tried 4.0 beta?

* http://searchhub.org/dev/2011/04/09/solr-dev-diary-solr-and-near-real-time-search/
* https://issues.apache.org/jira/browse/SOLR-2565

Alexey


RE: Ingest pauses

2012-08-29 Thread Voth, Brad (GE Corporate)
Very interesting links; after much more digging yesterday, this appears to be 
exactly what I'm seeing.

I am currently using 4.0 beta for my testing.  FWIW, I've also pulled trunk from 
svn as of yesterday and experienced the same issue.



RE: Ingest pauses

2012-08-29 Thread Voth, Brad (GE Corporate)
Anyone know the actual status of SOLR-2565? It looks to be marked as resolved 
in 4.*, but I am still seeing long pauses during commits using 4.*.

I am currently digging through the code to see what I can find, but Java not being 
my primary (or secondary) language, it is mostly slow going.



Re: Ingest pauses

2012-08-29 Thread Yonik Seeley
On Wed, Aug 29, 2012 at 11:58 AM, Voth, Brad (GE Corporate)
brad.v...@ge.com wrote:
 Anyone know the actual status of SOLR-2565? It looks to be marked as resolved 
 in 4.*, but I am still seeing long pauses during commits using 4.*.

SOLR-2565 is definitely committed - adds are no longer blocked by
commits (at least at the Solr level).

-Yonik
http://lucidworks.com


RE: Ingest pauses

2012-08-29 Thread Voth, Brad (GE Corporate)
Thanks, I'll continue with my testing and tracking down the block.



RE: Ingest pauses

2012-08-29 Thread Voth, Brad (GE Corporate)
Interestingly, it is not pausing during every commit, so at least a portion of 
the time the async commit code is working.  Still trying to track down the case 
where a wait would be issued.



Re: Ingest pauses

2012-08-29 Thread Alexey Serba
Could you take a jstack dump when it's happening and post it here?
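
For example (pid here is a placeholder for the Solr JVM's process id; take a
few dumps a few seconds apart while ingest is stalled):

  jps -l
  jstack -l <pid> > solr-threads-1.txt

If jstack is not available, kill -3 <pid> makes the JVM print the same thread
dump to Solr's stdout/console log.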

 Interestingly it is not pausing during every commit so at least a portion of 
 the time the async commit code is working.  Trying to track down the case 
 where a wait would still be issued.
 


Re: Ingest pauses

2012-08-29 Thread Otis Gospodnetic
Hello Brad,

At one point you said CPU is at 100% and there is no disk IO.  Then in a 
separate email I think you said this happens during the RAM-to-disk flush.  Isn't 
there a contradiction here?

A few thread dumps may tell you where things are stuck.

Also, how does your JVM look while this is happening?  Could this be just 
Garbage Collection?  SPM (see URL in sig) may be helpful here.
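
If you want to rule GC in or out without extra tooling, you can also start the
shard's JVM with GC logging, for example:

  -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -Xloggc:gc.log

(the log file name is just an example) and check whether the ingest pauses line
up with long collections.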

Otis

Performance Monitoring for Solr / ElasticSearch / HBase - 
http://sematext.com/spm 







RE: Ingest pauses

2012-08-28 Thread Voth, Brad (GE Corporate)
I've noticed the following messages in the infostream log around the times the 
pauses begin...

DW 0 [Tue Aug 28 13:25:29 UTC 2012; qtp435584308-969]: WARNING DocumentsWriter 
has stalled threads; waiting
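
(For anyone wanting to reproduce this: the infostream log comes from enabling
infoStream in the indexConfig section of solrconfig.xml, something like
<infoStream file="INFOSTREAM.txt">true</infoStream>; that is the form from the
4.0 example config as I remember it, so double-check the exact syntax.)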




From: Voth, Brad (GE Corporate)
Sent: Monday, August 27, 2012 4:54 PM
To: solr-user@lucene.apache.org
Subject: Ingest pauses

Hello all,

I'm working on implementing a solution for a project with a very high index rate 
and a lower query rate.  We have a very small record size: 2 small strings, 6 longs, 7 
ints, and 2 dates, indexing on 8 of those fields.  We need to be able to 
maintain an ingest rate of 50k records/sec on average.  Through sharding and a 
reasonably sized cluster we've hit most of our performance goals, but have 
found that our producers tend to get hung on a shard that is doing a merge.  
I've done a bit of digging and found some tips and hints on configuring the 
merging, but have yet to get rid of the issue.

What we see on the server hosting the shard during a problematic period is a 
single CPU core at 100%, very little IO activity on the disk, and merge 
messages in the logs.  This leads me to believe that a single merge thread is 
blocking indexing from occurring.  When this happens, our producers, which 
distribute their updates amongst all the shards, pile up on this shard and 
wait.  This causes our overall ingest rate to plummet and is currently keeping 
us from going to prod with the solution.
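
For a sense of what the producers do, a stripped-down sketch of one producer 
thread looks roughly like this (field names, shard URLs, and the document count 
are placeholders, not our real schema or code; SolrJ 4.x API):

import java.util.ArrayList;
import java.util.Date;
import java.util.List;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.SolrServerException;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class ShardProducer implements Runnable {
    private final List<SolrServer> shards = new ArrayList<SolrServer>();

    public ShardProducer(List<String> shardUrls) {
        // One client per shard; updates are spread across all of them.
        for (String url : shardUrls) {
            shards.add(new HttpSolrServer(url));
        }
    }

    public void run() {
        try {
            for (long id = 0; id < 1000000; id++) {
                SolrInputDocument doc = new SolrInputDocument();
                doc.addField("id", id);
                doc.addField("value_l", id * 37);        // stand-in for one of the long fields
                doc.addField("created_dt", new Date());  // stand-in for one of the date fields
                // Distribute the add across shards; if one shard stalls,
                // this call blocks and the whole producer stops.
                shards.get((int) (id % shards.size())).add(doc);
            }
        } catch (SolrServerException e) {
            throw new RuntimeException(e);
        } catch (java.io.IOException e) {
            throw new RuntimeException(e);
        }
    }
}

The pile-up we see happens on that add() call when the shard in question is in 
the middle of a merge.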

The relevant portion of our solrconfig.xml is:

<indexConfig>

  <ramBufferSizeMB>1024</ramBufferSizeMB>
  <mergeFactor>20</mergeFactor>
  <mergePolicy class="org.apache.lucene.index.TieredMergePolicy">
    <int name="maxMergeAtOnce">8</int>
    <int name="segmentsPerTier">20</int>
  </mergePolicy>
  <mergeScheduler class="org.apache.lucene.index.ConcurrentMergeScheduler">
    <int name="maxMergeCount">10</int>
    <int name="maxThreadCount">10</int>
  </mergeScheduler>
  <autoCommit>
    <maxTime>150</maxTime>
    <openSearcher>false</openSearcher>
  </autoCommit>
  <autoSoftCommit>
    <maxTime>1</maxTime>
  </autoSoftCommit>

</indexConfig>

Most of the settings above are the result of many trial runs, with minimal 
difference between each change.

Any thoughts?

Brad Voth



Re: Ingest pauses

2012-08-28 Thread Shawn Heisey

The mergeFactor configuration is no longer applicable if you are using 
TieredMergePolicy, so that should be removed.


Your maxMergeAtOnce field looks low to me; it should probably be the 
same as segmentsPerTier.  Here is what I have currently on my Solr 3.5 
setup.  This emulates what a mergeFactor of 35 would have done on Solr 
3.4 or lower.  The maxMergeAtOnceExplicit is three times the other 
values so that three levels of merging can take place simultaneously:


<mergePolicy class="org.apache.lucene.index.TieredMergePolicy">
  <int name="maxMergeAtOnce">35</int>
  <int name="segmentsPerTier">35</int>
  <int name="maxMergeAtOnceExplicit">105</int>
</mergePolicy>

With a merge and thread count of 10, this is probably what you want:

<mergePolicy class="org.apache.lucene.index.TieredMergePolicy">
  <int name="maxMergeAtOnce">20</int>
  <int name="segmentsPerTier">20</int>
  <int name="maxMergeAtOnceExplicit">200</int>
</mergePolicy>

I admit that I am only taking a shot in the dark here; I have no idea 
whether it will help or not.


Thanks,
Shawn



RE: Ingest pauses

2012-08-28 Thread Voth, Brad (GE Corporate)
Thanks for the insight on the merge config.  That bit makes a bit more sense to 
me now. 

Though in digging more, it looks like my primary issue is less to do with the 
merging and more to do with flushes from the RAM buffer to disk.  I've spent most of 
the morning digging through the Lucene code to track down where the flushes happen 
and how/why the flushing blocks indexing and begins flushing with only a single 
thread.  If anyone has any insight there it would be much appreciated.

In the meantime, while I'm doing some .java file reading, I'm working on tuning 
back my producers to see if the flushing is able to keep up at some point.
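
One experiment on my list: from what I can tell so far, the DocumentsWriter
deliberately stalls incoming indexing threads when flushes can't keep up (which
would match the "has stalled threads; waiting" message from the infostream), so
I'm going to try a much smaller RAM buffer so that each flush is smaller and
more frequent, e.g.:

  <ramBufferSizeMB>128</ramBufferSizeMB>

128 is just a first guess to test with, not a recommendation.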
