Re: Index optimize runs in background.

2015-06-11 Thread Walter Underwood
Why would you care when the forced merge (not an “optimize”) is done? Start it 
and get back to work.

Or even better, never force merge and let the algorithm take care of it. 
Seriously, I’ve been giving this advice since before Lucene was written, 
because Ultraseek had the same approach for managing index segments.

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)



Re: Index optimize runs in background.

2015-06-11 Thread Upayavira
Until somewhere around Lucene 3.5, you needed to optimise, because the
merge strategy used wasn't that clever and left lots of deletes in your
largest segment. Around that point, the TieredMergePolicy became the
default. Because its algorithm is much more sophisticated, it took away
the need to optimize in the majority of scenarios. In fact, it
transformed optimizing from being a necessary thing to being a bad
thing in most cases.

So yes, let the algorithm take care of it, so long as you are using the
TieredMergePolicy, which has been the default for over 2 years.
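One rough way to judge whether a forced merge is even worth considering is to look at how many deleted documents the index still carries. In Lucene, maxDoc counts documents including deleted ones that have not been merged away yet, while numDocs counts only live documents. The sketch below is plain arithmetic on those two numbers; the class and method names are made up for illustration and no threshold is implied.

```java
class DeletedDocsRatio {
    /**
     * Fraction of the index occupied by deleted documents.
     * numDocs = live documents, maxDoc = live + not-yet-merged deleted documents.
     */
    static double deletedFraction(long numDocs, long maxDoc) {
        if (maxDoc <= 0) {
            return 0.0; // empty index: nothing to reclaim
        }
        return (double) (maxDoc - numDocs) / maxDoc;
    }

    public static void main(String[] args) {
        // Example values only; read the real ones from your own index statistics.
        System.out.println(deletedFraction(90_000_000L, 100_000_000L));
    }
}
```

An index that is rebuilt from scratch and then left untouched should show a fraction close to zero, so this number alone does not settle the question raised in this thread.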

Upayavira


Re: Index optimize runs in background.

2015-06-10 Thread Erick Erickson
If I knew, I would fix it ;). The sub-optimizes (i.e. the ones
sent out to each replica) should be sent in parallel and then
each thread should wait for completion from the replicas. There
is no real check for optimize; I believe that the return from the
call is considered sufficient. If we can track down whether there are
conditions under which this is not true, we can fix it.
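
A minimal sketch of the fan-out described above, written against the JDK only and not taken from Solr's code: one blocking call per replica is submitted in parallel, and the caller returns only after every call has returned. The method optimizeReplica and the replica names are hypothetical stand-ins.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class FanOutOptimize {
    /** Hypothetical stand-in for the blocking per-replica optimize request. */
    static String optimizeReplica(String replica) throws InterruptedException {
        Thread.sleep(50); // pretend the merge takes a while
        return replica + ":done";
    }

    /** Sends one request per replica in parallel and blocks until all have returned. */
    static List<String> optimizeAll(List<String> replicas) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, replicas.size()));
        try {
            List<Callable<String>> calls = new ArrayList<>();
            for (String replica : replicas) {
                calls.add(() -> optimizeReplica(replica));
            }
            List<String> results = new ArrayList<>();
            // invokeAll returns only after every submitted task has completed.
            for (Future<String> f : pool.invokeAll(calls)) {
                results.add(f.get()); // rethrows a replica failure, if there was one
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(optimizeAll(Arrays.asList("replica1", "replica2", "replica3")));
    }
}
```

In this model the only completion signal is the return of each call, which matches the point that there is no separate check for optimize.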

But until there's a way to reproduce it, it's pretty much speculation.

Best,
Erick


Re: Index optimize runs in background.

2015-06-10 Thread Modassar Ather
Hi,

There are 5 cores and a separate server for indexing on this SolrCloud. Can
you please share your suggestions on the following:
  How can the indexer know that the optimize has completed, even if the
commit/optimize runs in the background, without going to the Solr servers,
perhaps by using SolrJ or some other API?

I tried but could not find any API/handler to check whether the optimization
has completed. Kindly share your inputs.
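
One possible shape for such a check, sketched under assumptions: if the indexer can read each core's segment count (Solr's Luke request handler reports index statistics that, as far as I know, include a segment count, but verify that against your version), it can poll until the count drops to the target of the forced merge. The class below is hypothetical and takes the count from a supplier, so nothing in it is a Solr or SolrJ API.

```java
import java.util.function.IntSupplier;

class OptimizeWatcher {
    /**
     * Polls a segment-count source until it reports at most targetSegments.
     * Returns true if the target was reached within maxPolls attempts.
     */
    static boolean waitForSegments(IntSupplier segmentCount, int targetSegments,
                                   int maxPolls, long pollMillis)
            throws InterruptedException {
        for (int i = 0; i < maxPolls; i++) {
            if (segmentCount.getAsInt() <= targetSegments) {
                return true;
            }
            Thread.sleep(pollMillis);
        }
        return false; // gave up; the merge may still be running
    }

    public static void main(String[] args) throws InterruptedException {
        // Simulated index whose segment count shrinks on every poll (hypothetical data).
        int[] segments = {12};
        IntSupplier fake = () -> Math.max(1, segments[0] -= 3);
        System.out.println(waitForSegments(fake, 1, 10, 10));
    }
}
```

In a SolrCloud setup the supplier would have to be evaluated per core, since each shard merges its own index.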

Thanks,
Modassar


Re: Index optimize runs in background.

2015-06-04 Thread Erick Erickson
Can't get any failures to happen on my end so I really haven't a clue.

Best,
Erick


Re: Index optimize runs in background.

2015-06-04 Thread Modassar Ather
Hi,

Please share your inputs on optimize and commit running in the background.
Your suggestions would be really helpful.

Thanks,
Modassar

On Tue, Jun 2, 2015 at 6:05 PM, Modassar Ather modather1...@gmail.com
wrote:

 Erick! I could not find any underlying setting of 10 minutes.
 It is not only optimize but commit is also behaving in the same fashion
 and is taking lesser time than usually had taken.
 As per my observation both are running in background.

 On Fri, May 29, 2015 at 7:21 PM, Erick Erickson erickerick...@gmail.com
 wrote:

 I'm not talking about you setting a timeout, but the underlying
 connection timing out...

 The 10 minutes then the indexer exits comment points in that direction.

 Best,
 Erick

 On Thu, May 28, 2015 at 11:43 PM, Modassar Ather modather1...@gmail.com
 wrote:
  I have not added any timeout in the indexer except zk client time out,
  which is 30 seconds. I am simply calling client.close() at the end of
  indexing. The same code was not running in background for optimize with
  solr-4.10.3 and org.apache.solr.client.solrj.impl.CloudSolrServer.
 
  On Fri, May 29, 2015 at 11:13 AM, Erick Erickson erickerick...@gmail.com
  wrote:

  Are you timing out on the client request? The theory here is that it's
  still a synchronous call, but you're just timing out at the client
  level. At that point, the optimize is still running; it's just that the
  connection has been dropped.
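
The theory above can be illustrated with a small JDK-only sketch (not Solr code): a caller waits on a long-running task with a timeout, gives up, and the task still runs to completion on the other side. The sleep stands in for the optimize, and all names and durations are hypothetical.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

class ClientTimeoutDemo {
    /**
     * Returns true if the caller gave up waiting and the "server" work
     * nevertheless ran to completion afterwards.
     */
    static boolean workOutlivesCaller(long workMillis, long clientTimeoutMillis)
            throws Exception {
        ExecutorService server = Executors.newSingleThreadExecutor();
        CountDownLatch finished = new CountDownLatch(1);
        try {
            // Stand-in for the long-running optimize on the server side.
            Future<?> request = server.submit(() -> {
                try {
                    Thread.sleep(workMillis);
                    finished.countDown();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            boolean clientTimedOut = false;
            try {
                request.get(clientTimeoutMillis, TimeUnit.MILLISECONDS);
            } catch (TimeoutException e) {
                clientTimedOut = true; // the caller stops waiting; the work is not cancelled
            }
            boolean completed = finished.await(workMillis * 5 + 1000, TimeUnit.MILLISECONDS);
            return clientTimedOut && completed;
        } finally {
            server.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(workOutlivesCaller(300, 50));
    }
}
```

From the caller's point of view the operation appears to have "gone into the background", although nothing about the call itself was asynchronous.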
 
  Shot in the dark.
  Erick
 
  On Thu, May 28, 2015 at 10:31 PM, Modassar Ather modather1...@gmail.com
  wrote:
   I could not notice it, but in my past experience commit used to take
   around 2 minutes and is now taking around 8 seconds. I think this is
   also running in the background.
  
   On Fri, May 29, 2015 at 10:52 AM, Modassar Ather modather1...@gmail.com
   wrote:

   The indexer takes almost 2 hours to optimize. It has a multi-threaded
   add of batches of documents to
   org.apache.solr.client.solrj.impl.CloudSolrClient.
   Once all the documents are indexed it invokes commit and optimize. I
   have seen that the optimize goes into background after 10 minutes and
   the indexer exits.
   I am not sure why it hangs on the indexer for these 10 minutes. I have
   seen this behavior in multiple iterations of indexing the same data.

   I found nothing significant in the log that I can share. I can see the
   following in the log:
   org.apache.solr.update.DirectUpdateHandler2; start commit{,optimize=true,openSearcher=true,waitSearcher=true,expungeDeletes=false,softCommit=false,prepareCommit=false}
  
   On Wed, May 27, 2015 at 10:59 PM, Erick Erickson erickerick...@gmail.com
   wrote:

   All strange of course. What do your Solr logs show when this happens?
   And how reproducible is this?
  
   Best,
   Erick
  
   On Wed, May 27, 2015 at 4:00 AM, Upayavira u...@odoko.co.uk wrote:
In this case, optimising makes sense: once the index is generated, you
are not updating it.
   
Upayavira
   
On Wed, May 27, 2015, at 06:14 AM, Modassar Ather wrote:
Our index has almost 100M documents running on SolrCloud of 5 shards and
each shard has an index size of about 170+GB (for the record, we are not
using stored fields - our documents are pretty large). We perform a full
indexing every weekend and during the week there are no updates made to
the index. Most of the queries that we run are pretty complex with
hundreds of terms using PhraseQuery, BooleanQuery, SpanQuery, Wildcards,
boosts etc. and take many minutes to execute. A difference of 10-20% is
also a big advantage for us.

We have been optimizing the index after indexing for years and it has
worked well for us. Every once in a while, we upgrade Solr to the latest
version and try without optimizing so that we can save the many hours it
takes to optimize such a huge index, but find that an optimized index
works well for us.

Erick, I was indexing the documents today and saw the optimize happening
in the background.
   
On Tue, May 26, 2015 at 9:12 PM, Erick Erickson erickerick...@gmail.com
wrote:

 No results yet. I finished the test harness last night (not really a
 unit test, a stand-alone program that endlessly adds stuff and tests
 that every commit returns the correct number of docs).

 8,000 cycles later there aren't any problems reported.

 Siiigh.


 On Tue, May 26, 2015 at 1:51 AM, Modassar Ather modather1...@gmail.com
 wrote:
  Hi,

  Erick, you mentioned a unit test to test the optimize running in the
  background. Kindly share your findings, if any.
 
  Thanks,
  Modassar
 
  On Mon, May 25, 2015 at 11:47 AM, Modassar Ather modather1...@gmail.com
  wrote:
 
  Thanks everybody for your replies.
 
  I have noticed the optimization running in background 

Re: Index optimize runs in background.

2015-06-02 Thread Modassar Ather
Erick! I could not find any underlying setting of 10 minutes.
It is not only optimize; commit is also behaving in the same fashion and
is taking less time than it usually took.
As per my observation, both are running in the background.

On Fri, May 29, 2015 at 7:21 PM, Erick Erickson erickerick...@gmail.com
wrote:

 I'm not talking about you setting a timeout, but the underlying
 connection timing out...

 The 10 minutes then the indexer exits comment points in that direction.

 Best,
 Erick


Re: Index optimize runs in background.

2015-05-29 Thread Modassar Ather
I have not set any timeout in the indexer except the ZooKeeper client timeout,
which is 30 seconds. I simply call client.close() at the end of indexing.
With solr-4.10.3 and org.apache.solr.client.solrj.impl.CloudSolrServer, the
same code did not leave the optimize running in the background.


Re: Index optimize runs in background.

2015-05-29 Thread Erick Erickson
I'm not talking about you setting a timeout, but about the underlying
connection timing out...

Your comment that the indexer exits after 10 minutes points in that direction.

Best,
Erick


Re: Index optimize runs in background.

2015-05-28 Thread Modassar Ather
The indexer takes almost 2 hours to optimize. It adds batches of documents
from multiple threads through
org.apache.solr.client.solrj.impl.CloudSolrClient.
Once all the documents are indexed, it invokes commit and optimize. I have
seen that the optimize goes into the background after 10 minutes and the
indexer exits.
I am not sure why the indexer hangs for those 10 minutes. I have seen this
behavior in multiple iterations of indexing the same data.

I found nothing significant in the logs that I can share. I can see the
following in the log:
org.apache.solr.update.DirectUpdateHandler2; start
commit{,optimize=true,openSearcher=true,waitSearcher=true,expungeDeletes=false,softCommit=false,prepareCommit=false}
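
For reference, the optimize and waitSearcher flags in that log line correspond to request parameters of Solr's update handler, and the 1 passed to optimize(true, true, 1) corresponds to maxSegments. A minimal sketch of building the equivalent HTTP request URL follows. The class name, base URL and collection name are placeholders chosen for illustration, and the sketch only builds the URL; it does not contact a server or say anything about whether the call blocks until the merge completes, which is the open question in this thread.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper: builds the URL for an optimize request against the update handler.
class OptimizeUrlSketch {

    static String optimizeUrl(String baseUrl, String collection, int maxSegments) {
        // LinkedHashMap keeps the parameters in insertion order.
        Map<String, String> params = new LinkedHashMap<>();
        params.put("optimize", "true");
        params.put("waitSearcher", "true");
        params.put("maxSegments", Integer.toString(maxSegments));

        StringJoiner query = new StringJoiner("&");
        for (Map.Entry<String, String> e : params.entrySet()) {
            query.add(encode(e.getKey()) + "=" + encode(e.getValue()));
        }
        return baseUrl + "/" + collection + "/update?" + query;
    }

    private static String encode(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is always available on a conforming JVM.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Placeholder host and collection name (assumptions, not from the thread).
        System.out.println(optimizeUrl("http://localhost:8983/solr", "collection1", 1));
    }
}
```

Sending this URL with a client whose read timeout is shorter than the merge would run into the same question raised above: the response may never arrive at the client even though the server keeps working.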


Re: Index optimize runs in background.

2015-05-28 Thread Erick Erickson
Are you timing out on the client request? The theory here is that it's
still a synchronous call, but you're just timing out at the client
level. At that point, the optimize is still running; it's just that the
connection has been dropped.

Shot in the dark.
Erick
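
The theory above can be illustrated without Solr at all. In this sketch, which is only an analogy built on java.util.concurrent and not SolrJ code, a single-thread executor stands in for the server doing the long merge, and Future.get with a timeout stands in for a client whose connection gives up early. All names (ClientTimeoutSketch, run, Outcome) are hypothetical. The point is that the caller stops waiting while the submitted work carries on to completion.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

class ClientTimeoutSketch {

    /** Result of one simulated request. */
    static final class Outcome {
        final boolean clientTimedOut;
        final boolean serverFinished;

        Outcome(boolean clientTimedOut, boolean serverFinished) {
            this.clientTimedOut = clientTimedOut;
            this.serverFinished = serverFinished;
        }
    }

    static Outcome run(long serverWorkMillis, long clientTimeoutMillis) {
        ExecutorService server = Executors.newSingleThreadExecutor();
        CountDownLatch workDone = new CountDownLatch(1);
        try {
            // The "server" side: a synchronous piece of work that takes a while.
            Future<?> request = server.submit(() -> {
                try {
                    Thread.sleep(serverWorkMillis); // stands in for the long merge
                    workDone.countDown();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });

            // The "client" side: wait for the response, but only for so long.
            boolean timedOut = false;
            try {
                request.get(clientTimeoutMillis, TimeUnit.MILLISECONDS);
            } catch (TimeoutException e) {
                timedOut = true; // the client gives up; the work is not cancelled
            }

            // Check afterwards whether the work completed anyway.
            boolean finished = workDone.await(serverWorkMillis + 5000, TimeUnit.MILLISECONDS);
            return new Outcome(timedOut, finished);
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            server.shutdown();
        }
    }

    public static void main(String[] args) {
        Outcome o = run(300, 50); // work takes 300 ms, client waits only 50 ms
        System.out.println("client timed out: " + o.clientTimedOut);
        System.out.println("server finished anyway: " + o.serverFinished);
    }
}
```

With the work longer than the client's wait, the client side sees a timeout and the work still finishes, which is what "running in the background" would look like from the indexer's point of view if this theory holds.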


Re: Index optimize runs in background.

2015-05-28 Thread Modassar Ather
I had not noticed it earlier, but in my past experience a commit used to
take around 2 minutes and now takes around 8 seconds. I think the commit
is also running in the background.


Re: Index optimize runs in background.

2015-05-27 Thread Upayavira
In this case, optimising makes sense: once the index is generated, you
are not updating it.

Upayavira

On Wed, May 27, 2015, at 06:14 AM, Modassar Ather wrote:
 Our index has almost 100M documents running on a SolrCloud of 5 shards, and
 each shard has an index size of about 170+GB (for the record, we are not
 using stored fields - our documents are pretty large). We perform a full
 indexing every weekend, and during the week no updates are made to the
 index. Most of the queries that we run are pretty complex, with hundreds of
 terms using PhraseQuery, BooleanQuery, SpanQuery, wildcards, boosts etc.,
 and take many minutes to execute. Even a difference of 10-20% is a big
 advantage for us.
 
 We have been optimizing the index after indexing for years and it has
 worked well for us. Every once in a while, we upgrade Solr to the latest
 version and try without optimizing so that we can save the many hours it
 takes to optimize such a huge index, but we find that the optimized index
 works well for us.
 
 Erick, I was indexing the documents today and saw the optimize happening
 in the background.
 
 On Tue, May 26, 2015 at 9:12 PM, Erick Erickson erickerick...@gmail.com
 wrote:
 
  No results yet. I finished the test harness last night (not really a
  unit test, a stand-alone program that endlessly adds stuff and tests
  that every commit returns the correct number of docs).
 
  8,000 cycles later there aren't any problems reported.
 
  Siiigh.
 
 
  On Tue, May 26, 2015 at 1:51 AM, Modassar Ather modather1...@gmail.com
  wrote:
   Hi,
  
   Erick you mentioned about a unit test to test the optimize running in
   background. Kindly share your findings if any.
  
   Thanks,
   Modassar
  
   On Mon, May 25, 2015 at 11:47 AM, Modassar Ather modather1...@gmail.com
  
   wrote:
  
   Thanks everybody for your replies.
  
   I have noticed the optimization running in background every time I
   indexed. This is 5 node cluster with solr-5.1.0 and uses the
   CloudSolrClient. Kindly share your findings on this issue.
  
   Our index has almost 100M documents running on SolrCloud. We have been
   optimizing the index after indexing for years and it has worked well for
   us.
  
   Thanks,
   Modassar
  
   On Fri, May 22, 2015 at 11:55 PM, Erick Erickson 
  erickerick...@gmail.com
   wrote:
  
   Actually, I've recently seen very similar behavior in Solr 4.10.3, but
   involving hard commits openSearcher=true, see:
   https://issues.apache.org/jira/browse/SOLR-7572. Of course I can't
   reproduce this at will, sii.
  
   A unit test should be very simple to write though, maybe I can get to
  it
   today.
  
   Erick
  
  
  
   On Fri, May 22, 2015 at 8:27 AM, Upayavira u...@odoko.co.uk wrote:
   
   
On Fri, May 22, 2015, at 03:55 PM, Shawn Heisey wrote:
On 5/21/2015 6:21 AM, Modassar Ather wrote:
 I am using Solr-5.1.0. I have an indexer class which invokes
 cloudSolrClient.optimize(true, true, 1). My indexer exits after the
 invocation of optimize and the optimization keeps on running in the
 background.
 Kindly let me know if it is per design and how can I make my indexer
 wait until the optimization is over. Is there a configuration/parameter I
 need to set for the same.

 Please note that the same indexer with cloudSolrServer.optimize(true, true,
 1) on Solr-4.10 used to wait till the optimize was over before exiting.
   
This is very odd, because I could not get HttpSolrServer to optimize in
the background, even when that was what I wanted.

I wondered if maybe the Cloud object behaves differently with regard to
blocking until an optimize is finished ... except that there is no code
for optimizing in CloudSolrClient at all ... so I don't know where the
different behavior would actually be happening.

A more important question is, why are you optimising? Generally it isn't
recommended anymore as it reduces the natural distribution of documents
amongst segments and makes future merges more costly.
   
Upayavira
  
  
  
 


Re: Index optimize runs in background.

2015-05-27 Thread Erick Erickson
All strange of course. What do your Solr logs show when this happens?
And how reproducible is this?

Best,
Erick

On Wed, May 27, 2015 at 4:00 AM, Upayavira u...@odoko.co.uk wrote:
 In this case, optimising makes sense, once the index is generated, you
 are not updating It.

 Upayavira

 On Wed, May 27, 2015, at 06:14 AM, Modassar Ather wrote:
 Our index has almost 100M documents running on a SolrCloud of 5 shards, and
 each shard has an index size of about 170+GB (for the record, we are not
 using stored fields - our documents are pretty large). We perform a full
 indexing every weekend, and during the week no updates are made to the
 index. Most of the queries that we run are pretty complex, with hundreds of
 terms using PhraseQuery, BooleanQuery, SpanQuery, wildcards, boosts etc.,
 and take many minutes to execute. Even a difference of 10-20% is a big
 advantage for us.

 We have been optimizing the index after indexing for years and it has
 worked well for us. Every once in a while, we upgrade Solr to the latest
 version and try without optimizing so that we can save the many hours it
 takes to optimize such a huge index, but we find that the optimized index
 works well for us.

 Erick, I was indexing the documents today and saw the optimize happening
 in the background.



Re: Index optimize runs in background.

2015-05-26 Thread Modassar Ather
Our index has almost 100M documents running on a SolrCloud of 5 shards, and
each shard has an index size of about 170+GB (for the record, we are not
using stored fields - our documents are pretty large). We perform a full
indexing every weekend and during the week there are no updates made to the
index. Most of the queries that we run are pretty complex, with hundreds of
terms using PhraseQuery, BooleanQuery, SpanQuery, wildcards, boosts etc.,
and take many minutes to execute. Even a difference of 10-20% is a big
advantage for us.

We have been optimizing the index after indexing for years and it has
worked well for us. Every once in a while, we upgrade Solr to the latest
version and try without optimizing so that we can save the many hours it
takes to optimize such a huge index, but we find that the optimized index
works well for us.

Erick, I was indexing the documents today and saw the optimize happening in
the background.

On Tue, May 26, 2015 at 9:12 PM, Erick Erickson erickerick...@gmail.com
wrote:

 No results yet. I finished the test harness last night (not really a
 unit test, a stand-alone program that endlessly adds stuff and tests
 that every commit returns the correct number of docs).

 8,000 cycles later there aren't any problems reported.

 Siiigh.





Re: Index optimize runs in background.

2015-05-26 Thread Modassar Ather
Hi,

Erick, you mentioned a unit test for the optimize running in the
background. Kindly share your findings, if any.

Thanks,
Modassar

On Mon, May 25, 2015 at 11:47 AM, Modassar Ather modather1...@gmail.com
wrote:

 Thanks everybody for your replies.

 I have noticed the optimization running in background every time I
 indexed. This is 5 node cluster with solr-5.1.0 and uses the
 CloudSolrClient. Kindly share your findings on this issue.

 Our index has almost 100M documents running on SolrCloud. We have been
 optimizing the index after indexing for years and it has worked well for
 us.

 Thanks,
 Modassar






Re: Index optimize runs in background.

2015-05-26 Thread Upayavira
Modassar,

Are you saying that the reason you are optimising is because you have
been doing it for years? If this is the only reason, you should stop
doing it immediately. 

The one scenario in which optimisation still makes some sense is when
you reindex every night and optimise straight after. This will leave you
with a single segment which will search faster.

However, if you are doing a lot of indexing, especially with
deletes/updates, you will have merged your content into a single segment
which will later need to be merged. That merge will be costly as it will
involve copying the entire content of your large segment, which will
impact performance.

Before Solr 3.6, optimisation was necessary and recommended. At that
point (or a little before) the TieredMergePolicy became the default, and
this made optimisation generally unnecessary.

Upayavira

On Mon, May 25, 2015, at 07:17 AM, Modassar Ather wrote:
 Thanks everybody for your replies.
 
 I have noticed the optimization running in background every time I
 indexed.
 This is 5 node cluster with solr-5.1.0 and uses the CloudSolrClient.
 Kindly
 share your findings on this issue.
 
 Our index has almost 100M documents running on SolrCloud. We have been
 optimizing the index after indexing for years and it has worked well for
 us.
 
 Thanks,
 Modassar
 


Re: Index optimize runs in background.

2015-05-26 Thread Erick Erickson
No results yet. I finished the test harness last night (not really a
unit test, a stand-alone program that endlessly adds stuff and tests
that every commit returns the correct number of docs).

8,000 cycles later there aren't any problems reported.
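
For anyone who wants to try something similar, the idea of such a harness can be sketched roughly as below. This is only an illustration, not the actual program: the `Index` interface and the in-memory stand-in are hypothetical names invented for the sketch, and a real harness would put a SolrJ client behind the same three operations.

```java
import java.util.ArrayList;
import java.util.List;

final class CommitCountHarness {
    /** Hypothetical minimal view of an index; this is not a SolrJ interface. */
    interface Index {
        void add(String doc);
        void commit();
        long visibleCount();
    }

    /** In-memory stand-in so the sketch runs without a server. */
    static final class InMemoryIndex implements Index {
        private final List<String> pending = new ArrayList<>();
        private long committed = 0;

        public void add(String doc) { pending.add(doc); }

        // Documents become visible only once commit() has been called.
        public void commit() { committed += pending.size(); pending.clear(); }

        public long visibleCount() { return committed; }
    }

    /** Adds docs, commits, checks the visible count; returns the number of bad cycles. */
    static int run(Index index, int cycles, int docsPerCycle) {
        long expected = index.visibleCount();
        int failures = 0;
        for (int c = 0; c < cycles; c++) {
            for (int d = 0; d < docsPerCycle; d++) {
                index.add("doc-" + c + "-" + d);
            }
            expected += docsPerCycle;
            index.commit();
            // If the commit had returned before the docs were searchable, this would differ.
            if (index.visibleCount() != expected) {
                failures++;
            }
        }
        return failures;
    }
}
```

Calling `CommitCountHarness.run(new CommitCountHarness.InMemoryIndex(), 8000, 10)` exercises the loop; any non-zero return value would point at a commit that came back before its documents were visible.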

Siiigh.


On Tue, May 26, 2015 at 1:51 AM, Modassar Ather modather1...@gmail.com wrote:
 Hi,

 Erick you mentioned about a unit test to test the optimize running in
 background. Kindly share your findings if any.

 Thanks,
 Modassar






Re: Index optimize runs in background.

2015-05-26 Thread Shawn Heisey
On 5/26/2015 6:29 AM, Upayavira wrote:
 Are you saying that the reason you are optimising is because you have
 been doing it for years? If this is the only reason, you should stop
 doing it immediately. 

 The one scenario in which optimisation still makes some sense is when
 you reindex every night and optimise straight after. This will leave you
 with a single segment which will search faster.

 However, if you are doing a lot of indexing, especially with
 deletes/updates, you will have merged your content into a single segment
 which will later need to be merged. That merge will be costly as it will
 involve copying the entire content of your large segment, which will
 impact performance.

 Before Solr 3.6, Optimisation was necessary and recommended. At that
 point (or a little before) the TieredMergePolicy became the default, and
 this made optimisation generally unnecessary.

In general, I concur with this advice about optimizing.  Historically,
optimize was done for increased performance.  In older versions, an
unoptimized index performed *MUCH* worse than an index with a single
segment.  This is no longer the case today, mostly due to so many Lucene
features working on a per-segment basis.  A single segment does perform
faster, but the difference is much smaller than it used to be.

A full optimize on a large index requires a LOT of CPU and I/O resources
-- while the optimize is underway, performance is not very good.

There are, however, still times when running optimize is appropriate:

1) The index is mostly static, not receiving very frequent updates.
2) There is a large percentage of deleted documents in the index.

With modern Lucene/Solr and these use cases, the reasons for optimizing
are still performance-related, but the only time you should do an
optimize is when the benefit outweighs the cost.

For the 1) use case, the index will likely remain mostly-optimized for a
long period of time after the optimize is done, so the resources
required for the optimize are worth spending.

For the 2) use case, optimizing will reduce the size of the index
significantly, so general performance gets better.  That makes the cost
worthwhile.
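
As a rough sketch, the two cases above could be turned into a simple check before deciding to optimize. The class name, method names and the threshold parameter are all made up for illustration and are not part of Solr or SolrJ; the only facts relied on are that maxDoc counts deleted documents and numDocs does not. What counts as a "large percentage" is left to the caller.

```java
/** Hypothetical helper for illustration only; not part of Solr or SolrJ. */
final class OptimizeAdvisor {
    private OptimizeAdvisor() {}

    /** Fraction of the index taken up by deleted documents, between 0 and 1. */
    static double deletedRatio(long maxDoc, long numDocs) {
        if (maxDoc < 0 || numDocs < 0 || numDocs > maxDoc) {
            throw new IllegalArgumentException("need 0 <= numDocs <= maxDoc");
        }
        // maxDoc includes deleted documents, numDocs does not.
        return maxDoc == 0 ? 0.0 : (double) (maxDoc - numDocs) / maxDoc;
    }

    /**
     * Case 1: the index is mostly static.
     * Case 2: deleted documents make up at least deletedThreshold of the index.
     * The threshold is an assumption the caller has to choose.
     */
    static boolean worthOptimizing(boolean mostlyStatic, long maxDoc, long numDocs,
                                   double deletedThreshold) {
        return mostlyStatic || deletedRatio(maxDoc, numDocs) >= deletedThreshold;
    }
}
```

For example, `OptimizeAdvisor.worthOptimizing(false, 100, 70, 0.2)` is true because 30% of the documents are deleted, while a frequently updated index with only 5% deletes would not qualify. The check says nothing about the CPU and I/O cost of the optimize itself, which still has to be weighed separately.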

Thanks,
Shawn



Re: Index optimize runs in background.

2015-05-26 Thread Alessandro Benedetti
I completely agree with Upayavira and Shawn.
Modassar, can you explain to us how often you index?
Have you ever played with the mergeFactor?
I hardly think you need to optimise at all.
Simply tuning the mergeFactor should solve all your issues.
I assume you were optimising only to get faster searches, weren't you?

Cheers

2015-05-26 16:07 GMT+01:00 Shawn Heisey apa...@elyograg.org:

 On 5/26/2015 6:29 AM, Upayavira wrote:
  Are you saying that the reason you are optimising is because you have
  been doing it for years? If this is the only reason, you should stop
  doing it immediately.
 
  The one scenario in which optimisation still makes some sense is when
  you reindex every night and optimise straight after. This will leave you
  with a single segment which will search faster.
 
  However, if you are doing a lot of indexing, especially with
  deletes/updates, you will have merged your content into a single segment
  which will later need to be merged. That merge will be costly as it will
  involve copying the entire content of your large segment, which will
  impact performance.
 
  Before Solr 3.6, Optimisation was necessary and recommended. At that
  point (or a little before) the TieredMergePolicy became the default, and
  this made optimisation generally unnecessary.

 In general, I concur with this advice about optimizing.  Historically,
 optimize was done for increased performance.  In older versions, an
 unoptimized index performed *MUCH* worse than an index with a single
 segment.  This is no longer the case today, mostly due to so many Lucene
 features working on a per-segment basis.  A single segment does perform
 faster, but the difference is much smaller than it used to be.

 A full optimize on a large index requires a LOT of CPU and I/O resources
 -- while the optimize is underway, performance is not very good.

 There are,however, still times when running optimize is appropriate:

 1) The index is mostly static, not receiving very frequent updates.
 2) There is a large percentage of deleted documents in the index.

 With modern Lucene/Solr and these use cases, the reasons for optimizing
 are still performance-related, but the only time you should do an
 optimize is when the benefit outweighs the cost.

 For the 1) use case, the index will likely remain mostly-optimized for a
 long period of time after the optimize is done, so the resources
 required for the optimize are worth spending.

 For the 2) use case, optimizing will reduce the size of the index
 significantly, so general performance gets better.  That makes the cost
 worthwhile.

 Thanks,
 Shawn




-- 
--

Benedetti Alessandro
Visiting card : http://about.me/alessandro_benedetti

Tyger, tyger burning bright
In the forests of the night,
What immortal hand or eye
Could frame thy fearful symmetry?

William Blake - Songs of Experience -1794 England


Re: Index optimize runs in background.

2015-05-25 Thread Modassar Ather
Thanks everybody for your replies.

I have noticed the optimization running in the background every time I indexed.
This is a 5-node cluster with solr-5.1.0 and uses the CloudSolrClient. Kindly
share your findings on this issue.

Our index has almost 100M documents running on SolrCloud. We have been
optimizing the index after indexing for years and it has worked well for
us.

Thanks,
Modassar

On Fri, May 22, 2015 at 11:55 PM, Erick Erickson erickerick...@gmail.com
wrote:

 Actually, I've recently seen very similar behavior in Solr 4.10.3, but
 involving hard commits openSearcher=true, see:
 https://issues.apache.org/jira/browse/SOLR-7572. Of course I can't
 reproduce this at will, sii.

 A unit test should be very simple to write though, maybe I can get to it
 today.

 Erick






Re: Index optimize runs in background.

2015-05-22 Thread Upayavira


On Fri, May 22, 2015, at 03:55 PM, Shawn Heisey wrote:
 On 5/21/2015 6:21 AM, Modassar Ather wrote:
  I am using Solr-5.1.0. I have an indexer class which invokes
  cloudSolrClient.optimize(true, true, 1). My indexer exits after the
  invocation of optimize and the optimization keeps on running in the
  background.
  Kindly let me know if it is per design and how can I make my indexer to
  wait until the optimization is over. Is there a configuration/parameter I
  need to set for the same.
  
  Please note that the same indexer with cloudSolrServer.optimize(true, true,
  1) on Solr-4.10 used to wait till the optimize was over before exiting.
 
 This is very odd, because I could not get HttpSolrServer to optimize in
 the background, even when that was what I wanted.
 
 I wondered if maybe the Cloud object behaves differently with regard to
 blocking until an optimize is finished ... except that there is no code
 for optimizing in CloudSolrClient at all ... so I don't know where the
 different behavior would actually be happening.

A more important question is, why are you optimising? Generally it isn't
recommended anymore as it reduces the natural distribution of documents
amongst segments and makes future merges more costly.

Upayavira


Re: Index optimize runs in background.

2015-05-22 Thread Shawn Heisey
On 5/21/2015 6:21 AM, Modassar Ather wrote:
 I am using Solr-5.1.0. I have an indexer class which invokes
 cloudSolrClient.optimize(true, true, 1). My indexer exits after the
 invocation of optimize and the optimization keeps on running in the
 background.
 Kindly let me know if it is per design and how can I make my indexer to
 wait until the optimization is over. Is there a configuration/parameter I
 need to set for the same.
 
 Please note that the same indexer with cloudSolrServer.optimize(true, true,
 1) on Solr-4.10 used to wait till the optimize was over before exiting.

This is very odd, because I could not get HttpSolrServer to optimize in
the background, even when that was what I wanted.

I wondered if maybe the Cloud object behaves differently with regard to
blocking until an optimize is finished ... except that there is no code
for optimizing in CloudSolrClient at all ... so I don't know where the
different behavior would actually be happening.
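
If the call really does return early, one workaround an indexer could use is to poll until the index has shrunk to the requested number of segments. The sketch below is an assumption-laden illustration: `MergeWaiter` and `awaitSegmentCount` are hypothetical names, and the segment count is supplied by the caller through an `IntSupplier`, because how that number is fetched from the servers is left open here.

```java
import java.util.function.IntSupplier;

/** Hypothetical polling helper; not part of SolrJ. */
final class MergeWaiter {
    private MergeWaiter() {}

    /**
     * Polls segmentCount until it is at most maxSegments or the timeout expires.
     * Returns true if the target was reached, false on timeout or interruption.
     */
    static boolean awaitSegmentCount(IntSupplier segmentCount, int maxSegments,
                                     long pollMillis, long timeoutMillis) {
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        while (true) {
            if (segmentCount.getAsInt() <= maxSegments) {
                return true;
            }
            if (System.nanoTime() - deadline >= 0) {
                return false;
            }
            try {
                Thread.sleep(pollMillis);
            } catch (InterruptedException e) {
                // Preserve the interrupt for the caller and give up waiting.
                Thread.currentThread().interrupt();
                return false;
            }
        }
    }
}
```

The indexer would call this right after optimize(true, true, 1) with maxSegments set to 1. Two caveats: an index that already has a single segment passes the check immediately, and on a multi-shard collection the supplier would have to report the largest count across all cores.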

Thanks,
Shawn



Re: Index optimize runs in background.

2015-05-22 Thread Erick Erickson
Actually, I've recently seen very similar behavior in Solr 4.10.3, but
involving hard commits with openSearcher=true, see:
https://issues.apache.org/jira/browse/SOLR-7572. Of course I can't
reproduce this at will, sii.

A unit test should be very simple to write though, maybe I can get to it today.

Erick



On Fri, May 22, 2015 at 8:27 AM, Upayavira u...@odoko.co.uk wrote:


 On Fri, May 22, 2015, at 03:55 PM, Shawn Heisey wrote:
 On 5/21/2015 6:21 AM, Modassar Ather wrote:
  I am using Solr-5.1.0. I have an indexer class which invokes
  cloudSolrClient.optimize(true, true, 1). My indexer exits after the
  invocation of optimize and the optimization keeps on running in the
  background.
  Kindly let me know if it is per design and how can I make my indexer to
  wait until the optimization is over. Is there a configuration/parameter I
  need to set for the same.
 
  Please note that the same indexer with cloudSolrServer.optimize(true, true,
  1) on Solr-4.10 used to wait till the optimize was over before exiting.

 This is very odd, because I could not get HttpSolrServer to optimize in
 the background, even when that was what I wanted.

 I wondered if maybe the Cloud object behaves differently with regard to
 blocking until an optimize is finished ... except that there is no code
 for optimizing in CloudSolrClient at all ... so I don't know where the
 different behavior would actually be happening.

 A more important question is, why are you optimising? Generally it isn't
 recommended anymore as it reduces the natural distribution of documents
 amongst segments and makes future merges more costly.

 Upayavira


Index optimize runs in background.

2015-05-21 Thread Modassar Ather
Hi,

I am using Solr-5.1.0. I have an indexer class which invokes
cloudSolrClient.optimize(true, true, 1). My indexer exits after the
invocation of optimize and the optimization keeps running in the
background.
Kindly let me know whether this is by design and how I can make my indexer
wait until the optimization is over. Is there a configuration/parameter I
need to set for this?

Please note that the same indexer with cloudSolrServer.optimize(true, true,
1) on Solr-4.10 used to wait until the optimize was over before exiting.

Thanks,
Modassar


Re: Index optimize runs in background.

2015-05-21 Thread Modassar Ather
Hi

Any insight on the question would be really helpful.

Thanks,
Modassar

On Thu, May 21, 2015 at 5:51 PM, Modassar Ather modather1...@gmail.com
wrote:

 Hi,

 I am using Solr-5.1.0. I have an indexer class which invokes
 cloudSolrClient.optimize(true, true, 1). My indexer exits after the
 invocation of optimize and the optimization keeps on running in the
 background.
 Kindly let me know if it is per design and how can I make my indexer to
 wait until the optimization is over. Is there a configuration/parameter I
 need to set for the same.

 Please note that the same indexer with cloudSolrServer.optimize(true,
 true, 1) on Solr-4.10 used to wait till the optimize was over before
 exiting.

 Thanks,
 Modassar