Re: Index optimize runs in background.
Why would you care when the forced merge (not an “optimize”) is done? Start it and get back to work. Or even better, never force merge and let the algorithm take care of it. Seriously, I’ve been giving this advice since before Lucene was written, because Ultraseek had the same approach for managing index segments.

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/ (my blog)
Re: Index optimize runs in background.
Until somewhere around Lucene 3.5, you needed to optimise, because the merge strategy used wasn't that clever and left lots of deletes in your largest segment. Around that point, the TieredMergePolicy became the default. Because its algorithm is much more sophisticated, it took away the need to optimize in the majority of scenarios. In fact, it transformed optimizing from being a necessary thing to being a bad thing in most cases.

So yes, let the algorithm take care of it, so long as you are using the TieredMergePolicy, which has been the default for over 2 years.

Upayavira
Re: Index optimize runs in background.
If I knew, I would fix it ;). The sub-optimizes (i.e. the ones sent out to each replica) should be sent in parallel and then each thread should wait for completion from the replicas. There is no real check for optimize, I believe that the return from the call is considered sufficient. If we can track down if there are conditions under which this is not true we can fix it. But until there's a way to reproduce it, it's pretty much speculation.

Best,
Erick
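The "send in parallel, then wait for every replica" pattern described above can be illustrated with a small stand-alone sketch. This is not Solr's code: optimize_replica, optimize_all, the shard names, and the sleep durations are all made up for illustration, and the sleep merely stands in for a blocking per-replica request.

```python
from concurrent.futures import ThreadPoolExecutor
import time


def optimize_replica(name, seconds):
    """Hypothetical stand-in for a blocking per-replica optimize request."""
    time.sleep(seconds)
    return name


def optimize_all(replicas):
    """Submit every sub-request in parallel, then block until all have returned."""
    with ThreadPoolExecutor(max_workers=len(replicas)) as pool:
        futures = [pool.submit(optimize_replica, name, secs)
                   for name, secs in replicas.items()]
        # result() blocks, so the caller gets control back only after
        # the slowest replica has answered.
        return [f.result() for f in futures]


# Made-up replicas with made-up durations (in seconds).
replicas = {"shard1": 0.05, "shard2": 0.10, "shard3": 0.02}
start = time.monotonic()
done = optimize_all(replicas)
elapsed = time.monotonic() - start
```

If the caller returned before the slowest replica was done, something other than this wait (for example a dropped connection) would have to be responsible, which is the kind of condition being asked about here.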
Re: Index optimize runs in background.
Hi,

There are 5 cores and a separate server for indexing on this SolrCloud. Can you please share your suggestions on the following: how can the indexer know that the optimize has completed, even if the commit/optimize runs in the background, without going to the Solr servers, maybe by using SolrJ or some other API? I tried but could not find any API/handler to check whether the optimization is completed. Kindly share your inputs.

Thanks,
Modassar
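One possible way for an indexer to detect completion from its own machine is to poll each core over HTTP and wait until the segment count drops to the target. The sketch below is only a heuristic under stated assumptions: it assumes the Luke request handler is reachable at /admin/luke on every core and that its JSON response carries index.segmentCount. The function names, the core URLs, and the polling interval are all hypothetical, and nothing in this thread confirms this as the recommended approach.

```python
import json
import time
import urllib.request


def segment_count(luke_json):
    """Extract index.segmentCount from a Luke handler JSON response body."""
    return json.loads(luke_json)["index"]["segmentCount"]


def fetch_luke(core_url):
    """Fetch index info for one core, e.g. http://host:8983/solr/core1 (hypothetical URL)."""
    with urllib.request.urlopen(core_url + "/admin/luke?numTerms=0&wt=json") as resp:
        return resp.read().decode("utf-8")


def wait_for_merge(core_urls, fetch=fetch_luke, max_segments=1,
                   interval=30.0, attempts=240):
    """Poll until every core has at most max_segments segments.

    Returns True once all cores are merged, False if attempts run out.
    """
    for _ in range(attempts):
        if all(segment_count(fetch(url)) <= max_segments for url in core_urls):
            return True
        time.sleep(interval)
    return False
```

The fetch parameter is injectable so the loop can be exercised without a running cluster. Note that a core which already had a single segment before the optimize started would pass the check immediately, so this only approximates "optimize finished".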
Re: Index optimize runs in background.
Can't get any failures to happen on my end so I really haven't a clue.

Best,
Erick
Re: Index optimize runs in background.
Hi,

Please provide your inputs on optimize and commit running as background. Your suggestion will be really helpful.

Thanks,
Modassar
Re: Index optimize runs in background.
Erick! I could not find any underlying setting of 10 minutes. It is not only optimize; commit is also behaving in the same fashion and is taking less time than it usually did. As per my observation, both are running in the background.

On Fri, May 29, 2015 at 7:21 PM, Erick Erickson erickerick...@gmail.com wrote:
I'm not talking about you setting a timeout, but the underlying connection timing out... The "10 minutes then the indexer exits" comment points in that direction.
Best, Erick

On Thu, May 28, 2015 at 11:43 PM, Modassar Ather modather1...@gmail.com wrote:
I have not added any timeout in the indexer except the ZK client timeout, which is 30 seconds. I am simply calling client.close() at the end of indexing. The same code was not running in the background for optimize with solr-4.10.3 and org.apache.solr.client.solrj.impl.CloudSolrServer.

On Fri, May 29, 2015 at 11:13 AM, Erick Erickson erickerick...@gmail.com wrote:
Are you timing out on the client request? The theory here is that it's still a synchronous call, but you're just timing out at the client level. At that point, the optimize is still running; it's just that the connection has been dropped. Shot in the dark.
Erick

On Thu, May 28, 2015 at 10:31 PM, Modassar Ather modather1...@gmail.com wrote:
I had not noticed it before, but commit, which in my past experience used to take around 2 minutes, is now taking around 8 seconds. I think this is also running in the background.

On Fri, May 29, 2015 at 10:52 AM, Modassar Ather modather1...@gmail.com wrote:
The indexer takes almost 2 hours to optimize. It has a multi-threaded add of batches of documents to org.apache.solr.client.solrj.impl.CloudSolrClient. Once all the documents are indexed, it invokes commit and optimize. I have seen that the optimize goes into the background after 10 minutes and the indexer exits. I am not sure why it hangs on the indexer for these 10 minutes. I have seen this behavior in multiple iterations of indexing the same data. I found nothing significant in the log that I can share. I can see the following in the log:
org.apache.solr.update.DirectUpdateHandler2; start commit{,optimize=true,openSearcher=true,waitSearcher=true,expungeDeletes=false,softCommit=false,prepareCommit=false}

On Wed, May 27, 2015 at 10:59 PM, Erick Erickson erickerick...@gmail.com wrote:
All strange of course. What do your Solr logs show when this happens? And how reproducible is this?
Best, Erick

On Wed, May 27, 2015 at 4:00 AM, Upayavira u...@odoko.co.uk wrote:
In this case, optimising makes sense: once the index is generated, you are not updating it.
Upayavira

On Wed, May 27, 2015, at 06:14 AM, Modassar Ather wrote:
Our index has almost 100M documents running on a SolrCloud of 5 shards, and each shard has an index size of about 170+GB (for the record, we are not using stored fields - our documents are pretty large). We perform a full indexing every weekend, and during the week there are no updates made to the index. Most of the queries that we run are pretty complex, with hundreds of terms using PhraseQuery, BooleanQuery, SpanQuery, wildcards, boosts etc., and take many minutes to execute. A difference of 10-20% is also a big advantage for us. We have been optimizing the index after indexing for years and it has worked well for us. Every once in a while, we upgrade Solr to the latest version and try without optimizing so that we can save the many hours it takes to optimize such a huge index, but we find that an optimized index works well for us.
Erick, I was indexing the documents today and saw the optimize happening in the background.

On Tue, May 26, 2015 at 9:12 PM, Erick Erickson erickerick...@gmail.com wrote:
No results yet. I finished the test harness last night (not really a unit test, a stand-alone program that endlessly adds stuff and tests that every commit returns the correct number of docs). 8,000 cycles later there aren't any problems reported. Siiigh.

On Tue, May 26, 2015 at 1:51 AM, Modassar Ather modather1...@gmail.com wrote:
Hi, Erick, you mentioned a unit test to test the optimize running in the background. Kindly share your findings, if any.
Thanks, Modassar

On Mon, May 25, 2015 at 11:47 AM, Modassar Ather modather1...@gmail.com wrote:
Thanks everybody for your replies. I have noticed the optimization running in the background every time I indexed. This is a 5-node cluster with solr-5.1.0 and uses the CloudSolrClient. Kindly share your findings on this issue. Our index has almost 100M documents running on SolrCloud. We have
Re: Index optimize runs in background.
I have not added any timeout in the indexer except the ZooKeeper client timeout, which is 30 seconds. I am simply calling client.close() at the end of indexing. With solr-4.10.3 and org.apache.solr.client.solrj.impl.CloudSolrServer, the same code did not leave the optimize running in the background. On Fri, May 29, 2015 at 11:13 AM, Erick Erickson erickerick...@gmail.com wrote: Are you timing out on the client request? The theory here is that it's still a synchronous call, but you're just timing out at the client level. At that point, the optimize is still running; it's just that the connection has been dropped. Shot in the dark. Erick On Thu, May 28, 2015 at 10:31 PM, Modassar Ather modather1...@gmail.com wrote: I had not noticed it before, but the commit, which used to take around 2 minutes, is now taking around 8 seconds. I think this is also running in the background. On Fri, May 29, 2015 at 10:52 AM, Modassar Ather modather1...@gmail.com wrote: The indexer takes almost 2 hours to optimize. It does a multi-threaded add of batches of documents to org.apache.solr.client.solrj.impl.CloudSolrClient. Once all the documents are indexed, it invokes commit and optimize. I have seen that the optimize goes into the background after 10 minutes and the indexer exits. I am not sure why the indexer hangs for these 10 minutes. I have seen this behavior in multiple iterations of indexing the same data. I found nothing significant in the log that I can share. I can see the following in the log: org.apache.solr.update.DirectUpdateHandler2; start commit{,optimize=true,openSearcher=true,waitSearcher=true,expungeDeletes=false,softCommit=false,prepareCommit=false} On Wed, May 27, 2015 at 10:59 PM, Erick Erickson erickerick...@gmail.com wrote: All strange of course. What do your Solr logs show when this happens? And how reproducible is this? Best, Erick On Wed, May 27, 2015 at 4:00 AM, Upayavira u...@odoko.co.uk wrote: In this case, optimising makes sense: once the index is generated, you are not updating it.
Re: Index optimize runs in background.
I'm not talking about you setting a timeout, but about the underlying connection timing out... The "10 minutes, then the indexer exits" comment points in that direction. Best, Erick On Thu, May 28, 2015 at 11:43 PM, Modassar Ather modather1...@gmail.com wrote: I have not added any timeout in the indexer except the ZooKeeper client timeout, which is 30 seconds. I am simply calling client.close() at the end of indexing. With solr-4.10.3 and org.apache.solr.client.solrj.impl.CloudSolrServer, the same code did not leave the optimize running in the background. On Fri, May 29, 2015 at 11:13 AM, Erick Erickson erickerick...@gmail.com wrote: Are you timing out on the client request? The theory here is that it's still a synchronous call, but you're just timing out at the client level. At that point, the optimize is still running; it's just that the connection has been dropped. Shot in the dark. Erick On Thu, May 28, 2015 at 10:31 PM, Modassar Ather modather1...@gmail.com wrote: I had not noticed it before, but the commit, which used to take around 2 minutes, is now taking around 8 seconds. I think this is also running in the background. On Fri, May 29, 2015 at 10:52 AM, Modassar Ather modather1...@gmail.com wrote: The indexer takes almost 2 hours to optimize. It does a multi-threaded add of batches of documents to org.apache.solr.client.solrj.impl.CloudSolrClient. Once all the documents are indexed, it invokes commit and optimize. I have seen that the optimize goes into the background after 10 minutes and the indexer exits. I am not sure why the indexer hangs for these 10 minutes. I have seen this behavior in multiple iterations of indexing the same data. I found nothing significant in the log that I can share. I can see the following in the log: org.apache.solr.update.DirectUpdateHandler2; start commit{,optimize=true,openSearcher=true,waitSearcher=true,expungeDeletes=false,softCommit=false,prepareCommit=false} On Wed, May 27, 2015 at 10:59 PM, Erick Erickson erickerick...@gmail.com wrote: All strange of course.
Re: Index optimize runs in background.
The indexer takes almost 2 hours to optimize. It does a multi-threaded add of batches of documents to org.apache.solr.client.solrj.impl.CloudSolrClient. Once all the documents are indexed, it invokes commit and optimize. I have seen that the optimize goes into the background after 10 minutes and the indexer exits. I am not sure why the indexer hangs for these 10 minutes. I have seen this behavior in multiple iterations of indexing the same data. I found nothing significant in the log that I can share. I can see the following in the log: org.apache.solr.update.DirectUpdateHandler2; start commit{,optimize=true,openSearcher=true,waitSearcher=true,expungeDeletes=false,softCommit=false,prepareCommit=false} On Wed, May 27, 2015 at 10:59 PM, Erick Erickson erickerick...@gmail.com wrote: All strange of course. What do your Solr logs show when this happens? And how reproducible is this? Best, Erick On Wed, May 27, 2015 at 4:00 AM, Upayavira u...@odoko.co.uk wrote: In this case, optimising makes sense: once the index is generated, you are not updating it. Upayavira On Wed, May 27, 2015, at 06:14 AM, Modassar Ather wrote: Our index has almost 100M documents on a SolrCloud of 5 shards, and each shard has an index size of about 170+GB (for the record, we are not using stored fields - our documents are pretty large). We perform a full indexing every weekend, and during the week no updates are made to the index. Most of the queries that we run are pretty complex, with hundreds of terms using PhraseQuery, BooleanQuery, SpanQuery, wildcards, boosts etc., and take many minutes to execute. Even a difference of 10-20% is a big advantage for us. We have been optimizing the index after indexing for years and it has worked well for us. Every once in a while, we upgrade Solr to the latest version and try without optimizing so that we can save the many hours it takes to optimize such a huge index, but we find that the optimized index works well for us.
Re: Index optimize runs in background.
Are you timing out on the client request? The theory here is that it's still a synchronous call, but you're just timing out at the client level. At that point, the optimize is still running; it's just that the connection has been dropped. Shot in the dark. Erick On Thu, May 28, 2015 at 10:31 PM, Modassar Ather modather1...@gmail.com wrote: I had not noticed it before, but the commit, which used to take around 2 minutes, is now taking around 8 seconds. I think this is also running in the background. On Fri, May 29, 2015 at 10:52 AM, Modassar Ather modather1...@gmail.com wrote: The indexer takes almost 2 hours to optimize. It does a multi-threaded add of batches of documents to org.apache.solr.client.solrj.impl.CloudSolrClient. Once all the documents are indexed, it invokes commit and optimize. I have seen that the optimize goes into the background after 10 minutes and the indexer exits. I am not sure why the indexer hangs for these 10 minutes. I have seen this behavior in multiple iterations of indexing the same data. I found nothing significant in the log that I can share. I can see the following in the log: org.apache.solr.update.DirectUpdateHandler2; start commit{,optimize=true,openSearcher=true,waitSearcher=true,expungeDeletes=false,softCommit=false,prepareCommit=false} On Wed, May 27, 2015 at 10:59 PM, Erick Erickson erickerick...@gmail.com wrote: All strange of course. What do your Solr logs show when this happens? And how reproducible is this? Best, Erick On Wed, May 27, 2015 at 4:00 AM, Upayavira u...@odoko.co.uk wrote: In this case, optimising makes sense: once the index is generated, you are not updating it. Upayavira On Wed, May 27, 2015, at 06:14 AM, Modassar Ather wrote: Our index has almost 100M documents on a SolrCloud of 5 shards, and each shard has an index size of about 170+GB (for the record, we are not using stored fields - our documents are pretty large).
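To make the timeout theory above concrete, here is a rough sketch using only the JDK. It does not talk to Solr at all: a local stand-in HTTP server pretends to do a long merge, and a client with a short read timeout gives up early. Everything here (the class name, the /update path on the stand-in server, the timings) is an assumption for illustration. It only shows that a dropped client connection does not by itself stop the server-side work.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.InetSocketAddress;
import java.net.SocketTimeoutException;
import java.net.URI;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

class ClientTimeoutDemo {
    /**
     * Returns {clientTimedOut, serverWorkFinished} for a stand-in server whose handler
     * needs serverWorkMillis and a client whose read timeout is clientTimeoutMillis.
     */
    static boolean[] run(long serverWorkMillis, int clientTimeoutMillis) throws Exception {
        CountDownLatch serverDone = new CountDownLatch(1);
        ExecutorService pool = Executors.newCachedThreadPool();
        HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
        server.setExecutor(pool);
        server.createContext("/update", exchange -> {
            try {
                Thread.sleep(serverWorkMillis); // stands in for the long-running merge
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            serverDone.countDown(); // the work completed, whether or not the client waited
            try {
                exchange.sendResponseHeaders(200, -1);
            } catch (IOException ignored) {
                // the client may already have dropped the connection
            } finally {
                exchange.close();
            }
        });
        server.start();
        try {
            int port = server.getAddress().getPort();
            HttpURLConnection conn = (HttpURLConnection)
                    URI.create("http://127.0.0.1:" + port + "/update").toURL().openConnection();
            conn.setConnectTimeout(1000);
            conn.setReadTimeout(clientTimeoutMillis);
            boolean clientTimedOut = false;
            try {
                conn.getResponseCode(); // blocks until a response arrives or the read times out
            } catch (SocketTimeoutException e) {
                clientTimedOut = true;
            } finally {
                conn.disconnect();
            }
            boolean finished = serverDone.await(serverWorkMillis + 5000, TimeUnit.MILLISECONDS);
            return new boolean[] { clientTimedOut, finished };
        } finally {
            server.stop(0);
            pool.shutdownNow();
        }
    }

    public static void main(String[] args) throws Exception {
        boolean[] result = run(300, 50);
        System.out.println("client timed out: " + result[0] + ", server work finished: " + result[1]);
    }
}
```

Whether this is what actually happens between the indexer and Solr 5.1.0 is exactly what the thread has not been able to confirm, so treat it as an illustration of the hypothesis and not as a diagnosis.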
Re: Index optimize runs in background.
I had not noticed it before, but the commit, which used to take around 2 minutes, is now taking around 8 seconds. I think this is also running in the background. On Fri, May 29, 2015 at 10:52 AM, Modassar Ather modather1...@gmail.com wrote: The indexer takes almost 2 hours to optimize. It does a multi-threaded add of batches of documents to org.apache.solr.client.solrj.impl.CloudSolrClient. Once all the documents are indexed, it invokes commit and optimize. I have seen that the optimize goes into the background after 10 minutes and the indexer exits. I am not sure why the indexer hangs for these 10 minutes. I have seen this behavior in multiple iterations of indexing the same data. I found nothing significant in the log that I can share. I can see the following in the log: org.apache.solr.update.DirectUpdateHandler2; start commit{,optimize=true,openSearcher=true,waitSearcher=true,expungeDeletes=false,softCommit=false,prepareCommit=false} On Wed, May 27, 2015 at 10:59 PM, Erick Erickson erickerick...@gmail.com wrote: All strange of course. What do your Solr logs show when this happens? And how reproducible is this? Best, Erick On Wed, May 27, 2015 at 4:00 AM, Upayavira u...@odoko.co.uk wrote: In this case, optimising makes sense: once the index is generated, you are not updating it. Upayavira On Wed, May 27, 2015, at 06:14 AM, Modassar Ather wrote: Our index has almost 100M documents on a SolrCloud of 5 shards, and each shard has an index size of about 170+GB (for the record, we are not using stored fields - our documents are pretty large). We perform a full indexing every weekend, and during the week no updates are made to the index. Most of the queries that we run are pretty complex, with hundreds of terms using PhraseQuery, BooleanQuery, SpanQuery, wildcards, boosts etc., and take many minutes to execute. Even a difference of 10-20% is a big advantage for us. We have been optimizing the index after indexing for years and it has worked well for us.
Re: Index optimize runs in background.
In this case, optimising makes sense: once the index is generated, you are not updating it. Upayavira On Wed, May 27, 2015, at 06:14 AM, Modassar Ather wrote: Our index has almost 100M documents on a SolrCloud of 5 shards, and each shard has an index size of about 170+GB (for the record, we are not using stored fields - our documents are pretty large). We perform a full indexing every weekend, and during the week no updates are made to the index. Most of the queries that we run are pretty complex, with hundreds of terms using PhraseQuery, BooleanQuery, SpanQuery, wildcards, boosts etc., and take many minutes to execute. Even a difference of 10-20% is a big advantage for us. We have been optimizing the index after indexing for years and it has worked well for us. Every once in a while, we upgrade Solr to the latest version and try without optimizing so that we can save the many hours it takes to optimize such a huge index, but we find that the optimized index works well for us. Erick, I was indexing the documents today and saw the optimize happening in the background. On Tue, May 26, 2015 at 9:12 PM, Erick Erickson erickerick...@gmail.com wrote: No results yet. I finished the test harness last night (not really a unit test, a stand-alone program that endlessly adds stuff and tests that every commit returns the correct number of docs). 8,000 cycles later there aren't any problems reported. Siiigh. On Tue, May 26, 2015 at 1:51 AM, Modassar Ather modather1...@gmail.com wrote: Hi Erick, you mentioned a unit test for the optimize running in the background. Kindly share your findings, if any. Thanks, Modassar On Mon, May 25, 2015 at 11:47 AM, Modassar Ather modather1...@gmail.com wrote: Thanks everybody for your replies. I have noticed the optimization running in the background every time I indexed. This is a 5-node cluster with solr-5.1.0 and uses the CloudSolrClient. Kindly share your findings on this issue. Our index has almost 100M documents running on SolrCloud.
Re: Index optimize runs in background.
All strange of course. What do your Solr logs show when this happens? And how reproducible is this? Best, Erick On Wed, May 27, 2015 at 4:00 AM, Upayavira u...@odoko.co.uk wrote: In this case, optimising makes sense: once the index is generated, you are not updating it. Upayavira On Wed, May 27, 2015, at 06:14 AM, Modassar Ather wrote: Our index has almost 100M documents on a SolrCloud of 5 shards, and each shard has an index size of about 170+GB (for the record, we are not using stored fields - our documents are pretty large). We perform a full indexing every weekend, and during the week no updates are made to the index. Most of the queries that we run are pretty complex, with hundreds of terms using PhraseQuery, BooleanQuery, SpanQuery, wildcards, boosts etc., and take many minutes to execute. Even a difference of 10-20% is a big advantage for us. We have been optimizing the index after indexing for years and it has worked well for us. Every once in a while, we upgrade Solr to the latest version and try without optimizing so that we can save the many hours it takes to optimize such a huge index, but we find that the optimized index works well for us. Erick, I was indexing the documents today and saw the optimize happening in the background. On Tue, May 26, 2015 at 9:12 PM, Erick Erickson erickerick...@gmail.com wrote: No results yet. I finished the test harness last night (not really a unit test, a stand-alone program that endlessly adds stuff and tests that every commit returns the correct number of docs). 8,000 cycles later there aren't any problems reported. Siiigh. On Tue, May 26, 2015 at 1:51 AM, Modassar Ather modather1...@gmail.com wrote: Hi Erick, you mentioned a unit test for the optimize running in the background. Kindly share your findings, if any. Thanks, Modassar On Mon, May 25, 2015 at 11:47 AM, Modassar Ather modather1...@gmail.com wrote: Thanks everybody for your replies. I have noticed the optimization running in the background every time I indexed.
Re: Index optimize runs in background.
Our index has almost 100M documents on a SolrCloud of 5 shards, and each shard has an index size of about 170+GB (for the record, we are not using stored fields - our documents are pretty large). We perform a full indexing every weekend, and during the week no updates are made to the index. Most of the queries that we run are pretty complex, with hundreds of terms using PhraseQuery, BooleanQuery, SpanQuery, wildcards, boosts etc., and take many minutes to execute. Even a difference of 10-20% is a big advantage for us. We have been optimizing the index after indexing for years and it has worked well for us. Every once in a while, we upgrade Solr to the latest version and try without optimizing so that we can save the many hours it takes to optimize such a huge index, but we find that the optimized index works well for us. Erick, I was indexing the documents today and saw the optimize happening in the background. On Tue, May 26, 2015 at 9:12 PM, Erick Erickson erickerick...@gmail.com wrote: No results yet. I finished the test harness last night (not really a unit test, a stand-alone program that endlessly adds stuff and tests that every commit returns the correct number of docs). 8,000 cycles later there aren't any problems reported. Siiigh. On Tue, May 26, 2015 at 1:51 AM, Modassar Ather modather1...@gmail.com wrote: Hi Erick, you mentioned a unit test for the optimize running in the background. Kindly share your findings, if any. Thanks, Modassar On Mon, May 25, 2015 at 11:47 AM, Modassar Ather modather1...@gmail.com wrote: Thanks everybody for your replies. I have noticed the optimization running in the background every time I indexed. This is a 5-node cluster with solr-5.1.0 and uses the CloudSolrClient. Kindly share your findings on this issue. Our index has almost 100M documents running on SolrCloud. We have been optimizing the index after indexing for years and it has worked well for us.
Thanks, Modassar On Fri, May 22, 2015 at 11:55 PM, Erick Erickson erickerick...@gmail.com wrote: Actually, I've recently seen very similar behavior in Solr 4.10.3, but involving hard commits with openSearcher=true, see: https://issues.apache.org/jira/browse/SOLR-7572. Of course I can't reproduce this at will, sii. A unit test should be very simple to write though, maybe I can get to it today. Erick On Fri, May 22, 2015 at 8:27 AM, Upayavira u...@odoko.co.uk wrote: On Fri, May 22, 2015, at 03:55 PM, Shawn Heisey wrote: On 5/21/2015 6:21 AM, Modassar Ather wrote: I am using Solr-5.1.0. I have an indexer class which invokes cloudSolrClient.optimize(true, true, 1). My indexer exits after the invocation of optimize and the optimization keeps on running in the background. Kindly let me know if this is by design and how I can make my indexer wait until the optimization is over. Is there a configuration/parameter I need to set for this? Please note that the same indexer with cloudSolrServer.optimize(true, true, 1) on Solr-4.10 used to wait till the optimize was over before exiting. This is very odd, because I could not get HttpSolrServer to optimize in the background, even when that was what I wanted. I wondered if maybe the Cloud object behaves differently with regard to blocking until an optimize is finished ... except that there is no code for optimizing in CloudSolrClient at all ... so I don't know where the different behavior would actually be happening. A more important question is, why are you optimising? Generally it isn't recommended anymore as it reduces the natural distribution of documents amongst segments and makes future merges more costly. Upayavira
Re: Index optimize runs in background.
Hi Erick, you mentioned a unit test for the optimize running in the background. Kindly share your findings, if any. Thanks, Modassar On Mon, May 25, 2015 at 11:47 AM, Modassar Ather modather1...@gmail.com wrote: Thanks everybody for your replies. I have noticed the optimization running in background every time I indexed. This is 5 node cluster with solr-5.1.0 and uses the CloudSolrClient. Kindly share your findings on this issue. Our index has almost 100M documents running on SolrCloud. We have been optimizing the index after indexing for years and it has worked well for us. Thanks, Modassar On Fri, May 22, 2015 at 11:55 PM, Erick Erickson erickerick...@gmail.com wrote: Actually, I've recently seen very similar behavior in Solr 4.10.3, but involving hard commits openSearcher=true, see: https://issues.apache.org/jira/browse/SOLR-7572. Of course I can't reproduce this at will, sigh. A unit test should be very simple to write though, maybe I can get to it today. Erick On Fri, May 22, 2015 at 8:27 AM, Upayavira u...@odoko.co.uk wrote: On Fri, May 22, 2015, at 03:55 PM, Shawn Heisey wrote: On 5/21/2015 6:21 AM, Modassar Ather wrote: I am using Solr-5.1.0. I have an indexer class which invokes cloudSolrClient.optimize(true, true, 1). My indexer exits after the invocation of optimize and the optimization keeps on running in the background. Kindly let me know if it is per design and how can I make my indexer to wait until the optimization is over. Is there a configuration/parameter I need to set for the same. Please note that the same indexer with cloudSolrServer.optimize(true, true, 1) on Solr-4.10 used to wait till the optimize was over before exiting. This is very odd, because I could not get HttpSolrServer to optimize in the background, even when that was what I wanted. I wondered if maybe the Cloud object behaves differently with regard to blocking until an optimize is finished ... except that there is no code for optimizing in CloudSolrClient at all ...
so I don't know where the different behavior would actually be happening. A more important question is, why are you optimising? Generally it isn't recommended anymore as it reduces the natural distribution of documents amongst segments and makes future merges more costly. Upayavira
Re: Index optimize runs in background.
Modassar, Are you saying that the reason you are optimising is because you have been doing it for years? If this is the only reason, you should stop doing it immediately. The one scenario in which optimisation still makes some sense is when you reindex every night and optimise straight after. This will leave you with a single segment which will search faster. However, if you are doing a lot of indexing, especially with deletes/updates, you will have merged your content into a single segment which will later need to be merged. That merge will be costly as it will involve copying the entire content of your large segment, which will impact performance. Before Solr 3.6, Optimisation was necessary and recommended. At that point (or a little before) the TieredMergePolicy became the default, and this made optimisation generally unnecessary. Upayavira On Mon, May 25, 2015, at 07:17 AM, Modassar Ather wrote: Thanks everybody for your replies. I have noticed the optimization running in background every time I indexed. This is 5 node cluster with solr-5.1.0 and uses the CloudSolrClient. Kindly share your findings on this issue. Our index has almost 100M documents running on SolrCloud. We have been optimizing the index after indexing for years and it has worked well for us. Thanks, Modassar On Fri, May 22, 2015 at 11:55 PM, Erick Erickson erickerick...@gmail.com wrote: Actually, I've recently seen very similar behavior in Solr 4.10.3, but involving hard commits openSearcher=true, see: https://issues.apache.org/jira/browse/SOLR-7572. Of course I can't reproduce this at will, sii. A unit test should be very simple to write though, maybe I can get to it today. Erick On Fri, May 22, 2015 at 8:27 AM, Upayavira u...@odoko.co.uk wrote: On Fri, May 22, 2015, at 03:55 PM, Shawn Heisey wrote: On 5/21/2015 6:21 AM, Modassar Ather wrote: I am using Solr-5.1.0. I have an indexer class which invokes cloudSolrClient.optimize(true, true, 1). 
My indexer exits after the invocation of optimize and the optimization keeps on running in the background. Kindly let me know if it is per design and how can I make my indexer to wait until the optimization is over. Is there a configuration/parameter I need to set for the same. Please note that the same indexer with cloudSolrServer.optimize(true, true, 1) on Solr-4.10 used to wait till the optimize was over before exiting. This is very odd, because I could not get HttpSolrServer to optimize in the background, even when that was what I wanted. I wondered if maybe the Cloud object behaves differently with regard to blocking until an optimize is finished ... except that there is no code for optimizing in CloudSolrClient at all ... so I don't know where the different behavior would actually be happening. A more important question is, why are you optimising? Generally it isn't recommended anymore as it reduces the natural distribution of documents amongst segments and makes future merges more costly. Upayavira
Re: Index optimize runs in background.
No results yet. I finished the test harness last night (not really a unit test, a stand-alone program that endlessly adds stuff and tests that every commit returns the correct number of docs). 8,000 cycles later there aren't any problems reported. Siiigh. On Tue, May 26, 2015 at 1:51 AM, Modassar Ather modather1...@gmail.com wrote: Hi, Erick you mentioned about a unit test to test the optimize running in background. Kindly share your findings if any. Thanks, Modassar On Mon, May 25, 2015 at 11:47 AM, Modassar Ather modather1...@gmail.com wrote: Thanks everybody for your replies. I have noticed the optimization running in background every time I indexed. This is 5 node cluster with solr-5.1.0 and uses the CloudSolrClient. Kindly share your findings on this issue. Our index has almost 100M documents running on SolrCloud. We have been optimizing the index after indexing for years and it has worked well for us. Thanks, Modassar On Fri, May 22, 2015 at 11:55 PM, Erick Erickson erickerick...@gmail.com wrote: Actually, I've recently seen very similar behavior in Solr 4.10.3, but involving hard commits openSearcher=true, see: https://issues.apache.org/jira/browse/SOLR-7572. Of course I can't reproduce this at will, sii. A unit test should be very simple to write though, maybe I can get to it today. Erick On Fri, May 22, 2015 at 8:27 AM, Upayavira u...@odoko.co.uk wrote: On Fri, May 22, 2015, at 03:55 PM, Shawn Heisey wrote: On 5/21/2015 6:21 AM, Modassar Ather wrote: I am using Solr-5.1.0. I have an indexer class which invokes cloudSolrClient.optimize(true, true, 1). My indexer exits after the invocation of optimize and the optimization keeps on running in the background. Kindly let me know if it is per design and how can I make my indexer to wait until the optimization is over. Is there a configuration/parameter I need to set for the same. 
Please note that the same indexer with cloudSolrServer.optimize(true, true, 1) on Solr-4.10 used to wait till the optimize was over before exiting. This is very odd, because I could not get HttpSolrServer to optimize in the background, even when that was what I wanted. I wondered if maybe the Cloud object behaves differently with regard to blocking until an optimize is finished ... except that there is no code for optimizing in CloudSolrClient at all ... so I don't know where the different behavior would actually be happening. A more important question is, why are you optimising? Generally it isn't recommended anymore as it reduces the natural distribution of documents amongst segments and makes future merges more costly. Upayavira
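The cycle Erick describes (add documents, commit, check that the count is what it should be) can be sketched roughly as below. This is only an illustration and not his harness: `DocStore`, `InMemoryStore` and `CommitHarness` are hypothetical names, and the in-memory store is a stand-in so the loop runs without a Solr cluster. A real harness would implement the same three operations with SolrJ calls against the collection under test.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical harness: repeats add -> commit -> verify and reports the first bad cycle. */
final class CommitHarness {
    /** Returns the index of the first cycle whose count is wrong, or -1 if every cycle passed. */
    static int run(DocStore store, int cycles, int docsPerCycle) {
        long expected = store.visibleCount();
        for (int cycle = 0; cycle < cycles; cycle++) {
            for (int i = 0; i < docsPerCycle; i++) {
                store.add("doc-" + cycle + "-" + i);
            }
            store.commit();
            expected += docsPerCycle;
            // A commit that returned early would leave the visible count behind.
            if (store.visibleCount() != expected) {
                return cycle;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        System.out.println("first failing cycle: " + run(new InMemoryStore(), 8000, 10));
    }
}

/** Hypothetical abstraction over the index; assumed here, not part of SolrJ. */
interface DocStore {
    void add(String doc);
    void commit();        // makes pending documents visible
    long visibleCount();  // number of documents a query would see
}

/** In-memory stand-in whose commit is always synchronous. */
final class InMemoryStore implements DocStore {
    private final List<String> pending = new ArrayList<>();
    private long visible = 0;

    public void add(String doc) { pending.add(doc); }
    public void commit() { visible += pending.size(); pending.clear(); }
    public long visibleCount() { return visible; }
}
```

Because the stand-in commits synchronously it never fails; the point of the loop is that a store whose commit returns before the documents are visible would be caught in the first cycle where that happens.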
Re: Index optimize runs in background.
On 5/26/2015 6:29 AM, Upayavira wrote: Are you saying that the reason you are optimising is because you have been doing it for years? If this is the only reason, you should stop doing it immediately. The one scenario in which optimisation still makes some sense is when you reindex every night and optimise straight after. This will leave you with a single segment which will search faster. However, if you are doing a lot of indexing, especially with deletes/updates, you will have merged your content into a single segment which will later need to be merged. That merge will be costly as it will involve copying the entire content of your large segment, which will impact performance. Before Solr 3.6, Optimisation was necessary and recommended. At that point (or a little before) the TieredMergePolicy became the default, and this made optimisation generally unnecessary.

In general, I concur with this advice about optimizing.

Historically, optimize was done for increased performance. In older versions, an unoptimized index performed *MUCH* worse than an index with a single segment. This is no longer the case today, mostly due to so many Lucene features working on a per-segment basis. A single segment does perform faster, but the difference is much smaller than it used to be. A full optimize on a large index requires a LOT of CPU and I/O resources -- while the optimize is underway, performance is not very good.

There are, however, still times when running optimize is appropriate:

1) The index is mostly static, not receiving very frequent updates.
2) There is a large percentage of deleted documents in the index.

With modern Lucene/Solr and these use cases, the reasons for optimizing are still performance-related, but the only time you should do an optimize is when the benefit outweighs the cost.

For the 1) use case, the index will likely remain mostly-optimized for a long period of time after the optimize is done, so the resources required for the optimize are worth spending.

For the 2) use case, optimizing will reduce the size of the index significantly, so general performance gets better. That makes the cost worthwhile.

Thanks, Shawn
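Shawn's two cases can be turned into a rough pre-check before an indexer decides to optimize. The sketch below is only an illustration: `ForceMergeAdvisor` is a hypothetical helper, and the 20% threshold is an arbitrary assumption rather than a Solr default or a number from this thread. It relies on the fact that the number of deleted documents in an index is `maxDoc - numDocs`.

```java
/** Hypothetical helper; the threshold below is an illustrative assumption, not a Solr setting. */
final class ForceMergeAdvisor {
    /** Assumed cut-off: treat 20% or more deleted documents as "a large percentage". */
    static final double DELETED_RATIO_THRESHOLD = 0.20;

    /** Fraction of the index occupied by deleted documents (deleted = maxDoc - numDocs). */
    static double deletedRatio(long maxDoc, long numDocs) {
        if (maxDoc <= 0) {
            return 0.0; // empty index: nothing to reclaim
        }
        return (double) (maxDoc - numDocs) / maxDoc;
    }

    /** True for case 1) a mostly static index, or case 2) many deleted documents. */
    static boolean worthOptimizing(long maxDoc, long numDocs, boolean mostlyStatic) {
        return mostlyStatic || deletedRatio(maxDoc, numDocs) >= DELETED_RATIO_THRESHOLD;
    }

    public static void main(String[] args) {
        // Frequently updated index with 30% deleted documents: case 2) applies.
        System.out.println(worthOptimizing(100_000_000L, 70_000_000L, false));
        // Frequently updated index with 5% deleted documents: neither case applies.
        System.out.println(worthOptimizing(100_000_000L, 95_000_000L, false));
    }
}
```

The check only says when an optimize might pay off; whether the benefit really outweighs the CPU and I/O cost still has to be measured on the actual index.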
Re: Index optimize runs in background.
I completely agree with Upayavira and Shawn. Modassar, can you tell us how often you index? Have you ever experimented with the mergeFactor? I hardly think you need to optimise at all; simply tuning the mergeFactor should solve all your issues. I assume you were optimising only to get faster searches, weren't you? Cheers

2015-05-26 16:07 GMT+01:00 Shawn Heisey apa...@elyograg.org: On 5/26/2015 6:29 AM, Upayavira wrote: Are you saying that the reason you are optimising is because you have been doing it for years? If this is the only reason, you should stop doing it immediately. The one scenario in which optimisation still makes some sense is when you reindex every night and optimise straight after. This will leave you with a single segment which will search faster. However, if you are doing a lot of indexing, especially with deletes/updates, you will have merged your content into a single segment which will later need to be merged. That merge will be costly as it will involve copying the entire content of your large segment, which will impact performance. Before Solr 3.6, Optimisation was necessary and recommended. At that point (or a little before) the TieredMergePolicy became the default, and this made optimisation generally unnecessary. In general, I concur with this advice about optimizing. Historically, optimize was done for increased performance. In older versions, an unoptimized index performed *MUCH* worse than an index with a single segment. This is no longer the case today, mostly due to so many Lucene features working on a per-segment basis. A single segment does perform faster, but the difference is much smaller than it used to be. A full optimize on a large index requires a LOT of CPU and I/O resources -- while the optimize is underway, performance is not very good. There are, however, still times when running optimize is appropriate: 1) The index is mostly static, not receiving very frequent updates.
2) There is a large percentage of deleted documents in the index. With modern Lucene/Solr and these use cases, the reasons for optimizing are still performance-related, but the only time you should do an optimize is when the benefit outweighs the cost. For the 1) use case, the index will likely remain mostly-optimized for a long period of time after the optimize is done, so the resources required for the optimize are worth spending. For the 2) use case, optimizing will reduce the size of the index significantly, so general performance gets better. That makes the cost worthwhile. Thanks, Shawn -- -- Benedetti Alessandro Visiting card : http://about.me/alessandro_benedetti Tyger, tyger burning bright In the forests of the night, What immortal hand or eye Could frame thy fearful symmetry? William Blake - Songs of Experience -1794 England
Re: Index optimize runs in background.
Thanks everybody for your replies. I have noticed the optimization running in background every time I indexed. This is 5 node cluster with solr-5.1.0 and uses the CloudSolrClient. Kindly share your findings on this issue. Our index has almost 100M documents running on SolrCloud. We have been optimizing the index after indexing for years and it has worked well for us. Thanks, Modassar On Fri, May 22, 2015 at 11:55 PM, Erick Erickson erickerick...@gmail.com wrote: Actually, I've recently seen very similar behavior in Solr 4.10.3, but involving hard commits openSearcher=true, see: https://issues.apache.org/jira/browse/SOLR-7572. Of course I can't reproduce this at will, sii. A unit test should be very simple to write though, maybe I can get to it today. Erick On Fri, May 22, 2015 at 8:27 AM, Upayavira u...@odoko.co.uk wrote: On Fri, May 22, 2015, at 03:55 PM, Shawn Heisey wrote: On 5/21/2015 6:21 AM, Modassar Ather wrote: I am using Solr-5.1.0. I have an indexer class which invokes cloudSolrClient.optimize(true, true, 1). My indexer exits after the invocation of optimize and the optimization keeps on running in the background. Kindly let me know if it is per design and how can I make my indexer to wait until the optimization is over. Is there a configuration/parameter I need to set for the same. Please note that the same indexer with cloudSolrServer.optimize(true, true, 1) on Solr-4.10 used to wait till the optimize was over before exiting. This is very odd, because I could not get HttpSolrServer to optimize in the background, even when that was what I wanted. I wondered if maybe the Cloud object behaves differently with regard to blocking until an optimize is finished ... except that there is no code for optimizing in CloudSolrClient at all ... so I don't know where the different behavior would actually be happening. A more important question is, why are you optimising? 
Generally it isn't recommended anymore as it reduces the natural distribution of documents amongst segments and makes future merges more costly. Upayavira
Re: Index optimize runs in background.
On Fri, May 22, 2015, at 03:55 PM, Shawn Heisey wrote: On 5/21/2015 6:21 AM, Modassar Ather wrote: I am using Solr-5.1.0. I have an indexer class which invokes cloudSolrClient.optimize(true, true, 1). My indexer exits after the invocation of optimize and the optimization keeps on running in the background. Kindly let me know if it is per design and how can I make my indexer to wait until the optimization is over. Is there a configuration/parameter I need to set for the same. Please note that the same indexer with cloudSolrServer.optimize(true, true, 1) on Solr-4.10 used to wait till the optimize was over before exiting. This is very odd, because I could not get HttpSolrServer to optimize in the background, even when that was what I wanted. I wondered if maybe the Cloud object behaves differently with regard to blocking until an optimize is finished ... except that there is no code for optimizing in CloudSolrClient at all ... so I don't know where the different behavior would actually be happening. A more important question is, why are you optimising? Generally it isn't recommended anymore as it reduces the natural distribution of documents amongst segments and makes future merges more costly. Upayavira
Re: Index optimize runs in background.
On 5/21/2015 6:21 AM, Modassar Ather wrote: I am using Solr-5.1.0. I have an indexer class which invokes cloudSolrClient.optimize(true, true, 1). My indexer exits after the invocation of optimize and the optimization keeps on running in the background. Kindly let me know if it is per design and how can I make my indexer to wait until the optimization is over. Is there a configuration/parameter I need to set for the same. Please note that the same indexer with cloudSolrServer.optimize(true, true, 1) on Solr-4.10 used to wait till the optimize was over before exiting. This is very odd, because I could not get HttpSolrServer to optimize in the background, even when that was what I wanted. I wondered if maybe the Cloud object behaves differently with regard to blocking until an optimize is finished ... except that there is no code for optimizing in CloudSolrClient at all ... so I don't know where the different behavior would actually be happening. Thanks, Shawn
Re: Index optimize runs in background.
Actually, I've recently seen very similar behavior in Solr 4.10.3, but involving hard commits with openSearcher=true, see: https://issues.apache.org/jira/browse/SOLR-7572. Of course I can't reproduce this at will, sigh. A unit test should be very simple to write though, maybe I can get to it today. Erick On Fri, May 22, 2015 at 8:27 AM, Upayavira u...@odoko.co.uk wrote: On Fri, May 22, 2015, at 03:55 PM, Shawn Heisey wrote: On 5/21/2015 6:21 AM, Modassar Ather wrote: I am using Solr-5.1.0. I have an indexer class which invokes cloudSolrClient.optimize(true, true, 1). My indexer exits after the invocation of optimize and the optimization keeps on running in the background. Kindly let me know if it is per design and how can I make my indexer to wait until the optimization is over. Is there a configuration/parameter I need to set for the same. Please note that the same indexer with cloudSolrServer.optimize(true, true, 1) on Solr-4.10 used to wait till the optimize was over before exiting. This is very odd, because I could not get HttpSolrServer to optimize in the background, even when that was what I wanted. I wondered if maybe the Cloud object behaves differently with regard to blocking until an optimize is finished ... except that there is no code for optimizing in CloudSolrClient at all ... so I don't know where the different behavior would actually be happening. A more important question is, why are you optimising? Generally it isn't recommended anymore as it reduces the natural distribution of documents amongst segments and makes future merges more costly. Upayavira
Index optimize runs in background.
Hi, I am using Solr-5.1.0. I have an indexer class which invokes cloudSolrClient.optimize(true, true, 1). My indexer exits after the invocation of optimize, and the optimization keeps running in the background. Kindly let me know whether this is by design and how I can make my indexer wait until the optimization is over. Is there a configuration/parameter I need to set for this? Please note that the same indexer with cloudSolrServer.optimize(true, true, 1) on Solr-4.10 used to wait until the optimize was over before exiting. Thanks, Modassar
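For reference, the three arguments of `optimize(true, true, 1)` in SolrJ are `waitFlush`, `waitSearcher` and `maxSegments`. The same request can be sent to the update handler over plain HTTP, which makes it easy to check whether the call blocks when no client-side timeout is involved. The sketch below is an illustration only: `BlockingOptimize`, the host `localhost:8983` and the collection name `collection1` are assumptions, and it does not explain the behaviour reported above. It simply disables the read timeout so that the client itself cannot be the side that gives up first.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URI;

/** Hypothetical helper that sends an optimize request and blocks until Solr answers. */
final class BlockingOptimize {
    /** Builds the update-handler URL for an optimize down to maxSegments segments. */
    static String optimizeUrl(String baseUrl, String collection, int maxSegments) {
        return baseUrl + "/" + collection
                + "/update?optimize=true&waitSearcher=true&maxSegments=" + maxSegments;
    }

    /** Sends the request and waits for the response; returns the HTTP status code. */
    static int optimizeAndWait(String url) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) URI.create(url).toURL().openConnection();
        conn.setConnectTimeout(10_000); // fail fast if the node is unreachable
        conn.setReadTimeout(0);         // 0 means wait indefinitely for the response
        try {
            return conn.getResponseCode();
        } finally {
            conn.disconnect();
        }
    }

    public static void main(String[] args) {
        // Only prints the URL; call optimizeAndWait(...) against a real node to send it.
        System.out.println(optimizeUrl("http://localhost:8983/solr", "collection1", 1));
    }
}
```

If this request returns only after the merge has finished while the SolrJ call returns early, the difference lies somewhere on the client path; any proxy or load balancer in between may also have its own idle timeout.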
Re: Index optimize runs in background.
Hi, any insight on this question would be really helpful. Thanks, Modassar On Thu, May 21, 2015 at 5:51 PM, Modassar Ather modather1...@gmail.com wrote: Hi, I am using Solr-5.1.0. I have an indexer class which invokes cloudSolrClient.optimize(true, true, 1). My indexer exits after the invocation of optimize and the optimization keeps on running in the background. Kindly let me know if it is per design and how can I make my indexer to wait until the optimization is over. Is there a configuration/parameter I need to set for the same. Please note that the same indexer with cloudSolrServer.optimize(true, true, 1) on Solr-4.10 used to wait till the optimize was over before exiting. Thanks, Modassar