Re: Index optimize runs in background.

2015-06-11 Thread Walter Underwood

Why would you care when the forced merge (not an “optimize”) is done? Start it
and get back to work.

Or even better, never force merge and let the algorithm take care of it.
Seriously, I’ve been giving this advice since before Lucene was written,
because Ultraseek had the same approach for managing index segments.

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/ (my blog)

On Jun 10, 2015, at 10:35 PM, Erick Erickson erickerick...@gmail.com wrote:

If I knew, I would fix it ;). The sub-optimizes (i.e. the ones
sent out to each replica) should be sent in parallel and then
each thread should wait for completion from the replicas. There
is no real check for optimize; I believe that the return from the
call is considered sufficient. If we can track down conditions
under which this is not true, we can fix it.

But until there's a way to reproduce it, it's pretty much speculation.

Best,
Erick
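
The fan-out described above (send the sub-optimizes in parallel, then wait for every replica to finish) can be sketched with plain JDK concurrency. This is only an illustration of the pattern, not Solr's actual code: `ParallelOptimizeSketch`, `optimizeAll` and `fakeOptimize` are hypothetical names, and `fakeOptimize` merely sleeps where a real per-replica request would go.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final class ParallelOptimizeSketch {

    /** Starts one task per replica in parallel and blocks until all of them have finished. */
    static List<String> optimizeAll(List<String> replicas) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, replicas.size()));
        try {
            List<Callable<String>> tasks = new ArrayList<>();
            for (String replica : replicas) {
                tasks.add(() -> fakeOptimize(replica));
            }
            List<String> results = new ArrayList<>();
            // invokeAll returns only after every task has completed (or failed).
            for (Future<String> done : pool.invokeAll(tasks)) {
                results.add(done.get()); // rethrows a replica failure, if any
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }

    /** Hypothetical stand-in for the per-replica optimize request; it only sleeps. */
    static String fakeOptimize(String replica) throws InterruptedException {
        Thread.sleep(50);
        return replica + ":done";
    }

    public static void main(String[] args) throws Exception {
        System.out.println(optimizeAll(Arrays.asList("replica1", "replica2", "replica3")));
    }
}
```

If the caller returns before the loop over the futures finishes, the work is effectively "in the background" from the caller's point of view, which is the condition the thread is trying to track down.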

On Wed, Jun 10, 2015 at 10:14 PM, Modassar Ather modather1...@gmail.com wrote:
Hi,

There are 5 cores and a separate server for indexing on this SolrCloud. Can
you please share your suggestions on the following:
How can the indexer know that the optimize has completed, even if the
commit/optimize runs in the background, without going to the Solr servers,
perhaps by using SolrJ or some other API?

I tried but could not find any API/handler to check whether the optimization
has completed. Kindly share your inputs.

Thanks,
Modassar
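
Since the thread reports no dedicated "is the optimize done?" call, one possible workaround is to poll until the index reaches the expected segment count. The sketch below shows only the polling loop and is not something proposed in the thread. `MergeWaitSketch` and `waitForSegments` are hypothetical names, and the `segmentCount` supplier is a placeholder: where a real indexer would read that number from (for example, the index information reported by Solr's admin handlers) is an assumption to verify against your Solr version.

```java
import java.util.function.IntSupplier;

final class MergeWaitSketch {

    /**
     * Polls segmentCount until it is at or below target, or the timeout expires.
     * Returns true if the target was reached, false on timeout.
     */
    static boolean waitForSegments(IntSupplier segmentCount, int target,
                                   long pollMillis, long timeoutMillis)
            throws InterruptedException {
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        while (true) {
            if (segmentCount.getAsInt() <= target) {
                return true;
            }
            if (System.nanoTime() >= deadline) {
                return false;
            }
            Thread.sleep(pollMillis);
        }
    }

    public static void main(String[] args) throws InterruptedException {
        // Fake reader for demonstration: pretends the index shrinks by two segments per poll.
        int[] segments = {9};
        boolean merged = waitForSegments(() -> {
            int current = segments[0];
            segments[0] = Math.max(1, current - 2);
            return current;
        }, 1, 10, 5_000);
        System.out.println("merged=" + merged);
    }
}
```

On a multi-shard collection the check would have to hold for every core, since each one merges independently.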

Re: Index optimize runs in background.

2015-06-11 Thread Upayavira

Until somewhere around Lucene 3.5, you needed to optimise, because the
merge strategy used wasn't that clever and left lots of deletes in your
largest segment. Around that point, the TieredMergePolicy became the
default. Because its algorithm is much more sophisticated, it took away
the need to optimize in the majority of scenarios. In fact, it
transformed optimizing from being a necessary thing to being a bad
thing in most cases.

So yes, let the algorithm take care of it, so long as you are using the
TieredMergePolicy, which has been the default for over 2 years.

Upayavira

Re: Index optimize runs in background.

2015-06-10 Thread Erick Erickson

But until there's a way to reproduce it, it's pretty much speculation.

Best,
Erick

Re: Index optimize runs in background.

2015-06-10 Thread Modassar Ather

Hi,

I tried but could not find any API/handler to check whether the optimization
has completed. Kindly share your inputs.

Thanks,
Modassar

Re: Index optimize runs in background.

2015-06-04 Thread Erick Erickson

Can't get any failures to happen on my end so I really haven't a clue.

Best,
Erick

Re: Index optimize runs in background.

2015-06-04 Thread Modassar Ather

Hi,

Please provide your inputs on optimize and commit running in the background.
Your suggestions will be really helpful.

Thanks,
Modassar

Re: Index optimize runs in background.

2015-06-02 Thread Modassar Ather

Erick! I could not find any underlying setting of 10 minutes.
It is not only optimize: commit is also behaving in the same fashion and
is taking less time than it usually did.
As per my observation, both are running in the background.

Re: Index optimize runs in background.

2015-05-29 Thread Modassar Ather

I have not added any timeout in the indexer except the zk client timeout, which
is 30 seconds. I am simply calling client.close() at the end of indexing.
With solr-4.10.3 and org.apache.solr.client.solrj.impl.CloudSolrServer, the
same code did not run the optimize in the background.

On Fri, May 29, 2015 at 11:13 AM, Erick Erickson erickerick...@gmail.com
wrote:

Are you timing out on the client request? The theory here is that it's
still a synchronous call, but you're just timing out at the client
level. At that point, the optimize is still running; it's just that the
connection has been dropped.

Shot in the dark.
Erick
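
The theory above can be illustrated without Solr at all: a caller that stops waiting does not stop the work it requested. In this minimal JDK-only sketch, `ClientTimeoutSketch` and `run` are hypothetical names, the sleeping task stands in for the long optimize, and the timed `Future.get` stands in for the client's connection timing out.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.concurrent.atomic.AtomicBoolean;

final class ClientTimeoutSketch {

    /** Returns {clientTimedOut, workFinishedAnyway}. */
    static boolean[] run(long workMillis, long clientTimeoutMillis) throws Exception {
        ExecutorService server = Executors.newSingleThreadExecutor();
        AtomicBoolean finished = new AtomicBoolean(false);
        try {
            // Stand-in for the synchronous, long-running optimize on the server side.
            Future<?> call = server.submit(() -> {
                try {
                    Thread.sleep(workMillis);
                    finished.set(true);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });

            boolean timedOut = false;
            try {
                call.get(clientTimeoutMillis, TimeUnit.MILLISECONDS);
            } catch (TimeoutException e) {
                timedOut = true; // the caller gives up waiting; the task is not cancelled
            }

            // Let the "server" finish so the outcome of the work can be observed.
            server.shutdown();
            server.awaitTermination(workMillis * 10 + 1_000, TimeUnit.MILLISECONDS);
            return new boolean[] {timedOut, finished.get()};
        } finally {
            server.shutdownNow();
        }
    }

    public static void main(String[] args) throws Exception {
        boolean[] outcome = run(300, 30);
        System.out.println("client timed out: " + outcome[0] + ", work finished: " + outcome[1]);
    }
}
```

From the caller's side this looks exactly like the operation "going into the background": the call returns early, yet the work completes later.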

On Thu, May 28, 2015 at 10:31 PM, Modassar Ather modather1...@gmail.com wrote:
I did not notice it, but a commit, which in my past experience took around
2 minutes, is now taking around 8 seconds. I think this is also running in
the background.

On Fri, May 29, 2015 at 10:52 AM, Modassar Ather modather1...@gmail.com wrote:

The indexer takes almost 2 hours to optimize. It does a multi-threaded add
of batches of documents to org.apache.solr.client.solrj.impl.CloudSolrClient.
Once all the documents are indexed, it invokes commit and optimize. I have
seen that the optimize goes into the background after 10 minutes and the
indexer exits.
I am not sure why the indexer hangs for these 10 minutes. I have seen this
behavior in multiple iterations of indexing the same data.

I found nothing significant in the log that I can share. I can see the
following in the log:
org.apache.solr.update.DirectUpdateHandler2; start commit{,optimize=true,openSearcher=true,waitSearcher=true,expungeDeletes=false,softCommit=false,prepareCommit=false}

On Wed, May 27, 2015 at 10:59 PM, Erick Erickson
erickerick...@gmail.com
wrote:

All strange of course. What do your Solr logs show when this happens?
And how reproducible is this?

Best,
Erick

On Wed, May 27, 2015 at 4:00 AM, Upayavira u...@odoko.co.uk wrote:
In this case, optimising makes sense: once the index is generated, you
are not updating it.

Upayavira

On Wed, May 27, 2015, at 06:14 AM, Modassar Ather wrote:
Our index has almost 100M documents running on SolrCloud of 5 shards, and
each shard has an index size of about 170+GB (for the record, we are not
using stored fields - our documents are pretty large). We perform a full
indexing every weekend and during the week there are no updates made to the
index. Most of the queries that we run are pretty complex, with hundreds of
terms using PhraseQuery, BooleanQuery, SpanQuery, Wildcards, boosts etc.,
and take many minutes to execute. A difference of 10-20% is also a big
advantage for us.

We have been optimizing the index after indexing for years and it has
worked well for us. Every once in a while, we upgrade Solr to the latest
version and try without optimizing so that we can save the many hours it
takes to optimize such a huge index, but we find that the optimized index
works well for us.

Erick, I was indexing the documents today and saw the optimize happening in
the background.

On Tue, May 26, 2015 at 9:12 PM, Erick Erickson erickerick...@gmail.com wrote:

No results yet. I finished the test harness last night (not really a
unit test, a stand-alone program that endlessly adds stuff and tests
that every commit returns the correct number of docs).

8,000 cycles later there aren't any problems reported.

Siiigh.
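
The shape of such a harness (add a batch, commit, check that the visible document count matches what was added, repeat) might look like the sketch below. It is not Erick's program. The `Index` interface and `InMemoryIndex` are hypothetical stand-ins so the loop can run without a Solr cluster; a real harness would put a SolrJ-backed implementation behind the same three operations.

```java
import java.util.Random;

final class CommitHarnessSketch {

    /** Hypothetical stand-in for a search index; not a Solr or SolrJ type. */
    interface Index {
        void add(int docs);
        void commit();
        long visibleDocs();
    }

    /** Trivial in-memory implementation: documents become visible only on commit. */
    static final class InMemoryIndex implements Index {
        private long pending;
        private long committed;

        public void add(int docs) { pending += docs; }
        public void commit() { committed += pending; pending = 0; }
        public long visibleDocs() { return committed; }
    }

    /** Returns the first cycle whose commit exposed the wrong count, or -1 if none did. */
    static int runCycles(Index index, int cycles, long seed) {
        Random random = new Random(seed);
        long expected = index.visibleDocs();
        for (int cycle = 0; cycle < cycles; cycle++) {
            int batch = 1 + random.nextInt(100);
            index.add(batch);
            index.commit(); // must not return before the batch is searchable
            expected += batch;
            if (index.visibleDocs() != expected) {
                return cycle;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        System.out.println("first failing cycle: " + runCycles(new InMemoryIndex(), 8000, 42L));
    }
}
```

A commit that returned before its documents were visible would show up as a non-negative cycle number.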

On Tue, May 26, 2015 at 1:51 AM, Modassar Ather modather1...@gmail.com wrote:
Hi,

Erick, you mentioned a unit test to test the optimize running in the
background. Kindly share your findings, if any.

Thanks,
Modassar

On Mon, May 25, 2015 at 11:47 AM, Modassar Ather modather1...@gmail.com wrote:

Thanks everybody for your replies.

I have noticed the optimization running in the background every time I
indexed. This is a 5-node cluster with solr-5.1.0 and uses the
CloudSolrClient. Kindly share your findings on this issue.

Our index has almost 100M documents running on SolrCloud. We have been
optimizing the index after indexing for years and it has worked well for
us.

Thanks,
Modassar

On Fri, May 22, 2015 at 11:55 PM, Erick Erickson erickerick...@gmail.com wrote:

Actually, I've recently seen very similar behavior in Solr 4.10.3, but
involving hard commits with openSearcher=true, see:
https://issues.apache.org/jira/browse/SOLR-7572. Of course I can't
reproduce this at will, sigh.

A unit test should be very simple to write though, maybe I can get to it
today.

Erick

On Fri, May 22, 2015 at 8:27 AM, Upayavira u...@odoko.co.uk
wrote:

On Fri, May 22, 2015, at 03:55 PM, Shawn

Re: Index optimize runs in background.

2015-05-29 Thread Erick Erickson

I'm not talking about you setting a timeout, but the underlying
connection timing out...

The "10 minutes, then the indexer exits" comment points in that direction.

Best,
Erick

On Thu, May 28, 2015 at 11:43 PM, Modassar Ather modather1...@gmail.com wrote:
I have not added any timeout in the indexer except zk client time out which
is 30 seconds. I am simply calling client.close() at the end of indexing.
The same code was not running in background for optimize with solr-4.10.3
and org.apache.solr.client.solrj.impl.CloudSolrServer.

Re: Index optimize runs in background.

2015-05-28 Thread Modassar Ather

The indexer takes almost 2 hours to optimize. It adds batches of documents
from multiple threads through
org.apache.solr.client.solrj.impl.CloudSolrClient.
Once all the documents are indexed, it invokes commit and optimize. I have
seen that the optimize goes into the background after 10 minutes and the
indexer exits.
I am not sure why the indexer hangs for those 10 minutes. I have seen this
behavior in multiple iterations of indexing the same data.

I found nothing significant in the log that I can share. I can see the
following in the log:
org.apache.solr.update.DirectUpdateHandler2; start
commit{,optimize=true,openSearcher=true,waitSearcher=true,expungeDeletes=false,softCommit=false,prepareCommit=false}

On Wed, May 27, 2015 at 10:59 PM, Erick Erickson erickerick...@gmail.com
wrote:

All strange of course. What do your Solr logs show when this happens?
And how reproducible is this?

Best,
Erick

Re: Index optimize runs in background.

2015-05-28 Thread Erick Erickson

Are you timing out on the client request? The theory here is that it's
still a synchronous call, but you're just timing out at the client
level. At that point, the optimize is still running; it's just that the
connection has been dropped.
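
A minimal sketch of that theory, using only the JDK and no Solr classes: a
caller blocks on a long-running task, gives up after a timeout, and the task
still runs to completion. The class name ClientTimeoutSketch, the Outcome
type, and the millisecond values are all made up for illustration; the
"server" is just a sleeping thread standing in for the optimize.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical stand-in for the situation described above; not SolrJ code.
final class ClientTimeoutSketch {

    /** What happened to one simulated request. */
    static final class Outcome {
        final boolean clientTimedOut;       // the caller stopped waiting
        final boolean serverFinishedAnyway; // the work still ran to completion

        Outcome(boolean clientTimedOut, boolean serverFinishedAnyway) {
            this.clientTimedOut = clientTimedOut;
            this.serverFinishedAnyway = serverFinishedAnyway;
        }
    }

    static Outcome run(long serverWorkMillis, long clientTimeoutMillis) {
        ExecutorService server = Executors.newSingleThreadExecutor();
        AtomicBoolean finished = new AtomicBoolean(false);
        CountDownLatch done = new CountDownLatch(1);
        try {
            // "Server side": the long-running optimize, simulated with a sleep.
            Future<?> request = server.submit(() -> {
                try {
                    Thread.sleep(serverWorkMillis);
                    finished.set(true);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                } finally {
                    done.countDown();
                }
            });

            boolean timedOut = false;
            try {
                // "Client side": a synchronous wait that gives up after the timeout.
                request.get(clientTimeoutMillis, TimeUnit.MILLISECONDS);
            } catch (TimeoutException e) {
                // The caller returns, but nothing cancels the server-side work.
                timedOut = true;
            }

            // Demo only: wait for the server task so its result can be reported.
            done.await();
            return new Outcome(timedOut, finished.get());
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException("demo interrupted or failed", e);
        } finally {
            server.shutdown();
        }
    }

    public static void main(String[] args) {
        Outcome o = run(300, 50);
        System.out.println("client timed out: " + o.clientTimedOut);
        System.out.println("server finished anyway: " + o.serverFinishedAnyway);
    }
}
```

From the caller's side, a timed-out wait looks the same as a call that
returned early, which would match an indexer that exits while the optimize
keeps running.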

Shot in the dark.
Erick

On Thu, May 28, 2015 at 10:31 PM, Modassar Ather modather1...@gmail.com wrote:
I could not notice it, but the commit, which in my past experience used to
take around 2 minutes, now takes around 8 seconds. I think it is also
running in the background.

Re: Index optimize runs in background.

2015-05-28 Thread Modassar Ather

I could not notice it, but the commit, which in my past experience used to
take around 2 minutes, now takes around 8 seconds. I think it is also
running in the background.

On Fri, May 29, 2015 at 10:52 AM, Modassar Ather modather1...@gmail.com
wrote:

The indexer takes almost 2 hours to optimize. It adds batches of documents
from multiple threads through
org.apache.solr.client.solrj.impl.CloudSolrClient.
Once all the documents are indexed, it invokes commit and optimize. I have
seen that the optimize goes into the background after 10 minutes and the
indexer exits.
I am not sure why the indexer hangs for those 10 minutes. I have seen this
behavior in multiple iterations of indexing the same data.

Re: Index optimize runs in background.

2015-05-27 Thread Upayavira

In this case, optimising makes sense: once the index is generated, you
are not updating it.

Upayavira

On Wed, May 27, 2015, at 06:14 AM, Modassar Ather wrote:
 Our index has almost 100M documents running on a SolrCloud of 5 shards,
 and each shard has an index size of about 170+GB (for the record, we are
 not using stored fields - our documents are pretty large). We perform a
 full indexing every weekend, and during the week no updates are made to
 the index. Most of the queries that we run are pretty complex, with
 hundreds of terms using PhraseQuery, BooleanQuery, SpanQuery, wildcards,
 boosts etc., and take many minutes to execute. Even a difference of
 10-20% is a big advantage for us.

 We have been optimizing the index after indexing for years and it has
 worked well for us. Every once in a while, we upgrade Solr to the latest
 version and try without optimizing so that we can save the many hours it
 takes to optimize such a huge index, but we find that the optimized index
 works well for us.

 Erick, I was indexing the documents today and saw the optimize happening
 in the background.
 
Re: Index optimize runs in background.

2015-05-27 Thread Erick Erickson

All strange of course. What do your Solr logs show when this happens?
And how reproducible is this?

Best,
Erick

On Wed, May 27, 2015 at 4:00 AM, Upayavira u...@odoko.co.uk wrote:
 In this case, optimising makes sense: once the index is generated, you
 are not updating it.

 Upayavira

Re: Index optimize runs in background.

2015-05-26 Thread Modassar Ather

Our index has almost 100M documents running on a SolrCloud of 5 shards, and
each shard has an index size of about 170+GB (for the record, we are not
using stored fields - our documents are pretty large). We perform a full
indexing every weekend, and during the week no updates are made to the
index. Most of the queries that we run are pretty complex, with hundreds of
terms using PhraseQuery, BooleanQuery, SpanQuery, wildcards, boosts etc.,
and take many minutes to execute. Even a difference of 10-20% is a big
advantage for us.

We have been optimizing the index after indexing for years and it has
worked well for us. Every once in a while, we upgrade Solr to the latest
version and try without optimizing so that we can save the many hours it
takes to optimize such a huge index, but we find that the optimized index
works well for us.

Erick, I was indexing the documents today and saw the optimize happening in
the background.

On Tue, May 26, 2015 at 9:12 PM, Erick Erickson erickerick...@gmail.com
wrote:

 No results yet. I finished the test harness last night (not really a
 unit test, a stand-alone program that endlessly adds stuff and tests
 that every commit returns the correct number of docs).

 8,000 cycles later there aren't any problems reported.

 Siiigh.


Re: Index optimize runs in background.

2015-05-26 Thread Modassar Ather

Hi,

Erick, you mentioned a unit test to test the optimize running in the
background. Kindly share your findings, if any.

Thanks,
Modassar

On Mon, May 25, 2015 at 11:47 AM, Modassar Ather modather1...@gmail.com
wrote:

 Thanks everybody for your replies.

 I have noticed the optimization running in the background every time I
 indexed. This is a 5-node cluster with solr-5.1.0 and uses the
 CloudSolrClient. Kindly share your findings on this issue.

 Our index has almost 100M documents running on SolrCloud. We have been
 optimizing the index after indexing for years and it has worked well for
 us.

 Thanks,
 Modassar

 On Fri, May 22, 2015 at 11:55 PM, Erick Erickson erickerick...@gmail.com
 wrote:

 Actually, I've recently seen very similar behavior in Solr 4.10.3, but
 involving hard commits with openSearcher=true, see:
 https://issues.apache.org/jira/browse/SOLR-7572. Of course I can't
 reproduce this at will, sigh.

 A unit test should be very simple to write though, maybe I can get to it
 today.

 Erick



 On Fri, May 22, 2015 at 8:27 AM, Upayavira u...@odoko.co.uk wrote:
 
 
  On Fri, May 22, 2015, at 03:55 PM, Shawn Heisey wrote:
  On 5/21/2015 6:21 AM, Modassar Ather wrote:
   I am using Solr-5.1.0. I have an indexer class which invokes
   cloudSolrClient.optimize(true, true, 1). My indexer exits after the
   invocation of optimize and the optimization keeps on running in the
   background.
   Kindly let me know if this is by design and how I can make my indexer
   wait until the optimization is over. Is there a configuration/parameter
   I need to set for this?

   Please note that the same indexer with
   cloudSolrServer.optimize(true, true, 1) on Solr-4.10 used to wait till
   the optimize was over before exiting.
 
  This is very odd, because I could not get HttpSolrServer to optimize in
  the background, even when that was what I wanted.
 
  I wondered if maybe the Cloud object behaves differently with regard to
  blocking until an optimize is finished ... except that there is no code
  for optimizing in CloudSolrClient at all ... so I don't know where the
  different behavior would actually be happening.
 
  A more important question is, why are you optimising? Generally it isn't
  recommended anymore as it reduces the natural distribution of documents
  amongst segments and makes future merges more costly.
 
  Upayavira

Re: Index optimize runs in background.

2015-05-26 Thread Upayavira

Modassar,

Are you saying that you are optimising because you have been doing it for
years? If this is the only reason, you should stop doing it immediately.

The one scenario in which optimisation still makes some sense is when
you reindex every night and optimise straight after. This will leave you
with a single segment which will search faster.

However, if you are doing a lot of indexing, especially with
deletes/updates, you will have merged your content into a single segment
which will later need to be merged. That merge will be costly as it will
involve copying the entire content of your large segment, which will
impact performance.
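
A rough way to see the cost, as a toy model only: assume a merge copies every
byte of every segment it touches. This is not Lucene's TieredMergePolicy, and
the class name MergeCostSketch and the segment sizes (taken loosely from the
170 GB per shard mentioned in this thread) are assumptions for illustration.

```java
import java.util.Arrays;
import java.util.List;

// Toy model only: a merge is assumed to copy all bytes of the segments it
// combines. Real merge policies are far more involved than this.
final class MergeCostSketch {

    /** Bytes copied when the given segments are merged into one. */
    static long mergeCost(List<Long> segmentSizes) {
        long total = 0;
        for (long size : segmentSizes) {
            total += size;
        }
        return total;
    }

    public static void main(String[] args) {
        long gb = 1L << 30;

        // Two newly flushed 1 GB segments merged with each other.
        long smallOnly = mergeCost(Arrays.asList(1 * gb, 1 * gb));

        // The same two segments merged together with one 170 GB segment
        // left behind by an earlier optimize.
        long withBigSegment = mergeCost(Arrays.asList(170 * gb, 1 * gb, 1 * gb));

        System.out.println("small segments only: " + smallOnly / gb + " GB copied");
        System.out.println("including the big segment: " + withBigSegment / gb + " GB copied");
    }
}
```

In this model the merge that pulls in the big segment copies 172 GB to absorb
2 GB of new data, which is the kind of cost the paragraph above warns about.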

Before Solr 3.6, optimisation was necessary and recommended. At that
point (or a little before), the TieredMergePolicy became the default, and
this made optimisation generally unnecessary.

Upayavira

On Mon, May 25, 2015, at 07:17 AM, Modassar Ather wrote:
 Thanks everybody for your replies.
 
 I have noticed the optimization running in background every time I
 indexed. This is 5 node cluster with solr-5.1.0 and uses the
 CloudSolrClient. Kindly share your findings on this issue.
 
 Our index has almost 100M documents running on SolrCloud. We have been
 optimizing the index after indexing for years and it has worked well for
 us.
 
 Thanks,
 Modassar
 

Re: Index optimize runs in background.

2015-05-26 Thread Erick Erickson

No results yet. I finished the test harness last night (not really a
unit test, a stand-alone program that endlessly adds stuff and tests
that every commit returns the correct number of docs).

8,000 cycles later there aren't any problems reported.

Siiigh.
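
For readers following along, the kind of harness described above (add
documents, commit, check the visible count, repeat) can be sketched roughly
as below. This is not Erick's actual program: DocStore, InMemoryStore, and
CommitCountCheck are hypothetical names, and the in-memory store only stands
in for a real Solr collection so that the sketch runs on its own.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical add/commit/count harness; not the program used in this thread. */
class CommitCountCheck {

    /** Minimal stand-in for an index. A real harness would talk to Solr here. */
    interface DocStore {
        void add(String doc);
        void commit();
        long visibleCount();
    }

    /** In-memory store where documents become visible only after commit(). */
    static class InMemoryStore implements DocStore {
        private final List<String> pending = new ArrayList<>();
        private final List<String> committed = new ArrayList<>();

        public void add(String doc) { pending.add(doc); }

        public void commit() {
            committed.addAll(pending);
            pending.clear();
        }

        public long visibleCount() { return committed.size(); }
    }

    /** Runs add/commit cycles and returns how many cycles saw an unexpected count. */
    static int run(DocStore store, int cycles, int docsPerCycle) {
        int mismatches = 0;
        long expected = store.visibleCount();
        for (int c = 0; c < cycles; c++) {
            for (int d = 0; d < docsPerCycle; d++) {
                store.add("doc-" + c + "-" + d);
            }
            store.commit();
            expected += docsPerCycle;
            // If commit() returned before the documents were visible,
            // the count seen here would lag behind the expected total.
            if (store.visibleCount() != expected) {
                mismatches++;
            }
        }
        return mismatches;
    }

    public static void main(String[] args) {
        System.out.println("mismatches: " + run(new InMemoryStore(), 8000, 10));
    }
}
```

Swapping InMemoryStore for an implementation backed by a real client is
where any commit that returns early would show up as a mismatch.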


On Tue, May 26, 2015 at 1:51 AM, Modassar Ather modather1...@gmail.com wrote:
 Hi,

 Erick you mentioned about a unit test to test the optimize running in
 background. Kindly share your findings if any.

 Thanks,
 Modassar


Re: Index optimize runs in background.

2015-05-26 Thread Shawn Heisey

On 5/26/2015 6:29 AM, Upayavira wrote:
 Are you saying that the reason you are optimising is because you have
 been doing it for years? If this is the only reason, you should stop
 doing it immediately. 

 The one scenario in which optimisation still makes some sense is when
 you reindex every night and optimise straight after. This will leave you
 with a single segment which will search faster.

 However, if you are doing a lot of indexing, especially with
 deletes/updates, you will have merged your content into a single segment
 which will later need to be merged. That merge will be costly as it will
 involve copying the entire content of your large segment, which will
 impact performance.

 Before Solr 3.6, Optimisation was necessary and recommended. At that
 point (or a little before) the TieredMergePolicy became the default, and
 this made optimisation generally unnecessary.

In general, I concur with this advice about optimizing.  Historically,
optimize was done for increased performance.  In older versions, an
unoptimized index performed *MUCH* worse than an index with a single
segment.  This is no longer the case today, mostly due to so many Lucene
features working on a per-segment basis.  A single segment does perform
faster, but the difference is much smaller than it used to be.

A full optimize on a large index requires a LOT of CPU and I/O resources
-- while the optimize is underway, performance is not very good.

There are, however, still times when running optimize is appropriate:

1) The index is mostly static, not receiving very frequent updates.
2) There is a large percentage of deleted documents in the index.

With modern Lucene/Solr and these use cases, the reasons for optimizing
are still performance-related, but the only time you should do an
optimize is when the benefit outweighs the cost.

For the 1) use case, the index will likely remain mostly-optimized for a
long period of time after the optimize is done, so the resources
required for the optimize are worth spending.

For the 2) use case, optimizing will reduce the size of the index
significantly, so general performance gets better.  That makes the cost
worthwhile.

Thanks,
Shawn
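
The two cases above can be turned into a rough pre-check before issuing an
optimize. This is a minimal sketch, assuming the caller already has the
index's numDocs (live documents) and maxDoc (live plus deleted documents)
figures. The class name, the mostlyStatic flag, and the threshold value are
illustrative assumptions, not values recommended anywhere in this thread.

```java
/** Hypothetical helper for deciding whether a forced merge is worth its cost. */
class OptimizeDecision {

    /** Fraction of the index occupied by deleted documents, from 0.0 to 1.0. */
    static double deletedRatio(long numDocs, long maxDoc) {
        if (maxDoc <= 0) {
            return 0.0; // empty index: nothing to reclaim
        }
        return (double) (maxDoc - numDocs) / maxDoc;
    }

    /**
     * Case 1: the index is mostly static, so the merged state will last.
     * Case 2: deleted documents make up at least deletedThreshold of the index.
     */
    static boolean worthOptimizing(long numDocs, long maxDoc,
                                   boolean mostlyStatic, double deletedThreshold) {
        return mostlyStatic || deletedRatio(numDocs, maxDoc) >= deletedThreshold;
    }

    public static void main(String[] args) {
        // Assumed example figures: 70M live documents out of 100M total.
        long numDocs = 70_000_000L;
        long maxDoc = 100_000_000L;
        System.out.println("deleted ratio: " + deletedRatio(numDocs, maxDoc));
        System.out.println("optimize? " + worthOptimizing(numDocs, maxDoc, false, 0.2));
    }
}
```

The threshold is the part to tune: it should reflect how much CPU and I/O
the cluster can spare while the merge runs.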

Re: Index optimize runs in background.

2015-05-26 Thread Alessandro Benedetti

I completely agree with Upayavira and Shawn.
Modassar, can you tell us how often you index?
Have you ever experimented with the mergeFactor?
I doubt you need to optimise at all.
Simply tuning the mergeFactor should solve all your issues.
I assume you were optimising only to get faster searches, weren't you?

Cheers

2015-05-26 16:07 GMT+01:00 Shawn Heisey apa...@elyograg.org:

 On 5/26/2015 6:29 AM, Upayavira wrote:
  Are you saying that the reason you are optimising is because you have
  been doing it for years? If this is the only reason, you should stop
  doing it immediately.
 
  The one scenario in which optimisation still makes some sense is when
  you reindex every night and optimise straight after. This will leave you
  with a single segment which will search faster.
 
  However, if you are doing a lot of indexing, especially with
  deletes/updates, you will have merged your content into a single segment
  which will later need to be merged. That merge will be costly as it will
  involve copying the entire content of your large segment, which will
  impact performance.
 
  Before Solr 3.6, Optimisation was necessary and recommended. At that
  point (or a little before) the TieredMergePolicy became the default, and
  this made optimisation generally unnecessary.

 In general, I concur with this advice about optimizing.  Historically,
 optimize was done for increased performance.  In older versions, an
 unoptimized index performed *MUCH* worse than an index with a single
 segment.  This is no longer the case today, mostly due to so many Lucene
 features working on a per-segment basis.  A single segment does perform
 faster, but the difference is much smaller than it used to be.

 A full optimize on a large index requires a LOT of CPU and I/O resources
 -- while the optimize is underway, performance is not very good.

 There are, however, still times when running optimize is appropriate:

 1) The index is mostly static, not receiving very frequent updates.
 2) There is a large percentage of deleted documents in the index.

 With modern Lucene/Solr and these use cases, the reasons for optimizing
 are still performance-related, but the only time you should do an
 optimize is when the benefit outweighs the cost.

 For the 1) use case, the index will likely remain mostly-optimized for a
 long period of time after the optimize is done, so the resources
 required for the optimize are worth spending.

 For the 2) use case, optimizing will reduce the size of the index
 significantly, so general performance gets better.  That makes the cost
 worthwhile.

 Thanks,
 Shawn




-- 

Benedetti Alessandro
Visiting card : http://about.me/alessandro_benedetti

Tyger, tyger burning bright
In the forests of the night,
What immortal hand or eye
Could frame thy fearful symmetry?

William Blake - Songs of Experience -1794 England

Re: Index optimize runs in background.

2015-05-25 Thread Modassar Ather

Thanks everybody for your replies.

I have noticed the optimization running in background every time I indexed.
This is 5 node cluster with solr-5.1.0 and uses the CloudSolrClient. Kindly
share your findings on this issue.

Our index has almost 100M documents running on SolrCloud. We have been
optimizing the index after indexing for years and it has worked well for
us.

Thanks,
Modassar

On Fri, May 22, 2015 at 11:55 PM, Erick Erickson erickerick...@gmail.com
wrote:

 Actually, I've recently seen very similar behavior in Solr 4.10.3, but
 involving hard commits openSearcher=true, see:
 https://issues.apache.org/jira/browse/SOLR-7572. Of course I can't
 reproduce this at will, sii.

 A unit test should be very simple to write though, maybe I can get to it
 today.

 Erick




Re: Index optimize runs in background.

2015-05-22 Thread Upayavira



On Fri, May 22, 2015, at 03:55 PM, Shawn Heisey wrote:
 On 5/21/2015 6:21 AM, Modassar Ather wrote:
  I am using Solr-5.1.0. I have an indexer class which invokes
  cloudSolrClient.optimize(true, true, 1). My indexer exits after the
  invocation of optimize and the optimization keeps on running in the
  background.
  Kindly let me know if it is per design and how can I make my indexer to
  wait until the optimization is over. Is there a configuration/parameter I
  need to set for the same.
  
  Please note that the same indexer with cloudSolrServer.optimize(true, true,
  1) on Solr-4.10 used to wait till the optimize was over before exiting.
 
 This is very odd, because I could not get HttpSolrServer to optimize in
 the background, even when that was what I wanted.
 
 I wondered if maybe the Cloud object behaves differently with regard to
 blocking until an optimize is finished ... except that there is no code
 for optimizing in CloudSolrClient at all ... so I don't know where the
 different behavior would actually be happening.

A more important question is, why are you optimising? Generally it isn't
recommended anymore as it reduces the natural distribution of documents
amongst segments and makes future merges more costly.

Upayavira

Re: Index optimize runs in background.

2015-05-22 Thread Shawn Heisey

On 5/21/2015 6:21 AM, Modassar Ather wrote:
 I am using Solr-5.1.0. I have an indexer class which invokes
 cloudSolrClient.optimize(true, true, 1). My indexer exits after the
 invocation of optimize and the optimization keeps on running in the
 background.
 Kindly let me know if it is per design and how can I make my indexer to
 wait until the optimization is over. Is there a configuration/parameter I
 need to set for the same.
 
 Please note that the same indexer with cloudSolrServer.optimize(true, true,
 1) on Solr-4.10 used to wait till the optimize was over before exiting.

This is very odd, because I could not get HttpSolrServer to optimize in
the background, even when that was what I wanted.

I wondered if maybe the Cloud object behaves differently with regard to
blocking until an optimize is finished ... except that there is no code
for optimizing in CloudSolrClient at all ... so I don't know where the
different behavior would actually be happening.

Thanks,
Shawn

Re: Index optimize runs in background.

2015-05-22 Thread Erick Erickson

Actually, I've recently seen very similar behavior in Solr 4.10.3, but
involving hard commits with openSearcher=true, see:
https://issues.apache.org/jira/browse/SOLR-7572. Of course I can't
reproduce this at will, sii.

A unit test should be very simple to write though, maybe I can get to it today.

Erick



On Fri, May 22, 2015 at 8:27 AM, Upayavira u...@odoko.co.uk wrote:


 On Fri, May 22, 2015, at 03:55 PM, Shawn Heisey wrote:
 On 5/21/2015 6:21 AM, Modassar Ather wrote:
  I am using Solr-5.1.0. I have an indexer class which invokes
  cloudSolrClient.optimize(true, true, 1). My indexer exits after the
  invocation of optimize and the optimization keeps on running in the
  background.
  Kindly let me know if it is per design and how can I make my indexer to
  wait until the optimization is over. Is there a configuration/parameter I
  need to set for the same.
 
  Please note that the same indexer with cloudSolrServer.optimize(true, true,
  1) on Solr-4.10 used to wait till the optimize was over before exiting.

 This is very odd, because I could not get HttpSolrServer to optimize in
 the background, even when that was what I wanted.

 I wondered if maybe the Cloud object behaves differently with regard to
 blocking until an optimize is finished ... except that there is no code
 for optimizing in CloudSolrClient at all ... so I don't know where the
 different behavior would actually be happening.

 A more important question is, why are you optimising? Generally it isn't
 recommended anymore as it reduces the natural distribution of documents
 amongst segments and makes future merges more costly.

 Upayavira

Index optimize runs in background.

2015-05-21 Thread Modassar Ather

Hi,

I am using Solr-5.1.0. I have an indexer class which invokes
cloudSolrClient.optimize(true, true, 1). My indexer exits after the
invocation of optimize and the optimization keeps on running in the
background.
Kindly let me know whether this is by design and how I can make my indexer
wait until the optimization is over. Is there a configuration/parameter I
need to set for this?

Please note that the same indexer with cloudSolrServer.optimize(true, true,
1) on Solr-4.10 used to wait till the optimize was over before exiting.

Thanks,
Modassar
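
Nobody in this thread identified an API for checking whether an optimize has
finished, so any wait has to be built around a check the indexer supplies
itself. Below is a minimal sketch of such a wait loop with a timeout.
OptimizeWaiter and waitUntil are hypothetical names, and what isDone actually
checks (for example, comparing each shard's segment count against the
requested maxSegments) is an assumption left entirely to the caller.

```java
import java.time.Duration;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.BooleanSupplier;

/** Hypothetical helper: blocks until a caller-supplied "optimize finished" check passes. */
class OptimizeWaiter {

    /**
     * Polls isDone until it returns true or the timeout elapses.
     * Returns true if the condition was met, false on timeout or interruption.
     */
    static boolean waitUntil(BooleanSupplier isDone, Duration timeout, Duration pollInterval) {
        long deadline = System.nanoTime() + timeout.toNanos();
        while (true) {
            if (isDone.getAsBoolean()) {
                return true;
            }
            if (System.nanoTime() - deadline >= 0) {
                return false; // gave up: the check never passed in time
            }
            try {
                Thread.sleep(pollInterval.toMillis());
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt for the caller
                return false;
            }
        }
    }

    public static void main(String[] args) {
        // Stand-in for a real check: pretend the merge finishes on the third poll.
        AtomicInteger polls = new AtomicInteger();
        boolean done = waitUntil(() -> polls.incrementAndGet() >= 3,
                Duration.ofSeconds(5), Duration.ofMillis(10));
        System.out.println("optimize finished: " + done);
    }
}
```

The indexer would call waitUntil right after optimize returns and exit only
once it reports true, treating false as a reason to log and investigate.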

Re: Index optimize runs in background.

2015-05-21 Thread Modassar Ather

Hi

Any insight on the question would be really helpful.

Thanks,
Modassar

On Thu, May 21, 2015 at 5:51 PM, Modassar Ather modather1...@gmail.com
wrote:

 Hi,

 I am using Solr-5.1.0. I have an indexer class which invokes
 cloudSolrClient.optimize(true, true, 1). My indexer exits after the
 invocation of optimize and the optimization keeps on running in the
 background.
 Kindly let me know if it is per design and how can I make my indexer to
 wait until the optimization is over. Is there a configuration/parameter I
 need to set for the same.

 Please note that the same indexer with cloudSolrServer.optimize(true,
 true, 1) on Solr-4.10 used to wait till the optimize was over before
 exiting.

 Thanks,
 Modassar
