Re: Parallel optimize of index on SolrCloud.

2014-07-09 Thread Walter Underwood
More memory or faster disks will make a much bigger improvement than a forced 
merge.

What are you measuring? If it is average query time, that is not a good 
measure. Look at 90th or 95th percentile. Test with queries from logs.
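
A minimal sketch of the measurement suggested here, assuming query times in milliseconds have already been pulled from the logs into an array. The class name, method names, and sample values are invented for illustration; the nearest-rank definition of a percentile is one common choice, not the only one.

```java
import java.util.Arrays;

public class QueryLatencyPercentiles {

    // Nearest-rank percentile: the smallest sample such that at least
    // `percent` percent of all samples are less than or equal to it.
    public static long percentile(long[] millis, int percent) {
        if (millis.length == 0) {
            throw new IllegalArgumentException("no samples");
        }
        if (percent < 1 || percent > 100) {
            throw new IllegalArgumentException("percent must be in 1..100");
        }
        long[] sorted = millis.clone();
        Arrays.sort(sorted);
        // Integer ceiling of percent * n / 100, so no floating-point rounding.
        int rank = (percent * sorted.length + 99) / 100;
        return sorted[rank - 1];
    }

    // Arithmetic mean, shown only for comparison with the percentiles.
    public static double mean(long[] millis) {
        long sum = 0;
        for (long m : millis) {
            sum += m;
        }
        return (double) sum / millis.length;
    }

    public static void main(String[] args) {
        // Hypothetical query times in ms: mostly fast, two very slow queries.
        long[] sample = {120, 130, 125, 140, 135, 128, 132, 138, 9000, 12000};
        System.out.println("mean = " + mean(sample));
        System.out.println("p50  = " + percentile(sample, 50));
        System.out.println("p90  = " + percentile(sample, 90));
        System.out.println("p95  = " + percentile(sample, 95));
    }
}
```

With these made-up numbers the mean is about 2205 ms while the median is 132 ms, so the average describes neither the typical query nor the slow ones. The 90th and 95th percentiles show the slow queries directly.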

No user can see a 10% or 20% difference. If your managers are watching that, 
they are watching the wrong thing.

If you are indexing once per week, you don't really need the complexity of Solr 
Cloud. You can do manual sharding.

wunder


--
Walter Underwood
wun...@wunderwood.org





Re: Parallel optimize of index on SolrCloud.

2014-07-09 Thread Shalin Shekhar Mangar
Hi Walter,

I wonder why you think SolrCloud isn't necessary if you're indexing once
per week. Isn't the automatic failover and auto-sharding still useful? One
can also do custom sharding with SolrCloud if necessary.



-- 
Regards,
Shalin Shekhar Mangar.


Re: Parallel optimize of index on SolrCloud.

2014-07-09 Thread Modassar Ather
Hi All,

Thanks for your kind suggestions and inputs.

We have been going the optimize way and it has helped. Testing and
benchmarking around memory and performance have already been done.
While optimizing, we see scope for improvement by doing it in
parallel, so kindly suggest how this can be achieved.

Thanks,
Modassar



Re: Parallel optimize of index on SolrCloud.

2014-07-09 Thread Timothy Potter
Hi Modassar,

Have you tried hitting the cores for each replica directly (instead of
using the collection)? i.e. if you had col_shard1_replica1 on node1,
then send the optimize command to that core URL directly:

curl -i -v "http://host:port/solr/col_shard1_replica1/update" \
  -H 'Content-type:application/xml' \
  --data-binary "<optimize/>"

I haven't tried this myself but might work ;-)
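
A minimal Java sketch of the same idea, sending the per-core request to several cores at once. It assumes Java 11 or later for java.net.http. The class name, method names, and the node and core URLs are hypothetical. It only shows how to dispatch the requests concurrently; as reported later in this thread, Solr may still optimize the whole collection one core at a time regardless of how the requests arrive.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.stream.Collectors;

public class ParallelCoreOptimize {

    // Builds the same POST that the curl command sends to a single core.
    public static HttpRequest optimizeRequest(String coreBaseUrl) {
        return HttpRequest.newBuilder(URI.create(coreBaseUrl + "/update"))
                .header("Content-Type", "application/xml")
                .POST(HttpRequest.BodyPublishers.ofString("<optimize/>"))
                .build();
    }

    // Starts all requests without waiting, then blocks until each one has
    // returned, and reports the HTTP status code per core in input order.
    public static List<Integer> optimizeAll(HttpClient client, List<String> coreBaseUrls) {
        List<CompletableFuture<Integer>> pending = coreBaseUrls.stream()
                .map(url -> client
                        .sendAsync(optimizeRequest(url), HttpResponse.BodyHandlers.discarding())
                        .thenApply(resp -> resp.statusCode()))
                .collect(Collectors.toList());
        return pending.stream()
                .map(CompletableFuture::join)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Hypothetical core URLs, one per shard, each on its own node.
        List<String> cores = List.of(
                "http://node1:8983/solr/col_shard1_replica1",
                "http://node2:8983/solr/col_shard2_replica1",
                "http://node3:8983/solr/col_shard3_replica1");

        // Dry run: print what would be sent instead of contacting a cluster.
        for (String core : cores) {
            HttpRequest request = optimizeRequest(core);
            System.out.println(request.method() + " " + request.uri());
        }
        // To actually send the requests against a live cluster:
        // List<Integer> statuses = optimizeAll(HttpClient.newHttpClient(), cores);
    }
}
```

The request construction is kept separate from the sending so it can be checked without a running cluster.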

Tim


Re: Parallel optimize of index on SolrCloud.

2014-07-09 Thread Shawn Heisey
On 7/9/2014 8:49 AM, Timothy Potter wrote:
 Hi Modassar,

 Have you tried hitting the cores for each replica directly (instead of
 using the collection)? i.e. if you had col_shard1_replica1 on node1,
 then send the optimize command to that core URL directly:

 curl -i -v "http://host:port/solr/col_shard1_replica1/update" \
   -H 'Content-type:application/xml' \
   --data-binary "<optimize/>"

 I haven't tried this myself but might work ;-)

That doesn't work.  It will optimize the whole collection, one core at a
time.  I thought that sending the optimize with distrib=false would
limit the optimize to just the called core, but that also doesn't work. 
I thought a bug had been filed on the distrib=false problem, but it's
been long enough that I'm no longer sure about that.

Thanks,
Shawn



Re: Parallel optimize of index on SolrCloud.

2014-07-09 Thread Mark Miller
I think that’s pretty much a search time param, though it might end up being used 
on the update side as well. In any case, I know it doesn’t affect commit or 
optimize.

Also, to my knowledge, SolrCloud optimize support was never explicitly added or 
tested.

--  
Mark Miller
about.me/markrmiller

On July 9, 2014 at 12:00:27 PM, Shawn Heisey (s...@elyograg.org) wrote:
  I thought a bug had been filed on the distrib=false problem,  



Re: Parallel optimize of index on SolrCloud.

2014-07-08 Thread Walter Underwood
You probably do not need to force merge (mistakenly called optimize) your 
index.

Solr does automatic merges, which work just fine.

There are only a few situations where a forced merge is even a good idea. The 
most common one is a replicated (non-cloud) setup with a full reindex every 
night.

If you need Solr Cloud, I cannot think of a situation where you would want a 
forced merge.

wunder

On Jul 8, 2014, at 2:01 AM, Modassar Ather modather1...@gmail.com wrote:

 Hi,
 
 Need to optimize index created using CloudSolrServer APIs under SolrCloud
 setup of 3 instances on separate machines. Currently it optimizes
 sequentially if I invoke cloudSolrServer.optimize().
 
 To make it parallel I tried making three separate HttpSolrServer instances
 and invoked httpSolrServer.optimize() on them in parallel but still it seems
 to be doing optimization sequentially.
 
 I tried invoking optimize directly using HttpPost with following url and
 parameters but still it seems to be sequential.
 *URL* : http://host:port/solr/collection/update
 
 *Parameters*:
 params.add(new BasicNameValuePair("optimize", "true"));
 params.add(new BasicNameValuePair("maxSegments", "1"));
 params.add(new BasicNameValuePair("waitFlush", "true"));
 params.add(new BasicNameValuePair("distrib", "false"));
 
 Kindly provide your suggestion and help.
 
 Regards,
 Modassar






Re: Parallel optimize of index on SolrCloud.

2014-07-08 Thread Modassar Ather
Thanks Walter for your inputs.

Our use case and performance benchmark require us to invoke optimize.

Here we see a chance to improve the performance of optimize() if it is invoked
in parallel.
I found that if *distrib=false* is used, the optimization will happen in
parallel.

But I could not find a way to set it using HttpSolrServer/CloudSolrServer.
Also, the parameter setting given in my mail above does not seem to
work.

Please let me know in what ways I can achieve the parallel optimize on
SolrCloud.

Thanks,
Modassar










Re: Parallel optimize of index on SolrCloud.

2014-07-08 Thread Walter Underwood
I seriously doubt that you are required to force merge.

How much improvement? And is the big performance cost also OK?

I have worked on search engines that do automatic merges and offer forced 
merges for over fifteen years. For all that time, forced merges have usually 
caused problems.

Stop doing forced merges.

wunder


--
Walter Underwood
wun...@wunderwood.org





Re: Parallel optimize of index on SolrCloud.

2014-07-08 Thread Modassar Ather
Our index has almost 100M documents running on SolrCloud of 3 shards and
each shard has an index size of about 700GB (for the record, we are not
using stored fields - our documents are pretty large). We perform a full
indexing every weekend and during the week there are no updates made to the
index. Most of the queries that we run are pretty complex with hundreds of
terms using PhraseQuery, BooleanQuery, SpanQuery, Wildcards, boosts etc.
and take many minutes to execute. A difference of 10-20% is also a big
advantage for us.

We have been optimizing the index after indexing for years and it has
worked well for us. Every once in a while, we upgrade Solr to the latest
version and try without optimizing so that we can save the many hours it
takes to optimize such a huge index, but it does not work well.

Kindly provide your suggestion.

Thanks,
Modassar

