Re: Using solr during optimization

2011-11-15 Thread Isan Fulia
Hi Mark,

Thanks for the reply.

You are right. We need to test first by decreasing the merge factor, look at
both indexing and searching performance, and have some numbers in hand.
We also need to see, after a partial optimize with the same merge factor, how
long the performance gain lasts (for both searching and indexing) while we
continuously add more documents.

Thanks,
Isan Fulia.

On 14 November 2011 19:41, Mark Miller markrmil...@gmail.com wrote:


 On Nov 14, 2011, at 8:27 AM, Isan Fulia wrote:

  Hi Mark,
 
  In the above case, what if the index is optimized only partly, i.e. by
  specifying the max number of segments we want?
  It has been observed that after optimizing (even a partial optimize), both
  indexing and searching were faster than against an unoptimized index.

 Yes, this remains true - searching against fewer segments is faster than
 searching against many segments. Unless you have a really high merge
 factor, this is just generally not a big deal IMO.

 It tends to be something like, a given query is say 10-30% slower. If you
 have good performance though, this should often be something like a 50ms
 query goes to 80 or 90ms. You really have to decide/test if there is a
 practical difference to your users.

 You should also pay attention to how long that perf improvement lasts
 while you are continuously adding more documents. Is it a super high cost
 for a short perf boost?

  Decreasing the merge factor will affect performance, as it will
  increase indexing time due to more frequent merges.

 True - it will essentially amortize the cost of reducing segments. Have
 you tested lower merge factors though? Does it really slow down indexing to
 the point where you find it unacceptable? I've been surprised in the past.
 Usually you can find a pretty nice balance.

  So is it better to optimize partly (let's say once a month) rather than
  decrease the merge factor and affect the indexing speed? Also, since we
  will be sharding, that 100 GB index will be divided into different shards.

 Partial optimize is a good option, and optimize is an option. They both
 exist for a reason ;) Many people pay the price because they assume they
 have to though, when they really have no practical need.

 Generally, the best way to manage the number of segments in your index is
 through the merge policy IMO - not necessarily optimize calls.

 I'm pretty sure optimize also blocks adds in previous versions of Solr as
 well - it grabs the commit lock. It won't do that in Solr 4, but that is
 another reason I wouldn't recommend it under normal circumstances.

 I look at optimize as a last option, or when creating a static index
 personally.

 
  Thanks,
  Isan Fulia.
 
 
 
  On 14 November 2011 11:28, Kalika Mishra kalika.mis...@germinait.com
 wrote:
 
  Hi Mark,
 
  Thanks for your reply.
 
  What you are saying is interesting; so are you suggesting that
  optimization should usually be done when there are not many updates?
  Also, can you please point out under what conditions optimization might be
  beneficial?
 
  Thanks.
 
  On 11 November 2011 20:30, Mark Miller markrmil...@gmail.com wrote:
 
  I would not optimize - it's very expensive. With 11,000 updates a day,
 I
  think it makes sense to completely avoid optimizing.
 
  That should be your default move in any case. If you notice performance
  suffers more than is acceptable (good chance you won't), then I'd use a
  lower merge factor. It defaults to 10 - lower numbers will lower the
  number
  of segments in your index, and essentially amortize the cost of an
  optimize.
 
  Optimize is generally only useful when you will have a mostly static
  index.
 
  - Mark Miller
  lucidimagination.com
 
 
  On Nov 11, 2011, at 9:12 AM, Kalika Mishra wrote:
 
  Hi Mark,
 
   We are performing almost 11,000 updates a day and have around 50 million
   docs in the index (I understand we will need to shard), so the core
   segments will get fragmented over a period of time. We will need to
   optimize every few days or once a month; do you have any reason not to
   optimize the core? Please let me know.
 
  Thanks.
 
  On 11 November 2011 18:51, Mark Miller markrmil...@gmail.com wrote:
 
   Do you have something forcing you to optimize, or are you just doing it
   for the heck of it?
 
  On Nov 11, 2011, at 7:50 AM, Kalika Mishra wrote:
 
  Hi,
 
   I would like to optimize a Solr core which is in Reader/Writer mode.
   Since the Solr cores are huge (above 100 GB), the optimization takes
   hours to complete.

   While the optimization is going on, say on the Writer core, the
   application wants to continue using the indexes for both query and write
   purposes. What is the best approach to do this?

   I was thinking of using a temporary index (empty core) to write the
   documents and using the same Reader to read the documents. (Please note
   that the temp index and the Reader cannot be made Reader/Writer, as the
   Reader is already set up for the Writer on which optimization is taking
   place.) But there could be some updates to the temp index which I would
   like to get reflected in the Reader. What's the best setup to support
   this?

Re: Using solr during optimization

2011-11-14 Thread Isan Fulia
Hi Mark,

In the above case, what if the index is optimized only partly, i.e. by
specifying the max number of segments we want?
It has been observed that after optimizing (even a partial optimize), both
indexing and searching were faster than against an unoptimized index.
Decreasing the merge factor will affect performance, as it will increase
indexing time due to more frequent merges.
So is it better to optimize partly (let's say once a month) rather than
decrease the merge factor and affect the indexing speed? Also, since we
will be sharding, that 100 GB index will be divided into different shards.

Thanks,
Isan Fulia.



On 14 November 2011 11:28, Kalika Mishra kalika.mis...@germinait.com wrote:

 Hi Mark,

 Thanks for your reply.

 What you are saying is interesting; so are you suggesting that
 optimization should usually be done when there are not many updates?
 Also, can you please point out under what conditions optimization might be
 beneficial?

 Thanks.

 On 11 November 2011 20:30, Mark Miller markrmil...@gmail.com wrote:

  I would not optimize - it's very expensive. With 11,000 updates a day, I
  think it makes sense to completely avoid optimizing.
 
  That should be your default move in any case. If you notice performance
  suffers more than is acceptable (good chance you won't), then I'd use a
  lower merge factor. It defaults to 10 - lower numbers will lower the
 number
  of segments in your index, and essentially amortize the cost of an
 optimize.
 
  Optimize is generally only useful when you will have a mostly static
 index.
 
  - Mark Miller
  lucidimagination.com
 
 
  On Nov 11, 2011, at 9:12 AM, Kalika Mishra wrote:
 
   Hi Mark,
  
   We are performing almost 11,000 updates a day and have around 50 million
   docs in the index (I understand we will need to shard), so the core
   segments will get fragmented over a period of time. We will need to
   optimize every few days or once a month; do you have any reason not to
   optimize the core? Please let me know.
  
   Thanks.
  
   On 11 November 2011 18:51, Mark Miller markrmil...@gmail.com wrote:
  
   Do you have something forcing you to optimize, or are you just doing it
   for the heck of it?
  
   On Nov 11, 2011, at 7:50 AM, Kalika Mishra wrote:
  
   Hi,
  
   I would like to optimize a Solr core which is in Reader/Writer mode.
   Since the Solr cores are huge (above 100 GB), the optimization takes
   hours to complete.

   While the optimization is going on, say on the Writer core, the
   application wants to continue using the indexes for both query and write
   purposes. What is the best approach to do this?

   I was thinking of using a temporary index (empty core) to write the
   documents and using the same Reader to read the documents. (Please note
   that the temp index and the Reader cannot be made Reader/Writer, as the
   Reader is already set up for the Writer on which optimization is taking
   place.) But there could be some updates to the temp index which I would
   like to get reflected in the Reader. What's the best setup to support
   this?
  
   Thanks,
   Kalika
  
    - Mark Miller
    lucidimagination.com

    --
    Thanks & Regards,
    Kalika

 --
 Thanks & Regards,
 Kalika

-- 
Thanks & Regards,
Isan Fulia.


Re: Using solr during optimization

2011-11-14 Thread Mark Miller

On Nov 14, 2011, at 8:27 AM, Isan Fulia wrote:

 Hi Mark,
 
 In the above case, what if the index is optimized only partly, i.e. by
 specifying the max number of segments we want?
 It has been observed that after optimizing (even a partial optimize), both
 indexing and searching were faster than against an unoptimized index.

Yes, this remains true - searching against fewer segments is faster than 
searching against many segments. Unless you have a really high merge factor, 
this is just generally not a big deal IMO.

It tends to be something like, a given query is say 10-30% slower. If you have 
good performance though, this should often be something like a 50ms query goes 
to 80 or 90ms. You really have to decide/test if there is a practical 
difference to your users.

You should also pay attention to how long that perf improvement lasts while you 
are continuously adding more documents. Is it a super high cost for a short 
perf boost?

 Decreasing the merge factor will affect performance, as it will increase
 indexing time due to more frequent merges.

True - it will essentially amortize the cost of reducing segments. Have you 
tested lower merge factors though? Does it really slow down indexing to the 
point where you find it unacceptable? I've been surprised in the past. Usually 
you can find a pretty nice balance.

 So is it better to optimize partly (let's say once a month) rather than
 decrease the merge factor and affect the indexing speed? Also, since we
 will be sharding, that 100 GB index will be divided into different shards.

Partial optimize is a good option, and optimize is an option. They both exist 
for a reason ;) Many people pay the price because they assume they have to 
though, when they really have no practical need.
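
To make the partial optimize concrete: it is just the normal optimize call
with maxSegments set. A rough sketch of the XML message (the host, core name
and the target of 4 segments below are only placeholders):

  <!-- POSTed to something like http://localhost:8983/solr/core1/update -->
  <optimize maxSegments="4" waitSearcher="false"/>

The same thing can typically be passed as request parameters on the update
handler (optimize=true&maxSegments=4); leave maxSegments off and you get the
usual full optimize down to a single segment.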

Generally, the best way to manage the number of segments in your index is 
through the merge policy IMO - not necessarily optimize calls.
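
As a sketch of what that can look like in solrconfig.xml - assuming the
TieredMergePolicy that current Lucene defaults to, with purely illustrative
values - the policy is configured inside <indexDefaults>/<mainIndex>:

  <mergePolicy class="org.apache.lucene.index.TieredMergePolicy">
    <int name="maxMergeAtOnce">10</int>
    <double name="segmentsPerTier">5.0</double>
  </mergePolicy>

Lowering segmentsPerTier keeps the segment count down continuously, instead
of paying for it all at once with an optimize call.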

I'm pretty sure optimize also blocks adds in previous versions of Solr as well -
it grabs the commit lock. It won't do that in Solr 4, but that is another 
reason I wouldn't recommend it under normal circumstances.

I look at optimize as a last option, or when creating a static index personally.

 
 Thanks,
 Isan Fulia.
 
 
 
 On 14 November 2011 11:28, Kalika Mishra kalika.mis...@germinait.com wrote:
 
 Hi Mark,
 
 Thanks for your reply.
 
 What you are saying is interesting; so are you suggesting that
 optimization should usually be done when there are not many updates?
 Also, can you please point out under what conditions optimization might be
 beneficial?
 
 Thanks.
 
 On 11 November 2011 20:30, Mark Miller markrmil...@gmail.com wrote:
 
 I would not optimize - it's very expensive. With 11,000 updates a day, I
 think it makes sense to completely avoid optimizing.
 
 That should be your default move in any case. If you notice performance
 suffers more than is acceptable (good chance you won't), then I'd use a
 lower merge factor. It defaults to 10 - lower numbers will lower the
 number
 of segments in your index, and essentially amortize the cost of an
 optimize.
 
 Optimize is generally only useful when you will have a mostly static
 index.
 
 - Mark Miller
 lucidimagination.com
 
 
 On Nov 11, 2011, at 9:12 AM, Kalika Mishra wrote:
 
 Hi Mark,
 
 We are performing almost 11,000 updates a day and have around 50 million
 docs in the index (I understand we will need to shard), so the core segments
 will get fragmented over a period of time. We will need to optimize every
 few days or once a month; do you have any reason not to optimize the core?
 Please let me know.
 
 Thanks.
 
 On 11 November 2011 18:51, Mark Miller markrmil...@gmail.com wrote:
 
 Do you have something forcing you to optimize, or are you just doing it
 for the heck of it?
 
 On Nov 11, 2011, at 7:50 AM, Kalika Mishra wrote:
 
 Hi,
 
 I would like to optimize a Solr core which is in Reader/Writer mode.
 Since the Solr cores are huge (above 100 GB), the optimization takes
 hours to complete.

 While the optimization is going on, say on the Writer core, the
 application wants to continue using the indexes for both query and write
 purposes. What is the best approach to do this?

 I was thinking of using a temporary index (empty core) to write the
 documents and using the same Reader to read the documents. (Please note
 that the temp index and the Reader cannot be made Reader/Writer, as the
 Reader is already set up for the Writer on which optimization is taking
 place.) But there could be some updates to the temp index which I would
 like to get reflected in the Reader. What's the best setup to support
 this?
 
 Thanks,
 Kalika
 
 - Mark Miller
 lucidimagination.com

 --
 Thanks & Regards,
 Kalika

 --
 Thanks & Regards,
 Kalika

 -- 
 Thanks & Regards,
 Isan Fulia.

- Mark Miller
lucidimagination.com


Re: Using solr during optimization

2011-11-13 Thread Kalika Mishra
Hi Mark,

Thanks for your reply.

What you are saying is interesting; so are you suggesting that optimization
should usually be done when there are not many updates? Also, can you please
point out under what conditions optimization might be beneficial?

Thanks.

On 11 November 2011 20:30, Mark Miller markrmil...@gmail.com wrote:

 I would not optimize - it's very expensive. With 11,000 updates a day, I
 think it makes sense to completely avoid optimizing.

 That should be your default move in any case. If you notice performance
 suffers more than is acceptable (good chance you won't), then I'd use a
 lower merge factor. It defaults to 10 - lower numbers will lower the number
 of segments in your index, and essentially amortize the cost of an optimize.

 Optimize is generally only useful when you will have a mostly static index.

 - Mark Miller
 lucidimagination.com


 On Nov 11, 2011, at 9:12 AM, Kalika Mishra wrote:

  Hi Mark,
 
  We are performing almost 11,000 updates a day and have around 50 million
  docs in the index (I understand we will need to shard), so the core
  segments will get fragmented over a period of time. We will need to
  optimize every few days or once a month; do you have any reason not to
  optimize the core? Please let me know.
 
  Thanks.
 
  On 11 November 2011 18:51, Mark Miller markrmil...@gmail.com wrote:
 
  Do you have something forcing you to optimize, or are you just doing it
  for the heck of it?
 
  On Nov 11, 2011, at 7:50 AM, Kalika Mishra wrote:
 
  Hi,
 
  I would like to optimize a Solr core which is in Reader/Writer mode.
  Since the Solr cores are huge (above 100 GB), the optimization takes
  hours to complete.

  While the optimization is going on, say on the Writer core, the
  application wants to continue using the indexes for both query and write
  purposes. What is the best approach to do this?

  I was thinking of using a temporary index (empty core) to write the
  documents and using the same Reader to read the documents. (Please note
  that the temp index and the Reader cannot be made Reader/Writer, as the
  Reader is already set up for the Writer on which optimization is taking
  place.) But there could be some updates to the temp index which I would
  like to get reflected in the Reader. What's the best setup to support
  this?
 
  Thanks,
  Kalika
 
   - Mark Miller
   lucidimagination.com

   --
   Thanks & Regards,
   Kalika

-- 
Thanks & Regards,
Kalika


Using solr during optimization

2011-11-11 Thread Kalika Mishra
Hi,

I would like to optimize a Solr core which is in Reader/Writer mode. Since
the Solr cores are huge (above 100 GB), the optimization takes hours to
complete.
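
(For context, the optimize itself is just the standard update-handler call,
something like the following, with the host and core name as placeholders:

  <!-- POSTed to http://localhost:8983/solr/writer_core/update -->
  <optimize waitSearcher="true"/>

and on an index this size the merge down to one segment is what takes hours.)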

While the optimization is going on, say on the Writer core, the application
wants to continue using the indexes for both query and write purposes. What
is the best approach to do this?

I was thinking of using a temporary index (empty core) to write the
documents and using the same Reader to read the documents. (Please note that
the temp index and the Reader cannot be made Reader/Writer, as the Reader is
already set up for the Writer on which optimization is taking place.) But
there could be some updates to the temp index which I would like to get
reflected in the Reader. What's the best setup to support this?

Thanks,
Kalika


Re: Using solr during optimization

2011-11-11 Thread Mark Miller
Do you have something forcing you to optimize, or are you just doing it for
the heck of it?

On Nov 11, 2011, at 7:50 AM, Kalika Mishra wrote:

 Hi,
 
 I would like to optimize a Solr core which is in Reader/Writer mode. Since
 the Solr cores are huge (above 100 GB), the optimization takes hours to
 complete.

 While the optimization is going on, say on the Writer core, the application
 wants to continue using the indexes for both query and write purposes. What
 is the best approach to do this?

 I was thinking of using a temporary index (empty core) to write the
 documents and using the same Reader to read the documents. (Please note that
 the temp index and the Reader cannot be made Reader/Writer, as the Reader is
 already set up for the Writer on which optimization is taking place.) But
 there could be some updates to the temp index which I would like to get
 reflected in the Reader. What's the best setup to support this?
 
 Thanks,
 Kalika

- Mark Miller
lucidimagination.com


Re: Using solr during optimization

2011-11-11 Thread Kalika Mishra
Hi Mark,

We are performing almost 11,000 updates a day and have around 50 million
docs in the index (I understand we will need to shard), so the core segments
will get fragmented over a period of time. We will need to optimize every
few days or once a month; do you have any reason not to optimize the core?
Please let me know.

Thanks.

On 11 November 2011 18:51, Mark Miller markrmil...@gmail.com wrote:

 Do you have something forcing you to optimize, or are you just doing it
 for the heck of it?

 On Nov 11, 2011, at 7:50 AM, Kalika Mishra wrote:

  Hi,
 
  I would like to optimize a Solr core which is in Reader/Writer mode. Since
  the Solr cores are huge (above 100 GB), the optimization takes hours to
  complete.

  While the optimization is going on, say on the Writer core, the
  application wants to continue using the indexes for both query and write
  purposes. What is the best approach to do this?

  I was thinking of using a temporary index (empty core) to write the
  documents and using the same Reader to read the documents. (Please note
  that the temp index and the Reader cannot be made Reader/Writer, as the
  Reader is already set up for the Writer on which optimization is taking
  place.) But there could be some updates to the temp index which I would
  like to get reflected in the Reader. What's the best setup to support
  this?
 
  Thanks,
  Kalika

 - Mark Miller
 lucidimagination.com

-- 
Thanks & Regards,
Kalika


Re: Using solr during optimization

2011-11-11 Thread Mark Miller
I would not optimize - it's very expensive. With 11,000 updates a day, I think 
it makes sense to completely avoid optimizing.

That should be your default move in any case. If you notice performance suffers 
more than is acceptable (good chance you won't), then I'd use a lower merge 
factor. It defaults to 10 - lower numbers will lower the number of segments in 
your index, and essentially amortize the cost of an optimize.
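
For reference, that knob lives in solrconfig.xml; a minimal sketch, with 5
purely as an example value to experiment with (in 3.x it sits under
<indexDefaults>/<mainIndex>; on trunk the same idea is expressed through the
merge policy settings instead):

  <indexDefaults>
    <mergeFactor>5</mergeFactor>
  </indexDefaults>

Lower values mean smaller, more frequent merges - fewer segments to search
against, at some indexing cost.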

Optimize is generally only useful when you will have a mostly static index.

- Mark Miller
lucidimagination.com


On Nov 11, 2011, at 9:12 AM, Kalika Mishra wrote:

 Hi Mark,
 
 We are performing almost 11,000 updates a day and have around 50 million
 docs in the index (I understand we will need to shard), so the core segments
 will get fragmented over a period of time. We will need to optimize every
 few days or once a month; do you have any reason not to optimize the core?
 Please let me know.
 
 Thanks.
 
 On 11 November 2011 18:51, Mark Miller markrmil...@gmail.com wrote:
 
 Do you have something forcing you to optimize, or are you just doing it
 for the heck of it?
 
 On Nov 11, 2011, at 7:50 AM, Kalika Mishra wrote:
 
 Hi,
 
 I would like to optimize a Solr core which is in Reader/Writer mode. Since
 the Solr cores are huge (above 100 GB), the optimization takes hours to
 complete.

 While the optimization is going on, say on the Writer core, the
 application wants to continue using the indexes for both query and write
 purposes. What is the best approach to do this?

 I was thinking of using a temporary index (empty core) to write the
 documents and using the same Reader to read the documents. (Please note
 that the temp index and the Reader cannot be made Reader/Writer, as the
 Reader is already set up for the Writer on which optimization is taking
 place.) But there could be some updates to the temp index which I would
 like to get reflected in the Reader. What's the best setup to support
 this?
 
 Thanks,
 Kalika
 
 - Mark Miller
 lucidimagination.com

 -- 
 Thanks & Regards,
 Kalika