Re: Using solr during optimization
Hi Mark,

Thanks for the reply. You are right. We need to test first by decreasing the merge factor and measuring the indexing as well as searching performance, so we have some numbers in hand. We also need to see, after a partial optimize with the same merge factor, how long the performance gain lasts (for both searching and indexing) while continuously adding more documents.

Thanks,
Isan Fulia

On 14 November 2011 19:41, Mark Miller markrmil...@gmail.com wrote:

> On Nov 14, 2011, at 8:27 AM, Isan Fulia wrote:
>
>> Hi Mark, In the above case, what if the index is optimized partly, i.e. by specifying the max number of segments we want? It has been observed that after optimizing (even a partial optimization), indexing as well as searching is faster than with an unoptimized index.
>
> Yes, this remains true - searching against fewer segments is faster than searching against many segments. Unless you have a really high merge factor, this is just generally not a big deal IMO. It tends to be something like: a given query is, say, 10-30% slower. If you have good performance, though, this often means a 50ms query goes to 80 or 90ms. You really have to decide/test whether there is a practical difference to your users. You should also pay attention to how long that perf improvement lasts while you are continuously adding more documents. Is it a super high cost for a short perf boost?
>
>> Decreasing the merge factor will affect performance, as it will increase the indexing time due to the frequent merges.
>
> True - it will essentially amortize the cost of reducing segments. Have you tested lower merge factors, though? Does it really slow down indexing to the point where you find it unacceptable? I've been surprised in the past. Usually you can find a pretty nice balance.
>
>> So is it good that we optimize partly (let's say once a month), rather than decrease the merge factor and affect the indexing speed? Also, since we will be sharding, that 100 GB index will be divided into different shards.
>
> Partial optimize is a good option, and optimize is an option. They both exist for a reason ;) Many people pay the price because they assume they have to, though, when they really have no practical need. Generally, the best way to manage the number of segments in your index is through the merge policy IMO - not necessarily optimize calls. I'm pretty sure optimize also blocks adds in previous versions of Solr as well - it grabs the commit lock. It won't do that in Solr 4, but that is another reason I wouldn't recommend it under normal circumstances. I look at optimize as a last option, or for when creating a static index, personally.
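For reference, the "partial optimize" discussed above is exposed through the maxSegments attribute of the optimize update message in Solr; a minimal sketch (the segment count of 4 is just an example, not a recommendation):

```xml
<!-- Update message POSTed to the core's /update handler: merge the index
     down to at most 4 segments, rather than all the way down to 1 as a
     full optimize would. -->
<optimize maxSegments="4"/>
```

The same effect can be had with the optimize=true and maxSegments request parameters on an update request.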
Re: Using solr during optimization
Hi Mark,

In the above case, what if the index is optimized partly, i.e. by specifying the max number of segments we want? It has been observed that after optimizing (even a partial optimization), indexing as well as searching is faster than with an unoptimized index. Decreasing the merge factor will affect performance, as it will increase the indexing time due to the frequent merges. So is it better to optimize partly (let's say once a month), rather than decrease the merge factor and affect the indexing speed? Also, since we will be sharding, that 100 GB index will be divided into different shards.

Thanks,
Isan Fulia

On 14 November 2011 11:28, Kalika Mishra kalika.mis...@germinait.com wrote:

> Hi Mark, Thanks for your reply. What you're saying is interesting; so are you suggesting that optimizations should usually be done when there are not many updates? Also, can you please point out under what conditions optimization might be beneficial? Thanks.

--
Thanks Regards,
Isan Fulia
Re: Using solr during optimization
On Nov 14, 2011, at 8:27 AM, Isan Fulia wrote:

> Hi Mark, In the above case, what if the index is optimized partly, i.e. by specifying the max number of segments we want? It has been observed that after optimizing (even a partial optimization), indexing as well as searching is faster than with an unoptimized index.

Yes, this remains true - searching against fewer segments is faster than searching against many segments. Unless you have a really high merge factor, this is just generally not a big deal IMO. It tends to be something like: a given query is, say, 10-30% slower. If you have good performance, though, this often means a 50ms query goes to 80 or 90ms. You really have to decide/test whether there is a practical difference to your users. You should also pay attention to how long that perf improvement lasts while you are continuously adding more documents. Is it a super high cost for a short perf boost?

> Decreasing the merge factor will affect performance, as it will increase the indexing time due to the frequent merges.

True - it will essentially amortize the cost of reducing segments. Have you tested lower merge factors, though? Does it really slow down indexing to the point where you find it unacceptable? I've been surprised in the past. Usually you can find a pretty nice balance.

> So is it good that we optimize partly (let's say once a month), rather than decrease the merge factor and affect the indexing speed? Also, since we will be sharding, that 100 GB index will be divided into different shards.

Partial optimize is a good option, and optimize is an option. They both exist for a reason ;) Many people pay the price because they assume they have to, though, when they really have no practical need. Generally, the best way to manage the number of segments in your index is through the merge policy IMO - not necessarily optimize calls. I'm pretty sure optimize also blocks adds in previous versions of Solr as well - it grabs the commit lock. It won't do that in Solr 4, but that is another reason I wouldn't recommend it under normal circumstances. I look at optimize as a last option, or for when creating a static index, personally.

- Mark Miller
lucidimagination.com
Re: Using solr during optimization
Hi Mark,

Thanks for your reply. What you're saying is interesting; so are you suggesting that optimizations should usually be done when there are not many updates? Also, can you please point out under what conditions optimization might be beneficial?

Thanks.

On 11 November 2011 20:30, Mark Miller markrmil...@gmail.com wrote:

> I would not optimize - it's very expensive. With 11,000 updates a day, I think it makes sense to completely avoid optimizing. That should be your default move in any case. If you notice performance suffers more than is acceptable (good chance you won't), then I'd use a lower merge factor. It defaults to 10 - lower numbers will lower the number of segments in your index, and essentially amortize the cost of an optimize. Optimize is generally only useful when you will have a mostly static index.
>
> - Mark Miller
> lucidimagination.com

--
Thanks Regards,
Kalika
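Mark's point that lower merge factors leave fewer segments can be illustrated with a toy model of Lucene's logarithmic merge policy. This is a simplification for intuition only - the real LogByteSizeMergePolicy buckets segments by size rather than by an exact level counter:

```python
def segments_after(num_flushes, merge_factor):
    """Toy model of a logarithmic merge policy: each flush creates a
    level-0 segment; whenever merge_factor segments share a level, they
    are merged into one segment at the next level up. Returns how many
    segments remain after num_flushes flushes."""
    segments = []  # each entry is a segment's merge level
    for _ in range(num_flushes):
        segments.append(0)
        merged = True
        while merged:
            merged = False
            for level in set(segments):
                if segments.count(level) >= merge_factor:
                    # merge merge_factor same-level segments into one
                    for _ in range(merge_factor):
                        segments.remove(level)
                    segments.append(level + 1)
                    merged = True
                    break
    return len(segments)

# Lower merge factor -> fewer segments left to search, at the cost of
# more merge work while indexing.
print(segments_after(995, 10))  # 23 segments with the default factor of 10
print(segments_after(995, 3))   # 7 segments with a factor of 3
```

In this model the segment count is the digit sum of the flush count in base merge_factor, which is why lowering the factor shrinks the number of segments a query must visit.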
Using solr during optimization
Hi,

I would like to optimize a Solr core which is in Reader/Writer mode. Since the Solr cores are huge (above 100 GB), the optimization takes hours to complete. While the optimization is going on, say, on the Writer core, the application wants to continue using the indexes for both query and write purposes. What is the best approach to do this?

I was thinking of using a temporary index (empty core) to write the documents and using the same Reader to read the documents. (Please note that the temp index and the Reader cannot be made a Reader/Writer pair, as the Reader is already set up for the Writer on which optimization is taking place.) But there could be some updates to the temp index which I would like to get reflected in the Reader. What's the best setup to support this?

Thanks,
Kalika
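One way to fold a temporary core's updates back into the main index afterwards is the CoreAdmin mergeindexes action (available since Solr 1.4). A sketch only - the host, core names, and path below are placeholders, and the source index must have been committed and not be receiving writes while the merge runs:

```shell
# Merge the temp core's on-disk index into the "main" core
# (host, core names, and path are illustrative, not from the thread).
curl 'http://localhost:8983/solr/admin/cores?action=mergeindexes&core=main&indexDir=/path/to/solr/temp/data/index'

# Commit on the target core so the merged documents become searchable.
curl 'http://localhost:8983/solr/main/update?commit=true'
```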
Re: Using solr during optimization
Do you have something forcing you to optimize, or are you just doing it for the heck of it?

- Mark Miller
lucidimagination.com
Re: Using solr during optimization
Hi Mark,

We are performing almost 11,000 updates a day, and we have around 50 million docs in the index (I understand we will need to shard), so the core's segments will get fragmented over a period of time. We would need to optimize every few days or once a month; do you have any reason not to optimize the core? Please let me know.

Thanks.

On 11 November 2011 18:51, Mark Miller markrmil...@gmail.com wrote:

> Do you have something forcing you to optimize, or are you just doing it for the heck of it?

--
Thanks Regards,
Kalika
Re: Using solr during optimization
I would not optimize - it's very expensive. With 11,000 updates a day, I think it makes sense to completely avoid optimizing. That should be your default move in any case. If you notice performance suffers more than is acceptable (good chance you won't), then I'd use a lower merge factor. It defaults to 10 - lower numbers will lower the number of segments in your index, and essentially amortize the cost of an optimize. Optimize is generally only useful when you will have a mostly static index.

- Mark Miller
lucidimagination.com

On Nov 11, 2011, at 9:12 AM, Kalika Mishra wrote:

> Hi Mark, We are performing almost 11,000 updates a day, and we have around 50 million docs in the index (I understand we will need to shard), so the core's segments will get fragmented over a period of time. We would need to optimize every few days or once a month; do you have any reason not to optimize the core? Please let me know. Thanks.
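The default of 10 that Mark mentions is the mergeFactor setting in solrconfig.xml (for the Solr 3.x era discussed here, it commonly lives under indexDefaults or mainIndex). A sketch of lowering it - the value 4 is just an example to benchmark, not a recommendation:

```xml
<!-- solrconfig.xml (Solr 3.x): a lower mergeFactor keeps fewer segments
     in the index, at the cost of more merge work during indexing. -->
<indexDefaults>
  <mergeFactor>4</mergeFactor>
</indexDefaults>
```

As suggested earlier in the thread, it is worth measuring both indexing throughput and query latency at a couple of values before committing to one.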