Re: IndexWriter#setRAMBufferSizeMB removed in trunk
Is it really that hard to recreate IndexWriter if you have to change the settings? Yeah, yeah, you lose all your precious reused buffers, and maybe there's a small indexing latency spike when switching from the old IW to the new one, but people aren't changing their IW configs several times a second, are they?

I suggest banning as many runtime-mutable settings as humanly possible, and asking people to recreate objects for reconfiguration, be it IW, IR, Analyzers, whatnot.

On Thu, Mar 10, 2011 at 23:07, Michael McCandless luc...@mikemccandless.com wrote:
> On Thu, Mar 10, 2011 at 7:28 AM, Robert Muir rcm...@gmail.com wrote:
>> This should block the release: if IndexWriterConfig is a broken design then we need to revert this now before it's released, not make users switch over and then undeprecate/revert in a future release.
>
> +1
>
> I think we have to sort this out, one way or another, before releasing 3.1.
>
> I really don't like splitting setters across IWC vs IW. That'll just cause confusion, and noise over time as we change our minds about where things belong.
>
> Looking through IWC, it seems that most setters can be done live. In fact, setRAMBufferSizeMB is *almost* live: all places in IW that use this pull it from the config, except for DocumentsWriter. We could just push the config down to DW and have it pull live too...
>
> Other settings are not pulled live but for no good reason, e.g. termsIndexInterval is copied to a private field in IW but could just as easily be pulled when it's time to write a new segment...
>
> Maybe we should simply document which settings are live vs only take effect at init time?
>
> Mike

--
Kirill Zakharenko/Кирилл Захаренко
E-Mail/Jabber: ear...@gmail.com
Phone: +7 (495) 683-567-4
ICQ: 104465785

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
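The "recreate instead of mutate" approach suggested in this message might look roughly like the sketch below. It is only an illustration: `SketchWriter` and `WriterHolder` are hypothetical stand-ins, not Lucene classes. The old writer is closed before the new one is created because Lucene allows only one IndexWriter to hold the write lock on a directory at a time.

```java
// Hypothetical stand-in for an index writer; this is NOT Lucene's IndexWriter.
final class SketchWriter implements AutoCloseable {
    private final double ramBufferSizeMB;
    private volatile boolean closed = false;

    SketchWriter(double ramBufferSizeMB) {
        this.ramBufferSizeMB = ramBufferSizeMB;
    }

    double getRAMBufferSizeMB() { return ramBufferSizeMB; }

    boolean isClosed() { return closed; }

    @Override
    public void close() {
        closed = true; // a real writer would flush and release its resources here
    }
}

// Owns the current writer and replaces it when the configuration changes.
final class WriterHolder {
    private SketchWriter current;

    WriterHolder(double ramBufferSizeMB) {
        current = new SketchWriter(ramBufferSizeMB);
    }

    synchronized SketchWriter get() { return current; }

    // Reconfigure by swapping writers instead of mutating the live one.
    synchronized void reconfigure(double newRamBufferSizeMB) {
        current.close();                                // old writer goes away first
        current = new SketchWriter(newRamBufferSizeMB); // fresh writer, new settings
    }
}
```

Callers that still hold a reference obtained from `get()` before the swap end up with a closed writer, which is one source of the latency spike and the bookkeeping cost mentioned above.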
Re: IndexWriter#setRAMBufferSizeMB removed in trunk
I agree. After IWC, the only setter left in IW is setInfoStream, which makes sense. But the rest ... assuming these config changes don't happen very often, recreating IW doesn't sound like a big thing to me. The alternative of complicating IWC to support runtime changes -- we need to be absolutely sure it's worth it.

Also, if the solution is to allow changing IWC (runtime) settings, then I don't think this issue should block 3.1? We can anyway add other runtime settings following 3.1, and we won't undeprecate anything. So maybe mark that issue as a non-blocker?

Shai

On Fri, Mar 11, 2011 at 2:20 PM, Earwin Burrfoot ear...@gmail.com wrote:
> Is it really that hard to recreate IndexWriter if you have to change the settings? Yeah, yeah, you lose all your precious reused buffers, and maybe there's a small indexing latency spike when switching from the old IW to the new one, but people aren't changing their IW configs several times a second, are they?
>
> I suggest banning as many runtime-mutable settings as humanly possible, and asking people to recreate objects for reconfiguration, be it IW, IR, Analyzers, whatnot.
>
> On Thu, Mar 10, 2011 at 23:07, Michael McCandless luc...@mikemccandless.com wrote:
>> On Thu, Mar 10, 2011 at 7:28 AM, Robert Muir rcm...@gmail.com wrote:
>>> This should block the release: if IndexWriterConfig is a broken design then we need to revert this now before it's released, not make users switch over and then undeprecate/revert in a future release.
>>
>> +1
>>
>> I think we have to sort this out, one way or another, before releasing 3.1.
>>
>> I really don't like splitting setters across IWC vs IW. That'll just cause confusion, and noise over time as we change our minds about where things belong.
>>
>> Looking through IWC, it seems that most setters can be done live. In fact, setRAMBufferSizeMB is *almost* live: all places in IW that use this pull it from the config, except for DocumentsWriter. We could just push the config down to DW and have it pull live too...
>>
>> Other settings are not pulled live but for no good reason, e.g. termsIndexInterval is copied to a private field in IW but could just as easily be pulled when it's time to write a new segment...
>>
>> Maybe we should simply document which settings are live vs only take effect at init time?
>>
>> Mike
Re: IndexWriter#setRAMBufferSizeMB removed in trunk
Thanks for your support, but I don't think setInfoStream makes any sense either :) Do we /change/ infoStreams for IW at runtime? Why can't we pass it as a constructor argument/IWC field?

Ok, just maybe, I can imagine a case where a certain app runs happily, then misbehaves, and then you, with some clever trickery, supply it a fresh infoStream to capture the problem live, without restarting. So, just maybe, we should leave setInfoStream as is.

2011/3/11 Shai Erera ser...@gmail.com:
> I agree. After IWC, the only setter left in IW is setInfoStream, which makes sense. But the rest ... assuming these config changes don't happen very often, recreating IW doesn't sound like a big thing to me. The alternative of complicating IWC to support runtime changes -- we need to be absolutely sure it's worth it.
>
> Also, if the solution is to allow changing IWC (runtime) settings, then I don't think this issue should block 3.1? We can anyway add other runtime settings following 3.1, and we won't undeprecate anything. So maybe mark that issue as a non-blocker?
>
> Shai
>
> On Fri, Mar 11, 2011 at 2:20 PM, Earwin Burrfoot ear...@gmail.com wrote:
>> Is it really that hard to recreate IndexWriter if you have to change the settings? Yeah, yeah, you lose all your precious reused buffers, and maybe there's a small indexing latency spike when switching from the old IW to the new one, but people aren't changing their IW configs several times a second, are they?
>>
>> I suggest banning as many runtime-mutable settings as humanly possible, and asking people to recreate objects for reconfiguration, be it IW, IR, Analyzers, whatnot.
>>
>> On Thu, Mar 10, 2011 at 23:07, Michael McCandless luc...@mikemccandless.com wrote:
>>> On Thu, Mar 10, 2011 at 7:28 AM, Robert Muir rcm...@gmail.com wrote:
>>>> This should block the release: if IndexWriterConfig is a broken design then we need to revert this now before it's released, not make users switch over and then undeprecate/revert in a future release.
>>>
>>> +1
>>>
>>> I think we have to sort this out, one way or another, before releasing 3.1.
>>>
>>> I really don't like splitting setters across IWC vs IW. That'll just cause confusion, and noise over time as we change our minds about where things belong.
>>>
>>> Looking through IWC, it seems that most setters can be done live. In fact, setRAMBufferSizeMB is *almost* live: all places in IW that use this pull it from the config, except for DocumentsWriter. We could just push the config down to DW and have it pull live too...
>>>
>>> Other settings are not pulled live but for no good reason, e.g. termsIndexInterval is copied to a private field in IW but could just as easily be pulled when it's time to write a new segment...
>>>
>>> Maybe we should simply document which settings are live vs only take effect at init time?
>>>
>>> Mike

--
Kirill Zakharenko/Кирилл Захаренко
E-Mail/Jabber: ear...@gmail.com
Phone: +7 (495) 683-567-4
ICQ: 104465785
Re: IndexWriter#setRAMBufferSizeMB removed in trunk
Hi Shay,

It sounds like we should put this (ability to change RAM buffer on the fly) back.

But, can you describe how/why you need this? Is it because you have many IWs open at once and you want to dynamically change which gets to use RAM?

Are there other settings that were moved to IWC that you also dynamically change today...?

Can you open an issue? Make sure it's marked fix 4.0/3.2! Thanks.

Mike

On Wed, Mar 9, 2011 at 1:01 AM, Shay Banon kim...@gmail.com wrote:
> Heya,
>
> I think the ability to change the RAMBufferSizeMB on a live IndexWriter (without the need to close and open it) is an important one, and it seems like it's deprecated in 3.1 and removed in trunk. Is there a chance to get it back?
>
> -shay.banon

--
Mike http://blog.mikemccandless.com
Re: IndexWriter#setRAMBufferSizeMB removed in trunk
On Thu, Mar 10, 2011 at 6:49 AM, Michael McCandless luc...@mikemccandless.com wrote:
> Can you open an issue? Make sure it's marked fix 4.0/3.2! Thanks.

I'm not sure we should handle it this way: I really don't like deprecation in one release and undeprecation in another.

So, I think we should open an issue for 3.1 and figure out if we want to do this for setters at all. If we decide to start moving setters out of IndexWriterConfig, then we need to start asking very hard questions about IndexWriterConfig as a whole, because I think it will be confusing if IndexWriter has two separate configuration APIs.

This should block the release: if IndexWriterConfig is a broken design then we need to revert this now before it's released, not make users switch over and then undeprecate/revert in a future release.
Re: IndexWriter#setRAMBufferSizeMB removed in trunk
Hi,

On Thursday, March 10, 2011 at 1:49 PM, Michael McCandless wrote:
> Hi Shay,
>
> It sounds like we should put this (ability to change RAM buffer on the fly) back.
>
> But, can you describe how/why you need this? Is it because you have many IWs open at once and you want to dynamically change which gets to use RAM?

Exactly. In elasticsearch, there can be several shards (each a Lucene index) running in the same VM. You can configure that you want 10% of the heap to be allocated to indexing, and it will automatically distribute it between all the shards by dynamically changing that value on each IndexWriter.

> Are there other settings that were moved to IWC that you also dynamically change today...?

I think most can, and should, be set on the MergePolicy itself. The two that I miss as well are setting the term index interval and the reader terms divisor.

> Can you open an issue? Make sure it's marked fix 4.0/3.2! Thanks.

Done: https://issues.apache.org/jira/browse/LUCENE-2960.

> Mike
>
> On Wed, Mar 9, 2011 at 1:01 AM, Shay Banon kim...@gmail.com wrote:
>> Heya,
>>
>> I think the ability to change the RAMBufferSizeMB on a live IndexWriter (without the need to close and open it) is an important one, and it seems like it's deprecated in 3.1 and removed in trunk. Is there a chance to get it back?
>>
>> -shay.banon
>
> --
> Mike http://blog.mikemccandless.com
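The heap-share distribution described in this message can be sketched roughly as follows. This is not elasticsearch's actual code: `IndexingBufferAllocator`, `perShardBufferMB`, and the even split between shards are assumptions made for illustration. The returned value is what one would hand to a live RAM-buffer setter on each shard's IndexWriter.

```java
// Hypothetical sketch: split a fixed share of the heap evenly across shards.
// None of these names come from Lucene or elasticsearch.
final class IndexingBufferAllocator {
    private final long heapBytes;
    private final double indexingShare; // e.g. 0.10 for 10% of the heap

    IndexingBufferAllocator(long heapBytes, double indexingShare) {
        if (heapBytes <= 0 || indexingShare <= 0.0 || indexingShare > 1.0) {
            throw new IllegalArgumentException("invalid heap size or share");
        }
        this.heapBytes = heapBytes;
        this.indexingShare = indexingShare;
    }

    /** RAM buffer size, in MB, that each active shard should get. */
    double perShardBufferMB(int activeShards) {
        if (activeShards <= 0) {
            throw new IllegalArgumentException("activeShards must be positive");
        }
        double totalMB = heapBytes * indexingShare / (1024.0 * 1024.0);
        return totalMB / activeShards;
    }
}
```

Whenever a shard is opened or closed, the per-shard value changes for every remaining writer, which is why reopening each IndexWriter on every change would be costly in this setup.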
Re: IndexWriter#setRAMBufferSizeMB removed in trunk
I am not sure that IndexWriterConfig is bad. It's nice to be able to set all the upfront configuration in a single object and pass it to the IndexWriter, and to have the IndexWriter offer specific setters for real-time changes (those should not be done through the IndexWriterConfig). The question is which real-time changes are allowed and which are not.

The fact that they are separated (IndexWriterConfig and real-time setters) is good, since it lets you distinguish between what can be set when setting up an IndexWriter and what can be set in real time. We did not have this distinction before IndexWriterConfig was introduced. This opens the door for optimizations for things that can only be set when constructing an IndexWriter. Usually, supporting real-time changes can hinder concurrency, while parameters that are basically immutable allow for optimization.

-shay.banon

On Thursday, March 10, 2011 at 2:28 PM, Robert Muir wrote:
> On Thu, Mar 10, 2011 at 6:49 AM, Michael McCandless luc...@mikemccandless.com wrote:
>> Can you open an issue? Make sure it's marked fix 4.0/3.2! Thanks.
>
> I'm not sure we should handle it this way: I really don't like deprecation in one release and undeprecation in another.
>
> So, I think we should open an issue for 3.1 and figure out if we want to do this for setters at all. If we decide to start moving setters out of IndexWriterConfig, then we need to start asking very hard questions about IndexWriterConfig as a whole, because I think it will be confusing if IndexWriter has two separate configuration APIs.
>
> This should block the release: if IndexWriterConfig is a broken design then we need to revert this now before it's released, not make users switch over and then undeprecate/revert in a future release.
Re: IndexWriter#setRAMBufferSizeMB removed in trunk
On Thu, Mar 10, 2011 at 7:41 AM, Shay Banon kim...@gmail.com wrote:
> I am not sure that IndexWriterConfig is bad. It's nice to be able to set all the upfront configuration in a single object and pass it to the IndexWriter, and to have the IndexWriter offer specific setters for real-time changes (those should not be done through the IndexWriterConfig). The question is which real-time changes are allowed and which are not.
>
> The fact that they are separated (IndexWriterConfig and real-time setters) is good, since it lets you distinguish between what can be set when setting up an IndexWriter and what can be set in real time. We did not have this distinction before IndexWriterConfig was introduced. This opens the door for optimizations for things that can only be set when constructing an IndexWriter. Usually, supporting real-time changes can hinder concurrency, while parameters that are basically immutable allow for optimization.
>
> -shay.banon

I disagree that it's good if things are separate...

Instead of API confusion, I think I would prefer a single method on IW that makes a best-effort attempt to apply any realtime setters.

This way we can avoid constant deprecation and undeprecation between these APIs. Instead, whether something can be changed on the fly is only a documentation issue.
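One way to read the "single best-effort method" idea is sketched below. The message does not specify a signature, so everything here is assumed: `BestEffortWriter`, `applySettings`, the map of setting names, and the choice to report ignored settings back to the caller are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of a "best effort" reconfiguration method; not a Lucene API.
final class BestEffortWriter {
    private volatile double ramBufferSizeMB = 16.0; // illustrative default
    private final int termIndexInterval = 128;      // fixed at construction in this sketch

    /**
     * Applies the requested settings that can change on a live writer and
     * returns the names of the settings that were ignored.
     */
    List<String> applySettings(Map<String, Number> requested) {
        List<String> ignored = new ArrayList<>();
        for (Map.Entry<String, Number> entry : requested.entrySet()) {
            if ("ramBufferSizeMB".equals(entry.getKey())) {
                ramBufferSizeMB = entry.getValue().doubleValue(); // live setting
            } else {
                ignored.add(entry.getKey()); // not changeable after construction
            }
        }
        return ignored;
    }

    double getRAMBufferSizeMB() { return ramBufferSizeMB; }

    int getTermIndexInterval() { return termIndexInterval; }
}
```

With a shape like this, which settings are live becomes a matter of documentation plus the returned list, rather than of which class a setter lives on.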
Re: IndexWriter#setRAMBufferSizeMB removed in trunk
IWC simplified IW creation - now there is only one ctor, where before there were multiple ones, and some settings could only be changed after IW was created.

With IWC, our code is (can become) simpler -- e.g. RAM buffer size, if specified up front, is one thing, but if it's dynamic, we need to have code which dynamically increases or decreases it. Increasing is not the problem, but decreasing requires special code that flushes and discards the extra memory. Maybe the code already exists, I haven't checked.

I don't like setters that are all over the place either. Having said that though, today the setters are inconsistent -- some are 'static' (meaning they cannot change after IW is created) while some are dynamic, such as the MergePolicy settings, because MP responds to those setters.

One thing we can do is keep all the setters in IWC and have IW pass itself to IWC after creation. Then, we can modify certain settings in IWC to notify IW of these changes. But it's complicated.

Another thing is to separate some runtime settings from IWC and include them in IW, like we do for MP ... that's what's been suggested. But then, what is a 'runtime' setting? Someone can decide to have IndexDeletionPolicy change 'on-the-fly' in his app -- does it make sense that we make IDP a runtime setting? I don't think so. In fact, I don't think RAM buffer is changed that dynamically by applications (or any other setter). elasticsearch may have a use case where it's needed, that's ok. If this setting does not change very often, it can still close IW and reopen it with the new settings, right?

A third solution is to keep IWC for construction time, but introduce the setters back on IW for runtime changes. That way we keep the IW ctor simple but still allow apps to change on-the-fly settings. We'll dup setters, which I don't like ...

Shai

On Thu, Mar 10, 2011 at 2:47 PM, Robert Muir rcm...@gmail.com wrote:
> On Thu, Mar 10, 2011 at 7:41 AM, Shay Banon kim...@gmail.com wrote:
>> I am not sure that IndexWriterConfig is bad. It's nice to be able to set all the upfront configuration in a single object and pass it to the IndexWriter, and to have the IndexWriter offer specific setters for real-time changes (those should not be done through the IndexWriterConfig). The question is which real-time changes are allowed and which are not.
>>
>> The fact that they are separated (IndexWriterConfig and real-time setters) is good, since it lets you distinguish between what can be set when setting up an IndexWriter and what can be set in real time. We did not have this distinction before IndexWriterConfig was introduced. This opens the door for optimizations for things that can only be set when constructing an IndexWriter. Usually, supporting real-time changes can hinder concurrency, while parameters that are basically immutable allow for optimization.
>>
>> -shay.banon
>
> I disagree that it's good if things are separate...
>
> Instead of API confusion, I think I would prefer a single method on IW that makes a best-effort attempt to apply any realtime setters.
>
> This way we can avoid constant deprecation and undeprecation between these APIs. Instead, whether something can be changed on the fly is only a documentation issue.
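The first option in this message (keep the setters on the config and have the config notify the writer) could be sketched along these lines. The classes are hypothetical and only mimic the IWC/IW relationship; the callback stands in for "IW passes itself to IWC", and the 16 MB default is illustrative.

```java
import java.util.function.DoubleConsumer;

// Hypothetical sketch of a config that notifies its writer of changes.
final class NotifyingConfig {
    private double ramBufferSizeMB = 16.0;                // illustrative default
    private DoubleConsumer ramBufferListener = mb -> { }; // no-op until a writer attaches

    /** The writer registers a callback after it has been constructed. */
    synchronized void attach(DoubleConsumer listener) {
        ramBufferListener = listener;
    }

    synchronized void setRAMBufferSizeMB(double mb) {
        ramBufferSizeMB = mb;
        ramBufferListener.accept(mb); // push the change to the attached writer
    }

    synchronized double getRAMBufferSizeMB() {
        return ramBufferSizeMB;
    }
}

// Hypothetical writer that reacts to config changes instead of polling for them.
final class NotifiedWriter {
    private volatile double activeBufferMB;

    NotifiedWriter(NotifyingConfig config) {
        activeBufferMB = config.getRAMBufferSizeMB();
        config.attach(mb -> activeBufferMB = mb); // stands in for "IW passes itself to IWC"
    }

    double activeBufferMB() {
        return activeBufferMB;
    }
}
```

The extra wiring is visible even in this toy version: the config now has to know about its writer, which is roughly the complication the message warns about.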
Re: IndexWriter#setRAMBufferSizeMB removed in trunk
On Thu, Mar 10, 2011 at 8:23 AM, Shai Erera ser...@gmail.com wrote:
> IWC simplified IW creation - now there is only one ctor, where before there were multiple ones, and some settings could only be changed after IW was created.
>
> With IWC, our code is (can become) simpler -- e.g. RAM buffer size, if specified up front, is one thing, but if it's dynamic, we need to have code which dynamically increases or decreases it. Increasing is not the problem, but decreasing requires special code that flushes and discards the extra memory. Maybe the code already exists, I haven't checked.

Actually IW handles this (RAM buffer grows or shrinks) today, or it did before the IWC change. Though I'm not sure it provoked a flush immediately (i.e., it was probably on the next add/update/delete call that the flush happened); we should fix that.

> I don't like setters that are all over the place either. Having said that though, today the setters are inconsistent -- some are 'static' (meaning they cannot change after IW is created) while some are dynamic, such as the MergePolicy settings, because MP responds to those setters.
>
> One thing we can do is keep all the setters in IWC and have IW pass itself to IWC after creation. Then, we can modify certain settings in IWC to notify IW of these changes. But it's complicated.

+1 -- I think this is the best option?

I.e., we leave all setters/getters in IWC, but we make it clear (in javadocs) which settings are live and which must be done before init'ing the IW. If we want to be anal about it we can throw IllegalStateException if you try to change a static setting after IW has bound to the IWC.

--
Mike http://blog.mikemccandless.com
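The "all setters stay on the config, static ones throw once bound" idea might be sketched as below. `SketchWriterConfig` is a hypothetical class, not the real IndexWriterConfig, and the default values and the choice of which setting is live are illustrative assumptions.

```java
// Hypothetical sketch of a config with "live" and "init-only" settings.
final class SketchWriterConfig {
    // Live setting: volatile so a writer thread sees changes made elsewhere.
    private volatile double ramBufferSizeMB = 16.0;
    // Init-only setting: frozen once a writer has bound to this config.
    private int termIndexInterval = 128;
    private boolean bound = false;

    /** Called once by the writer when it takes ownership of this config. */
    synchronized void bind() {
        if (bound) {
            throw new IllegalStateException("config is already bound to a writer");
        }
        bound = true;
    }

    /** Live: may be called before or after a writer is bound. */
    SketchWriterConfig setRAMBufferSizeMB(double mb) {
        if (mb <= 0.0) {
            throw new IllegalArgumentException("ramBufferSizeMB must be > 0");
        }
        ramBufferSizeMB = mb;
        return this;
    }

    /** Init-only: rejected once a writer is bound. */
    synchronized SketchWriterConfig setTermIndexInterval(int interval) {
        if (bound) {
            throw new IllegalStateException("termIndexInterval cannot change after the writer is created");
        }
        termIndexInterval = interval;
        return this;
    }

    double getRAMBufferSizeMB() { return ramBufferSizeMB; }

    synchronized int getTermIndexInterval() { return termIndexInterval; }
}
```

A caller would configure everything up front, let the writer call `bind()`, and afterwards only the live setters keep working; the rest fail loudly instead of being silently ignored.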
Re: IndexWriter#setRAMBufferSizeMB removed in trunk
On Thu, Mar 10, 2011 at 7:28 AM, Robert Muir rcm...@gmail.com wrote:
> This should block the release: if IndexWriterConfig is a broken design then we need to revert this now before it's released, not make users switch over and then undeprecate/revert in a future release.

+1

I think we have to sort this out, one way or another, before releasing 3.1.

I really don't like splitting setters across IWC vs IW. That'll just cause confusion, and noise over time as we change our minds about where things belong.

Looking through IWC, it seems that most setters can be done live. In fact, setRAMBufferSizeMB is *almost* live: all places in IW that use this pull it from the config, except for DocumentsWriter. We could just push the config down to DW and have it pull live too...

Other settings are not pulled live but for no good reason, e.g. termsIndexInterval is copied to a private field in IW but could just as easily be pulled when it's time to write a new segment...

Maybe we should simply document which settings are live vs only take effect at init time?

Mike

--
Mike http://blog.mikemccandless.com
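The difference between a setting that is copied at init time and one that is pulled live can be illustrated with the sketch below. The classes are hypothetical and only mimic the shape of the IWC/DocumentsWriter relationship; the flush check is a simplified stand-in for whatever the real code does with the RAM buffer size.

```java
// Hypothetical sketch; these classes are not Lucene's.
final class LiveConfig {
    private volatile double ramBufferSizeMB;

    LiveConfig(double ramBufferSizeMB) {
        this.ramBufferSizeMB = ramBufferSizeMB;
    }

    double getRAMBufferSizeMB() { return ramBufferSizeMB; }

    void setRAMBufferSizeMB(double mb) { ramBufferSizeMB = mb; }
}

// Copies the value once at construction: later config changes are invisible.
final class CopyingDocsWriter {
    private final double ramBufferSizeMB;

    CopyingDocsWriter(LiveConfig config) {
        this.ramBufferSizeMB = config.getRAMBufferSizeMB();
    }

    boolean shouldFlush(double usedMB) {
        return usedMB >= ramBufferSizeMB;
    }
}

// Keeps the config and reads it on every check: changes apply on the next call.
final class LiveDocsWriter {
    private final LiveConfig config;

    LiveDocsWriter(LiveConfig config) {
        this.config = config;
    }

    boolean shouldFlush(double usedMB) {
        return usedMB >= config.getRAMBufferSizeMB();
    }
}
```

Note that in the live variant a lowered limit only takes effect the next time `shouldFlush` is called, which mirrors the observation elsewhere in the thread that a shrink may not provoke a flush immediately.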
IndexWriter#setRAMBufferSizeMB removed in trunk
Heya,

I think the ability to change the RAMBufferSizeMB on a live IndexWriter (without the need to close and open it) is an important one, and it seems like it's deprecated in 3.1 and removed in trunk. Is there a chance to get it back?

-shay.banon