Re: IndexWriter#setRAMBufferSizeMB removed in trunk

2011-03-11 Thread Earwin Burrfoot
Is it really that hard to recreate IndexWriter if you have to change
the settings??

Yeah, yeah, you lose all your precious reused buffers, and maybe
there's a small indexing latency spike when switching from the old IW
to the new one, but people aren't changing their IW configs several
times a second, are they?

I suggest banning as many runtime-mutable settings as humanly
possible, and asking people to recreate objects for reconfiguration,
be it IW, IR, Analyzers, or whatnot.
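
To make the "recreate instead of mutate" idea concrete, here is a minimal sketch. Everything in it is hypothetical: Config, Writer and RecreateOnReconfigure are stand-ins invented for illustration, not Lucene classes, and closing the old writer before opening the new one is an assumption that models an index allowing only one open writer at a time.

```java
/** Hypothetical stand-ins for illustration; these are not Lucene classes. */
final class RecreateOnReconfigure {

    /** Immutable settings: every field is final, so a change means a new object. */
    static final class Config {
        final double ramBufferSizeMB;
        Config(double ramBufferSizeMB) { this.ramBufferSizeMB = ramBufferSizeMB; }
    }

    /** A writer that reads its settings exactly once, at construction time. */
    static final class Writer implements AutoCloseable {
        final Config config;
        private volatile boolean closed;
        Writer(Config config) { this.config = config; }
        boolean isClosed() { return closed; }
        @Override public void close() { closed = true; }
    }

    private Writer current;

    RecreateOnReconfigure(Config initial) { current = new Writer(initial); }

    synchronized Writer writer() { return current; }

    /**
     * Closes the old writer, then opens a new one with the new settings.
     * Closing first is an assumption of this sketch: it models an index
     * that allows only one open writer at a time.
     */
    synchronized Writer reconfigure(Config newConfig) {
        Writer old = current;
        old.close(); // buffers held by the old writer are given up here
        current = new Writer(newConfig);
        return old;
    }

    public static void main(String[] args) {
        RecreateOnReconfigure holder = new RecreateOnReconfigure(new Config(16.0));
        Writer old = holder.reconfigure(new Config(64.0));
        System.out.println("old closed: " + old.isClosed());
        System.out.println("new RAM buffer MB: " + holder.writer().config.ramBufferSizeMB);
    }
}
```

The cost is visible in reconfigure(): whatever the old writer had buffered or cached is gone, which is the latency spike mentioned above.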

On Thu, Mar 10, 2011 at 23:07, Michael McCandless
luc...@mikemccandless.com wrote:
 On Thu, Mar 10, 2011 at 7:28 AM, Robert Muir rcm...@gmail.com wrote:

 This should block the release: if IndexWriterConfig is a broken design
 then we need to revert this now before it's released, not make users
 switch over and then undeprecate/revert in a future release.

 +1

 I think we have to sort this out, one way or another, before releasing 3.1.

 I really don't like splitting setters across IWC vs IW.  That'll just
 cause confusion, and noise over time as we change our minds about
 where things belong.

 Looking through IWC, it seems that most setters can be done live.
 In fact, setRAMBufferSizeMB is *almost* live: all places in IW that
 use this pull it from the config, except for DocumentsWriter.  We
 could just push the config down to DW and have it pull live too...

 Other settings are not pulled live but for no good reason, eg
 termsIndexInterval is copied to a private field in IW but could just
 as easily be pulled when it's time to write a new segment...

 Maybe we should simply document which settings are live vs only take
 effect at init time?

 Mike

 --
 Mike

 http://blog.mikemccandless.com

 -
 To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
 For additional commands, e-mail: dev-h...@lucene.apache.org





-- 
Kirill Zakharenko/Кирилл Захаренко
E-Mail/Jabber: ear...@gmail.com
Phone: +7 (495) 683-567-4
ICQ: 104465785

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: IndexWriter#setRAMBufferSizeMB removed in trunk

2011-03-11 Thread Shai Erera
I agree. After IWC, the only setter left in IW is setInfoStream, which makes
sense. But the rest ... assuming these config changes don't happen very
often, recreating IW doesn't sound like a big thing to me. The alternative
is complicating IWC to support runtime changes -- we need to be absolutely
sure it's worth it.

Also, if the solution is to allow changing IWC (runtime) settings, then I
don't think this issue should block 3.1? We can anyway add other runtime
settings following 3.1, and we won't undeprecate anything. So maybe mark
that issue as a non-blocker?

Shai





Re: IndexWriter#setRAMBufferSizeMB removed in trunk

2011-03-11 Thread Earwin Burrfoot
Thanks for your support, but I don't think setInfoStream makes any
sense either : )

Do we /change/ infoStreams for IW at runtime? Why can't we pass it as a
constructor argument/IWC field?
OK, just maybe, I can imagine a case where a certain app runs
happily, then misbehaves, and then you, with some clever trickery,
supply it a fresh infoStream to capture the problem live, without
restarting.
So, just maybe, we should leave setInfoStream as is.










Re: IndexWriter#setRAMBufferSizeMB removed in trunk

2011-03-10 Thread Michael McCandless
Hi Shay,

It sounds like we should put this (ability to change RAM buffer on the
fly) back.

But, can you describe how/why you need this?  Is it because you have
many IWs open at once and you want to dynamically change which gets to
use RAM?

Are there other settings that were moved to IWC that you also
dynamically change today...?

Can you open an issue?  Make sure it's marked fix 4.0/3.2!  Thanks.

Mike

On Wed, Mar 9, 2011 at 1:01 AM, Shay Banon kim...@gmail.com wrote:
 Heya,
   I think the ability to change the RAMBufferSizeMB on a live IndexWriter
 (without the need to close and open it) is an important one, and it seems
 like it's deprecated in 3.1 and removed in trunk. Is there a chance to get it
 back?
 -shay.banon






Re: IndexWriter#setRAMBufferSizeMB removed in trunk

2011-03-10 Thread Robert Muir
On Thu, Mar 10, 2011 at 6:49 AM, Michael McCandless
luc...@mikemccandless.com wrote:

 Can you open an issue?  Make sure it's marked fix 4.0/3.2!  Thanks.


I'm not sure we should handle it this way: I really don't like
deprecation in one release and undeprecation in another.
So, I think we should open an issue for 3.1 and figure out if we want
to do this for setters at all.
If we decide to start moving setters out of IndexWriterConfig, then we
need to start asking very hard questions about IndexWriterConfig as a
whole, because I think it will be confusing if IndexWriter has two
separate configuration APIs.
This should block the release: if IndexWriterConfig is a broken design
then we need to revert this now before it's released, not make users
switch over and then undeprecate/revert in a future release.




Re: IndexWriter#setRAMBufferSizeMB removed in trunk

2011-03-10 Thread Shay Banon
Hi,
On Thursday, March 10, 2011 at 1:49 PM, Michael McCandless wrote: 
 Hi Shay,
 
 It sounds like we should put this (ability to change RAM buffer on the
 fly) back.
 
 But, can you describe how/why you need this? Is it because you have
 many IWs open at once and you want to dynamically change which gets to
 use RAM?
Exactly. In elasticsearch, there can be several shards (each a Lucene index) 
running in the same VM. You can configure that you want 10% of the heap to be 
allocated to indexing, and it will automatically distribute it between all the 
shards by dynamically changing that value on each IndexWriter.
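
A rough sketch of that bookkeeping, for readers who want to see the arithmetic. The names IndexingBudget and ShardWriter are hypothetical, and splitting the budget equally between shards is an assumption made for the sketch; the actual distribution policy is not described here.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch; ShardWriter stands in for one writer per shard. */
final class IndexingBudget {

    /** Stand-in for a writer whose RAM buffer can be changed while it is open. */
    static final class ShardWriter {
        private volatile double ramBufferSizeMB;
        void setRamBufferSizeMB(double mb) { this.ramBufferSizeMB = mb; }
        double getRamBufferSizeMB() { return ramBufferSizeMB; }
    }

    /** Equal split (an assumption): fraction of the heap, in MB, divided by shard count. */
    static double perShardMB(long heapBytes, double indexingFraction, int shardCount) {
        if (shardCount <= 0) {
            throw new IllegalArgumentException("shardCount must be positive");
        }
        double totalMB = heapBytes * indexingFraction / (1024.0 * 1024.0);
        return totalMB / shardCount;
    }

    /** Pushes the current share to every open writer, e.g. after a shard is added or removed. */
    static void rebalance(List<ShardWriter> shards, long heapBytes, double indexingFraction) {
        if (shards.isEmpty()) {
            return;
        }
        double share = perShardMB(heapBytes, indexingFraction, shards.size());
        for (ShardWriter shard : shards) {
            shard.setRamBufferSizeMB(share);
        }
    }

    public static void main(String[] args) {
        List<ShardWriter> shards = new ArrayList<>();
        for (int i = 0; i < 4; i++) {
            shards.add(new ShardWriter());
        }
        // Uses 10% of the JVM's maximum heap as the indexing budget.
        rebalance(shards, Runtime.getRuntime().maxMemory(), 0.10);
        System.out.println("per-shard MB: " + shards.get(0).getRamBufferSizeMB());
    }
}
```

Every time the shard count changes, each writer's share changes too, which is why a live setter matters for this use case.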
 
 
 Are there other settings that were moved to IWC that you also
 dynamically change today...?
I think most can, and should, be set on the MergePolicy itself. The two that I
miss as well are the settings for the term index interval and the reader terms divisor.
 
 
 Can you open an issue? Make sure it's marked fix 4.0/3.2! Thanks.
Done: https://issues.apache.org/jira/browse/LUCENE-2960. 
 
 
 Mike
 
 On Wed, Mar 9, 2011 at 1:01 AM, Shay Banon kim...@gmail.com wrote:
  Heya,
  I think the ability to change the RAMBufferSizeMB on a live IndexWriter
  (without the need to close and open it) is an important one, and it seems
  like it's deprecated in 3.1 and removed in trunk. Is there a chance to get it
  back?
  -shay.banon
 
 
 
 


Re: IndexWriter#setRAMBufferSizeMB removed in trunk

2011-03-10 Thread Shay Banon
I am not sure that IndexWriterConfig is bad. It's nice to be able to set all the
upfront configuration in a single object and pass it to the IndexWriter, and to
have the IndexWriter expose specific setters for real-time changes (those
should not be done through the IndexWriterConfig).

The question is which real-time changes are allowed. The fact that they
are separated (IndexWriterConfig vs. real-time setters) is good, since it
lets us distinguish between what can be set when setting up an IndexWriter
and what can be set in real time. We did not have this distinction
before IndexWriterConfig was introduced.

This opens the door to optimizations for things that can only be set when
constructing an IndexWriter. Usually, supporting real-time changes can hinder
concurrency, while parameters that are effectively immutable leave room to
optimize.

-shay.banon
On Thursday, March 10, 2011 at 2:28 PM, Robert Muir wrote:
On Thu, Mar 10, 2011 at 6:49 AM, Michael McCandless
 luc...@mikemccandless.com wrote:
 
  Can you open an issue? Make sure it's marked fix 4.0/3.2! Thanks.
 
 I'm not sure we should handle it this way: I really don't like
 deprecation in one release and undeprecation in another.
 So, I think we should open an issue for 3.1 and figure out if we want
 to do this for setters at all.
 If we decide to start moving setters out of IndexWriterConfig, then we
 need to start asking very hard questions about IndexWriterConfig as a
 whole, because I think it will be confusing if IndexWriter has two
 separate configuration APIs.
 This should block the release: if IndexWriterConfig is a broken design
 then we need to revert this now before it's released, not make users
 switch over and then undeprecate/revert in a future release.
 
 


Re: IndexWriter#setRAMBufferSizeMB removed in trunk

2011-03-10 Thread Robert Muir
On Thu, Mar 10, 2011 at 7:41 AM, Shay Banon kim...@gmail.com wrote:
 I am not sure that IndexWriterConfig is bad. It's nice to be able to set all
 the upfront configurations in a single object and pass it to the
 IndexWriter. And, have the IndexWriter allow for specific setters allowing
 for real time changes (those should not be done through the
 IndexWriterConfig).
 The question is which real time changes are allowed or not. The fact that
 they are separated (IndexWriterConfig, and real time setters) is good since
 it allows us to distinguish between what can be set when setting up an
 IndexWriter, compared to what can be set in real time. We did not have this
 distinction before the IndexWriterConfig was introduced.
 This opens the door for optimizations for things that can only be set when
 constructing an IndexWriter. Usually, supporting real time changes can
 hinder concurrency, while having parameters that are basically immutable
 allows us to optimize in this case.
 -shay.banon


I disagree that it's good if things are separate... Instead of API
confusion, I think I would prefer a single method on IW that makes a
best-effort attempt to apply any realtime setters.

This way we can avoid constant deprecation and undeprecation between
these APIs. Instead, whether something can be changed on the fly is
only a documentation issue.
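
One possible shape for such a method, as a sketch only. BestEffortWriter and applyConfig are hypothetical names, settings are modelled as a plain map to keep the example short, and the choice of which settings count as live is an assumption made for illustration.

```java
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Objects;
import java.util.Set;

/** Hypothetical sketch of a writer with one best-effort reconfiguration method. */
final class BestEffortWriter {

    // Assumption for this sketch: only these settings can change on an open writer.
    private static final Set<String> LIVE = Set.of("ramBufferSizeMB", "termIndexInterval");

    private final Map<String, Object> settings = new LinkedHashMap<>();

    BestEffortWriter(Map<String, Object> initial) { settings.putAll(initial); }

    /**
     * Applies every requested setting that can change live and returns the
     * names of requested changes that were ignored because they are init-only.
     */
    synchronized Set<String> applyConfig(Map<String, Object> requested) {
        Set<String> ignored = new LinkedHashSet<>();
        for (Map.Entry<String, Object> entry : requested.entrySet()) {
            String name = entry.getKey();
            if (LIVE.contains(name)) {
                settings.put(name, entry.getValue());
            } else if (!Objects.equals(entry.getValue(), settings.get(name))) {
                ignored.add(name); // a real change was asked for but cannot be applied live
            }
        }
        return ignored;
    }

    synchronized Object get(String name) { return settings.get(name); }

    public static void main(String[] args) {
        BestEffortWriter writer = new BestEffortWriter(
                Map.of("ramBufferSizeMB", 16.0, "indexDeletionPolicy", "keepLast"));
        Set<String> ignored = writer.applyConfig(
                Map.of("ramBufferSizeMB", 64.0, "indexDeletionPolicy", "keepAll"));
        System.out.println("ignored: " + ignored);
    }
}
```

Returning the ignored names is a design choice of the sketch; silently dropping them would match "only a documentation issue" more literally.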




Re: IndexWriter#setRAMBufferSizeMB removed in trunk

2011-03-10 Thread Shai Erera
IWC simplified IW creation - now there is only one ctor, where before there
were multiple ones, and some settings could only be changed after IW was
created.

With IWC, our code is (can become) simpler -- e.g. RAM buffer size, if
specified up front is one thing, but if it's dynamic, we need to have code
which dynamically increases or decreases it. Increasing is not the problem,
but decreasing requires special code that flushes and discards the extra
memory. Maybe the code already exists, I haven't checked.

I don't like setters that are all over the place either. Having said that
though, today the setters are inconsistent -- some are 'static' (meaning
they cannot change after IW is created) while some are dynamic, such as the
MergePolicy settings, because MP responds to those setters.

One thing we can do is keep all the setters in IWC and have IW pass itself
to IWC after creation. Then, we can modify certain settings in IWC to notify
IW of these changes. But it's complicated.
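
A minimal sketch of that notification wiring, to show where the complication comes from. NotifyingConfigSketch, Config, Writer and bind are hypothetical names, and only one setting is wired up; this is not how IWC or IW are actually implemented.

```java
import java.util.function.DoubleConsumer;

/** Hypothetical sketch: the config keeps the setter and notifies the bound writer. */
final class NotifyingConfigSketch {

    static final class Config {
        private double ramBufferSizeMB;
        private DoubleConsumer ramListener; // non-null once a writer has bound itself

        Config(double ramBufferSizeMB) { this.ramBufferSizeMB = ramBufferSizeMB; }

        /** Registers the writer's callback and returns the current value atomically. */
        synchronized double bind(DoubleConsumer listener) {
            if (ramListener != null) {
                throw new IllegalStateException("config is already bound to a writer");
            }
            ramListener = listener;
            return ramBufferSizeMB;
        }

        /** The setter stays on the config; a bound writer hears about the change. */
        synchronized Config setRamBufferSizeMB(double mb) {
            ramBufferSizeMB = mb;
            if (ramListener != null) {
                ramListener.accept(mb);
            }
            return this;
        }
    }

    static final class Writer {
        private volatile double activeRamBufferMB;

        Writer(Config config) {
            // The writer passes a callback to the config after reading the initial value.
            activeRamBufferMB = config.bind(mb -> activeRamBufferMB = mb);
        }

        double activeRamBufferMB() { return activeRamBufferMB; }
    }

    public static void main(String[] args) {
        Config config = new Config(16.0);
        Writer writer = new Writer(config);
        config.setRamBufferSizeMB(48.0);
        System.out.println("writer now uses MB: " + writer.activeRamBufferMB());
    }
}
```

Each live setting needs its own callback path, and a config can serve only one writer, which is roughly the complexity being worried about.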

Another option is to separate some runtime settings from IWC and include them
in IW, like we do for MP ... that's what's been suggested. But then, what is a
'runtime' setting? Someone can decide to have IndexDeletionPolicy change
'on-the-fly' in their app -- does it make sense that we make IDP a runtime
setting? I don't think so.

In fact, I don't think RAM buffer is changed that dynamically by
applications (nor is any other setting). Elasticsearch may have a use case where
it's needed, that's ok. If this setting does not change very often, it can
still close IW and reopen it with the new settings, right?

A third solution is to keep IWC for construction time, but introduce the
setters back on IW for runtime changes. That way we keep IW ctor simple but
still allow apps to change on-the-fly settings. We'll dup setters, which I
don't like ...

Shai





Re: IndexWriter#setRAMBufferSizeMB removed in trunk

2011-03-10 Thread Michael McCandless
On Thu, Mar 10, 2011 at 8:23 AM, Shai Erera ser...@gmail.com wrote:
 IWC simplified IW creation - now there is only one ctor, where before there
 were multiple ones, and some settings could only be changed after IW was
 created.

 With IWC, our code is (can become) simpler -- e.g. RAM buffer size, if
 specified up front is one thing, but if it's dynamic, we need to have code
 which dynamically increases or decreases it. Increasing is not the problem,
 but decreasing requires special code that flushes and discards the extra
 memory. Maybe the code already exists, I haven't checked.

Actually IW handles this (RAM buffer grows or shrinks) today, or it
did before the IWC change.  Though I'm not sure it provoked a flush
immediately (ie, it was probably on the next add/update/delete call
that the flush happened); we should fix that.
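
To illustrate the deferred behaviour described here, a small sketch. ShrinkableBuffer is a hypothetical stand-in, not DocumentsWriter; it only models "the limit is checked on the next add", so lowering the limit below current usage does nothing until another document arrives.

```java
/** Hypothetical sketch of a RAM buffer whose limit can be lowered while in use. */
final class ShrinkableBuffer {

    private volatile long limitBytes;
    private long usedBytes;
    private int flushCount;

    ShrinkableBuffer(long limitBytes) { this.limitBytes = limitBytes; }

    /** Only records the new limit; nothing is flushed at this point. */
    void setLimitBytes(long newLimit) { limitBytes = newLimit; }

    /** Buffers a document, then flushes if usage has reached the current limit. */
    synchronized void add(long docBytes) {
        usedBytes += docBytes;
        if (usedBytes >= limitBytes) {
            flush();
        }
    }

    /** Stand-in for writing a segment: the buffered bytes are released. */
    synchronized void flush() {
        usedBytes = 0;
        flushCount++;
    }

    synchronized long usedBytes() { return usedBytes; }

    synchronized int flushCount() { return flushCount; }

    public static void main(String[] args) {
        ShrinkableBuffer buffer = new ShrinkableBuffer(100);
        buffer.add(60);
        buffer.setLimitBytes(50); // already over the new limit, but no flush yet
        System.out.println("flushes after shrink: " + buffer.flushCount());
        buffer.add(1);            // the next add notices the lower limit
        System.out.println("flushes after next add: " + buffer.flushCount());
    }
}
```

Fixing the delay would amount to having setLimitBytes() run the same check that add() does.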

 I don't like setters that are all over the place either. Having said that
 though, today the setters are inconsistent -- some are 'static' (meaning,
 cannot change after IW created) while some are dynamic, such as the
 MergePolicy settings, because MP responds to those setters.

 One thing we can do is keep all the setters in IWC and have IW pass itself
 to IWC after creation. Then, we can modify certain settings in IWC to notify
 IW of these changes. But it's complicated.

+1 -- I think this is the best option?

Ie, we leave all setters/getters in IWC, but we make it clear (in
javadocs) which settings are live and which must be done before
init'ing the IW.  If we want to be anal about it we can throw
IllegalStateException if you try to change a static setting after IW has
bound to the IWC.
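
A sketch of the strict variant, assuming a hypothetical BindableConfig class. Which settings are live and which are init-only is an assumption made for illustration (RAM buffer live, deletion policy init-only), and markBound() stands in for whatever the writer would call when it takes the config.

```java
/** Hypothetical sketch: all setters live on the config, static ones lock after binding. */
final class BindableConfig {

    private volatile double ramBufferSizeMB = 16.0;        // live in this sketch
    private volatile String deletionPolicyName = "keepLast"; // init-only in this sketch
    private volatile boolean bound;

    /** Called once by the writer when it takes ownership of this config. */
    void markBound() { bound = true; }

    /** Live setting: allowed before and after binding. */
    BindableConfig setRamBufferSizeMB(double mb) {
        ramBufferSizeMB = mb;
        return this;
    }

    /** Init-only setting: rejected once a writer has bound to this config. */
    BindableConfig setDeletionPolicyName(String name) {
        if (bound) {
            throw new IllegalStateException(
                    "deletionPolicyName cannot be changed after the writer is created");
        }
        deletionPolicyName = name;
        return this;
    }

    double getRamBufferSizeMB() { return ramBufferSizeMB; }

    String getDeletionPolicyName() { return deletionPolicyName; }

    public static void main(String[] args) {
        BindableConfig config = new BindableConfig().setDeletionPolicyName("keepAll");
        config.markBound();
        config.setRamBufferSizeMB(64.0); // fine: live setting
        try {
            config.setDeletionPolicyName("keepLast");
        } catch (IllegalStateException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```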




Re: IndexWriter#setRAMBufferSizeMB removed in trunk

2011-03-10 Thread Michael McCandless
On Thu, Mar 10, 2011 at 7:28 AM, Robert Muir rcm...@gmail.com wrote:

 This should block the release: if IndexWriterConfig is a broken design
 then we need to revert this now before it's released, not make users
 switch over and then undeprecate/revert in a future release.

+1

I think we have to sort this out, one way or another, before releasing 3.1.

I really don't like splitting setters across IWC vs IW.  That'll just
cause confusion, and noise over time as we change our minds about
where things belong.

Looking through IWC, it seems that most setters can be done live.
In fact, setRAMBufferSizeMB is *almost* live: all places in IW that
use this pull it from the config, except for DocumentsWriter.  We
could just push the config down to DW and have it pull live too...

Other settings are not pulled live but for no good reason, eg
termsIndexInterval is copied to a private field in IW but could just
as easily be pulled when it's time to write a new segment...
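
The difference between "copied to a private field" and "pulled live" can be sketched as follows. Config, CopyingWriter and LiveWriter are hypothetical stand-ins, not the real IW internals; the volatile field is an assumption so that a change made by one thread is visible to the thread writing the next segment.

```java
/** Hypothetical sketch contrasting a copied setting with one that is pulled live. */
final class LiveVsCopied {

    static final class Config {
        private volatile int termIndexInterval;
        Config(int termIndexInterval) { this.termIndexInterval = termIndexInterval; }
        int getTermIndexInterval() { return termIndexInterval; }
        void setTermIndexInterval(int interval) { this.termIndexInterval = interval; }
    }

    /** Copies the value at construction time, so later changes are never seen. */
    static final class CopyingWriter {
        private final int termIndexInterval;
        CopyingWriter(Config config) { this.termIndexInterval = config.getTermIndexInterval(); }
        int intervalForNextSegment() { return termIndexInterval; }
    }

    /** Keeps the config and reads the value when a new segment is about to be written. */
    static final class LiveWriter {
        private final Config config;
        LiveWriter(Config config) { this.config = config; }
        int intervalForNextSegment() { return config.getTermIndexInterval(); }
    }

    public static void main(String[] args) {
        Config config = new Config(128);
        CopyingWriter copying = new CopyingWriter(config);
        LiveWriter live = new LiveWriter(config);
        config.setTermIndexInterval(32);
        System.out.println("copied: " + copying.intervalForNextSegment());
        System.out.println("live:   " + live.intervalForNextSegment());
    }
}
```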

Maybe we should simply document which settings are live vs only take
effect at init time?

Mike




IndexWriter#setRAMBufferSizeMB removed in trunk

2011-03-08 Thread Shay Banon
Heya,

I think the ability to change the RAMBufferSizeMB on a live IndexWriter
(without the need to close and open it) is an important one, and it seems like
it's deprecated in 3.1 and removed in trunk. Is there a chance to get it back?

-shay.banon