addIndexes()

Steven Parkes Wed, 30 May 2007 10:53:00 -0700

I'm cleaning up the patch for LUCENE-847 (factored merge policy) and
noticed a couple of things about the addIndexes methods.


Is there any particular reason that the version that takes a Directory[]
optimizes first? The later merge is going to use the normal logarithmic
stepping; is there a compelling reason why it's better to create one
segment from the existing segments before adding the new indexes? I can
certainly come up with counterexamples where this is a really poor
choice.

The IndexReader[] version has kind of the dual issue: it doesn't use the
merge policy semantics at all. It does one giant merge without regard
for the merge factor, etc. Is there any reason why this is better than
the normal logarithmic stepping case? The one reason I can imagine is
that it's done this way because it's possible. In other words, by
definition (?) you can open all the necessary files, because in this
case you already have IndexReaders created for them, and if there were a
file descriptor issue, you'd have died before you got to the addIndexes
call.
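
To make the trade-off concrete, here is a toy model in plain Java. It is not Lucene code: the class and method names are made up for illustration, and the staged merge is a simplified round-by-round grouping rather than the actual logarithmic policy. It only counts two things: how many segments the widest single merge reads at once (a rough proxy for file descriptors) and how many documents get copied in total.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only; none of these names exist in Lucene.
final class MergeSketch {

    /** Outcome of merging everything down to one segment. */
    static final class Result {
        final int widestMerge;  // most segments read in any single merge step
        final long docsCopied;  // total documents rewritten across all steps
        Result(int widestMerge, long docsCopied) {
            this.widestMerge = widestMerge;
            this.docsCopied = docsCopied;
        }
    }

    /** One giant merge: all segments are read at once, each document is copied once. */
    static Result singleMerge(List<Integer> segmentDocCounts) {
        long total = 0;
        for (int docs : segmentDocCounts) total += docs;
        return new Result(segmentDocCounts.size(), total);
    }

    /** Staged merge: merge adjacent groups of at most mergeFactor segments, round by round. */
    static Result stagedMerge(List<Integer> segmentDocCounts, int mergeFactor) {
        if (mergeFactor < 2) throw new IllegalArgumentException("mergeFactor must be >= 2");
        List<Long> current = new ArrayList<>();
        for (int docs : segmentDocCounts) current.add((long) docs);
        int widest = 0;
        long copied = 0;
        while (current.size() > 1) {
            List<Long> next = new ArrayList<>();
            for (int i = 0; i < current.size(); i += mergeFactor) {
                int end = Math.min(i + mergeFactor, current.size());
                if (end - i == 1) {  // a lone trailing segment is carried over untouched
                    next.add(current.get(i));
                    continue;
                }
                long merged = 0;
                for (int j = i; j < end; j++) merged += current.get(j);
                widest = Math.max(widest, end - i);
                copied += merged;
                next.add(merged);
            }
            current = next;
        }
        return new Result(widest, copied);
    }

    public static void main(String[] args) {
        List<Integer> segments = new ArrayList<>();
        for (int i = 0; i < 100; i++) segments.add(1000);
        Result single = singleMerge(segments);
        Result staged = stagedMerge(segments, 10);
        System.out.println("single: widest=" + single.widestMerge + " copied=" + single.docsCopied);
        System.out.println("staged: widest=" + staged.widestMerge + " copied=" + staged.docsCopied);
    }
}
```

For 100 segments of 1000 documents and a merge factor of 10, the single merge reads 100 segments at once and copies 100,000 documents, while the staged version never reads more than 10 at once but copies 200,000. So in this model the giant merge is cheaper in I/O and only costs more in simultaneously open inputs, which fits the "because it's possible" guess.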

And judging from where it was first added (1.3 rc2), it was there to
support derived IndexReaders.

The fact that it does an end run around the merge policy makes me uneasy.
Two things jump out as difficulties, both having to do with how much
flexibility we can delegate to merge policies:

First, it'd be nice if merge policies could decide whether resulting
segments are going to be compound or not (talked about previously on
dev). The upshot of this is that there would be no
get/setUseCompoundFile() in IndexWriter anymore (except the deprecated
version for compatibility). So addIndexes can't decide as it does now.

Second, it's possible that a merge policy doesn't want to optimize this
way. In fact, I think it's reasonable for optimize on some merge
policies not to reduce all the way down to a single segment, but instead
to obey something like maxMergeDocs even when optimize is called. (That
may be debatable; all I care about right now is leaving it up to the
merge policy.)
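
As a rough sketch of what "leaving it up to the merge policy" could look like, here is a hypothetical policy interface in plain Java. This is not the LUCENE-847 API: the interface, the class, and both method names are invented for illustration, and the compound-file rule is arbitrary. The point is only that the policy, not the writer, plans the optimize and chooses the file format, and that the plan may stop short of a single segment.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical names throughout; this is not the LUCENE-847 API.
final class CappedOptimizeSketch {

    interface MergePolicySketch {
        /** Groups of adjacent segment indexes to merge when optimize is requested. */
        List<List<Integer>> planOptimize(List<Integer> segmentDocCounts);

        /** Whether a merged segment of this size should use the compound format. */
        boolean useCompoundFile(int mergedDocCount);
    }

    /** Policy that tries not to build a segment above maxMergeDocs, even for optimize. */
    static final class CappedPolicy implements MergePolicySketch {
        private final int maxMergeDocs;

        CappedPolicy(int maxMergeDocs) {
            this.maxMergeDocs = maxMergeDocs;
        }

        @Override
        public List<List<Integer>> planOptimize(List<Integer> segmentDocCounts) {
            List<List<Integer>> plan = new ArrayList<>();
            List<Integer> group = new ArrayList<>();
            long groupDocs = 0;
            for (int i = 0; i < segmentDocCounts.size(); i++) {
                int docs = segmentDocCounts.get(i);
                // Close the current group if adding this segment would exceed the cap.
                if (!group.isEmpty() && groupDocs + docs > maxMergeDocs) {
                    plan.add(group);
                    group = new ArrayList<>();
                    groupDocs = 0;
                }
                group.add(i);
                groupDocs += docs;
            }
            if (!group.isEmpty()) plan.add(group);
            return plan;  // a group with one entry means "leave that segment alone"
        }

        @Override
        public boolean useCompoundFile(int mergedDocCount) {
            // Arbitrary illustrative rule: only smaller segments are written as compound.
            return mergedDocCount <= maxMergeDocs / 2;
        }
    }
}
```

With segment sizes 400, 400, 400, 900, 100 and a cap of 1000, this plan merges segments 0 and 1, leaves segment 2 alone, and merges segments 3 and 4, so optimize ends with three segments rather than one.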

These things would argue for making addIndexes(IndexReader[]) like
addIndexes(Directory[]), letting the merge policy decide whether to do
staged merges or not. But that would seriously mess with the API.

Are there good example use cases for addIndexes(IndexReader[])?
