I'm cleaning up the patch for LUCENE-847 (factored merge policy) and noticed a couple of things about the addIndexes methods.

Is there any particular reason that the version that takes a Directory[] optimizes first? The later merge is going to use the normal logarithmic stepping; is there a compelling reason why it's better to create one segment from the existing segments before adding the new indexes? I can certainly come up with counterexamples where this is a really poor choice.

The IndexReader[] version has kind of the dual issue: it doesn't use the merge policy semantics at all. It does one giant merge without regard for the merge factor, etc. Is there any reason why this is better than the normal logarithmic stepping case?

One thing I can imagine is that it's done this way because it's possible. In other words, by definition (?) you can open all the necessary files, because in this case you already have IndexReaders created for them, and if there were a file descriptor issue, you'd have died before you got to the addIndexes call. And judging by when it was first added (1.3 rc2), it was there to support derived IndexReaders.

The fact that it's doing an end run around the merge policy makes me uneasy. Two things jump out as difficulties, both having to do with how much flexibility we can delegate to merge policies:

First, it'd be nice if merge policies could decide whether resulting segments are going to be compound or not (talked about previously on dev). The upshot of this is that there would be no get/setUseCompoundFile() in IndexWriter anymore (except the deprecated version for compatibility). So addIndexes can't decide as it does now.

Second, it's possible that a merge policy doesn't want to optimize this way. In fact, I think it's reasonable for optimize on some merge policies not to reduce all the way down to a single segment, but instead to obey something like maxMergeDocs even when optimize is called. (That may be debatable; all I care about right now is leaving it up to the merge policy.)
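
To make those two points concrete, here is a rough sketch of the kind of delegation in question. It is an illustration only: SketchMergePolicy, CappedOptimizePolicy, and the compoundThreshold parameter are hypothetical names, not part of Lucene or of the LUCENE-847 patch, and segments are modeled as plain document counts. The policy, rather than IndexWriter, decides whether a merged segment is compound, and its optimize stops merging once a segment would exceed maxMergeDocs instead of always reducing to one segment.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical interface: NOT Lucene's API and NOT the LUCENE-847 patch.
interface SketchMergePolicy {
    // The policy, not IndexWriter, decides the file format of a merged segment.
    boolean useCompoundFile(int mergedDocCount);

    // Takes segment sizes (doc counts) and returns the sizes after "optimize".
    List<Integer> optimize(List<Integer> segmentDocCounts);
}

// A policy whose optimize honors maxMergeDocs instead of forcing one segment.
class CappedOptimizePolicy implements SketchMergePolicy {
    private final int maxMergeDocs;
    private final int compoundThreshold; // assumed knob, for illustration only

    CappedOptimizePolicy(int maxMergeDocs, int compoundThreshold) {
        this.maxMergeDocs = maxMergeDocs;
        this.compoundThreshold = compoundThreshold;
    }

    @Override
    public boolean useCompoundFile(int mergedDocCount) {
        // Arbitrary rule for the sketch: only small segments are compound.
        return mergedDocCount < compoundThreshold;
    }

    @Override
    public List<Integer> optimize(List<Integer> segmentDocCounts) {
        List<Integer> result = new ArrayList<Integer>();
        int current = 0;
        boolean open = false;
        for (int docs : segmentDocCounts) {
            // Close the current merged segment if adding this one would exceed the cap.
            if (open && current + docs > maxMergeDocs) {
                result.add(current);
                current = 0;
                open = false;
            }
            current += docs;
            open = true;
        }
        if (open) {
            result.add(current);
        }
        return result;
    }
}

class MergePolicySketch {
    public static void main(String[] args) {
        SketchMergePolicy policy = new CappedOptimizePolicy(100, 50);
        List<Integer> merged = policy.optimize(Arrays.asList(40, 50, 30, 80, 10));
        for (int docs : merged) {
            System.out.println(docs + " docs, compound=" + policy.useCompoundFile(docs));
        }
    }
}
```

With a cap of 100, the five input segments end up as three segments (90, 30, and 90 docs) rather than one, which is the "optimize that obeys maxMergeDocs" behavior described above.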
These things would argue for making addIndexes(IndexReader[]) like addIndexes(Directory[]), letting the merge policy decide whether to do staged merges or not. But that would seriously mess with the API. Are there good example use cases for addIndexes(IndexReader[])?