On Mon, Jul 6, 2009 at 2:18 AM, John Wang <john.w...@gmail.com> wrote:

>      Currently, addIndexesNoOptimize(Directory[] dir) is really really
> really fast! (I duplicated my index of 15k docs 200 times and created a 3M
> doc index in less than a minute) Perhaps we should handle duplicate
> directory names more gracefully? e.g. append a numeral after the segment
> name or something? (I'd be happy to work on a patch for it)

I guess we could explicitly disambiguate on adding external
SegmentInfo instances into IndexWriter's segmentInfos (add a new
member to SegmentInfo that's normally set to a default value but on
importing dups is set to unique values, and then use that member in
hashCode/equals).  It's somewhat "smelly" though...
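A minimal sketch of that disambiguation idea, assuming a much-simplified stand-in for SegmentInfo. The class `ImportedSegment` and its `dupOrdinal` field are hypothetical and are not Lucene's actual SegmentInfo API. The ordinal stays at its default (0) for normal segments and only gets distinct values when the same external segment is imported more than once, so equals/hashCode keep the copies apart instead of collapsing them.

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

// Hypothetical, simplified stand-in for SegmentInfo (NOT the real Lucene class).
class ImportedSegment {
    final String dirId;     // identity of the source Directory (modeled as a string here)
    final String name;      // segment name, e.g. "_0"
    final int dupOrdinal;   // hypothetical disambiguator: 0 by default, unique per duplicate import

    ImportedSegment(String dirId, String name, int dupOrdinal) {
        this.dirId = dirId;
        this.name = name;
        this.dupOrdinal = dupOrdinal;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof ImportedSegment)) return false;
        ImportedSegment other = (ImportedSegment) o;
        // The disambiguator participates in equality alongside directory and name.
        return dirId.equals(other.dirId)
            && name.equals(other.name)
            && dupOrdinal == other.dupOrdinal;
    }

    @Override
    public int hashCode() {
        return Objects.hash(dirId, name, dupOrdinal);
    }

    // Import the same external segment `copies` times, giving each copy a unique ordinal.
    static Set<ImportedSegment> importCopies(String dirId, String name, int copies) {
        Set<ImportedSegment> infos = new HashSet<>();
        for (int i = 0; i < copies; i++) {
            infos.add(new ImportedSegment(dirId, name, i));
        }
        return infos;
    }

    public static void main(String[] args) {
        // Without a distinct ordinal, every copy is "equal" and collapses into one entry.
        Set<ImportedSegment> naive = new HashSet<>();
        for (int i = 0; i < 200; i++) {
            naive.add(new ImportedSegment("srcDir", "_0", 0));
        }
        System.out.println("without ordinal: " + naive.size());
        System.out.println("with ordinal: " + importCopies("srcDir", "_0", 200).size());
    }
}
```

The "smell" is visible here: a field that exists only to make otherwise-identical imports compare unequal.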

Or, you could call addIndexesNoOptimize N times, instead; I wonder how
the performance would compare.  Is performance a real issue here?
This is just for testing, right?

>  For what I need now, I think in my case addIndexesNoOptimize(IndexReader[]) 
> would work as well (I wouldn't know how performance would compare though).

Actually implementing this is rather tricky, because
MergePolicy expects to receive SegmentInfo instances, not IndexReader
instances, to make its decisions.
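A rough illustration of why the policy's input matters, using a hypothetical, much-simplified model rather than Lucene's real MergePolicy classes (`SegmentDesc` and `SmallestFirstMergePolicy` are invented names). The policy reasons only over per-segment descriptors and returns the descriptors it wants merged; a plain IndexReader handed in from outside would have no such segment descriptor for the policy to select and return.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical segment descriptor (NOT Lucene's SegmentInfo).
class SegmentDesc {
    final String name;   // segment name, e.g. "_3"
    final int docCount;  // number of documents in the segment

    SegmentDesc(String name, int docCount) {
        this.name = name;
        this.docCount = docCount;
    }
}

// Hypothetical merge policy: decisions are made purely from segment metadata.
class SmallestFirstMergePolicy {
    final int mergeFactor;

    SmallestFirstMergePolicy(int mergeFactor) {
        this.mergeFactor = mergeFactor;
    }

    // Once there are at least mergeFactor segments, pick the mergeFactor smallest to merge.
    List<SegmentDesc> findMerge(List<SegmentDesc> segments) {
        if (segments.size() < mergeFactor) {
            return new ArrayList<>(); // nothing to merge yet
        }
        List<SegmentDesc> sorted = new ArrayList<>(segments);
        sorted.sort(Comparator.comparingInt(s -> s.docCount));
        return new ArrayList<>(sorted.subList(0, mergeFactor));
    }

    public static void main(String[] args) {
        List<SegmentDesc> segs = new ArrayList<>();
        segs.add(new SegmentDesc("_0", 15000));
        segs.add(new SegmentDesc("_1", 200));
        segs.add(new SegmentDesc("_2", 300));
        segs.add(new SegmentDesc("_3", 100));
        // Selects _3, _1, _2 (the three smallest by doc count).
        for (SegmentDesc s : new SmallestFirstMergePolicy(3).findMerge(segs)) {
            System.out.println(s.name);
        }
    }
}
```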

Mike

---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org
