Michael McCandless wrote:
1. IndexRecoverer - assuming the "segments" file is missing or corrupted, this tool rebuilds it based on the *.cfs (and other) files found in the index dir (excludes files listed in deletable)

Excellent.  I know that various cases of "recovering an index" have
come up on the lists over time.  It would be great to have a single
tool that can try to correct the different problems that users hit, eg
removing a single unusable segments file, regenerating the segments
file, etc.
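
For concreteness, the first step of the recovery in item 1 (working out which segments exist from the files on disk) might look roughly like the sketch below. It assumes per-segment files are named "_<name>.<ext>" (eg _a.cfs) and that the contents of "deletable" have already been read into a set; the class and method names are made up for illustration. A real tool would also need each segment's document count before it could write a usable segments file, and this sketch does not attempt that.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.SortedSet;
import java.util.TreeSet;

public class SegmentScanSketch {

    /**
     * Returns the distinct segment names (eg "_a") implied by the given
     * file names, skipping any file listed as deletable.
     * Assumption: per-segment files look like "_<name>.<ext>".
     */
    public static SortedSet<String> segmentNames(Collection<String> files,
                                                 Set<String> deletable) {
        SortedSet<String> names = new TreeSet<>();
        for (String file : files) {
            if (deletable.contains(file)) {
                continue; // already scheduled for deletion, not part of the index
            }
            if (!file.startsWith("_")) {
                continue; // "segments", "deletable" and other non-segment files
            }
            int dot = file.indexOf('.');
            if (dot <= 1) {
                continue; // no extension, or nothing between "_" and "."
            }
            names.add(file.substring(0, dot));
        }
        return names;
    }

    public static void main(String[] args) {
        List<String> files = Arrays.asList(
            "_a.cfs", "_b.cfs", "_c.fnm", "_c.tis", "_9.cfs", "deletable");
        Set<String> deletable = new HashSet<>(Arrays.asList("_9.cfs"));
        System.out.println(segmentNames(files, deletable)); // prints [_a, _b, _c]
    }
}
```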

2. IndexSplitter - splits an existing index into 2, 3 or more roughly equal-sized indices. It simply distributes the segment files across distinct directories and then uses the IndexRecoverer to rebuild each new index's segments file

Seems like a good tool for contrib?
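
The "relatively equally sized" part of item 2 could be approached as in the sketch below, which is only one possible reading of the proposal. It assumes a size (document count or bytes) is already known for each segment, and assigns segments greedily, largest first, to whichever group is currently smallest. The names are hypothetical, and copying the files into directories is left out.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class SegmentSplitSketch {

    /**
     * Partitions whole segments into `parts` groups of roughly equal
     * total size. Greedy heuristic: take segments largest first and put
     * each one into the group with the smallest running total.
     */
    public static List<List<String>> split(Map<String, Long> segmentSizes, int parts) {
        if (parts < 1) {
            throw new IllegalArgumentException("parts must be >= 1");
        }
        List<Map.Entry<String, Long>> bySize = new ArrayList<>(segmentSizes.entrySet());
        // Largest first; ties broken by name so the result is deterministic.
        bySize.sort((x, y) -> {
            int c = Long.compare(y.getValue(), x.getValue());
            return c != 0 ? c : x.getKey().compareTo(y.getKey());
        });

        List<List<String>> groups = new ArrayList<>();
        long[] totals = new long[parts];
        for (int i = 0; i < parts; i++) {
            groups.add(new ArrayList<>());
        }
        for (Map.Entry<String, Long> segment : bySize) {
            int smallest = 0;
            for (int i = 1; i < parts; i++) {
                if (totals[i] < totals[smallest]) {
                    smallest = i;
                }
            }
            groups.get(smallest).add(segment.getKey());
            totals[smallest] += segment.getValue();
        }
        return groups;
    }

    public static void main(String[] args) {
        Map<String, Long> sizes = new LinkedHashMap<>();
        sizes.put("_a", 500L);
        sizes.put("_b", 300L);
        sizes.put("_c", 200L);
        sizes.put("_d", 100L);
        System.out.println(split(sizes, 2)); // prints [[_a, _d], [_b, _c]]
    }
}
```

Since segments are moved whole, the balance can be no better than the largest segment allows: an index dominated by one big segment will split unevenly however the rest are assigned.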

If these rely only on the public index-format spec and public index APIs, then they could go in contrib, which would be easiest, since expectations about back-compatibility and long-term support are lower for contrib.

But if they rely on index package internals then they should be maintained with the core. Then the question becomes: are these features that we can maintain long-term? The index implementation will likely evolve, and the existing public API should be supported through this evolution: APIs must be more durable than implementations. So, are these features things that can be supported through likely implementation changes?

I suspect they are. We've talked about making the postings format more flexible, but I have not heard anyone talk about a need to substantially alter the segments & merging model. Are we comfortable adding public APIs that depend on that model?

An index splitter is useful with parallel and/or distributed search. Splitting on segment boundaries is fairly limited, but perhaps, with clever use of IndexWriter.setMaxMergeDocs(), it is sufficient.

Doug
