Michael McCandless wrote:
>> 1. IndexRecoverer - assuming the "segments" file is missing or
>> corrupted, this tool rebuilds it from the *.cfs (and other) files
>> found in the index directory, excluding files listed in "deletable".
> Excellent. I know that various cases of "recovering an index" have
> come up on the lists over time. It would be great to have a single
> tool that can try to correct the different problems that users hit,
> e.g. removing a single unusable segments file, regenerating the
> segments file, etc.
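A minimal sketch of the discovery step such a recovery tool might start with, written in plain Java rather than against any Lucene API. It lists the *.cfs files in an index directory, skips any file named in a set of deletable names, and returns the segment names. The class and method names (SegmentScanner, findCompoundSegments) are hypothetical. Parsing the "deletable" file is assumed to happen elsewhere, non-compound segments are ignored, and writing the rebuilt segments file (which needs per-segment document counts and index-format details) is not shown.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper, not a Lucene API: finds candidate segments for
// rebuilding a lost or corrupted "segments" file.
final class SegmentScanner {
    private static final String CFS = ".cfs";

    /**
     * Returns the names of segments (e.g. "_a") that have a compound file in
     * indexDir, sorted by name, skipping any file listed in deletable.
     * Assumes the caller has already read the "deletable" file into a set.
     */
    static Set<String> findCompoundSegments(Path indexDir, Set<String> deletable)
            throws IOException {
        Set<String> segments = new TreeSet<>();
        // The glob restricts the listing to compound-file segments (*.cfs).
        try (DirectoryStream<Path> files = Files.newDirectoryStream(indexDir, "*" + CFS)) {
            for (Path file : files) {
                String fileName = file.getFileName().toString();
                if (deletable.contains(fileName)) {
                    continue; // pending deletion, so not part of the live index
                }
                // Strip the extension to get the segment name: "_a.cfs" -> "_a".
                segments.add(fileName.substring(0, fileName.length() - CFS.length()));
            }
        }
        return segments;
    }
}
```

For example, a directory holding _a.cfs, _b.cfs and _c.cfs, with _c.cfs listed as deletable, yields [_a, _b].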
>> 2. IndexSplitter - splits an existing index into 2, 3 or more roughly
>> equally sized indices. It simply splits the segment files into distinct
>> directories and then uses the IndexRecoverer to rebuild each new
>> index's segments file.
> Seems like a good tool for contrib?
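A minimal sketch of how an IndexSplitter might decide which segments go into which new index, assuming each segment's document count is already known and that "equally sized" means equal document counts. It uses a common greedy heuristic (largest segment first, always into the currently smallest group), not necessarily what the proposed tool does. The names SegmentSplitter, Segment and split are hypothetical. Copying each group's files into its own directory and rebuilding its segments file are left out. The record requires Java 16 or later.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch of the grouping step of an index splitter.
final class SegmentSplitter {
    /** A segment name and its document count (hypothetical input type). */
    record Segment(String name, int docCount) {}

    /**
     * Assigns whole segments to numParts groups: segments are visited from
     * largest to smallest, and each goes to the group with the fewest
     * documents so far.
     */
    static List<List<Segment>> split(List<Segment> segments, int numParts) {
        if (numParts < 1) {
            throw new IllegalArgumentException("numParts must be at least 1");
        }
        List<List<Segment>> parts = new ArrayList<>();
        for (int i = 0; i < numParts; i++) {
            parts.add(new ArrayList<>());
        }
        long[] totals = new long[numParts]; // documents assigned to each group

        List<Segment> bySize = new ArrayList<>(segments);
        bySize.sort(Comparator.comparingInt(Segment::docCount).reversed());
        for (Segment segment : bySize) {
            // Find the currently smallest group (lowest index wins ties).
            int smallest = 0;
            for (int i = 1; i < numParts; i++) {
                if (totals[i] < totals[smallest]) {
                    smallest = i;
                }
            }
            parts.get(smallest).add(segment);
            totals[smallest] += segment.docCount();
        }
        return parts;
    }
}
```

For instance, segments of 500, 400, 300, 200 and 100 documents split two ways give groups of 800 (500 + 200 + 100) and 700 (400 + 300) documents. Because whole segments are never divided, a single very large segment limits how even any such split can be.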
If these rely only on the public index-format spec and public index
APIs, then they could go in contrib, which would be easiest, since
expectations about back-compatibility and long-term support are lower
for contrib.
But if they rely on index package internals then they should be
maintained with the core. Then the question becomes: are these features
that we can maintain long-term? The index implementation will likely
evolve, and the existing public API should be supported through this
evolution: APIs must be more durable than implementations. So, are
these features things that can be supported through likely
implementation changes?
I suspect they are. We've talked about making the postings format more
flexible, but I have not heard anyone talk about a need to substantially
alter the segments & merging model. Are we comfortable adding public
APIs that depend on that model?
An index splitter is useful with parallel and/or distributed search.
Splitting on segment boundaries is fairly limited, but perhaps, with
clever use of IndexWriter.setMaxMergeDocs(), it is sufficient.
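A short bound that suggests why capping segment size could make segment-boundary splitting sufficient. It assumes segments are assigned whole, one at a time in any order, each to the group with the fewest documents so far, and that no segment exceeds M, the value passed to IndexWriter.setMaxMergeDocs(). That setting only limits merges performed while adding documents, so an index later optimized into a single segment would not satisfy the assumption. Here s_i are the segment document counts, D is their sum, and L_1, ..., L_k are the document counts of the k resulting indices.

```latex
\begin{align*}
&\text{Invariant (holds initially, when every } L_j = 0\text{):}
  && \max_j L_j - \min_j L_j \le \max_i s_i \le M \\
&\text{Step: } s \text{ joins a group with load } m = \min_j L_j;\ \text{the new minimum } m' \ge m. \\
&\quad \text{New maximum is an unchanged load } L:
  && L - m' \le L - m \le \max_i s_i \\
&\quad \text{New maximum is the updated group:}
  && (m + s) - m' \le s \le \max_i s_i \\
&\text{Both } L_j \text{ and } D/k \text{ lie in } [\min_j L_j,\ \max_j L_j]:
  && \left| L_j - D/k \right| \le M \ \text{ for every } j
\end{align*}
```

For example, splitting 1,000,000 documents four ways with M = 10,000 keeps every index within 10,000 documents of 250,000, i.e. within 4%.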
Doug