Re: Merging "orphaned" segments into a composite index

Andrzej Bialecki Sat, 16 Sep 2006 06:09:10 -0700

Rob Staveley (Tom) wrote:

It looks like my segments file only contains information for the .cfs
segments. So this approach doesn't work. I was wondering if I could use
IndexWriter.addIndexes(IndexReader[]) instead. Can I open an IndexReader
without a corresponding segments file? I notice that IndexReader.open(...)
always operates on directories, which I guess means that it uses the
segments file.

Is a segments file something that can be easily bodged for a bunch of index
files which aren't referenced by the segments file?


This probably all seems like a foolish errand, but my two indexes are > 300G
each and regenerating them is something I'd like to avoid.



1. do a backup first !!!

2. separate the segment data which is accounted for in the current
"segments" file from all other data, and move that to a "healthy" index.
Move the rest of the info to a "corrupt" index. Make sure that the
"healthy" index is still healthy after this operation .. ;)

3. look at the source of
org.apache.lucene.index.SegmentInfos.read(Directory) and
write(Directory). You can see how the new "segments" file is created
based on the SegmentInfo information. So, the only challenge is to
create a bunch of SegmentInfo instances corresponding to your segment
names in the "corrupt" index, and write them out to a new "segments"
file according to this format.

4. you can easily discover the number of documents in each segment. This
is equal to the length (in bytes) of any <segment_name>.f<number>
file; each such file stores lengthNorm info for one field, one byte per
document.
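The last two steps can be sketched in plain Java, without touching the
Lucene classes. Treat this as a rough sketch under assumptions, not as the
way SegmentInfos does it: it assumes the Lucene 2.0-era "segments" layout
(format marker -1, version, name counter, segment count, then name and
document count per segment), ASCII segment names of the form "_<base-36
number>", and norms files that are not packed inside a .cfs. The names
SegmentsFileSketch, docCounts and writeSegments are made up for
illustration. Check the byte layout against SegmentInfos.write(Directory)
in your Lucene version before relying on it.

```java
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SegmentsFileSketch {
    // Norms files are named "<segment>.f<fieldNumber>", one byte per document.
    private static final Pattern NORMS = Pattern.compile("(_[0-9a-z]+)\\.f(\\d+)");

    /** Maps each segment name to its document count, read from norms file lengths. */
    static Map<String, Integer> docCounts(Path indexDir) throws IOException {
        Map<String, Integer> counts = new TreeMap<>();
        try (DirectoryStream<Path> files = Files.newDirectoryStream(indexDir)) {
            for (Path file : files) {
                Matcher m = NORMS.matcher(file.getFileName().toString());
                if (!m.matches()) continue;
                long size = Files.size(file);
                if (size > Integer.MAX_VALUE) throw new IOException("norms file too large: " + file);
                Integer previous = counts.put(m.group(1), (int) size);
                // All norms files of one segment must have the same length.
                if (previous != null && previous != (int) size) {
                    throw new IOException("norms files disagree for segment " + m.group(1));
                }
            }
        }
        return counts;
    }

    // Variable-length int: 7 bits per byte, low-order bits first, high bit = "more follows".
    static void writeVInt(DataOutputStream out, int value) throws IOException {
        while ((value & ~0x7F) != 0) {
            out.writeByte((value & 0x7F) | 0x80);
            value >>>= 7;
        }
        out.writeByte(value);
    }

    // ASSUMPTION: ASCII-only names, so character count equals byte count.
    static void writeString(DataOutputStream out, String s) throws IOException {
        writeVInt(out, s.length());
        out.write(s.getBytes(StandardCharsets.US_ASCII));
    }

    /** Writes a "segments" file body in the ASSUMED Lucene 2.0 layout. */
    static void writeSegments(OutputStream raw, Map<String, Integer> counts, long version)
            throws IOException {
        // The name counter must be larger than every existing segment number,
        // otherwise newly created segments could reuse an existing name.
        int nameCounter = 0;
        for (String name : counts.keySet()) {
            int number = Integer.parseInt(name.substring(1), Character.MAX_RADIX);
            nameCounter = Math.max(nameCounter, number + 1);
        }
        DataOutputStream out = new DataOutputStream(raw);
        out.writeInt(-1);              // format marker (assumed)
        out.writeLong(version);        // index version
        out.writeInt(nameCounter);     // next segment number
        out.writeInt(counts.size());   // number of segments
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            writeString(out, e.getKey());
            out.writeInt(e.getValue());
        }
        out.flush();
    }
}
```

To use it, point docCounts at a copy of the "corrupt" index directory and
write the result to a file named "segments" in that same directory. Only
ever run it on a copy.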

Once you have written the new "segments" file, try to open it in
IndexReader and iterate over documents to see if the index is basically
"healthy". You could also do an IndexWriter.optimize() to make sure that
all parts of the index correctly fit together.

A note: if a cfs file is corrupted, you can try to explode it into
individual files - you can use
org.apache.lucene.index.CompoundFileReader, iterate over file names,
open streams and save them to regular files; and then do the trick with
re-generating the "segments" file.
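If getting at CompoundFileReader is awkward, the same explode step can be
sketched by reading the compound file directly. Note that this swaps in a
hand-written parser instead of the Lucene class, and it rests on
assumptions: the Lucene 2.0-era .cfs layout (a VInt entry count, then a
long data offset and a name per entry, then the raw file data in the same
order) and ASCII file names. CompoundFileExploder and explode are
hypothetical names. If the header itself is damaged, this will not help.

```java
import java.io.EOFException;
import java.io.IOException;
import java.io.OutputStream;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

class CompoundFileExploder {
    // Variable-length int: 7 bits per byte, low-order bits first.
    static int readVInt(RandomAccessFile in) throws IOException {
        byte b = in.readByte();
        int value = b & 0x7F;
        for (int shift = 7; (b & 0x80) != 0; shift += 7) {
            b = in.readByte();
            value |= (b & 0x7F) << shift;
        }
        return value;
    }

    /** Copies every entry of the compound file into outDir; returns the entry names. */
    static List<String> explode(Path cfs, Path outDir) throws IOException {
        List<String> names = new ArrayList<>();
        try (RandomAccessFile in = new RandomAccessFile(cfs.toFile(), "r")) {
            int count = readVInt(in);
            long[] offsets = new long[count + 1];
            for (int i = 0; i < count; i++) {
                offsets[i] = in.readLong();            // where this entry's data starts
                byte[] raw = new byte[readVInt(in)];   // ASSUMPTION: ASCII, chars == bytes
                in.readFully(raw);
                String name = new String(raw, StandardCharsets.US_ASCII);
                if (name.contains("/") || name.contains("\\")) {
                    throw new IOException("suspicious entry name: " + name);
                }
                names.add(name);
            }
            offsets[count] = in.length();              // last entry runs to end of file

            Files.createDirectories(outDir);
            byte[] buffer = new byte[8192];
            for (int i = 0; i < count; i++) {
                in.seek(offsets[i]);
                long remaining = offsets[i + 1] - offsets[i];
                try (OutputStream out = Files.newOutputStream(outDir.resolve(names.get(i)))) {
                    while (remaining > 0) {
                        int n = in.read(buffer, 0, (int) Math.min(buffer.length, remaining));
                        if (n < 0) throw new EOFException("truncated compound file: " + cfs);
                        out.write(buffer, 0, n);
                        remaining -= n;
                    }
                }
            }
        }
        return names;
    }
}
```

Each entry's length is taken as the distance to the next entry's offset, so
a single bad offset in the header spoils two neighbouring files. Compare
the extracted sizes against what you expect before going further.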

Then you may consider this "corrupt" index to be restored. Now you can
merge your "healthy" and "restored" indexes using
IndexWriter.addIndexes().

BTW, I strongly recommend doing frequent backups at different stages.
Also, I recommend using BeanShell - you will save a lot of time you
would otherwise spend on editing and compiling these steps.


--
Best regards,
Andrzej Bialecki     <><
___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com


