Hi Folks,

We're currently trying to build a search architecture using segment replication
(indexer and searcher are separated, and the indexer ships new segments to
the searchers). One of the problems we're facing: for availability reasons
we need to have multiple indexers running, and when a searcher switches from
consuming one indexer to another, the segment names can collide (because
segment names are counter-based), so the searcher has to reload the whole index.
To avoid that, we're looking for a way to name the segments so that Lucene
can tell them apart and load only the difference (by calling
`openIfChanged`). I've checked IndexWriter and DocumentsWriter, and
it seems naming is controlled by a private final method `newSegmentName()`, so
that's likely not possible there. Is anyone aware of any other way to
control the segment names?
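To make the collision concrete, here is a tiny standalone Java sketch of counter-based naming. As far as I can tell it mimics what `newSegmentName()` does (an underscore followed by a per-index counter rendered in base 36), but it is not Lucene code: the class name `SegmentNameSketch` and the starting counter of 4 are made up for illustration.

```java
public class SegmentNameSketch {
    // Hypothetical stand-in for the per-index segment counter. Each indexer
    // advances its own copy, so nothing coordinates names across indexers.
    private long counter;

    SegmentNameSketch(long startCounter) {
        this.counter = startCounter;
    }

    // Assumed to mirror the shape of Lucene's naming: "_" + counter in base 36.
    String newSegmentName() {
        return "_" + Long.toString(counter++, Character.MAX_RADIX);
    }

    public static void main(String[] args) {
        // Both indexers resume from the same synced commit, so their counters start equal.
        SegmentNameSketch indexer1 = new SegmentNameSketch(4);
        SegmentNameSketch indexer2 = new SegmentNameSketch(4);
        String fromIndexer1 = indexer1.newSegmentName();
        String fromIndexer2 = indexer2.newSegmentName();
        // Same name "_4" from both indexers, even though the segment contents differ.
        System.out.println(fromIndexer1 + " " + fromIndexer2
            + " collide=" + fromIndexer1.equals(fromIndexer2));
    }
}
```

Running it prints `_4 _4 collide=true`: any two indexers that branch from the same commit will hand out the same next name.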

An example of the situation described above:
The searcher was previously consuming from Indexer 1 and has the following segments:
_1, _2, _3, _4
Indexer 2 previously synced from Indexer 1, sharing the first 3 segments,
and produced its own 4th segment (denoted _4', but it has the same
"_4" name): _1, _2, _3, _4'
Then Indexer 1 suddenly dies and the searcher switches from Indexer 1 to Indexer 2.
When it finishes downloading the segments and tries to refresh the
reader, it will likely hit the exception here
<https://github.com/apache/lucene/blob/main/lucene/core/src/java/org/apache/lucene/index/StandardDirectoryReader.java#L218>,
and it seems all we can do right now is reload the whole index, which
could be costly.
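The refresh failure above can be modeled with a small standalone simulation. It assumes (as I understand the linked check) that reopening matches old and new segments by name and then rejects a match whose unique per-segment ID differs. The `Segment` record, the one-byte IDs and the `reusable` method are hypothetical simplifications, not Lucene's actual classes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class RefreshSketch {
    // Hypothetical segment model: a name plus a unique id (real IDs are longer byte arrays).
    record Segment(String name, byte[] id) {}

    // Returns the names of already-open segments that can be reused on refresh.
    // Assumption: a segment with the same name but a different id is treated as
    // an illegal change, loosely modeled on the StandardDirectoryReader check.
    static List<String> reusable(List<Segment> current, List<Segment> incoming) {
        Map<String, Segment> byName = new HashMap<>();
        for (Segment s : current) {
            byName.put(s.name(), s);
        }
        List<String> reused = new ArrayList<>();
        for (Segment s : incoming) {
            Segment old = byName.get(s.name());
            if (old == null) {
                continue; // brand-new segment: it would simply be opened fresh
            }
            if (!Arrays.equals(old.id(), s.id())) {
                throw new IllegalStateException("same segment name " + s.name() + " but different id");
            }
            reused.add(s.name());
        }
        return reused;
    }

    public static void main(String[] args) {
        List<Segment> fromIndexer1 = List.of(
            new Segment("_1", new byte[] {1}), new Segment("_2", new byte[] {2}),
            new Segment("_3", new byte[] {3}), new Segment("_4", new byte[] {4}));
        // Indexer 2 shares _1.._3 but wrote its own "_4" (the _4' above) with a different id.
        List<Segment> fromIndexer2 = List.of(
            fromIndexer1.get(0), fromIndexer1.get(1), fromIndexer1.get(2),
            new Segment("_4", new byte[] {42}));
        try {
            reusable(fromIndexer1, fromIndexer2);
        } catch (IllegalStateException e) {
            System.out.println("refresh failed: " + e.getMessage());
        }
    }
}
```

In this model the three shared segments would be reusable on their own; it is only the name clash on `_4` that forces the fallback to a full reload.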

Sorry for the long email and thank you in advance for any replies!

Best
Patrick
