I think the goal is to control how documents are distributed among the
segments, in a deterministic, reproducible way.
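
To make that concrete, here is a minimal sketch in plain Java (it uses no Lucene API) of a deterministic, concurrent assignment of documents to segments. It assumes a hypothetical doc-to-segment map like the one Patrick describes, with made-up doc IDs ("d1"...) and segment names ("_0"...). The per-bucket "build" step is a placeholder that only sorts IDs. It stands in for indexing that bucket into its own segment and is not how Lucene does it.

```java
import java.util.*;
import java.util.concurrent.*;

public class DeterministicPartition {

    // Groups documents by their target segment (hypothetical doc-to-segment map).
    // TreeMap gives a stable segment order; each bucket keeps the input doc order.
    public static SortedMap<String, List<String>> partition(List<String> docIds,
                                                            Map<String, String> segmentOf) {
        SortedMap<String, List<String>> buckets = new TreeMap<>();
        for (String id : docIds) {
            String seg = segmentOf.get(id);
            if (seg == null) {
                throw new IllegalArgumentException("no segment for doc " + id);
            }
            buckets.computeIfAbsent(seg, k -> new ArrayList<>()).add(id);
        }
        return buckets;
    }

    // Builds every bucket in parallel, then collects results in segment order,
    // so the output does not depend on which thread finishes first.
    public static List<List<String>> buildConcurrently(SortedMap<String, List<String>> buckets,
                                                       int threads) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<List<String>>> futures = new ArrayList<>();
            for (List<String> bucket : buckets.values()) {
                // Placeholder for "index this bucket into its own segment":
                // here we only sort the doc IDs within the bucket.
                futures.add(pool.submit(() -> {
                    List<String> seg = new ArrayList<>(bucket);
                    Collections.sort(seg);
                    return seg;
                }));
            }
            List<List<String>> segments = new ArrayList<>();
            for (Future<List<String>> f : futures) {
                segments.add(f.get()); // waits in submission (= segment) order
            }
            return segments;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        Map<String, String> segmentOf = Map.of(
            "d1", "_1", "d2", "_0", "d3", "_1", "d4", "_0", "d5", "_2");
        List<String> docs = List.of("d5", "d3", "d1", "d4", "d2");
        System.out.println(buildConcurrently(partition(docs, segmentOf), 3));
    }
}
```

Because the buckets live in a TreeMap and results are read back in submission order, this prints [[d2, d4], [d1, d3], [d5]] on every run regardless of thread scheduling.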

On Sat, Dec 19, 2020, 11:39 AM Adrien Grand <jpou...@gmail.com> wrote:

> Have you considered leveraging Lucene's built-in index sorting? It
> supports concurrent indexing and is quite fast.
>
> On Fri, Dec 18, 2020 at 7:26 PM Haoyu Zhai <zhai7...@gmail.com> wrote:
>
>> Hi
>> Our team is looking for a way to construct (or rebuild) a deterministic
>> sorted index concurrently (I know Lucene can do that sequentially, but
>> that might sometimes be too slow for us).
>> Currently we have roughly two ideas, both assuming there's a pre-built
>> index and that we have dumped a doc-to-segment map, so that IndexWriter
>> can know which doc belongs to which segment:
>> 1. First build the index in the normal way (concurrently); once the index
>> is built, use the "addIndexes" functionality to merge documents into the
>> correct segment.
>> 2. By controlling FlushPolicy and other related classes, make sure each
>> segment created (before merging) contains only documents that belong to
>> one of the segments in the pre-built index, and create a dedicated
>> MergePolicy that only merges segments belonging to the same pre-built
>> segment.
>>
>> Basically, we think the first one is easier to implement and the second
>> one is faster. We'd appreciate any ideas, suggestions, and feedback.
>>
>> Thanks
>> Patrick Zhai
>>
>
>
> --
> Adrien
>
