Re: enhancing data locality wrt certain document clusters

Tommaso Teofili Mon, 22 May 2017 01:55:37 -0700

P.S.
Adrien, are there any docs / references on how to implement index-time
sorting for versions prior to 6.2 and LUCENE-6766?


On Fri, 19 May 2017 at 12:38, Tommaso Teofili <
[email protected]> wrote:

> Thanks Adrien, that sounds like a good suggestion; I'll try it out.
> Another approach might be to use a separate index per cluster, where one
> can control the number of segments to some extent. However, that probably
> wouldn't scale to lots of clusters (and it sounds odd, too).
>
> Regards,
> Tommaso
>
>
> On Thu, 18 May 2017 at 16:54, Adrien Grand <[email protected]>
> wrote:
>
>> You can't make documents more likely to be in the same segment; however,
>> I'm thinking you could use index sorting to make documents closer to each
>> other on a per-segment basis?
>>
>> On Thu, 18 May 2017 at 11:04, Tommaso Teofili <[email protected]>
>> wrote:
>>
>>> Hi all,
>>>
>>> I am working on a use case where my Lucene index stores documents
>>> composed of (relatively short) text and binary values. At retrieval time I
>>> need to retrieve documents that belong to a set of cluster values (e.g.
>>> facets).
>>> In that context I was wondering if and how it would be possible to make it
>>> more likely that documents (and their associated docValues) that belong to
>>> the same cluster fall into the same segment.
>>> That would give higher storage locality [1] and presumably better
>>> performance (given that docs belonging to the same cluster get retrieved
>>> together most of the time in my use case).
>>> At first I looked into extending the DV format, but that is segment
>>> agnostic, so I am thinking of coming up with a merge policy that
>>> produces segments whose docs belong to the same cluster with high
>>> probability.
>>> Any other ideas / suggestions?
>>>
>>> Regards,
>>> Tommaso
>>>
>>> [1] : https://en.wikipedia.org/wiki/Locality_of_reference
>>>
>>
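
To make the index-sorting suggestion above concrete, here is a minimal toy
sketch in plain Java. It does not use any Lucene API: the Doc class, the
integer cluster key, and the helper names are all hypothetical and exist only
for illustration. It models a segment as a list of documents and shows that
ordering the segment by cluster key puts each cluster's documents in one
contiguous run of positions, which is the per-segment locality index sorting
is meant to give. In Lucene 6.2 and later, the corresponding setting should be
IndexWriterConfig.setIndexSort(Sort); how to get the same effect on earlier
versions is the open question in this thread.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class ClusterLocalitySketch {

    /** Hypothetical stand-in for a document: an id plus the cluster it belongs to. */
    static final class Doc {
        final String id;
        final int cluster;

        Doc(String id, int cluster) {
            this.id = id;
            this.cluster = cluster;
        }
    }

    /** Returns a copy of the segment ordered by cluster (stable, so ties keep arrival order). */
    static List<Doc> sortByCluster(List<Doc> segment) {
        List<Doc> sorted = new ArrayList<>(segment);
        sorted.sort(Comparator.comparingInt((Doc d) -> d.cluster));
        return sorted;
    }

    /** Counts the contiguous runs of positions that hold docs of the given cluster. */
    static int runsForCluster(List<Doc> segment, int cluster) {
        int runs = 0;
        boolean inRun = false;
        for (Doc d : segment) {
            boolean match = d.cluster == cluster;
            if (match && !inRun) {
                runs++; // a new run starts at this position
            }
            inRun = match;
        }
        return runs;
    }

    public static void main(String[] args) {
        // Documents in arrival order; clusters are interleaved.
        List<Doc> segment = Arrays.asList(
                new Doc("a", 2), new Doc("b", 1), new Doc("c", 2),
                new Doc("d", 3), new Doc("e", 1), new Doc("f", 2));

        System.out.println("runs for cluster 2, arrival order: "
                + runsForCluster(segment, 2));
        System.out.println("runs for cluster 2, sorted order:  "
                + runsForCluster(sortByCluster(segment), 2));
    }
}
```

In arrival order, the documents of cluster 2 are spread over three separate
runs; after sorting they form a single run. Fewer runs means fewer scattered
reads when a whole cluster is fetched together, though only within one
segment: as noted above, sorting does not control which segment a document
lands in.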
