Re: enhancing data locality wrt certain document clusters

Tommaso Teofili Fri, 19 May 2017 03:39:13 -0700

Thanks Adrien, that sounds like a good suggestion; I'll try it out.
Another approach might be to use separate per-cluster indexes, where one
can to some extent control the number of segments. However, that probably
wouldn't scale with lots of clusters (and sounds weird too).
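
To make the index sorting idea concrete, here is a minimal sketch in plain
Java. It is a toy model and does not use Lucene: a "segment" is just a list
of documents in docID order, and the Doc class, the cluster labels and the
method names are all hypothetical. In Lucene itself the corresponding
setting would be IndexWriterConfig.setIndexSort(Sort), which I have not
tried yet for this use case. The sketch only shows why sorting inside a
segment helps: documents of one cluster end up in a single contiguous run
of docIDs rather than scattered across the segment.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Toy model only: no Lucene classes are used here.
final class IndexSortSketch {

    // Hypothetical document: an id plus the cluster it belongs to.
    static final class Doc {
        final int id;
        final String cluster;

        Doc(int id, String cluster) {
            this.id = id;
            this.cluster = cluster;
        }
    }

    // Returns a copy of the segment ordered by cluster label. The sort is
    // stable, so docs within one cluster keep their relative order.
    static List<Doc> sortSegment(List<Doc> segment) {
        List<Doc> sorted = new ArrayList<>(segment);
        sorted.sort(Comparator.comparing((Doc d) -> d.cluster));
        return sorted;
    }

    // Counts maximal runs of adjacent docs that share a cluster. Fewer
    // runs means each cluster is spread over fewer disjoint docID ranges.
    static int countRuns(List<Doc> segment) {
        int runs = 0;
        String previous = null;
        for (Doc doc : segment) {
            if (!doc.cluster.equals(previous)) {
                runs++;
            }
            previous = doc.cluster;
        }
        return runs;
    }

    public static void main(String[] args) {
        // Docs arrive interleaved, as they would in indexing order.
        List<Doc> segment = List.of(
            new Doc(0, "b"), new Doc(1, "a"), new Doc(2, "b"),
            new Doc(3, "c"), new Doc(4, "a"), new Doc(5, "b"));
        System.out.println("runs before sort: " + countRuns(segment));
        System.out.println("runs after sort:  " + countRuns(sortSegment(segment)));
    }
}
```

With three clusters, the sorted segment has exactly three runs, one per
cluster, whereas the unsorted one has a run per document. This does not
put a cluster into a single segment, it only improves locality within
each segment.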


Regards,
Tommaso


On Thu, 18 May 2017 at 16:54, Adrien Grand <[email protected]>
wrote:

> You can't make documents more likely to be in the same segment. However,
> I'm thinking you could use index sorting to place documents of the same
> cluster closer to each other on a per-segment basis?
>
> On Thu, 18 May 2017 at 11:04, Tommaso Teofili <[email protected]>
> wrote:
>
>> Hi all,
>>
>> I am working on a use case where my Lucene index stores documents
>> composed of (relatively short) text and binary values. At retrieval time I
>> need to retrieve documents that belong to a set of cluster values (e.g.
>> facets).
>> In that context I was wondering if and how it would be possible to make it
>> more probable that documents (and their associated docValues) that belong
>> to the same cluster fall into the same segment.
>> That would give better storage locality [1] and presumably better
>> performance (given that docs belonging to the same cluster get retrieved
>> together most of the time in my use case).
>> At first I looked into extending the DV format, but that's segment
>> agnostic. Therefore I am thinking of coming up with a merge policy that
>> produces segments whose docs belong to the same cluster with high
>> probability.
>> Any other ideas or suggestions?
>>
>> Regards,
>> Tommaso
>>
>> [1] : https://en.wikipedia.org/wiki/Locality_of_reference
>>
>
