enhancing data locality wrt certain document clusters

Tommaso Teofili Thu, 18 May 2017 02:04:27 -0700

Hi all,

I am working on a use case where my Lucene index stores documents composed
of (relatively short) text and binary values. At retrieval time I need to
fetch documents that belong to a set of cluster values (e.g. facets).
In that context I was wondering if and how it would be possible to make it
more probable that documents (and their associated docValues) belonging to
the same cluster fall into the same segment.
That would give better storage locality [1] and presumably better
performance (given that docs belonging to the same clusters are retrieved
together most of the time in my use case).
At first I looked into extending the DV format, but that is
segment-agnostic, so I am now thinking of coming up with a merge policy
that produces segments whose docs belong to the same cluster with high
probability.
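
To make the merge-policy idea a bit more concrete, here is a minimal sketch
in plain Java. It is a toy model, not Lucene code: SegmentStats,
dominantCluster and proposeMerges are hypothetical names, and it assumes
that per-segment, per-cluster doc counts are available from somewhere. It
only shows the selection step (merge segments that share a dominant
cluster); a real version would presumably have to live inside a custom
Lucene MergePolicy and also respect segment size limits.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy model only: none of these types are Lucene classes.
class ClusterAwareMergeSketch {

    /** Hypothetical stand-in for a segment: its name and its doc count per cluster. */
    static final class SegmentStats {
        final String name;
        final Map<String, Integer> docsPerCluster;

        SegmentStats(String name, Map<String, Integer> docsPerCluster) {
            this.name = name;
            this.docsPerCluster = docsPerCluster;
        }
    }

    /** Returns the cluster holding the most docs in the segment, or null if it is empty. */
    static String dominantCluster(SegmentStats segment) {
        String best = null;
        int bestCount = -1;
        for (Map.Entry<String, Integer> e : segment.docsPerCluster.entrySet()) {
            if (e.getValue() > bestCount) {
                best = e.getKey();
                bestCount = e.getValue();
            }
        }
        return best;
    }

    /** Groups segment names by dominant cluster; groups of 2+ are merge candidates. */
    static List<List<String>> proposeMerges(List<SegmentStats> segments) {
        // LinkedHashMap keeps clusters in the order they are first seen.
        Map<String, List<String>> byCluster = new LinkedHashMap<>();
        for (SegmentStats s : segments) {
            String cluster = dominantCluster(s);
            if (cluster != null) {
                byCluster.computeIfAbsent(cluster, k -> new ArrayList<>()).add(s.name);
            }
        }
        List<List<String>> merges = new ArrayList<>();
        for (List<String> group : byCluster.values()) {
            if (group.size() > 1) {
                merges.add(group);
            }
        }
        return merges;
    }

    public static void main(String[] args) {
        List<SegmentStats> segments = List.of(
            new SegmentStats("_0", Map.of("sports", 90, "politics", 10)),
            new SegmentStats("_1", Map.of("sports", 5, "politics", 40)),
            new SegmentStats("_2", Map.of("sports", 70, "politics", 20)),
            new SegmentStats("_3", Map.of("politics", 30)));
        System.out.println(proposeMerges(segments));
    }
}
```

Note that this only concentrates clusters that already dominate a segment;
docs that are already mixed inside one segment stay mixed, so routing at
index time would probably still matter.
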
Any other ideas or suggestions?


Regards,
Tommaso

[1] : https://en.wikipedia.org/wiki/Locality_of_reference
