Hi all,

I am working on a use case where my Lucene index stores documents composed of (relatively short) text and binary values. At retrieval time I need to fetch the documents that belong to a given set of cluster values (e.g. facets).

In that context, I was wondering if and how it would be possible to make it more likely that documents (and their associated docValues) belonging to the same cluster end up in the same segment. That would give better storage locality [1] and presumably better performance, since in my use case docs belonging to the same cluster are retrieved together most of the time.

At first I looked into extending the DV format, but that is segment agnostic, so I am now thinking of coming up with a merge policy that produces segments whose docs belong to the same cluster with high probability.

Any other ideas / suggestions?
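
To make the merge policy idea a bit more concrete, a toy sketch of the grouping step might look like the following. It is plain Java and deliberately does not use Lucene's MergePolicy API; SegmentStats, dominantCluster and mergeCandidates are hypothetical names, and it assumes that per-segment doc counts per cluster are somehow available.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Toy model only: none of these types are Lucene classes.
class ClusterAwareGrouping {

    // Hypothetical per-segment summary: segment name plus doc count per cluster id.
    static final class SegmentStats {
        final String name;
        final Map<String, Integer> docsPerCluster;

        SegmentStats(String name, Map<String, Integer> docsPerCluster) {
            this.name = name;
            this.docsPerCluster = docsPerCluster;
        }
    }

    // Returns the cluster holding the most docs in the segment
    // (ties go to the lexicographically smallest cluster id), or null if empty.
    static String dominantCluster(SegmentStats segment) {
        String best = null;
        int bestCount = -1;
        for (Map.Entry<String, Integer> e : new TreeMap<>(segment.docsPerCluster).entrySet()) {
            if (e.getValue() > bestCount) {
                best = e.getKey();
                bestCount = e.getValue();
            }
        }
        return best;
    }

    // Groups segment names by dominant cluster; only groups with at least
    // two segments are kept, since a single segment has nothing to merge with.
    static Map<String, List<String>> mergeCandidates(List<SegmentStats> segments) {
        Map<String, List<String>> byCluster = new TreeMap<>();
        for (SegmentStats s : segments) {
            String cluster = dominantCluster(s);
            if (cluster != null) {
                byCluster.computeIfAbsent(cluster, k -> new ArrayList<>()).add(s.name);
            }
        }
        byCluster.values().removeIf(names -> names.size() < 2);
        return byCluster;
    }

    public static void main(String[] args) {
        List<SegmentStats> segments = List.of(
            new SegmentStats("_0", Map.of("sports", 90, "news", 10)),
            new SegmentStats("_1", Map.of("news", 70, "sports", 30)),
            new SegmentStats("_2", Map.of("sports", 55, "news", 45)));
        // Prints the candidate groups, keyed by dominant cluster.
        System.out.println(mergeCandidates(segments));
    }
}
```

In a real merge policy, each group would presumably become one candidate merge, with the usual size and segment-count limits still applied on top.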

Regards,
Tommaso

[1]: https://en.wikipedia.org/wiki/Locality_of_reference