P.S. Adrien, are there any docs or references on how to implement index-time sorting for versions prior to 6.2 and LUCENE-6766?

On Fri, 19 May 2017 at 12:38, Tommaso Teofili <tommaso.teof...@gmail.com> wrote:

> Thanks Adrien, it sounds like a good suggestion, I'll try it out.
> Another approach might be to use separate per-cluster indexes; there one
> can somehow control the number of segments. However, that probably wouldn't
> scale with lots of clusters (and sounds weird too).
>
> Regards,
> Tommaso
>
>
> On Thu, 18 May 2017 at 16:54, Adrien Grand <jpou...@gmail.com> wrote:
>
>> You can't make documents more likely to be in the same segment; however,
>> I'm thinking you could use index sorting to make documents closer to each
>> other on a per-segment basis?
>>
>> On Thu, 18 May 2017 at 11:04, Tommaso Teofili <tommaso.teof...@gmail.com>
>> wrote:
>>
>>> Hi all,
>>>
>>> I am working on a use case where my Lucene index stores documents
>>> composed of (relatively short) text and binary values. At retrieval time I
>>> need to retrieve documents that belong to a set of cluster values (e.g.
>>> facets).
>>> In that context I was wondering if and how it'd be possible to make it
>>> more probable that documents (and associated docValues) that belong to the
>>> same cluster fall into the same segment.
>>> That would allow for higher storage locality [1] and presumably
>>> better performance (given that docs belonging to the same cluster get
>>> retrieved together most of the time in my use case).
>>> At first I had looked into extending the DV format, but that's segment
>>> agnostic, so I am thinking of coming up with a merge policy which
>>> produces segments whose docs belong to the same cluster with a high
>>> probability.
>>> Any other ideas / suggestions?
>>>
>>> Regards,
>>> Tommaso
>>>
>>> [1] : https://en.wikipedia.org/wiki/Locality_of_reference
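
To make the index-sorting idea from the quoted thread concrete, here is a minimal plain-Java model of why ordering documents by cluster inside a segment improves locality. It does not use any Lucene API. The class name ClusterLocalityDemo, the helper names, and the sample cluster ids are all hypothetical, and a doc-ordered int array stands in for a segment's per-document cluster value. It only counts how many separate contiguous doc-ID ranges have to be visited to read one cluster, before and after sorting.

```java
import java.util.Arrays;

// Hypothetical model, not Lucene code: index i is a doc ID within one
// segment, and the value at i is that document's cluster id.
class ClusterLocalityDemo {

    // Counts how many separate contiguous doc-ID ranges contain the cluster.
    // Fewer ranges means the cluster's documents sit closer together.
    static int countRuns(int[] clusterByDoc, int cluster) {
        int runs = 0;
        boolean inRun = false;
        for (int c : clusterByDoc) {
            if (c == cluster) {
                if (!inRun) {
                    runs++;
                }
                inRun = true;
            } else {
                inRun = false;
            }
        }
        return runs;
    }

    // Returns a copy ordered by cluster id, modelling what sorting a
    // segment on the cluster field does to document order.
    static int[] sortedByCluster(int[] clusterByDoc) {
        int[] copy = clusterByDoc.clone();
        Arrays.sort(copy);
        return copy;
    }

    public static void main(String[] args) {
        // Made-up arrival order of documents from three clusters.
        int[] arrivalOrder = {2, 0, 1, 0, 2, 1, 0, 2};
        int[] sorted = sortedByCluster(arrivalOrder);
        System.out.println("ranges for cluster 0, arrival order: "
                + countRuns(arrivalOrder, 0));
        System.out.println("ranges for cluster 0, sorted order:  "
                + countRuns(sorted, 0));
    }
}
```

In this toy input, cluster 0 is spread over three separate ranges in arrival order and collapses into a single range once the segment is sorted by cluster. That is the per-segment effect Adrien describes; it does nothing about which segment a document lands in.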