Hi David,

I see your point. I am not saying such big low-level changes are badly
needed today for most production scenarios; I am just observing that this
might become a useful extension. For example, word / document embeddings
are being used more and more (mostly in research), so retrieving / scoring
docs that belong to the same cluster (or that are near / similar
embedding-wise, regardless of the metric) is a significant part of
the query (retrieving / ranking) phase.

However, I think your suggestion to look at easier solutions first, like
MultiReader, is a good one; e.g. in "my" use case, if each doc belongs to a
single cluster, it might be good to create an index per cluster.
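
To make the index-per-cluster idea concrete, here is a minimal sketch in
plain Java. It is only a model of the routing, not Lucene API: the class
ClusterPartitionedIndex and its methods are hypothetical names, and each
in-memory list stands in for what would be a separate index (its own
IndexWriter) in Lucene, with allDocs() playing the role a MultiReader over
the per-cluster readers would play.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Plain-Java model of "one index per cluster"; hypothetical, not Lucene API. */
final class ClusterPartitionedIndex {

    // One partition per cluster id; stands in for one index per cluster.
    private final Map<String, List<String>> partitions = new LinkedHashMap<>();

    /** Routes a document to the partition of its (single) cluster. */
    public void add(String clusterId, String docId) {
        partitions.computeIfAbsent(clusterId, k -> new ArrayList<>()).add(docId);
    }

    /** Reads only one partition: the data-locality case discussed above. */
    public List<String> docsInCluster(String clusterId) {
        return Collections.unmodifiableList(
                partitions.getOrDefault(clusterId, Collections.emptyList()));
    }

    /** Composite view over every partition, in cluster insertion order. */
    public List<String> allDocs() {
        List<String> all = new ArrayList<>();
        for (List<String> partition : partitions.values()) {
            all.addAll(partition);
        }
        return all;
    }

    public static void main(String[] args) {
        ClusterPartitionedIndex index = new ClusterPartitionedIndex();
        index.add("sports", "doc1");
        index.add("politics", "doc2");
        index.add("sports", "doc3");
        System.out.println(index.docsInCluster("sports")); // [doc1, doc3]
        System.out.println(index.allDocs());               // [doc1, doc3, doc2]
    }
}
```

The sketch relies on each doc belonging to exactly one cluster; a doc that
belongs to several clusters would have to be duplicated across partitions
or handled some other way.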

Thanks and regards,
Tommaso

On Mon, Oct 16, 2017 at 9:28 PM David Smiley <
[email protected]> wrote:

> Hi Tommaso,
>
> It's definitely something I've pondered on occasion but I'm left wondering
> (a) is it worth it (experimentation will tell), and (b) perhaps Lucene
> doesn't need anything new here: see MultiReader. Arguably this can be
> handled at the search server layer by constructing multiple IndexWriters
> and then a MultiReader over their collective indexes.  Perhaps a special
> IndexSearcher QueryCache could be developed to partition itself on the
> separate underlying readers.  Of course it would probably take a lot of
> work to retrofit, say, Solr to do this, but I'm dubious Lucene should be
> saddled with unneeded complexity for this.
>
> On Thu, Oct 12, 2017 at 9:55 AM Tommaso Teofili <[email protected]>
> wrote:
>
>> Hi all,
>>
>> having been involved in this kind of challenge and having seen a few more
>> similar enquiries on the dev list, I was wondering if it may be time to
>> think about making it possible to have an explicit (customizable and
>> therefore pluggable) policy which allows people to have a say in where
>> documents and / or segments get written (on write or on merge).
>> Recently there was someone asking about possibly having segments sorted
>> by a field using SortingMergePolicy, but as Uwe noted it's currently an
>> implementation detail. Personally I have tried (and failed because it was
>> too costly) to make sure docs belonging to certain clusters (identified by
>> a field) are written within the same segments (for data locality / memory
>> footprint concerns when "loading" docs from a certain cluster).
>>
>> As of today that'd be *really* hard, but I just wanted to share my
>> feeling that such a topic might be something to keep an eye on.
>>
>> My 2 cents,
>> Tommaso
>>
> --
> Lucene/Solr Search Committer, Consultant, Developer, Author, Speaker
> LinkedIn: http://linkedin.com/in/davidwsmiley | Book:
> http://www.solrenterprisesearchserver.com
>
