[jira] [Commented] (LUCENE-10016) VectorReader.search needs rethought, o.a.l.search integration?
[ https://issues.apache.org/jira/browse/LUCENE-10016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17403576#comment-17403576 ]

Adrien Grand commented on LUCENE-10016:
---------------------------------------

TestDemo fails with SimpleText, you can reproduce with
{code}
gradlew -Dtests.codec=SimpleText :lucene:demo:test --tests "org.apache.lucene.demo.TestDemo.testKnnVectorSearch"
{code}
I opened LUCENE-10063.

> VectorReader.search needs rethought, o.a.l.search integration?
> --------------------------------------------------------------
>
>                 Key: LUCENE-10016
>                 URL: https://issues.apache.org/jira/browse/LUCENE-10016
>             Project: Lucene - Core
>          Issue Type: Task
>            Reporter: Robert Muir
>            Priority: Blocker
>             Fix For: main (9.0)
>
>          Time Spent: 7h
>  Remaining Estimate: 0h
>
> There's no search integration (e.g. queries) for the current vector values,
> no documentation/examples that I can find.
> Instead the codec has this method:
> {code}
> TopDocs search(String field, float[] target, int k, int fanout)
> {code}
> First, the "fanout" parameter needs to go, this is specific to HNSW impl, get
> it out of here.
> Second, How am I supposed to skip over deleted documents? How can I use
> filters? How should i search across multiple segments?

--
This message was sent by Atlassian Jira
(v8.3.4#803005)

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org
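On the multi-segment question raised in the issue description, one common pattern is to search every segment for its own top k and then merge the per-segment hits into one index-wide list. The following is only a sketch of that idea in plain Java: the `SegmentMergeSketch` class, its `Hit` type, and the `docBases` array are illustrative names, not Lucene classes, and it assumes that higher scores are better. Lucene's own `TopDocs.merge` serves a similar purpose for real `TopDocs` objects.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical helper, not a Lucene class: merges per-segment hits into a global top k. */
final class SegmentMergeSketch {
  private SegmentMergeSketch() {}

  /** A scored hit; stands in for Lucene's ScoreDoc. Higher score means a better match. */
  static final class Hit {
    final int doc;
    final float score;

    Hit(int doc, float score) {
      this.doc = doc;
      this.score = score;
    }
  }

  /**
   * @param perSegment hits of each segment, with doc IDs relative to that segment
   * @param docBases docBases[i] is the index-wide doc ID of the first document of segment i
   * @param k number of hits to keep overall
   */
  static List<Hit> merge(List<List<Hit>> perSegment, int[] docBases, int k) {
    List<Hit> all = new ArrayList<>();
    for (int seg = 0; seg < perSegment.size(); seg++) {
      for (Hit hit : perSegment.get(seg)) {
        // Rebase the segment-local doc ID to an index-wide doc ID.
        all.add(new Hit(docBases[seg] + hit.doc, hit.score));
      }
    }
    // Best score first; ties are broken by the smaller doc ID.
    all.sort(Comparator.comparingDouble((Hit h) -> -h.score).thenComparingInt(h -> h.doc));
    return new ArrayList<>(all.subList(0, Math.min(k, all.size())));
  }
}
```

Each segment must contribute up to k hits of its own, since the global top k could in principle all come from one segment.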
[jira] [Commented] (LUCENE-10016) VectorReader.search needs rethought, o.a.l.search integration?
[ https://issues.apache.org/jira/browse/LUCENE-10016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17401010#comment-17401010 ]

ASF subversion and git services commented on LUCENE-10016:
----------------------------------------------------------

Commit a37844aedd52948e06917bb870873a212ee4fea4 in lucene's branch refs/heads/main from Michael Sokolov
[ https://gitbox.apache.org/repos/asf?p=lucene.git;h=a37844a ]

LUCENE-10016: Added KnnVector index/query support to demo
[jira] [Commented] (LUCENE-10016) VectorReader.search needs rethought, o.a.l.search integration?
[ https://issues.apache.org/jira/browse/LUCENE-10016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17398774#comment-17398774 ]

Michael Sokolov commented on LUCENE-10016:
------------------------------------------

https://github.com/apache/lucene/pull/241 adds a small token->vector dictionary to the demo, and support for indexing and search using those vectors. The search part creates a query that matches either terms or vectors.
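To illustrate what "matches either terms or vectors" can mean for the result set, here is a small plain-Java sketch. It is not the code from the pull request: `TermOrVectorSketch` and its maps of doc ID to score are hypothetical stand-ins, and it assumes that a document matching both sides gets the sum of the two scores, which is how optional (SHOULD) clauses of a Lucene `BooleanQuery` combine.

```java
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sketch, not the demo's code: a disjunction of term hits and vector hits. */
final class TermOrVectorSketch {
  private TermOrVectorSketch() {}

  /**
   * Returns every document that matched the term side, the vector side, or both.
   * Assumption: scores of documents matching both sides are summed.
   */
  static Map<Integer, Float> either(Map<Integer, Float> termHits, Map<Integer, Float> vectorHits) {
    Map<Integer, Float> combined = new TreeMap<>(termHits);
    vectorHits.forEach((doc, score) -> combined.merge(doc, score, Float::sum));
    return combined;
  }
}
```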
[jira] [Commented] (LUCENE-10016) VectorReader.search needs rethought, o.a.l.search integration?
[ https://issues.apache.org/jira/browse/LUCENE-10016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17388899#comment-17388899 ]

Michael Sokolov commented on LUCENE-10016:
------------------------------------------

As for the demo, there is a start on something we could use in luceneutil. It would require a fairly large word->vector dictionary though. I think maybe the way to do it is to provide instructions for downloading the dictionary rather than shipping it as part of the demo.
[jira] [Commented] (LUCENE-10016) VectorReader.search needs rethought, o.a.l.search integration?
[ https://issues.apache.org/jira/browse/LUCENE-10016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17388868#comment-17388868 ]

Robert Muir commented on LUCENE-10016:
--------------------------------------

Even if it isn't in the o.a.l.demo module, a simple test similar to "TestDemo" would be a great step: https://github.com/apache/lucene/blob/main/lucene/core/src/test/org/apache/lucene/TestDemo.java

By this, I mean a high-level unit test that uses IndexWriter/IndexSearcher/queries and not low-level codec APIs.
[jira] [Commented] (LUCENE-10016) VectorReader.search needs rethought, o.a.l.search integration?
[ https://issues.apache.org/jira/browse/LUCENE-10016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17388866#comment-17388866 ]

Adrien Grand commented on LUCENE-10016:
---------------------------------------

One thing that would still be missing would be the o.a.l.demo integration. At the same time I'm unsure if we can easily add vector search to the demo, as we'd need a way to turn some data that exists on the user's computer into vectors in a way that nearest-neighbor search makes sense.
[jira] [Commented] (LUCENE-10016) VectorReader.search needs rethought, o.a.l.search integration?
[ https://issues.apache.org/jira/browse/LUCENE-10016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17388806#comment-17388806 ]

Julie Tibshirani commented on LUCENE-10016:
-------------------------------------------

Deletions are an interesting topic, I opened https://issues.apache.org/jira/browse/LUCENE-10040 for a dedicated discussion. Maybe we could close this issue in favor of that one and also https://issues.apache.org/jira/browse/LUCENE-9614, which discusses a high-level API for KNN search? If we close this, we should decide if we want to transfer its "blocker" status to those issues.
[jira] [Commented] (LUCENE-10016) VectorReader.search needs rethought, o.a.l.search integration?
[ https://issues.apache.org/jira/browse/LUCENE-10016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17386761#comment-17386761 ]

ASF subversion and git services commented on LUCENE-10016:
----------------------------------------------------------

Commit 0ec93b632ce0be880a1e68902bccd07bae65602d in lucene's branch refs/heads/main from Michael Sokolov
[ https://gitbox.apache.org/repos/asf?p=lucene.git;h=0ec93b6 ]

LUCENE-10016: fix test case to use the same similarity in both cases
[jira] [Commented] (LUCENE-10016) VectorReader.search needs rethought, o.a.l.search integration?
[ https://issues.apache.org/jira/browse/LUCENE-10016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17383271#comment-17383271 ]

Adrien Grand commented on LUCENE-10016:
---------------------------------------

+1. I believe that we need to have a no-param method anyway for users who are getting started and would have no idea what a good value of fanout or ef would be. We can discuss adding expert methods that expose the accuracy/speed trade-off later if needed.

Is someone looking into the other suggestion of this issue, which consists of adding a {{Bits liveDocs}} parameter to the search method in order to make it possible to ignore deleted documents?
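The effect of a live-docs parameter can be shown with an exhaustive search, independent of any particular graph or codec. The sketch below is plain Java under stated assumptions: `LiveDocsKnnSketch` is a made-up class, an `IntPredicate` stands in for Lucene's `Bits`, vectors are compared by squared Euclidean distance, and a `null` filter means "no deletions", as with `LeafReader.getLiveDocs()`. It is not the signature that was eventually adopted.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.function.IntPredicate;

/** Hypothetical sketch, not a Lucene API: brute-force nearest neighbors that skip deleted docs. */
final class LiveDocsKnnSketch {
  private LiveDocsKnnSketch() {}

  /** Squared Euclidean distance between two vectors of equal length. */
  static float squaredDistance(float[] a, float[] b) {
    float sum = 0;
    for (int i = 0; i < a.length; i++) {
      float diff = a[i] - b[i];
      sum += diff * diff;
    }
    return sum;
  }

  /**
   * Returns up to k doc IDs, nearest first, considering only live documents.
   *
   * @param vectors vectors indexed by doc ID
   * @param liveDocs stand-in for Bits: returns true for live docs; null means no deletions
   */
  static int[] search(float[][] vectors, float[] target, int k, IntPredicate liveDocs) {
    List<Integer> candidates = new ArrayList<>();
    for (int doc = 0; doc < vectors.length; doc++) {
      // Deleted documents are dropped before ranking, so they never take up one of the k slots.
      if (liveDocs == null || liveDocs.test(doc)) {
        candidates.add(doc);
      }
    }
    candidates.sort(
        Comparator.comparingDouble((Integer doc) -> squaredDistance(vectors[doc], target)));
    int n = Math.min(k, candidates.size());
    int[] top = new int[n];
    for (int i = 0; i < n; i++) {
      top[i] = candidates.get(i);
    }
    return top;
  }
}
```

Filtering during the search rather than afterwards matters: post-filtering k results can leave fewer than k live hits even when more live neighbors exist.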
[jira] [Commented] (LUCENE-10016) VectorReader.search needs rethought, o.a.l.search integration?
[ https://issues.apache.org/jira/browse/LUCENE-10016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17383101#comment-17383101 ]

ASF subversion and git services commented on LUCENE-10016:
----------------------------------------------------------

Commit acf45d8a315f94c4bf685458faa1aae24c1e8599 in lucene's branch refs/heads/main from Adrien Grand
[ https://gitbox.apache.org/repos/asf?p=lucene.git;h=acf45d8 ]

LUCENE-10016: Remove VectorValues#getSimilarityFunction. (#213)

VectorValues is only about iterating over vectors in doc ID order, so it feels wrong to tie it to the similarity function.
[jira] [Commented] (LUCENE-10016) VectorReader.search needs rethought, o.a.l.search integration?
[ https://issues.apache.org/jira/browse/LUCENE-10016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17382588#comment-17382588 ]

ASF subversion and git services commented on LUCENE-10016:
----------------------------------------------------------

Commit 9b5e23396092ea1d4cfb19c8a996b8fc118c33e8 in lucene's branch refs/heads/main from Michael Sokolov
[ https://gitbox.apache.org/repos/asf?p=lucene.git;h=9b5e233 ]

LUCENE-10016: remove fanout parameter from nearest neighbor vector search (#210)
[jira] [Commented] (LUCENE-10016) VectorReader.search needs rethought, o.a.l.search integration?
[ https://issues.apache.org/jira/browse/LUCENE-10016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17382201#comment-17382201 ]

Julie Tibshirani commented on LUCENE-10016:
-------------------------------------------

I also didn't see a precedent for it. It seemed okay to me, but I understand your concern. I'm not sure reader attributes would work because it could be common to adjust these parameters per-request.

Coming back to this, I may have been too worried about this API simplification. The vectors format is experimental, and it's always possible to evolve it when we consider new NN algorithms. This sort of change is self-contained and doesn't affect index format. So perhaps we could move ahead with it but make sure to keep in mind that not all algorithms fit into this mold?
[jira] [Commented] (LUCENE-10016) VectorReader.search needs rethought, o.a.l.search integration?
[ https://issues.apache.org/jira/browse/LUCENE-10016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17382000#comment-17382000 ]

Adrien Grand commented on LUCENE-10016:
---------------------------------------

I'm not comfortable with having an API parameter like recallFactor whose semantics would depend on the codec. I don't think we have a precedent for this. Maybe something like reader attributes (https://github.com/apache/lucene-solr/pull/640, introduced to make it possible to configure whether to load the terms index on or off heap, and later removed when we decided to always load it off-heap) would be a better way to configure these read-time configuration options of vectors, assuming that it wouldn't be a common use-case to tune them on a per-request basis?
[jira] [Commented] (LUCENE-10016) VectorReader.search needs rethought, o.a.l.search integration?
[ https://issues.apache.org/jira/browse/LUCENE-10016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17378871#comment-17378871 ]

Julie Tibshirani commented on LUCENE-10016:
-------------------------------------------

I'm sorry for jumping in late -- I actually think having a parameter here to control recall makes sense and that we should keep it. I agree it'd be good to rename it to something general and not specific to HNSW though, for example in LUCENE-9322 we called it {{recallFactor}}.

Explaining my reasoning -- in the current implementation, you can indeed just scale K in order to increase recall. But many other ANN algorithms have recall-tuning parameters that can't be controlled through K. Some examples:
* ScaNN (the current leader in ann-benchmarks) is based on a quantization technique, where vectors are grouped into clusters or 'leaves'. There is a search-time parameter to control the number of leaves that are considered as candidates. This is a totally separate concept from K -- these candidates are never fully ranked against each other, to avoid unnecessary distance computations.
* Multi-probe LSH (which I think is implemented in the elastiknn plugin?) has a number of probes 'T' defining the extra number of hash buckets to check per query. This is also separate from K, it increases the initial candidate set but not all of these vectors will be ranked and returned.

In other places we've worked hard to keep the API general enough to support other implementations, and I see keeping this parameter as part of that effort.

Not as important an example, but the HNSW algorithm also treats K as separate from its recall factor 'ef'. In the current setup, we're able to align the API to the algorithm description in the paper and its reference implementations, which I think is easier to understand for users.
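The point that a recall knob can be independent of K may be easier to see in code. What follows is a deliberately simplified, inverted-file-style sketch in plain Java; it is not ScaNN, not multi-probe LSH, and not a Lucene API. The class `LeafProbeSketch`, the `numLeaves` parameter name, and the precomputed `centroids` and `leaves` arrays are all assumptions made for illustration. Unlike the algorithms described above, it ranks every candidate in the probed leaves exactly.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical IVF-style sketch: numLeaves tunes recall, k only sets the result size. */
final class LeafProbeSketch {
  private LeafProbeSketch() {}

  /** Squared Euclidean distance between two vectors of equal length. */
  static float squaredDistance(float[] a, float[] b) {
    float sum = 0;
    for (int i = 0; i < a.length; i++) {
      float diff = a[i] - b[i];
      sum += diff * diff;
    }
    return sum;
  }

  /**
   * @param centroids one centroid per leaf (cluster)
   * @param leaves leaves[i] holds the doc IDs assigned to centroids[i]
   * @param vectors vectors indexed by doc ID
   * @param k number of results to return
   * @param numLeaves number of closest leaves to scan; larger means better recall, slower search
   */
  static int[] search(float[][] centroids, int[][] leaves, float[][] vectors,
      float[] target, int k, int numLeaves) {
    // Rank leaves by how close their centroid is to the query.
    List<Integer> leafOrder = new ArrayList<>();
    for (int leaf = 0; leaf < centroids.length; leaf++) {
      leafOrder.add(leaf);
    }
    leafOrder.sort(
        Comparator.comparingDouble((Integer leaf) -> squaredDistance(centroids[leaf], target)));

    // Only documents in the numLeaves closest leaves become candidates.
    List<Integer> candidates = new ArrayList<>();
    int probed = Math.min(numLeaves, leafOrder.size());
    for (int i = 0; i < probed; i++) {
      for (int doc : leaves[leafOrder.get(i)]) {
        candidates.add(doc);
      }
    }
    candidates.sort(
        Comparator.comparingDouble((Integer doc) -> squaredDistance(vectors[doc], target)));

    int n = Math.min(k, candidates.size());
    int[] top = new int[n];
    for (int i = 0; i < n; i++) {
      top[i] = candidates.get(i);
    }
    return top;
  }
}
```

With k fixed at 1, scanning one leaf can miss a true nearest neighbor that sits in the second-closest leaf, while scanning two leaves finds it. Raising k alone would not help, because the missing document never enters the candidate set.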
[jira] [Commented] (LUCENE-10016) VectorReader.search needs rethought, o.a.l.search integration?
[ https://issues.apache.org/jira/browse/LUCENE-10016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17378489#comment-17378489 ]

Michael Sokolov commented on LUCENE-10016:
------------------------------------------

I posted a PR removing fanout. For the question of how to integrate with "regular" search, handle deletions, etc, let's track over in LUCENE-9614
[jira] [Commented] (LUCENE-10016) VectorReader.search needs rethought, o.a.l.search integration?
[ https://issues.apache.org/jira/browse/LUCENE-10016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17373038#comment-17373038 ]

Michael Sokolov commented on LUCENE-10016:
------------------------------------------

> We can move it to a codec parameter?

Probably we can just remove it altogether. Users that want to increase fanout can effectively do it by increasing top K and discarding all but the top K2 < K.
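The over-fetch idea in this comment can be sketched in a few lines of plain Java. `OverFetchSketch` is a hypothetical helper, and the `search` argument is a placeholder for any approximate search that returns doc IDs best-first when asked for k hits; neither is a Lucene API.

```java
import java.util.Arrays;
import java.util.function.IntFunction;

/** Hypothetical helper, not part of Lucene: emulates a fanout by over-fetching. */
final class OverFetchSketch {
  private OverFetchSketch() {}

  /**
   * Asks the underlying search for k hits and keeps only the best k2 of them.
   *
   * @param search placeholder for an approximate search returning doc IDs best-first
   * @param k number of hits to request; a larger k explores more candidates
   * @param k2 number of hits the caller actually wants, at most k
   */
  static int[] searchWithOverFetch(IntFunction<int[]> search, int k, int k2) {
    if (k2 > k) {
      throw new IllegalArgumentException("k2 must not exceed k");
    }
    int[] hits = search.apply(k);
    // Discard everything beyond the top k2.
    return Arrays.copyOf(hits, Math.min(k2, hits.length));
  }
}
```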
[jira] [Commented] (LUCENE-10016) VectorReader.search needs rethought, o.a.l.search integration?
[ https://issues.apache.org/jira/browse/LUCENE-10016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17372057#comment-17372057 ]

Robert Muir commented on LUCENE-10016:
--------------------------------------

[~sokolov] let's not make these assumptions in such an abstraction. We can move it to a codec parameter?
[jira] [Commented] (LUCENE-10016) VectorReader.search needs rethought, o.a.l.search integration?
[ https://issues.apache.org/jira/browse/LUCENE-10016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17372056#comment-17372056 ]

Michael Sokolov commented on LUCENE-10016:
------------------------------------------

Hmm, I think we expect that any approximate nearest-neighbor algorithm is going to have a parameter that trades off speed for accuracy. Fanout is not a good name for it, but I think it is a useful knob.