[jira] [Commented] (LUCENE-10016) VectorReader.search needs rethought, o.a.l.search integration?

2021-08-24 Thread Adrien Grand (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17403576#comment-17403576
 ] 

Adrien Grand commented on LUCENE-10016:
---

TestDemo fails with the SimpleText codec; you can reproduce it with:

{code}
gradlew -Dtests.codec=SimpleText :lucene:demo:test --tests "org.apache.lucene.demo.TestDemo.testKnnVectorSearch"
{code}

I opened LUCENE-10063.

> VectorReader.search needs rethought, o.a.l.search integration?
> --
>
> Key: LUCENE-10016
> URL: https://issues.apache.org/jira/browse/LUCENE-10016
> Project: Lucene - Core
>  Issue Type: Task
>Reporter: Robert Muir
>Priority: Blocker
> Fix For: main (9.0)
>
>  Time Spent: 7h
>  Remaining Estimate: 0h
>
> There's no search integration (e.g. queries) for the current vector values, 
> no documentation/examples that I can find.
> Instead the codec has this method:
> {code}
> TopDocs search(String field, float[] target, int k, int fanout)
> {code}
> First, the "fanout" parameter needs to go: it is specific to the HNSW 
> implementation, so get it out of here.
> Second, how am I supposed to skip over deleted documents? How can I use 
> filters? How should I search across multiple segments?
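
One way to picture the multi-segment question in the description: if each segment returns its own top-k hits with segment-local doc IDs, a caller can rebase those IDs and keep the global top k. The following is only a minimal sketch in plain Java. `Hit`, `merge`, and the `docBases` array are hypothetical stand-ins rather than Lucene classes (Lucene's own `TopDocs.merge` plays a similar role for real `TopDocs` objects).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class SegmentMergeSketch {

  /** Hypothetical scored hit; a stand-in, not a Lucene class. */
  static final class Hit {
    final int doc;
    final float score;

    Hit(int doc, float score) {
      this.doc = doc;
      this.score = score;
    }
  }

  /**
   * Merges per-segment results into one global top-k list.
   * perSegment.get(i) holds hits with segment-local doc IDs and
   * docBases[i] is the offset that turns them into global doc IDs.
   */
  static List<Hit> merge(List<List<Hit>> perSegment, int[] docBases, int k) {
    List<Hit> all = new ArrayList<>();
    for (int seg = 0; seg < perSegment.size(); seg++) {
      for (Hit h : perSegment.get(seg)) {
        all.add(new Hit(docBases[seg] + h.doc, h.score));
      }
    }
    // Highest score first; ties are broken by the smaller global doc ID.
    all.sort((a, b) -> {
      int byScore = Float.compare(b.score, a.score);
      return byScore != 0 ? byScore : Integer.compare(a.doc, b.doc);
    });
    return new ArrayList<>(all.subList(0, Math.min(k, all.size())));
  }

  public static void main(String[] args) {
    List<Hit> seg0 = Arrays.asList(new Hit(0, 0.9f), new Hit(3, 0.5f));
    List<Hit> seg1 = Arrays.asList(new Hit(1, 0.8f), new Hit(2, 0.1f));
    for (Hit h : merge(Arrays.asList(seg0, seg1), new int[] {0, 10}, 3)) {
      System.out.println("doc=" + h.doc + " score=" + h.score);
    }
  }
}
```

Each segment only needs to return its own k best hits for the merged top k to be correct, because no global top-k hit can rank below position k inside its own segment.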



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (LUCENE-10016) VectorReader.search needs rethought, o.a.l.search integration?

2021-08-18 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17401010#comment-17401010
 ] 

ASF subversion and git services commented on LUCENE-10016:
--

Commit a37844aedd52948e06917bb870873a212ee4fea4 in lucene's branch 
refs/heads/main from Michael Sokolov
[ https://gitbox.apache.org/repos/asf?p=lucene.git;h=a37844a ]

LUCENE-10016: Added KnnVector index/query support to demo







[jira] [Commented] (LUCENE-10016) VectorReader.search needs rethought, o.a.l.search integration?

2021-08-13 Thread Michael Sokolov (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17398774#comment-17398774
 ] 

Michael Sokolov commented on LUCENE-10016:
--

https://github.com/apache/lucene/pull/241 adds a small token->vector dictionary 
to the demo, and support for indexing and search using those vectors. The 
search part creates a query that matches either terms or vectors.




[jira] [Commented] (LUCENE-10016) VectorReader.search needs rethought, o.a.l.search integration?

2021-07-28 Thread Michael Sokolov (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17388899#comment-17388899
 ] 

Michael Sokolov commented on LUCENE-10016:
--

As for the demo, there is a start on something we could use in luceneutil. It 
would require a fairly large word->vector dictionary, though. I think maybe the 
way to do it is to provide instructions for downloading the dictionary rather 
than shipping it as part of the demo. 




[jira] [Commented] (LUCENE-10016) VectorReader.search needs rethought, o.a.l.search integration?

2021-07-28 Thread Robert Muir (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17388868#comment-17388868
 ] 

Robert Muir commented on LUCENE-10016:
--

Even if it isn't in the o.a.l.demo module, a simple test similar to "TestDemo" 
would be a great step:

https://github.com/apache/lucene/blob/main/lucene/core/src/test/org/apache/lucene/TestDemo.java

By this, I mean a high-level unit test that uses 
IndexWriter/IndexSearcher/queries and not low-level codec APIs.




[jira] [Commented] (LUCENE-10016) VectorReader.search needs rethought, o.a.l.search integration?

2021-07-28 Thread Adrien Grand (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17388866#comment-17388866
 ] 

Adrien Grand commented on LUCENE-10016:
---

One thing that would still be missing is the o.a.l.demo integration. At the 
same time, I'm unsure whether we can easily add vector search to the demo, as 
we'd need a way to turn some data that exists on the user's computer into 
vectors in a way that makes nearest-neighbor search meaningful. 




[jira] [Commented] (LUCENE-10016) VectorReader.search needs rethought, o.a.l.search integration?

2021-07-28 Thread Julie Tibshirani (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17388806#comment-17388806
 ] 

Julie Tibshirani commented on LUCENE-10016:
---

Deletions are an interesting topic, I opened 
https://issues.apache.org/jira/browse/LUCENE-10040 for a dedicated discussion. 
Maybe we could close this issue in favor of that one and also 
https://issues.apache.org/jira/browse/LUCENE-9614, which discusses a high-level 
API for KNN search? If we close this, we should decide if we want to transfer 
its "blocker" status to those issues.




[jira] [Commented] (LUCENE-10016) VectorReader.search needs rethought, o.a.l.search integration?

2021-07-24 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17386761#comment-17386761
 ] 

ASF subversion and git services commented on LUCENE-10016:
--

Commit 0ec93b632ce0be880a1e68902bccd07bae65602d in lucene's branch 
refs/heads/main from Michael Sokolov
[ https://gitbox.apache.org/repos/asf?p=lucene.git;h=0ec93b6 ]

LUCENE-10016: fix test case to use the same similarity in both cases





[jira] [Commented] (LUCENE-10016) VectorReader.search needs rethought, o.a.l.search integration?

2021-07-19 Thread Adrien Grand (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17383271#comment-17383271
 ] 

Adrien Grand commented on LUCENE-10016:
---

+1 I believe that we need to have a no-param method anyway for users who are 
getting started who would have no idea what a good value of fanout or ef would 
be. We can discuss adding expert methods that expose the accuracy/speed 
trade-off later if needed.

Is someone looking into the other suggestion of this issue, which consists of 
adding a {{Bits liveDocs}} parameter to the search method in order to make it 
possible to ignore deleted documents?
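
As a rough illustration of the {{Bits liveDocs}} suggestion, the sketch below runs an exact dot-product search that skips deleted documents while it collects candidates. It is plain Java under stated assumptions: `LiveBits` is a hypothetical stand-in that mirrors only the `get(int)` method of Lucene's `Bits`, the `search` signature is invented for the example, and a `null` argument is taken to mean "no deletions", following the convention of `LeafReader.getLiveDocs()`.

```java
import java.util.ArrayList;
import java.util.List;

final class LiveDocsSearchSketch {

  /** Hypothetical stand-in for Lucene's Bits: true means the doc is live. */
  interface LiveBits {
    boolean get(int doc);
  }

  /** Returns the doc IDs of the k best live documents, best first. */
  static List<Integer> search(float[][] vectors, float[] target, int k, LiveBits liveDocs) {
    float[] scores = new float[vectors.length];
    List<Integer> candidates = new ArrayList<>();
    for (int doc = 0; doc < vectors.length; doc++) {
      if (liveDocs != null && !liveDocs.get(doc)) {
        continue; // deleted document: never becomes a candidate
      }
      float dot = 0f;
      for (int i = 0; i < target.length; i++) {
        dot += vectors[doc][i] * target[i];
      }
      scores[doc] = dot;
      candidates.add(doc);
    }
    // Sort the live candidates by descending score and keep the first k.
    candidates.sort((a, b) -> Float.compare(scores[b], scores[a]));
    return new ArrayList<>(candidates.subList(0, Math.min(k, candidates.size())));
  }

  public static void main(String[] args) {
    float[][] vectors = {{1f, 0f}, {0.9f, 0.1f}, {0f, 1f}};
    float[] target = {1f, 0f};
    System.out.println(search(vectors, target, 2, null));          // no deletions
    System.out.println(search(vectors, target, 2, doc -> doc != 0)); // doc 0 deleted
  }
}
```

Filtering inside the search, rather than dropping deleted documents from a finished top-k list, is what lets the method still return k hits when some of the nearest vectors belong to deleted documents.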




[jira] [Commented] (LUCENE-10016) VectorReader.search needs rethought, o.a.l.search integration?

2021-07-19 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17383101#comment-17383101
 ] 

ASF subversion and git services commented on LUCENE-10016:
--

Commit acf45d8a315f94c4bf685458faa1aae24c1e8599 in lucene's branch 
refs/heads/main from Adrien Grand
[ https://gitbox.apache.org/repos/asf?p=lucene.git;h=acf45d8 ]

LUCENE-10016: Remove VectorValues#getSimilarityFunction. (#213)

VectorValues is only about iterating over vectors in doc ID order, so it feels
wrong to tie it to the similarity function.




[jira] [Commented] (LUCENE-10016) VectorReader.search needs rethought, o.a.l.search integration?

2021-07-17 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17382588#comment-17382588
 ] 

ASF subversion and git services commented on LUCENE-10016:
--

Commit 9b5e23396092ea1d4cfb19c8a996b8fc118c33e8 in lucene's branch 
refs/heads/main from Michael Sokolov
[ https://gitbox.apache.org/repos/asf?p=lucene.git;h=9b5e233 ]

LUCENE-10016: remove fanout parameter from nearest neighbor vector search (#210)






[jira] [Commented] (LUCENE-10016) VectorReader.search needs rethought, o.a.l.search integration?

2021-07-16 Thread Julie Tibshirani (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17382201#comment-17382201
 ] 

Julie Tibshirani commented on LUCENE-10016:
---

I also didn't see a precedent for it. It seemed okay to me, but I understand 
your concern. I'm not sure reader attributes would work because it could be 
common to adjust these parameters per-request.

Coming back to this, I may have been too worried about this API simplification. 
The vectors format is experimental, and it's always possible to evolve it when 
we consider new NN algorithms. This sort of change is self-contained and 
doesn't affect index format. So perhaps we could move ahead with it but make 
sure to keep in mind that not all algorithms fit into this mold?




[jira] [Commented] (LUCENE-10016) VectorReader.search needs rethought, o.a.l.search integration?

2021-07-16 Thread Adrien Grand (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17382000#comment-17382000
 ] 

Adrien Grand commented on LUCENE-10016:
---

I'm not comfortable with having an API parameter like recallFactor whose 
semantics would depend on the codec. I don't think we have a precedent for this.

Maybe something like reader attributes 
(https://github.com/apache/lucene-solr/pull/640, introduced to make it 
possible to configure whether to load the terms index on or off heap, and 
later removed when we decided to always load it off-heap) would be a better 
way to configure these read-time configuration options of vectors, assuming 
that it wouldn't be a common use-case to tune them on a per-request basis?




[jira] [Commented] (LUCENE-10016) VectorReader.search needs rethought, o.a.l.search integration?

2021-07-11 Thread Julie Tibshirani (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17378871#comment-17378871
 ] 

Julie Tibshirani commented on LUCENE-10016:
---

I'm sorry for jumping in late -- I actually think having a parameter here to 
control recall makes sense and that we should keep it. I agree it'd be good to 
rename it to something general and not specific to HNSW, though; for example, 
in LUCENE-9322 we called it {{recallFactor}}.

Explaining my reasoning -- in the current implementation, you can indeed just 
scale K in order to increase recall. But many other ANN algorithms have 
recall-tuning parameters that can't be controlled through K. Some examples:
* ScaNN (the current leader in ann-benchmarks) is based on a quantization 
technique, where vectors are grouped into clusters or 'leaves'. There is a 
search-time parameter to control the number of leaves that are considered as 
candidates. This is a totally separate concept from K -- these candidates are 
never fully ranked against each other, to avoid unnecessary distance 
computations.
* Multi-probe LSH (which I think is implemented in the elastiknn plugin?) has a 
number of probes 'T' defining the extra number of hash buckets to check per 
query. This is also separate from K, it increases the initial candidate set but 
not all of these vectors will be ranked and returned.

In other places we've worked hard to keep the API general enough to support 
other implementations, and I see keeping this parameter as part of that effort.

Not as important an example, but the HNSW algorithm also treats K as separate 
from its recall factor 'ef'. In the current setup, we're able to align the API 
with the algorithm description in the paper and its reference implementations, 
which I think is easier for users to understand.
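
To make the "separate from K" point concrete, here is a deliberately simplified cluster-probing sketch in plain Java. It is not ScaNN, multi-probe LSH, or anything in Lucene: the class, the `nProbe` parameter, and the precomputed `centroids`/`leaves` arrays are all assumptions made up for the illustration. Vectors are grouped into leaves, and only the `nProbe` leaves whose centroids best match the target are scored, so recall depends on `nProbe` independently of `k`.

```java
import java.util.ArrayList;
import java.util.List;

final class ProbeSketch {

  static float dot(float[] a, float[] b) {
    float sum = 0f;
    for (int i = 0; i < a.length; i++) {
      sum += a[i] * b[i];
    }
    return sum;
  }

  /**
   * centroids[c] is the centre of leaf c and leaves[c] lists the doc IDs
   * assigned to it. Only docs in the nProbe best-matching leaves are scored.
   */
  static List<Integer> search(
      float[][] vectors, float[][] centroids, int[][] leaves,
      float[] target, int k, int nProbe) {
    // Rank leaves by how well their centroid matches the target.
    List<Integer> leafOrder = new ArrayList<>();
    for (int c = 0; c < centroids.length; c++) {
      leafOrder.add(c);
    }
    leafOrder.sort((a, b) -> Float.compare(dot(centroids[b], target), dot(centroids[a], target)));

    // Gather candidates from the nProbe best leaves only.
    List<Integer> candidates = new ArrayList<>();
    for (int i = 0; i < Math.min(nProbe, leafOrder.size()); i++) {
      for (int doc : leaves[leafOrder.get(i)]) {
        candidates.add(doc);
      }
    }
    // Rank the gathered candidates and keep the best k.
    candidates.sort((a, b) -> Float.compare(dot(vectors[b], target), dot(vectors[a], target)));
    return new ArrayList<>(candidates.subList(0, Math.min(k, candidates.size())));
  }

  public static void main(String[] args) {
    float[][] vectors = {{1f, 0f}, {0.7f, 0.7f}, {0f, 1f}, {0.9f, -0.5f}};
    float[][] centroids = {{1f, 0f}, {0f, 1f}};
    int[][] leaves = {{0, 3}, {1, 2}};
    float[] target = {1f, 0.2f};
    System.out.println(search(vectors, centroids, leaves, target, 2, 1)); // prints [0, 3]
    System.out.println(search(vectors, centroids, leaves, target, 2, 2)); // prints [0, 1]
  }
}
```

With `k` fixed at 2, probing one leaf misses doc 1 (the true second-nearest vector), while probing both leaves finds it. Raising `k` alone would not help in the one-leaf case, because doc 1 never enters the candidate set.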




[jira] [Commented] (LUCENE-10016) VectorReader.search needs rethought, o.a.l.search integration?

2021-07-10 Thread Michael Sokolov (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17378489#comment-17378489
 ] 

Michael Sokolov commented on LUCENE-10016:
--

I posted a PR removing fanout. For the question of how to integrate with 
"regular" search, handle deletions, etc., let's track that over in LUCENE-9614.




[jira] [Commented] (LUCENE-10016) VectorReader.search needs rethought, o.a.l.search integration?

2021-07-01 Thread Michael Sokolov (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17373038#comment-17373038
 ] 

Michael Sokolov commented on LUCENE-10016:
--

> We can move it to a codec parameter?

Probably we can just remove it altogether. Users that want to increase fanout 
can effectively do it by increasing top K and discarding all but the top K2 < K.
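
The "increase K, keep K2" workaround can be sketched as a thin wrapper around a search call that has no fanout parameter. Everything named here (`Hit`, `KnnSearcher`, `searchWithOverFetch`, `bruteForce`) is hypothetical and written in plain Java for illustration; none of it is a Lucene API. The exact brute-force searcher is included only so the example runs on its own.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

final class OverFetchSketch {

  /** Hypothetical scored hit; a stand-in, not a Lucene class. */
  static final class Hit {
    final int doc;
    final float score;

    Hit(int doc, float score) {
      this.doc = doc;
      this.score = score;
    }
  }

  /** Hypothetical searcher without a fanout knob: up to k hits, best first. */
  interface KnnSearcher {
    List<Hit> search(float[] target, int k);
  }

  /** Requests k candidates, then discards all but the best k2 (k2 <= k). */
  static List<Hit> searchWithOverFetch(KnnSearcher searcher, float[] target, int k, int k2) {
    if (k2 > k) {
      throw new IllegalArgumentException("k2 must not exceed k");
    }
    List<Hit> candidates = searcher.search(target, k);
    return new ArrayList<>(candidates.subList(0, Math.min(k2, candidates.size())));
  }

  /** Exact dot-product searcher, used only to make the sketch runnable. */
  static KnnSearcher bruteForce(float[][] vectors) {
    return (target, k) -> {
      List<Hit> hits = new ArrayList<>();
      for (int doc = 0; doc < vectors.length; doc++) {
        float dot = 0f;
        for (int i = 0; i < target.length; i++) {
          dot += vectors[doc][i] * target[i];
        }
        hits.add(new Hit(doc, dot));
      }
      hits.sort(Comparator.comparingDouble((Hit h) -> h.score).reversed());
      return hits.subList(0, Math.min(k, hits.size()));
    };
  }

  public static void main(String[] args) {
    float[][] vectors = {{1f, 0f}, {0f, 1f}, {0.9f, 0.1f}};
    List<Hit> top = searchWithOverFetch(bruteForce(vectors), new float[] {1f, 0f}, 3, 2);
    for (Hit h : top) {
      System.out.println("doc=" + h.doc + " score=" + h.score);
    }
  }
}
```

With an exact searcher like this one the wrapper changes nothing. The idea in the comment is that an approximate searcher explores more candidates when asked for a larger K, so truncating to K2 afterwards acts as an indirect accuracy knob.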




[jira] [Commented] (LUCENE-10016) VectorReader.search needs rethought, o.a.l.search integration?

2021-06-30 Thread Robert Muir (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17372057#comment-17372057
 ] 

Robert Muir commented on LUCENE-10016:
--

[~sokolov] let's not make these assumptions in such an abstraction. We can move 
it to a codec parameter? 




[jira] [Commented] (LUCENE-10016) VectorReader.search needs rethought, o.a.l.search integration?

2021-06-30 Thread Michael Sokolov (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17372056#comment-17372056
 ] 

Michael Sokolov commented on LUCENE-10016:
--

Hmm, I think we expect that any approximate nearest-neighbor algorithm is going 
to have a parameter that trades off speed for accuracy. Fanout is not a good 
name for it, but I think it is a useful knob.
