[
https://issues.apache.org/jira/browse/LUCENE-10559?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Julie Tibshirani resolved LUCENE-10559.
---------------------------------------
Fix Version/s: 9.4
Resolution: Fixed
> Add preFilter/postFilter options to KnnGraphTester
> --------------------------------------------------
>
> Key: LUCENE-10559
> URL: https://issues.apache.org/jira/browse/LUCENE-10559
> Project: Lucene - Core
> Issue Type: Improvement
> Reporter: Michael Sokolov
> Priority: Major
> Fix For: 9.4
>
> Time Spent: 6h 40m
> Remaining Estimate: 0h
>
> We want to be able to test the efficacy of pre-filtering in KnnVectorQuery:
> if you want (say) the top K nearest neighbors subject to a constraint Q, are
> you better off over-selecting top hits (say 2K) and *then* filtering
> (post-filtering), or incorporating the filter into the query
> (pre-filtering)? How does the answer depend on the selectivity of the filter?
> I think we can get a reasonable testbed by generating a uniform random filter
> with some selectivity (that is consistent and repeatable). Possibly we'd also
> want to try filters that are correlated with index order, but it seems they'd
> be unlikely to be correlated with vector values in a way that the graph
> structure would notice, so random is a pretty good starting point for this.
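
The testbed described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the code in KnnGraphTester: the class FilterStrategySketch and all of its method names are hypothetical, and an exact brute-force scan stands in for the HNSW graph search, so it shows how many hits each strategy returns but says nothing about graph traversal cost or recall. The filter is a uniform random BitSet built from a fixed seed, which keeps it consistent and repeatable for a given selectivity.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.Comparator;
import java.util.List;
import java.util.Random;

// Hypothetical sketch; not KnnGraphTester code. An exact scan stands in for graph search.
final class FilterStrategySketch {

  /** Accepts each doc independently with probability selectivity; a fixed seed makes it repeatable. */
  static BitSet randomFilter(int numDocs, double selectivity, long seed) {
    if (selectivity < 0.0 || selectivity > 1.0) {
      throw new IllegalArgumentException("selectivity must be in [0, 1]: " + selectivity);
    }
    Random random = new Random(seed);
    BitSet accepted = new BitSet(numDocs);
    for (int doc = 0; doc < numDocs; doc++) {
      if (random.nextDouble() < selectivity) {
        accepted.set(doc);
      }
    }
    return accepted;
  }

  static double squaredDistance(float[] a, float[] b) {
    double sum = 0;
    for (int i = 0; i < a.length; i++) {
      double d = a[i] - b[i];
      sum += d * d;
    }
    return sum;
  }

  /** Returns all doc ids ordered by exact distance to the query, nearest first. */
  static List<Integer> rankByDistance(float[][] vectors, float[] query) {
    List<Integer> docs = new ArrayList<>();
    for (int doc = 0; doc < vectors.length; doc++) {
      docs.add(doc);
    }
    docs.sort(Comparator.comparingDouble((Integer id) -> squaredDistance(vectors[id], query)));
    return docs;
  }

  /** Pre-filtering: only accepted docs compete, so up to k accepted hits are returned. */
  static List<Integer> preFilter(List<Integer> ranked, BitSet accepted, int k) {
    List<Integer> hits = new ArrayList<>();
    for (int doc : ranked) {
      if (hits.size() == k) {
        break;
      }
      if (accepted.get(doc)) {
        hits.add(doc);
      }
    }
    return hits;
  }

  /** Post-filtering: take the top overSelect unfiltered hits, then drop the rejected ones. */
  static List<Integer> postFilter(List<Integer> ranked, BitSet accepted, int k, int overSelect) {
    List<Integer> candidates = ranked.subList(0, Math.min(overSelect, ranked.size()));
    return preFilter(candidates, accepted, k);
  }

  public static void main(String[] args) {
    Random random = new Random(7L);
    float[][] vectors = new float[10_000][8];
    for (float[] vector : vectors) {
      for (int i = 0; i < vector.length; i++) {
        vector[i] = random.nextFloat();
      }
    }
    float[] query = new float[8];
    for (int i = 0; i < query.length; i++) {
      query[i] = random.nextFloat();
    }
    List<Integer> ranked = rankByDistance(vectors, query);
    int k = 10;
    for (double selectivity : new double[] {0.5, 0.1, 0.01}) {
      BitSet accepted = randomFilter(vectors.length, selectivity, 42L);
      int pre = preFilter(ranked, accepted, k).size();
      int post = postFilter(ranked, accepted, k, 2 * k).size();
      System.out.println("selectivity=" + selectivity + " pre=" + pre + " post(2K)=" + post);
    }
  }
}
```

With a selective filter, post-filtering over only 2K candidates can return fewer than K hits, because most of the over-selected candidates are rejected; that shortfall is one of the effects the selectivity question in the description is meant to measure.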