[jira] [Updated] (CASSANDRA-19185) Vector search tests are failing on recall accuracy
[ https://issues.apache.org/jira/browse/CASSANDRA-19185?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael Semb Wever updated CASSANDRA-19185:
---
    Fix Version/s: 5.0

> Vector search tests are failing on recall accuracy
> --
>
> Key: CASSANDRA-19185
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19185
> Project: Cassandra
> Issue Type: Bug
> Components: Feature/SAI
> Reporter: Mike Adamson
> Assignee: Mike Adamson
> Priority: Normal
> Fix For: 5.0-beta2, 5.0
>
> Time Spent: 0.5h
> Remaining Estimate: 0h
>
> Vector tests are failing randomly because they do not meet the recall
> assertion values. The following tests have been reported as failing so far:
> VectorSegmentationTest.testMultipleSegmentsForCompaction
> VectorDistributedTest.rangeRestrictedTest
> VectorDistributedTest.testPartitionRestrictedVectorSearch
>
> Since vector searches are approximate and the vectors used in the tests are
> random, it is unlikely that they will always achieve high recall. The recall
> assertions expect recall values of 0.9 or above. Part of the problem is the
> use of random values in the vectors being tested: we have seen in other tests
> that vector search performs better on non-random datasets such as the GloVe
> datasets. The following options are available to fix these tests:
>
> # Lower the assertion thresholds to a value that is likely to always pass.
> The problem is that there is no guarantee that a test will always pass
> whatever recall value we give it.
> # Use non-random generated datasets for these tests to see whether that
> improves the recall results.
> # Remove the recall assertions unless they are specifically requested. We
> could use a system property to enable recall checks for targeted vector
> testing.
>
> I don't think option 1 is a viable long-term solution, as we can never be
> certain that it will always work. Option 2 has more promise, but it could
> still result in failures because of the approximate nature of vector search.
> As such, option 3 seems the only viable solution here, but it means that, in
> most cases, we are only really testing that the search returns results, not
> how accurate those results are.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org
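The issue turns on two things: measuring recall (the fraction of the exact nearest neighbours that the approximate search also returns) and, for option 3, asserting on recall only when it is explicitly requested. A minimal Java sketch of that idea, under stated assumptions: it is not the change made in the commit referenced in this thread, the class RecallGate, its methods and the system property name cassandra.test.vector.recall are hypothetical, and the exact top-k is assumed to come from a brute-force search over the same test vectors.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/**
 * Hypothetical sketch of option 3: always check that a vector search returned
 * results, but only enforce a recall threshold when explicitly requested.
 * Names and the property key are illustrative, not from the Cassandra code base.
 */
public final class RecallGate
{
    // Hypothetical system property that turns recall assertions on for targeted runs.
    static final String RECALL_PROPERTY = "cassandra.test.vector.recall";

    // Threshold quoted in the issue: the failing tests expect recall of 0.9 or above.
    static final double MIN_RECALL = 0.9;

    private RecallGate() {}

    /**
     * Recall@k: fraction of the exact (brute-force) top-k keys that also appear
     * in the approximate result. Returns 1.0 when the exact result is empty.
     */
    static <K> double recall(List<K> exactTopK, List<K> approxTopK)
    {
        if (exactTopK.isEmpty())
            return 1.0;
        Set<K> approx = new HashSet<>(approxTopK);
        long hits = exactTopK.stream().filter(approx::contains).count();
        return (double) hits / exactTopK.size();
    }

    /** True only when the (hypothetical) property is set to "true"; read on every call. */
    static boolean recallTestingEnabled()
    {
        return Boolean.getBoolean(RECALL_PROPERTY);
    }

    /** Non-empty results are always required; recall is checked only when enabled. */
    static <K> void assertSearchResults(List<K> exactTopK, List<K> approxTopK)
    {
        if (approxTopK.isEmpty())
            throw new AssertionError("vector search returned no results");
        if (recallTestingEnabled())
        {
            double r = recall(exactTopK, approxTopK);
            if (r < MIN_RECALL)
                throw new AssertionError("recall " + r + " is below " + MIN_RECALL);
        }
    }

    public static void main(String[] args)
    {
        List<Integer> exact = List.of(1, 2, 3, 4, 5, 6, 7, 8, 9, 10);
        List<Integer> approx = List.of(1, 2, 3, 4, 5, 6, 7, 8, 11, 12);
        System.out.println("recall = " + recall(exact, approx)); // prints "recall = 0.8"
        // Passes by default; would throw if the property were "true", since 0.8 < 0.9.
        assertSearchResults(exact, approx);
    }
}
```

In this sketch a targeted run would pass -Dcassandra.test.vector.recall=true to the test JVM to turn the recall check back on. By default only the non-empty-result check applies, which is the trade-off the description accepts for option 3.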
[jira] [Updated] (CASSANDRA-19185) Vector search tests are failing on recall accuracy
[ https://issues.apache.org/jira/browse/CASSANDRA-19185?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mike Adamson updated CASSANDRA-19185:
-
    Fix Version/s: 5.0-beta2 (was: 5.x) (was: 5.0.x)
    Since Version: 5.0-alpha1
    Source Control Link: https://github.com/apache/cassandra/commit/7aab61b06357ce0b59977715f82fed1ad24474b4
    Resolution: Fixed
    Status: Resolved (was: Ready to Commit)

Committed as https://github.com/apache/cassandra/commit/7aab61b06357ce0b59977715f82fed1ad24474b4