[jira] [Updated] (CASSANDRA-19185) Vector search tests are failing on recall accuracy

2024-03-26 Thread Michael Semb Wever (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19185?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Semb Wever updated CASSANDRA-19185:
---
Fix Version/s: 5.0

> Vector search tests are failing on recall accuracy
> --
>
> Key: CASSANDRA-19185
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19185
> Project: Cassandra
> Issue Type: Bug
> Components: Feature/SAI
> Reporter: Mike Adamson
> Assignee: Mike Adamson
> Priority: Normal
> Fix For: 5.0-beta2, 5.0
>
> Time Spent: 0.5h
> Remaining Estimate: 0h
>
> Vector tests are failing intermittently because they do not meet their recall
> assertion values. So far, the following tests have been reported as failing:
> VectorSegmentationTest.testMultipleSegmentsForCompaction
> VectorDistributedTest.rangeRestrictedTest
> VectorDistributedTest.testPartitionRestrictedVectorSearch
> Since the vector searches are approximate and the vectors used in the tests 
> are random, the searches are unlikely to always achieve high recall. The 
> recall assertions expect recall values of 0.9 and above. Part of 
> this issue is related to the use of random values in the vectors being 
> tested. We have seen, in other tests, that vector search performs 
> better with non-random generated datasets such as the GloVe datasets. As 
> such, the following options are available to fix these tests:
>  # Downgrade the assertions to a value that is likely to always pass. The 
> problem is that there is no guarantee that a test will always pass any recall 
> value we give it.
>  # Use generated datasets for these tests to see if that improves the recall 
> results.
>  # Remove the recall assertions unless they are specifically asked for. We 
> could use a system property to enable recall testing for targeted vector 
> testing.
> I don't think option 1 is a viable long-term solution, as we can never be 
> certain that it will always work. Option 2 has more promise, but it could 
> still result in failures because of the approximate nature of the vector 
> searches. As such, option 3 seems to be the only viable solution here, but 
> it means that, in most cases, we are only really testing that the search 
> returns results, not how accurate those results are.
>  
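
Option 3 in the description above could be sketched roughly as follows.
This is only an illustration under stated assumptions, not the committed
fix: the class name RecallCheck, the property name
cassandra.test.vector.assert_recall and both helper signatures are
hypothetical and do not come from the Cassandra code base. Recall is taken
here to be the fraction of the exact (brute-force) result keys that the
approximate search also returned.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class RecallCheck
{
    // Hypothetical property name, invented for this sketch.
    public static final String ASSERT_RECALL_PROPERTY = "cassandra.test.vector.assert_recall";

    // Fraction of the exact result keys that the approximate search also returned.
    public static <T> double recall(List<T> expected, List<T> actual)
    {
        Set<T> truth = new HashSet<>(expected);
        if (truth.isEmpty())
            return 1.0;
        Set<T> hits = new HashSet<>(actual);
        hits.retainAll(truth);
        return (double) hits.size() / truth.size();
    }

    public static <T> void checkResults(List<T> expected, List<T> actual, int limit, double minRecall)
    {
        // Always checked: the search returned something and respected the limit.
        if (actual.isEmpty() || actual.size() > limit)
            throw new AssertionError("expected between 1 and " + limit + " rows, got " + actual.size());

        // The recall threshold is only enforced when the property is set to "true".
        if (!Boolean.getBoolean(ASSERT_RECALL_PROPERTY))
            return;

        double observed = recall(expected, actual);
        if (observed < minRecall)
            throw new AssertionError("recall " + observed + " is below " + minRecall);
    }

    public static void main(String[] args)
    {
        List<Integer> exact = Arrays.asList(1, 2, 3, 4);
        List<Integer> approximate = Arrays.asList(1, 2, 9, 8);
        System.out.println("recall = " + recall(exact, approximate));
        // Passes by default; throws if the property above is set to "true".
        checkResults(exact, approximate, 4, 0.9);
    }
}
```

With this shape, a default test run only verifies that results come back,
while a targeted run started with the property set to true (for example via
a -D JVM argument) also enforces the recall threshold.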



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-19185) Vector search tests are failing on recall accuracy

2024-03-21 Thread Mike Adamson (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19185?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mike Adamson updated CASSANDRA-19185:
-
    Fix Version/s: 5.0-beta2 (was: 5.x) (was: 5.0.x)
    Since Version: 5.0-alpha1
    Source Control Link: https://github.com/apache/cassandra/commit/7aab61b06357ce0b59977715f82fed1ad24474b4
    Resolution: Fixed
    Status: Resolved (was: Ready to Commit)

Committed as

https://github.com/apache/cassandra/commit/7aab61b06357ce0b59977715f82fed1ad24474b4

> Vector search tests are failing on recall accuracy
> --
>
> Key: CASSANDRA-19185
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19185
> Project: Cassandra
> Issue Type: Bug
> Components: Feature/SAI
> Reporter: Mike Adamson
> Assignee: Mike Adamson
> Priority: Normal
> Fix For: 5.0-beta2
>
> Time Spent: 0.5h
> Remaining Estimate: 0h


