[ 
https://issues.apache.org/jira/browse/CASSANDRA-19185?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mike Adamson updated CASSANDRA-19185:
-------------------------------------
          Fix Version/s: 5.0-beta2
                             (was: 5.x)
                             (was: 5.0.x)
          Since Version: 5.0-alpha1
    Source Control Link: 
https://github.com/apache/cassandra/commit/7aab61b06357ce0b59977715f82fed1ad24474b4
             Resolution: Fixed
                 Status: Resolved  (was: Ready to Commit)

Committed as

https://github.com/apache/cassandra/commit/7aab61b06357ce0b59977715f82fed1ad24474b4

> Vector search tests are failing on recall accuracy
> --------------------------------------------------
>
>                 Key: CASSANDRA-19185
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-19185
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Feature/SAI
>            Reporter: Mike Adamson
>            Assignee: Mike Adamson
>            Priority: Normal
>             Fix For: 5.0-beta2
>
>          Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Vector tests are failing randomly because they do not meet recall assertion 
> values. Currently, the following tests have been reported as failing:
> VectorSegmentationTest.testMultipleSegmentsForCompaction
> VectorDistributedTest.rangeRestrictedTest
> VectorDistributedTest.testPartitionRestrictedVectorSearch
> Since the vector searches are approximate and the vectors used in the tests 
> are random, it is unlikely that they will always achieve high recall. The 
> recall assertions expect recall values of 0.9 and above. Part of 
> this issue is related to the use of random values in the vectors being 
> tested. We have seen, with other tests, that the vector search performs 
> better with non-random datasets such as the GloVe datasets. As such, 
> the following options are available to fix these tests:
>  1. Downgrade the assertions to a value that is likely to always pass. The 
> problem is that there is no guarantee that a test will always pass any recall 
> value we give it.
>  2. Use generated datasets for these tests to see if that improves the recall 
> results.
>  3. Remove the recall assertions unless they are specifically asked for. We 
> could use a system property to enable recall testing for targeted vector 
> testing.
> I don't think option 1 is a viable long-term solution, as we can never be 
> certain that it will always work. Option 2 has more promise, but it could 
> still result in failures because of the approximate nature of the vector 
> searches. As such, option 3 seems the only viable solution here, but it means 
> that, in most cases, we are only testing that the search returns results, 
> not how accurate those results are.
>  
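To make option 3 concrete, here is a minimal sketch of a recall check that always verifies that results were returned but only enforces a recall threshold when a system property is set. It is illustrative only and is not taken from the linked commit. The class name RecallCheck, the property name test.vector.assert_recall, and the use of integer ids for results are all assumptions made for this example.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper, not part of the Cassandra code base.
class RecallCheck {
    // Assumed property name; recall is enforced only when it is set to "true".
    static final String RECALL_PROPERTY = "test.vector.assert_recall";

    // Fraction of the exact (expected) ids that the approximate search also returned.
    // Assumes the ids in 'expected' are distinct.
    static double recall(List<Integer> expected, List<Integer> actual) {
        if (expected.isEmpty())
            return 1.0;
        Set<Integer> found = new HashSet<>(actual);
        long hits = expected.stream().filter(found::contains).count();
        return (double) hits / expected.size();
    }

    static boolean recallAssertionsEnabled() {
        // Boolean.getBoolean reads the named system property and parses it as a boolean.
        return Boolean.getBoolean(RECALL_PROPERTY);
    }

    static void checkResults(List<Integer> expected, List<Integer> actual, double minRecall) {
        // Always checked: the search must return something.
        if (actual.isEmpty())
            throw new AssertionError("vector search returned no results");

        // Only checked for targeted recall testing.
        if (!recallAssertionsEnabled())
            return;

        double observed = recall(expected, actual);
        if (observed < minRecall)
            throw new AssertionError("recall " + observed + " is below " + minRecall);
    }
}
```

With this shape, a normal test run only fails when a search returns nothing, while a targeted run started with -Dtest.vector.assert_recall=true would also enforce the 0.9 threshold mentioned above.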



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

