dungba88 commented on issue #13564: URL: https://github.com/apache/lucene/issues/13564#issuecomment-2499978963
I think there are still two issues to address:

- Preventing the quantized vectors from being swapped out: loading the full-precision vectors is costly, and if the OS is under memory pressure it can cause the quantized vectors to be swapped out. Maybe we can use something like `mlock` where the system supports it. But I guess this can be left to developers instead of having built-in support for it in the re-ranking Query.
- Latency could be better. I'm still running a thorough benchmark with KnnGraphTester, but preliminary results show that the re-ranking step adds noticeable latency. Maybe we can execute the re-ranking per segment in parallel, or apply some other optimization.

Another thing is that we are running the rewrite phase and createRewrittenQuery twice: once for the main search phase and once for the re-ranking phase. I'm not sure how much overhead that introduces.
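The per-segment parallel re-ranking idea from the latency point can be sketched in plain Java. This is a minimal illustration under stated assumptions, not Lucene's API: the `Hit` record, the `fullVectors[segment][doc]` layout, and the `rerank` method are all hypothetical, and dot product is assumed as the similarity. Each segment's approximate candidates are re-scored against full-precision vectors on their own task, and the results are then merged into a global top k.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical sketch: re-rank approximate kNN candidates per segment in parallel. */
public class ParallelRerank {

  /** Hypothetical candidate hit: segment ordinal, segment-local doc id, and score. */
  record Hit(int segment, int doc, float score) {}

  /** Exact dot product against a full-precision vector (assumed similarity). */
  static float dotProduct(float[] a, float[] b) {
    float sum = 0f;
    for (int i = 0; i < a.length; i++) {
      sum += a[i] * b[i];
    }
    return sum;
  }

  /**
   * Re-scores each segment's candidates on a separate task, then merges the
   * per-segment results and keeps the global top k by exact score.
   *
   * @param fullVectors hypothetical storage: fullVectors[segment][doc] is a full-precision vector
   * @param candidates candidates.get(segment) holds that segment's approximate hits
   */
  static List<Hit> rerank(float[] query, float[][][] fullVectors,
                          List<List<Hit>> candidates, int k, ExecutorService executor)
      throws Exception {
    List<Future<List<Hit>>> futures = new ArrayList<>();
    for (int seg = 0; seg < candidates.size(); seg++) {
      final int s = seg;
      // One task per segment: only this segment's full-precision vectors are read here.
      futures.add(executor.submit(() -> {
        List<Hit> rescored = new ArrayList<>();
        for (Hit h : candidates.get(s)) {
          rescored.add(new Hit(s, h.doc(), dotProduct(query, fullVectors[s][h.doc()])));
        }
        return rescored;
      }));
    }
    // Merge step: wait for every segment, then sort by exact score, highest first.
    List<Hit> merged = new ArrayList<>();
    for (Future<List<Hit>> f : futures) {
      merged.addAll(f.get());
    }
    merged.sort((a, b) -> Float.compare(b.score(), a.score()));
    return new ArrayList<>(merged.subList(0, Math.min(k, merged.size())));
  }

  public static void main(String[] args) throws Exception {
    float[] query = {1f, 0f};
    float[][][] vectors = {
        {{0.9f, 0.1f}, {0.2f, 0.8f}}, // segment 0
        {{0.5f, 0.5f}, {1.0f, 0.0f}}  // segment 1
    };
    // Approximate scores are placeholders; only the exact re-scoring matters here.
    List<List<Hit>> candidates = List.of(
        List.of(new Hit(0, 0, 0f), new Hit(0, 1, 0f)),
        List.of(new Hit(1, 0, 0f), new Hit(1, 1, 0f)));
    ExecutorService executor = Executors.newFixedThreadPool(2);
    try {
      // Expected order: segment 1 doc 1 first, then segment 0 doc 0.
      System.out.println(rerank(query, vectors, candidates, 2, executor));
    } finally {
      executor.shutdown();
    }
  }
}
```

The merge is a simple sort over all re-scored candidates; with large candidate sets a bounded priority queue would avoid sorting everything, and whether the parallelism pays off depends on how much of the cost is I/O for the full-precision vectors versus scoring.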
