kaivalnp opened a new pull request, #15979: URL: https://github.com/apache/lucene/pull/15979
### Description Closes #14758 Add a new de-duplicating vector format that only stores unique vectors on disk. De-duplication is done for vectors across all docs and fields indexed by the format. Disclaimer: This was mostly written by an AI, with me refining the implementation through prompts -- although I think it did a pretty good job on its own! Details about the format itself (layout of vectors on disk, de-duplication strategy during flush and merge, performance tradeoffs, etc) are included in a markdown doc in the PR. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
