benwtrent commented on PR #13525: URL: https://github.com/apache/lucene/pull/13525#issuecomment-2200120237
> I could update the existing FlatVectorsFormat and write these data offsets only for when the field is a tensor.

I was thinking something like this. We should dynamically handle the case where more than one vector is provided. Having to configure up front "Hey, more than one vector is incoming" is weird. Why would anybody ever configure the "single vector case" when the multi-vector case would also handle a single vector? It seems like we don't need specialized formats but should instead update the current flat vector formats.

> I feel this fits conveniently with a lot of our existing interfaces. Do you see a specific need for [Float|Byte]VectorValues to iterate on individual vector values instead?

Yes, we need to be able to iterate vectors via doc IDs and gather each individual vector for a given document. Three concerns immediately come to mind:

- Rescoring docs that are gathered via some quantized methodology.
- Determining the true nearest vector for a given document.
- Ability to iterate vectors when quantizing them. We need to randomly sample across all vectors. We don't want to sample via doc IDs; this will likely add bias and hurt the quantization quality.

As for adding new information to the FieldInfo, another valid option is making it configurable directly on the format and not updating FieldInfo. I am not sure it's valuable to have it in FieldInfo. I wouldn't expect the usages for how to resolve the multi-vector scoring to be as broad as our similarity functions or vector dimensions.
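To make the three access patterns above concrete, here is a minimal sketch in plain Java. It is not Lucene code: `MultiVectorStore` and all of its methods are hypothetical names invented for illustration, and it assumes vectors are kept in one flat, ordinal-addressed list with a per-document start offset, roughly in the spirit of the "data offsets" idea quoted above. Dot product is used as a stand-in similarity.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

/** Hypothetical illustration only; this is not a Lucene class or API. */
final class MultiVectorStore {
  private final int dim;
  // All vectors of all documents, addressed by a flat ordinal.
  private final List<float[]> vectors = new ArrayList<>();
  // docStarts.get(docId) is the ordinal of that document's first vector.
  private final List<Integer> docStarts = new ArrayList<>();

  MultiVectorStore(int dim) {
    this.dim = dim;
  }

  /** Adds a document with one or more vectors; the single-vector case needs no special setup. */
  int addDocument(float[]... docVectors) {
    if (docVectors.length == 0) {
      throw new IllegalArgumentException("at least one vector is required");
    }
    for (float[] v : docVectors) {
      if (v.length != dim) {
        throw new IllegalArgumentException("expected dimension " + dim);
      }
    }
    docStarts.add(vectors.size());
    for (float[] v : docVectors) {
      vectors.add(v.clone());
    }
    return docStarts.size() - 1;
  }

  int totalVectors() {
    return vectors.size();
  }

  /** Ordinal one past the last vector of the given document. */
  private int endOrdinal(int docId) {
    return docId + 1 < docStarts.size() ? docStarts.get(docId + 1) : vectors.size();
  }

  int vectorsInDoc(int docId) {
    return endOrdinal(docId) - docStarts.get(docId);
  }

  /** Gathers one individual vector of a document, e.g. for rescoring. */
  float[] vector(int docId, int index) {
    if (index < 0 || index >= vectorsInDoc(docId)) {
      throw new IndexOutOfBoundsException("vector index " + index);
    }
    return vectors.get(docStarts.get(docId) + index);
  }

  /** Index, within the document, of the vector with the highest dot product against the query. */
  int nearestVectorIndex(int docId, float[] query) {
    int best = 0;
    float bestScore = Float.NEGATIVE_INFINITY;
    for (int i = 0; i < vectorsInDoc(docId); i++) {
      float score = dot(vector(docId, i), query);
      if (score > bestScore) {
        bestScore = score;
        best = i;
      }
    }
    return best;
  }

  /** Samples uniformly over vector ordinals, not over documents. */
  float[] sampleVector(Random random) {
    return vectors.get(random.nextInt(vectors.size()));
  }

  private static float dot(float[] a, float[] b) {
    float sum = 0f;
    for (int i = 0; i < a.length; i++) {
      sum += a[i] * b[i];
    }
    return sum;
  }
}
```

Sampling by ordinal gives every vector the same chance of being picked. Picking a document first and then one of its vectors would instead under-represent the vectors of documents that hold many of them, which is one way the bias mentioned above could arise.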