benwtrent commented on PR #13525:
URL: https://github.com/apache/lucene/pull/13525#issuecomment-2200120237

   >  I could update the existing FlatVectorsFormat and write these data 
offsets only for when the field is a tensor.
   
   I was thinking something like this. We should dynamically handle the case 
where more than one vector is provided. Having to configure up front "Hey, more 
than one vector is incoming" is weird. Why would anybody ever configure the 
"single vector case" when the multi-vector case also handles a single vector? 
It seems we don't need specialized formats and should instead update the 
current flat vector formats. 
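
   To make the idea concrete, here is a minimal sketch in plain Java of a flat store that records a per-document start offset and treats a single vector as the one-element case of multiple vectors. `FlatMultiVectorStore` and all of its methods are hypothetical names for illustration; this is not Lucene's FlatVectorsFormat API, just one way the layout could look.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; not a Lucene class.
final class FlatMultiVectorStore {
  private final int dimension;
  private float[] data = new float[0];                        // all vectors, back to back
  private final List<Integer> docStarts = new ArrayList<>();  // first vector ordinal per doc
  private int totalVectors = 0;

  FlatMultiVectorStore(int dimension) {
    this.dimension = dimension;
  }

  /** Adds a document with one or more vectors and returns its doc ID. */
  int addDocument(float[]... vectors) {
    if (vectors.length == 0) {
      throw new IllegalArgumentException("at least one vector is required");
    }
    docStarts.add(totalVectors);
    for (float[] v : vectors) {
      if (v.length != dimension) {
        throw new IllegalArgumentException("expected dimension " + dimension);
      }
      // Grow the flat array and append the new vector at the end.
      data = Arrays.copyOf(data, data.length + dimension);
      System.arraycopy(v, 0, data, data.length - dimension, dimension);
      totalVectors++;
    }
    return docStarts.size() - 1;
  }

  /** Number of vectors stored for the given doc; 1 for the single-vector case. */
  int vectorCount(int doc) {
    int start = docStarts.get(doc);
    int end = doc + 1 < docStarts.size() ? docStarts.get(doc + 1) : totalVectors;
    return end - start;
  }

  /** Returns a copy of the index-th vector of the given doc. */
  float[] vector(int doc, int index) {
    if (index < 0 || index >= vectorCount(doc)) {
      throw new IndexOutOfBoundsException("doc " + doc + " has no vector " + index);
    }
    int from = (docStarts.get(doc) + index) * dimension;
    return Arrays.copyOfRange(data, from, from + dimension);
  }
}
```

   Nothing in this sketch has to be configured up front: a document added with one vector and a document added with several go through the same code path.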
   
   
   
   > I feel this fits conveniently with a lot of our existing interfaces. Do 
you see a specific need for [Float|Byte]VectorValues to iterate on individual 
vector values instead?
   
   Yes, we need to be able to iterate vectors by doc ID and gather each 
individual vector for a given document. 
   
   Three concerns immediately come to mind:
   
    - Rescoring docs that were gathered via some quantized methodology.
    - Determining the true nearest vector for a given document.
    - Iterating vectors when quantizing them. We need to randomly sample 
across all vectors. We don't want to sample via doc IDs, as this will likely 
add bias and hurt the quantization quality.
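
   As a rough sketch of the access patterns listed above, the plain-Java class below supports both views: per-document access (finding the nearest vector within one document) and per-vector access (sampling uniformly over vector ordinals, so documents with many vectors are not under-represented). `MultiVectorAccess`, its fields, and the use of dot product as the similarity are all assumptions for illustration; none of this is an existing Lucene interface.

```java
import java.util.Arrays;
import java.util.Random;

// Hypothetical illustration only; not a Lucene class.
final class MultiVectorAccess {
  private final float[][] vectors; // indexed by vector ordinal
  private final int[] docStarts;   // docStarts[d] = first ordinal of doc d; last entry = vectors.length

  MultiVectorAccess(float[][] vectors, int[] docStarts) {
    this.vectors = vectors;
    this.docStarts = docStarts;
  }

  static float dot(float[] a, float[] b) {
    float sum = 0f;
    for (int i = 0; i < a.length; i++) {
      sum += a[i] * b[i];
    }
    return sum;
  }

  /** Index within the doc of the vector with the highest dot product to the query. */
  int nearestVectorInDoc(int doc, float[] query) {
    int best = -1;
    float bestScore = Float.NEGATIVE_INFINITY;
    for (int ord = docStarts[doc]; ord < docStarts[doc + 1]; ord++) {
      float score = dot(vectors[ord], query);
      if (score > bestScore) {
        bestScore = score;
        best = ord - docStarts[doc];
      }
    }
    return best;
  }

  /** Maps a vector ordinal back to its owning doc (each doc has at least one vector). */
  int docForOrdinal(int ord) {
    int pos = Arrays.binarySearch(docStarts, ord);
    return pos >= 0 ? pos : -pos - 2;
  }

  /** Samples k distinct vector ordinals uniformly across all vectors, not across docs. */
  int[] sampleOrdinals(int k, Random random) {
    int n = vectors.length;
    if (k < 0 || k > n) {
      throw new IllegalArgumentException("k must be between 0 and " + n);
    }
    int[] ords = new int[n];
    for (int i = 0; i < n; i++) {
      ords[i] = i;
    }
    // Partial Fisher-Yates shuffle: the first k slots become a uniform sample.
    for (int i = 0; i < k; i++) {
      int j = i + random.nextInt(n - i);
      int tmp = ords[i];
      ords[i] = ords[j];
      ords[j] = tmp;
    }
    return Arrays.copyOf(ords, k);
  }
}
```

   Sampling doc IDs first and then picking a vector per doc would give every document equal weight regardless of how many vectors it holds, which is the bias the third point warns about.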
   
   
   As for adding new information to the FieldInfo, another valid option is 
making it configurable directly on the format and not updating FieldInfo. I am 
not sure it's valuable to have it in FieldInfo. I wouldn't expect the ways of 
resolving multi-vector scoring to be used as broadly as our similarity 
functions or vector dimensions.
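
   A minimal sketch of what "configurable directly on the format" might look like, in plain Java: the way per-vector scores are combined is a constructor argument of the format object rather than per-field metadata. `MultiVectorAggregation`, `HypotheticalMultiVectorFormat`, the MAX and SUM choices, and dot-product scoring are all assumptions made for illustration; the PR discussion does not fix any of them.

```java
// Hypothetical illustration only; not Lucene classes.
enum MultiVectorAggregation { MAX, SUM }

final class HypotheticalMultiVectorFormat {
  private final MultiVectorAggregation aggregation;

  // The aggregation is chosen when the format is built, not stored in FieldInfo.
  HypotheticalMultiVectorFormat(MultiVectorAggregation aggregation) {
    this.aggregation = aggregation;
  }

  /** Combines the dot products of the query with each of the doc's vectors. */
  float score(float[] query, float[][] docVectors) {
    float max = Float.NEGATIVE_INFINITY;
    float sum = 0f;
    for (float[] v : docVectors) {
      float s = 0f;
      for (int i = 0; i < v.length; i++) {
        s += v[i] * query[i];
      }
      max = Math.max(max, s);
      sum += s;
    }
    return aggregation == MultiVectorAggregation.MAX ? max : sum;
  }
}
```

   For example, `new HypotheticalMultiVectorFormat(MultiVectorAggregation.MAX)` would score a document by its best-matching vector, while the SUM variant adds up the scores of all of its vectors.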


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

