I replied with a tentative comment on SOLR-17974 - let's continue
discussion there!

Jason

On Wed, Oct 22, 2025 at 2:56 PM Chris Hostetter
<[email protected]> wrote:
>
>
> TL;DR: How much do we care about our client APIs supporting multi-valued
> fields where each value is a "Vector" (aka: "java.util.List" of numbers),
> when the alternative (for now) is to use a String representation of each
> value?
>
> ---
>
> Once upon a time, I was working on some custom plugin work where I had a
> field type using "Tuples" (of strings) as values -- but the field type
> also needed to be multi-valued (ie: a variable number of tuples in a
> single document)
>
> Writing my field type (as a composite around multiple abstracted away
> indexed & docValue fields) was pretty easy, but I ran into weirdness
> writing tests with indexing (with SolrInputDocument), and getting my
> "stored" values back (via SolrDocument) because I was trying to represent
> my "Tuples" in code as java.util.List instances (to play nice with Solr's
> existing serialization logic, response writers, etc).
>
> So *each* field value was a List<String>
>
> The problem came in my tests of multi-valued fields -- due to long
> standing "convenience" logic in SolrInputDocument and SolrDocument, those
> classes assume that if you set/add a java.util.Collection as a field
> "value", the *real* field values are the contents of that Collection --
> they either re-use it as is, or add *each* of those elements to their
> existing java.util.Collection for that field.
>
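To illustrate the flattening described above, here is a minimal sketch in plain Java. It deliberately does not use SolrJ: the class `ToyDoc` and its `addField` / `getFieldValues` methods are hypothetical stand-ins that only mimic the "a Collection is unpacked into individual values" rule as the message describes it. The real classes may differ in detail (SOLR-17974 has the actual test case).

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical toy model of a document; NOT the SolrJ API.
public class ToyDoc {
    private final Map<String, List<Object>> fields = new LinkedHashMap<>();

    // Mimics the "convenience" rule: a Collection passed as a value is
    // treated as a bag of values, so each element is added separately.
    public void addField(String name, Object value) {
        List<Object> values = fields.computeIfAbsent(name, k -> new ArrayList<>());
        if (value instanceof Collection) {
            values.addAll((Collection<?>) value);
        } else {
            values.add(value);
        }
    }

    public List<Object> getFieldValues(String name) {
        return fields.get(name);
    }

    public static void main(String[] args) {
        ToyDoc doc = new ToyDoc();
        // Intent: two values, each of which is a 2-element tuple.
        doc.addField("tuples", List.of("a", "b"));
        doc.addField("tuples", List.of("c", "d"));
        // The tuple boundaries are lost: prints [a, b, c, d]
        System.out.println(doc.getFieldValues("tuples"));
    }
}
```

In this toy model the intended two tuples come back as four flat strings, which is the kind of ambiguity that makes a List<List<String>> hard to express.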
> (Since this email was getting really long & convoluted the first 3 times I
> tried to draft it, I created SOLR-17974 to go into all the details and
> attached a test case showing how weird it all is.)
>
>
> Even though it was/is possible -- since I knew how the internals work --
> to carefully set a field value in SolrInputDocument to be a
> List<List<String>>, I instead gave up and represented each "Tuple" as a
> String using a special delimiter -- making my "multi-valued" fields a
> List<String>.
>
> (Which is/was similar to how Solr's spatial field types work, and plays
> nice with all ContentStream loaders and response writers)
>
>
>
> I was reminded of all this about a year ago when I realized:
>
> 1) Solr's DenseVectorField can only be multiValues=false
>
> 2) For external representation purposes, DenseVectorField *acts* like a
> multi-valued numeric field (either List<Float> or List<Byte>)
>
> 3) There was/is work being considered in Lucene to add "multiple vectors
> per document" to the underlying HNSW graph logic (lucene/issues/12313)
>
>
> All of which raised my eyebrow: "I wonder how Solr's going to deal with
> the List<List<Float>> problem if/when that happens?" ... but I had other
> things on my mind at the time.
>
>
>
> Skip ahead to today...
>
> Lucene still doesn't support multiple vectors per document in the HNSW
> graphs, but 10.3 *did* introduce a new LateInteractionField which is a
> different type of "vector" field where each document has a "float[][]" (a
> variable number of fixed sized "vectors", each represented as a fixed size
> float[]).
>
> So the question becomes: How do we represent these document values in Solr
> if we want to add support for this new field type? (SOLR-17975)
>
>
> The most expedient approach would be to follow in the footsteps of the
> spatial fields (and my old "Tuple" type) and use a String encoding --
> either mapping a String<->float[] and being multiValued=true in schema, or
> mapping String<->float[][] and requiring multiValued=false (the "query"
> side of this field type will already need some way for users to express a
> "float[][]" in a query string)
>
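As a rough sketch of the first String-encoding option mentioned above (one String per float[], with the field being multiValued=true), the plain-Java example below round-trips a float[][] through a List<String>. The class name `VectorStrings`, its methods, and the comma delimiter are all assumptions made for illustration. Nothing here reflects an actual Solr format, and it assumes every vector is non-empty.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper; the comma-delimited format is an assumption.
public class VectorStrings {

    // Encode one vector as a delimited String, e.g. "1.0,2.5".
    public static String encode(float[] vector) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < vector.length; i++) {
            if (i > 0) sb.append(',');
            sb.append(vector[i]);
        }
        return sb.toString();
    }

    // Decode one delimited String back into a vector.
    public static float[] decode(String value) {
        String[] parts = value.split(",");
        float[] vector = new float[parts.length];
        for (int i = 0; i < parts.length; i++) {
            vector[i] = Float.parseFloat(parts[i].trim());
        }
        return vector;
    }

    // One String per vector: this List plays the role of the
    // multi-valued field's values.
    public static List<String> encodeAll(float[][] vectors) {
        List<String> values = new ArrayList<>();
        for (float[] vector : vectors) {
            values.add(encode(vector));
        }
        return values;
    }

    public static float[][] decodeAll(List<String> values) {
        float[][] vectors = new float[values.size()][];
        for (int i = 0; i < values.size(); i++) {
            vectors[i] = decode(values.get(i));
        }
        return vectors;
    }

    public static void main(String[] args) {
        float[][] vectors = {{1.0f, 2.5f}, {-3.0f, 0.25f}};
        List<String> values = encodeAll(vectors);
        System.out.println(values); // prints [1.0,2.5, -3.0,0.25]
        System.out.println(Arrays.deepEquals(vectors, decodeAll(values))); // prints true
    }
}
```

Because each value is an ordinary String, a List<String> like this is not affected by Collection unpacking, at the cost of parsing on both sides.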
>
> The alternative is to pay off our very old tech debt: SOLR-17974.
>
> *IF* we redesign SolrInputDocument and SolrDocument to have more explicit
> APIs allowing for the possibility that a single "value" in a multi-valued
> field might be represented as a complex (possibly nested)
> java.util.Collection of primitives (recognizing that along the way, we
> will probably find lots of other places in the code base that make
> assumptions about multi-valued fields, just because it's always been that
> way.) ... *THEN* ... we could model each vector value as a "List<Float>"
> and still have "multi-valued" vector fields.
>
>
> But how much do people actually care about this?
>
> Are there other use cases where this would be useful?
>
> Is having a "cleaner" API worth the headaches of changing this now?
>
>     ?
>
>
>
>
>
>
> -Hoss
> http://www.lucidworks.com/
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [email protected]
> For additional commands, e-mail: [email protected]
>
