[ 
https://issues.apache.org/jira/browse/SOLR-17948?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Puneet Ahuja updated SOLR-17948:
--------------------------------
    Description: 
Currently, when a document containing a primitive float[] or double[] field is 
sent to Solr in the JavaBin format, indexing fails because DenseVectorParser 
does not recognize primitive arrays as valid input types. The other Solr loaders 
(JSON, CSV, XML) typically produce vector values as lists when parsed, so 
accepting primitive float[]/double[] would mainly benefit JavaBin use cases: it 
allows a more compact serialization path for clients that can produce primitive 
arrays.

JavaBin (including SolrJ’s JavaBin codec) can serialize primitive arrays 
efficiently without boxing. Today, users must box vectors into 
List<Float>/List<Double>, which adds overhead and produces larger payloads. 
Accepting primitive arrays lets clients send leaner JavaBin updates.
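
As a rough illustration of where the overhead can come from, the sketch below compares two ways of writing the same floats with plain java.io. It is a simplified model, not JavaBin's actual wire format: it assumes each boxed element carries a one-byte type tag in front of its 4-byte value, while a primitive array is written as raw 4-byte values. The class name and the tag constant are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;

public class VectorPayloadSizeSketch {
    // Hypothetical one-byte tag standing in for per-element type information.
    private static final int HYPOTHETICAL_FLOAT_TAG = 0x01;

    /** Models a boxed list: every element is written as tag byte + 4-byte float. */
    public static int taggedSize(float[] vector) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            for (float v : vector) {
                out.writeByte(HYPOTHETICAL_FLOAT_TAG);
                out.writeFloat(v);
            }
        }
        return bytes.size();
    }

    /** Models a primitive array: elements are written back to back, 4 bytes each. */
    public static int rawSize(float[] vector) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            for (float v : vector) {
                out.writeFloat(v);
            }
        }
        return bytes.size();
    }

    public static void main(String[] args) throws IOException {
        float[] vector = new float[768]; // arbitrary example dimension
        int tagged = taggedSize(vector);
        int raw = rawSize(vector);
        System.out.println("tagged bytes: " + tagged);
        System.out.println("raw bytes:    " + raw);
        System.out.println("saved bytes:  " + (tagged - raw));
    }
}
```

Under this model the per-element cost drops from 5 bytes to 4, a 20% reduction of the vector data itself. Real payloads also contain headers and other fields, so the overall saving would be somewhat different.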

I plan to extend DenseVectorParser to handle float[] and double[] inputs in 
addition to the existing List-based formats.
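
The kind of type dispatch this implies could look roughly like the sketch below. It is not the actual DenseVectorParser code: the class and method names are hypothetical, it only targets a float[] result, and it assumes List inputs contain Number elements.

```java
import java.util.List;

public class VectorInputSketch {

    /**
     * Hypothetical helper: normalizes a vector field value to float[].
     * Accepts float[], double[], or a List of Number elements.
     */
    public static float[] toFloatArray(Object value) {
        if (value instanceof float[]) {
            // Already primitive: copy so the caller's array is not shared.
            return ((float[]) value).clone();
        }
        if (value instanceof double[]) {
            double[] src = (double[]) value;
            float[] out = new float[src.length];
            for (int i = 0; i < src.length; i++) {
                out[i] = (float) src[i]; // narrowing conversion, may lose precision
            }
            return out;
        }
        if (value instanceof List<?>) {
            List<?> list = (List<?>) value;
            float[] out = new float[list.size()];
            for (int i = 0; i < out.length; i++) {
                Object element = list.get(i);
                if (!(element instanceof Number)) {
                    throw new IllegalArgumentException(
                        "Non-numeric vector element at index " + i);
                }
                out[i] = ((Number) element).floatValue();
            }
            return out;
        }
        throw new IllegalArgumentException(
            "Unsupported vector input type: "
                + (value == null ? "null" : value.getClass().getName()));
    }

    public static void main(String[] args) {
        float[] fromArray = toFloatArray(new float[] {0.1f, 0.2f, 0.3f});
        float[] fromDoubles = toFloatArray(new double[] {0.1, 0.2, 0.3});
        float[] fromList = toFloatArray(List.of(0.1f, 0.2f, 0.3f));
        System.out.println(fromArray.length + " " + fromDoubles.length + " " + fromList.length);
    }
}
```

The existing List path is kept as one branch, so the primitive-array branches are purely additive in this sketch.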

With this change, Solr will parse and index vectors sent as primitive arrays 
correctly, and in typical cases JavaBin request bodies can be ~20% smaller than 
when the same vectors are sent as boxed lists.

 

Manual test I conducted:

1. Write JavaBin payloads with vectors as both List and primitive float[].
2. Index both payloads and search against each of them to validate the index.

Both steps use the SolrJ client.

Script used: [https://gist.github.com/punAhuja/c77cc60e396ccf7aa5a55a92ba23ffc3]

JavaBin sizes:
List : 63.1 MB (66188931 bytes)
float[] : 51.1 MB (53588931 bytes)
Savings : 12.0 MB (19.04% smaller)
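
The savings figure follows directly from the two byte counts above. A minimal check (the class and method names are only illustrative):

```java
public class JavaBinSavings {

    /** Absolute reduction in bytes. */
    public static long savedBytes(long listBytes, long arrayBytes) {
        return listBytes - arrayBytes;
    }

    /** Reduction relative to the List-based payload, in percent. */
    public static double savedPercent(long listBytes, long arrayBytes) {
        return 100.0 * (listBytes - arrayBytes) / listBytes;
    }

    public static void main(String[] args) {
        long listBytes = 66_188_931L;  // List payload size reported above
        long arrayBytes = 53_588_931L; // float[] payload size reported above
        System.out.println("saved bytes:   " + savedBytes(listBytes, arrayBytes));
        System.out.println("saved percent: " + savedPercent(listBytes, arrayBytes));
    }
}
```

The difference is 12,600,000 bytes, which is about 19.04% of the List payload. The MB values above correspond to dividing by 1,048,576 (binary megabytes).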


> Support indexing primitive float[] values for DenseVectorField via JavaBin
> --------------------------------------------------------------------------
>
>                 Key: SOLR-17948
>                 URL: https://issues.apache.org/jira/browse/SOLR-17948
>             Project: Solr
>          Issue Type: Task
>            Reporter: Puneet Ahuja
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: main (10.0)
>
>          Time Spent: 40m
>  Remaining Estimate: 0h


