[ https://issues.apache.org/jira/browse/SOLR-17948?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Puneet Ahuja updated SOLR-17948:
--------------------------------
Description:
Currently, when a document containing a primitive float[] or double[] field is
sent to Solr using the JavaBin format, indexing fails because DenseVectorParser
does not recognize primitive arrays as valid input types. Other Solr loaders
(JSON, CSV, XML) typically represent vector values as lists once parsed, so
accepting primitive float[]/double[] mainly benefits JavaBin clients, which can
then use a more compact serialization path.
JavaBin (including SolrJ’s JavaBin codec) can serialize primitive arrays
efficiently without boxing. Today, users must box vectors into
List<Float>/List<Double>, which adds per-element overhead and produces larger
payloads. Accepting primitive arrays lets clients send leaner JavaBin updates.
I plan to extend DenseVectorParser to handle float[] and double[] inputs in
addition to the existing List-based formats.
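A minimal sketch of what accepting these input types could look like. This is
not the actual DenseVectorParser code or the patch attached to this issue: the
class name, method name, and error messages are hypothetical, and the only
assumption taken from the description is that float[], double[], and List-based
values should all yield a vector of the expected dimension.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, NOT Solr's DenseVectorParser.
final class VectorInputSketch {

    // Converts a supported input value into a float[] of the given dimension.
    static float[] toFloatVector(Object value, int dimension) {
        float[] out;
        if (value instanceof float[]) {
            // Primitive float[]: copy so the caller's array is not shared.
            out = ((float[]) value).clone();
        } else if (value instanceof double[]) {
            // Primitive double[]: narrow each element to float.
            double[] d = (double[]) value;
            out = new float[d.length];
            for (int i = 0; i < d.length; i++) {
                out[i] = (float) d[i];
            }
        } else if (value instanceof List) {
            // Existing List-based input: every element must be a Number.
            List<?> list = (List<?>) value;
            out = new float[list.size()];
            for (int i = 0; i < out.length; i++) {
                Object e = list.get(i);
                if (!(e instanceof Number)) {
                    throw new IllegalArgumentException(
                        "non-numeric vector element at index " + i);
                }
                out[i] = ((Number) e).floatValue();
            }
        } else {
            throw new IllegalArgumentException(
                "unsupported vector input type: "
                    + (value == null ? "null" : value.getClass().getName()));
        }
        if (out.length != dimension) {
            throw new IllegalArgumentException(
                "expected dimension " + dimension + " but got " + out.length);
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(toFloatVector(new double[] {1.0, 2.5}, 2)));
    }
}
```

The branch order checks the primitive arrays first so that the common JavaBin
case avoids any per-element boxing; the List branch keeps the existing behavior
for JSON, CSV, and XML input.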
With this change, Solr parses and indexes vectors sent as primitive arrays, and
JavaBin request bodies can be ~20% smaller than with boxed lists (19.04% in the
manual test below).
Manual test I conducted:
1. Write JavaBin payloads with the vectors as both List<Float> and primitive
float[].
2. Index both payloads, then search against both to validate the index.
Both steps use the SolrJ client.
Script used: [https://gist.github.com/punAhuja/c77cc60e396ccf7aa5a55a92ba23ffc3]
JavaBin sizes:
List : 63.1 MB (66188931 bytes)
float[] : 51.1 MB (53588931 bytes)
Savings : 12.0 MB (19.04% smaller)
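The reported savings can be checked against a simplified size model. The sketch
below is not the real JavaBin wire format: the tag values and the fixed 4-byte
length header are hypothetical, and the key assumption is that each boxed
element carries one extra type-tag byte in front of its 4-byte float, while a
primitive array carries a single header for the whole vector. The 768-element
vector is also just an illustrative dimension.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Simplified model of boxed vs. primitive vector encoding; NOT JavaBin itself.
final class VectorPayloadSizeSketch {
    // Hypothetical type tags, chosen only for this illustration.
    static final int TAG_LIST = 1;
    static final int TAG_FLOAT = 2;
    static final int TAG_FLOAT_ARRAY = 3;

    // Boxed model: list header, then a tag byte before every float.
    static int boxedListSize(float[] v) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            out.writeByte(TAG_LIST);
            out.writeInt(v.length);
            for (float f : v) {
                out.writeByte(TAG_FLOAT);
                out.writeFloat(f);
            }
            out.flush();
            return bytes.size();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Primitive model: one array header, then the raw floats.
    static int primitiveArraySize(float[] v) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            out.writeByte(TAG_FLOAT_ARRAY);
            out.writeInt(v.length);
            for (float f : v) {
                out.writeFloat(f);
            }
            out.flush();
            return bytes.size();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Percentage by which the smaller payload undercuts the larger one.
    static double savingsPercent(long larger, long smaller) {
        return 100.0 * (larger - smaller) / larger;
    }

    public static void main(String[] args) {
        float[] vector = new float[768];
        int boxed = boxedListSize(vector);
        int primitive = primitiveArraySize(vector);
        System.out.printf("model: %d vs %d bytes, %.2f%% smaller%n",
            boxed, primitive, savingsPercent(boxed, primitive));
        // Byte counts reported in this description.
        System.out.printf("reported: %.2f%% smaller%n",
            savingsPercent(66188931L, 53588931L));
    }
}
```

Under this model the per-element cost drops from 5 bytes to 4, which is why the
savings approach 20% as the vector grows and the fixed header and other
document fields become a smaller share of the payload.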
> Support indexing primitive float[] values for DenseVectorField via JavaBin
> --------------------------------------------------------------------------
>
> Key: SOLR-17948
> URL: https://issues.apache.org/jira/browse/SOLR-17948
> Project: Solr
> Issue Type: Task
> Reporter: Puneet Ahuja
> Priority: Major
> Labels: pull-request-available
> Fix For: main (10.0)
>
> Time Spent: 40m
> Remaining Estimate: 0h