PavloPolovyi commented on issue #4271: URL: https://github.com/apache/arrow-adbc/issues/4271#issuecomment-4333430128
> The Databricks Go driver is too early in development. I think I saw it wasn't even enabling parallel downloads in CloudFetch. Also if Arrow-Java + JNI bindings are being used, I think Arrow-Java also introduces some issues around strings in particular...data gets copied into a separate buffer (Text type) before it then gets re-encoded as a String so you're paying extra copies

Thanks for the comment. On our end, CloudFetch parallel downloads do seem to kick in: tuning `databricks.cloudfetch.max_chunks_in_memory=16` and `databricks.cloudfetch.link_prefetch_window=128` gave us a measurable improvement, so the parallelism path appears to be active in the version we tested (the Rust databricks-adbc crate at the current main revision). The gap to JDBC is still real for us, though: roughly 2.5× slower on 1M-row mixed-type queries against Databricks.

On Arrow-Java + JNI: we're not on that path directly. Our Java service reads the JDBC ResultSet via the standard getObject() API and builds its own internal columnar format from there; there is no Arrow-Java in our code.
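For context, ADBC database options are plain string key/value pairs, so tuning parameters like the two CloudFetch options above are passed as such through the driver manager. A minimal Python sketch of that pattern using the generic `adbc_driver_manager` bindings; the driver library path is hypothetical, and only the two option keys quoted above come from this thread:

```python
# Sketch: passing CloudFetch tuning options to a Databricks ADBC driver
# via the generic ADBC driver manager. ADBC database options are plain
# string key/value pairs; the option keys below are the ones discussed
# in this thread.
cloudfetch_opts = {
    "databricks.cloudfetch.max_chunks_in_memory": "16",
    "databricks.cloudfetch.link_prefetch_window": "128",
}

# Uncomment in an environment where the driver shared library is present
# (the path below is a placeholder, not the real artifact name):
# import adbc_driver_manager.dbapi
# conn = adbc_driver_manager.dbapi.connect(
#     driver="/path/to/libdatabricks_adbc.so",  # hypothetical path
#     db_kwargs=cloudfetch_opts,
# )
# cur = conn.cursor()
# cur.execute("SELECT * FROM some_table")  # hypothetical table
# table = cur.fetch_arrow_table()

print(cloudfetch_opts)
```

The keys are opaque to the driver manager itself; they are forwarded verbatim to the driver, which is why string values ("16", not 16) are used.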
