PavloPolovyi commented on issue #4271: URL: https://github.com/apache/arrow-adbc/issues/4271#issuecomment-4333430128
> The Databricks Go driver is too early in development. I think I saw it wasn't even enabling parallel downloads in CloudFetch. Also if Arrow-Java + JNI bindings are being used, I think Arrow-Java also introduces some issues around strings in particular...data gets copied into a separate buffer (Text type) before it then gets re-encoded as a String so you're paying extra copies

Thanks for the comment. On our end, CloudFetch parallel downloads do seem to kick in: tuning `databricks.cloudfetch.max_chunks_in_memory=16` and `databricks.cloudfetch.link_prefetch_window=128` gave us a measurable improvement, so the parallelism path appears to be active in the version we tested (the Rust databricks-adbc crate at the current main revision). The gap to JDBC is still real for us, though: roughly 2.5× slower on 1M-row mixed-type queries against Databricks.

On Arrow-Java + JNI: we're not on that path directly. Our Java service reads the JDBC ResultSet via the standard getObject() API and builds its own internal columnar format from there; there is no Arrow-Java in our code.
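For context, ADBC database options are plain string key/value pairs, so tuning parameters like the two CloudFetch options above are passed as such through the driver manager. A minimal Python sketch of that pattern using the generic `adbc_driver_manager` bindings; the driver library path is hypothetical, and only the two option keys quoted above come from this thread:

```python
# Sketch: passing CloudFetch tuning options to a Databricks ADBC driver
# via the generic ADBC driver manager. ADBC database options are plain
# string key/value pairs; the option keys below are the ones discussed
# in this thread.
cloudfetch_opts = {
    "databricks.cloudfetch.max_chunks_in_memory": "16",
    "databricks.cloudfetch.link_prefetch_window": "128",
}

# Uncomment in an environment where the driver shared library is present
# (the path below is a placeholder, not the real artifact name):
# import adbc_driver_manager.dbapi
# conn = adbc_driver_manager.dbapi.connect(
#     driver="/path/to/libdatabricks_adbc.so",  # hypothetical path
#     db_kwargs=cloudfetch_opts,
# )
# cur = conn.cursor()
# cur.execute("SELECT * FROM some_table")  # hypothetical table
# table = cur.fetch_arrow_table()

print(cloudfetch_opts)
```

The keys are opaque to the driver manager itself; they are forwarded verbatim to the driver, which is why string values ("16", not 16) are used.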
