[PR] improve direct query perf [arrow-adbc]

via GitHub Fri, 26 Sep 2025 13:27:51 -0700


eric-wang-1990 opened a new pull request, #3489:
URL: https://github.com/apache/arrow-adbc/pull/3489


   The directResults field controls how many rows/bytes can be returned in one 
Arrow batch.
   Before this change, due to a bug, the Databricks path was calling the base 
class SparkConnection, which has maxRows=1000, a limit that is too small.
   ODBC can get all results in a single ExecuteStatement call, while ADBC needs 
1 ExecuteStatement and multiple FetchResults calls, which makes ADBC slower on 
small queries.
   For ADBC:
   <img width="614" height="136" alt="image" 
src="https://github.com/user-attachments/assets/64faa63c-9bc6-4dd1-8d71-66af09e95df4"
 />
   For ODBC:
   <img width="611" height="27" alt="image" 
src="https://github.com/user-attachments/assets/52817f46-412a-41fc-9f0b-17d7ae02d91d"
 />
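
   To make the round-trip difference concrete, here is a minimal sketch of a 
simplified model, not the driver's code. It assumes the first ExecuteStatement 
returns up to `direct_max_rows` rows inline and each later FetchResults call 
returns up to `fetch_batch_rows` rows. The function name, both parameter names, 
and the fetch batch size are hypothetical, and the byte limit is ignored.

```python
import math


def call_count(total_rows: int, direct_max_rows: int, fetch_batch_rows: int) -> int:
    """Estimate server calls for one query under a simplified model.

    One ExecuteStatement call is always made. If the result does not fit in
    the direct-results row limit, the remainder is pulled with FetchResults
    calls of at most `fetch_batch_rows` rows each (assumed batch size).
    """
    if total_rows <= direct_max_rows:
        return 1  # everything arrives with the ExecuteStatement response
    remaining = total_rows - direct_max_rows
    return 1 + math.ceil(remaining / fetch_batch_rows)


# A 5,000-row result with the old 1,000-row limit vs. the new 500K limit,
# using a hypothetical 1,000-row fetch batch.
old_calls = call_count(5_000, 1_000, 1_000)
new_calls = call_count(5_000, 500_000, 1_000)
print(old_calls, new_calls)
```

   Under these assumptions a small query drops from several calls to a single 
one, which is the behavior the ODBC trace shows.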
   This PR updates DefaultMaxBytes to 10MB, which is the same limit the 
Databricks backend applies to an Arrow row set.
   MaxRows is set to 500K, assuming a minimum column size of 20 bytes.
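
   The 500K figure can be reproduced with a short calculation. This sketch 
reads "10MB" as 10,000,000 bytes and treats the 20-byte minimum as a per-row 
size. Both readings are assumptions, and the constant and function names are 
illustrative rather than the driver's actual identifiers.

```python
# Values from the PR description; names and the decimal reading of "MB"
# are assumptions made for this illustration.
MAX_BYTES = 10 * 1000 * 1000   # 10 MB byte limit
MIN_ROW_BYTES = 20             # assumed smallest row size in bytes


def rows_that_fit(max_bytes: int, min_row_bytes: int) -> int:
    """Largest row count that can fit in max_bytes at the smallest row size."""
    return max_bytes // min_row_bytes


max_rows = rows_that_fit(MAX_BYTES, MIN_ROW_BYTES)
print(max_rows)  # 500000
```

   With a row limit this high, the byte limit is usually the one that caps a 
batch, unless rows are close to the assumed minimum size.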
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

