zeroshade commented on issue #4271:
URL: https://github.com/apache/arrow-adbc/issues/4271#issuecomment-4328480182
@skalkin how many rows/columns were you dealing with in your tests? As far as I'm aware, neither Snowflake nor Databricks is using Arrow compression. Both of them also already use Arrow for transport in their JDBC drivers, so the big difference between ADBC and JDBC in those cases comes down solely to the fact that ADBC avoids the transposition into rows that JDBC performs.

As a result, the exact performance benefit in many cases will depend on the number of rows (hundreds of thousands/millions), the number of columns, and what you're doing with the data afterwards. If you're feeding the data into a dataframe, writing it out to a Parquet file, or building charts/visualizations (i.e. things that convert to a columnar representation anyway), you'll see more benefit than if you're just printing it out. The query itself also matters when testing: a particularly expensive query can end up dwarfing the transport I/O. Can you share more information about what your experiments were testing?

Specifically for Snowflake, we've tested using the default TPC-H sample dataset that it provides, and we start seeing statistically significant performance benefits at around half a million rows. Again, that's mostly because Snowflake already uses Arrow for transport in its ODBC/JDBC drivers, so the savings is just the cost of the transpose/conversion. To be fair, though, we've mostly tested against ODBC, not JDBC.
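
For readers following along, here is a minimal sketch (not from the comment above) of the columnar path being described, using the ADBC Python driver for Snowflake. The connection URI is a placeholder, and the TPC-H query is just an example against Snowflake's sample dataset; the point is only that `fetch_arrow_table()` hands the Arrow data directly to a columnar consumer, with no row-by-row transposition in between.

```python
# Minimal sketch: query results flow as Arrow columnar data straight into
# a dataframe. Assumes `pip install adbc-driver-snowflake pyarrow pandas`
# and a valid Snowflake DSN (the URI below is a placeholder).
import adbc_driver_snowflake.dbapi

uri = "user:password@account/database/schema"  # placeholder credentials

with adbc_driver_snowflake.dbapi.connect(uri) as conn:
    cur = conn.cursor()
    cur.execute("SELECT * FROM snowflake_sample_data.tpch_sf1.lineitem")
    table = cur.fetch_arrow_table()  # Arrow data, no transpose to rows
    df = table.to_pandas()           # columnar -> dataframe conversion
    cur.close()
```

A JDBC-style consumer, by contrast, would iterate the same result set row by row (transposing the columnar wire data into row objects) and then a dataframe library would transpose it back, which is the round trip ADBC avoids.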
