zeroshade commented on issue #4271:
URL: https://github.com/apache/arrow-adbc/issues/4271#issuecomment-4328480182
@skalkin how many rows/columns were you dealing with in your tests? As far as I'm aware, neither Snowflake nor Databricks is using Arrow compression. Both of them also already use Arrow for transport in their JDBC drivers, so the big difference between ADBC and JDBC in those cases comes down solely to the fact that ADBC avoids the transposition into rows that JDBC performs.

As a result, the exact performance benefit in many cases will depend on the number of rows (hundreds of thousands/millions), the number of columns, and what you're doing with the data afterwards. If you're feeding the data into a dataframe, writing it out to a Parquet file, or building charts/visualizations (i.e. things that convert to a columnar representation anyway), you'll see more benefit than if you're just printing it out. The query itself also matters when testing: a particularly expensive query can end up dwarfing the transport I/O. Can you share more information about what your experiments were testing?

Specifically for Snowflake, we've tested using the default TPC-H sample dataset that it provides, and we start seeing statistically significant performance benefits at around half a million rows. Again, that's mostly because Snowflake already uses Arrow for transport in its ODBC/JDBC drivers, so the savings is just the cost of the transpose/conversion. To be fair, though, we've mostly tested against ODBC, not JDBC.
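
For readers following along, here is a minimal sketch (not from the comment above) of the columnar path being described, using the ADBC Python driver for Snowflake. The connection URI is a placeholder, and the TPC-H query is just an example against Snowflake's sample dataset; the point is only that `fetch_arrow_table()` hands the Arrow data directly to a columnar consumer, with no row-by-row transposition in between.

```python
# Minimal sketch: query results flow as Arrow columnar data straight into
# a dataframe. Assumes `pip install adbc-driver-snowflake pyarrow pandas`
# and a valid Snowflake DSN (the URI below is a placeholder).
import adbc_driver_snowflake.dbapi

uri = "user:password@account/database/schema"  # placeholder credentials

with adbc_driver_snowflake.dbapi.connect(uri) as conn:
    cur = conn.cursor()
    cur.execute("SELECT * FROM snowflake_sample_data.tpch_sf1.lineitem")
    table = cur.fetch_arrow_table()  # Arrow data, no transpose to rows
    df = table.to_pandas()           # columnar -> dataframe conversion
    cur.close()
```

A JDBC-style consumer, by contrast, would iterate the same result set row by row (transposing the columnar wire data into row objects) and then a dataframe library would transpose it back, which is the round trip ADBC avoids.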
