zeroshade commented on issue #2508:
URL: https://github.com/apache/arrow-adbc/issues/2508#issuecomment-2654744391
I created a table in Snowflake consisting of 75 columns and 1M rows, where
25 columns were `NUMBER(38,0)`, 25 were `FLOAT`, and 25 were `VARCHAR`, so the
results are guaranteed to go through the `integerToDecimal128` function (I
verified this in the debugger).
With a pure Go main program, I was able to download the entire million rows in
5 to 7 seconds (roughly 7x faster than the ODBC timing in your original
screenshot for the 1M rows).
Just to confirm things for myself, I also tested from Python (which consumes a
C stream of data just like R would). Rather than timing only the streaming of
the results, I also included creating a pyarrow Table from the streamed
records (i.e. materializing the entire result set in memory at once rather
than grabbing one batch at a time). The corresponding code looks like:
```python
import adbc_driver_snowflake.dbapi
with adbc_driver_snowflake.dbapi.connect("<snowflake URI>") as conn, \
        conn.cursor() as cur:
    cur.execute('SELECT * FROM "my_table"')
    tbl = cur.fetch_arrow_table()
    print(tbl.num_rows)
```
And even with the added cost of materializing the entire result set, the
Python script still only takes around 20s to run. So whatever is causing it to
take so long seems to be specific to R, not the Go side. @paleolimbot
any ideas?