zeroshade commented on issue #2508:
URL: https://github.com/apache/arrow-adbc/issues/2508#issuecomment-2654922554
Okay, I'm not quite sure what the difference is here since I'm not an R
developer at all, but there's a significant performance difference between this
(essentially what you're doing in your benchmark):
```R
df <- conn |> read_adbc('SELECT * FROM "my_table"') |> tibble::as_tibble()
```
and the following:
```R
df <- conn |> read_adbc('SELECT * FROM "my_table"') |>
arrow::as_arrow_table() |> tibble::as_tibble()
```
I did a bit of testing: the first scenario (i.e. what you're doing) takes
between 28 seconds and over a minute, while the second version (taking a trip
through `arrow::as_arrow_table()` before going to tibble) takes just under 7
seconds.
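For reference, here's roughly how I compared the two paths. This is just a
minimal sketch assuming a `conn` created with adbcdrivermanager and a table
named `my_table`, so adjust the connection setup and query to match your
environment:

```R
library(adbcdrivermanager)

# Path 1: result stream converted straight to a tibble (nanoarrow -> tibble)
system.time({
  df1 <- conn |>
    read_adbc('SELECT * FROM "my_table"') |>
    tibble::as_tibble()
})

# Path 2: result stream collected into an Arrow Table first, then to a tibble
system.time({
  df2 <- conn |>
    read_adbc('SELECT * FROM "my_table"') |>
    arrow::as_arrow_table() |>
    tibble::as_tibble()
})
```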
@JavOrraca can you try adding the `|> arrow::as_arrow_table()` before `|>
tibble::as_tibble()` and see what that does for your performance?
@paleolimbot I have no idea what would cause this difference, but it's
completely reproducible. As far as I can tell, the driver and Golang side is
producing all the batches quickly regardless; the slowdown is entirely in
whatever work happens between nanoarrow and tibble, and something in the Arrow
R package's implementation makes the conversion to tibble much faster.