zeroshade commented on issue #2508:
URL: https://github.com/apache/arrow-adbc/issues/2508#issuecomment-2654922554
Okay, I'm not quite sure what the difference is here since I'm not an R
developer at all, but there's a significant performance difference between this
(essentially what you're doing in your benchmark):
```R
df <- conn |> read_adbc('SELECT * FROM "my_table"') |> tibble::as_tibble()
```
and the following:
```R
df <- conn |> read_adbc('SELECT * FROM "my_table"') |>
arrow::as_arrow_table() |> tibble::as_tibble()
```
I did a bit of testing: the first scenario (i.e. what you're doing) takes
between 28 seconds and over a minute, while the second version (taking a trip
through `arrow::as_arrow_table()` before going to tibble) takes just under 7
seconds.
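For reference, here's roughly how I compared the two paths. This is just a
minimal sketch assuming a `conn` created with adbcdrivermanager and a table
named `my_table`, so adjust the connection setup and query to match your
environment:

```R
library(adbcdrivermanager)

# Path 1: result stream converted straight to a tibble (nanoarrow -> tibble)
system.time({
  df1 <- conn |>
    read_adbc('SELECT * FROM "my_table"') |>
    tibble::as_tibble()
})

# Path 2: result stream collected into an Arrow Table first, then to a tibble
system.time({
  df2 <- conn |>
    read_adbc('SELECT * FROM "my_table"') |>
    arrow::as_arrow_table() |>
    tibble::as_tibble()
})
```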
@JavOrraca can you try adding the `|> arrow::as_arrow_table()` before `|>
tibble::as_tibble()` and see what that does for your performance?
@paleolimbot I have no idea what would cause this difference, but it's
completely reproducible. As far as I can tell, the driver and Golang side is
producing all the batches quickly regardless; the slowdown is entirely in
whatever work happens between nanoarrow and tibble, and something in the Arrow
R package's implementation makes the conversion to tibble much faster.