paleolimbot commented on PR #870:
URL: https://github.com/apache/arrow-adbc/pull/870#issuecomment-1623802514
I added an option here, although I don't know if it has the best name
(`"adbc.postgres.batch_size_hint_bytes"`) or is defined in the best place
(statement.h). A value as low as 1 MB seems to cover conversion overhead
(see details); however, some environments may have performance issues with tiny
batches (Arrow/R did for several releases before we spotted the issue).
<details>
``` r
library(adbcdrivermanager)
uri <- Sys.getenv("ADBC_POSTGRESQL_TEST_URI")
db <- adbc_database_init(adbcpostgresql::adbcpostgresql(), uri = uri)
con <- adbc_connection_init(db)
read_with_batch_size <- function(sz_bytes) {
  stmt <- local_adbc(
    adbc_statement_init(con, "adbc.postgres.batch_size_hint_bytes" = sz_bytes)
  )
  stream <- local_adbc(nanoarrow::nanoarrow_allocate_array_stream())

  stmt |>
    adbc_statement_set_sql_query("SELECT * from flights") |>
    adbc_statement_execute_query(stream)

  # Minimize R conversion overhead for benchmark
  reader <- arrow::as_record_batch_reader(stream)
  on.exit(reader$Close())
  arrow::as_arrow_table(reader)
}
results <- bench::mark(
read_with_batch_size(2^10),
read_with_batch_size(2^15),
read_with_batch_size(2^20),
read_with_batch_size(2^22),
read_with_batch_size(2^24),
read_with_batch_size(2^26),
read_with_batch_size(2^28)
)
#> Warning: Some expressions had a GC in every iteration; so filtering is
#> disabled.
results
#> # A tibble: 7 × 6
#>   expression                      min   median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr>                 <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
#> 1 read_with_batch_size(2^10)    1.37s    1.37s     0.730   12.11MB    0.730
#> 2 read_with_batch_size(2^15) 338.87ms 353.24ms     2.83    59.77KB    0
#> 3 read_with_batch_size(2^20)  331.4ms  349.6ms     2.86     5.25KB    0
#> 4 read_with_batch_size(2^22) 335.74ms  337.7ms     2.96     5.25KB    0
#> 5 read_with_batch_size(2^24) 339.18ms 345.29ms     2.90     5.25KB    0
#> 6 read_with_batch_size(2^26) 333.02ms 339.22ms     2.95     5.25KB    0
#> 7 read_with_batch_size(2^28) 336.97ms 338.05ms     2.96     5.25KB    0
```
<sup>Created on 2023-07-06 with [reprex v2.0.2](https://reprex.tidyverse.org)</sup>
</details>
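
For a rough sense of why the smallest hint stands out, here is a back-of-the-envelope sketch. It assumes the driver emits roughly one batch per `hint_bytes` of result data, which is my reading of "hint" and not something measured above. The 40 MB result size is made up (I did not measure `flights`), and `approx_batch_count()` is an illustrative helper, not part of adbcdrivermanager.

``` r
# Estimate how many batches a result of `total_bytes` is split into when the
# driver targets about `hint_bytes` per batch (assumed behaviour, see above).
approx_batch_count <- function(total_bytes, hint_bytes) {
  ceiling(total_bytes / hint_bytes)
}

# Hypothetical result size: NOT measured from the flights table.
total_bytes <- 40 * 2^20

# Same hint sizes as in the benchmark.
hints <- 2^c(10, 15, 20, 22, 24, 26, 28)

data.frame(
  hint_bytes = hints,
  approx_batches = approx_batch_count(total_bytes, hints)
)
```

Under those assumptions the 1 KB hint produces tens of thousands of batches while anything at or above 1 MB produces a few dozen at most, which is at least consistent with per-batch overhead dominating only at the low end.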