paleolimbot commented on PR #870:
URL: https://github.com/apache/arrow-adbc/pull/870#issuecomment-1623802514

   I added an option here, although I don't know if it has the best name 
(`"adbc.postgres.batch_size_hint_bytes"`) or is defined in the best place 
(statement.h). A value as low as 1 MB seems to cover conversion overhead 
(see details); however, some environments may have performance issues with tiny 
batches (Arrow/R did for several releases before we spotted the issue).
   
   <details>
   
   ``` r
   library(adbcdrivermanager)
   
   uri <- Sys.getenv("ADBC_POSTGRESQL_TEST_URI")
   db <- adbc_database_init(adbcpostgresql::adbcpostgresql(), uri = uri)
   con <- adbc_connection_init(db)
   
   read_with_batch_size <- function(sz_bytes) {
     stmt <- local_adbc(
       adbc_statement_init(con, "adbc.postgres.batch_size_hint_bytes" = sz_bytes)
     )
     stream <- local_adbc(nanoarrow::nanoarrow_allocate_array_stream())
     
     stmt |> 
       adbc_statement_set_sql_query("SELECT * from flights") |> 
       adbc_statement_execute_query(stream)
     
     # Minimize R conversion overhead for benchmark
     reader <- arrow::as_record_batch_reader(stream)
     on.exit(reader$Close())
     arrow::as_arrow_table(reader)
   }
   
   
   results <- bench::mark(
     read_with_batch_size(2^10),
     read_with_batch_size(2^15),
     read_with_batch_size(2^20),
     read_with_batch_size(2^22),
     read_with_batch_size(2^24),
     read_with_batch_size(2^26),
     read_with_batch_size(2^28)
   )
   #> Warning: Some expressions had a GC in every iteration; so filtering is
   #> disabled.
   
   results
   #> # A tibble: 7 × 6
   #>   expression                      min   median `itr/sec` mem_alloc `gc/sec`
   #>   <bch:expr>                 <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
   #> 1 read_with_batch_size(2^10)    1.37s    1.37s     0.730   12.11MB    0.730
   #> 2 read_with_batch_size(2^15) 338.87ms 353.24ms     2.83    59.77KB    0
   #> 3 read_with_batch_size(2^20)  331.4ms  349.6ms     2.86     5.25KB    0
   #> 4 read_with_batch_size(2^22) 335.74ms  337.7ms     2.96     5.25KB    0
   #> 5 read_with_batch_size(2^24) 339.18ms 345.29ms     2.90     5.25KB    0
   #> 6 read_with_batch_size(2^26) 333.02ms 339.22ms     2.95     5.25KB    0
   #> 7 read_with_batch_size(2^28) 336.97ms 338.05ms     2.96     5.25KB    0
   ```
   
   <sup>Created on 2023-07-06 with [reprex v2.0.2](https://reprex.tidyverse.org)</sup>
   
   </details>
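
   As a rough illustration of why the smallest hint in the benchmark above is so much slower, here is a minimal sketch of how a byte-size hint might translate into a batch count. It assumes the hint acts as a simple threshold on accumulated row bytes; the helper `batch_rows_by_size_hint()` and the row sizes are hypothetical and are not taken from the driver's implementation.

```r
# Hypothetical helper (not part of the ADBC driver): group row indices into
# batches, closing a batch once its accumulated size reaches the hint.
batch_rows_by_size_hint <- function(row_sizes_bytes, size_hint_bytes) {
  batches <- list()
  current <- integer(0)
  current_bytes <- 0

  for (i in seq_along(row_sizes_bytes)) {
    current <- c(current, i)
    current_bytes <- current_bytes + row_sizes_bytes[[i]]

    # Because this is only a hint, a batch may slightly exceed it.
    if (current_bytes >= size_hint_bytes) {
      batches[[length(batches) + 1L]] <- current
      current <- integer(0)
      current_bytes <- 0
    }
  }

  # Emit any trailing partial batch.
  if (length(current) > 0) {
    batches[[length(batches) + 1L]] <- current
  }

  batches
}

# Made-up input: 1000 rows of 100 bytes each.
row_sizes <- rep(100, 1000)
n_small <- length(batch_rows_by_size_hint(row_sizes, 2^10))  # many tiny batches
n_large <- length(batch_rows_by_size_hint(row_sizes, 2^20))  # a single batch
```

   Under this assumption, the number of batches (and with it any fixed per-batch cost) grows roughly in inverse proportion to the hint, which would be consistent with only the 2^10 case standing out in the timings.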


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.