joellubi commented on issue #1327:
URL: https://github.com/apache/arrow-adbc/issues/1327#issuecomment-1840602154

   Following up on #1322.
   
   The Snowflake Connector that our ADBC driver uses [claims to make
optimizations](https://pkg.go.dev/github.com/snowflakedb/gosnowflake#hdr-Batch_Inserts_and_Binding_Parameters)
when many values are bound to an `INSERT` statement. There are some limitations
on when this optimization applies, but in this case the code does appear to be
going through the connector's optimized path already. Since even that path
doesn't deliver the throughput we would expect, it seems reasonable to handle
ingestion on the ADBC side while addressing some of the connector's existing
limitations.
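
   For reference, the connector's documented bulk path is triggered by binding whole
column slices to a single `INSERT`, roughly as in the sketch below (the DSN, table,
and column names are placeholders; past an internal threshold the connector stages
the bound values itself and issues a COPY):

   ```go
   package main

   import (
       "database/sql"
       "log"

       sf "github.com/snowflakedb/gosnowflake"
   )

   func main() {
       // Placeholder DSN; the ADBC driver builds the real one from its options.
       db, err := sql.Open("snowflake", "user:password@account/db/schema")
       if err != nil {
           log.Fatal(err)
       }
       defer db.Close()

       ids := make([]int, 100000)
       names := make([]string, 100000)
       // ... populate the slices ...

       // Binding whole columns with sf.Array lets the connector batch the values;
       // above its internal threshold it stages them as CSV and runs COPY itself.
       if _, err := db.Exec(
           "INSERT INTO my_table (id, name) VALUES (?, ?)",
           sf.Array(&ids), sf.Array(&names),
       ); err != nil {
           log.Fatal(err)
       }
   }
   ```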
   
   The primary limitations we'd want our solution to overcome:
   1. Currently each batch gets its own temporary stage. We would want to upload
multiple (or all) batches to a single stage and load from there.
   2. The connector converts values to Go types, which must then be written out
as CSV for the stage. We could likely do much better with Arrow type mapping by
writing Parquet directly from Arrow as the stage format (see the sketch after
this list).
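
   A rough sketch of what that staged flow could look like, assuming arrow-go's
`pqarrow` writer for the Parquet conversion (the stage name, local paths, and COPY
options are placeholders, and cleanup/error handling is trimmed):

   ```go
   package ingest

   import (
       "context"
       "database/sql"
       "fmt"
       "os"

       "github.com/apache/arrow/go/v14/arrow/array"
       "github.com/apache/arrow/go/v14/parquet"
       "github.com/apache/arrow/go/v14/parquet/compress"
       "github.com/apache/arrow/go/v14/parquet/pqarrow"
   )

   // ingestViaStage writes each incoming Arrow batch to a Parquet file, uploads
   // all of the files to one temporary stage, and loads the target table with a
   // single COPY INTO.
   func ingestViaStage(ctx context.Context, db *sql.DB, rdr array.RecordReader, table string) error {
       const stage = "ADBC_INGEST_STAGE" // hypothetical stage name
       if _, err := db.ExecContext(ctx, "CREATE TEMPORARY STAGE "+stage); err != nil {
           return err
       }

       props := parquet.NewWriterProperties(parquet.WithCompression(compress.Codecs.Snappy))
       for i := 0; rdr.Next(); i++ {
           rec := rdr.Record()
           path := fmt.Sprintf("/tmp/adbc_batch_%d.parquet", i)
           f, err := os.Create(path)
           if err != nil {
               return err
           }
           // Write the Arrow batch straight to Parquet; no per-value Go conversion.
           w, err := pqarrow.NewFileWriter(rec.Schema(), f, props, pqarrow.DefaultWriterProps())
           if err != nil {
               return err
           }
           if err := w.Write(rec); err != nil {
               return err
           }
           if err := w.Close(); err != nil {
               return err
           }
           _ = f.Close() // harmless if the writer already closed the file

           // Every file lands in the same stage rather than one stage per batch.
           put := fmt.Sprintf("PUT file://%s @%s AUTO_COMPRESS=FALSE", path, stage)
           if _, err := db.ExecContext(ctx, put); err != nil {
               return err
           }
       }

       // One COPY loads all the staged files into the target table.
       copySQL := fmt.Sprintf(
           "COPY INTO %s FROM @%s FILE_FORMAT = (TYPE = PARQUET) MATCH_BY_COLUMN_NAME = CASE_INSENSITIVE",
           table, stage)
       _, err := db.ExecContext(ctx, copySQL)
       return err
   }
   ```

   The files could likely also be uploaded concurrently or streamed rather than
written to local disk first, but the shape of the flow is the same: many Parquet
files, one stage, one COPY.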
   
   Open question: Does `adbc_ingest` need to optimize ingestion of small tables
as well? Currently the connector uses a single `INSERT` query, without staging
any files, for very small tables. Using COPY unconditionally _might_ not perform
well in that scenario. Perhaps we can start with COPY in all cases and add better
handling for small tables later if it actually turns out to be a problem.
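
   If small tables do turn out to matter, one option is a simple row-count cutoff
between the existing bound-`INSERT` path and the staged COPY path; a hypothetical
sketch (the threshold is made up and would need benchmarking):

   ```go
   // Hypothetical heuristic: below some row count, keep the connector's bound
   // INSERT path; at or above it, stage Parquet files and COPY. The threshold is
   // a placeholder and would need benchmarking against real workloads.
   const smallIngestRowThreshold = 10_000

   func useStagedCopy(totalRows int64) bool {
       return totalRows >= smallIngestRowThreshold
   }
   ```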

