CurtHagenlocher commented on issue #3480: URL: https://github.com/apache/arrow-adbc/issues/3480#issuecomment-3390548348
Thanks @sfc-gh-pfus! This is very interesting. What the ADBC driver is actually doing is creating the stage with these options:

```
CREATE OR REPLACE TEMPORARY STAGE ADBC$BIND FILE_FORMAT = (TYPE = PARQUET USE_LOGICAL_TYPE = TRUE BINARY_AS_TEXT = FALSE USE_VECTORIZED_SCANNER=TRUE REPLACE_INVALID_CHARACTERS = TRUE)
```

and then loading the data with:

```
COPY INTO IDENTIFIER(?) FROM @ADBC$BIND MATCH_BY_COLUMN_NAME = CASE_INSENSITIVE
```

so the options you're seeing on the `COPY INTO` are coming from the stage definition.

When I removed the `FILE_FORMAT = (...)` from the stage definition, I had to add a `FILE_FORMAT = (TYPE = 'PARQUET')` to the `COPY INTO` for it to work. But once I did, it fixed the performance problem. So it seems possible that one of these options is what's responsible for the difference. I'd previously tried removing `REPLACE_INVALID_CHARACTERS` and found that it didn't have an effect. Should I infer that one of the other `FILE_FORMAT` options is what causes the slowdown?

Incidentally, whatever tool is being used to sanitize the traces doesn't seem to realize that the `$BIND` is part of the stage name, and this could in principle be a problem.
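For reference, the modified statements I tested looked roughly like this (a sketch of the change described above, not the exact SQL the driver emits):

```
-- Stage created without an explicit FILE_FORMAT
CREATE OR REPLACE TEMPORARY STAGE ADBC$BIND;

-- COPY INTO then needs the file format specified directly
COPY INTO IDENTIFIER(?) FROM @ADBC$BIND
  MATCH_BY_COLUMN_NAME = CASE_INSENSITIVE
  FILE_FORMAT = (TYPE = 'PARQUET');
```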

