CurtHagenlocher commented on issue #3480: URL: https://github.com/apache/arrow-adbc/issues/3480#issuecomment-3390548348
Thanks @sfc-gh-pfus! This is very interesting. What the ADBC driver is actually doing is creating the stage with these options:

```
CREATE OR REPLACE TEMPORARY STAGE ADBC$BIND FILE_FORMAT = (TYPE = PARQUET USE_LOGICAL_TYPE = TRUE BINARY_AS_TEXT = FALSE USE_VECTORIZED_SCANNER=TRUE REPLACE_INVALID_CHARACTERS = TRUE)
```

and then loading the data with:

```
COPY INTO IDENTIFIER(?) FROM @ADBC$BIND MATCH_BY_COLUMN_NAME = CASE_INSENSITIVE
```

so the options you're seeing on the `COPY INTO` are coming from the stage definition.

When I removed the `FILE_FORMAT = (...)` from the stage definition, I had to add a `FILE_FORMAT = (TYPE = 'PARQUET')` to the `COPY INTO` for it to work. But once I did, it fixed the performance problem. So it seems possible that one of these options is what's responsible for the difference. I'd previously tried removing `REPLACE_INVALID_CHARACTERS` and found that it didn't have an effect. Should I infer that one of the other `FILE_FORMAT` options is what causes the slowdown?

Incidentally, whatever tool is being used to sanitize the traces doesn't seem to realize that the `$BIND` is part of the stage name, and this could in principle be a problem.
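For reference, the modified statements I tested looked roughly like this (a sketch of the change described above, not the exact SQL the driver emits):

```
-- Stage created without an explicit FILE_FORMAT
CREATE OR REPLACE TEMPORARY STAGE ADBC$BIND;

-- COPY INTO then needs the file format specified directly
COPY INTO IDENTIFIER(?) FROM @ADBC$BIND
  MATCH_BY_COLUMN_NAME = CASE_INSENSITIVE
  FILE_FORMAT = (TYPE = 'PARQUET');
```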

