lidavidm commented on pull request #9685: URL: https://github.com/apache/arrow/pull/9685#issuecomment-799453681
> > It does not autodetect the type of compression (but perhaps this could be added, by inspecting FileSource).
>
> Small note here: the python API for reading plain CSV files (using `pyarrow.csv`) automatically detects compressed files and doesn't have an explicit option for that. So _ideally_, the dataset CSV reading would work similarly, I think.
> But AFAIK, the decompressing for `pyarrow.csv` currently happens on the python side (and not in C++)? (i.e. `get_input_stream` in the cython code detects compression)

Yeah, we'd have to implement that on the C++ side as well. It could be tackled in ARROW-8981 as part of the refactoring that Ben suggested above for that issue, too.
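
For reference, the kind of extension-based detection being discussed could look roughly like the sketch below. This is only an illustration using the Python standard library, not the actual `get_input_stream` code in pyarrow and not a proposal for the C++ API; the helper names (`detect_compression`, `open_csv_text`, `read_csv_rows`) and the extension table are hypothetical.

```python
import bz2
import csv
import gzip
import lzma
import os

# Hypothetical mapping from file extension to a stdlib opener.
# A real implementation would cover whatever codecs the library supports.
_OPENERS = {".gz": gzip.open, ".bz2": bz2.open, ".xz": lzma.open}


def detect_compression(path):
    """Return the recognized compression extension (e.g. '.gz'), else None."""
    ext = os.path.splitext(path)[1].lower()
    return ext if ext in _OPENERS else None


def open_csv_text(path):
    """Open `path` as text, decompressing when the extension is recognized."""
    # Fall back to the builtin open() for uncompressed or unknown files.
    opener = _OPENERS.get(detect_compression(path), open)
    return opener(path, mode="rt", newline="", encoding="utf-8")


def read_csv_rows(path):
    """Read all CSV rows from a possibly compressed file."""
    with open_csv_text(path) as f:
        return list(csv.reader(f))
```

With something like this, callers read `data.csv` and `data.csv.gz` through the same entry point and never pass an explicit compression option, which is the behavior described for `pyarrow.csv` above. Detecting from the extension alone is the simplest choice; it will not catch compressed files with unusual names.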