westonpace commented on issue #33790: URL: https://github.com/apache/arrow/issues/33790#issuecomment-1399044202
Outside of datasets, this is normally done by opening a compressed input stream and passing it to the CSV stream reader. If the path ends in `.gz` or `.bz2`, I think we also guess that the file is compressed and do this for you.

Within datasets there are a few undocumented or under-documented features which may help. There is a similar "extension guessing" mechanism: https://github.com/apache/arrow/blob/master/cpp/src/arrow/dataset/file_base.cc#L93

So if your file names end in `gz` or `gzip`, the compression should be picked up automatically. There is also `stream_transform_func`, part of the dataset CSV options, which takes an arbitrary callable that transforms the stream before reading begins. In theory this could maybe be used to support zipped files.