[GitHub] [arrow] westonpace commented on issue #33790: [Python] Support for reading .csv files from a zip archive

via GitHub Fri, 20 Jan 2023 14:53:04 -0800


westonpace commented on issue #33790:
URL: https://github.com/apache/arrow/issues/33790#issuecomment-1399044202


   Outside of datasets, reading a compressed CSV file is normally done by 
opening a compressed input stream and passing it to the CSV stream reader.  If 
the path ends in `.gz` or `.bz2`, I think the reader also guesses that the file 
is compressed and does this for you.
   
   Within datasets there are a few undocumented or under-documented features 
that may help.  There is a similar "extension guessing" mechanism: 
https://github.com/apache/arrow/blob/master/cpp/src/arrow/dataset/file_base.cc#L93
  So if your file names end in `gz` or `gzip`, the compression should be 
picked up automatically.
   
   There is also `stream_transform_func`, part of the dataset CSV options, 
which takes an arbitrary callable that transforms the stream before reading 
starts.  In theory, this could be used to support zipped files.


