Jordan Samuels created ARROW-5974:
-------------------------------------

             Summary: read_csv returns truncated read for some valid gzip files
                 Key: ARROW-5974
                 URL: https://issues.apache.org/jira/browse/ARROW-5974
             Project: Apache Arrow
          Issue Type: Bug
          Components: Python
    Affects Versions: 0.14.0
            Reporter: Jordan Samuels

If two gzipped files are concatenated, the result is a valid gzip file. However, pyarrow.csv.read_csv appears to read only the portion corresponding to the first file.

Running the repro script [here|https://gist.github.com/jordansamuels/d69f1c22c58418f5dfa0785b9ecd211e] produces the following output:

{{$ python repro.py}}
{{pyarrow.csv only reads one row:}}
{{ x}}
{{0 1}}
{{pandas reads two rows:}}
{{ x}}
{{0 1}}
{{1 2}}
{{pyarrow version: 0.14.0}}

--
This message was sent by Atlassian JIRA
(v7.6.14#76016)
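
For illustration, here is a minimal standard-library sketch of the situation described above. It does not use pyarrow and is not the linked repro script. It builds a two-member gzip stream, decodes it once with a reader that handles every member, and once with a single zlib decompressor that stops at the end of the first member. The helper names are hypothetical, and the idea that read_csv truncates for the same reason is only an assumption, not something this report confirms.

```python
import gzip
import zlib
from typing import Tuple


def make_two_member_gzip() -> bytes:
    """Concatenate two independently gzipped payloads into one stream."""
    first = gzip.compress(b"x\n1\n")   # header row plus first data row
    second = gzip.compress(b"2\n")     # second data row, as its own member
    return first + second


def read_all_members(data: bytes) -> bytes:
    """Decode every member; gzip.decompress handles multi-member input."""
    return gzip.decompress(data)


def read_first_member_only(data: bytes) -> Tuple[bytes, bytes]:
    """Decode with one zlib decompressor, which stops after the first member.

    Returns the decoded bytes and the undecoded remainder of the input.
    """
    # wbits = MAX_WBITS | 16 tells zlib to expect a gzip header and trailer.
    decoder = zlib.decompressobj(wbits=zlib.MAX_WBITS | 16)
    decoded = decoder.decompress(data)
    return decoded, decoder.unused_data


if __name__ == "__main__":
    stream = make_two_member_gzip()
    print(read_all_members(stream))            # both rows
    decoded, rest = read_first_member_only(stream)
    print(decoded)                             # first row only
    print(len(rest), "bytes left undecoded")   # the second member
```

A reader that does not restart decoding on the leftover bytes would see only the first row, which matches the symptom reported here.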