Jordan Samuels created ARROW-5974:
-------------------------------------

             Summary: read_csv returns truncated read for some valid gzip files
                 Key: ARROW-5974
                 URL: https://issues.apache.org/jira/browse/ARROW-5974
             Project: Apache Arrow
          Issue Type: Bug
          Components: Python
    Affects Versions: 0.14.0
            Reporter: Jordan Samuels


If two gzipped files are concatenated together, the result is a valid gzip 
file.  However, it appears that pyarrow.csv.read_csv reads only the data 
from the first of the concatenated files and silently drops the rest.
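
For illustration, the multi-member behaviour can be sketched with the Python 
standard library alone. This is not the repro script linked in this report, 
and it makes no claim about where pyarrow's truncation actually occurs; it 
only shows that the symptom is consistent with a reader that stops after the 
first gzip member. The CSV bytes and the helper name read_all_members are 
assumptions made up for this sketch.

```python
import gzip
import zlib

# Two independent gzip members. The CSV contents are illustrative only,
# chosen to match the single column "x" with rows 1 and 2 in the report.
first = gzip.compress(b"x\n1\n")
second = gzip.compress(b"2\n")
combined = first + second  # byte-wise concatenation, like `cat a.gz b.gz`

# The gzip module keeps reading members until the input is exhausted.
full = gzip.decompress(combined)

# A single zlib decompressor in gzip mode stops at the end of the first
# member; the bytes of the second member are left in `unused_data`.
GZIP_WBITS = 16 + zlib.MAX_WBITS
single = zlib.decompressobj(GZIP_WBITS)
partial = single.decompress(combined)
leftover = single.unused_data


def read_all_members(data: bytes) -> bytes:
    """Hypothetical helper: decompress every member by restarting a
    fresh decompressor on whatever the previous one left unread."""
    chunks = []
    while data:
        member = zlib.decompressobj(GZIP_WBITS)
        chunks.append(member.decompress(data))
        chunks.append(member.flush())
        data = member.unused_data
    return b"".join(chunks)
```

Here full and read_all_members(combined) both contain the header and both 
rows, while partial contains only the header and the first row.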

If the repro script 
[here|https://gist.github.com/jordansamuels/d69f1c22c58418f5dfa0785b9ecd211e] 
is run, the output is:

{{$ python repro.py}}
{{pyarrow.csv only reads one row:}}
{{ x}}
{{0 1}}
{{pandas reads two rows:}}
{{ x}}
{{0 1}}
{{1 2}}
{{pyarrow version: 0.14.0}}



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)
