Ultrabug created ARROW-3700:
-------------------------------

             Summary: read_csv behavior with blank lines differs between CSV 
delimiters
                 Key: ARROW-3700
                 URL: https://issues.apache.org/jira/browse/ARROW-3700
             Project: Apache Arrow
          Issue Type: Bug
          Components: Python
            Reporter: Ultrabug
         Attachments: csv_parse_error.zip

This is a copy/paste of the GitHub issue: 
https://github.com/apache/arrow/issues/2883

 

Hi,

I was playing with {{pyarrow.csv}} {{read_csv}} and found a rather strange 
behavior that I'm not sure is normal.

Parsing fails if the delimiter of the CSV file is a comma and there's a 
blank line after the header (see the {{basic_with_blank.csv}} example).

Example output:

Traceback (most recent call last):
  File "sorrow.py", line 14, in <module>
    table = pa_csv.read_csv(csv)
  File "pyarrow/_csv.pyx", line 198, in pyarrow._csv.read_csv
  File "pyarrow/error.pxi", line 81, in pyarrow.lib.check_status
pyarrow.lib.ArrowInvalid: CSV parse error: Expected 2 columns, got 1

If I change the CSV delimiter to semicolon, the error disappears and everything 
is fine!

I'm providing Python code and CSV samples that compare the behavior with 
pandas (which does not suffer from this).
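
For readers without the attachment, a minimal sketch of this kind of reproduction might look like the following. The sample contents, column names, and the helpers {{make_sample}}, {{try_read}} and {{main}} are assumptions made for illustration; they are not the attached script or CSV files. Passing the delimiter explicitly through {{ParseOptions}} is also an assumption about how the files were read.

```python
import io


def make_sample(delimiter: str) -> bytes:
    """Build a two-column CSV with a blank line right after the header.

    The column names and values are made up for illustration; they are
    not the contents of the attached basic_with_blank.csv.
    """
    lines = [
        delimiter.join(["a", "b"]),  # header
        "",                          # the blank line under discussion
        delimiter.join(["1", "2"]),
        delimiter.join(["3", "4"]),
    ]
    return ("\n".join(lines) + "\n").encode("utf-8")


def try_read(delimiter: str) -> str:
    """Parse the sample with pyarrow and report success or the error text."""
    # Imported lazily so the sample builder works without pyarrow installed.
    import pyarrow as pa
    from pyarrow import csv as pa_csv

    # Assumption: the delimiter is passed explicitly to the parser.
    options = pa_csv.ParseOptions(delimiter=delimiter)
    try:
        table = pa_csv.read_csv(io.BytesIO(make_sample(delimiter)),
                                parse_options=options)
    except pa.ArrowInvalid as exc:
        return f"failed: {exc}"
    return f"ok: {table.num_rows} rows, {table.num_columns} columns"


def main() -> None:
    # Compare the comma and semicolon variants side by side.
    for delimiter in (",", ";"):
        print(repr(delimiter), "->", try_read(delimiter))
```

Calling {{main()}} prints one line per delimiter. The outcome depends on the installed pyarrow version, so no expected output is given here.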

Hope this helps, thanks



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
