[ https://issues.apache.org/jira/browse/ARROW-16872?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Yibo Cai updated ARROW-16872:
-----------------------------
    Fix Version/s: 9.0.0

> [Python] open_csv throws ArrowInvalid if csv does not end with a new line and is above 16384 lines
> --------------------------------------------------------------------------------------------------
>
>                 Key: ARROW-16872
>                 URL: https://issues.apache.org/jira/browse/ARROW-16872
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: Python
>    Affects Versions: 7.0.0, 8.0.0
>            Reporter: Frederik Fabritius
>            Assignee: Yibo Cai
>            Priority: Major
>              Labels: csvparser, open_csv
>             Fix For: 9.0.0
>
>
> `pyarrow.csv.open_csv` throws ArrowInvalid if csv does not end with a new
> line and is above 16384 lines. Tested with both pyarrow 7.0.0 and 8.0.0.
> Error seen both in production app and on developer laptop.
>
> Here's a minimal case for reproducing the issue:
> ```python
> import pyarrow as pa
> import pyarrow.csv
> from io import BytesIO
> for _ in pa.csv.open_csv(BytesIO('\n'.join(['review_id,filter_outcome'] +
> ['62593aaec7628b203bad4c6e,fabrication']*16385).encode())): pass
> ```
>
> Error is thrown:
> ArrowInvalid: CSV parse error: Expected 2 columns, got 1:



--
This message was sent by Atlassian Jira
(v8.20.7#820007)
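
For readers who want to probe the report, the one-line reproduction can be split into a payload builder and a streaming step. This is only a sketch: `build_csv` and `count_streamed_rows` are hypothetical helper names, not part of pyarrow, and the trailing-newline variant is added here as a control case. The report implies that a file ending in a newline parses, but does not show it.

```python
from io import BytesIO

# Values copied from the reproduction in the report.
HEADER = "review_id,filter_outcome"
ROW = "62593aaec7628b203bad4c6e,fabrication"


def build_csv(n_rows: int, trailing_newline: bool) -> bytes:
    """Build a two-column CSV payload: one header line plus n_rows data lines."""
    text = "\n".join([HEADER] + [ROW] * n_rows)
    if trailing_newline:
        text += "\n"
    return text.encode()


def count_streamed_rows(payload: bytes) -> int:
    """Stream the payload through pyarrow.csv.open_csv and count the rows read.

    pyarrow is imported here so the builder above works without it installed.
    Per the report, pyarrow 7.0.0 and 8.0.0 raise ArrowInvalid when the payload
    has more than 16384 lines and no trailing newline.
    """
    import pyarrow.csv as pacsv

    return sum(batch.num_rows for batch in pacsv.open_csv(BytesIO(payload)))


# Same shape as the report: 16385 data rows, last line not newline-terminated.
failing = build_csv(16385, trailing_newline=False)
# Control case (an assumption of this sketch): identical data, newline-terminated.
control = build_csv(16385, trailing_newline=True)
```

Calling `count_streamed_rows(failing)` and `count_streamed_rows(control)` on an affected version and on 9.0.0 should show whether the trailing newline is the only difference that matters.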