[ https://issues.apache.org/jira/browse/ARROW-6003?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16932190#comment-16932190 ]
Antoine Pitrou commented on ARROW-6003:
---------------------------------------
Here is an example in Python:
{code:python}
>>> import io
>>> from pyarrow import csv
>>> s = b"""a,b,c\n1,2,3\n"""
>>> csv.read_csv(io.BytesIO(s))
pyarrow.Table
a: int64
b: int64
c: int64
>>> options = csv.ReadOptions(column_names=['a', 'b'])
>>> csv.read_csv(io.BytesIO(s), read_options=options)
Traceback (most recent call last):
  File "<ipython-input-9-d1311f31bd36>", line 1, in <module>
    csv.read_csv(io.BytesIO(s), read_options=options)
  File "pyarrow/_csv.pyx", line 541, in pyarrow._csv.read_csv
    check_status(reader.get().Read(&table))
  File "pyarrow/error.pxi", line 78, in pyarrow.lib.check_status
    raise ArrowInvalid(message)
ArrowInvalid: CSV parse error: Expected 2 columns, got 3
{code}
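As a rough illustration of the kind of specific message the issue asks for, a caller could pre-check the column count before handing the data to the reader. This is only a sketch using the standard library {{csv}} module: {{check_column_names}} is a hypothetical helper, not a pyarrow API, and it assumes UTF-8 input, the default comma delimiter, and that the first row is compared against the supplied names.

```python
import csv as stdlib_csv
import io
from typing import Sequence


def check_column_names(data: bytes, column_names: Sequence[str]) -> None:
    """Raise ValueError if column_names does not match the first row's width.

    Hypothetical helper for illustration only; not part of pyarrow.
    Assumes UTF-8 input and a comma delimiter.
    """
    # Parse only the first record with the standard library reader.
    rows = stdlib_csv.reader(io.StringIO(data.decode("utf-8")))
    first_row = next(rows, None)
    if first_row is None:
        raise ValueError("CSV input is empty")
    if len(first_row) != len(column_names):
        raise ValueError(
            f"column_names has {len(column_names)} entries "
            f"but the first row has {len(first_row)} fields"
        )
```

Calling {{check_column_names(s, ['a', 'b'])}} with the {{s}} above would raise a {{ValueError}} naming both counts, which is closer to what a user needs in order to fix the input.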
> [C++] Better input validation and error messaging in CSV reader
> ---------------------------------------------------------------
>
> Key: ARROW-6003
> URL: https://issues.apache.org/jira/browse/ARROW-6003
> Project: Apache Arrow
> Issue Type: Improvement
> Components: C++
> Reporter: Neal Richardson
> Assignee: Neal Richardson
> Priority: Major
> Labels: csv
>
> Followup to https://issues.apache.org/jira/browse/ARROW-5747. The error
> message(s) are not great when you give bad input. For example, if I give too
> many or too few {{column_names}}, the error I get is {{Invalid: Empty CSV
> file}}. In fact, that's about the only error message I've seen from the CSV
> reader, no matter what I've thrown at it.
> It would be better if error messages were more specific so that I as a user
> might know how to fix my bad input.