[ https://issues.apache.org/jira/browse/ARROW-5195?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17662218#comment-17662218 ]
Rok Mihevc commented on ARROW-5195:
-----------------------------------

This issue has been migrated to [issue #21671|https://github.com/apache/arrow/issues/21671] on GitHub. Please see the [migration documentation|https://github.com/apache/arrow/issues/14542] for further details.

> [Python] read_csv ignores null_values on string types
> -----------------------------------------------------
>
>                 Key: ARROW-5195
>                 URL: https://issues.apache.org/jira/browse/ARROW-5195
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: C++, Python
>    Affects Versions: 0.13.0
>         Environment: Python 3.6, PyArrow 0.13.0, AWS linux, debian-slim in docker
>            Reporter: Scott Burns
>            Assignee: Antoine Pitrou
>            Priority: Minor
>              Labels: pull-request-available
>             Fix For: 0.14.0
>
>          Time Spent: 1h
>  Remaining Estimate: 0h
>
> Let's write a simple CSV with NULL values in a string column:
> {quote}from pyarrow import csv
> with open('foo.csv', 'w') as fobj:
>     fobj.write('col1,col2\n1,value\n2,NULL')
> table = csv.read_csv('foo.csv')
> table.column('col2').null_count # => 0
> {quote}
>
> table.column('col2').null_count will be 0; I think it should be 1. Passing in {{ConvertOptions(null_values=["NULL"])}} doesn't help.
>
> Note that {{pandas.read_csv}} parses these NULLs correctly, so I have a workaround available.
> But I'd prefer to natively read CSV from pyarrow if possible :)

--
This message was sent by Atlassian Jira
(v8.20.10#820010)