[ 
https://issues.apache.org/jira/browse/ARROW-12539?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephen Bias updated ARROW-12539:
---------------------------------
    Description: 
Importing CSV data with dates in the format "%d-%b-%y" or "%d-%b-%Y" fails
with a conversion error:

{{pyarrow.lib.ArrowInvalid: In CSV column #1: CSV conversion error to date64[ms]: invalid value '15-JAN-16'}}

Example:


{{import pyarrow as pa}}
{{from pyarrow import csv}}

{{data = b"a,b\n1,15-OCT-15\n2,18-JUN-90\n"}}
{{tp = ["%d-%b-%y"]}}

{{try:}}
{{    schema_d64 = pa.schema([pa.field("a", pa.int64()), pa.field("b", pa.date64())])}}
{{    co_d64 = csv.ConvertOptions(timestamp_parsers=tp, column_types=schema_d64)}}
{{    a_d64 = csv.read_csv(pa.py_buffer(data), convert_options=co_d64)}}
{{except Exception as e:}}
{{    print(e)}}

{{try:}}
{{    schema_d32 = pa.schema([pa.field("a", pa.int64()), pa.field("b", pa.date32())])}}
{{    co_d32 = csv.ConvertOptions(timestamp_parsers=tp, column_types=schema_d32)}}
{{    a_d32 = csv.read_csv(pa.py_buffer(data), convert_options=co_d32)}}
{{except Exception as e:}}
{{    print(e)}}

  was:
Importing CSV data with dates in the format `%d-%b-%y` or `%d-%b-%Y` fails
with a conversion error:

`pyarrow.lib.ArrowInvalid: In CSV column #1: CSV conversion error to date64[ms]: invalid value '15-JAN-16'`

Example:

```
import pyarrow as pa
from pyarrow import csv

data = b"a,b\n1,15-OCT-15\n2,18-JUN-90\n"
tp = ["%d-%b-%y"]

try:
    schema_d64 = pa.schema([pa.field("a", pa.int64()), pa.field("b", pa.date64())])
    co_d64 = csv.ConvertOptions(timestamp_parsers=tp, column_types=schema_d64)
    a_d64 = csv.read_csv(pa.py_buffer(data), convert_options=co_d64)
except Exception as e:
    print(e)

try:
    schema_d32 = pa.schema([pa.field("a", pa.int64()), pa.field("b", pa.date32())])
    co_d32 = csv.ConvertOptions(timestamp_parsers=tp, column_types=schema_d32)
    a_d32 = csv.read_csv(pa.py_buffer(data), convert_options=co_d32)
except Exception as e:
    print(e)
```


> Unable to read date64 or date32 in specific format
> --------------------------------------------------
>
>                 Key: ARROW-12539
>                 URL: https://issues.apache.org/jira/browse/ARROW-12539
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: Python
>    Affects Versions: 3.0.0
>            Reporter: Stephen Bias
>            Priority: Major
>
> Importing CSV data with dates in the format "%d-%b-%y" or "%d-%b-%Y" fails
> with a conversion error:
> {{pyarrow.lib.ArrowInvalid: In CSV column #1: CSV conversion error to date64[ms]: invalid value '15-JAN-16'}}
>
> Example:
> {{import pyarrow as pa}}
> {{from pyarrow import csv}}
> {{data = b"a,b\n1,15-OCT-15\n2,18-JUN-90\n"}}
> {{tp = ["%d-%b-%y"]}}
> {{try:}}
> {{    schema_d64 = pa.schema([pa.field("a", pa.int64()), pa.field("b", pa.date64())])}}
> {{    co_d64 = csv.ConvertOptions(timestamp_parsers=tp, column_types=schema_d64)}}
> {{    a_d64 = csv.read_csv(pa.py_buffer(data), convert_options=co_d64)}}
> {{except Exception as e:}}
> {{    print(e)}}
> {{try:}}
> {{    schema_d32 = pa.schema([pa.field("a", pa.int64()), pa.field("b", pa.date32())])}}
> {{    co_d32 = csv.ConvertOptions(timestamp_parsers=tp, column_types=schema_d32)}}
> {{    a_d32 = csv.read_csv(pa.py_buffer(data), convert_options=co_d32)}}
> {{except Exception as e:}}
> {{    print(e)}}
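
One possible way to sidestep the error reported above, sketched here under assumptions rather than taken from the issue: rewrite the date column to ISO 8601 with the Python standard library before handing the bytes to the reader. The helper name `normalize_dates` is hypothetical, and the sketch assumes an English (or C) locale so that `%b` matches abbreviations such as OCT.

```python
import csv
import io
from datetime import datetime


def normalize_dates(raw: bytes, column: str, fmt: str = "%d-%b-%y") -> bytes:
    """Rewrite one CSV column from `fmt` to ISO 8601 (YYYY-MM-DD).

    Hypothetical helper for illustration; not part of pyarrow.
    """
    reader = csv.DictReader(io.StringIO(raw.decode("utf-8")))
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=reader.fieldnames, lineterminator="\n")
    writer.writeheader()
    for row in reader:
        # strptime matches month abbreviations case-insensitively,
        # so "OCT" and "JUN" parse (assuming an English/C locale).
        row[column] = datetime.strptime(row[column], fmt).date().isoformat()
        writer.writerow(row)
    return out.getvalue().encode("utf-8")


# Same sample data as in the report.
data = b"a,b\n1,15-OCT-15\n2,18-JUN-90\n"
iso_data = normalize_dates(data, "b")
print(iso_data)  # b'a,b\n1,2015-10-15\n2,1990-06-18\n'
```

The normalized bytes could then be passed to `csv.read_csv(pa.py_buffer(iso_data), ...)` with the `date32` or `date64` schema from the example, on the assumption that the reader accepts ISO 8601 strings for date columns. Note that the standard-library `csv` module shadows `from pyarrow import csv`, so one of the two would need an alias.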



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
