[ https://issues.apache.org/jira/browse/ARROW-9561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17185784#comment-17185784 ]
Frank Smith commented on ARROW-9561:
------------------------------------
Oh, nice, that's great news. I see this functionality was added to the C++
source code, but it doesn't seem to be available in the Python wheel I'm using
(pyarrow 1.0.1 from PyPI):
{code:java}
(venv) frank@frank-devel:~/arrow/data$ python3
Python 3.6.9 (default, Jul 17 2020, 12:50:27)
[GCC 8.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import pyarrow as pa
>>> from pyarrow import csv
>>> s = b"""t\n2018-11-13T17:11:10.777000"""
>>> convert_options = csv.ConvertOptions(column_types={'t': pa.timestamp('us')})
>>> table = csv.read_csv(pa.py_buffer(s), convert_options=convert_options)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "pyarrow/_csv.pyx", line 714, in pyarrow._csv.read_csv
  File "pyarrow/error.pxi", line 122, in pyarrow.lib.pyarrow_internal_check_status
  File "pyarrow/error.pxi", line 84, in pyarrow.lib.check_status
pyarrow.lib.ArrowInvalid: In CSV column #0: CSV conversion error to timestamp[us]: invalid value '2018-11-13T17:11:10.777000'
>>>
{code}
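Until a wheel with the fix is available, the fractional-second values can be parsed on the Python side. A minimal sketch, assuming the column is first read as plain strings; `parse_fractional_timestamp_us` is a hypothetical helper name (not a pyarrow API), and it treats the naive timestamps as UTC:

```python
from datetime import datetime, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def parse_fractional_timestamp_us(text):
    """Hypothetical helper: return microseconds since the Unix epoch.

    Assumes the 'YYYY-MM-DDTHH:MM:SS.ffffff' layout from the example above
    and interprets the naive value as UTC.
    """
    # Unlike C strptime, Python's datetime.strptime supports %f
    # (1 to 6 digits of fractional seconds).
    parsed = datetime.strptime(text, "%Y-%m-%dT%H:%M:%S.%f")
    delta = parsed.replace(tzinfo=timezone.utc) - EPOCH
    # Integer arithmetic avoids floating-point rounding of the microseconds.
    return (delta.days * 86400 + delta.seconds) * 1_000_000 + delta.microseconds


values = [parse_fractional_timestamp_us(v) for v in ["2018-11-13T17:11:10.777000"]]
```

The resulting integers should be usable as raw microsecond values, e.g. via `pa.array(values, type=pa.timestamp('us'))`, though that step is untested against this wheel.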
These are the wheels I'm using:
{code:java}
(venv) frank@frank-devel:~/arrow/data$ pip3 install pyarrow
Collecting pyarrow
  Using cached https://files.pythonhosted.org/packages/3a/9b/887d1d03d3d43706dee3a71cdad9f9bbb8fe74fc93d8db5d663f5bf34e48/pyarrow-1.0.1-cp36-cp36m-manylinux1_x86_64.whl
Collecting numpy>=1.14 (from pyarrow)
  Using cached https://files.pythonhosted.org/packages/22/e7/4b2bdddb99f5f631d8c1de259897c2b7d65dcfcc1e0a6fd17a7f62923500/numpy-1.19.1-cp36-cp36m-manylinux1_x86_64.whl
Installing collected packages: numpy, pyarrow
Successfully installed numpy-1.19.1 pyarrow-1.0.1
{code}
Also, these are the Arrow libraries included in the wheel. I see the .100
suffix in the names... does that mean that the wheel is using the (older) 1.0.0
versions of the libs?
{code:java}
(venv) frank@frank-devel:~/arrow/data$ find venv/ -name '*.so.*'
venv/lib/python3.6/site-packages/pyarrow/libplasma.so.100
venv/lib/python3.6/site-packages/pyarrow/libarrow_flight.so.100
venv/lib/python3.6/site-packages/pyarrow/libarrow_python.so.100
venv/lib/python3.6/site-packages/pyarrow/libarrow_python_flight.so.100
venv/lib/python3.6/site-packages/pyarrow/libarrow_boost_regex.so.1.73.0
venv/lib/python3.6/site-packages/pyarrow/libarrow.so.100
venv/lib/python3.6/site-packages/pyarrow/libparquet.so.100
venv/lib/python3.6/site-packages/pyarrow/libarrow_dataset.so.100
{code}
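One way to read those suffixes, as a sketch only: it assumes the shared-library suffix is derived as major * 100 + minor, which is an assumption about Arrow's build setup and not something confirmed in this thread. `decode_so_suffix` is a hypothetical helper, not part of pyarrow.

```python
import re


def decode_so_suffix(filename):
    """Hypothetical helper: map 'libarrow.so.100' to (major, minor).

    Assumes the suffix is major * 100 + minor. Returns None for names
    that do not end in a single integer suffix (e.g. the bundled Boost).
    """
    match = re.search(r"\.so\.(\d+)$", filename)
    if match is None:
        return None
    # divmod splits the integer suffix back into (major, minor).
    return divmod(int(match.group(1)), 100)


names = [
    "libarrow.so.100",
    "libparquet.so.100",
    "libarrow_boost_regex.so.1.73.0",
]
decoded = {name: decode_so_suffix(name) for name in names}
```

Under that assumption the suffix carries no patch component, so `.100` would be the same for 1.0.0 and 1.0.1 and cannot by itself tell the two apart.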
> [C++] CSV parse fractional seconds in timestamps
> ------------------------------------------------
>
> Key: ARROW-9561
> URL: https://issues.apache.org/jira/browse/ARROW-9561
> Project: Apache Arrow
> Issue Type: Improvement
> Components: C++, Python
> Reporter: Frank Smith
> Priority: Minor
> Fix For: 2.0.0
>
>
> It would be great to be able to parse fractional seconds from timestamps in
> CSV files, e.g. 2017-06-26 16:58:20.651901
> strptime does not have a format specifier for fractional seconds, and the
> built-in ISO8601 parser does not parse fractional seconds.