[ https://issues.apache.org/jira/browse/ARROW-10523?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17260878#comment-17260878 ]

Lance Dacey commented on ARROW-10523:
-------------------------------------

I noticed that even explicitly specifying unit="ns" did not work when using 
write_to_dataset() with the legacy dataset.

I would print table.schema right before saving the dataset to Azure Blob (it 
showed "ns"), but when I read dataset.schema back afterwards the unit was 
"us". In the end, I explicitly wrote the data with unit="us" and also added 
the coerce_timestamps="us" write option.
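For reference, a minimal sketch of that workaround (the column names, example 
data, and dataset path are illustrative assumptions, not from my actual 
pipeline):

{code:python}
import pandas as pd
import pyarrow as pa
import pyarrow.parquet as pq

# Illustrative data; the column names and root path are made up.
df = pd.DataFrame({
    "event_time": pd.to_datetime(["2020-01-01 12:34:56"]),
    "part": ["a"],
})

# Cast the timestamp column to microsecond precision up front...
schema = pa.schema([
    ("event_time", pa.timestamp("us")),
    ("part", pa.string()),
])
table = pa.Table.from_pandas(df, schema=schema, preserve_index=False)

# ...and also coerce timestamps on write, so the unit read back from the
# dataset matches the unit that was written.
pq.write_to_dataset(
    table,
    root_path="dataset_root",
    partition_cols=["part"],
    coerce_timestamps="us",
)
{code}

With both the explicit schema and coerce_timestamps="us", the schema printed 
before the write and the schema read back from the dataset agree.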

> [Python] Pandas timestamps are inferred to have only microsecond precision
> --------------------------------------------------------------------------
>
>                 Key: ARROW-10523
>                 URL: https://issues.apache.org/jira/browse/ARROW-10523
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: Python
>    Affects Versions: 2.0.0
>            Reporter: David Li
>            Priority: Minor
>
> {code:python}
> import pyarrow as pa
> import pandas as pd
> arr = pa.array([pd.Timestamp(year=2020, month=1, day=1, nanosecond=999)])
> print(arr)
> print(arr.type)
> {code}
> This gives:
> {noformat}
> [
>   2020-01-01 00:00:00.000000
> ]
> timestamp[us]
> {noformat}
> However, Pandas Timestamps have nanosecond precision, which would be nice to 
> preserve in inference.
> The reason is that TypeInferrer [hardcodes 
> microseconds|https://github.com/apache/arrow/blob/apache-arrow-2.0.0/cpp/src/arrow/python/inference.cc#L466]
>  as it only knows about the standard library datetime, so I'm treating this 
> as a feature request and not quite a bug. Of course, this can be worked 
> around easily by specifying an explicit type.
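> For example, a minimal sketch of that explicit-type workaround 
> (pa.timestamp("ns") is standard PyArrow; the rest mirrors the snippet above):
> {code:python}
> import pandas as pd
> import pyarrow as pa
>
> # Passing an explicit type preserves the nanosecond component instead of
> # relying on inference, which currently falls back to microseconds.
> arr = pa.array(
>     [pd.Timestamp(year=2020, month=1, day=1, nanosecond=999)],
>     type=pa.timestamp("ns"),
> )
> print(arr)       # 2020-01-01 00:00:00.000000999
> print(arr.type)  # timestamp[ns]
> {code}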


