[ https://issues.apache.org/jira/browse/ARROW-10523?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17260878#comment-17260878 ]
Lance Dacey commented on ARROW-10523:
-------------------------------------

I noticed that even explicitly using unit="ns" did not work when calling write_to_dataset() with the legacy dataset. I printed table.schema right before saving the dataset to Azure Blob (it showed "ns"), but when I read dataset.schema afterwards the unit was "us". In the end, I explicitly wrote the data using unit="us" and also added the coerce_timestamps="us" write option.

> [Python] Pandas timestamps are inferred to have only microsecond precision
> --------------------------------------------------------------------------
>
>                 Key: ARROW-10523
>                 URL: https://issues.apache.org/jira/browse/ARROW-10523
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: Python
>    Affects Versions: 2.0.0
>            Reporter: David Li
>            Priority: Minor
>
> {code:java}
> import pyarrow as pa
> import pandas as pd
> arr = pa.array([pd.Timestamp(year=2020, month=1, day=1, nanosecond=999)])
> print(arr)
> print(arr.type) {code}
> This gives:
> {noformat}
> [
>   2020-01-01 00:00:00.000000
> ]
> timestamp[us]
> {noformat}
> However, Pandas Timestamps have nanosecond precision, which would be nice to
> preserve in inference.
> The reason is that TypeInferrer [hardcodes
> microseconds|https://github.com/apache/arrow/blob/apache-arrow-2.0.0/cpp/src/arrow/python/inference.cc#L466]
> as it only knows about the standard library datetime, so I'm treating this
> as a feature request and not quite a bug. Of course, this can be worked
> around easily by specifying an explicit type.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)