[ https://issues.apache.org/jira/browse/ARROW-2429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16439892#comment-16439892 ]
Joshua Storck commented on ARROW-2429:
--------------------------------------

If you invoke the write_table function as follows, the type will not change:

{code:python}
pq.write_table(table, 'foo.parquet', use_deprecated_int96_timestamps=True)
{code}

> [Python] Timestamp unit in schema changes when writing to Parquet file then reading back
> ----------------------------------------------------------------------------------------
>
> Key: ARROW-2429
> URL: https://issues.apache.org/jira/browse/ARROW-2429
> Project: Apache Arrow
> Issue Type: Bug
> Components: Python
> Affects Versions: 0.9.0
> Environment: Mac OS High Sierra
> PyArrow 0.9.0 (py36_1)
> Python
> Reporter: Dave Challis
> Priority: Minor
>
> When creating an Arrow table from a Pandas DataFrame, the table schema contains a field of type `timestamp[ns]`.
> When serialising that table to a Parquet file and then immediately reading it back, the schema of the table read back instead contains a field of type `timestamp[us]`.
> Minimal example:
>
> {code:python}
> #!/usr/bin/env python
> import pyarrow as pa
> import pyarrow.parquet as pq
> import pandas as pd
>
> # create DataFrame with a datetime column
> df = pd.DataFrame({'created': ['2018-04-04T10:14:14Z']})
> df['created'] = pd.to_datetime(df['created'])
>
> # create Arrow table from DataFrame
> table = pa.Table.from_pandas(df, preserve_index=False)
>
> # write the table as a parquet file, then read it back again
> pq.write_table(table, 'foo.parquet')
> table2 = pq.read_table('foo.parquet')
>
> print(table.schema[0])   # pyarrow.Field<created: timestamp[ns]> (nanosecond units)
> print(table2.schema[0])  # pyarrow.Field<created: timestamp[us]> (microsecond units)
> {code}

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)