[ https://issues.apache.org/jira/browse/ARROW-5878?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Wes McKinney reassigned ARROW-5878:
-----------------------------------

    Assignee: Benjamin Kietzman

> [Python][C++] Parquet reader not forward compatible for timestamps without timezone
> -----------------------------------------------------------------------------------
>
>                 Key: ARROW-5878
>                 URL: https://issues.apache.org/jira/browse/ARROW-5878
>             Project: Apache Arrow
>          Issue Type: Bug
>    Affects Versions: 0.14.0
>            Reporter: Florian Jetter
>            Assignee: Benjamin Kietzman
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 1.0.0, 0.14.1
>
>         Attachments: timezones_pyarrow_14.paquet
>
>          Time Spent: 1h
>  Remaining Estimate: 0h
>
> Timestamps without a timezone that are written by pyarrow 0.14.0 can no
> longer be read as timestamps by earlier versions. With pyarrow 0.13.0, the
> column is read as an integer.
> Looking at the Parquet schemas, it seems that the older versions do not
> understand the logical type; see below.
> h4. File generation with pyarrow 0.14.0
> {code:java}
> import datetime
> import pyarrow as pa
> import pyarrow.parquet as pq
> import pandas as pd
> df = pd.DataFrame(
>     {
>         "datetime64": pd.Series(["2018-01-01"], dtype="datetime64[ns]"),
>         "datetime64_ts": pd.Series(
>             [pd.Timestamp(datetime.datetime(2018, 1, 1), tz="Europe/Berlin")],
>             dtype="datetime64[ns]",
>         ),
>     }
> )
> pq.write_table(pa.Table.from_pandas(df), "timezones_pyarrow_14.paquet")
> {code}
> h4. Reading with pyarrow 0.13.0
> {code:java}
> In [1]: import pyarrow.parquet as pq
> In [2]: import pyarrow as pa
> In [3]: with open("timezones_pyarrow_14.paquet", "rb") as fd:
>    ...:     table = pq.read_pandas(fd)
>    ...:
> In [4]: table.to_pandas()
> Out[4]:
>          datetime64              datetime64_ts
> 0  1514764800000000  2018-01-01 00:00:00+01:00
> In [5]: table.to_pandas().dtypes
> Out[5]:
> datetime64                               int64
> datetime64_ts    datetime64[ns, Europe/Berlin]
> dtype: object
> {code}
> h3. Parquet schema as seen by pyarrow versions:
> pyarrow 0.13.0 parquet schema
> {code:java}
> datetime64: INT64
> datetime64_ts: INT64 TIMESTAMP_MICROS
> {code}
> pyarrow 0.14.0 parquet schema
> {code:java}
> datetime64: INT64 Timestamp(isAdjustedToUTC=false, timeUnit=microseconds)
> datetime64_ts: INT64 Timestamp(isAdjustedToUTC=true, timeUnit=microseconds)
> {code}
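
For illustration only: the integer that pyarrow 0.13.0 returns for the `datetime64` column appears to be the raw INT64 storage value, which, given `timeUnit=microseconds` in the 0.14.0 schema, would be microseconds since the Unix epoch. The sketch below decodes that value with the Python standard library. The helper name `micros_to_naive_datetime` is hypothetical and not part of pyarrow, and the epoch-microseconds interpretation is an assumption based on the schema listing, not something the issue states explicitly.

```python
import datetime

# Assumption: the INT64 value counts microseconds since 1970-01-01,
# matching timeUnit=microseconds in the 0.14.0 schema listing.
EPOCH = datetime.datetime(1970, 1, 1)


def micros_to_naive_datetime(value):
    """Hypothetical helper: decode epoch microseconds to a naive datetime."""
    return EPOCH + datetime.timedelta(microseconds=value)


# Raw value shown for the "datetime64" column when read with pyarrow 0.13.0.
raw = 1514764800000000
print(micros_to_naive_datetime(raw))  # 2018-01-01 00:00:00
```

The decoded value matches the `2018-01-01` written in the generation script, which is consistent with the data being intact and only the logical type annotation going unrecognized by the older reader.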