[ https://issues.apache.org/jira/browse/ARROW-5878?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wes McKinney updated ARROW-5878:
--------------------------------
    Fix Version/s: 0.14.1

> [Python][C++] Parquet reader not forward compatible for timestamps without timezone
> -----------------------------------------------------------------------------------
>
>                 Key: ARROW-5878
>                 URL: https://issues.apache.org/jira/browse/ARROW-5878
>             Project: Apache Arrow
>          Issue Type: Bug
>    Affects Versions: 0.14.0
>            Reporter: Florian Jetter
>            Priority: Major
>             Fix For: 1.0.0, 0.14.1
>
>         Attachments: timezones_pyarrow_14.paquet
>
>
> Timestamps without a timezone that are written by pyarrow 0.14.0 can no
> longer be read as timestamps by earlier versions: when read with pyarrow
> 0.13.0, the timestamp column comes back as an integer.
> Looking at the Parquet schemas, it seems that the older versions cannot
> understand the new logical type (see below).
> h4. File generation with pyarrow 0.14.0
> {code:python}
> import datetime
>
> import pandas as pd
> import pyarrow as pa
> import pyarrow.parquet as pq
>
> df = pd.DataFrame(
>     {
>         # Timezone-naive column (stored with isAdjustedToUTC=false)
>         "datetime64": pd.Series(["2018-01-01"], dtype="datetime64[ns]"),
>         # Timezone-aware column (stored with isAdjustedToUTC=true)
>         "datetime64_ts": pd.Series(
>             [pd.Timestamp(datetime.datetime(2018, 1, 1), tz="Europe/Berlin")],
>             dtype="datetime64[ns, Europe/Berlin]",
>         ),
>     }
> )
> pq.write_table(pa.Table.from_pandas(df), "timezones_pyarrow_14.paquet")
> {code}
> h4. Reading with pyarrow 0.13.0
> {code:python}
> In [1]: import pyarrow.parquet as pq
> In [2]: import pyarrow as pa
> In [3]: with open("timezones_pyarrow_14.paquet", "rb") as fd:
>    ...:     table = pq.read_pandas(fd)
>    ...:
> In [4]: table.to_pandas()
> Out[4]:
>          datetime64             datetime64_ts
> 0  1514764800000000 2018-01-01 00:00:00+01:00
> In [5]: table.to_pandas().dtypes
> Out[5]:
> datetime64                               int64
> datetime64_ts    datetime64[ns, Europe/Berlin]
> dtype: object
> {code}
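The integer that pyarrow 0.13.0 returns is not garbage: the 0.14.0 schema declares timeUnit=microseconds, and 1514764800000000 is exactly 2018-01-01 00:00:00 expressed as microseconds since the Unix epoch. As a stopgap on the old reader, the value can be reinterpreted by hand. The sketch below uses only the standard library; the function name micros_to_naive_datetime is hypothetical. On a whole pandas column, pd.to_datetime(series, unit="us") should perform the same conversion.

{code:python}
from datetime import datetime, timedelta

# Raw value pyarrow 0.13.0 returned for the naive "datetime64" column.
# Assumption: per the 0.14.0 schema (timeUnit=microseconds), it counts
# microseconds since the Unix epoch.
RAW_MICROS = 1514764800000000

EPOCH = datetime(1970, 1, 1)


def micros_to_naive_datetime(value: int) -> datetime:
    """Hypothetical helper: reinterpret an INT64 microsecond count as a
    naive (wall-clock) datetime. The column was written with
    isAdjustedToUTC=false, so no timezone is attached."""
    return EPOCH + timedelta(microseconds=value)


if __name__ == "__main__":
    print(micros_to_naive_datetime(RAW_MICROS))  # 2018-01-01 00:00:00
{code}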
> h3. Parquet schema of the same file as seen by each pyarrow version
> pyarrow 0.13.0 parquet schema
> {code:java}
> datetime64: INT64
> datetime64_ts: INT64 TIMESTAMP_MICROS
> {code}
> pyarrow 0.14.0 parquet schema
> {code:java}
> datetime64: INT64 Timestamp(isAdjustedToUTC=false, timeUnit=microseconds)
> datetime64_ts: INT64 Timestamp(isAdjustedToUTC=true, timeUnit=microseconds)
> {code}
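The two schema listings suggest a simple rule for what an older reader sees: only the UTC-adjusted column also carries a legacy ConvertedType annotation (TIMESTAMP_MICROS) that 0.13.0 understands, while the naive column carries only the new Timestamp logical type and therefore degrades to a bare INT64. The sketch below is a hypothetical model of that observation, not pyarrow's or parquet-cpp's actual code; the names TimestampLogicalType and legacy_view are invented for illustration, and it assumes (as we understand the Parquet format specification of that time) that the TIMESTAMP_MILLIS/TIMESTAMP_MICROS converted types describe UTC-adjusted timestamps only.

{code:python}
from dataclasses import dataclass


@dataclass(frozen=True)
class TimestampLogicalType:
    """Hypothetical stand-in for Parquet's Timestamp logical type."""
    is_adjusted_to_utc: bool
    unit: str  # "millis", "micros" or "nanos"


# Legacy ConvertedType names a pre-LogicalType reader can understand.
# Assumption: these only describe UTC-adjusted timestamps.
_LEGACY_CONVERTED = {"millis": "TIMESTAMP_MILLIS", "micros": "TIMESTAMP_MICROS"}


def legacy_view(physical: str, logical: TimestampLogicalType) -> str:
    """Return what a ConvertedType-only reader would display for a column
    (simplified model of the schemas above)."""
    if logical.is_adjusted_to_utc and logical.unit in _LEGACY_CONVERTED:
        return f"{physical} {_LEGACY_CONVERTED[logical.unit]}"
    # No legacy annotation: the old reader only sees the physical type.
    return physical


if __name__ == "__main__":
    columns = {
        "datetime64": TimestampLogicalType(is_adjusted_to_utc=False, unit="micros"),
        "datetime64_ts": TimestampLogicalType(is_adjusted_to_utc=True, unit="micros"),
    }
    for name, logical in columns.items():
        print(f"{name}: {legacy_view('INT64', logical)}")
{code}

Running it prints the same two lines as the pyarrow 0.13.0 schema listing, which is consistent with the reporter's reading that the naive column's new logical type is simply invisible to the older reader.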



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
