Florian Jetter created ARROW-5888:
-------------------------------------

             Summary: [Python][C++] Parquet write metadata not roundtrip safe for timezone timestamps
                 Key: ARROW-5888
                 URL: https://issues.apache.org/jira/browse/ARROW-5888
             Project: Apache Arrow
          Issue Type: Bug
            Reporter: Florian Jetter


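The timezone of a timestamp field is not roundtrip safe for timezones other
than UTC when the schema is stored to Parquet. The expected behavior is that
the original timezone is reconstructed on read. Instead, a non-UTC timezone
such as Europe/Berlin comes back as UTC.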

{code:python}
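import pyarrow as pa
import pyarrow.parquet as pq

# One field per case: no timezone, UTC, and a non-UTC timezone.
schema = pa.schema(
    [
        pa.field("no_tz", pa.timestamp('us')),
        pa.field("utc", pa.timestamp('us', tz="UTC")),
        pa.field("europe", pa.timestamp('us', tz="Europe/Berlin")),
    ]
)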
buf = pa.BufferOutputStream()
pq.write_metadata(
    schema,
    buf,
    coerce_timestamps="us"
)

pq_bytes = buf.getvalue().to_pybytes()
reader = pa.BufferReader(pq_bytes)
parquet_file = pq.ParquetFile(reader)
parquet_file.schema.to_arrow_schema()
# Output:
# no_tz: timestamp[us]
# utc: timestamp[us, tz=UTC]
# europe: timestamp[us, tz=UTC]
{code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
