Lucas Pickup created ARROW-1435:
-----------------------------------

Summary: PyArrow not propagating timezone information from Parquet to Python
Key: ARROW-1435
URL: https://issues.apache.org/jira/browse/ARROW-1435
Project: Apache Arrow
Issue Type: Bug
Components: Python
Affects Versions: 0.6.0
Reporter: Lucas Pickup
PyArrow reads the timezone metadata for timestamp values from Parquet, but this information is not propagated to the resulting Python datetime objects. In the session below, the table built with {{Table.from_pandas}} has {{DateAware: timestamp[ns, tz=UTC]}}. After a round trip through Parquet, {{read_table}} reports {{DateAware: timestamp[us]}} with no timezone (the {{"timezone": "UTC"}} entry survives only in the pandas metadata), and {{to_pandas()}} returns naive datetimes.

{noformat}
λ python
Python 3.5.2 |Continuum Analytics, Inc.| (default, Jul 5 2016, 11:41:13) [MSC v.1900 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import pyarrow as pa
>>> import pyarrow.parquet as pq
>>> import pytz
>>> import pandas
>>> from datetime import datetime
>>>
>>> d1 = datetime.strptime('2015-07-05 23:50:00', '%Y-%m-%d %H:%M:%S')
>>> d1
datetime.datetime(2015, 7, 5, 23, 50)
>>> aware = pytz.utc.localize(d1)
>>> aware
datetime.datetime(2015, 7, 5, 23, 50, tzinfo=<UTC>)
>>>
>>> df = pandas.DataFrame()
>>> df['DateNaive'] = [d1]
>>> df['DateAware'] = [aware]
>>> df
            DateNaive                 DateAware
0 2015-07-05 23:50:00 2015-07-05 23:50:00+00:00
>>>
>>> table = pa.Table.from_pandas(df)
>>> table
pyarrow.Table
DateNaive: timestamp[ns]
DateAware: timestamp[ns, tz=UTC]
__index_level_0__: int64
-- metadata --
pandas: {"pandas_version": "0.20.3", "columns": [{"name": "DateNaive", "pandas_type": "datetime", "numpy_type": "datetime64[ns]", "metadata": null}, {"name": "DateAware", "pandas_type": "datetimetz", "numpy_type": "datetime64[ns, UTC]", "metadata": {"timezone": "UTC"}}], "index_columns": ["__index_level_0__"]}
>>>
>>> pq.write_table(table, "E:\\pyarrowDates.parquet")
>>>
>>> pyarrowTable = pq.read_table("E:\\pyarrowDates.parquet")
>>> pyarrowTable
pyarrow.Table
DateNaive: timestamp[us]
DateAware: timestamp[us]
__index_level_0__: int64
-- metadata --
pandas: {"pandas_version": "0.20.3", "columns": [{"name": "DateNaive", "pandas_type": "datetime", "numpy_type": "datetime64[ns]", "metadata": null}, {"name": "DateAware", "pandas_type": "datetimetz", "numpy_type": "datetime64[ns, UTC]", "metadata": {"timezone": "UTC"}}], "index_columns": ["__index_level_0__"]}
>>>
>>> pyarrowDF = pyarrowTable.to_pandas()
>>> pyarrowDF
            DateNaive           DateAware
0 2015-07-05 23:50:00 2015-07-05 23:50:00
{noformat}

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
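Until the timezone survives the round trip, one possible workaround is to re-attach it from the pandas metadata, which (as the session shows) still records {"timezone": "UTC"} for DateAware after reading. The sketch below is stdlib-only and hypothetical: the helper names are made up for illustration, it works on plain lists of datetime objects rather than a DataFrame, and it assumes the naive values returned are UTC wall times, consistent with Arrow storing tz-aware timestamps as UTC instants. Where the metadata JSON comes from is also an assumption: recent pyarrow releases expose it as table.schema.metadata[b"pandas"], but that has not been checked against 0.6.0.

```python
import json
from datetime import datetime, timezone


def timezones_from_pandas_metadata(raw):
    """Return {column name: timezone name} from a 'pandas' metadata JSON document.

    `raw` is the JSON shown after "pandas:" in the schema metadata, as str or bytes.
    Assumption: in recent pyarrow it is table.schema.metadata[b"pandas"].
    """
    meta = json.loads(raw)
    tz_by_column = {}
    for col in meta.get("columns", []):
        col_meta = col.get("metadata") or {}  # "metadata" is null for naive columns
        tz = col_meta.get("timezone")
        if tz is not None:
            tz_by_column[col["name"]] = tz
    return tz_by_column


def _to_tzinfo(name):
    """Resolve a timezone name; UTC is handled without a tz database."""
    if name == "UTC":
        return timezone.utc
    from zoneinfo import ZoneInfo  # Python 3.9+, needs system tzdata or the tzdata package
    return ZoneInfo(name)


def reattach_timezones(columns, tz_by_column):
    """Hypothetical helper: re-attach recorded timezones to naive datetimes.

    `columns` maps column name -> list of datetime objects. Naive values in a
    column with a recorded timezone are treated as UTC wall times (assumption:
    Arrow stores tz-aware timestamps as UTC instants) and converted to that zone.
    Columns without a recorded timezone are copied unchanged.
    """
    fixed = {}
    for name, values in columns.items():
        tz = tz_by_column.get(name)
        if tz is None:
            fixed[name] = list(values)
            continue
        target = _to_tzinfo(tz)
        fixed[name] = [
            v.replace(tzinfo=timezone.utc).astimezone(target) if v.tzinfo is None else v
            for v in values
        ]
    return fixed


if __name__ == "__main__":
    # The pandas metadata from the session above, verbatim.
    raw = (
        '{"pandas_version": "0.20.3", "columns": ['
        '{"name": "DateNaive", "pandas_type": "datetime", "numpy_type": "datetime64[ns]", "metadata": null}, '
        '{"name": "DateAware", "pandas_type": "datetimetz", "numpy_type": "datetime64[ns, UTC]", '
        '"metadata": {"timezone": "UTC"}}], "index_columns": ["__index_level_0__"]}'
    )
    tz_map = timezones_from_pandas_metadata(raw)
    print(tz_map)  # {'DateAware': 'UTC'}
    d1 = datetime(2015, 7, 5, 23, 50)  # the naive value that to_pandas() returned
    fixed = reattach_timezones({"DateNaive": [d1], "DateAware": [d1]}, tz_map)
    print(fixed["DateAware"][0])  # 2015-07-05 23:50:00+00:00
```

With pandas, the equivalent per-column step would be Series.dt.tz_localize("UTC").dt.tz_convert(tz). This is a stopgap on the reading side, not a fix for the propagation bug itself.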