Quick follow-up. I'm trying to work around this myself in the meantime. The goal is to qualify the TimestampValue with a timezone (by creating a new column in the Arrow table based on the previous one). If this can be done before the values are converted to Python, it may fix the issue I was having. But it doesn't appear that I can create a new timestamp-typed column from the values of the old timestamp column.
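In plain datetime terms (standard library only, not the pyarrow API), the idea is just to attach an explicit zone to an otherwise naive value so that downstream conversion is unambiguous; a minimal illustrative sketch:

```python
from datetime import datetime, timedelta, timezone

# A naive value carries no zone; readers are free to reinterpret it.
naive = datetime(2015, 7, 5, 23, 50)
assert naive.tzinfo is None

# Qualifying it with an explicit zone pins down which instant it means.
aware = naive.replace(tzinfo=timezone.utc)
assert aware.utcoffset() == timedelta(0)
```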
Here is the code I'm using:

    import pyarrow as pa

    def chunkedToArray(data):
        # Flatten a chunked array into a single stream of values.
        for chunk in data.iterchunks():
            for value in chunk:
                yield value

    def datetimeColumnsAddTimezone(table):
        # Rebuild every tz-naive nanosecond timestamp column as a
        # GMT-qualified one.
        for i, field in enumerate(table.schema):
            if field.type == pa.timestamp('ns'):
                newField = pa.field(field.name, pa.timestamp('ns', tz='GMT'),
                                    field.nullable, field.metadata)
                newArray = pa.array(
                    [val for val in chunkedToArray(table[i].data)],
                    pa.timestamp('ns', tz='GMT'))
                newColumn = pa.Column.from_array(newField, newArray)
                table = table.remove_column(i)
                table = table.add_column(i, newColumn)
        return table

Cheers,
Lucas Pickup

From: Lucas Pickup [mailto:lucas.pic...@microsoft.com.INVALID]
Sent: Friday, August 25, 2017 3:23 PM
To: dev@arrow.apache.org
Subject: Reading Parquet datetime column gives different answer in Spark vs PyArrow

Hi all,

I've been messing around with Spark and PyArrow Parquet reading. In my testing I've found that a Parquet file written by Spark containing a datetime column results in different datetimes when read by Spark and by PyArrow. The attached script demonstrates this.

Output:

Spark reading the parquet file into a DataFrame:
[Row(Date=datetime.datetime(2015, 7, 5, 23, 50)), Row(Date=datetime.datetime(2015, 7, 5, 23, 30))]

PyArrow table has dates as UTC (7 hours ahead):
<pyarrow.lib.TimestampArray object at 0x0000029F3AFE79A8>
[
  Timestamp('2015-07-06 06:50:00')
]

Pandas DF from pyarrow table has dates as UTC (7 hours ahead):
                 Date
0 2015-07-06 06:50:00
1 2015-07-06 06:30:00

I would've expected to end up with the same datetime from both readers, since there was no timezone attached at any point; it's just a date and time value. Am I missing anything here? Or is this a bug?

Cheers,
Lucas Pickup
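For what it's worth, the 7-hour shift in the thread above is consistent with the timestamps having been normalized to UTC on write and then rendered without any zone on read. A stdlib-only sketch of that interpretation (the UTC-7 Pacific daylight offset here is an assumption, not something stated in the file):

```python
from datetime import datetime, timedelta, timezone

# Assumed local session zone when the file was written (PDT, UTC-7).
pacific = timezone(timedelta(hours=-7))

# The wall-clock value Spark shows after converting back to local time.
local = datetime(2015, 7, 5, 23, 50, tzinfo=pacific)

# Normalizing to UTC and dropping the zone reproduces the PyArrow output.
stored_utc = local.astimezone(timezone.utc).replace(tzinfo=None)
assert stored_utc == datetime(2015, 7, 6, 6, 50)
```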