[ https://issues.apache.org/jira/browse/ARROW-14104?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Sarah Gilmore updated ARROW-14104:
----------------------------------
    Description:
In Arrow 4.0.0 it is possible to round-trip the TimeZone property of List<Timestamp> columns to and from parquet files:
{code:java}
>>> import pyarrow as pa
>>> import pyarrow.parquet as pq
>>> import datetime
>>> column = pa.array([[datetime.datetime(2023, 9, 23, 11)]],
...                   pa.list_(pa.timestamp('us', 'America/New_York')));
>>> t = pa.Table.from_arrays([column], names=['TimestampColumn']);
>>> pq.write_table(t, "example.parq", version='2.0');
>>> t2 = pq.read_table("example.parq");
>>> t2
pyarrow.Table
Dates: list<item: timestamp[us, tz=America/Denver]>
  child 0, item: timestamp[us, tz=America/Denver]
{code}
However, if you read the same parquet file in pyarrow 5.0.0, the TimeZone is set to UTC:
{code:java}
>>> t3 = pq.read_table("example.parq");
>>> t3
pyarrow.Table
Dates: list<item: timestamp[us, tz=UTC]>
  child 0, item: timestamp[us, tz=UTC]
{code}

I noticed that the TimeZone is preserved in Arrow 5.0 when reading non-nested timestamp columns.

  was:
In Arrow 4.0.0 it is possible to round-trip the TimeZone property of List<Timestamp> columns to and from parquet files:
{code:java}
>>> import pyarrow as pa
>>> import pyarrow.parquet as pq
>>> import datetime
>>> column = pa.array([[datetime.datetime(2023, 9, 23, 11)]],
...                   pa.list_(pa.timestamp('us', 'America/New_York')));
>>> t = pa.Table.from_arrays([column], names=['TimestampColumn']);
>>> pq.write_table(t, "example.parq");
>>> t2 = pq.read_table("example.parq");
>>> t2
pyarrow.Table
Dates: list<item: timestamp[us, tz=America/Denver]>
  child 0, item: timestamp[us, tz=America/Denver]
{code}
However, if you read the same parquet file in pyarrow 5.0.0, the TimeZone is set to UTC:
{code:java}
>>> t3 = pq.read_table("example.parq");
>>> t3
pyarrow.Table
Dates: list<item: timestamp[us, tz=UTC]>
  child 0, item: timestamp[us, tz=UTC]
{code}

I noticed that the TimeZone is preserved in Arrow 5.0 when reading non-nested timestamp columns.

> Reading Lists of Timestamps from parquet files in Arrow 5.0.0 fails to preserve the TimeZone - unlike in Arrow 4.0.0
> --------------------------------------------------------------------------------------------------------------------
>
>                 Key: ARROW-14104
>                 URL: https://issues.apache.org/jira/browse/ARROW-14104
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: C++, Parquet, Python
>    Affects Versions: 5.0.0
>            Reporter: Sarah Gilmore
>            Priority: Minor
>
> In Arrow 4.0.0 it is possible to round-trip the TimeZone property of
> List<Timestamp> columns to and from parquet files:
> {code:java}
> >>> import pyarrow as pa
> >>> import pyarrow.parquet as pq
> >>> import datetime
> >>> column = pa.array([[datetime.datetime(2023, 9, 23, 11)]],
> ...                   pa.list_(pa.timestamp('us', 'America/New_York')));
> >>> t = pa.Table.from_arrays([column], names=['TimestampColumn']);
> >>> pq.write_table(t, "example.parq", version='2.0');
> >>> t2 = pq.read_table("example.parq");
> >>> t2
> pyarrow.Table
> Dates: list<item: timestamp[us, tz=America/Denver]>
>   child 0, item: timestamp[us, tz=America/Denver]
> {code}
> However, if you read the same parquet file in pyarrow 5.0.0, the TimeZone is
> set to UTC:
> {code:java}
> >>> t3 = pq.read_table("example.parq");
> >>> t3
> pyarrow.Table
> Dates: list<item: timestamp[us, tz=UTC]>
>   child 0, item: timestamp[us, tz=UTC]
> {code}
>
> I noticed that the TimeZone is preserved in Arrow 5.0 when reading non-nested
> timestamp columns.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)