[ 
https://issues.apache.org/jira/browse/ARROW-7678?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joshua Pedrick closed ARROW-7678.
---------------------------------
    Resolution: Invalid

Managed to recreate the bug without setting TZ.
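
For context, here is a minimal sketch of where TZ can and cannot affect an
INT64 timestamp value. It assumes the values handed to the Int64Writer are
microseconds since the Unix epoch, which the report below does not show. All
helper names are hypothetical and not part of Arrow or Parquet. An epoch-based
count is the same under any TZ. A conversion from local wall-clock fields via
mktime is not.

```cpp
#include <chrono>
#include <cstdint>
#include <cstdlib>
#include <ctime>
#include <stdlib.h>  // setenv (POSIX)
#include <time.h>    // tzset (POSIX)

// Hypothetical helper: microseconds since the Unix epoch for a time point.
// Assumes system_clock counts from the Unix epoch (guaranteed from C++20,
// true in practice on Linux before that).
inline std::int64_t MicrosSinceEpoch(std::chrono::system_clock::time_point tp) {
  return std::chrono::duration_cast<std::chrono::microseconds>(
             tp.time_since_epoch())
      .count();
}

// Hypothetical helper: change TZ for the current process (POSIX only).
inline void SetProcessTimeZone(const char* tz) {
  setenv("TZ", tz, /*overwrite=*/1);
  tzset();
}

// Hypothetical helper: interpret 1970-01-02 00:00:00 as local wall-clock
// time and return the matching seconds since the epoch. Unlike
// MicrosSinceEpoch, this result depends on the TZ setting.
inline std::int64_t LocalMidnightJan2AsEpochSeconds() {
  std::tm local = {};
  local.tm_year = 70;  // years since 1900
  local.tm_mon = 0;    // January
  local.tm_mday = 2;
  local.tm_isdst = 0;  // no daylight saving time
  return static_cast<std::int64_t>(std::mktime(&local));
}
```

Calling SetProcessTimeZone("CST-8") between two calls to MicrosSinceEpoch on
the same time point leaves the value unchanged, whereas the mktime-based
helper shifts by the zone offset.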

> [C++][Parquet] setting TZ= in environment on Linux causes broken parquet
> ------------------------------------------------------------------------
>
>                 Key: ARROW-7678
>                 URL: https://issues.apache.org/jira/browse/ARROW-7678
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: C++
>    Affects Versions: 0.15.1
>         Environment: Linux, Ubuntu 18.04, arrow/parquet 0.15.1 from 
> instructions https://arrow.apache.org/install/
>            Reporter: Joshua Pedrick
>            Priority: Blocker
>
> When I set TZ=CST-8, or another timezone, on Linux, time columns are
> corrupted in my resulting parquet file.
>  
> Below are the calls I use to define my schema:
>  
> {code:java}
> // Timestamp column: is_adjusted_to_utc = true, microsecond precision.
> PrimitiveNode::Make( columnName, Repetition::REQUIRED,
>  LogicalType::Timestamp( true, LogicalType::TimeUnit::MICROS, false, false ),
>  ::parquet::Type::INT64 );
> // Time-of-day column: is_adjusted_to_utc = true, microsecond precision.
> PrimitiveNode::Make( columnName,
>  repetition,
>  LogicalType::Time( true, LogicalType::TimeUnit::MICROS ),
>  ::parquet::Type::INT64 );
> {code}
> I use an Int64Writer for both types. When reading the file, in this case
> with pandas and pyarrow but also in C++, I get the following exception:
> {code:java}
>  File "pyarrow/_parquet.pyx", line 1136, in pyarrow._parquet.ParquetReader.read_all
>  File "pyarrow/error.pxi", line 80, in pyarrow.lib.check_status
> pyarrow.lib.ArrowIOError: Couldn't deserialize thrift: TProtocolException: Invalid data
> Deserializing page header failed.
> {code}
> It seems the column header must be defining a timestamp with a timezone,
> even though I set is_adjusted_to_utc manually.
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
