[ https://issues.apache.org/jira/browse/ARROW-15492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17487634#comment-17487634 ]
Micah Kornfield commented on ARROW-15492:
-----------------------------------------

On exposing the write field, per the other Jira I don't think we should do it. It makes it much harder to deal with bugs that might occur in a particular version of the library.
{quote}Or handle the timestamp type with timezone in files created by parquet-mr?
{quote}
I'm not familiar with this, could you link to the specification on this or provide more details? It seems like this might be a better approach.

> [Python] handle timestamp type in parquet file for compatibility with older HiveQL
> ----------------------------------------------------------------------------------
>
>                 Key: ARROW-15492
>                 URL: https://issues.apache.org/jira/browse/ARROW-15492
>             Project: Apache Arrow
>          Issue Type: New Feature
>    Affects Versions: 6.0.1
>            Reporter: nero
>            Priority: Major
>
> Hi there,
> I face an issue when I write a parquet file with PyArrow.
> Older versions of Hive can only recognize the timestamp type stored
> in INT96, so I use table.write_to_data with the
> `use_deprecated_int96_timestamps=True` option to save the parquet file. But
> HiveQL will skip the conversion when the metadata of the parquet file is not
> created_by "parquet-mr".
> [hive/ParquetRecordReaderBase.java at f1ff99636a5546231336208a300a114bcf8c5944 · apache/hive (github.com)|https://github.com/apache/hive/blob/f1ff99636a5546231336208a300a114bcf8c5944/ql/src/java/org/apache/hadoop/hive/ql/io/parquet/ParquetRecordReaderBase.java#L137-L139]
>
> So I have to save the timestamp columns with timezone info (padded to UTC+8).
> But when pyarrow.parquet reads from a dir which contains parquet files created by
> both PyArrow and parquet-mr, Arrow.Table will ignore the timezone info for
> parquet-mr files.
>
> Maybe PyArrow can expose the created_by option in pyarrow ({*}preferred{*};
> parquet::WriterProperties::created_by is available in C++).
> Or handle the timestamp type with timezone in files created by parquet-mr?
>
> Maybe related to https://issues.apache.org/jira/browse/ARROW-14422
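
For readers following the thread, the reader-side behavior described in the report can be sketched in a few lines of Python. This is only a model of the description in the issue (timestamp conversion is skipped unless the file's created_by string starts with "parquet-mr"), not a copy of Hive's code. The function name and the sample created_by strings are hypothetical and chosen for illustration.

```python
from typing import Optional


def skips_timestamp_conversion(created_by: Optional[str]) -> bool:
    """Hypothetical model of the check described in the issue.

    Returns True when a reader following that rule would skip the
    timestamp conversion, i.e. when created_by is missing or does not
    start with "parquet-mr".
    """
    # Treat a missing created_by value like an empty string.
    return not (created_by or "").startswith("parquet-mr")


# Illustrative created_by values; real files carry writer-specific strings.
samples = [
    "parquet-mr version 1.10.0",
    "parquet-cpp-arrow version 6.0.1",
    None,
]

for value in samples:
    print(repr(value), "->", skips_timestamp_conversion(value))
```

Under this model, only the first sample is converted, which is why a file written by PyArrow is treated differently from a parquet-mr file even when both store timestamps as INT96.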