[GitHub] [hudi] satishkotha edited a comment on issue #2123: Timestamp not parsed correctly on Athena

GitBox Tue, 29 Sep 2020 21:50:57 -0700


satishkotha edited a comment on issue #2123:
URL: https://github.com/apache/hudi/issues/2123#issuecomment-701154113



   This is a bit complicated. Hudi uses Spark's converters to map DataFrame 
types to Parquet types. Spark's SchemaConverters converts a timestamp to 
[int64](https://github.com/apache/spark/blob/master/external/avro/src/main/scala/org/apache/spark/sql/avro/SchemaConverters.scala#L164)
 with logical type 'TIMESTAMP_MICROS'.
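
   To make the encoding concrete, here is a minimal Python sketch of what an 
int64 column with logical type 'TIMESTAMP_MICROS' holds: a count of 
microseconds since the Unix epoch. The helper names are hypothetical and are 
not taken from Spark, Parquet, or Hudi; UTC is assumed for illustration.

```python
from datetime import datetime, timedelta, timezone

# Assumption: values are interpreted relative to the Unix epoch in UTC.
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_timestamp_micros(dt: datetime) -> int:
    """Hypothetical helper: encode an aware datetime as int64 microseconds."""
    return (dt - EPOCH) // timedelta(microseconds=1)


def from_timestamp_micros(value: int) -> datetime:
    """Hypothetical helper: decode int64 microseconds back into a datetime."""
    return EPOCH + timedelta(microseconds=value)


ts = datetime(2020, 9, 29, 21, 50, 57, tzinfo=timezone.utc)
raw = to_timestamp_micros(ts)

# A reader that ignores the logical type only sees this raw integer.
print(raw)
# A reader that honours the logical type can recover the timestamp.
print(from_timestamp_micros(raw))
```

   A query engine that does not understand the logical type has only the raw 
integer to work with, which is why the reader side needs changes too.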
   
   This is because int96 is no longer supported in 
[Parquet](https://issues.apache.org/jira/browse/PARQUET-1883), especially in the 
parquet-avro module. In general, int96 is discouraged going forward. 
   
   To make timestamps work, we had to:
   1) Change the query engines to read the Parquet logical type. Example for 
[Presto](https://github.com/prestodb/presto/pull/15074/files). We made a similar 
change for Hive. You will probably need a similar change in Athena.
   2) Change DLASync/HiveSync to map the logical type TIMESTAMP_MICROS to the Hive 
type 'timestamp'. [PR here](https://github.com/apache/hudi/pull/2129)
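
   The sync-side change in step 2 amounts to a type-mapping rule. The Python 
sketch below shows the idea only; `hive_type_for` is a hypothetical helper, 
not the code from the linked PR, and it covers just the few types needed for 
the illustration.

```python
from typing import Optional


def hive_type_for(physical_type: str, logical_type: Optional[str] = None) -> str:
    """Hypothetical helper: choose a Hive column type for a Parquet column."""
    # An int64 annotated as TIMESTAMP_MICROS should surface as a Hive timestamp.
    if physical_type == "INT64" and logical_type == "TIMESTAMP_MICROS":
        return "timestamp"
    # Legacy writers stored timestamps as int96.
    if physical_type == "INT96":
        return "timestamp"
    # Without a logical type, an int64 is just a 64-bit integer.
    if physical_type == "INT64":
        return "bigint"
    raise ValueError(f"unhandled Parquet type in this sketch: {physical_type}")


print(hive_type_for("INT64", "TIMESTAMP_MICROS"))
print(hive_type_for("INT64"))
```

   Without the first rule, the column would be registered as `bigint`, and 
queries would return the raw microsecond count instead of a timestamp.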
   
   Unfortunately, there is no clean workaround. As I mentioned, this is a bit 
complicated. Please don't hesitate to ping me if you have any questions.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
