cdmikechen opened a new pull request, #3391:
URL: https://github.com/apache/hudi/pull/3391

   ### Change Logs
   
   This pull request lets Hive read timestamp-type columns correctly.  
   The problem was originally reported in [HUDI-83](https://issues.apache.org/jira/browse/HUDI-83) and in the related issue https://github.com/apache/hudi/issues/2544
   
   - Change `HoodieParquetInputFormat` to use a custom `ParquetInputFormat` named `HudiAvroParquetInputFormat`.
   - `HudiAvroParquetInputFormat` uses a custom `RecordReader` named `HudiAvroParquetReader`, which reads through `AvroReadSupport` so that Hive receives the Parquet data as Avro `GenericRecord`s.
   - Use `org.apache.hudi.hadoop.utils.HoodieRealtimeRecordReaderUtils.avroToArrayWritable` to transform each `GenericRecord` into an `ArrayWritable`. Timestamp/date handling for the differing behaviors of Hive 2 and Hive 3 is added to this method (a rough sketch of this read path follows the list).
   - Change the default value of `hoodie.datasource.hive_sync.support_timestamp` from `false` to `true`.
   - Add a `supportAvroRead` flag to stay compatible with how some older Hudi versions adapted timestamp/date types for Hive 3.
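
   Below is a minimal, self-contained sketch (not the actual code in this PR) of the read path described above: it reads a Parquet file as Avro `GenericRecord`s via `AvroParquetReader`/`AvroReadSupport` and converts a timestamp-micros field into a `java.sql.Timestamp` before wrapping values into an `ArrayWritable`. The input path argument, the `event_time` field name, and the class name are illustrative assumptions, not part of this change.

   ```java
   // Sketch only: reads Parquet as Avro GenericRecords and converts a
   // timestamp-micros column, roughly mirroring what the new record reader
   // does before handing rows to Hive as ArrayWritable.
   import java.sql.Timestamp;

   import org.apache.avro.generic.GenericRecord;
   import org.apache.hadoop.fs.Path;
   import org.apache.hadoop.io.ArrayWritable;
   import org.apache.hadoop.io.Text;
   import org.apache.hadoop.io.Writable;
   import org.apache.parquet.avro.AvroParquetReader;
   import org.apache.parquet.hadoop.ParquetReader;

   public class AvroTimestampReadSketch {

     public static void main(String[] args) throws Exception {
       // Path is a placeholder argument, not taken from the PR.
       Path parquetFile = new Path(args[0]);

       try (ParquetReader<GenericRecord> reader =
                AvroParquetReader.<GenericRecord>builder(parquetFile).build()) {
         GenericRecord record;
         while ((record = reader.read()) != null) {
           // Assumes the table has a timestamp-micros column called "event_time".
           long micros = (Long) record.get("event_time");
           Timestamp ts = microsToTimestamp(micros);

           // The real reader converts the whole record via
           // HoodieRealtimeRecordReaderUtils.avroToArrayWritable; here only the
           // converted timestamp is materialized, as text, for illustration.
           Writable[] columns = new Writable[] { new Text(ts.toString()) };
           ArrayWritable row = new ArrayWritable(Writable.class, columns);
           System.out.println(row.get()[0]);
         }
       }
     }

     /** Converts an Avro timestamp-micros value into java.sql.Timestamp. */
     static Timestamp microsToTimestamp(long micros) {
       Timestamp ts = new Timestamp(Math.floorDiv(micros, 1_000_000L) * 1000L);
       ts.setNanos((int) (Math.floorMod(micros, 1_000_000L) * 1000L));
       return ts;
     }
   }
   ```

   The explicit micros-to-`Timestamp` conversion is shown because, as noted above, Hive 2 and Hive 3 expect timestamp values in different forms, so the reader has to normalize the raw Avro long before building the writable row.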
   
   ### Impact
   
   - hudi-hadoop-mr
   - spark
   
   ### Risk level 
   Low
   
   ### Documentation Update
   The Javadoc has been updated; the website documentation will follow in a separate PR.
   
   ### Contributor's checklist
   
   - [ ] Read through [contributor's 
guide](https://hudi.apache.org/contribute/how-to-contribute)
   - [ ] Change Logs and Impact were stated clearly
   - [ ] Adequate tests were added if applicable
   - [ ] CI passed
   

