cdmikechen commented on pull request #3391: URL: https://github.com/apache/hudi/pull/3391#issuecomment-1002398757
> I have a concern around performance overhead and also wondering if we can just do it as a part of the existing inputformat with a flag, instead of switching over entirely to a new ipf? thoughts?

For compatibility: Spark 2 uses `com.twitter:parquet-hadoop-bundle` for `ParquetInputFormat`. In that bundle, `MapredParquetInputFormat` only has a parameterless constructor, while Hive 2 and Hive 3 add a protected constructor that takes a `ParquetInputFormat`:

https://github.com/apache/hive/blob/8e7f23f34b2ce7328c9d571a13c336f0c8cdecb6/ql/src/java/org/apache/hadoop/hive/ql/io/parquet/MapredParquetInputFormat.java#L48-L55

```java
public MapredParquetInputFormat() {
  this(new ParquetInputFormat<ArrayWritable>(DataWritableReadSupport.class));
}

protected MapredParquetInputFormat(final ParquetInputFormat<ArrayWritable> inputFormat) {
  this.realInput = inputFormat;
  vectorizedSelf = new VectorizedParquetInputFormat();
}
```

Otherwise, we could consider refactoring directly to:

```java
public HoodieParquetInputFormat() {
  super(new HudiAvroParquetInputFormat());
}
```
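To make the compatibility point concrete, here is a minimal, self-contained sketch of the constructor-delegation pattern discussed above. All class names here (`RealInputFormat`, `BaseInputFormat`, `HoodieStyleInputFormat`) are simplified stand-ins, not the actual Hive or Hudi classes; they only mirror the shape of `MapredParquetInputFormat`'s two constructors:

```java
// Stand-in for the wrapped ParquetInputFormat; "readSupport" stands in
// for the read-support class the real constructor receives.
class RealInputFormat {
    final String readSupport;
    RealInputFormat(String readSupport) { this.readSupport = readSupport; }
}

class BaseInputFormat {
    protected final RealInputFormat realInput;

    // Parameterless constructor: the only one present in the
    // com.twitter:parquet-hadoop-bundle version used by Spark 2.
    public BaseInputFormat() {
        this(new RealInputFormat("DataWritableReadSupport"));
    }

    // Protected delegating constructor: the overload Hive 2/3 add.
    // A subclass can only swap in its own inner input format (the
    // proposed super(new HudiAvroParquetInputFormat()) refactoring)
    // if this overload exists in the parent class.
    protected BaseInputFormat(RealInputFormat inputFormat) {
        this.realInput = inputFormat;
    }
}

// Mirrors the proposed HoodieParquetInputFormat shape: wrap a custom
// Avro-based inner format via the protected constructor.
class HoodieStyleInputFormat extends BaseInputFormat {
    public HoodieStyleInputFormat() {
        super(new RealInputFormat("HudiAvroReadSupport"));
    }
}

public class DelegationSketch {
    public static void main(String[] args) {
        BaseInputFormat plain = new BaseInputFormat();
        BaseInputFormat hoodie = new HoodieStyleInputFormat();
        System.out.println(plain.realInput.readSupport);   // DataWritableReadSupport
        System.out.println(hoodie.realInput.readSupport);  // HudiAvroReadSupport
    }
}
```

On the Spark 2 bundle, where only the parameterless constructor exists, the `super(...)` call in `HoodieStyleInputFormat` would not compile, which is the compatibility concern raised above.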