cdmikechen commented on pull request #3391: URL: https://github.com/apache/hudi/pull/3391#issuecomment-1002398757
> I have a concern around performance overhead and also wondering if we can just do it as a part of the existing inputformat with a flag, instead of switching over entirely to a new ipf? thoughts?

For compatibility: Spark 2 uses `com.twitter:parquet-hadoop-bundle` for `ParquetInputFormat`. In that bundle, `MapredParquetInputFormat` only has a parameterless constructor, while Hive 2 and Hive 3 add a protected constructor that takes a `ParquetInputFormat`:

https://github.com/apache/hive/blob/8e7f23f34b2ce7328c9d571a13c336f0c8cdecb6/ql/src/java/org/apache/hadoop/hive/ql/io/parquet/MapredParquetInputFormat.java#L48-L55

```java
public MapredParquetInputFormat() {
  this(new ParquetInputFormat<ArrayWritable>(DataWritableReadSupport.class));
}

protected MapredParquetInputFormat(final ParquetInputFormat<ArrayWritable> inputFormat) {
  this.realInput = inputFormat;
  vectorizedSelf = new VectorizedParquetInputFormat();
}
```

Otherwise, we could consider refactoring directly to:

```java
public HoodieParquetInputFormat() {
  super(new HudiAvroParquetInputFormat());
}
```
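To make the compatibility point concrete, here is a minimal, self-contained sketch of the constructor-delegation pattern discussed above. All class names here (`RealInputFormat`, `BaseInputFormat`, `HoodieStyleInputFormat`) are simplified stand-ins, not the actual Hive or Hudi classes; they only mirror the shape of `MapredParquetInputFormat`'s two constructors:

```java
// Stand-in for the wrapped ParquetInputFormat; "readSupport" stands in
// for the read-support class the real constructor receives.
class RealInputFormat {
    final String readSupport;
    RealInputFormat(String readSupport) { this.readSupport = readSupport; }
}

class BaseInputFormat {
    protected final RealInputFormat realInput;

    // Parameterless constructor: the only one present in the
    // com.twitter:parquet-hadoop-bundle version used by Spark 2.
    public BaseInputFormat() {
        this(new RealInputFormat("DataWritableReadSupport"));
    }

    // Protected delegating constructor: the overload Hive 2/3 add.
    // A subclass can only swap in its own inner input format (the
    // proposed super(new HudiAvroParquetInputFormat()) refactoring)
    // if this overload exists in the parent class.
    protected BaseInputFormat(RealInputFormat inputFormat) {
        this.realInput = inputFormat;
    }
}

// Mirrors the proposed HoodieParquetInputFormat shape: wrap a custom
// Avro-based inner format via the protected constructor.
class HoodieStyleInputFormat extends BaseInputFormat {
    public HoodieStyleInputFormat() {
        super(new RealInputFormat("HudiAvroReadSupport"));
    }
}

public class DelegationSketch {
    public static void main(String[] args) {
        BaseInputFormat plain = new BaseInputFormat();
        BaseInputFormat hoodie = new HoodieStyleInputFormat();
        System.out.println(plain.realInput.readSupport);   // DataWritableReadSupport
        System.out.println(hoodie.realInput.readSupport);  // HudiAvroReadSupport
    }
}
```

On the Spark 2 bundle, where only the parameterless constructor exists, the `super(...)` call in `HoodieStyleInputFormat` would not compile, which is the compatibility concern raised above.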