jonvex commented on code in PR #12075:
URL: https://github.com/apache/hudi/pull/12075#discussion_r1797746402


##########
hudi-client/hudi-spark-client/src/main/scala/org/apache/hudi/SparkFileFormatInternalRowReaderContext.scala:
##########
@@ -87,17 +88,15 @@ class SparkFileFormatInternalRowReaderContext(parquetFileReader: SparkParquetRea
     }
     val structType = HoodieInternalRowUtils.getCachedSchema(requiredSchema)
     if (FSUtils.isLogFile(filePath)) {
-      val projection = HoodieInternalRowUtils.getCachedUnsafeProjection(structType, structType)
-      new CloseableMappingIterator[InternalRow, UnsafeRow](
+      val dataSchemaWithMergeCol = if (hasRowIndexField) {
+        HoodiePositionBasedSchemaHandler.addPositionalMergeCol(dataSchema)
+      } else {
+        dataSchema

Review Comment:
   The HoodieSparkFileReaderFactory parquet reader doesn't have schema evolution implemented, so we want to read using the write schema because we know we can do that without any errors. But now that you point it out, we might be reading a lot of columns we don't need, so we need to think more about what to do here.
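
   To illustrate the tradeoff being discussed, here is a minimal, hypothetical sketch (not Hudi's actual API; `Field`, `Schema`, and `pruneToRequired` are illustrative names) of pruning the write (data) schema down to only the columns the query requires, which is the alternative to reading with the full write schema:

```scala
// Hypothetical sketch: prune the write (data) schema to the required columns,
// so the reader does not deserialize columns the query never uses.
// The types and helper below are illustrative, not Hudi's real classes.
object SchemaPruningSketch {
  final case class Field(name: String, dataType: String)
  final case class Schema(fields: List[Field])

  // Keep only the data-schema fields whose names appear in the required
  // schema, preserving the data schema's field order (the order in which
  // the columns are laid out in the file).
  def pruneToRequired(dataSchema: Schema, requiredSchema: Schema): Schema = {
    val wanted = requiredSchema.fields.map(_.name).toSet
    Schema(dataSchema.fields.filter(f => wanted.contains(f.name)))
  }

  def main(args: Array[String]): Unit = {
    val data = Schema(List(
      Field("_hoodie_commit_time", "string"),
      Field("id", "long"),
      Field("name", "string"),
      Field("ts", "long")))
    val required = Schema(List(Field("id", "long"), Field("ts", "long")))
    val pruned = pruneToRequired(data, required)
    println(pruned.fields.map(_.name).mkString(","))
  }
}
```

   Reading with the pruned schema avoids decoding unused columns, but (as the comment notes) it only works safely if the reader can resolve the pruned schema against the file's write schema, which is exactly the schema-evolution support the factory's reader lacks.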



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to