jonvex commented on code in PR #12075:
URL: https://github.com/apache/hudi/pull/12075#discussion_r1797746402


##########
hudi-client/hudi-spark-client/src/main/scala/org/apache/hudi/SparkFileFormatInternalRowReaderContext.scala:
##########
@@ -87,17 +88,15 @@ class SparkFileFormatInternalRowReaderContext(parquetFileReader: SparkParquetRea
     }
     val structType = HoodieInternalRowUtils.getCachedSchema(requiredSchema)
     if (FSUtils.isLogFile(filePath)) {
-      val projection = HoodieInternalRowUtils.getCachedUnsafeProjection(structType, structType)
-      new CloseableMappingIterator[InternalRow, UnsafeRow](
+      val dataSchemaWithMergeCol = if (hasRowIndexField) {
+        HoodiePositionBasedSchemaHandler.addPositionalMergeCol(dataSchema)
+      } else {
+        dataSchema

Review Comment:
   The HoodieSparkFileReaderFactory parquet reader doesn't have schema evolution implemented, so we want to read using the write schema because we know we can do that without any errors. But now that you point it out, we might be reading a lot of columns we don't need, so we need to think more about what to do here.
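
   To illustrate the tradeoff being discussed, here is a minimal, hypothetical sketch (not Hudi's actual API; `Field`, `Schema`, and `pruneToRequired` are illustrative names) of pruning the write (data) schema down to only the columns the query requires, which is the alternative to reading with the full write schema:

```scala
// Hypothetical sketch: prune the write (data) schema to the required columns,
// so the reader does not deserialize columns the query never uses.
// The types and helper below are illustrative, not Hudi's real classes.
object SchemaPruningSketch {
  final case class Field(name: String, dataType: String)
  final case class Schema(fields: List[Field])

  // Keep only the data-schema fields whose names appear in the required
  // schema, preserving the data schema's field order (the order in which
  // the columns are laid out in the file).
  def pruneToRequired(dataSchema: Schema, requiredSchema: Schema): Schema = {
    val wanted = requiredSchema.fields.map(_.name).toSet
    Schema(dataSchema.fields.filter(f => wanted.contains(f.name)))
  }

  def main(args: Array[String]): Unit = {
    val data = Schema(List(
      Field("_hoodie_commit_time", "string"),
      Field("id", "long"),
      Field("name", "string"),
      Field("ts", "long")))
    val required = Schema(List(Field("id", "long"), Field("ts", "long")))
    val pruned = pruneToRequired(data, required)
    println(pruned.fields.map(_.name).mkString(","))
  }
}
```

   Reading with the pruned schema avoids decoding unused columns, but (as the comment notes) it only works safely if the reader can resolve the pruned schema against the file's write schema, which is exactly the schema-evolution support the factory's reader lacks.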



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to