yihua commented on code in PR #18478:
URL: https://github.com/apache/hudi/pull/18478#discussion_r3053305395


##########
hudi-spark-datasource/hudi-spark3.4.x/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/Spark34LegacyHoodieParquetFileFormat.scala:
##########
@@ -377,7 +377,7 @@ class Spark34LegacyHoodieParquetFileFormat(private val shouldAppendPartitionValu
           new HoodieParquetReadSupport(
             convertTz,
             enableVectorizedReader = false,
-            enableTimestampFieldRepair = true,
+            enableTimestampFieldRepair = hasTimestampMillisFieldInTableSchema,

Review Comment:
   🤖 This change now gates the parquet-level timestamp repair on `hasTimestampMillisFieldInTableSchema`, which only detects `TimestampMillis` and `LocalTimestampMillis`. The same `LocalTimestampMicros` gap I flagged earlier on the compactor path applies here too: if the table schema has `LocalTimestampMicros` fields and the base file was written as plain LONG, `hasTimestampMillisFieldInTableSchema` returns false, `enableTimestampFieldRepair` is therefore false, and the needed repair is skipped during Spark reads. Before this change the repair was unconditionally enabled, so this is a regression for that (admittedly uncommon) case. Could you broaden the check to cover all logical timestamp types that `needsLogicalTypeRepair` handles, and ideally rename it to something like `hasTimestampLogicalTypeField` so the scope is clear? A broadened version could look like the sketch below.
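   
   A minimal sketch of a broadened, recursive check, assuming the table schema is available as an Avro `Schema`. `hasTimestampLogicalTypeField` is the hypothetical name suggested above, and the exact type set should mirror whatever `needsLogicalTypeRepair` actually handles:

```scala
import org.apache.avro.{LogicalType, LogicalTypes, Schema}

import scala.collection.JavaConverters._

// Hypothetical helper: true if any (possibly nested) field in the Avro
// schema carries a timestamp logical type. Note: this sketch does not
// guard against recursive record schemas, which would need a visited set.
def hasTimestampLogicalTypeField(schema: Schema): Boolean = {
  def isTimestampType(lt: LogicalType): Boolean = lt match {
    case _: LogicalTypes.TimestampMillis      => true
    case _: LogicalTypes.TimestampMicros      => true
    case _: LogicalTypes.LocalTimestampMillis => true
    case _: LogicalTypes.LocalTimestampMicros => true
    case _                                    => false
  }

  schema.getType match {
    case Schema.Type.RECORD =>
      schema.getFields.asScala.exists(f => hasTimestampLogicalTypeField(f.schema()))
    case Schema.Type.UNION =>
      schema.getTypes.asScala.exists(hasTimestampLogicalTypeField)
    case Schema.Type.ARRAY =>
      hasTimestampLogicalTypeField(schema.getElementType)
    case Schema.Type.MAP =>
      hasTimestampLogicalTypeField(schema.getValueType)
    case _ =>
      // Leaf type: check its logical type annotation, if any.
      Option(schema.getLogicalType).exists(isTimestampType)
  }
}
```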
   
   🤖 *This review comment was generated by an AI agent and may contain 
mistakes. Please verify any suggestions before applying.*


