yihua commented on code in PR #18478:
URL: https://github.com/apache/hudi/pull/18478#discussion_r3056999400


##########
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/compact/HoodieCompactor.java:
##########
@@ -130,6 +131,10 @@ public HoodieData<WriteStatus> compact(
     TaskContextSupplier taskContextSupplier = table.getTaskContextSupplier();
     // if this is a MDT, set up the instant range of log reader just like 
regular MDT snapshot reader.
     Option<InstantRange> instantRange = 
CompactHelpers.getInstance().getInstantRange(metaClient);
+    // Since we are using merge handle here, we can directly query the write 
schema from conf
+    // Write handle provides an option to use overridden write schema as well 
which is not used by merge handle
+    Schema writerSchema = HoodieAvroUtils.addMetadataFields(new 
Schema.Parser().parse(config.getWriteSchema()), 
config.allowOperationMetadataField());

Review Comment:
   🤖 This still has the `LocalTimestampMicros` gap I've flagged in prior 
rounds. Concretely: the old code left `ENABLE_LOGICAL_TIMESTAMP_REPAIR` unset 
when `hasTimestampMillisField` was false, so `HoodieMergeHelper` fell through 
to its default of `true` and repair happened. Now 
`setLogicalTimestampRepairIfNotSet` writes `"false"` into the config, 
`HoodieMergeHelper` reads it, and repair is skipped. For a table whose schema 
has `LocalTimestampMicros` (non-UTC local timestamp with microsecond 
precision), this is a regression — base files written as plain LONG will no 
longer be repaired during compaction. Could you broaden the check to cover all 
logical timestamp types that `needsLogicalTypeRepair` handles (i.e., any 
non-UTC-adjusted local timestamp, regardless of precision)?
   
   <sub><i>- Generated by an AI agent and may contain mistakes. Please verify 
any suggestions before applying.</i></sub>



##########
hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/table/action/commit/BaseSparkCommitActionExecutor.java:
##########
@@ -256,6 +260,22 @@ private HoodieData<WriteStatus> 
mapPartitionsAsRDD(HoodieData<HoodieRecord<T>> d
       // Partition only
       partitionedRDD = mappedRDD.partitionBy(partitioner);
     }
+
+    if (!table.isMetadataTable() && 
table.getMetaClient().getActiveTimeline().getCommitsTimeline().filterCompletedInstants().countInstants()
 > 0) {
+      TableSchemaResolver schemaResolver = new 
TableSchemaResolver(table.getMetaClient());
+      try {
+        
AvroSchemaUtils.setLogicalTimestampRepairIfNotSet(table.getStorageConf(), () -> 
{

Review Comment:
   🤖 If `getTableAvroSchema()` throws (e.g., corrupted timeline, missing commit 
metadata), the entire commit operation will fail. Previously, repair was 
unconditionally enabled, so a schema resolution failure wouldn't block writes. 
Could you catch the inner exception and default to `true` (enable repair) 
instead of aborting? Something like wrapping the supplier to return `true` on 
failure would preserve the old fail-safe behavior.
   
   <sub><i>- Generated by an AI agent and may contain mistakes. Please verify 
any suggestions before applying.</i></sub>



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to