yihua commented on code in PR #18478:
URL: https://github.com/apache/hudi/pull/18478#discussion_r3056999400
##########
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/compact/HoodieCompactor.java:
##########
@@ -130,6 +131,10 @@ public HoodieData<WriteStatus> compact(
TaskContextSupplier taskContextSupplier = table.getTaskContextSupplier();
// if this is a MDT, set up the instant range of log reader just like
regular MDT snapshot reader.
Option<InstantRange> instantRange =
CompactHelpers.getInstance().getInstantRange(metaClient);
+ // Since we are using merge handle here, we can directly query the write
schema from conf
+ // Write handle provides an option to use overridden write schema as well
which is not used by merge handle
+ Schema writerSchema = HoodieAvroUtils.addMetadataFields(new
Schema.Parser().parse(config.getWriteSchema()),
config.allowOperationMetadataField());
Review Comment:
🤖 This still has the `LocalTimestampMicros` gap I've flagged in prior
rounds. Concretely: the old code left `ENABLE_LOGICAL_TIMESTAMP_REPAIR` unset
when `hasTimestampMillisField` was false, so `HoodieMergeHelper` fell through
to its default of `true` and repair happened. Now
`setLogicalTimestampRepairIfNotSet` writes `"false"` into the config,
`HoodieMergeHelper` reads it, and repair is skipped. For a table whose schema
has `LocalTimestampMicros` (non-UTC local timestamp with microsecond
precision), this is a regression — base files written as plain LONG will no
longer be repaired during compaction. Could you broaden the check to cover all
logical timestamp types that `needsLogicalTypeRepair` handles (i.e., any
non-UTC-adjusted local timestamp, regardless of precision)?
<sub><i>- Generated by an AI agent and may contain mistakes. Please verify
any suggestions before applying.</i></sub>
##########
hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/table/action/commit/BaseSparkCommitActionExecutor.java:
##########
@@ -256,6 +260,22 @@ private HoodieData<WriteStatus>
mapPartitionsAsRDD(HoodieData<HoodieRecord<T>> d
// Partition only
partitionedRDD = mappedRDD.partitionBy(partitioner);
}
+
+ if (!table.isMetadataTable() &&
table.getMetaClient().getActiveTimeline().getCommitsTimeline().filterCompletedInstants().countInstants()
> 0) {
+ TableSchemaResolver schemaResolver = new
TableSchemaResolver(table.getMetaClient());
+ try {
+
AvroSchemaUtils.setLogicalTimestampRepairIfNotSet(table.getStorageConf(), () ->
{
Review Comment:
🤖 If `getTableAvroSchema()` throws (e.g., corrupted timeline, missing commit
metadata), the entire commit operation will fail. Previously, repair was
unconditionally enabled, so a schema resolution failure wouldn't block writes.
Could you catch the inner exception and default to `true` (enable repair)
instead of aborting? Something like wrapping the supplier to return `true` on
failure would preserve the old fail-safe behavior.
<sub><i>- Generated by an AI agent and may contain mistakes. Please verify
any suggestions before applying.</i></sub>
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]