hudi-agent commented on code in PR #18384:
URL: https://github.com/apache/hudi/pull/18384#discussion_r3277658155


##########
hudi-common/src/main/java/org/apache/hudi/io/storage/HoodieAvroFileWriter.java:
##########
@@ -56,4 +57,21 @@ default void prepRecordWithMetadata(HoodieKey key, 
IndexedRecord avroRecord, Str
     HoodieAvroUtils.addHoodieKeyToRecord((GenericRecord) avroRecord, 
key.getRecordKey(), key.getPartitionPath(), fileName);
     HoodieAvroUtils.addCommitMetadataToRecord((GenericRecord) avroRecord, 
instantTime, seqId);
   }
+
+  /**
+   * Selectively populates meta fields based on the provided {@link 
HoodieMetaFieldFlags} flags.
+   * Fields that are not populated are explicitly set to null so that any 
pre-existing values on
+   * {@code avroRecord} are cleared and the on-disk output is deterministic.
+   */
+  default void prepRecordWithMetadata(HoodieKey key, IndexedRecord avroRecord, 
String instantTime,
+      Integer partitionId, long recordIndex, String fileName, 
HoodieMetaFieldFlags populateFlags) {
+    GenericRecord rec = (GenericRecord) avroRecord;
+    String seqId = populateFlags.isCommitSeqNoPopulated()
+        ? HoodieRecord.generateSequenceId(instantTime, partitionId, 
recordIndex) : null;
+    rec.put(HoodieRecord.COMMIT_TIME_METADATA_FIELD, 
populateFlags.getCommitTime(instantTime));

Review Comment:
   🤖 nit: `populateFlags.getCommitTime(instantTime)` reads like a plain getter 
but actually returns either `instantTime` or `null` depending on the flag. 
Consider renaming these helpers (e.g. `commitTimeOrNull`, `recordKeyOrNull`, 
...) so the conditional semantics are obvious at the call site — or just inline 
the ternary here to match how `seqId` is handled two lines above.
   
   <sub><i>- AI-generated; verify before applying. React 👍/👎 to flag 
quality.</i></sub>



##########
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/io/HoodieAbstractMergeHandle.java:
##########
@@ -143,8 +144,17 @@ private void initWriteStatus(String fileId, String 
partitionPath) {
     setWriteStatusPath();
   }
 
-  private void validateAndSetAndKeyGenProps(Option<BaseKeyGenerator> 
keyGeneratorOpt, boolean populateMetaFields) {
-    ValidationUtils.checkArgument(populateMetaFields == 
!keyGeneratorOpt.isPresent());
+  private void validateAndSetAndKeyGenProps(Option<BaseKeyGenerator> 
keyGeneratorOpt,
+                                            boolean keyGeneratorRequired) {
+    // Invariant: the merge handle reconstructs the record key and/or 
partition path of old
+    // records via the supplied BaseKeyGenerator when those meta columns are 
not populated
+    // on disk - either populate.meta.fields=false or the field is in 
META_FIELDS_EXCLUDE_LIST.
+    // When both columns are populated, the values come from the base file and 
no key
+    // generator is needed.
+    ValidationUtils.checkArgument(keyGeneratorRequired == 
keyGeneratorOpt.isPresent(),
+        "Merge handle keyGenerator presence must mirror meta-field population: 
"
+            + "keyGeneratorRequired=" + keyGeneratorRequired

Review Comment:
   🤖 This assertion is strictly stronger than the previous `populateMetaFields 
== !keyGeneratorOpt.isPresent()` check in the new case where 
`populateMetaFields=true` AND a meta column is in `META_FIELDS_EXCLUDE_LIST` 
(so `keyGeneratorRequired=true`). Any merge-handle constructor call site that 
still passes `Option.empty()` for `keyGeneratorOpt` (e.g. compaction/clustering 
paths not shown in this chunk) will now fail this `checkArgument` at runtime as 
soon as a user enables exclusion. Have all merge-handle creation sites been 
audited and updated to source `keyGeneratorOpt` from 
`isKeyGeneratorRequired()`? @yihua could you confirm coverage of the compaction 
and clustering execution paths before this lands?
   
   <sub><i>- AI-generated; verify before applying. React 👍/👎 to flag 
quality.</i></sub>



##########
hudi-client/hudi-client-common/src/test/java/org/apache/hudi/avro/TestHoodieAvroParquetWriter.java:
##########
@@ -75,7 +75,8 @@ public void testProperWriting() throws IOException {
     StoragePath filePath = new 
StoragePath(tmpDir.resolve("test.parquet").toAbsolutePath().toString());
 
     try (HoodieAvroParquetWriter writer =
-             new HoodieAvroParquetWriter(filePath, parquetConfig, "001", new 
LocalTaskContextSupplier(), true)) {
+             new HoodieAvroParquetWriter(filePath, parquetConfig, "001", new 
LocalTaskContextSupplier(), true,
+                 
org.apache.hudi.common.model.HoodieMetaFieldFlags.allPopulated())) {

Review Comment:
   🤖 nit: could you add an import for `HoodieMetaFieldFlags` instead of using 
the fully qualified name? Same fix applies to `HoodieWriteableTestTable.java`, 
`TestHoodieOrcReaderWriter.java`, and 
`TestSparkBinaryCopyClusteringAndValidationMeta.java`, which all spell out 
`org.apache.hudi.common.model.HoodieMetaFieldFlags.allPopulated()` inline.
   
   <sub><i>- AI-generated; verify before applying. React 👍/👎 to flag 
quality.</i></sub>



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to