hudi-agent commented on code in PR #18437:
URL: https://github.com/apache/hudi/pull/18437#discussion_r3274632439


##########
hudi-spark-datasource/hudi-spark/src/main/java/org/apache/hudi/cli/ArchiveExecutorUtils.java:
##########
@@ -53,12 +54,26 @@ public static int archive(JavaSparkContext jsc,
                             int maxCommits,
                             int commitsRetained,
                             boolean enableMetadata,
-                            String basePath) throws IOException {
+                            String basePath,
+                            Map<String, String> conf) throws IOException {
+    // NOTE on builder ordering:
+    //   `withArchivalConfig`/`withCleanConfig`/`withMetadataConfig` each call
+    //   `putAll(subConfig.getProps())` onto `writeConfig.getProps()`, which
+    //   includes every key filled in by `setDefaults` during the sub-config's
+    //   `build()`. If `withProps(conf)` ran BEFORE them, those defaults would
+    //   overwrite the user's options (e.g. `hoodie.keep.min.commits`).
+    //
+    //   Therefore `withProps(conf)` is intentionally placed LAST so 
user-supplied
+    //   options reliably win over sub-config defaults. Named procedure params
+    //   (min/max/retain/enableMetadata) are forwarded via the dedicated 
builders
+    //   below; if the caller wants those to win over a same-name key in 
`conf`,
+    //   the procedure layer is responsible for not putting that key into 
`conf`.
     HoodieWriteConfig config = 
HoodieWriteConfig.newBuilder().withPath(basePath)
         
.withArchivalConfig(HoodieArchivalConfig.newBuilder().archiveCommitsWith(minCommits,
 maxCommits).build())
         
.withCleanConfig(HoodieCleanConfig.newBuilder().retainCommits(commitsRetained).build())
         .withEmbeddedTimelineServerEnabled(false)
         
.withMetadataConfig(HoodieMetadataConfig.newBuilder().enable(enableMetadata).build())
+        .withProps(conf)
         .build();

Review Comment:
   _⚠️ Potential issue_ | _🟡 Minor_ | _⚡ Quick win_
   
   <details>
   <summary>🧩 Analysis chain</summary>
   
   🏁 Script executed:
   
   ```shell
   #!/bin/bash
   set -euo pipefail
   
   # Verify whether withProps(Map<...>) handles null safely in this codebase.
   rg -n --type=java -C6 '\bwithProps\s*\(\s*Map<[^>]+>\s+\w+\s*\)' 
   rg -n --type=java -C6 
'\bwithProps\s*\(|\bputAll\s*\(|requireNonNull|Objects\.requireNonNull'
   ```
   
   Repository: hudi-agent/hudi
   
   Length of output: 41
   
   ---
   
   🏁 Script executed:
   
   ```shell
   #!/bin/bash
   # Find ArchiveExecutorUtils.java
   find . -name "ArchiveExecutorUtils.java" -type f
   ```
   
   Repository: hudi-agent/hudi
   
   Length of output: 154
   
   ---
   
   🏁 Script executed:
   
   ```shell
   #!/bin/bash
   # Search for HoodieWriteConfig and withProps
   find . -name "HoodieWriteConfig.java" -type f
   ```
   
   Repository: hudi-agent/hudi
   
   Length of output: 152
   
   ---
   
   🏁 Script executed:
   
   ```shell
   #!/bin/bash
   # Broader search for withProps method
   rg -l "withProps" --type java
   ```
   
   Repository: hudi-agent/hudi
   
   Length of output: 8364
   
   ---
   
   🏁 Script executed:
   
   ```shell
   #!/bin/bash
   # Read ArchiveExecutorUtils.java to see the archive method signature and 
withProps call
   cat -n 
./hudi-spark-datasource/hudi-spark/src/main/java/org/apache/hudi/cli/ArchiveExecutorUtils.java
 | head -100
   ```
   
   Repository: hudi-agent/hudi
   
   Length of output: 5033
   
   ---
   
   🏁 Script executed:
   
   ```shell
   #!/bin/bash
   # Read HoodieWriteConfig.java to find the withProps method implementation
   rg -A 10 "public.*withProps\s*\(" 
./hudi-client/hudi-client-common/src/main/java/org/apache/hudi/config/HoodieWriteConfig.java
   ```
   
   Repository: hudi-agent/hudi
   
   Length of output: 349
   
   ---
   
   🏁 Script executed:
   
   ```shell
   #!/bin/bash
   # Find callers of the archive method
   rg -B 3 -A 3 "ArchiveExecutorUtils\.archive\(" --type java
   ```
   
   Repository: hudi-agent/hudi
   
   Length of output: 872
   
   ---
   
   
   
   </details>
   
   **Guard `conf` before calling `withProps`.**
   
   The `HoodieWriteConfig.withProps(Map)` method does not null-check its 
argument before calling `putAll()`. If `archive()` is called with `conf = 
null`, it will fail with a NullPointerException at runtime. Normalize to an 
empty map for robustness.
   
   <details>
   <summary>Suggested patch</summary>
   
   ```diff
   +import java.util.Collections;
    import java.util.Map;
   @@
      public static int archive(JavaSparkContext jsc,
                                int minCommits,
                                int maxCommits,
                                int commitsRetained,
                                boolean enableMetadata,
                                String basePath,
                                Map<String, String> conf) throws IOException {
   +    Map<String, String> effectiveConf = conf == null ? 
Collections.emptyMap() : conf;
   @@
        HoodieWriteConfig config = 
HoodieWriteConfig.newBuilder().withPath(basePath)
   @@
   -        .withProps(conf)
   +        .withProps(effectiveConf)
            .build();
   ```
   </details>
   
   <details>
   <summary>🤖 Prompt for AI Agents</summary>
   
   ```
   Verify each finding against current code. Fix only still-valid issues, skip 
the
   rest with a brief reason, keep changes minimal, and validate.
   
   In
   
`@hudi-spark-datasource/hudi-spark/src/main/java/org/apache/hudi/cli/ArchiveExecutorUtils.java`
   around lines 58 - 77, The call to HoodieWriteConfig.newBuilder() uses
   withProps(conf) without null-checking conf, which allows a 
NullPointerException
   when archive(...) is invoked with conf == null; update the 
ArchiveExecutorUtils
   method creating the HoodieWriteConfig to normalize conf before passing it 
into
   withProps (e.g., replace the direct withProps(conf) call with one that 
passes a
   non-null Map by using conf == null ? Collections.emptyMap() : conf or a new
   HashMap<>(conf)), ensuring you import Collections if needed and leaving all
   other builder calls (withArchivalConfig, withCleanConfig, withMetadataConfig)
   unchanged.
   ```
   
   </details>
   
   <!-- fingerprinting:phantom:poseidon:hawk -->
   
   <!-- This is an auto-generated comment by CodeRabbit -->
   
   — *CodeRabbit* 
([original](https://github.com/hudi-agent/hudi/pull/35#discussion_r3274631721)) 
(source:comment#3274631721)



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to