hudi-agent commented on code in PR #18437:
URL: https://github.com/apache/hudi/pull/18437#discussion_r3274632439
##########
hudi-spark-datasource/hudi-spark/src/main/java/org/apache/hudi/cli/ArchiveExecutorUtils.java:
##########
@@ -53,12 +54,26 @@ public static int archive(JavaSparkContext jsc,
int maxCommits,
int commitsRetained,
boolean enableMetadata,
- String basePath) throws IOException {
+ String basePath,
+ Map<String, String> conf) throws IOException {
+ // NOTE on builder ordering:
+ // `withArchivalConfig`/`withCleanConfig`/`withMetadataConfig` each call
+ // `putAll(subConfig.getProps())` onto `writeConfig.getProps()`, which
+ // includes every key filled in by `setDefaults` during the sub-config's
+ // `build()`. If `withProps(conf)` ran BEFORE them, those defaults would
+ // overwrite the user's options (e.g. `hoodie.keep.min.commits`).
+ //
+ // Therefore `withProps(conf)` is intentionally placed LAST so
user-supplied
+ // options reliably win over sub-config defaults. Named procedure params
+ // (min/max/retain/enableMetadata) are forwarded via the dedicated
builders
+ // below; if the caller wants those to win over a same-name key in
`conf`,
+ // the procedure layer is responsible for not putting that key into
`conf`.
HoodieWriteConfig config =
HoodieWriteConfig.newBuilder().withPath(basePath)
.withArchivalConfig(HoodieArchivalConfig.newBuilder().archiveCommitsWith(minCommits,
maxCommits).build())
.withCleanConfig(HoodieCleanConfig.newBuilder().retainCommits(commitsRetained).build())
.withEmbeddedTimelineServerEnabled(false)
.withMetadataConfig(HoodieMetadataConfig.newBuilder().enable(enableMetadata).build())
+ .withProps(conf)
.build();
Review Comment:
_⚠️ Potential issue_ | _🟡 Minor_ | _⚡ Quick win_
<details>
<summary>🧩 Analysis chain</summary>
🏁 Script executed:
```shell
#!/bin/bash
set -euo pipefail
# Verify whether withProps(Map<...>) handles null safely in this codebase.
rg -n --type=java -C6 '\bwithProps\s*\(\s*Map<[^>]+>\s+\w+\s*\)'
rg -n --type=java -C6
'\bwithProps\s*\(|\bputAll\s*\(|requireNonNull|Objects\.requireNonNull'
```
Repository: hudi-agent/hudi
Length of output: 41
---
🏁 Script executed:
```shell
#!/bin/bash
# Find ArchiveExecutorUtils.java
find . -name "ArchiveExecutorUtils.java" -type f
```
Repository: hudi-agent/hudi
Length of output: 154
---
🏁 Script executed:
```shell
#!/bin/bash
# Search for HoodieWriteConfig and withProps
find . -name "HoodieWriteConfig.java" -type f
```
Repository: hudi-agent/hudi
Length of output: 152
---
🏁 Script executed:
```shell
#!/bin/bash
# Broader search for withProps method
rg -l "withProps" --type java
```
Repository: hudi-agent/hudi
Length of output: 8364
---
🏁 Script executed:
```shell
#!/bin/bash
# Read ArchiveExecutorUtils.java to see the archive method signature and
withProps call
cat -n
./hudi-spark-datasource/hudi-spark/src/main/java/org/apache/hudi/cli/ArchiveExecutorUtils.java
| head -100
```
Repository: hudi-agent/hudi
Length of output: 5033
---
🏁 Script executed:
```shell
#!/bin/bash
# Read HoodieWriteConfig.java to find the withProps method implementation
rg -A 10 "public.*withProps\s*\("
./hudi-client/hudi-client-common/src/main/java/org/apache/hudi/config/HoodieWriteConfig.java
```
Repository: hudi-agent/hudi
Length of output: 349
---
🏁 Script executed:
```shell
#!/bin/bash
# Find callers of the archive method
rg -B 3 -A 3 "ArchiveExecutorUtils\.archive\(" --type java
```
Repository: hudi-agent/hudi
Length of output: 872
---
</details>
**Guard `conf` before calling `withProps`.**
The `HoodieWriteConfig.withProps(Map)` method does not null-check its
argument before calling `putAll()`. If `archive()` is called with `conf =
null`, it will fail with a NullPointerException at runtime. Normalize to an
empty map for robustness.
<details>
<summary>Suggested patch</summary>
```diff
+import java.util.Collections;
import java.util.Map;
@@
public static int archive(JavaSparkContext jsc,
int minCommits,
int maxCommits,
int commitsRetained,
boolean enableMetadata,
String basePath,
Map<String, String> conf) throws IOException {
+ Map<String, String> effectiveConf = conf == null ?
Collections.emptyMap() : conf;
@@
HoodieWriteConfig config =
HoodieWriteConfig.newBuilder().withPath(basePath)
@@
- .withProps(conf)
+ .withProps(effectiveConf)
.build();
```
</details>
<details>
<summary>🤖 Prompt for AI Agents</summary>
```
Verify each finding against current code. Fix only still-valid issues, skip
the
rest with a brief reason, keep changes minimal, and validate.
In
`@hudi-spark-datasource/hudi-spark/src/main/java/org/apache/hudi/cli/ArchiveExecutorUtils.java`
around lines 58 - 77, The call to HoodieWriteConfig.newBuilder() uses
withProps(conf) without null-checking conf, which allows a
NullPointerException
when archive(...) is invoked with conf == null; update the
ArchiveExecutorUtils
method creating the HoodieWriteConfig to normalize conf before passing it
into
withProps (e.g., replace the direct withProps(conf) call with one that
passes a
non-null Map by using conf == null ? Collections.emptyMap() : conf or a new
HashMap<>(conf)), ensuring you import Collections if needed and leaving all
other builder calls (withArchivalConfig, withCleanConfig, withMetadataConfig)
unchanged.
```
</details>
<!-- fingerprinting:phantom:poseidon:hawk -->
<!-- This is an auto-generated comment by CodeRabbit -->
— *CodeRabbit*
([original](https://github.com/hudi-agent/hudi/pull/35#discussion_r3274631721))
(source:comment#3274631721)
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]