hudi-agent commented on code in PR #18437:
URL: https://github.com/apache/hudi/pull/18437#discussion_r3285956482
##########
hudi-spark-datasource/hudi-spark/src/main/java/org/apache/hudi/cli/ArchiveExecutorUtils.java:
##########
@@ -53,12 +54,26 @@ public static int archive(JavaSparkContext jsc,
int maxCommits,
int commitsRetained,
boolean enableMetadata,
- String basePath) throws IOException {
+ String basePath,
+ Map<String, String> conf) throws IOException {
Review Comment:
🤖 nit: the parameter is a map of override options, not a single config —
could you rename `conf` to something like `extraConfigs` or `options` to match
how the caller in `ArchiveCommitsProcedure` describes it? Singular `conf` reads
as a single value.
<sub><i>- AI-generated; verify before applying. React 👍/👎 to flag
quality.</i></sub>
##########
hudi-spark-datasource/hudi-spark/src/main/java/org/apache/hudi/cli/ArchiveExecutorUtils.java:
##########
@@ -73,4 +88,4 @@ public static int archive(JavaSparkContext jsc,
}
return 0;
}
-}
+}
Review Comment:
🤖 nit: looks like the trailing newline got dropped from this file (`\ No
newline at end of file` in the diff). Could you add it back?
<sub><i>- AI-generated; verify before applying. React 👍/👎 to flag
quality.</i></sub>
##########
hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/HoodieCLIUtils.scala:
##########
@@ -107,11 +107,56 @@ object HoodieCLIUtils extends Logging {
}
}
+ /**
+ * Parse a comma-separated string of key=value pairs into a Map.
+ *
+ * Notes:
+ * - Whitespace surrounding keys/values is trimmed; empty tokens (e.g. from a
+ * trailing comma or `", ,"`) are silently ignored.
+ * - The delimiter is the first `=` in a token, so values may themselves
+ * contain `=` (e.g. `k=a=b` parses to `k -> "a=b"`).
+ * - Values cannot contain literal commas; the parser does not support
+ * escaping. Configs that need commas should be set via Spark conf instead.
+ * - If the same key appears more than once, a WARN is logged and the last
+ * occurrence wins (consistent with `toMap`'s last-write-wins semantics).
+ *
+ * @throws IllegalArgumentException if a non-empty token does not contain `=`
+ * or has an empty key.
+ */
def extractOptions(s: String): Map[String, String] = {
- StringUtils.split(s, ",").asScala
- .map(split => StringUtils.split(split, "="))
- .map(pair => pair.get(0) -> pair.get(1))
- .toMap
+ if (s == null) {
+ Map.empty
+ } else {
+ val pairs = StringUtils.split(s, ",").asScala
+ .map(_.trim)
+ .filter(_.nonEmpty)
+ .map(token => {
+ val delimiterIndex = token.indexOf('=')
+ if (delimiterIndex <= 0) {
+ throw new IllegalArgumentException(
+ s"Invalid options format: '$token'. Expected 'key=value' pairs separated by commas, "
+ + "for example: 'k1=v1,k2=v2'.")
+ }
+
+ val key = token.substring(0, delimiterIndex).trim
+ if (key.isEmpty) {
+ throw new IllegalArgumentException(
+ s"Invalid options format: '$token'. Option key must not be empty and options should "
+ + "follow 'key=value' format.")
+ }
+
+ val value = token.substring(delimiterIndex + 1).trim
+ key -> value
+ })
+ .toSeq
+
+ val duplicates = pairs.groupBy(_._1).collect { case (k, vs) if vs.size > 1 => k }
+ if (duplicates.nonEmpty) {
+ logWarning(s"Duplicate option keys detected: ${duplicates.mkString(", ")}. "
+ + "The last occurrence will take effect.")
Review Comment:
🤖 nit: `pairs` is built as a `Seq`, then traversed once for the duplicate
check and again for `.toMap`. Could you do both in a single pass, e.g. fold
over the tokens and record repeated keys while building the map, or `groupBy`
once and derive both the duplicates and the final map from it? Minor, but it
avoids the double traversal and the slightly awkward `toSeq`/`toMap` pair.
<sub><i>- AI-generated; verify before applying. React 👍/👎 to flag
quality.</i></sub>
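
One possible shape for the single-pass variant suggested in the last comment, as a rough sketch rather than the PR's actual code. It assumes plain `String.split` in place of Hudi's `StringUtils.split`, and a hypothetical `warn` helper standing in for the `logWarning` that `HoodieCLIUtils` gets from `Logging`. The object name `ExtractOptionsSketch` is also made up for illustration.

```scala
object ExtractOptionsSketch {
  // Hypothetical stand-in for logWarning from the Logging trait used in the PR.
  private def warn(msg: String): Unit = Console.err.println(s"WARN: $msg")

  def extractOptions(s: String): Map[String, String] = {
    if (s == null) {
      Map.empty
    } else {
      // Assumption: java.lang.String.split instead of Hudi's StringUtils.split.
      val tokens = s.split(",").iterator.map(_.trim).filter(_.nonEmpty)

      // Single pass: build the result map and collect repeated keys together.
      val (options, duplicates) =
        tokens.foldLeft((Map.empty[String, String], Set.empty[String])) {
          case ((acc, dups), token) =>
            val delimiterIndex = token.indexOf('=')
            // Rejects tokens without '=' and tokens that start with '='.
            if (delimiterIndex <= 0) {
              throw new IllegalArgumentException(
                s"Invalid options format: '$token'. Expected 'key=value' pairs "
                  + "separated by commas, for example: 'k1=v1,k2=v2'.")
            }
            val key = token.substring(0, delimiterIndex).trim
            val value = token.substring(delimiterIndex + 1).trim
            val newDups = if (acc.contains(key)) dups + key else dups
            // updated() overwrites, so the last occurrence of a key wins.
            (acc.updated(key, value), newDups)
        }

      if (duplicates.nonEmpty) {
        warn(s"Duplicate option keys detected: ${duplicates.mkString(", ")}. "
          + "The last occurrence will take effect.")
      }
      options
    }
  }
}
```

The fold carries the map and the duplicate set as one accumulator, so the intermediate `Seq` and the separate `groupBy` are not needed. Whether that reads better than the two-step version in the diff is a judgment call for the author.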