qlong commented on code in PR #15384:
URL: https://github.com/apache/iceberg/pull/15384#discussion_r3017234226


##########
api/src/main/java/org/apache/iceberg/expressions/PathUtil.java:
##########
@@ -83,6 +84,26 @@ public static String toNormalizedPath(Iterable<String> fields) {
             .collect(Collectors.joining(""));
   }
 
+  /**
+   * Converts a normalized path (e.g. $['a']['b']) to dot notation (e.g. $.a.b). Used when unbinding
+   * BoundExtract so the result can be passed to Expressions.extract().
+   */
+  static String toDotNotation(String normalizedPath) {
+    Preconditions.checkArgument(
+        normalizedPath != null && normalizedPath.startsWith(ROOT),
+        "Invalid normalized path: %s",
+        normalizedPath);
+    List<String> fields = Lists.newArrayList();
+    Matcher matcher = Pattern.compile("\\['([^']*)'\\]").matcher(normalizedPath);
+    while (matcher.find()) {
+      fields.add(matcher.group(1));
+    }
+    if (fields.isEmpty()) {
+      return ROOT;
+    }
+    return ROOT + "." + String.join(".", fields);

Review Comment:
   Good call out. JSON field names can contain dots, and you are right: toDotNotation("$['user.name']") breaks the round trip, because a dot-style path cannot represent "user.name" as a single segment.
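To make the round-trip problem concrete, here is a small standalone sketch. It copies the toDotNotation logic from the diff (assuming ROOT is "$") and pairs it with a hypothetical parseDotPath helper that just splits a dot path on '.'. That helper is a simplified stand-in for illustration, not Iceberg's actual parser.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DotNotationRoundTrip {
  // Assumption: PathUtil.ROOT is "$".
  static final String ROOT = "$";
  private static final Pattern SEGMENT = Pattern.compile("\\['([^']*)'\\]");

  // Same logic as toDotNotation in the diff: extract each ['...'] segment and join with dots.
  static String toDotNotation(String normalizedPath) {
    List<String> fields = new ArrayList<>();
    Matcher matcher = SEGMENT.matcher(normalizedPath);
    while (matcher.find()) {
      fields.add(matcher.group(1));
    }
    return fields.isEmpty() ? ROOT : ROOT + "." + String.join(".", fields);
  }

  // Hypothetical dot-only parser: everything after "$." is split on every dot.
  static List<String> parseDotPath(String path) {
    if (path.equals(ROOT)) {
      return List.of();
    }
    return Arrays.asList(path.substring(ROOT.length() + 1).split("\\.", -1));
  }

  public static void main(String[] args) {
    String dot = toDotNotation("$['user.name']");
    System.out.println(dot);               // $.user.name
    System.out.println(parseDotPath(dot)); // [user, name]
  }
}
```

The single field "user.name" comes back as two segments, user and name, so the original path cannot be recovered from the dot form.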
   
   Digging into the code further: Iceberg only supports dot-style paths for variant (https://github.com/apache/iceberg/blob/main/api/src/main/java/org/apache/iceberg/expressions/PathUtil.java#L55-L56) and internally converts them to bracket style (https://github.com/apache/iceberg/pull/12835). I think this is a fairly strong limitation, since it rules out field names that contain a dot. Spark supports bracket, dot, and mixed-style paths for variant, so $.employee['user.name'] works there. I think we should support those styles as well. I can add that to this PR, since the current change is small.
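A rough sketch of what mixed-style support could look like, for discussion only. MixedPathParser, parse, and toNormalized are hypothetical names, not existing Iceberg or Spark APIs. It accepts dot segments and single-quoted bracket segments in any order and emits the bracket-style normalized form. Array indices, double-quoted brackets, and escaped quotes inside names are deliberately left out.

```java
import java.util.ArrayList;
import java.util.List;

public class MixedPathParser {
  // Hypothetical parser for "$" followed by any mix of ".name" and "['name']" segments.
  // Out of scope for this sketch: array indices, ["name"], and names containing "']".
  static List<String> parse(String path) {
    if (path == null || !path.startsWith("$")) {
      throw new IllegalArgumentException("Invalid path: " + path);
    }
    List<String> fields = new ArrayList<>();
    int pos = 1;
    while (pos < path.length()) {
      char c = path.charAt(pos);
      if (c == '.') {
        // Dot segment: runs until the next '.' or '['.
        int end = pos + 1;
        while (end < path.length() && path.charAt(end) != '.' && path.charAt(end) != '[') {
          end++;
        }
        if (end == pos + 1) {
          throw new IllegalArgumentException("Empty field name at " + pos + ": " + path);
        }
        fields.add(path.substring(pos + 1, end));
        pos = end;
      } else if (path.startsWith("['", pos)) {
        // Bracket segment: everything up to the closing "']", dots included.
        int close = path.indexOf("']", pos + 2);
        if (close < 0) {
          throw new IllegalArgumentException("Unterminated bracket at " + pos + ": " + path);
        }
        fields.add(path.substring(pos + 2, close));
        pos = close + 2;
      } else {
        throw new IllegalArgumentException("Unexpected character at " + pos + ": " + path);
      }
    }
    return fields;
  }

  // Renders fields in bracket-style normalized form, which keeps dots inside a segment.
  static String toNormalized(List<String> fields) {
    StringBuilder sb = new StringBuilder("$");
    for (String field : fields) {
      sb.append("['").append(field).append("']");
    }
    return sb.toString();
  }

  public static void main(String[] args) {
    List<String> fields = parse("$.employee['user.name']");
    System.out.println(fields);               // [employee, user.name]
    System.out.println(toNormalized(fields)); // $['employee']['user.name']
  }
}
```

Because the bracket form keeps "user.name" as one segment, parse(toNormalized(fields)) returns the same fields for names without a "']" sequence, which is the round-trip property the dot form lacks.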
   
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
