Copilot commented on code in PR #18429:
URL: https://github.com/apache/pinot/pull/18429#discussion_r3191821129


##########
pinot-core/src/main/java/org/apache/pinot/core/operator/transform/function/JsonExtractScalarTransformFunction.java:
##########
@@ -38,24 +39,41 @@
 import org.apache.pinot.core.util.NumberUtils;
 import org.apache.pinot.core.util.NumericException;
 import org.apache.pinot.spi.data.FieldSpec.DataType;
+import org.apache.pinot.spi.utils.BooleanUtils;
 import org.apache.pinot.spi.utils.JsonUtils;
+import org.apache.pinot.spi.utils.TimestampUtils;
 import org.roaringbitmap.RoaringBitmap;
 
 
-/**
- * The <code>JsonExtractScalarTransformFunction</code> class implements the 
json path transformation based on
- * <a href="https://goessner.net/articles/JsonPath/";>Stefan Goessner JsonPath 
implementation.</a>.
- *
- * Please note, currently this method only works with String field. The values 
in this field should be Json String.
- *
- * Usage:
- * jsonExtractScalar(jsonFieldName, 'jsonPath', 'resultsType')
- * <code>jsonFieldName</code> is the Json String field/expression.
- * <code>jsonPath</code> is a JsonPath expression which used to read from JSON 
document
- * <code>results_type</code> refers to the results data type, could be INT, 
LONG, FLOAT, DOUBLE, BIG_DECIMAL, STRING,
- * INT_ARRAY, LONG_ARRAY, FLOAT_ARRAY, DOUBLE_ARRAY, STRING_ARRAY.
- *
- */
+/// Implements the `jsonExtractScalar(jsonField, jsonPath, resultsType[, 
defaultValue])` transform.
+/// Reads a JSON document from `jsonField` for each row, resolves the
+/// [Stefan Goessner JsonPath](https://goessner.net/articles/JsonPath/) 
expression against it, and
+/// converts the resolved value to `resultsType`.
+///
+/// **Arguments:**
+/// - `jsonField` — single-value `STRING` or `BYTES` column / transform 
expression containing JSON.
+/// - `jsonPath` — JsonPath expression used to read the value.
+/// - `resultsType` — Pinot data type for the output. Append `_ARRAY` for 
multi-value results.
+/// - `defaultValue` (optional) — used when the path resolves to `null` or 
fails. Without it, unresolved
+///   SV rows throw `IllegalArgumentException`; MV rows surface as empty 
arrays, but null elements within
+///   a resolved array still throw.
+///
+/// **Supported `resultsType`:** `INT`, `LONG`, `FLOAT`, `DOUBLE`, 
`BIG_DECIMAL`, `BOOLEAN`, `TIMESTAMP`,
+/// `STRING`, `JSON`, `BYTES`, plus `_ARRAY` variants of `INT` / `LONG` / 
`FLOAT` / `DOUBLE` /
+/// `BIG_DECIMAL` / `STRING`.
+///
+/// **Per-row coercion** of the JsonPath result to `resultsType`:
+/// - `BOOLEAN` (stored as `INT`) follows Pinot's numeric convention — any 
non-zero `Number` is true;
+///   `Boolean` and `"true"` / `"TRUE"` / `"1"` strings (via 
[BooleanUtils#toInt(String)]) are also true.
+/// - `TIMESTAMP` (stored as `LONG`) accepts numeric epoch millis directly; 
strings go through
+///   [TimestampUtils#toMillisSinceEpoch] (ISO-8601 and numeric millis 
strings).
+/// - `STRING` returns `String` values as-is; other JSON values are serialized 
via
+///   [JsonUtils#objectToString].
+/// - `BIG_DECIMAL` and `STRING` paths use a BigDecimal-preserving JSON parser
+///   (`JSON_PARSER_CONTEXT_WITH_BIG_DECIMAL`) to avoid precision loss on 
numeric values; other paths use
+///   the default parser.
+/// - Other types coerce via `Number` cast (preserved as the canonical 
primitive form) or
+///   `parse*(toString())`.

Review Comment:
   `///` comments aren’t JavaDoc and typically won’t be picked up by JavaDoc 
tooling (and may violate checkstyle conventions if the project enforces JavaDoc 
on public classes). Converting this header back to a `/** ... */` JavaDoc block 
would preserve generated API docs and keep comment style consistent with 
typical Java conventions.
   



##########
pinot-core/src/main/java/org/apache/pinot/core/operator/transform/function/JsonExtractScalarTransformFunction.java:
##########
@@ -437,17 +411,17 @@ public int[][] transformToIntValuesMV(ValueBlock 
valueBlock) {
       int numValues = result.size();
       int[] values = new int[numValues];
       for (int j = 0; j < numValues; j++) {
-        Integer value = result.get(j);
-        if (value == null) {
+        Object element = result.get(j);
+        if (element == null) {
           if (_defaultValue != null) {
-            value = ((Number) _defaultValue).intValue();
-          } else {
-            throw new IllegalArgumentException(
-                "At least one of the resolved JSON arrays include nulls, which 
is not supported in Pinot. "
-                    + "Consider setting a default value as the fourth argument 
of json_extract_scalar.");
+            values[j] = defaultValue;
+            continue;
           }
+          throw new IllegalArgumentException(
+              "At least one of the resolved JSON arrays include nulls, which 
is not supported in Pinot. "
+                  + "Consider setting a default value as the fourth argument 
of json_extract_scalar.");

Review Comment:
   The error message references `json_extract_scalar`, but the function name 
(and docs/tests) use `jsonExtractScalar`. This is user-facing and can be 
confusing; update the message to consistently use `jsonExtractScalar(...)` (and 
consider keeping the phrasing consistent across SV/MV paths where this message 
appears).
   



##########
pinot-core/src/main/java/org/apache/pinot/core/operator/transform/function/JsonExtractScalarTransformFunction.java:
##########
@@ -577,23 +607,87 @@ public String[][] transformToStringValuesMV(ValueBlock 
valueBlock) {
       int numValues = result.size();
       String[] values = new String[numValues];
       for (int j = 0; j < numValues; j++) {
-        String value = result.get(j);
-        if (value == null) {
+        Object element = result.get(j);
+        if (element == null) {
           if (_defaultValue != null) {
-            value = _defaultValue.toString();
-          } else {
-            throw new IllegalArgumentException(
-                "At least one of the resolved JSON arrays include nulls, which 
is not supported in Pinot. "
-                    + "Consider setting a default value as the fourth argument 
of json_extract_scalar.");
+            values[j] = defaultValue;
+            continue;
           }
+          throw new IllegalArgumentException(
+              "At least one of the resolved JSON arrays include nulls, which 
is not supported in Pinot. "
+                  + "Consider setting a default value as the fourth argument 
of json_extract_scalar.");
         }
-        values[j] = value;
+        values[j] = toString(element);
       }
       _stringValuesMV[i] = values;
     }
     return _stringValuesMV;
   }
 
+  private static int toInt(Object value, boolean isBoolean) {
+    if (isBoolean) {
+      if (value instanceof Boolean) {
+        return (Boolean) value ? 1 : 0;
+      }
+      // For BOOLEAN result, follow PinotDataType numeric convention: non-zero 
number → true.
+      if (value instanceof Number) {
+        return ((Number) value).doubleValue() != 0 ? 1 : 0;
+      }
+      // String fallback: BooleanUtils.toInt accepts "true" / "TRUE" / "1".
+      return BooleanUtils.toInt(value.toString());
+    }
+    if (value instanceof Number) {
+      return ((Number) value).intValue();
+    }
+    return Integer.parseInt(value.toString());
+  }
+
+  private static long toLong(Object value, boolean isTimestamp) {
+    if (value instanceof Number) {
+      return ((Number) value).longValue();
+    }
+    if (isTimestamp) {
+      return TimestampUtils.toMillisSinceEpoch(value.toString());
+    }
+    try {
+      return NumberUtils.parseJsonLong(value.toString());
+    } catch (NumericException nfe) {
+      throw new NumberFormatException("For input string: \"" + value + "\"");
+    }
+  }
+
+  private static float toFloat(Object value) {
+    if (value instanceof Number) {
+      return ((Number) value).floatValue();
+    }
+    return Float.parseFloat(value.toString());
+  }
+
+  private static double toDouble(Object value) {
+    if (value instanceof Number) {
+      return ((Number) value).doubleValue();
+    }
+    return Double.parseDouble(value.toString());
+  }
+
+  private static BigDecimal toBigDecimal(Object value) {
+    if (value instanceof BigDecimal) {
+      return (BigDecimal) value;
+    }
+    return new BigDecimal(value.toString());
+  }
+
+  private static String toString(Object value) {
+    if (value instanceof String) {
+      return (String) value;
+    }
+    try {
+      return JsonUtils.objectToString(value);
+    } catch (JsonProcessingException e) {
+      throw new RuntimeException("Caught exception while serializing JSON 
value: " + value, e);
+    }

Review Comment:
   Including the full `value` in the exception message can leak sensitive row 
data into query errors/logs and can also create very large error payloads if 
the value is a big object/array. Prefer omitting the raw value or truncating it 
(e.g., include only type/size or a capped-length string) while still chaining 
the original exception.



##########
pinot-core/src/test/java/org/apache/pinot/core/operator/transform/function/JsonExtractScalarTransformFunctionTest.java:
##########
@@ -653,4 +653,177 @@ public void mvWithNullsWithDefault(String resultsType, 
String defaultValSql) {
         // TODO: Change the framework to do not duplicate segments when only 
one segment is used
         .thenResultIs(expectedRow, expectedRow); // 2 rows because of segment 
duplication
   }
+
+  // === Per-stored-type coercion tests for the value-handling helpers ===
+
+  /// Runs `SELECT jsonExtractScalar(json, '$.v', resultsType) FROM testTable` 
against a single-row table
+  /// containing the given JSON document, and asserts the result for the 
(always-duplicated) two
+  /// expected rows.

Review Comment:
   The test introduces `///` comments, which aren’t standard JavaDoc and may 
not match the project’s comment/style conventions for Java tests. Consider 
switching these to `//` (or `/** ... */` if you want JavaDoc-style formatting 
on helpers) to avoid style/lint issues and keep consistency across the test 
suite.
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to