Copilot commented on code in PR #18429: URL: https://github.com/apache/pinot/pull/18429#discussion_r3191821129
########## pinot-core/src/main/java/org/apache/pinot/core/operator/transform/function/JsonExtractScalarTransformFunction.java: ########## @@ -38,24 +39,41 @@ import org.apache.pinot.core.util.NumberUtils; import org.apache.pinot.core.util.NumericException; import org.apache.pinot.spi.data.FieldSpec.DataType; +import org.apache.pinot.spi.utils.BooleanUtils; import org.apache.pinot.spi.utils.JsonUtils; +import org.apache.pinot.spi.utils.TimestampUtils; import org.roaringbitmap.RoaringBitmap; -/** - * The <code>JsonExtractScalarTransformFunction</code> class implements the json path transformation based on - * <a href="https://goessner.net/articles/JsonPath/">Stefan Goessner JsonPath implementation.</a>. - * - * Please note, currently this method only works with String field. The values in this field should be Json String. - * - * Usage: - * jsonExtractScalar(jsonFieldName, 'jsonPath', 'resultsType') - * <code>jsonFieldName</code> is the Json String field/expression. - * <code>jsonPath</code> is a JsonPath expression which used to read from JSON document - * <code>results_type</code> refers to the results data type, could be INT, LONG, FLOAT, DOUBLE, BIG_DECIMAL, STRING, - * INT_ARRAY, LONG_ARRAY, FLOAT_ARRAY, DOUBLE_ARRAY, STRING_ARRAY. - * - */ +/// Implements the `jsonExtractScalar(jsonField, jsonPath, resultsType[, defaultValue])` transform. +/// Reads a JSON document from `jsonField` for each row, resolves the +/// [Stefan Goessner JsonPath](https://goessner.net/articles/JsonPath/) expression against it, and +/// converts the resolved value to `resultsType`. +/// +/// **Arguments:** +/// - `jsonField` — single-value `STRING` or `BYTES` column / transform expression containing JSON. +/// - `jsonPath` — JsonPath expression used to read the value. +/// - `resultsType` — Pinot data type for the output. Append `_ARRAY` for multi-value results. +/// - `defaultValue` (optional) — used when the path resolves to `null` or fails. Without it, unresolved +/// SV rows throw `IllegalArgumentException`; MV rows surface as empty arrays, but null elements within +/// a resolved array still throw. +/// +/// **Supported `resultsType`:** `INT`, `LONG`, `FLOAT`, `DOUBLE`, `BIG_DECIMAL`, `BOOLEAN`, `TIMESTAMP`, +/// `STRING`, `JSON`, `BYTES`, plus `_ARRAY` variants of `INT` / `LONG` / `FLOAT` / `DOUBLE` / +/// `BIG_DECIMAL` / `STRING`. +/// +/// **Per-row coercion** of the JsonPath result to `resultsType`: +/// - `BOOLEAN` (stored as `INT`) follows Pinot's numeric convention — any non-zero `Number` is true; +/// `Boolean` and `"true"` / `"TRUE"` / `"1"` strings (via [BooleanUtils#toInt(String)]) are also true. +/// - `TIMESTAMP` (stored as `LONG`) accepts numeric epoch millis directly; strings go through +/// [TimestampUtils#toMillisSinceEpoch] (ISO-8601 and numeric millis strings). +/// - `STRING` returns `String` values as-is; other JSON values are serialized via +/// [JsonUtils#objectToString]. +/// - `BIG_DECIMAL` and `STRING` paths use a BigDecimal-preserving JSON parser +/// (`JSON_PARSER_CONTEXT_WITH_BIG_DECIMAL`) to avoid precision loss on numeric values; other paths use +/// the default parser. +/// - Other types coerce via `Number` cast (preserved as the canonical primitive form) or +/// `parse*(toString())`. Review Comment: `///` comments aren’t JavaDoc and typically won’t be picked up by JavaDoc tooling (and may violate checkstyle conventions if the project enforces JavaDoc on public classes). Converting this header back to a `/** ... */` JavaDoc block would preserve generated API docs and keep comment style consistent with typical Java conventions. ########## pinot-core/src/main/java/org/apache/pinot/core/operator/transform/function/JsonExtractScalarTransformFunction.java: ########## @@ -437,17 +411,17 @@ public int[][] transformToIntValuesMV(ValueBlock valueBlock) { int numValues = result.size(); int[] values = new int[numValues]; for (int j = 0; j < numValues; j++) { - Integer value = result.get(j); - if (value == null) { + Object element = result.get(j); + if (element == null) { if (_defaultValue != null) { - value = ((Number) _defaultValue).intValue(); - } else { - throw new IllegalArgumentException( - "At least one of the resolved JSON arrays include nulls, which is not supported in Pinot. " - + "Consider setting a default value as the fourth argument of json_extract_scalar."); + values[j] = defaultValue; + continue; } + throw new IllegalArgumentException( + "At least one of the resolved JSON arrays include nulls, which is not supported in Pinot. " + + "Consider setting a default value as the fourth argument of json_extract_scalar."); Review Comment: The error message references `json_extract_scalar`, but the function name (and docs/tests) use `jsonExtractScalar`. This is user-facing and can be confusing; update the message to consistently use `jsonExtractScalar(...)` (and consider keeping the phrasing consistent across SV/MV paths where this message appears). ########## pinot-core/src/main/java/org/apache/pinot/core/operator/transform/function/JsonExtractScalarTransformFunction.java: ########## @@ -577,23 +607,87 @@ public String[][] transformToStringValuesMV(ValueBlock valueBlock) { int numValues = result.size(); String[] values = new String[numValues]; for (int j = 0; j < numValues; j++) { - String value = result.get(j); - if (value == null) { + Object element = result.get(j); + if (element == null) { if (_defaultValue != null) { - value = _defaultValue.toString(); - } else { - throw new IllegalArgumentException( - "At least one of the resolved JSON arrays include nulls, which is not supported in Pinot. " - + "Consider setting a default value as the fourth argument of json_extract_scalar."); + values[j] = defaultValue; + continue; } + throw new IllegalArgumentException( + "At least one of the resolved JSON arrays include nulls, which is not supported in Pinot. " + + "Consider setting a default value as the fourth argument of json_extract_scalar."); } - values[j] = value; + values[j] = toString(element); } _stringValuesMV[i] = values; } return _stringValuesMV; } + private static int toInt(Object value, boolean isBoolean) { + if (isBoolean) { + if (value instanceof Boolean) { + return (Boolean) value ? 1 : 0; + } + // For BOOLEAN result, follow PinotDataType numeric convention: non-zero number → true. + if (value instanceof Number) { + return ((Number) value).doubleValue() != 0 ? 1 : 0; + } + // String fallback: BooleanUtils.toInt accepts "true" / "TRUE" / "1". + return BooleanUtils.toInt(value.toString()); + } + if (value instanceof Number) { + return ((Number) value).intValue(); + } + return Integer.parseInt(value.toString()); + } + + private static long toLong(Object value, boolean isTimestamp) { + if (value instanceof Number) { + return ((Number) value).longValue(); + } + if (isTimestamp) { + return TimestampUtils.toMillisSinceEpoch(value.toString()); + } + try { + return NumberUtils.parseJsonLong(value.toString()); + } catch (NumericException nfe) { + throw new NumberFormatException("For input string: \"" + value + "\""); + } + } + + private static float toFloat(Object value) { + if (value instanceof Number) { + return ((Number) value).floatValue(); + } + return Float.parseFloat(value.toString()); + } + + private static double toDouble(Object value) { + if (value instanceof Number) { + return ((Number) value).doubleValue(); + } + return Double.parseDouble(value.toString()); + } + + private static BigDecimal toBigDecimal(Object value) { + if (value instanceof BigDecimal) { + return (BigDecimal) value; + } + return new BigDecimal(value.toString()); + } + + private static String toString(Object value) { + if (value instanceof String) { + return (String) value; + } + try { + return JsonUtils.objectToString(value); + } catch (JsonProcessingException e) { + throw new RuntimeException("Caught exception while serializing JSON value: " + value, e); + } Review Comment: Including the full `value` in the exception message can leak sensitive row data into query errors/logs and can also create very large error payloads if the value is a big object/array. Prefer omitting the raw value or truncating it (e.g., include only type/size or a capped-length string) while still chaining the original exception. ########## pinot-core/src/test/java/org/apache/pinot/core/operator/transform/function/JsonExtractScalarTransformFunctionTest.java: ########## @@ -653,4 +653,177 @@ public void mvWithNullsWithDefault(String resultsType, String defaultValSql) { // TODO: Change the framework to do not duplicate segments when only one segment is used .thenResultIs(expectedRow, expectedRow); // 2 rows because of segment duplication } + + // === Per-stored-type coercion tests for the value-handling helpers === + + /// Runs `SELECT jsonExtractScalar(json, '$.v', resultsType) FROM testTable` against a single-row table + /// containing the given JSON document, and asserts the result for the (always-duplicated) two + /// expected rows. Review Comment: The test introduces `///` comments, which aren’t standard JavaDoc and may not match the project’s comment/style conventions for Java tests. Consider switching these to `//` (or `/** ... */` if you want JavaDoc-style formatting on helpers) to avoid style/lint issues and keep consistency across the test suite. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
