Jackie-Jiang commented on code in PR #18429: URL: https://github.com/apache/pinot/pull/18429#discussion_r3191917091
########## pinot-core/src/main/java/org/apache/pinot/core/operator/transform/function/JsonExtractScalarTransformFunction.java: ########## @@ -38,24 +39,41 @@ import org.apache.pinot.core.util.NumberUtils; import org.apache.pinot.core.util.NumericException; import org.apache.pinot.spi.data.FieldSpec.DataType; +import org.apache.pinot.spi.utils.BooleanUtils; import org.apache.pinot.spi.utils.JsonUtils; +import org.apache.pinot.spi.utils.TimestampUtils; import org.roaringbitmap.RoaringBitmap; -/** - * The <code>JsonExtractScalarTransformFunction</code> class implements the json path transformation based on - * <a href="https://goessner.net/articles/JsonPath/">Stefan Goessner JsonPath implementation.</a>. - * - * Please note, currently this method only works with String field. The values in this field should be Json String. - * - * Usage: - * jsonExtractScalar(jsonFieldName, 'jsonPath', 'resultsType') - * <code>jsonFieldName</code> is the Json String field/expression. - * <code>jsonPath</code> is a JsonPath expression which used to read from JSON document - * <code>results_type</code> refers to the results data type, could be INT, LONG, FLOAT, DOUBLE, BIG_DECIMAL, STRING, - * INT_ARRAY, LONG_ARRAY, FLOAT_ARRAY, DOUBLE_ARRAY, STRING_ARRAY. - * - */ +/// Implements the `jsonExtractScalar(jsonField, jsonPath, resultsType[, defaultValue])` transform. +/// Reads a JSON document from `jsonField` for each row, resolves the +/// [Stefan Goessner JsonPath](https://goessner.net/articles/JsonPath/) expression against it, and +/// converts the resolved value to `resultsType`. +/// +/// **Arguments:** +/// - `jsonField` — single-value `STRING` or `BYTES` column / transform expression containing JSON. +/// - `jsonPath` — JsonPath expression used to read the value. +/// - `resultsType` — Pinot data type for the output. Append `_ARRAY` for multi-value results. +/// - `defaultValue` (optional) — used when the path resolves to `null` or fails. Without it, unresolved +/// SV rows throw `IllegalArgumentException`; MV rows surface as empty arrays, but null elements within +/// a resolved array still throw. +/// +/// **Supported `resultsType`:** `INT`, `LONG`, `FLOAT`, `DOUBLE`, `BIG_DECIMAL`, `BOOLEAN`, `TIMESTAMP`, +/// `STRING`, `JSON`, `BYTES`, plus `_ARRAY` variants of `INT` / `LONG` / `FLOAT` / `DOUBLE` / +/// `BIG_DECIMAL` / `STRING`. +/// +/// **Per-row coercion** of the JsonPath result to `resultsType`: +/// - `BOOLEAN` (stored as `INT`) follows Pinot's numeric convention — any non-zero `Number` is true; +/// `Boolean` and `"true"` / `"TRUE"` / `"1"` strings (via [BooleanUtils#toInt(String)]) are also true. +/// - `TIMESTAMP` (stored as `LONG`) accepts numeric epoch millis directly; strings go through +/// [TimestampUtils#toMillisSinceEpoch] (ISO-8601 and numeric millis strings). +/// - `STRING` returns `String` values as-is; other JSON values are serialized via +/// [JsonUtils#objectToString]. +/// - `BIG_DECIMAL` and `STRING` paths use a BigDecimal-preserving JSON parser +/// (`JSON_PARSER_CONTEXT_WITH_BIG_DECIMAL`) to avoid precision loss on numeric values; other paths use +/// the default parser. +/// - Other types coerce via `Number` cast (preserved as the canonical primitive form) or +/// `parse*(toString())`. Review Comment: `///` is JEP 467 Markdown Javadoc, fully supported by the `javadoc` tool. It's the project-preferred style for new Javadoc — renders correctly in IDEs and `javadoc` HTML output. Keeping as-is. ########## pinot-core/src/main/java/org/apache/pinot/core/operator/transform/function/JsonExtractScalarTransformFunction.java: ########## @@ -437,17 +411,17 @@ public int[][] transformToIntValuesMV(ValueBlock valueBlock) { int numValues = result.size(); int[] values = new int[numValues]; for (int j = 0; j < numValues; j++) { - Integer value = result.get(j); - if (value == null) { + Object element = result.get(j); + if (element == null) { if (_defaultValue != null) { - value = ((Number) _defaultValue).intValue(); - } else { - throw new IllegalArgumentException( - "At least one of the resolved JSON arrays include nulls, which is not supported in Pinot. " - + "Consider setting a default value as the fourth argument of json_extract_scalar."); + values[j] = defaultValue; + continue; } + throw new IllegalArgumentException( + "At least one of the resolved JSON arrays include nulls, which is not supported in Pinot. " + + "Consider setting a default value as the fourth argument of json_extract_scalar."); Review Comment: The `json_extract_scalar` (snake_case) form is pre-existing from #16683 and appears identically in 6 messages — not introduced by this PR. This PR is scoped to type-coercion correctness; the cosmetic naming inconsistency is better handled as a separate one-liner touching all 6 messages at once. ########## pinot-core/src/main/java/org/apache/pinot/core/operator/transform/function/JsonExtractScalarTransformFunction.java: ########## @@ -577,23 +607,87 @@ public String[][] transformToStringValuesMV(ValueBlock valueBlock) { int numValues = result.size(); String[] values = new String[numValues]; for (int j = 0; j < numValues; j++) { - String value = result.get(j); - if (value == null) { + Object element = result.get(j); + if (element == null) { if (_defaultValue != null) { - value = _defaultValue.toString(); - } else { - throw new IllegalArgumentException( - "At least one of the resolved JSON arrays include nulls, which is not supported in Pinot. " - + "Consider setting a default value as the fourth argument of json_extract_scalar."); + values[j] = defaultValue; + continue; } + throw new IllegalArgumentException( + "At least one of the resolved JSON arrays include nulls, which is not supported in Pinot. " + + "Consider setting a default value as the fourth argument of json_extract_scalar."); } - values[j] = value; + values[j] = toString(element); } _stringValuesMV[i] = values; } return _stringValuesMV; } + private static int toInt(Object value, boolean isBoolean) { + if (isBoolean) { + if (value instanceof Boolean) { + return (Boolean) value ? 1 : 0; + } + // For BOOLEAN result, follow PinotDataType numeric convention: non-zero number → true. + if (value instanceof Number) { + return ((Number) value).doubleValue() != 0 ? 1 : 0; + } + // String fallback: BooleanUtils.toInt accepts "true" / "TRUE" / "1". + return BooleanUtils.toInt(value.toString()); + } + if (value instanceof Number) { + return ((Number) value).intValue(); + } + return Integer.parseInt(value.toString()); + } + + private static long toLong(Object value, boolean isTimestamp) { + if (value instanceof Number) { + return ((Number) value).longValue(); + } + if (isTimestamp) { + return TimestampUtils.toMillisSinceEpoch(value.toString()); + } + try { + return NumberUtils.parseJsonLong(value.toString()); + } catch (NumericException nfe) { + throw new NumberFormatException("For input string: \"" + value + "\""); + } + } + + private static float toFloat(Object value) { + if (value instanceof Number) { + return ((Number) value).floatValue(); + } + return Float.parseFloat(value.toString()); + } + + private static double toDouble(Object value) { + if (value instanceof Number) { + return ((Number) value).doubleValue(); + } + return Double.parseDouble(value.toString()); + } + + private static BigDecimal toBigDecimal(Object value) { + if (value instanceof BigDecimal) { + return (BigDecimal) value; + } + return new BigDecimal(value.toString()); + } + + private static String toString(Object value) { + if (value instanceof String) { + return (String) value; + } + try { + return JsonUtils.objectToString(value); + } catch (JsonProcessingException e) { + throw new RuntimeException("Caught exception while serializing JSON value: " + value, e); + } Review Comment: `value` here is the JsonPath result that Jackson failed to serialize — it's the only diagnostic signal a user has when serialization fails (e.g., circular reference, non-JSONable type, custom class). Truncating it would hide the actual problem; the original `JsonProcessingException` is already chained for the stack. Other JSON-serialization paths in Pinot don't truncate values in error messages either. ########## pinot-core/src/test/java/org/apache/pinot/core/operator/transform/function/JsonExtractScalarTransformFunctionTest.java: ########## @@ -653,4 +653,177 @@ public void mvWithNullsWithDefault(String resultsType, String defaultValSql) { // TODO: Change the framework to do not duplicate segments when only one segment is used .thenResultIs(expectedRow, expectedRow); // 2 rows because of segment duplication } + + // === Per-stored-type coercion tests for the value-handling helpers === + + /// Runs `SELECT jsonExtractScalar(json, '$.v', resultsType) FROM testTable` against a single-row table + /// containing the given JSON document, and asserts the result for the (always-duplicated) two + /// expected rows. Review Comment: Same as the class-header thread — `///` is JEP 467 Markdown Javadoc, the project-preferred style. No tooling issue. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
