Jackie-Jiang commented on PR #18428:
URL: https://github.com/apache/pinot/pull/18428#issuecomment-4383420609
Summarizing replies to Copilot's review:
## Re: `Class<?>` API removal (3 comments on `getSingleValueType` /
`getMultiValueType` / `getArgumentType`)
Deliberate. The old `Class<?>`-based API was the source of the bug this PR
fixes — exact-class matching against `Timestamp.class` silently miscategorized
vendor JDBC subclasses (e.g. BigQuery Simba's `TimestampTz`) as `OBJECT`, which
broke downstream conversion. A `Class<?>` shim that delegates to the new
value-based form would either need to materialize a sentinel instance per
`Class<?>` (impossible without per-class registration) or perpetuate the
original bug. Going through `value.getClass()` at the existing call sites is
the cleaner contract — every internal caller already had a value in hand and
was paying `.getClass()` boilerplate to feed the Class-based API. The four call
sites in OSS (`FunctionInvoker`, `BaseDefaultColumnHandler`,
`DataTypeConversionFunctions`, `MapColumnPreIndexStatsCollector`,
`DataTypeTransformerUtils`) all pass values now, none lost functionality. Same
reasoning for removing `getDataType` — it was unused and conceptually broken
(mapped Java *arra
y* classes to the *element-type* `DataType`, losing the SV/MV distinction that
`FieldSpec` tracks separately).
## Re: `BOOLEAN_ARRAY` repurposing
Deliberate. Pre-PR, `BOOLEAN_ARRAY` used `boolean[]` storage while every
*other* `*_ARRAY` type used the boxed form (`INTEGER_ARRAY`/`LONG_ARRAY`/etc.);
the parallel `PRIMITIVE_INT_ARRAY`/`PRIMITIVE_LONG_ARRAY` already existed. This
PR splits the boolean form to match: `PRIMITIVE_BOOLEAN_ARRAY` carries
`boolean[]` (parallel to `PRIMITIVE_INT_ARRAY`), `BOOLEAN_ARRAY` carries
`Boolean[]` (parallel to `INTEGER_ARRAY`). The naming is now uniform across all
numeric types. Pinot's actual ingestion path goes through
`DataTypeTransformer.transform` → `BOOLEAN_ARRAY.convert(value, sourceType)` →
`BOOLEAN_ARRAY.toInternal(...)`, and the new `convert` produces `Boolean[]`
matching the new `toInternal` cast — so the chain stays internally consistent.
Direct external use of `PinotDataType.BOOLEAN_ARRAY.toInternal(boolean[])` is
rare to nonexistent in the codebase.
## Re: empty/all-null typed reference arrays in `getArgumentType`
The collapse to `OBJECT_ARRAY` for empty/all-null arrays is harmless. The
cited concern about `BaseDefaultColumnHandler.createDerivedColumnV1Indices`
doesn't materialize: `outputValueType` is only used to pick which `to*Array`
method to call, and every `to*Array` default impl starts with an `instanceof`
short-circuit on the *destination* boxed-array type:
```java
public Integer[] toIntegerArray(Object value) {
if (value instanceof Integer[]) return (Integer[]) value;
...
}
```
So if a later row produces a typed `Integer[]`,
`OBJECT_ARRAY.toIntegerArray(Integer[])` (the inherited default) returns it
as-is regardless of the cached `outputValueType`. Mixed-type arrays are handled
by the per-element `catch CCE → anyToInt(element)` fallback inside the loop.
The dispatch label and the conversion result are decoupled here — the label
only steers method selection.
## Re: `///` Javadoc on `toInternal`
`///` is JEP 467 Markdown Javadoc — proper Javadoc, not a line comment,
since Java 23. Pinot already uses this convention widely. Tooling (javadoc,
IDEs, IntelliJ) renders it.
## Re: `null` fallback test coverage (3 comments)
The API contract is non-null after this PR (note: `@Nullable` was dropped on
the value parameter). The fact that `instanceof null` returns false and the
chain falls through to `OBJECT`/`OBJECT_ARRAY` is an incidental implementation
detail, not a documented contract. Adding `getSingleValueType(null)` /
`getMultiValueType(null)` / `getArgumentType(null)` assertions would lock down
behavior we don't intend to guarantee — callers with a `null` value should
pre-check, not rely on `OBJECT` fallthrough.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]