Jackie-Jiang opened a new pull request, #18918:
URL: https://github.com/apache/pinot/pull/18918
## Summary
Cleans up the `ColumnReader` SPI (`pinot-spi`), used for column-major
segment building, by removing a
redundant second read API and simplifying type introspection.
`ColumnReader` previously exposed two overlapping ways to read a column:
- an iterator-style API — `hasNext()`, `next()`, `isNextNull()`,
`skipNext()`, `rewind()`, and the
`nextXxx()` / `nextXxxMV()` type-specific methods, and
- random access by document id — `getXxx(docId)` / `getValue(docId)`.
Every production consumer can be expressed with random access alone, so this
removes the iterator API
entirely. Both column-major consumers —
`SegmentColumnarIndexCreator.indexColumn(String, ColumnReader)`
and `ColumnarSegmentPreIndexStatsContainer.collectColumn(...)` — now read
via a
`for (docId = 0; docId < getTotalDocs(); docId++)` loop over
`getValue(docId)`, preserving the previous
null semantics.
It also replaces the boolean type-check methods (`isSingleValue()` +
`isInt()`…`isBytes()`) with a single
`@Nullable PinotDataType getValueType()` that encodes both value type and
cardinality (e.g. `INT` vs
`INT_ARRAY`), returning `null` when the column has no directly-readable
accessor (BOOLEAN / TIMESTAMP /
JSON — read those via `getValue(docId)`). A static
`ColumnReader.toValueType(DataType, boolean)` helper
centralizes the mapping for implementations backed by a `DataType`, and
`DataTypeColumnTransformer.isNoOp()`
collapses to `getValueType() == destType`.
All implementations (`PinotSegmentColumnReaderImpl`,
`DefaultValueColumnReader`, `ArrowColumnReader`,
`BatchedArrowColumnReader`) and their tests are migrated; the interface
members are reordered and the
Javadoc rewritten for clarity.
## Backward incompatibility
`ColumnReader` is a `pinot-spi` interface first shipped in 1.5.0. Removing
methods from it is
binary-incompatible for any external plugin that implements or calls
`ColumnReader`. It is a
recently-added, internal-facing SPI for the still-evolving column-major
segment build, so the expected
blast radius is nil, but the PR is labeled `backward-incompat` accordingly.
Release note: the `ColumnReader` SPI no longer exposes the
sequential/iterator read API
(`hasNext`/`next`/`nextXxx`/`nextXxxMV`/`rewind`/`skipNext`/`isNextNull`) or
the boolean type-check methods
(`isSingleValue`/`isInt`/…/`isBytes`). Read column values by document id via
`getValue(docId)` /
`getXxx(docId)`, and use `getValueType()` to obtain the directly-readable
value type and cardinality.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]