[PR] Simplify ColumnReader SPI: drop iterator API, replace type checks with getValueType() [pinot]

via GitHub Fri, 03 Jul 2026 11:42:09 -0700


Jackie-Jiang opened a new pull request, #18918:
URL: https://github.com/apache/pinot/pull/18918


   ## Summary
   
   Cleans up the `ColumnReader` SPI (`pinot-spi`), used for column-major segment building, by removing a
   redundant second read API and simplifying type introspection.
   
   `ColumnReader` previously exposed two overlapping ways to read a column:
   - an iterator-style API: `hasNext()`, `next()`, `isNextNull()`, `skipNext()`, `rewind()`, and the
     `nextXxx()` / `nextXxxMV()` type-specific methods; and
   - random access by document id: `getXxx(docId)` / `getValue(docId)`.
   
   Every production consumer can be expressed with random access alone, so this PR removes the iterator API
   entirely. Both column-major consumers, `SegmentColumnarIndexCreator.indexColumn(String, ColumnReader)`
   and `ColumnarSegmentPreIndexStatsContainer.collectColumn(...)`, now read via a
   `for (docId = 0; docId < getTotalDocs(); docId++)` loop over `getValue(docId)`, preserving the previous
   null semantics.
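To illustrate why random access alone is enough, here is a minimal sketch of a consumer that walks a column in document order without any iterator state. `DocIdColumn`, `ListDocIdColumn`, and `DocIdScan` are hypothetical names for this example only, and the sketch assumes, purely for illustration, that a null entry is reported as a Java `null` from `getValue(docId)`; it does not reproduce the real null handling of `ColumnReader`.

```java
import java.util.List;

// Hypothetical, trimmed-down stand-in for the random-access part of ColumnReader.
// Assumption (illustration only): a null entry is reported as a Java null.
interface DocIdColumn {
  int getTotalDocs();

  Object getValue(int docId);
}

// Hypothetical in-memory column backing the sketch.
final class ListDocIdColumn implements DocIdColumn {
  private final List<Object> _values;

  ListDocIdColumn(List<Object> values) {
    _values = values;
  }

  @Override
  public int getTotalDocs() {
    return _values.size();
  }

  @Override
  public Object getValue(int docId) {
    return _values.get(docId);
  }
}

final class DocIdScan {
  private DocIdScan() {
  }

  // Visits documents in order, as the removed iterator did, but by document id:
  // there is no hasNext()/next()/rewind() cursor, so a second pass is just another loop.
  static int countNonNull(DocIdColumn reader) {
    int nonNull = 0;
    for (int docId = 0; docId < reader.getTotalDocs(); docId++) {
      if (reader.getValue(docId) != null) {
        nonNull++;
      }
    }
    return nonNull;
  }
}
```

Calling `DocIdScan.countNonNull` on a column holding `1, null, 3` returns 2. Because the loop carries its own position, two consumers (for example, a stats pass and an indexing pass) can scan the same reader without coordinating a shared cursor.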
   
   It also replaces the boolean type-check methods (`isSingleValue()` + `isInt()`…`isBytes()`) with a single
   `@Nullable PinotDataType getValueType()` that encodes both value type and cardinality (e.g. `INT` vs
   `INT_ARRAY`). It returns `null` when the column has no directly-readable accessor (BOOLEAN / TIMESTAMP /
   JSON; read those via `getValue(docId)`). A static `ColumnReader.toValueType(DataType, boolean)` helper
   centralizes the mapping for implementations backed by a `DataType`, and `DataTypeColumnTransformer.isNoOp()`
   collapses to `getValueType() == destType`.
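The mapping described above can be sketched as follows. This is an illustration under stated assumptions, not the PR's code: `StoredType` and `ValueType` are hypothetical stand-ins for Pinot's `DataType` and `PinotDataType`, only a few types are modeled, and `ValueTypes.toValueType` / `ValueTypes.isNoOp` are hypothetical helpers that mimic the shape of `ColumnReader.toValueType(DataType, boolean)` and the simplified `isNoOp()` check.

```java
// Hypothetical stand-ins for Pinot's DataType and PinotDataType; only a few
// types are modeled here for brevity.
enum StoredType { INT, LONG, STRING, BOOLEAN, TIMESTAMP, JSON }

enum ValueType { INT, INT_ARRAY, LONG, LONG_ARRAY, STRING, STRING_ARRAY }

final class ValueTypes {
  private ValueTypes() {
  }

  // In the spirit of toValueType(DataType, boolean): one result encodes both the value
  // type and single- vs multi-value cardinality. Returns null when the column has no
  // directly-readable accessor, so callers fall back to getValue(docId).
  static ValueType toValueType(StoredType storedType, boolean singleValue) {
    switch (storedType) {
      case INT:
        return singleValue ? ValueType.INT : ValueType.INT_ARRAY;
      case LONG:
        return singleValue ? ValueType.LONG : ValueType.LONG_ARRAY;
      case STRING:
        return singleValue ? ValueType.STRING : ValueType.STRING_ARRAY;
      default:
        // BOOLEAN, TIMESTAMP, JSON
        return null;
    }
  }

  // Mirrors the isNoOp() simplification: no conversion is needed when the source already
  // yields the destination type. A null source type never matches a non-null destination.
  static boolean isNoOp(ValueType sourceValueType, ValueType destType) {
    return sourceValueType == destType;
  }
}
```

Folding cardinality into the type is what lets a single equality check replace the old combination of `isSingleValue()` and a per-type boolean: `INT` and `INT_ARRAY` are simply different values, so a single-value source never compares equal to a multi-value destination.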
   
   All implementations (`PinotSegmentColumnReaderImpl`, `DefaultValueColumnReader`, `ArrowColumnReader`,
   `BatchedArrowColumnReader`) and their tests are migrated; the interface members are reordered and the
   Javadoc is rewritten for clarity.
   
   ## Backward incompatibility
   
   `ColumnReader` is a `pinot-spi` interface first shipped in 1.5.0. Removing methods from it is
   binary-incompatible for any external plugin that implements or calls `ColumnReader`. Because it is a
   recently added, internal-facing SPI for the still-evolving column-major segment build, the expected
   blast radius is nil, but the PR is labeled `backward-incompat` accordingly.
   
   Release note: the `ColumnReader` SPI no longer exposes the sequential/iterator read API
   (`hasNext`/`next`/`nextXxx`/`nextXxxMV`/`rewind`/`skipNext`/`isNextNull`) or the boolean type-check methods
   (`isSingleValue`/`isInt`/…/`isBytes`). Read column values by document id via `getValue(docId)` /
   `getXxx(docId)`, and use `getValueType()` to obtain the directly-readable value type and cardinality.
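For plugin authors migrating, a chain of `isSingleValue()` / `isXxx()` checks can become one dispatch on `getValueType()`. The sketch below is hypothetical: `ReadType`, `TypedColumn`, `TypedRead`, and `IntArrayColumn` are illustrative stand-ins, not Pinot's interface, and only single-value `INT` / `LONG` / `STRING` accessors are modeled. Note that the `null` check must come first, because a Java `switch` on a `null` enum reference throws `NullPointerException`.

```java
// Hypothetical stand-ins for this sketch; not Pinot's actual interface.
enum ReadType { INT, LONG, STRING }

interface TypedColumn {
  // Null means there is no directly-readable accessor for this column.
  ReadType getValueType();

  int getInt(int docId);

  long getLong(int docId);

  String getString(int docId);

  Object getValue(int docId);
}

final class TypedRead {
  private TypedRead() {
  }

  // Replaces "if (isSingleValue() && isInt()) ... else if ..." chains with one dispatch.
  // The null check comes first because switching on a null enum reference throws.
  static Object read(TypedColumn column, int docId) {
    ReadType type = column.getValueType();
    if (type == null) {
      return column.getValue(docId); // generic fallback, e.g. BOOLEAN / TIMESTAMP / JSON
    }
    switch (type) {
      case INT:
        return column.getInt(docId); // boxed to Integer
      case LONG:
        return column.getLong(docId); // boxed to Long
      case STRING:
        return column.getString(docId);
      default:
        throw new IllegalStateException("Unhandled value type: " + type);
    }
  }
}

// Hypothetical single-value INT column used to exercise the dispatch.
final class IntArrayColumn implements TypedColumn {
  private final int[] _values;

  IntArrayColumn(int[] values) {
    _values = values;
  }

  @Override
  public ReadType getValueType() {
    return ReadType.INT;
  }

  @Override
  public int getInt(int docId) {
    return _values[docId];
  }

  @Override
  public long getLong(int docId) {
    throw new UnsupportedOperationException("not a LONG column");
  }

  @Override
  public String getString(int docId) {
    throw new UnsupportedOperationException("not a STRING column");
  }

  @Override
  public Object getValue(int docId) {
    return _values[docId];
  }
}
```

Reading document 1 from `new IntArrayColumn(new int[] {7, 8})` goes through `getInt` and yields `8`. A column whose `getValueType()` returns `null` would instead be served by `getValue(docId)`.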
   

