tarun11Mavani opened a new pull request, #18263:
URL: https://github.com/apache/pinot/pull/18263
### Summary
This is **PR 2 of 3** introducing the columnar MAP query layer on top of the
storage layer landed in #17896. This PR makes COLUMNAR_MAP columns actually
queryable — adding the per-key DataSource, mutable consuming-segment index,
filter-operator delegation strategies, and null-aware item() evaluation.
### Stack
- **PR 1 (#17896):** Storage layer — SPI, index format, segment creation,
immutable read path
- **PR 2 (this):** Query layer — DataSource, mutable segments, filter
operators, item() null-handling
- **PR 3:** Comprehensive `ColumnarMapIndexTest` (46 unit tests covering
both storage and query paths, including regression tests for the CRITICAL fixes
in this PR — see "Notes for reviewers" below)
### Motivation
PR 1 added the binary format and immutable read path for COLUMNAR_MAP, but
no query path was wired up — `MapFilterOperator` would fall through to the slow
`ExpressionFilterOperator` (per-doc `Map<String, Object>` materialization),
`ItemTransformFunction` had no per-key null bitmap, and consuming (real-time)
segments would fail at segment creation because `createMutableIndex` threw
`UnsupportedOperationException`. This PR closes those gaps and delivers the
actual query speedups the columnar storage was designed to enable.
### What's in this PR
#### Query data source (`pinot-segment-local`)
`ColumnarMapDataSource` implements `MapDataSource` and routes
`getKeyDataSource(key)` to one of four paths:
- **Dense immutable** — wraps `ColumnarMapKeyForwardIndexReader` + per-key
dictionary + per-key inverted index
- **Sparse immutable** — wraps a per-key reader backed by the JSON sidecar
- **Mutable** — wraps the per-key `FixedByteSVMutableForwardIndex` directly
(O(1) lock-free)
- **Unknown key** — returns `NullDataSource(key)` to match
`BaseMapDataSource` (callers like `ProjectionBlock` and `ItemTransformFunction`
never see null)
Supporting classes: `ColumnarMapForwardIndexReader` (column-level wrapper),
`ColumnarMapRealtimeInvertedIndex` (mutable per-dictId inverted index,
thread-safe via `ThreadSafeMutableRoaringBitmap`),
`MutableColumnarMapIndexImpl` (consuming-segment columnar map implementing
`ColumnarMapIndexReader`).
#### Filter operator delegation (`pinot-core`)
`MapFilterOperator` selects between four strategies:
1. JSON index match (existing) — when a JsonIndexReader exists on the column
2. **Per-key inverted index (new)** — when a `MapDataSource` exposes a
per-key inverted index for the requested key
3. **Presence-bitmap fast path (new)** — for `IS NULL` / `IS NOT NULL` on
COLUMNAR_MAP, via `BitmapBasedFilterOperator` over the per-key presence bitmap
4. Expression filter (existing fallback)
`explainAttributes` now emits `delegateTo:per_key_inverted_index` and
`delegateTo:presence_bitmap` (was previously misreported as
`expression_filter`).
#### `ItemTransformFunction` null-aware reads (`pinot-core`)
Captures the per-key null bitmap from `keyDS.getNullValueVector()` and
projects it to block-local docIds in `getNullBitmap`. Gated on the query's
`nullHandlingEnabled` flag; short-circuits when the segment-level null bitmap
doesn't intersect the block's docId range. Adds `getKeyPath()` for direct key
resolution by `TransformBlock`/`ProjectionBlock`.
#### Storage format addition
`OnHeapColumnarMapIndexCreator` now writes a per-sparse-key presence bitmap
(run-optimized) into the SPMX layout, reusing the dense-tier
`nullBitmapOffset/Len` slot per `tierFlag`. Without this, IS NULL / IS NOT NULL
on sparse keys returned wrong results. Format version unchanged (still SPMX
v3); legacy v3 segments without the new bytes (`nullBitmapLen == 0`) preserve
the prior empty-bitmap behavior.
#### Wire-up + SPI hardening
- `ColumnarMapIndexType.createMutableIndex` returns `new
MutableColumnarMapIndexImpl(...)` instead of throwing
- `ColumnarMapIndexReader.getKeyDataSource(String)` throws
`UnsupportedOperationException` on both reader implementations — callers must
go through `ColumnarMapDataSource` so the unknown-key fall-back stays consistent
- Sparse-tier reader now throws `RuntimeException` with
`column`/`key`/`docId`/`value` context on JSON parse and numeric parse failures
(was previously silent)
### How to use
Schema and table-config setup is unchanged from PR 1 (#17896). Once both PRs
are landed, queries against COLUMNAR_MAP columns work transparently:
```sql
-- EQ filter on a dense key — uses per-key inverted index
SELECT count(*) FROM events WHERE metrics['country'] = 'US'
-- IS NOT NULL on a sparse key — uses presence bitmap (new)
SELECT count(*) FROM events WHERE metrics['rare_flag'] IS NOT NULL
-- Projection of multiple keys
SELECT user_id, metrics['clicks'], metrics['spend'] FROM events WHERE
metrics['country'] = 'US'
-- GROUP BY on a MAP key
SELECT metrics['country'], count(*) FROM events GROUP BY metrics['country']
ORDER BY count(*) DESC
```
EXPLAIN PLAN VERBOSE will show `delegateTo:per_key_inverted_index` for
fast-path EQ predicates and `delegateTo:presence_bitmap` for IS NULL / IS NOT
NULL on sparse keys.
### Test plan
- `ColumnarMapIndexEndToEndTest` — 4 tests (mutable→immutable round-trip,
undeclared-key default type, user-metrics scenario, cross-segment merge)
- `ColumnarMapFilterOperatorTest` — 7 tests (IS_NULL / IS_NOT_NULL / NOT_EQ
/ NOT_IN against absent and partially-present keys)
- `ColumnarMapSegmentCreationTest` — 1 test (segment build pipeline)
- `ColumnarMapBenchmarkTest` — disabled (`@Test(enabled = false)`); manual
perf benchmark, run with `-DenableBenchmark=true` and the annotation flipped
- All 12 in-PR tests pass
- 46 additional tests in follow up test PR cover storage + query +
concurrency in depth
- Deployed this code in a test cluster to collect the query performance and
storage improvements.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]