tarun11Mavani opened a new pull request, #18263:
URL: https://github.com/apache/pinot/pull/18263

   ### Summary
    
   This is **PR 2 of 3** introducing the columnar MAP query layer on top of the 
storage layer landed in #17896. This PR makes COLUMNAR_MAP columns actually 
queryable — adding the per-key DataSource, mutable consuming-segment index, 
filter-operator delegation strategies, and null-aware item() evaluation.
    
   ### Stack
    
   - **PR 1 (#17896):** Storage layer — SPI, index format, segment creation, 
immutable read path
   - **PR 2 (this):** Query layer — DataSource, mutable segments, filter 
operators, item() null-handling
   - **PR 3:** Comprehensive `ColumnarMapIndexTest` (46 unit tests covering 
both storage and query paths, including regression tests for the CRITICAL fixes 
in this PR — see "Notes for reviewers" below)
    
   ### Motivation
    
   PR 1 added the binary format and immutable read path for COLUMNAR_MAP, but 
no query path was wired up — `MapFilterOperator` would fall through to the slow 
`ExpressionFilterOperator` (per-doc `Map<String, Object>` materialization), 
`ItemTransformFunction` had no per-key null bitmap, and consuming (real-time) 
segments would fail at segment creation because `createMutableIndex` threw 
`UnsupportedOperationException`. This PR closes those gaps and delivers the 
actual query speedups the columnar storage was designed to enable.
    
   
   ### What's in this PR
    
   #### Query data source (`pinot-segment-local`)
    
   `ColumnarMapDataSource` implements `MapDataSource` and routes 
`getKeyDataSource(key)` to one of four paths:
   - **Dense immutable** — wraps `ColumnarMapKeyForwardIndexReader` + per-key 
dictionary + per-key inverted index
   - **Sparse immutable** — wraps a per-key reader backed by the JSON sidecar
   - **Mutable** — wraps the per-key `FixedByteSVMutableForwardIndex` directly 
(O(1) lock-free)
   - **Unknown key** — returns `NullDataSource(key)` to match 
`BaseMapDataSource` (callers like `ProjectionBlock` and `ItemTransformFunction` 
never see null)
    
   Supporting classes: `ColumnarMapForwardIndexReader` (column-level wrapper), 
`ColumnarMapRealtimeInvertedIndex` (mutable per-dictId inverted index, 
thread-safe via `ThreadSafeMutableRoaringBitmap`), 
`MutableColumnarMapIndexImpl` (consuming-segment columnar map implementing 
`ColumnarMapIndexReader`).
    
   #### Filter operator delegation (`pinot-core`)
    
   `MapFilterOperator` selects between four strategies:
   1. JSON index match (existing) — when a JsonIndexReader exists on the column
   2. **Per-key inverted index (new)** — when a `MapDataSource` exposes a 
per-key inverted index for the requested key
   3. **Presence-bitmap fast path (new)** — for `IS NULL` / `IS NOT NULL` on 
COLUMNAR_MAP, via `BitmapBasedFilterOperator` over the per-key presence bitmap
   4. Expression filter (existing fallback)
    
   `explainAttributes` now emits `delegateTo:per_key_inverted_index` and 
`delegateTo:presence_bitmap` (was previously misreported as 
`expression_filter`).
    
   #### `ItemTransformFunction` null-aware reads (`pinot-core`)
    
   Captures the per-key null bitmap from `keyDS.getNullValueVector()` and 
projects it to block-local docIds in `getNullBitmap`. Gated on the query's 
`nullHandlingEnabled` flag; short-circuits when the segment-level null bitmap 
doesn't intersect the block's docId range. Adds `getKeyPath()` for direct key 
resolution by `TransformBlock`/`ProjectionBlock`.
    
   #### Storage format addition
    
   `OnHeapColumnarMapIndexCreator` now writes a per-sparse-key presence bitmap 
(run-optimized) into the SPMX layout, reusing the dense-tier 
`nullBitmapOffset/Len` slot per `tierFlag`. Without this, IS NULL / IS NOT NULL 
on sparse keys returned wrong results. Format version unchanged (still SPMX 
v3); legacy v3 segments without the new bytes (`nullBitmapLen == 0`) preserve 
the prior empty-bitmap behavior.
    
   #### Wire-up + SPI hardening
    
   - `ColumnarMapIndexType.createMutableIndex` returns `new 
MutableColumnarMapIndexImpl(...)` instead of throwing
   - `ColumnarMapIndexReader.getKeyDataSource(String)` throws 
`UnsupportedOperationException` on both reader implementations — callers must 
go through `ColumnarMapDataSource` so the unknown-key fall-back stays consistent
   - Sparse-tier reader now throws `RuntimeException` with 
`column`/`key`/`docId`/`value` context on JSON parse and numeric parse failures 
(was previously silent)
    
   
    
   ### How to use
    
   Schema and table-config setup is unchanged from PR 1 (#17896). Once both PRs 
are landed, queries against COLUMNAR_MAP columns work transparently:
    
   ```sql
   -- EQ filter on a dense key — uses per-key inverted index
   SELECT count(*) FROM events WHERE metrics['country'] = 'US'
    
   -- IS NOT NULL on a sparse key — uses presence bitmap (new)
   SELECT count(*) FROM events WHERE metrics['rare_flag'] IS NOT NULL
    
   -- Projection of multiple keys
   SELECT user_id, metrics['clicks'], metrics['spend'] FROM events WHERE 
metrics['country'] = 'US'
    
   -- GROUP BY on a MAP key
   SELECT metrics['country'], count(*) FROM events GROUP BY metrics['country'] 
ORDER BY count(*) DESC
   ```
    
   EXPLAIN PLAN VERBOSE will show `delegateTo:per_key_inverted_index` for 
fast-path EQ predicates and `delegateTo:presence_bitmap` for IS NULL / IS NOT 
NULL on sparse keys.
     
   ### Test plan
    
   - `ColumnarMapIndexEndToEndTest` — 4 tests (mutable→immutable round-trip, 
undeclared-key default type, user-metrics scenario, cross-segment merge)
   - `ColumnarMapFilterOperatorTest` — 7 tests (IS_NULL / IS_NOT_NULL / NOT_EQ 
/ NOT_IN against absent and partially-present keys)
   - `ColumnarMapSegmentCreationTest` — 1 test (segment build pipeline)
   - `ColumnarMapBenchmarkTest` — disabled (`@Test(enabled = false)`); manual 
perf benchmark, run with `-DenableBenchmark=true` and the annotation flipped
   - All 12 in-PR tests pass
   - 46 additional tests in follow up test PR cover storage + query + 
concurrency in depth
   - Deployed this code in a test cluster to collect the query performance and 
storage improvements. 
    


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to