arunkumarucet opened a new pull request, #18891:
URL: https://github.com/apache/pinot/pull/18891

   ## Summary
   
   Profiling a JSON analytics workload (`JSON_EXTRACT_INDEX(col, '$.path', 
...)` group-by and `JSON_MATCH(col, ...)` filters over a JSON-indexed column) 
surfaced three hot spots in `ImmutableJsonIndexReader`, all in the per-query 
value / doc-id materialization. This PR optimizes them with no behavior change.
   
   ### Changes
   
   1. **`getValuesSV`** (~16% of query CPU): previously, for every value-block 
it allocated a `RoaringBitmap` mask via `bitmapOf`, ran a `RoaringBitmap.and` 
per distinct value, and populated an `Int2ObjectOpenHashMap` (find/get/insert). 
The non-flattened path — the only path used by `jsonExtractIndex` SV — now 
scatters values with a bounded `PeekableIntIterator`:
      - **Dense, gap-free block** (the common full-scan case): the result 
position is the doc-id offset, so values are written directly with no map, no 
mask, and no per-value `and`.
      - **Sparse block** (e.g. after a selective filter): a primitive 
`Int2IntOpenHashMap` maps each doc id to its position once, then each value's 
posting list is range-scanned within `[lo, hi]`.
   
   2. **`convertFlattenedDocIdsToDocIds`** and the `JSON_MATCH` 
**`getMatchingDocIds`** path both built result bitmaps with a per-element 
`bitmap.add(getDocId(f))`, causing `RoaringArray.setContainerAtIndex` churn 
(~10% of query CPU). The flattened → real doc-id mapping is monotonically 
non-decreasing (the index flattens documents in doc-id order), so the mapped 
doc ids are produced sorted and are now appended through an ordered 
`RoaringBitmapWriter`, avoiding the per-element binary search and container 
reallocation.
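
   The ordered-append idea in item 2 can be sketched without any dependencies. 
This is only an illustration, not the code in this PR: a growable `int[]` stands 
in for the `RoaringBitmapWriter`, and `OrderedDocIdAppendSketch`, 
`mapFlattenedToDocIds`, and the `flattenedToDocId` lookup table (standing in for 
`getDocId(f)`) are hypothetical names. Because the input is non-decreasing, each 
append only compares against the last element written instead of searching for 
an insertion point.

```java
import java.util.Arrays;

/** Dependency-free stand-in for an ordered bitmap writer; not Pinot's actual code. */
final class OrderedDocIdAppendSketch {
  private int[] buf = new int[16];
  private int size = 0;

  /** Appends a doc id that must be >= the last one appended; duplicates are skipped. */
  void append(int docId) {
    if (size > 0) {
      int last = buf[size - 1];
      if (docId == last) {
        return; // several flattened ids can map to the same real doc id
      }
      if (docId < last) {
        throw new IllegalArgumentException("doc ids must be non-decreasing");
      }
    }
    if (size == buf.length) {
      buf = Arrays.copyOf(buf, size * 2); // amortized growth, no per-element search
    }
    buf[size++] = docId;
  }

  int[] toArray() {
    return Arrays.copyOf(buf, size);
  }

  /**
   * Maps ascending flattened doc ids to real doc ids.
   * flattenedToDocId[f] is a hypothetical table assumed to be non-decreasing in f.
   */
  static int[] mapFlattenedToDocIds(int[] sortedFlattenedIds, int[] flattenedToDocId) {
    OrderedDocIdAppendSketch out = new OrderedDocIdAppendSketch();
    for (int f : sortedFlattenedIds) {
      out.append(flattenedToDocId[f]);
    }
    return out.toArray();
  }
}
```

   For example, with `flattenedToDocId = {0, 0, 1, 1, 1, 2, 3, 3}` and flattened 
ids `{1, 2, 4, 7}`, the mapped sequence is `0, 1, 1, 3`, which collapses to 
`{0, 1, 3}` with one comparison per element.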
   
   ### Results
   
   On a 1M-row segment, a `JSON_EXTRACT_INDEX` group-by query drops from ~34ms 
to ~16ms (~2x), and `setContainerAtIndex` disappears from the profile. Query 
output is identical.
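
   The dense / sparse scatter described for `getValuesSV` can be sketched as 
follows. This is a simplified illustration under stated assumptions, not the 
PR's implementation: sorted `int[]` arrays stand in for Roaring posting lists, a 
binary search plus bounded loop stands in for the `PeekableIntIterator` range 
scan, `java.util.HashMap` stands in for `Int2IntOpenHashMap`, and the names 
`ValueScatterSketch`, `scatter`, and `lowerBound` are hypothetical. The block's 
doc ids are assumed to be distinct and ascending.

```java
import java.util.HashMap;
import java.util.Map;

/** Illustrative stand-in for the value scatter; not Pinot's actual code. */
final class ValueScatterSketch {

  /**
   * docIds: distinct, ascending doc ids of the current block.
   * postings: value -> ascending doc ids that hold that value.
   * Returns the value for each position in docIds (null when none matches).
   */
  static String[] scatter(int[] docIds, Map<String, int[]> postings) {
    int n = docIds.length;
    String[] out = new String[n];
    if (n == 0) {
      return out;
    }
    int lo = docIds[0];
    int hi = docIds[n - 1];
    // Gap-free block: the result position is simply docId - lo.
    boolean dense = hi - lo == n - 1;
    Map<Integer, Integer> position = null;
    if (!dense) {
      // Sparse block: build the docId -> position map once for all values.
      position = new HashMap<>();
      for (int i = 0; i < n; i++) {
        position.put(docIds[i], i);
      }
    }
    for (Map.Entry<String, int[]> e : postings.entrySet()) {
      int[] list = e.getValue();
      // Range-scan the posting list within [lo, hi] only.
      for (int i = lowerBound(list, lo); i < list.length && list[i] <= hi; i++) {
        int docId = list[i];
        if (dense) {
          out[docId - lo] = e.getKey();
        } else {
          Integer pos = position.get(docId);
          if (pos != null) {
            out[pos] = e.getKey();
          }
        }
      }
    }
    return out;
  }

  /** First index i with a[i] >= key, or a.length if there is none. */
  static int lowerBound(int[] a, int key) {
    int l = 0;
    int r = a.length;
    while (l < r) {
      int m = (l + r) >>> 1;
      if (a[m] < key) {
        l = m + 1;
      } else {
        r = m;
      }
    }
    return l;
  }
}
```

   In the dense branch no map or mask is built at all; in the sparse branch the 
map is built once per block rather than once per distinct value.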
   
   ### Testing
   
   `JsonIndexTest`, `JsonExtractIndexTransformFunctionTest`, and 
`JsonIndexDistinctOperatorQueriesTest` all pass (62 tests). `spotless:apply` 
and `checkstyle:check` clean.
   
   🤖 Generated with [Claude Code](https://claude.com/claude-code)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

