[PR] Fix UnsupportedOperationException in SVScanDocIdIterator for RAW forward index + separate dictionary [pinot]

via GitHub Mon, 25 May 2026 16:03:38 -0700


deepthi912 opened a new pull request, #18579:
URL: https://github.com/apache/pinot/pull/18579


   ## Summary
   
   Fix a crash where `regexp_like(col, pattern, 'i')` or `LIKE 'pattern'` 
(which converts to case-insensitive `REGEXP_LIKE` internally) throws 
`UnsupportedOperationException` on segments where the column has `encodingType: 
RAW` but a separate dictionary is built for secondary indexes (`inverted`, 
`fst`, `ifst`, `range`).
   
   ## Repro
   
   Table config (typical for iceberg / external-table integrations):
   ```json
   {
     "name": "col_string",
     "encodingType": "RAW",
     "indexes": {
       "forward": { "encodingType": "RAW" },
       "dictionary": {},
       "ifst": { "enabled": true }
     }
   }
   ```
   Query:
   ```sql
   SELECT * FROM t WHERE col_string LIKE 'abc%';
   -- or
   SELECT * FROM t WHERE regexp_like(col_string, 'abc', 'i');
   ```
   Stack trace:
   ```
   java.lang.UnsupportedOperationException
     at BaseDictionaryBasedPredicateEvaluator.applySV(BaseDictionaryBasedPredicateEvaluator.java:133)
     at SVScanDocIdIterator$StringMatcher.doesValueMatch(SVScanDocIdIterator.java:308)
     at ImmutableRoaringBitmap.flip(...)
     at AndFilterOperator.getTrues(...)
   ```
   
   ## Root cause
   
   For this layout:
   
   1. `FilterPlanNode` builds an `IFSTBasedRegexpPredicateEvaluator` (extends 
`BaseDictIdBasedRegexpLikePredicateEvaluator`) which only implements 
`applySV(int dictId)`.
   2. `FilterOperatorUtils.getLeafFilterOperator` checks for sorted / inverted 
index to route to a dict-consuming operator; with neither available it falls 
through to `ScanBasedFilterOperator`.
   3. `SVScanDocIdIterator.getValueMatcher()` picks `StringMatcher` based on 
`_reader.isDictionaryEncoded() == false` (forward index is RAW), ignoring the 
fact that a `Dictionary` is still present in the segment.
   4. `StringMatcher` calls `applySV(String)` on the dict-based evaluator — 
`BaseDictionaryBasedPredicateEvaluator.applySV(String)` is `final` and throws.
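
   The failure mode in step 4 can be sketched in isolation. This is a minimal illustration, not Pinot code: `DictIdOnlyEvaluator` and `rawStringMatchThrows` are hypothetical stand-ins that only mimic an evaluator whose dict-id overload works while its value-based overload is `final` and throws.

   ```java
   public class RawScanFailureSketch {
     /** Hypothetical stand-in for an evaluator that only supports dict ids. */
     static class DictIdOnlyEvaluator {
       /** Dict-id path: supported. Here dict id 0 is assumed to be the only match. */
       boolean applySV(int dictId) {
         return dictId == 0;
       }

       /** Value path: rejected, mirroring the behaviour described in step 4. */
       final boolean applySV(String value) {
         throw new UnsupportedOperationException();
       }
     }

     /** Returns true when value-based matching is rejected by the evaluator. */
     static boolean rawStringMatchThrows(DictIdOnlyEvaluator evaluator, String rawValue) {
       try {
         evaluator.applySV(rawValue);
         return false;
       } catch (UnsupportedOperationException e) {
         return true;
       }
     }

     public static void main(String[] args) {
       DictIdOnlyEvaluator evaluator = new DictIdOnlyEvaluator();
       System.out.println("dict id path: " + evaluator.applySV(0));
       System.out.println("raw value path throws: " + rawStringMatchThrows(evaluator, "abc"));
     }
   }
   ```

   A raw string matcher handed this kind of evaluator has no safe call to make, which is why the matcher selection has to account for the evaluator type and not only the forward index encoding.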
   
   ## Fix
   
   In `SVScanDocIdIterator.getValueMatcher()`, before falling back to typed raw 
matchers, route to a new `<Type>DictLookupMatcher` when:
   - `_reader.isDictionaryEncoded() == false` (forward index is RAW), **and**
   - `_dictionary != null` (a separate dictionary is built), **and**
   - `_predicateEvaluator instanceof BaseDictionaryBasedPredicateEvaluator` 
(the evaluator wants dict ids).
   
   Each new matcher reads the raw value from the forward index, looks up its 
dict id via `dictionary.indexOf(value)`, and calls `applySV(int dictId)`. One 
matcher per stored type (INT, LONG, FLOAT, DOUBLE, BIG_DECIMAL, STRING, BYTES). 
`dictId < 0` means the value isn't in the dictionary, which is treated as "no 
match".
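
   The lookup described above could look roughly like the following for the STRING case. This is a self-contained sketch under stated assumptions, not the code in this PR: `SortedStringDictionary`, `DictIdRegexEvaluator` and `StringDictLookupMatcher` are hypothetical stand-ins for Pinot's `Dictionary`, the dict-id-based evaluator and the new matcher, and the sorted-array dictionary is chosen only to keep the example short.

   ```java
   import java.util.Arrays;
   import java.util.BitSet;
   import java.util.regex.Pattern;

   public class DictLookupMatcherSketch {
     /** Hypothetical stand-in for a sorted string dictionary (not Pinot's Dictionary). */
     static final class SortedStringDictionary {
       private final String[] _sortedValues;

       SortedStringDictionary(String... values) {
         _sortedValues = values.clone();
         Arrays.sort(_sortedValues);
       }

       int length() {
         return _sortedValues.length;
       }

       String get(int dictId) {
         return _sortedValues[dictId];
       }

       /** Returns the dict id of the value, or a negative number if it is absent. */
       int indexOf(String value) {
         return Arrays.binarySearch(_sortedValues, value);
       }
     }

     /** Hypothetical dict-id-only evaluator: precomputes which dict ids match the regex. */
     static final class DictIdRegexEvaluator {
       private final BitSet _matchingDictIds = new BitSet();

       DictIdRegexEvaluator(SortedStringDictionary dictionary, Pattern pattern) {
         for (int dictId = 0; dictId < dictionary.length(); dictId++) {
           if (pattern.matcher(dictionary.get(dictId)).find()) {
             _matchingDictIds.set(dictId);
           }
         }
       }

       boolean applySV(int dictId) {
         return _matchingDictIds.get(dictId);
       }
     }

     /** Sketch of the STRING matcher: raw value -> dict id -> dict-id evaluator. */
     static final class StringDictLookupMatcher {
       private final SortedStringDictionary _dictionary;
       private final DictIdRegexEvaluator _evaluator;

       StringDictLookupMatcher(SortedStringDictionary dictionary, DictIdRegexEvaluator evaluator) {
         _dictionary = dictionary;
         _evaluator = evaluator;
       }

       boolean doesValueMatch(String rawValue) {
         int dictId = _dictionary.indexOf(rawValue);
         // A negative id means the value is not in the dictionary: treat as "no match".
         return dictId >= 0 && _evaluator.applySV(dictId);
       }
     }

     public static void main(String[] args) {
       SortedStringDictionary dictionary = new SortedStringDictionary("abc", "ABCdef", "xyz");
       Pattern pattern = Pattern.compile("^abc", Pattern.CASE_INSENSITIVE);
       StringDictLookupMatcher matcher =
           new StringDictLookupMatcher(dictionary, new DictIdRegexEvaluator(dictionary, pattern));
       for (String rawValue : new String[] {"abc", "ABCdef", "xyz", "not-in-dictionary"}) {
         System.out.println(rawValue + " -> " + matcher.doesValueMatch(rawValue));
       }
     }
   }
   ```

   The sketch pays one dictionary lookup per scanned document, which is the trade-off of reusing a dict-id evaluator on a RAW forward index.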
   
   `DataSource` is already passed to the main constructor; the test constructor 
gains an optional `@Nullable Dictionary` parameter (the existing 3-arg test 
constructor delegates with `null`).
   
   ## Test plan
   
   - [ ] Add a unit test exercising `REGEXP_LIKE` / `LIKE` against a string 
column with RAW forward index + dictionary + IFST (no inverted) — should match 
correctly instead of throwing.
   - [ ] Verify existing `SVScanDocIdIteratorTest` paths still pass 
(dict-encoded → `DictIdMatcher`, RAW without dictionary → typed raw matcher).
   - [ ] Smoke test integration with iceberg/external-table tables in StarTree 
Cloud.
   
   🤖 Generated with [Claude Code](https://claude.com/claude-code)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


