deepthi912 commented on PR #18579:
URL: https://github.com/apache/pinot/pull/18579#issuecomment-4540507093

   Confirmed on both points. Tracing concretely:
   
   ### How `_predicateEvaluator` is created
   
   `FilterPlanNode.java:308-330` builds the dict-based evaluator before 
`getLeafFilterOperator` is called:
   
   ```java
   case REGEXP_LIKE:
     boolean caseInsensitive = regexpLikePredicate.isCaseInsensitive();
     if (caseInsensitive) {
       if (dataSource.getIFSTIndex() != null) {
         predicateEvaluator = 
IFSTBasedRegexpPredicateEvaluatorFactory.newIFSTBasedEvaluator(
             regexpLikePredicate, dataSource.getIFSTIndex(), 
dataSource.getDictionary());
       } else {
         predicateEvaluator = 
PredicateEvaluatorProvider.getPredicateEvaluator(predicate, dataSource, 
_queryContext);
       }
     } else {
       if (dataSource.getFSTIndex() != null) {
         predicateEvaluator = 
FSTBasedRegexpPredicateEvaluatorFactory.newFSTBasedEvaluator(...);
       } else {
         predicateEvaluator = 
PredicateEvaluatorProvider.getPredicateEvaluator(predicate, dataSource, 
_queryContext);
       }
     }
     return FilterOperatorUtils.getLeafFilterOperator(_queryContext, 
predicateEvaluator, dataSource, numDocs);
   ```
   
   For the crash repro (case-insensitive + IFST), it produces an 
`IFSTBasedRegexpPredicateEvaluator`. The class chain is:
   
   ```
   IFSTBasedRegexpPredicateEvaluator
     extends BaseDictIdBasedRegexpLikePredicateEvaluator
     extends BaseDictionaryBasedPredicateEvaluator
   ```
   
   So the `_predicateEvaluator instanceof 
BaseDictionaryBasedPredicateEvaluator` guard in 
`SVScanDocIdIterator.getValueMatcher()` matches.
   
   ### What `_predicateEvaluator.applySV(dictId)` does
   
   From `IFSTBasedRegexpPredicateEvaluatorFactory.java:48-72`:
   
   ```java
   private static class IFSTBasedRegexpPredicateEvaluator extends 
BaseDictIdBasedRegexpLikePredicateEvaluator {
     final ImmutableRoaringBitmap _matchingDictIdBitmap;
   
     public IFSTBasedRegexpPredicateEvaluator(RegexpLikePredicate 
regexpLikePredicate,
         TextIndexReader ifstIndexReader, Dictionary dictionary) {
       super(regexpLikePredicate, dictionary);
       String searchQuery = 
RegexpPatternConverterUtils.regexpLikeToLuceneRegExp(regexpLikePredicate.getValue());
       _matchingDictIdBitmap = ifstIndexReader.getDictIds(searchQuery);   // ← 
IFST precomputes matching dict ids
       ...
     }
   
     @Override
     public boolean applySV(int dictId) {
       return _matchingDictIdBitmap.contains(dictId);   // ← O(1) bitmap 
membership check
     }
   }
   ```
   
   So `applySV(int dictId)` is exactly the dict-id membership check against the 
IFST-precomputed bitmap. The IFST consulted the dictionary **once** at 
evaluator construction to enumerate matching dict ids; the matcher then does a 
Roaring bitmap contains per row.
   
   ### Dictionary identity
   
   The dictionary instance is the same object in both places:
   - Evaluator construction: `dataSource.getDictionary()` passed to the factory 
in `FilterPlanNode`.
   - `DictLookupMatcher` runtime: `_dictionary = dataSource.getDictionary()` 
captured in `SVScanDocIdIterator`'s constructor from the same `DataSource`.
   
   So dict ids are consistent: a raw value `"abc"` translates to the same dict 
id the IFST already bucketed at construction time. No risk of dict-id 
divergence.
   
   ### Per-row flow with the patch
   
   ```
   1. forwardIndex.getString(docId)             → "abc"               (RAW read)
   2. dictionary.indexOf("abc")                 → 17                  (separate 
dict lookup)
   3. _predicateEvaluator.applySV(17)
        = IFSTBasedRegexpPredicateEvaluator.applySV(17)
        = _matchingDictIdBitmap.contains(17)    → true / false        
(IFST-precomputed bitmap)
   ```
   
   Step 3 is the dict-based evaluator, exclusively going through `applySV(int 
dictId)`. `applySV(String)` is never called.
   
   Same flow for `FSTBasedRegexpPredicateEvaluator` (case-sensitive FST), 
`DictIdBasedRegexpLikePredicateEvaluator` (small-dict regex), and 
`ScanBasedRegexpLikePredicateEvaluator` (large-dict regex) — all extend 
`BaseDictionaryBasedPredicateEvaluator` and implement `applySV(int)`. The 
matcher serves all of them.
   
   The unit test in this PR (`SVScanDocIdIteratorTest`) verifies this contract 
concretely: the stand-in evaluator inherits the default `applySV(String)` from 
`BaseDictionaryBasedPredicateEvaluator`, which throws. If the matcher routing 
ever regresses to `StringMatcher`, the test fails loudly with 
`UnsupportedOperationException` — the exact production symptom.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to