tarun11Mavani commented on PR #15756:
URL: https://github.com/apache/pinot/pull/15756#issuecomment-2869182301
> Wow, how did I miss this.. By reading the code, this would skip docs
matched with bitmap and produce wrong result if there are multiple filters on
sorted column and some filters on column with inverted index. Can you try to
add a query to verify this?
Great suggestion @Jackie-Jiang .
You are right. This is not just a regression in query performance but also a
correctness issue. I added a sorted column and inverted index in
fineFoodReviews table and wrote a simple query to verify this before and after
the bugfix. I paused the ingestion and reloaded the segments in both cases to
make sure indices are created properly.
Query used:
```
select ProductId, Score, Text
from fineFoodReviews
Where (
REGEXP_LIKE(Text, 'name')
AND
(
ProductId = 'B000F9Z29K'
OR Score > 4
OR Score < 2
)
)
order by ProductId
limit 1000
```
Without the change in this PR (15756), the above query returns 7 docs as it
ignores the BitmapBasedDocIdIterator inside the OrDocIdSet
[#here](https://github.com/apache/pinot/blob/aecd31eafa830da5038ad0d5d8c62fda7b499e02/pinot-core/src/main/java/org/apache/pinot/core/operator/docidsets/OrDocIdSet.java#L108)
and only selects the doc based on the
SortedDocIdIterator.
Due to this bug, all records with `Score > 4 OR Score < 2` is selected but
record with `ProductId = 'B000F9Z29K'` is not.
<img width="1307" alt="image"
src="https://github.com/user-attachments/assets/1c0ebfc1-e7fd-411d-aac0-b266cc73d2b7"
/>
After adding the BitmapBasedDocIdIterator into bitmapBasedDocIdIterators as
part of this PR, the code path remains the same but now OrDocIdSet also adds
the matching doc ids into the selected docIds that are returned from OrDocIdSet.
<img width="1265" alt="image"
src="https://github.com/user-attachments/assets/a1f3c9f1-e7a8-4fb6-9ee7-ce56739b8591"
/>
<img width="1290" alt="image"
src="https://github.com/user-attachments/assets/3fb53030-e4a1-4096-bf9d-f83ad8c96882"
/>
This validates your point that,
> Essentially query will get wrong result when numSortedDocIdIterators +
numBitmapBasedDocIdIterators > 1, where with existing code
numBitmapBasedDocIdIterators is always 0. So we need multiple
sortedDocIdIterators to trigger the bug.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]