airborne12 opened a new pull request, #61200:
URL: https://github.com/apache/doris/pull/61200

   ### What problem does this PR solve?
   
   Issue Number: close #DORIS-24681
   
   Problem Summary:
   
   `search('NOT msg:omega')` incorrectly includes NULL rows in the result set, 
while `NOT search('msg:omega')` correctly excludes them.
   
   **Root cause:** `OccurBooleanWeight::complex_scorer()` used `ExcludeScorer` 
for MUST_NOT clause handling. `ExcludeScorer` does not implement 
`has_null_bitmap()` / `get_null_bitmap()`, so it inherits the `Scorer` base class 
defaults, which always return `false` / `nullptr`. With no null bitmap reported, 
NULL documents were treated as TRUE (matching) rather than NULL, violating SQL 
three-valued logic, where `NOT(NULL) = NULL`.
   
   **Fix:** Replace `ExcludeScorer` with `AndNotScorer`, which correctly 
implements three-valued logic:
   - `NOT(TRUE) = FALSE` → excluded from results
   - `NOT(FALSE) = TRUE` → included in results  
   - `NOT(NULL) = NULL` → placed in null bitmap, filtered by `mask_out_null()`
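
The three cases above can be illustrated with a minimal, self-contained C++ sketch. It is not the Doris implementation: `MatchResult`, `exclude_ignoring_nulls`, `not_three_valued`, and `mask_out_null_rows` are hypothetical names invented for this example, and plain `std::set` stands in for the real bitmaps. It only contrasts a null-unaware exclusion with a null-aware one.

```cpp
#include <cstdint>
#include <set>

using DocId = std::uint32_t;

// Hypothetical stand-in for a scorer's output: rows where the clause is TRUE
// and rows where it is NULL (playing the role of the null bitmap).
struct MatchResult {
    std::set<DocId> matched;
    std::set<DocId> nulls;
};

// Null-unaware exclusion: every row that is not in the excluded set survives,
// so NULL rows are returned as if NOT(NULL) were TRUE.
std::set<DocId> exclude_ignoring_nulls(const std::set<DocId>& universe,
                                       const std::set<DocId>& excluded) {
    std::set<DocId> out;
    for (DocId doc : universe) {
        if (excluded.count(doc) == 0) out.insert(doc);
    }
    return out;
}

// Null-aware negation following SQL three-valued logic.
MatchResult not_three_valued(const std::set<DocId>& universe,
                             const MatchResult& inner) {
    MatchResult out;
    for (DocId doc : universe) {
        if (inner.nulls.count(doc) != 0) {
            out.nulls.insert(doc);    // NOT(NULL) = NULL
        } else if (inner.matched.count(doc) == 0) {
            out.matched.insert(doc);  // NOT(FALSE) = TRUE
        }
        // NOT(TRUE) = FALSE: the row is dropped.
    }
    return out;
}

// Final filtering step (assumed analogue of masking out NULLs):
// only rows that are TRUE and not NULL are returned.
std::set<DocId> mask_out_null_rows(const MatchResult& r) {
    std::set<DocId> out;
    for (DocId doc : r.matched) {
        if (r.nulls.count(doc) == 0) out.insert(doc);
    }
    return out;
}
```

With the manual test data further down (ids 1, 3, 4, where row 1 has a NULL `msg` and no row contains `omega`), the null-unaware path yields rows 1, 3, 4, while the null-aware path followed by masking yields rows 3, 4.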
   
   Also plumb `binding_keys` from `function_search.cpp` through the query 
builder chain (`OccurBooleanQueryBuilder` → `OccurBooleanQuery` → 
`OccurBooleanWeight`) so that `per_occur_scorers()` can resolve the correct 
logical field for each sub-weight, enabling proper null bitmap fetching from 
the `NullBitmapResolver`.
   
   ### Release note
   
   Fix `search('NOT field:value')` incorrectly including NULL rows by using the 
null-bitmap-aware `AndNotScorer` instead of `ExcludeScorer`.
   
   ### Check List (For Author)
   
   - Test
       - [x] Regression test
       - [ ] Unit Test
       - [x] Manual test (add detailed scripts or steps below)
       - [ ] No need to test or manual test. Explain why:
           - [ ] This is a refactor/code format and no logic has been changed.
           - [ ] Previous test can cover this change.
           - [ ] No code files have been changed.
           - [ ] Other reason
   
   **Manual test steps:**
   ```sql
   CREATE DATABASE IF NOT EXISTS test_search_not;
   USE test_search_not;
   
   CREATE TABLE test_null_handling (
       id INT,
       msg TEXT,
       INDEX idx_msg(msg) USING INVERTED PROPERTIES("parser"="unicode")
   ) ENGINE=OLAP
   DUPLICATE KEY(id)
   DISTRIBUTED BY HASH(id) BUCKETS 1
   PROPERTIES ("replication_num" = "1");
   
   INSERT INTO test_null_handling VALUES (1, NULL), (3, 'hello world'), (4, 'alpha beta');
   
   -- Before fix: returns id=1,3,4 (WRONG - includes NULL row)
   -- After fix: returns id=3,4 (CORRECT - excludes NULL row)
   SELECT * FROM test_null_handling WHERE search('NOT msg:omega');
   
   -- This always worked correctly (SQL-layer NOT):
   SELECT * FROM test_null_handling WHERE NOT search('msg:omega');
   ```
   
   - Behavior changed:
       - [x] Yes. `search('NOT field:value')` now correctly excludes rows where 
the field is NULL, matching the behavior of `NOT search('field:value')`.
   
   - Does this need documentation?
       - [x] No.
   
   ### Check List (For Reviewer who merge this PR)
   
   - [ ] Confirm the release note
   - [ ] Confirm test cases
   - [ ] Confirm document
   - [ ] Add branch pick label



---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to