airborne12 opened a new pull request, #60814:
URL: https://github.com/apache/doris/pull/60814

   ### What problem does this PR solve?
   
   Issue Number: close #DORIS-24545
   
   Problem Summary:
   
   In the lucene mode of the `search()` function, queries that mix explicit operators (`AND`/`OR`) with implicit conjunction return different results from Elasticsearch. For example:
   
   - Query: `"Sumer" OR Ptolemaic\ dynasty Limonene` with `default_operator=AND`
   - ES result: 1 row
   - Doris result: 0 rows (before fix)
   
   **Root cause:** In Lucene's `QueryParserBase.addClause()`, only an explicit `CONJ_AND` or `CONJ_OR` modifies the preceding term's occur (`CONJ_OR` does so only when `default_operator=AND`). Implicit conjunction (`CONJ_NONE`, i.e., space-separated terms with no explicit operator) affects only the **current** term, via `default_operator`, and leaves the preceding term unchanged.
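   The rule above can be modelled in a few lines of Java. This is a simplified sketch, not Lucene's source: it assumes whitespace-separated bare terms, ignores the `+`/`-`/`NOT` modifiers and prohibited clauses, and the `Conj`, `Occur`, `Clause`, and `parse` names are hypothetical stand-ins for Lucene's `CONJ_*` constants and `BooleanClause`. The `defaultAnd` flag stands for `default_operator=AND`.

   ```java
   import java.util.ArrayList;
   import java.util.List;

   // Simplified model of the occur rules in Lucene's QueryParserBase.addClause().
   // Assumption: bare terms only; the +/-/NOT modifiers and prohibited clauses
   // handled by the real method are deliberately left out.
   public class AddClauseSketch {
       enum Conj { NONE, AND, OR }   // hypothetical stand-ins for CONJ_NONE / CONJ_AND / CONJ_OR
       enum Occur { MUST, SHOULD }   // hypothetical stand-in for BooleanClause.Occur

       record Clause(String term, Occur occur) {
           @Override
           public String toString() { return occur + "(" + term + ")"; }
       }

       static void addClause(List<Clause> clauses, Conj conj, String term, boolean defaultAnd) {
           int last = clauses.size() - 1;
           // An explicit AND makes the preceding term required.
           if (last >= 0 && conj == Conj.AND) {
               clauses.set(last, new Clause(clauses.get(last).term(), Occur.MUST));
           }
           // An explicit OR under default_operator=AND makes the preceding term optional.
           if (last >= 0 && defaultAnd && conj == Conj.OR) {
               clauses.set(last, new Clause(clauses.get(last).term(), Occur.SHOULD));
           }
           // CONJ_NONE never touches the preceding clause: it only decides the
           // current term's occur, via default_operator.
           boolean required = defaultAnd ? conj != Conj.OR : conj == Conj.AND;
           clauses.add(new Clause(term, required ? Occur.MUST : Occur.SHOULD));
       }

       // Tiny driver: whitespace-separated tokens, where AND/OR are operators.
       static List<Clause> parse(String query, boolean defaultAnd) {
           List<Clause> clauses = new ArrayList<>();
           Conj pending = Conj.NONE;
           for (String tok : query.trim().split("\\s+")) {
               if (tok.equals("AND")) { pending = Conj.AND; continue; }
               if (tok.equals("OR"))  { pending = Conj.OR;  continue; }
               addClause(clauses, pending, tok, defaultAnd);
               pending = Conj.NONE;
           }
           return clauses;
       }

       public static void main(String[] args) {
           // a OR b c with default_operator=AND
           System.out.println(parse("a OR b c", true)); // [SHOULD(a), SHOULD(b), MUST(c)]
       }
   }
   ```

   Because `c` arrives with `CONJ_NONE`, neither branch that rewrites `clauses.get(last)` fires, so `b` keeps the SHOULD it received from the explicit OR.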
   
   The FE's `SearchDslParser.hasExplicitAndBefore()` fell back to `default_operator` when no explicit AND token was found, so under `default_operator=AND` it returned `true`. Implicit conjunction was therefore treated exactly like an explicit AND and modified the preceding term's occur, which diverges from Lucene/ES semantics.
   
   **Example of the bug:**
   
   For `a OR b c` with `default_operator=AND`:
   - Before fix: `SHOULD(a) MUST(b) MUST(c)`. Wrong: the implicit space before `c` incorrectly upgraded `b` from SHOULD to MUST.
   - After fix: `SHOULD(a) SHOULD(b) MUST(c)`. Correct, and matches ES: only `c` gets MUST (from `default_operator`), while `b` keeps SHOULD (from the preceding OR).
   
   **Fix:** `hasExplicitAndBefore()` now returns `false` when no explicit AND 
token is found, regardless of `default_operator`. Only explicit AND tokens 
trigger the "introduced by AND" logic that modifies preceding terms.
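   A minimal, hypothetical sketch of the before/after behavior, assuming a whitespace tokenizer and a `Conj` value for the operator seen since the previous term. The real `hasExplicitAndBefore()` works on the FE parse tree and its signature is not shown here; `hasExplicitAndBeforeOld`, `occurs`, and `Conj` are illustrative names, and the "old" predicate is reconstructed from the description above rather than copied from the code.

   ```java
   import java.util.ArrayList;
   import java.util.List;

   // Hypothetical model of the FE fix; not the actual SearchDslParser code.
   public class HasExplicitAndSketch {
       enum Conj { NONE, AND, OR }   // operator token seen since the previous term

       // Before the fix (as described): no operator token -> fall back to default_operator.
       static boolean hasExplicitAndBeforeOld(Conj conj, boolean defaultAnd) {
           return conj == Conj.AND || (conj == Conj.NONE && defaultAnd);
       }

       // After the fix: only an explicit AND token counts as "introduced by AND".
       static boolean hasExplicitAndBefore(Conj conj) {
           return conj == Conj.AND;
       }

       // Returns each term's occur, e.g. "[SHOULD(a), SHOULD(b), MUST(c)]".
       static String occurs(String query, boolean defaultAnd, boolean fixed) {
           List<String> terms = new ArrayList<>();
           List<String> occur = new ArrayList<>();
           Conj conj = Conj.NONE;
           for (String tok : query.trim().split("\\s+")) {
               if (tok.equals("AND")) { conj = Conj.AND; continue; }
               if (tok.equals("OR"))  { conj = Conj.OR;  continue; }
               int last = occur.size() - 1;
               boolean introducedByAnd = fixed
                       ? hasExplicitAndBefore(conj)
                       : hasExplicitAndBeforeOld(conj, defaultAnd);
               if (last >= 0 && introducedByAnd) {
                   occur.set(last, "MUST");      // "introduced by AND" upgrades the preceding term
               } else if (last >= 0 && conj == Conj.OR && defaultAnd) {
                   occur.set(last, "SHOULD");    // explicit OR downgrades it under default AND
               }
               boolean must = introducedByAnd || (defaultAnd && conj != Conj.OR);
               terms.add(tok);
               occur.add(must ? "MUST" : "SHOULD");
               conj = Conj.NONE;
           }
           List<String> out = new ArrayList<>();
           for (int i = 0; i < terms.size(); i++) {
               out.add(occur.get(i) + "(" + terms.get(i) + ")");
           }
           return out.toString();
       }

       public static void main(String[] args) {
           System.out.println(occurs("a OR b c", true, false)); // [SHOULD(a), MUST(b), MUST(c)]
           System.out.println(occurs("a OR b c", true, true));  // [SHOULD(a), SHOULD(b), MUST(c)]
       }
   }
   ```

   The only difference between the two runs is the predicate's answer for `c`, whose `Conj` is `NONE`: the old fallback to `default_operator` reaches back and upgrades `b`, while the fixed predicate leaves `b` alone and lets `default_operator` set only `c`.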
   
   ### Release note
   
   Fix search() lucene mode producing incorrect results when queries mix 
explicit operators (OR/AND) with implicit conjunction (space-separated terms).
   
   ### Check List (For Author)
   
   - Test
       - [x] Regression test
       - [x] Unit Test
       - [ ] Manual test (add detailed scripts or steps below)
       - [ ] No need to test or manual test. Explain why:
           - [ ] This is a refactor/code format and no logic has been changed.
           - [ ] Previous test can cover this change.
           - [ ] No code files have been changed.
           - [ ] Other reason
   
   - Behavior changed:
       - [x] Yes. Implicit conjunction (space between terms) in lucene mode no 
longer modifies the preceding term's occur. Only explicit AND/OR operators 
modify preceding terms, matching Lucene/ES semantics.
   
   - Does this need documentation?
       - [ ] No.
       - [ ] Yes.
   
   ### Check List (For Reviewer who merge this PR)
   
   - [ ] Confirm the release note
   - [ ] Confirm test cases
   - [ ] Confirm document
   - [ ] Add branch pick label