BiteTheDDDDt opened a new pull request, #62043:
URL: https://github.com/apache/doris/pull/62043

   This pull request fixes a bug in the `window_funnel_v2` aggregate function's 
DEDUPLICATION mode, where chains were incorrectly broken when a single row 
matched multiple events (multi-match rows). The update ensures that only true 
duplicates from different rows break the chain, aligning the behavior with the 
previous version (V1). The changes also include comprehensive regression tests 
to verify the correct handling of both multi-match rows and true duplicates.
   
   **Bug Fixes in Deduplication Logic:**
   
   * Updated the deduplication logic in `WindowFunnelStateV2` 
(`aggregate_function_window_funnel_v2.h`) to skip breaking the chain when a 
"duplicate" event is actually from the same row as an event already in the 
chain, preventing premature chain termination on multi-match rows. 
   * Added a new helper method `_is_same_row_as_chain` to check if an event is 
from the same row as any event in the current chain, used to distinguish true 
duplicates from multi-match rows.
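
The same-row check described above can be sketched in isolation. This is a simplified illustration, not the code from `aggregate_function_window_funnel_v2.h`: the `FunnelEvent` struct, the `row_id` field, the free function `is_same_row_as_chain`, and the driver `max_level_dedup` are hypothetical names chosen for the example. The sketch also assumes events arrive sorted by timestamp, ignores the time window, and assumes the chain restarts after a true duplicate.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical flattened event: one entry per (row, matched condition) pair.
struct FunnelEvent {
    int64_t timestamp;
    int event_idx;       // 0-based index of the funnel condition that matched
    std::size_t row_id;  // source row; equal row_id means a multi-match row
};

// True if `ev` comes from the same row as any event already in the chain.
bool is_same_row_as_chain(const std::vector<FunnelEvent>& chain,
                          const FunnelEvent& ev) {
    return std::any_of(chain.begin(), chain.end(),
                       [&](const FunnelEvent& c) { return c.row_id == ev.row_id; });
}

// Simplified DEDUPLICATION-style funnel: returns the longest chain length.
// Assumes `events` is sorted by timestamp; the time window is ignored here.
int max_level_dedup(const std::vector<FunnelEvent>& events, int event_count) {
    std::vector<FunnelEvent> chain;
    int best = 0;
    for (const FunnelEvent& ev : events) {
        const int next = static_cast<int>(chain.size());
        if (ev.event_idx == next) {
            // The event extends the chain by one level.
            chain.push_back(ev);
            best = std::max(best, next + 1);
            if (best == event_count) break;
        } else if (ev.event_idx < next) {
            // Candidate duplicate of a level the chain already holds.
            if (is_same_row_as_chain(chain, ev)) {
                continue;  // multi-match row: not a true duplicate, keep the chain
            }
            // True duplicate from a different row: the chain ends here.
            // Assumption: a duplicate of the first event starts a new chain.
            chain.clear();
            if (ev.event_idx == 0) chain.push_back(ev);
        }
        // Events beyond the next expected level are ignored.
    }
    return best;
}
```

With events `{1,0,0}, {2,1,1}, {2,0,1}, {3,2,2}`, row 1 matches both the second and the first condition, so the sketch skips the repeated first condition and reaches level 3. With `{1,0,0}, {2,1,1}, {3,0,2}, {4,2,3}`, the repeated first condition comes from a different row, so the chain breaks and the result is 2.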
   
   **Testing Improvements:**
   
   * Added two new unit tests in `vec_window_funnel_v2_test.cpp`:
     * `testDeduplicationSameRowMultiEvent` verifies that multi-match rows do 
not break the chain in DEDUPLICATION mode.
     * `testDeduplicationTrueDuplicateStillBreaks` ensures that a true 
duplicate on a different row still breaks the chain as expected.
   
   **Regression Test Suite Updates:**
   
   * Added regression tests in `window_funnel_v2.groovy` and updated expected 
outputs in `window_funnel_v2.out` to cover the fixed scenarios for 
DEDUPLICATION mode, ensuring both multi-match and true duplicate behaviors are 
validated. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]