dyzcs commented on issue #5179: URL: https://github.com/apache/paimon/issues/5179#issuecomment-3832735971
@fantasy2100 I think the key insight is that **`sequence-group` acts as an ordering key, not a version filter**. Let me clarify the behavior:

### Core Mechanism

1. **Ordering, not filtering**: `g_1` is used to **sort** records (ascending) that share the same primary key, not to filter out "older" versions.
2. **NULL handling**: A record is discarded only when its `g_1` is NULL. Records with a non-NULL `g_1` are always kept for aggregation.
3. **Aggregation after sorting**: All valid records are aggregated in `g_1` order.

### Why the result is `6` instead of `3`

For your test case:

```sql
INSERT (1, 1, 1, 1); -- g_1=1
INSERT (1, 2, 2, 2); -- g_1=2
INSERT (1, 3, 3, 1); -- g_1=1
```

Internal execution:

1. Sort by `g_1`: (1,1,1,1) → (1,3,3,1) → (1,2,2,2)
2. Aggregate sequentially:
   - **sum**: 0→1→4→6 (order doesn't matter because addition is commutative, but all records participate)
   - **last_non_null_value**: null→1→3→2 (takes the last value after sorting)

### Implications

- `sum`/`product`/`max`/`min`: results are order-independent, but all records contribute.
- `last_non_null_value`/`first_non_null_value`/`listagg`: results are order-dependent and are determined by the sort key.

This is by design, for eventual consistency with out-of-order streams. Perhaps the documentation should clarify that `sequence-group` defines aggregation order rather than version precedence.

cc @JingsongLi Would it be helpful to add a section to the documentation explaining that `sequence-group` controls the aggregation order, not record filtering (except for NULLs)? This behavior differs from traditional MVCC version checks and might confuse users coming from the partial-update implementations in HBase, Doris, or other OLAP systems.

-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at: [email protected]
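To make the sort-then-aggregate explanation in the comment concrete, here is a minimal Python sketch of that model. It is a toy simulation, not Paimon's actual merge code. The column names `a` and `b`, their mapping to `sum` and `last_non_null_value`, the `Row` type, and the `merge` helper are all assumptions made for illustration. The sketch also assumes that every row shares the same primary key and that rows with equal `g_1` keep their arrival order.

```python
from typing import NamedTuple, Optional


class Row(NamedTuple):
    pk: int              # primary key
    a: Optional[int]     # hypothetical column aggregated with sum
    b: Optional[int]     # hypothetical column aggregated with last_non_null_value
    g_1: Optional[int]   # sequence-group column


def merge(rows):
    """Toy model of the merge described in the comment; not Paimon's code."""
    # Step 1: rows whose g_1 is NULL do not take part in the merge.
    valid = [r for r in rows if r.g_1 is not None]
    # Step 2: sort ascending by g_1. sorted() is stable, so rows with an
    # equal g_1 keep their arrival order (an assumption of this sketch).
    ordered = sorted(valid, key=lambda r: r.g_1)
    # Step 3: fold every remaining row into the accumulators, in that order.
    total, last = 0, None
    for r in ordered:
        if r.a is not None:
            total += r.a
        if r.b is not None:
            last = r.b
    return total, last


rows = [Row(1, 1, 1, 1), Row(1, 2, 2, 2), Row(1, 3, 3, 1)]
print(merge(rows))  # (6, 2)
```

Under these assumptions the third row, despite arriving last with `g_1=1`, still contributes to the sum, while the last non-null value comes from the row with the largest `g_1`.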
