dyzcs commented on issue #5179:
URL: https://github.com/apache/paimon/issues/5179#issuecomment-3832735971

   @fantasy2100 
   I think the key insight is that **`sequence-group` acts as an ordering key, 
not a version filter**. Let me clarify the behavior:
   
   ### Core Mechanism
   1. **Ordering, not filtering**: records with the same primary key are **sorted** by `g_1` (ascending); `g_1` is not used to filter out "older" versions.
   2. **NULL handling**: a record is discarded only when its `g_1` is NULL. Records with a non-NULL `g_1` are always kept for aggregation.
   3. **Aggregation after sorting**: all remaining records are aggregated in `g_1` order.
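The three rules above can be modeled in a few lines of Python. This is an illustration, not Paimon's merge code. The row layout `(pk, a, b, g_1)`, the choice of `sum` for `a` and `last_non_null_value` for `b`, and the helper name `merge` are all hypothetical, and rows with equal `g_1` are assumed to keep their arrival order.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical row layout for this sketch: (pk, a, b, g_1)
Row = Tuple[int, Optional[int], Optional[int], Optional[int]]


def merge(rows: List[Row]) -> Dict[int, Tuple[int, Optional[int]]]:
    """Model of the described merge, not Paimon's implementation.

    Assumed for illustration: `a` is aggregated with sum, `b` with
    last_non_null_value, and rows with equal g_1 keep arrival order.
    """
    by_key: Dict[int, List[Row]] = {}
    for row in rows:
        pk, _, _, g_1 = row
        if g_1 is None:  # rule 2: a record with NULL g_1 is discarded
            continue
        by_key.setdefault(pk, []).append(row)

    result: Dict[int, Tuple[int, Optional[int]]] = {}
    for pk, group in by_key.items():
        # rule 1: order by g_1 ascending (list.sort is stable)
        group.sort(key=lambda r: r[3])
        total, last_b = 0, None
        for _, a, b, _ in group:  # rule 3: aggregate in g_1 order
            if a is not None:
                total += a  # sum ignores NULLs
            if b is not None:
                last_b = b  # last_non_null_value keeps the latest non-NULL b
        result[pk] = (total, last_b)
    return result
```

For example, `merge([(1, 1, 1, 1), (1, 2, 2, 2), (1, 3, 3, 1), (2, 5, 5, None)])` yields `{1: (6, 2)}`: key 2 disappears because its only record has a NULL `g_1`.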
   
   ### Why the result is `6` instead of `3`
   For your test case:
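   ```sql
   -- T is a placeholder name for the test table in the issue
   INSERT INTO T VALUES (1, 1, 1, 1);  -- g_1=1
   INSERT INTO T VALUES (1, 2, 2, 2);  -- g_1=2
   INSERT INTO T VALUES (1, 3, 3, 1);  -- g_1=1
   ```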
   Internal execution:
   
   - **Sort by `g_1`**: (1,1,1,1) → (1,3,3,1) → (1,2,2,2)
   - **Aggregate sequentially**:
     - **sum**: 0→1→4→6 (the order doesn't matter because addition is commutative, but every record participates)
     - **last_non_null_value**: null→1→3→2 (takes the last value after sorting)
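The trace above can be reproduced with a short Python sketch that records the running value after each row. Mapping the second column to `sum` and the third to `last_non_null_value` is an assumption (both columns hold the same values in this test case), and Python's stable `sorted` stands in for however ties on `g_1` are actually broken; the helper name `trace` is hypothetical.

```python
# Test case from the issue as (pk, a, b, g_1); the column-to-aggregate
# mapping (a -> sum, b -> last_non_null_value) is assumed for this sketch.
rows = [(1, 1, 1, 1), (1, 2, 2, 2), (1, 3, 3, 1)]


def trace(rows):
    """Return the g_1-ordered rows plus the running sum and last_non_null_value."""
    ordered = sorted(rows, key=lambda r: r[3])  # stable: ties keep arrival order
    sums, lasts = [0], [None]
    for _, a, b, _ in ordered:
        sums.append(sums[-1] + a)
        lasts.append(b if b is not None else lasts[-1])
    return ordered, sums, lasts


ordered, sums, lasts = trace(rows)
print(ordered)  # [(1, 1, 1, 1), (1, 3, 3, 1), (1, 2, 2, 2)]
print(sums)     # [0, 1, 4, 6]
print(lasts)    # [None, 1, 3, 2]
```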
   
   ### Implications
   - `sum`/`product`/`max`/`min`: results are order-independent, but all records contribute.
   - `last_non_null_value`/`first_non_null_value`/`listagg`: results are order-dependent, determined by the sort key.
   
   This is by design, to reach eventual consistency with out-of-order streams. Perhaps the documentation should clarify that `sequence-group` defines the aggregation order rather than version precedence.
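To make the split between order-independent and order-dependent aggregates concrete, the hypothetical Python sketch below aggregates the same three rows in plain arrival order, then in `g_1` order, and finally in `g_1` order after every possible arrival order. The arrival-order result is only a point of comparison, not something Paimon computes, and mapping the second column to `sum` and the third to `last_non_null_value` is an assumption.

```python
from itertools import permutations

rows = [(1, 1, 1, 1), (1, 2, 2, 2), (1, 3, 3, 1)]  # (pk, a, b, g_1), assumed layout


def aggregate(ordered):
    """Return (sum of a, last non-null b) over rows in the given order."""
    total, last_b = 0, None
    for _, a, b, _ in ordered:
        total += a
        if b is not None:
            last_b = b
    return total, last_b


# Comparison only: the last row to arrive wins for last_non_null_value.
by_arrival = aggregate(rows)
# Rows ordered by g_1 (stable sort), as described in the comment.
by_g1 = aggregate(sorted(rows, key=lambda r: r[3]))
# Shuffling arrival order does not change the g_1-ordered result here,
# because the row with the largest g_1 is unique.
results = {aggregate(sorted(p, key=lambda r: r[3])) for p in permutations(rows)}
print(by_arrival, by_g1, results)  # (6, 3) (6, 2) {(6, 2)}
```

`sum` gives 6 in every case, while `last_non_null_value` moves from 3 to 2 once the rows are ordered by `g_1` and stays at 2 regardless of arrival order.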
   
   cc @JingsongLi 
   Would it be helpful to add a section to the documentation explaining that `sequence-group` controls the aggregation order, not record filtering (except for NULLs)? This behavior differs from traditional MVCC version checks and might confuse users coming from the partial-update implementations of HBase, Doris, or other OLAP systems.

