Github user fhueske commented on the issue:

    https://github.com/apache/flink/pull/3585
  
    Hi @sunjincheng121, thanks for this PR.
    
    To be honest, I don't completely understand the implementation of the 
`RowsClauseBoundedOverProcessFunction`. 
    
    I thought about another design that I would like to discuss (not 
considering processing time, which should be addressed in a separate 
ProcessFunction, IMO):
    
    - We have three state objects: 1) the accumulator row, 2) a MapState[Long, 
List[Row]] for data that has not been processed yet (`toProcess`), and 3) a 
MapState[Long, List[Row]] for processed data that needs to be retracted 
(`toRetract`).
    - processElement() puts the element into the `toProcess` MapState, keyed by 
its original timestamp, and registers a timer for `currentWatermark() + 1`. 
Hence, we only have a single timer, which triggers when the next watermark is 
reached.
    - onTimer() is called for the next watermark. We get an iterator over the 
`toProcess` MapState. For RocksDB, the iterator is sorted on the key. We 
sort-insert the records from the iterator into a `LinkedList` (since the 
iterator is sorted for RocksDB, this will be a simple append; for other state 
backends it will be more expensive, but we can tolerate that, IMO). We do the 
same for the `toRetract` MapState, so we have two sorted lists: one with data 
to accumulate and one with data to retract. We go over both sorted lists and, 
at each step, accumulate and retract using the accumulator state. Then we emit 
the new row and move the emitted row from the `toProcess` MapState to the 
`toRetract` MapState.
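    
    The steps above can be sketched roughly as follows. This is a plain-Java 
simulation, not Flink code: a `TreeMap` stands in for a key-sorted MapState 
(the RocksDB case, so no extra sort-insert is needed). The class name 
`BoundedRowsOverSketch`, the SUM aggregate over a single long field, the 
`preceding` parameter (n in ROWS BETWEEN n PRECEDING AND CURRENT ROW) and the 
choice to process only rows with a timestamp <= the watermark are all 
assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical simulation of the proposed state layout; no Flink classes are used. */
class BoundedRowsOverSketch {
    private final int preceding;   // n in "ROWS BETWEEN n PRECEDING AND CURRENT ROW"
    private long sum = 0L;         // 1) accumulator state (assumed SUM aggregate)
    // 2) rows not yet processed, keyed by timestamp (sorted like a RocksDB iterator)
    private final TreeMap<Long, List<Long>> toProcess = new TreeMap<>();
    // 3) rows already accumulated that still have to be retracted later
    private final TreeMap<Long, List<Long>> toRetract = new TreeMap<>();
    private int retractCount = 0;  // number of rows currently held in toRetract

    BoundedRowsOverSketch(int preceding) {
        this.preceding = preceding;
    }

    /** Buffers the row; a real operator would also register a timer for currentWatermark() + 1. */
    void processElement(long timestamp, long value) {
        toProcess.computeIfAbsent(timestamp, k -> new ArrayList<>()).add(value);
    }

    /** Processes all buffered rows with timestamp <= watermark and returns the emitted sums. */
    List<Long> onTimer(long watermark) {
        List<Long> emitted = new ArrayList<>();
        Iterator<Map.Entry<Long, List<Long>>> it =
                toProcess.headMap(watermark, true).entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<Long, List<Long>> entry = it.next();
            for (long value : entry.getValue()) {
                // Retract the oldest row once the window already holds preceding + 1 rows.
                if (retractCount == preceding + 1) {
                    Map.Entry<Long, List<Long>> oldest = toRetract.firstEntry();
                    List<Long> rows = oldest.getValue();
                    sum -= rows.remove(0);
                    if (rows.isEmpty()) {
                        toRetract.remove(oldest.getKey());
                    }
                    retractCount--;
                }
                sum += value;      // accumulate the current row
                emitted.add(sum);  // emit the result for the current row
                // Move the emitted row from toProcess to toRetract.
                toRetract.computeIfAbsent(entry.getKey(), k -> new ArrayList<>()).add(value);
                retractCount++;
            }
            it.remove();           // this timestamp is fully processed
        }
        return emitted;
    }
}
```

    With `preceding = 1`, buffering the rows (3, 30), (1, 10), (2, 20) and 
(5, 50) as (timestamp, value) and firing the timer for watermark 3 emits 10, 
30, 50 in timestamp order; firing again for watermark 5 retracts the row with 
value 20 and emits 80.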
    
    This design has the benefit of using RocksDB to sort. Moreover, we could 
put only those fields that need to be retracted into the `toRetract` state, 
instead of the full row.
    
    What do you think about this approach?
    
    Thanks, Fabian

