Sxnan commented on PR #138:
URL: https://github.com/apache/flink-agents/pull/138#issuecomment-3307846572

   I spotted a few bugs that need to be addressed:
   
   1. There is a fundamental problem that Python async execution cannot work 
with per-action state and checkpoint at the moment. I created an issue 
https://github.com/apache/flink-agents/issues/185. But it should not block this 
PR, as it is not introduced by this PR.
   2. The operator state used to store the `recoveryMarks` has to be handled 
carefully. We should take the `currentProcessingKeysOpState` as an example. 
During recovery, we should pass the unioned list of markers to the 
ActionStateStore, and let the action state store decide where to recover from 
given all the markers.
   3. We cannot simply clean up the state in notifyCheckpointComplete, since we 
don't know if there are states that are yet to be used to recover. Therefore, 
we can only clean the state up to the sequence number for each key. 
   
   For 2 and 3, it requires some advanced usage of the Flink state, so I 
prepare some code for your reference. Feel free to use the code. 
https://github.com/Sxnan/flink-agents/commits/per-stask-state-poc/
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to