GitHub user StephanEwen commented on the issue:

    https://github.com/apache/flink/pull/3524
  
    Thanks for opening this pull request. Adding a `CompositeStateHandle` and a 
`StateRegistry` is a good idea.
    
    Some thoughts:
    
      - What do you think about making the `StateRegistry` into a 
`SharedStateRegistry` that contains only the handles to state shared across 
checkpoints? State that is exclusive to a checkpoint would not be tracked by 
that registry and would remain only in the checkpoint. That way we isolate the 
existing behavior from the coming changes and do not risk regressions in the 
state cleanup code (which is critical for current users).
    
      - Another reason for the above suggestion is that it lets us bring in 
other code that adds "fast paths" and "safety nets" for checkpoint cleanup 
(currently only for non-shared state), for example dropping a checkpoint 
simply by an `rm -r` (see https://github.com/apache/flink/pull/3522 ). For 
various users, state cleanup problems are among the biggest problems they 
have, and we can address them well with the work started in the pull request 
linked above. These things would work together seamlessly if the registry 
deals only with shared state handles.
    
      - I am wondering whether it would be easier to put the registry into the 
checkpoint coordinator rather than the checkpoint stores. That way we need the 
code that deals with adding, failure handling, etc. only once.
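
    To make the shared-only registry idea concrete, here is a minimal sketch, 
assuming shared state is identified by a string key and tracked by reference 
counting. All class and method names (`SharedStateRegistrySketch`, 
`Checkpoint`, `register`, `unregister`, `discard`) are hypothetical 
illustrations, not code from the pull request and not Flink's API.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: none of these names come from the pull request or from Flink.
class SharedStateRegistrySketch {

    // Reference count per shared state key (assumption: keys are unique strings,
    // for example file paths).
    private final Map<String, Integer> referenceCounts = new HashMap<>();

    /** Adds one checkpoint reference to the key and returns the new count. */
    synchronized int register(String key) {
        return referenceCounts.merge(key, 1, Integer::sum);
    }

    /** Drops one reference; returns true if none remain, so the state may be deleted. */
    synchronized boolean unregister(String key) {
        Integer count = referenceCounts.get(key);
        if (count == null) {
            throw new IllegalStateException("Unknown shared state: " + key);
        }
        if (count == 1) {
            referenceCounts.remove(key);
            return true;
        }
        referenceCounts.put(key, count - 1);
        return false;
    }

    /** Returns the current number of references, or 0 for an unknown key. */
    synchronized int referenceCount(String key) {
        return referenceCounts.getOrDefault(key, 0);
    }

    /** A checkpoint: exclusive state stays in it, only shared keys use the registry. */
    static final class Checkpoint {
        final List<String> exclusiveState = new ArrayList<>();
        final List<String> sharedState = new ArrayList<>();

        /** Registers every shared key of this checkpoint with the registry. */
        void registerShared(SharedStateRegistrySketch registry) {
            for (String key : sharedState) {
                registry.register(key);
            }
        }

        /** Returns everything that can be deleted when this checkpoint is dropped. */
        List<String> discard(SharedStateRegistrySketch registry) {
            // Exclusive state never touches the registry and can always be deleted.
            List<String> deletable = new ArrayList<>(exclusiveState);
            for (String key : sharedState) {
                if (registry.unregister(key)) {
                    deletable.add(key);
                }
            }
            return deletable;
        }
    }
}
```

    In this sketch, dropping a checkpoint always frees its exclusive state 
without consulting the registry, which is what would keep a directory-level 
fast path such as `rm -r` safe for non-shared state. Shared state is freed 
only when the last referencing checkpoint is discarded.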


