Github user StefanRRichter commented on the issue:

    https://github.com/apache/flink/pull/3801
  
    I am sorry, but before merging I noticed that some tests (e.g. 
`RocksDBStateBackendTest.testCancelRunningSnapshot`) fail sporadically (only on 
Travis). I tracked down the problem, and I think the cause is that `cancel()` 
does not eagerly close the streams, so blocking IO calls are not interrupted.
    
    I suggest the following fix:
    
    `RocksDBIncrementalSnapshotOperation` should have its own 
`CloseableRegistry`. This registry tracks all streams opened during the 
checkpoint and is registered with the backend's registry for as long as the 
task runs. Then, as a first step in `cancel()`, we can close and unregister 
that inner `CloseableRegistry`. This also prevents a race in which `cancel()` 
asynchronously closes the current stream while the checkpointing has already 
opened the next one: once the registry is closed, it immediately closes any 
new stream that tries to register and rejects the registration.
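The proposal can be sketched in plain Java so that it runs without Flink on the classpath. All names here (`SimpleCloseableRegistry`, `IncrementalSnapshotOperationSketch`, `openStream`, `TrackingStream`) are hypothetical. The registry is a simplified stand-in for Flink's `CloseableRegistry`, not its real API, and the sketch assumes that every stream the snapshot uses is opened through the inner registry.

```java
import java.io.Closeable;
import java.io.IOException;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Demo entry point (declared first so `java Main.java` can launch it directly).
class Main {
    public static void main(String[] args) throws IOException {
        SimpleCloseableRegistry backendRegistry = new SimpleCloseableRegistry();
        IncrementalSnapshotOperationSketch op = new IncrementalSnapshotOperationSketch(backendRegistry);
        TrackingStream current = op.openStream(new TrackingStream());
        op.cancel(); // closes `current`; for a real stream this would interrupt blocking IO
        System.out.println("current closed: " + current.closed);
        // The checkpointing thread raced ahead and tries to open the next stream after cancel().
        TrackingStream next = new TrackingStream();
        try {
            op.openStream(next);
        } catch (IOException e) {
            System.out.println("next rejected, closed: " + next.closed);
        }
    }
}

// Hypothetical stand-in for a checkpoint output stream; it only records close().
class TrackingStream implements Closeable {
    volatile boolean closed;
    @Override public void close() { closed = true; }
}

// Hypothetical, simplified registry (NOT Flink's CloseableRegistry API): close() closes
// everything registered, and any later registration is closed immediately and rejected.
class SimpleCloseableRegistry implements Closeable {
    private final Set<Closeable> registered = new HashSet<>();
    private boolean closed;

    void register(Closeable c) throws IOException {
        synchronized (this) {
            if (!closed) {
                registered.add(c);
                return;
            }
        }
        c.close(); // already closed: close the newcomer so it can never block
        throw new IOException("registry is closed");
    }

    synchronized boolean unregister(Closeable c) {
        return registered.remove(c);
    }

    @Override
    public void close() throws IOException {
        List<Closeable> toClose;
        synchronized (this) {
            if (closed) return;
            closed = true;
            toClose = new ArrayList<>(registered);
            registered.clear();
        }
        IOException first = null;
        for (Closeable c : toClose) { // close outside the lock
            try {
                c.close();
            } catch (IOException e) {
                if (first == null) first = e; else first.addSuppressed(e);
            }
        }
        if (first != null) throw first;
    }
}

// Hypothetical sketch: the snapshot operation owns an inner registry for its streams.
class IncrementalSnapshotOperationSketch {
    private final SimpleCloseableRegistry backendRegistry;
    private final SimpleCloseableRegistry snapshotRegistry = new SimpleCloseableRegistry();

    IncrementalSnapshotOperationSketch(SimpleCloseableRegistry backendRegistry) throws IOException {
        this.backendRegistry = backendRegistry;
        backendRegistry.register(snapshotRegistry); // stays registered while the task runs
    }

    // Every stream used by the snapshot must be opened through here.
    <T extends Closeable> T openStream(T stream) throws IOException {
        snapshotRegistry.register(stream);
        return stream;
    }

    void cancel() throws IOException {
        snapshotRegistry.close();                      // 1) close open streams, block new ones
        backendRegistry.unregister(snapshotRegistry);  // 2) drop it from the backend's registry
    }
}
```

Running `Main` prints `current closed: true` followed by `next rejected, closed: true`. Closing a late newcomer inside `register()` is what covers the race: a stream opened after `cancel()` is closed before the checkpointing thread can block on it. Closing the registered streams outside the lock keeps a slow `close()` from holding up concurrent registrations.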

