[ 
https://issues.apache.org/jira/browse/FLINK-19774?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xintong Song updated FLINK-19774:
---------------------------------
    Fix Version/s:     (was: 1.14.0)
                   1.15.0

> Introduce Sub Partition View Version for Approximate Local Recovery
> -------------------------------------------------------------------
>
>                 Key: FLINK-19774
>                 URL: https://issues.apache.org/jira/browse/FLINK-19774
>             Project: Flink
>          Issue Type: Sub-task
>          Components: Runtime / Task
>            Reporter: Yuan Mei
>            Priority: Major
>              Labels: auto-unassigned
>             Fix For: 1.15.0
>
>
>  
> This ticket is to solve a corner case where a downstream task continuously 
> fails multiple times, or an orphan task execution may exist for a short 
> period of time after new execution is running (as described in the FLIP)
>  
> Here is an idea of how to cleanly and thoroughly solve this kind of problem:
>  # We go with the simplified release view version: only release view before a 
> new creation (in thread2). That says we won't clean up view when downstream 
> task disconnects ({{releaseView}} would not be called from the reference copy 
> of view) (in thread1 or 2).
>  * 
>  ** This would greatly simplify the threading model
>  ** This won't cause any resource leak, since view release is only to notify 
> the upstream result partition to releaseOnConsumption when all subpartitions 
> are consumed in PipelinedSubPartitionView. In our case, we do not release the 
> result partition on consumption any way (the result partition is put in track 
> in JobMaster, similar to the ResultParition.blocking Type).
>       2. Each view is associated with a downstream task execution version
>  * 
>  ** This is making sense because we actually have different versions of view 
> now, corresponding to the vertex.version of the downstream task.
>  ** createView is performed only if the new version to create is greater than 
> the existing one
>  ** If we decide to create a new view, the old view should be released.
> I think this way, we can completely disconnect the old view with the 
> subpartition. Besides that, the working handler in use would always hold the 
> freshest view reference.
>  
> Point 1 has already been addressed in FLINK-19632. This ticket is to address 
> Point 2.
> Details discussion in [https://github.com/apache/flink/pull/13648]
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

Reply via email to