[ https://issues.apache.org/jira/browse/FLINK-25432?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17476836#comment-17476836 ]

Matthias Pohl commented on FLINK-25432:
---------------------------------------

We're seeing {{KubernetesHighAvailabilityRecoverFromSavepointITCase}} fail 
because of an ordering dependency between shutting down the {{JobMaster}}'s 
leader election and cleaning up the HA resources of the finished job.

[KubernetesLeaderElectionDriver:229|https://github.com/apache/flink/blob/9c7e3007eea80d7f4ad602fc33d9f58b676a7722/flink-kubernetes/src/main/java/org/apache/flink/kubernetes/highavailability/KubernetesLeaderElectionDriver.java#L229]
 fails fatally if the {{ConfigMap}} is deleted by an actor that does not hold 
the current leadership ID (i.e. the Dispatcher).

This means that, for now, we still have to wait for the leadership resources 
to be freed before cleaning up the job's HA data. This ordering constraint 
goes away with FLINK-24038, which introduces a single leader election per 
JobManager.
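As a minimal sketch of the required ordering: the HA cleanup has to be chained strictly after the leader election shutdown, since running both concurrently can let the driver observe a {{ConfigMap}} deletion by a non-leader and fail fatally. The method and step names below are invented for illustration and do not exist in the Flink code base.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;

public class CleanupOrderSketch {

    // Hypothetical stand-in for releasing the JobMaster's leader election
    // (and with it the ownership of the leader ConfigMap).
    static CompletableFuture<String> stopLeaderElection() {
        return CompletableFuture.supplyAsync(() -> "leader-election-stopped");
    }

    // Hypothetical stand-in for deleting the job's HA data.
    static CompletableFuture<String> cleanUpHaData() {
        return CompletableFuture.supplyAsync(() -> "ha-data-cleaned");
    }

    // Chains the HA cleanup after the leader election shutdown and records
    // the order in which the steps complete.
    static List<String> runShutdown() {
        List<String> steps = new ArrayList<>();
        stopLeaderElection()
                .thenCompose(step -> {
                    steps.add(step);
                    return cleanUpHaData();
                })
                .thenAccept(steps::add)
                .join();
        return steps;
    }

    public static void main(String[] args) {
        System.out.println(runShutdown());
    }
}
```

With a single leader election per JobManager (FLINK-24038), the ConfigMap is no longer owned per job, so the cleanup no longer needs to be sequenced behind the per-job leader election shutdown.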

> Implement cleanup strategy
> --------------------------
>
>                 Key: FLINK-25432
>                 URL: https://issues.apache.org/jira/browse/FLINK-25432
>             Project: Flink
>          Issue Type: Sub-task
>    Affects Versions: 1.15.0
>            Reporter: Matthias Pohl
>            Assignee: Matthias Pohl
>            Priority: Major
>
> We want to combine the job-specific cleanup of the different resources and 
> provide a common {{ResourceCleaner}} taking care of the actual cleanup of all 
> resources.
> This needs to be integrated into the {{Dispatcher}}.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)