[jira] [Commented] (FLINK-34009) Apache flink: Checkpoint restoration issue on Application Mode of deployment

Vijay (Jira) Sun, 07 Jan 2024 21:31:04 -0800


    [ 
https://issues.apache.org/jira/browse/FLINK-34009?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17804118#comment-17804118
 ]


Vijay commented on FLINK-34009:
-------------------------------

Since Flink supports multi-job execution in Application Mode (with HA
disabled), we need more details on how to enable restoration from checkpoints
when the application or Flink is upgraded. Please help us overcome this issue.
Thanks.
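Here is one possible workaround, sketched under stated assumptions. As we understand it, when HA is disabled the JobManager keeps its list of completed checkpoints only in memory. With ZooKeeper HA, pointers to those checkpoints are stored in ZooKeeper. So after a restart, the jobs start without state. A retained checkpoint can still be resumed explicitly by passing its path the same way as a savepoint, for example with `flink run -s <checkpoint-path>` or the `execution.savepoint.path` option. The Flink documentation also notes that HA in Application Mode is only supported for single-execute() applications, which matches the per-job behavior described in the issue.

The Java helper below is hypothetical (`LatestCheckpointFinder` is not a Flink class). It picks the newest complete checkpoint in one job's checkpoint directory. It assumes Flink's `<state.checkpoints.dir>/<job-id>/chk-<n>/_metadata` layout and a locally mounted path rather than a direct S3 listing.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Stream;

/**
 * Hypothetical helper (not part of Flink): finds the most recent retained
 * checkpoint under one job's checkpoint directory, assuming the layout
 * <state.checkpoints.dir>/<job-id>/chk-<n>/_metadata on a mounted filesystem.
 */
final class LatestCheckpointFinder {

    // Matches checkpoint directory names such as "chk-42".
    private static final Pattern CHK_DIR = Pattern.compile("chk-(\\d+)");

    private LatestCheckpointFinder() {}

    /** Returns the chk-<n> directory with the highest n that has a _metadata file. */
    static Optional<Path> findLatest(Path jobCheckpointDir) throws IOException {
        if (!Files.isDirectory(jobCheckpointDir)) {
            return Optional.empty();
        }
        Path best = null;
        long bestId = -1;
        try (Stream<Path> children = Files.list(jobCheckpointDir)) {
            for (Path child : (Iterable<Path>) children::iterator) {
                Matcher m = CHK_DIR.matcher(child.getFileName().toString());
                // Skip shared/, taskowned/ and incomplete checkpoints without metadata.
                if (!m.matches() || !Files.isRegularFile(child.resolve("_metadata"))) {
                    continue;
                }
                // Compare checkpoint IDs numerically, so chk-10 beats chk-3.
                long id = Long.parseLong(m.group(1));
                if (id > bestId) {
                    bestId = id;
                    best = child;
                }
            }
        }
        return Optional.ofNullable(best);
    }
}
```

The sketch also assumes you already know which `<job-id>` directory belonged to which job, because job IDs may change across restarts when HA is disabled. With two jobs, each one would presumably need its own restore path rather than one shared setting.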

> Apache flink: Checkpoint restoration issue on Application Mode of deployment
> ----------------------------------------------------------------------------
>
>                 Key: FLINK-34009
>                 URL: https://issues.apache.org/jira/browse/FLINK-34009
>             Project: Flink
>          Issue Type: Bug
>          Components: Runtime / Checkpointing
>    Affects Versions: 1.18.0
>         Environment: Flink version: 1.18
> Zookeeper version: 3.7.2
> Env: Custom Flink Docker image (with embedded application class) deployed
> on Kubernetes (v1.26.11).
>            Reporter: Vijay
>            Priority: Major
>
> Hi Team,
> Good day, and happy new year 2024 to you all.
> We are running Flink 1.18 on our cluster. The JobManager is deployed in
> Application Mode with HA disabled (high-availability.type: NONE). With this
> configuration we are able to start multiple jobs from a single application
> (using env.executeAsync()).
> Note: We have also set up checkpointing to S3 with RETAIN_ON_CANCELLATION
> mode (plus the other required settings).
> Let's say we start two jobs of the same application (e.g. Jobidxxx1,
> jobidxxx2) and they are running in the Kubernetes environment. When we need
> to perform a minor Flink upgrade (or an upgrade of our application with minor
> changes), we stop the JobManager and TaskManager instances, perform the
> upgrade, and then start the JobManager and TaskManager instances again. On
> startup we expect the jobs to be restored from their last checkpoints, but no
> restoration happens when the JobManager starts. Please let us know whether
> this is a bug or the expected behavior of Flink in Application Mode.
> Additional information: If we enable HA (using ZooKeeper) in Application
> Mode, we can start only one job (i.e., per-job behavior). In that setup, when
> we perform a minor Flink upgrade (or an upgrade of our application with minor
> changes), checkpoint restoration works correctly when the JobManager and
> TaskManagers restart.
> It seems that checkpoint restoration and HA are interrelated, but why does
> checkpoint restoration not work when HA is disabled?
>  
> Please let us know if anyone has experienced similar issues; any suggestions
> would be highly appreciated. Thanks in advance for your assistance.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
