[jira] [Commented] (FLINK-34009) Apache flink: Checkpoint restoration issue on Application Mode of deployment
[ https://issues.apache.org/jira/browse/FLINK-34009?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17804118#comment-17804118 ]

Vijay commented on FLINK-34009:
-------------------------------

Since Flink supports multi-job execution in Application Mode (with HA disabled), we need more details on how to enable the restoration process via checkpointing when the application or Flink itself is upgraded. Please help us resolve this issue. Thanks.

> Apache flink: Checkpoint restoration issue on Application Mode of deployment
>
>                 Key: FLINK-34009
>                 URL: https://issues.apache.org/jira/browse/FLINK-34009
>             Project: Flink
>          Issue Type: Bug
>          Components: Runtime / Checkpointing
>    Affects Versions: 1.18.0
>        Environment: Flink version: 1.18
>                     Zookeeper version: 3.7.2
>                     Env: Custom Flink docker image (with embedded application class) deployed over Kubernetes (v1.26.11).
>           Reporter: Vijay
>           Priority: Major
>
> Hi Team,
> Good Day. Wishing you all a happy new year 2024.
> We are using Flink 1.18 for our cluster. The JobManager is deployed in "Application Mode" and HA is disabled (high-availability.type: NONE); under this configuration we are able to start multiple jobs of a single application (using env.executeAsync()).
> Note: We have also set up checkpointing on an S3 instance with RETAIN_ON_CANCELLATION mode (plus other required settings).
> Say we start two jobs of the same application (e.g. jobidxxx1, jobidxxx2) and they are currently running in the k8s environment. When we have to perform a Flink minor upgrade (or an upgrade of our application with minor changes), we stop the JobManager and TaskManager instances, perform the necessary upgrade, and then start both again. On startup we expect the jobs to be restored from the last checkpoint, but the restoration does not happen when the JobManager starts.
> Please let us know whether this is a bug or the general behavior of Flink in Application Mode of deployment.
> Additional information: If we enable HA (using Zookeeper) in Application Mode, we can start only one job (i.e. per-job behavior). In that setup, checkpoint restoration works properly across the JobManager and TaskManager restart process when we perform a Flink minor upgrade or an application upgrade with minor changes.
> Checkpoint restoration and HA seem to be inter-related, but why doesn't checkpoint restoration work when HA is disabled?
> If anyone has experienced similar issues or has any suggestions, it would be highly appreciated. Thanks in advance for your assistance.

-- This message was sent by Atlassian Jira (v8.20.10#820010)
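For context on the question above: with high-availability.type set to NONE, Flink does not rediscover retained checkpoints on its own after a restart; the cluster generally has to be pointed at the retained checkpoint explicitly. A minimal sketch, assuming an illustrative S3 path (the bucket, job id, and chk-<n> directory below are placeholders, not values from this issue):

```yaml
# flink-conf.yaml sketch: resume from a retained checkpoint with HA disabled.
# The path must point at the last retained checkpoint's metadata directory.
execution.savepoint.path: s3://my-bucket/checkpoints/<job-id>/chk-42
# Tolerate state that no longer maps to an operator after minor code changes.
execution.savepoint.ignore-unclaimed-state: true
```

Note that this option applies to the whole cluster configuration, so a standalone Application Mode cluster can only be pointed at one restore path this way; restoring several jobs, each from its own checkpoint or savepoint, is the gap discussed in FLINK-33944 below.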
[jira] [Created] (FLINK-34009) Apache flink: Checkpoint restoration issue on Application Mode of deployment
Vijay created FLINK-34009:
--------------------------

             Summary: Apache flink: Checkpoint restoration issue on Application Mode of deployment
                 Key: FLINK-34009
                 URL: https://issues.apache.org/jira/browse/FLINK-34009
             Project: Flink
          Issue Type: Bug
          Components: Runtime / Checkpointing
    Affects Versions: 1.18.0
         Environment: Flink version: 1.18
                      Zookeeper version: 3.7.2
                      Env: Custom Flink docker image (with embedded application class) deployed over Kubernetes (v1.26.11).
            Reporter: Vijay
[jira] [Commented] (FLINK-33944) Apache Flink: Process to restore more than one job on job manager startup from the respective savepoints
[ https://issues.apache.org/jira/browse/FLINK-33944?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17800927#comment-17800927 ]

Vijay commented on FLINK-33944:
-------------------------------

[~martijnvisser] Can we use aligned checkpointing instead of a savepoint for the restore process when Flink is upgraded?

> Apache Flink: Process to restore more than one job on job manager startup from the respective savepoints
>
>                 Key: FLINK-33944
>                 URL: https://issues.apache.org/jira/browse/FLINK-33944
>             Project: Flink
>          Issue Type: New Feature
>          Components: Runtime / Checkpointing
>    Affects Versions: 1.18.0
>           Reporter: Vijay
>           Priority: Major
>
> We are using Flink 1.18 for our cluster. The JobManager is deployed in "Application Mode", and we are looking for a process to restore multiple jobs (using their respective savepoint directories) when the JobManager is started. Currently we can restore only one job when running "standalone-job.sh", using --fromSavepoint and --allowNonRestoredState. However, we need a way to trigger multiple job executions via the Java client (each from its respective savepoint location) on JobManager startup.
> Note: We are not using a Kubernetes native deployment; we are using the k8s standalone mode of deployment.
> Additional query: If there is a process to restore multiple jobs from their respective savepoints in Application Mode, is the same supported in Session Mode or not?
> *Expected process:*
> # Before starting the Flink/application image upgrade, trigger savepoints for all currently running jobs.
> # Once the savepoint process has completed for all jobs, scale down the JobManager and TaskManager instances.
> # Update the image version on the k8s deployment with the updated application image.
> # After the image version is updated, scale up the JobManager and TaskManager.
> # We need a process to restore the previously running jobs from their savepoint directories and start all the jobs.
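Because "standalone-job.sh" accepts a single --fromSavepoint, restoring several jobs means submitting each one separately with its own restore path. The orchestration in step 5 above can be sketched with plain JDK code that builds one `flink run` command per job; the class, jar name, entry classes, and savepoint paths below are hypothetical, while `-s` (--fromSavepoint), `-n` (--allowNonRestoredState), and `-c` (--class) are real Flink CLI flags:

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Sketch only: builds the per-job restore commands an upgrade script would run.
public class MultiJobRestore {

    static List<String> restoreCommand(String applicationJar,
                                       String entryClass,
                                       String savepointPath) {
        return List.of(
                "flink", "run",
                "-s", savepointPath,  // restore this job's state from its savepoint
                "-n",                 // tolerate state dropped by minor code changes
                "-c", entryClass,
                applicationJar);
    }

    public static void main(String[] args) {
        // Hypothetical manifest written in step 1, before the upgrade:
        // one savepoint directory per previously running job.
        Map<String, String> savepoints = new LinkedHashMap<>();
        savepoints.put("com.example.JobOne", "s3://bucket/savepoints/jobxxx1");
        savepoints.put("com.example.JobTwo", "s3://bucket/savepoints/jobxxx2");

        savepoints.forEach((entryClass, sp) ->
                System.out.println(String.join(" ",
                        restoreCommand("app.jar", entryClass, sp))));
    }
}
```

Running one submission per job sidesteps the one-savepoint-per-cluster limit of the standalone entrypoint, at the cost of the jobs no longer sharing a single Application Mode cluster lifecycle.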
[jira] [Commented] (FLINK-33944) Apache Flink: Process to restore more than one job on job manager startup from the respective savepoints
[ https://issues.apache.org/jira/browse/FLINK-33944?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17800823#comment-17800823 ]

Vijay commented on FLINK-33944:
-------------------------------

[~martijnvisser] Using Application Mode we can run multiple job executions of a single Flink application, and Session Mode can be configured the same way; Session Mode additionally supports job executions from multiple Flink applications. We want to use Application Mode to trigger a savepoint for each job execution and restore each job execution after a Flink/image upgrade. Please confirm whether the existing version supports this requirement in Application Mode or not.
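The savepoint-triggering side of this upgrade flow can be scripted against the JobManager's REST API: POST /jobs/:jobid/savepoints is the documented Flink endpoint for triggering a savepoint. A hedged sketch using only the JDK HTTP client; the host, port, job id, and target directory are illustrative:

```java
import java.net.URI;
import java.net.http.HttpRequest;

// Sketch only: builds the REST call that triggers a savepoint for one job.
// An upgrade script would send one such request per running job, then poll
// the returned trigger id until the savepoint completes.
public class SavepointTrigger {

    static HttpRequest triggerRequest(String restBase, String jobId, String targetDir) {
        // Request body per the Flink REST API: target directory, keep the job running.
        String body = "{\"target-directory\":\"" + targetDir + "\",\"cancel-job\":false}";
        return HttpRequest.newBuilder()
                .uri(URI.create(restBase + "/jobs/" + jobId + "/savepoints"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(body))
                .build();
    }

    public static void main(String[] args) {
        HttpRequest req = triggerRequest("http://jobmanager:8081/v1",
                "a1b2c3", "s3://bucket/savepoints/a1b2c3");
        System.out.println(req.method() + " " + req.uri());
    }
}
```

Sending the built request with java.net.http.HttpClient requires a reachable JobManager, so this sketch only constructs it.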
[jira] [Commented] (FLINK-33943) Apache flink: Issues after configuring HA (using zookeeper setting)
[ https://issues.apache.org/jira/browse/FLINK-33943?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17800694#comment-17800694 ]

Vijay commented on FLINK-33943:
-------------------------------

Thanks [~wanglijie] for your inputs.

> Apache flink: Issues after configuring HA (using zookeeper setting)
>
>                 Key: FLINK-33943
>                 URL: https://issues.apache.org/jira/browse/FLINK-33943
>             Project: Flink
>          Issue Type: Bug
>          Components: Build System
>    Affects Versions: 1.18.0
>        Environment: Flink version: 1.18
>                     Zookeeper version: 3.7.2
>                     Env: Custom Flink docker image (with embedded application class) deployed over Kubernetes (v1.26.11).
>           Reporter: Vijay
>           Priority: Major
>
> Hi Team,
> *Note:* Not sure whether I have picked the right component while raising the issue.
> Good Day. I am using Flink 1.18 and Zookeeper 3.7.2 for our cluster. The JobManager is deployed in "Application Mode", and when HA is disabled (high-availability.type: NONE) we are able to start multiple jobs (using env.executeAsync()) for a single application. But when I set Zookeeper as the HA type (high-availability.type: zookeeper), only one job gets executed, as seen on the Flink dashboard. The parameters set for the Zookeeper-based HA setup in flink-conf.yaml are listed below.
> If anyone has experienced similar issues or has any suggestions, it would be highly appreciated. Thanks in advance for your assistance.
> *Note:* We are using a streaming application with the following flink-conf.yaml configuration.
> *Additional query:* Does "Session Mode" of deployment support HA for multiple execute() executions?
> # high-availability.storageDir: /opt/flink/data
> # high-availability.cluster-id: test
> # high-availability.zookeeper.quorum: localhost:2181
> # high-availability.type: zookeeper
> # high-availability.zookeeper.path.root: /dp/configs/flinkha
[jira] [Commented] (FLINK-33944) Apache Flink: Process to restore more than one job on job manager startup from the respective savepoints
[ https://issues.apache.org/jira/browse/FLINK-33944?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17800692#comment-17800692 ]

Vijay commented on FLINK-33944:
-------------------------------

[~wanglijie] Do you have any input on this request regarding the savepoint restore process for multiple jobs, either via the Java client or on JobManager startup (via standalone-job.sh or jobmanager.sh)? "standalone-job.sh" supports restoring only one job from a savepoint on JobManager startup.
[jira] [Updated] (FLINK-33944) Apache Flink: Process to restore more than one job on job manager startup from the respective savepoints
[ https://issues.apache.org/jira/browse/FLINK-33944?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vijay updated FLINK-33944:
--------------------------
[jira] [Comment Edited] (FLINK-33943) Apache flink: Issues after configuring HA (using zookeeper setting)
[ https://issues.apache.org/jira/browse/FLINK-33943?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17800677#comment-17800677 ]

Vijay edited comment on FLINK-33943 at 12/27/23 2:48 AM:
---------------------------------------------------------

[~wanglijie] Thanks for the prompt update. Is there a plan to support HA functionality in Application Mode (for multiple executions) in near-future versions? Or is there a technical reason why it is not supported currently?
[jira] [Commented] (FLINK-33943) Apache flink: Issues after configuring HA (using zookeeper setting)
[ https://issues.apache.org/jira/browse/FLINK-33943?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17800677#comment-17800677 ]

Vijay commented on FLINK-33943:
-------------------------------

Thanks for the prompt update. Is there a plan to support HA functionality in Application Mode (for multiple executions) in near-future versions? Or is there a technical reason why it is not supported currently?
[jira] [Updated] (FLINK-33944) Apache Flink: Process to restore more than one job on job manager startup from the respective savepoints
[ https://issues.apache.org/jira/browse/FLINK-33944?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vijay updated FLINK-33944:
--
Description:
We are using Flink 1.18 for our Flink cluster. The job manager is deployed in "Application mode", and we are looking for a way to restore multiple jobs (from their respective savepoint directories) when the job manager starts. Currently, we can restore only one job when running "standalone-job.sh", via the --fromSavepoint and --allowNonRestoredState options. However, we need a way to trigger multiple job executions via the Java client.

Note: We are not using a native Kubernetes deployment; we are using the Kubernetes standalone deployment mode.

Additional query: If there is a way to restore multiple jobs from their respective savepoints in "Application mode" of deployment, is the same supported in Session mode?

*Expected process:*
# Before starting the Flink/application image upgrade, trigger savepoints for all currently running jobs.
# Once the savepoint process has completed for all jobs, scale down the job manager and task manager instances.
# Update the image version on the k8s deployment with the updated application image.
# After the image version is updated, scale the job manager and task manager back up.
# On startup, restore the previously running jobs from their savepoint directories and start all the jobs.

> Apache Flink: Process to restore more than one job on job manager startup
> from the respective savepoints
>
> Key: FLINK-33944
> URL: https://issues.apache.org/jira/browse/FLINK-33944
> Project: Flink
> Issue Type: Bug
> Components: Runtime / Checkpointing
> Affects Versions: 1.18.0
> Reporter: Vijay
> Priority: Major
--
This message was sent by Atlassian Jira (v8.20.10#820010)
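The first step of the expected process above (trigger savepoints for all running jobs before scaling down) can be sketched against Flink's monitoring REST API. This is a minimal illustration, not a documented multi-job restore feature: the `GET /jobs` and `POST /jobs/<jobid>/savepoints` endpoints are part of Flink's REST API, while the base URL and target directory below are placeholders.

```python
import json
from urllib.request import Request, urlopen

def running_job_ids(jobs_overview):
    """Extract the ids of RUNNING jobs from a GET /jobs response."""
    return [j["id"] for j in jobs_overview.get("jobs", []) if j["status"] == "RUNNING"]

def savepoint_request_body(target_dir):
    """JSON body for POST /jobs/<jobid>/savepoints: take a savepoint without cancelling."""
    return json.dumps({"target-directory": target_dir, "cancel-job": False}).encode()

def savepoint_all(base_url, target_dir):
    """Trigger a savepoint for every RUNNING job on the cluster.

    Returns a mapping of job id -> savepoint trigger id; each trigger can be
    polled via GET /jobs/<jobid>/savepoints/<triggerid> before scaling down.
    """
    overview = json.load(urlopen(f"{base_url}/jobs"))
    triggers = {}
    for job_id in running_job_ids(overview):
        req = Request(
            f"{base_url}/jobs/{job_id}/savepoints",
            data=savepoint_request_body(target_dir),
            headers={"Content-Type": "application/json"},
        )
        triggers[job_id] = json.load(urlopen(req))["request-id"]
    return triggers

# Usage (placeholder host and bucket):
# savepoint_all("http://jobmanager:8081/v1", "s3://my-bucket/savepoints")
```

The pure helpers are separated from the network calls so the request shapes can be checked without a live cluster.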
[jira] [Created] (FLINK-33944) Apache Flink: Process to restore more than one job on job manager startup from the respective savepoints
Vijay created FLINK-33944:
-
Summary: Apache Flink: Process to restore more than one job on job manager startup from the respective savepoints
Key: FLINK-33944
URL: https://issues.apache.org/jira/browse/FLINK-33944
Project: Flink
Issue Type: Bug
Components: Runtime / Checkpointing
Affects Versions: 1.18.0
Reporter: Vijay

We are using Flink 1.18 for our Flink cluster. The job manager is deployed in "Application mode", and we are looking for a way to restore multiple jobs (from their respective savepoint directories) when the job manager starts. Currently, we can restore only one job when running "standalone-job.sh", via the --fromSavepoint and --allowNonRestoredState options. However, we need a way to trigger multiple job executions via the Java client.

Note: We are not using a native Kubernetes deployment; we are using the Kubernetes standalone deployment mode.

*Expected process:*
# Before starting the Flink/application image upgrade, trigger savepoints for all currently running jobs.
# Once the savepoint process has completed for all jobs, scale down the job manager and task manager instances.
# Update the image version on the k8s deployment with the updated application image.
# After the image version is updated, scale the job manager and task manager back up.
# On startup, restore the previously running jobs from their savepoint directories and start all the jobs.
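Since the standalone/application-mode entrypoint accepts only a single --fromSavepoint, one conceivable workaround (sketched here as an assumption, not a documented Flink feature) is to run a session cluster and resubmit each job individually with `flink run -s <savepointPath>`. The helper below only builds the CLI invocation; the jar path, savepoint paths, and main class are placeholders.

```python
import subprocess

def restore_command(jar, savepoint, main_class=None, allow_non_restored=True):
    """Build a `flink run` invocation that restores one job from its savepoint."""
    cmd = ["flink", "run", "-s", savepoint]
    if allow_non_restored:
        cmd.append("--allowNonRestoredState")
    if main_class:
        cmd += ["-c", main_class]
    cmd.append(jar)
    return cmd

def restore_all(jar, savepoints_by_job):
    """Resubmit every previously running job from its own savepoint directory."""
    for job, savepoint in savepoints_by_job.items():
        subprocess.run(restore_command(jar, savepoint), check=True)

# Usage (placeholder paths):
# restore_all("/opt/flink/usrlib/app.jar",
#             {"jobA": "s3://my-bucket/savepoints/jobA",
#              "jobB": "s3://my-bucket/savepoints/jobB"})
```

The `-s`/`--fromSavepoint` and `--allowNonRestoredState` options are standard `flink run` flags; everything else here is scaffolding for illustration.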
[jira] [Updated] (FLINK-33943) Apache flink: Issues after configuring HA (using zookeeper setting)
[ https://issues.apache.org/jira/browse/FLINK-33943?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vijay updated FLINK-33943:
--
Description:
Hi Team,
*Note:* Not sure whether I have picked the right component while raising the issue.
Good Day. We are using Flink 1.18 and ZooKeeper 3.7.2 for our Flink cluster. The job manager is deployed in "Application mode"; when HA is disabled (high-availability.type: NONE), we are able to start multiple jobs (using env.executeAsync()) for a single application. But when we set ZooKeeper as the HA type (high-availability.type: zookeeper), only one job shows as running on the Flink dashboard. The following are the parameters set up for ZooKeeper-based HA in flink-conf.yaml.
Please let us know if anyone has experienced similar issues and has any suggestions. Thanks in advance for your assistance.
*Note:* We are using a streaming application, and the following are the flink-conf.yaml configurations.
*Additional query:* Does "Session mode" of deployment support HA for multiple execute() executions?
# high-availability.storageDir: /opt/flink/data
# high-availability.cluster-id: test
# high-availability.zookeeper.quorum: localhost:2181
# high-availability.type: zookeeper
# high-availability.zookeeper.path.root: /dp/configs/flinkha

> Apache flink: Issues after configuring HA (using zookeeper setting)
> ---
>
> Key: FLINK-33943
> URL: https://issues.apache.org/jira/browse/FLINK-33943
> Project: Flink
> Issue Type: Bug
> Components: Build System
> Affects Versions: 1.18.0
> Environment: Flink version: 1.18
> Zookeeper version: 3.7.2
> Env: Custom flink docker image (with embedded application class) deployed over kubernetes (v1.26.11).
> Reporter: Vijay
> Priority: Major
[jira] [Comment Edited] (FLINK-33943) Apache flink: Issues after configuring HA (using zookeeper setting)
[ https://issues.apache.org/jira/browse/FLINK-33943?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17800670#comment-17800670 ] Vijay edited comment on FLINK-33943 at 12/27/23 2:22 AM:
-
[~wanglijie] Does HA in session mode support execution of multiple execute/executeAsync operations? Sorry, I am unable to find any documentation related to HA in session mode and its features/limitations.
[jira] [Commented] (FLINK-33943) Apache flink: Issues after configuring HA (using zookeeper setting)
[ https://issues.apache.org/jira/browse/FLINK-33943?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17800670#comment-17800670 ] Vijay commented on FLINK-33943:
---
[~wanglijie] Does HA in session mode support execution of multiple execute/executeAsync operations?
[jira] [Commented] (FLINK-33943) Apache flink: Issues after configuring HA (using zookeeper setting)
[ https://issues.apache.org/jira/browse/FLINK-33943?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17800669#comment-17800669 ] Vijay commented on FLINK-33943:
---
The issue can be reproduced by enabling high-availability.type: zookeeper (with the configuration specified in the issue) and, in the Flink client code, calling env.executeAsync() for multiple job instances of the same application. Now open the dashboard and check the number of running jobs (the same can be checked via a REST API call); you will find only one job running. When you disable HA (high-availability.type: NONE), you can see multiple jobs running (again, also visible via the REST API). REST API: http://:8081/v1/jobs
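The dashboard check described in the reproduction steps can be automated against the REST endpoint mentioned above. A small sketch: the response shape follows Flink's `GET /v1/jobs` API; the job manager host is a placeholder (it is elided in the report as well).

```python
import json
from urllib.request import urlopen

def running_job_count(jobs_overview):
    """Count RUNNING entries in a GET /v1/jobs response."""
    return sum(1 for j in jobs_overview.get("jobs", []) if j["status"] == "RUNNING")

# Against a live cluster (placeholder host):
# overview = json.load(urlopen("http://<jobmanager-host>:8081/v1/jobs"))
# print(running_job_count(overview))
```

With ZooKeeper HA enabled, the report says this count comes back as 1 even though multiple executeAsync() jobs were submitted; with HA disabled, it matches the number of submitted jobs.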
[jira] [Created] (FLINK-33943) Apache flink: Issues after configuring HA (using zookeeper setting)
Vijay created FLINK-33943:
-
Summary: Apache flink: Issues after configuring HA (using zookeeper setting)
Key: FLINK-33943
URL: https://issues.apache.org/jira/browse/FLINK-33943
Project: Flink
Issue Type: Bug
Components: Build System
Affects Versions: 1.18.0
Environment: Flink version: 1.18
Zookeeper version: 3.7.2
Env: Custom flink docker image (with embedded application class) deployed over kubernetes (v1.26.11).
Reporter: Vijay

Hi Team,
Note: Not sure whether I have picked the right component while raising the issue.
Good Day. We are using Flink 1.18 and ZooKeeper 3.7.2 for our Flink cluster. The job manager is deployed in "Application mode"; when HA is disabled (high-availability.type: NONE), we are able to start multiple jobs (using env.executeAsync()) for a single application. But when we set ZooKeeper as the HA type (high-availability.type: zookeeper), only one job shows as running on the Flink dashboard. The following are the parameters set up for ZooKeeper-based HA in flink-conf.yaml.
Please let us know if anyone has experienced similar issues and has any suggestions. Thanks in advance for your assistance.
Note: We are using a streaming application, and the following are the flink-conf.yaml configurations.
# high-availability.storageDir: /opt/flink/data
# high-availability.cluster-id: test
# high-availability.zookeeper.quorum: localhost:2181
# high-availability.type: zookeeper
# high-availability.zookeeper.path.root: /dp/configs/flinkha