[jira] [Commented] (FLINK-28210) FlinkSessionJob fails after FlinkDeployment is updated
[ https://issues.apache.org/jira/browse/FLINK-28210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17558270#comment-17558270 ]

Daniel Crowe commented on FLINK-28210:
--

Thank you. I'll give it a go.

> FlinkSessionJob fails after FlinkDeployment is updated
> --
>
> Key: FLINK-28210
> URL: https://issues.apache.org/jira/browse/FLINK-28210
> Project: Flink
> Issue Type: Bug
> Components: Kubernetes Operator
> Affects Versions: kubernetes-operator-1.0.0
> Environment: The [quick start|https://nightlies.apache.org/flink/flink-kubernetes-operator-docs-main/docs/try-flink-kubernetes-operator/quick-start/] was followed to install minikube and the flink operator.
>
> minikube 1.24.1
> kubectl 1.24.2
> flink operator: 1.0.0
> Reporter: Daniel Crowe
> Priority: Major
>
> I created a flink deployment using this example:
> {code}
> curl https://raw.githubusercontent.com/apache/flink-kubernetes-operator/main/examples/basic-session-job.yaml -o basic-session-job.yaml
> kubectl create -f basic-session-job.yaml
> {code}
> Then, I modified the memory allocated to the jobManager and applied the change
> {code}
> kubectl apply -f basic-session-job.yaml
> {code}
> The job manager is restarted to apply the change, but the jobs are not.
> Looking at the operator logs, it appears that something is failing during job status observation:
> {noformat}
> 2022-06-23 03:29:51,189 o.a.f.k.o.c.FlinkSessionJobController [INFO ][default/basic-session-job-example2] Starting reconciliation
> 2022-06-23 03:29:51,190 o.a.f.k.o.o.JobStatusObserver [INFO ][default/basic-session-job-example2] Observing job status
> 2022-06-23 03:29:51,205 o.a.f.k.o.c.FlinkSessionJobController [INFO ][default/basic-session-job-example] Starting reconciliation
> 2022-06-23 03:29:51,206 o.a.f.k.o.o.JobStatusObserver [INFO ][default/basic-session-job-example] Observing job status
> 2022-06-23 03:29:51,208 o.a.f.k.o.c.FlinkDeploymentController [INFO ][default/basic-session-cluster] Starting reconciliation
> 2022-06-23 03:29:51,227 o.a.f.k.o.c.FlinkDeploymentController [INFO ][default/basic-session-cluster] End of reconciliation
> {noformat}

--
This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Commented] (FLINK-28210) FlinkSessionJob fails after FlinkDeployment is updated
[ https://issues.apache.org/jira/browse/FLINK-28210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17557875#comment-17557875 ]

Gyula Fora commented on FLINK-28210:
--

Yes. At this point, in 1.0.0, this is an expected limitation of the session mode.

If you enable HA as in https://github.com/apache/flink-kubernetes-operator/blob/main/examples/basic-checkpoint-ha.yaml#L30-L31, that should hopefully make it work.

We will try to improve this behaviour in later versions; this is related to https://issues.apache.org/jira/browse/FLINK-27979

cc [~aitozi]
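For reference, a minimal sketch of what enabling HA on the session cluster might look like. This assumes the linked lines of basic-checkpoint-ha.yaml set the two Kubernetes HA options shown under flinkConfiguration; the storage path is an assumption and has to point at durable storage that the JobManager can reach (for example a mounted volume). The rest of the spec is taken unchanged from basic-session-job.yaml.

{code}
apiVersion: flink.apache.org/v1beta1
kind: FlinkDeployment
metadata:
  name: basic-session-cluster
spec:
  image: flink:1.15
  flinkVersion: v1_15
  flinkConfiguration:
    # Assumed HA settings; adjust storageDir to a durable location for your setup.
    high-availability: org.apache.flink.kubernetes.highavailability.KubernetesHaServicesFactory
    high-availability.storageDir: file:///flink-data/ha
  jobManager:
    resource:
      memory: "2048m"
      cpu: 1
  taskManager:
    resource:
      memory: "2048m"
      cpu: 1
  serviceAccount: flink
{code}

With HA metadata persisted, a restarted JobManager should be able to recover the session jobs that were running before the spec change, which is the behaviour missing in the report.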
[jira] [Commented] (FLINK-28210) FlinkSessionJob fails after FlinkDeployment is updated
[ https://issues.apache.org/jira/browse/FLINK-28210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17557865#comment-17557865 ]

Daniel Crowe commented on FLINK-28210:
--

Is this the file you are after?

{noformat}
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

apiVersion: flink.apache.org/v1beta1
kind: FlinkDeployment
metadata:
  name: basic-session-cluster
spec:
  image: flink:1.15
  flinkVersion: v1_15
  jobManager:
    resource:
      memory: "2048m"
      cpu: 1
  taskManager:
    resource:
      memory: "2048m"
      cpu: 1
  serviceAccount: flink
---
apiVersion: flink.apache.org/v1beta1
kind: FlinkSessionJob
metadata:
  name: basic-session-job-example
spec:
  deploymentName: basic-session-cluster
  job:
    jarURI: https://repo1.maven.org/maven2/org/apache/flink/flink-examples-streaming_2.12/1.15.0/flink-examples-streaming_2.12-1.15.0-TopSpeedWindowing.jar
    parallelism: 4
    upgradeMode: stateless
---
apiVersion: flink.apache.org/v1beta1
kind: FlinkSessionJob
metadata:
  name: basic-session-job-example2
spec:
  deploymentName: basic-session-cluster
  job:
    jarURI: https://repo1.maven.org/maven2/org/apache/flink/flink-examples-streaming_2.12/1.15.0/flink-examples-streaming_2.12-1.15.0.jar
    parallelism: 2
    upgradeMode: stateless
    entryClass: org.apache.flink.streaming.examples.statemachine.StateMachineExample
{noformat}
[jira] [Commented] (FLINK-28210) FlinkSessionJob fails after FlinkDeployment is updated
[ https://issues.apache.org/jira/browse/FLINK-28210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17557800#comment-17557800 ]

Gyula Fora commented on FLINK-28210:
--

This is expected if HA is not configured for the session FlinkDeployment. Can you share your session YAML?