[jira] [Commented] (FLINK-33377) When Flink version >= 1.15 and Flink Operator is used, there is a waste of resources when running Flink batch jobs.
[ https://issues.apache.org/jira/browse/FLINK-33377?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17779984#comment-17779984 ]

hjw commented on FLINK-33377:
-----------------------------

[~gsomogyi] Can you take a look?

> When Flink version >= 1.15 and Flink Operator is used, there is a waste of resources when running Flink batch jobs.
> -------------------------------------------------------------------------------------------------------------------
>
>                 Key: FLINK-33377
>                 URL: https://issues.apache.org/jira/browse/FLINK-33377
>             Project: Flink
>          Issue Type: Bug
>          Components: Kubernetes Operator
>    Affects Versions: kubernetes-operator-1.5.0
>            Reporter: hjw
>            Priority: Major
>
> According to [FLINK-29376|https://issues.apache.org/jira/browse/FLINK-29376], SHUTDOWN_ON_APPLICATION_FINISH is always set to false for Flink 1.15 and above.
> However, the JobManager still exists after a Flink batch job finishes normally. Is this a waste of resources?

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
[jira] [Created] (FLINK-33377) When Flink version >= 1.15 and Flink Operator is used, there is a waste of resources when running Flink batch jobs.
hjw created FLINK-33377:
----------------------------

             Summary: When Flink version >= 1.15 and Flink Operator is used, there is a waste of resources when running Flink batch jobs.
                 Key: FLINK-33377
                 URL: https://issues.apache.org/jira/browse/FLINK-33377
             Project: Flink
          Issue Type: Bug
          Components: Kubernetes Operator
    Affects Versions: kubernetes-operator-1.5.0
            Reporter: hjw

According to [FLINK-29376|https://issues.apache.org/jira/browse/FLINK-29376], SHUTDOWN_ON_APPLICATION_FINISH is always set to false for Flink 1.15 and above.
However, the JobManager still exists after a Flink batch job finishes normally. Is this a waste of resources?
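For context on the option named above: to my understanding, SHUTDOWN_ON_APPLICATION_FINISH corresponds to the configuration key `execution.shutdown-on-application-finish`. A minimal sketch of where a user would normally set it in a FlinkDeployment follows; the deployment name is illustrative and, per FLINK-29376, the operator overrides this value anyway on Flink >= 1.15:

```yaml
# Illustrative fragment (not from this issue): where the
# execution.shutdown-on-application-finish flag would normally live.
# Per FLINK-29376, the operator forces it to false on Flink >= 1.15,
# so a user-supplied value here has no effect.
apiVersion: flink.apache.org/v1beta1
kind: FlinkDeployment
metadata:
  name: batch-example        # hypothetical name
spec:
  flinkVersion: v1_16
  flinkConfiguration:
    execution.shutdown-on-application-finish: "true"   # assumed key; overridden by the operator
```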
[jira] [Comment Edited] (FLINK-31203) Application upgrade rollbacks failed in Flink Kubernetes Operator
[ https://issues.apache.org/jira/browse/FLINK-31203?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17693865#comment-17693865 ]

hjw edited comment on FLINK-31203 at 2/27/23 8:03 AM:
------------------------------------------------------

[Gyula Fora|https://issues.apache.org/jira/secure/ViewProfile.jspa?name=gyfora] Could you help take a look at this problem? Thanks.

was (Author: JIRAUSER280998): @[Gyula Fora|https://issues.apache.org/jira/secure/ViewProfile.jspa?name=gyfora]

> Application upgrade rollbacks failed in Flink Kubernetes Operator
> -----------------------------------------------------------------
>
>                 Key: FLINK-31203
>                 URL: https://issues.apache.org/jira/browse/FLINK-31203
>             Project: Flink
>          Issue Type: Bug
>          Components: Kubernetes Operator
>    Affects Versions: kubernetes-operator-1.3.1
>            Reporter: hjw
>            Priority: Major
>
> I tested the application upgrade rollback feature, but it fails: the Flink application-mode job cannot roll back to the last stable spec.
> As shown in the example below, I declared a broken pod template, without a container named flink-main-container, to test the rollback feature.
> However, deploying the Flink application job simply failed; no rollback happened.
>
> Error:
> {code:java}
> org.apache.flink.client.deployment.ClusterDeploymentException: Could not create Kubernetes cluster "basic-example".
> 	at org.apache.flink.kubernetes.KubernetesClusterDescriptor.deployClusterInternal(KubernetesClusterDescriptor.java:292)
> Caused by: io.fabric8.kubernetes.client.KubernetesClientException: Failure executing: POST at: https://*/k8s/clusters/c-fwkxh/apis/apps/v1/namespaces/test-flink/deployments.
> Message: Deployment.apps "basic-example" is invalid: [spec.template.spec.containers[0].name: Required value, spec.template.spec.containers[0].image: Required value].
> Received status: Status(apiVersion=v1, code=422, details=StatusDetails(causes=[StatusCause(field=spec.template.spec.containers[0].name, message=Required value, reason=FieldValueRequired, additionalProperties={}), StatusCause(field=spec.template.spec.containers[0].image, message=Required value, reason=FieldValueRequired, additionalProperties={})], group=apps, kind=Deployment, name=basic-example, retryAfterSeconds=null, uid=null, additionalProperties={}), kind=Status, message=Deployment.apps "basic-example" is invalid: [spec.template.spec.containers[0].name: Required value, spec.template.spec.containers[0].image: Required value], metadata=ListMeta(_continue=null, remainingItemCount=null, resourceVersion=null, selfLink=null, additionalProperties={}), reason=Invalid, status=Failure, additionalProperties={}).
> 	at io.fabric8.kubernetes.client.dsl.base.OperationSupport.requestFailure(OperationSupport.java:673)
> 	at io.fabric8.kubernetes.client.dsl.base.OperationSupport.assertResponseCode(OperationSupport.java:612)
> 	at io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleResponse(OperationSupport.java:560)
> {code}
>
> Env:
> Flink version: 1.16
> Flink Kubernetes Operator: 1.3.1
>
> Last stable spec:
> {code:yaml}
> apiVersion: flink.apache.org/v1beta1
> kind: FlinkDeployment
> metadata:
>   name: basic-example
> spec:
>   image: flink:1.16
>   flinkVersion: v1_16
>   flinkConfiguration:
>     taskmanager.numberOfTaskSlots: "2"
>     kubernetes.operator.deployment.rollback.enabled: true
>     state.savepoints.dir: s3://flink-data/savepoints
>     state.checkpoints.dir: s3://flink-data/checkpoints
>     high-availability: org.apache.flink.kubernetes.highavailability.KubernetesHaServicesFactory
>     high-availability.storageDir: s3://flink-data/ha
>   serviceAccount: flink
>   podTemplate:
>     spec:
>       containers:
>         - name: flink-main-container
>           env:
>             - name: TZ
>               value: Asia/Shanghai
>   jobManager:
>     resource:
>       memory: "2048m"
>       cpu: 1
>   taskManager:
>     resource:
>       memory: "2048m"
>       cpu: 1
>   job:
>     jarURI: local:///opt/flink/examples/streaming/StateMachineExample.jar
>     parallelism: 2
>     upgradeMode: stateless
> {code}
>
> New spec (the container entry has no name):
> {code:yaml}
> apiVersion: flink.apache.org/v1beta1
> kind: FlinkDeployment
> metadata:
>   name: basic-example
> spec:
>   image: flink:1.16
>   flinkVersion: v1_16
>   flinkConfiguration:
>     taskmanager.numberOfTaskSlots: "2"
>     kubernetes.operator.deployment.rollback.enabled: true
>     state.savepoints.dir: s3://flink-data/savepoints
>     state.checkpoints.dir: s3://flink-data/checkpoints
>     high-availability: org.apache.flink.kubernetes.highavailability.KubernetesHaServicesFactory
>     high-availability.storageDir: s3://flink-data/ha
>   serviceAccount: flink
>   podTemplate:
>     spec:
>       containers:
>         - env:
>             - name: TZ
>               value: Asia/Shanghai
>   jobManager:
>     resource:
>       memory: "2048m"
>       cpu: 1
>   taskManager:
>     resource:
>       memory: "2048m"
>       cpu: 1
>   job:
>     jarURI: local:///opt/flink/examples/streaming/StateMachineExample.jar
>     parallelism: 2
>     upgradeMode: stateless
> {code}
[jira] [Commented] (FLINK-31203) Application upgrade rollbacks failed in Flink Kubernetes Operator
[ https://issues.apache.org/jira/browse/FLINK-31203?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17693865#comment-17693865 ]

hjw commented on FLINK-31203:
-----------------------------

@[Gyula Fora|https://issues.apache.org/jira/secure/ViewProfile.jspa?name=gyfora]

> Application upgrade rollbacks failed in Flink Kubernetes Operator
> -----------------------------------------------------------------
>
>                 Key: FLINK-31203
>                 URL: https://issues.apache.org/jira/browse/FLINK-31203
[jira] [Created] (FLINK-31203) Application upgrade rollbacks failed in Flink Kubernetes Operator
hjw created FLINK-31203:
----------------------------

             Summary: Application upgrade rollbacks failed in Flink Kubernetes Operator
                 Key: FLINK-31203
                 URL: https://issues.apache.org/jira/browse/FLINK-31203
             Project: Flink
          Issue Type: Bug
          Components: Kubernetes Operator
    Affects Versions: kubernetes-operator-1.3.1
            Reporter: hjw
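As an aside on the two specs in this issue: the only difference is the missing container name, which is exactly what the `Deployment.apps "basic-example" is invalid` error flags. A minimal sketch of the corrected podTemplate fragment, reusing the field values already shown in this issue:

```yaml
# Corrected podTemplate fragment: naming the container flink-main-container
# lets the operator merge its defaults (name, image) into this entry,
# avoiding the "Required value" validation failure quoted above.
podTemplate:
  spec:
    containers:
      - name: flink-main-container   # the line the failing "new spec" omitted
        env:
          - name: TZ
            value: Asia/Shanghai
```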
[jira] [Commented] (FLINK-28758) Failed to stop with savepoint
[ https://issues.apache.org/jira/browse/FLINK-28758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17616986#comment-17616986 ]

hjw commented on FLINK-28758:
-----------------------------

Other people have also encountered similar problems, as posted on the Chinese mailing list.

!image-2022-10-13-19-47-56-635.png!

> Failed to stop with savepoint
> -----------------------------
>
>                 Key: FLINK-28758
>                 URL: https://issues.apache.org/jira/browse/FLINK-28758
>             Project: Flink
>          Issue Type: Bug
>          Components: Connectors / Kafka, Runtime / Checkpointing, Runtime / Task
>    Affects Versions: 1.15.0
>        Environment: Flink version: 1.15.0
> Deploy mode: K8s application mode; a local mini cluster also shows the problem.
> Kafka connector: uses the legacy Kafka SourceFunction, not the new Source API.
>            Reporter: hjw
>            Priority: Major
>        Attachments: image-2022-10-13-19-47-56-635.png
>
> I posted a stop-with-savepoint request to the Flink job through the REST API.
> An error happened while the Kafka connector was closing, and the job entered the RESTARTING state.
> Triggering a savepoint alone succeeds.
> {code:java}
> 13:33:42.857 [Kafka Fetcher for Source: nlp-kafka-source -> nlp-clean (1/1)#0] DEBUG org.apache.kafka.clients.consumer.KafkaConsumer - [Consumer clientId=consumer-hjw-3, groupId=hjw] Kafka consumer has been closed
> 13:33:42.857 [Kafka Fetcher for Source: cpp-kafka-source -> cpp-clean (1/1)#0] INFO org.apache.kafka.common.utils.AppInfoParser - App info kafka.consumer for consumer-hjw-4 unregistered
> 13:33:42.857 [Kafka Fetcher for Source: cpp-kafka-source -> cpp-clean (1/1)#0] DEBUG org.apache.kafka.clients.consumer.KafkaConsumer - [Consumer clientId=consumer-hjw-4, groupId=hjw] Kafka consumer has been closed
> 13:33:42.860 [Source: nlp-kafka-source -> nlp-clean (1/1)#0] DEBUG org.apache.flink.streaming.runtime.tasks.StreamTask - Cleanup StreamTask (operators closed: false, cancelled: false)
> 13:33:42.860 [jobmanager-io-thread-4] INFO org.apache.flink.runtime.checkpoint.CheckpointCoordinator - Decline checkpoint 5 by task eeefcb27475446241861ad8db3f33144 of job d6fed247feab1c0bcc1b0dcc2cfb4736 at 79edfa88-ccc3-4140-b0e1-7ce8a7f8669f @ 127.0.0.1 (dataPort=-1).
> org.apache.flink.util.SerializedThrowable: Task name with subtask : Source: nlp-kafka-source -> nlp-clean (1/1)#0 Failure reason: Task has failed.
> 	at org.apache.flink.runtime.taskmanager.Task.declineCheckpoint(Task.java:1388)
> 	at org.apache.flink.runtime.taskmanager.Task.lambda$triggerCheckpointBarrier$3(Task.java:1331)
> 	at java.util.concurrent.CompletableFuture.uniHandle(CompletableFuture.java:822)
> 	at java.util.concurrent.CompletableFuture$UniHandle.tryFire(CompletableFuture.java:797)
> 	at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474)
> 	at java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1977)
> 	at org.apache.flink.streaming.runtime.tasks.SourceStreamTask$LegacySourceFunctionThread.run(SourceStreamTask.java:343)
> Caused by: org.apache.flink.util.SerializedThrowable: org.apache.flink.streaming.connectors.kafka.internals.Handover$ClosedException
> 	at java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:292)
> 	at java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:308)
> 	at java.util.concurrent.CompletableFuture.uniCompose(CompletableFuture.java:943)
> 	at java.util.concurrent.CompletableFuture$UniCompose.tryFire(CompletableFuture.java:926)
> 	... 3 common frames omitted
> Caused by: org.apache.flink.util.SerializedThrowable: null
> 	at org.apache.flink.streaming.connectors.kafka.internals.Handover.close(Handover.java:177)
> 	at org.apache.flink.streaming.connectors.kafka.internals.KafkaFetcher.cancel(KafkaFetcher.java:164)
> 	at org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumerBase.cancel(FlinkKafkaConsumerBase.java:945)
> 	at org.apache.flink.streaming.api.operators.StreamSource.stop(StreamSource.java:128)
> 	at org.apache.flink.streaming.runtime.tasks.SourceStreamTask.stopOperatorForStopWithSavepoint(SourceStreamTask.java:305)
> 	at org.apache.flink.streaming.runtime.tasks.SourceStreamTask.lambda$triggerStopWithSavepointAsync$1(SourceStreamTask.java:285)
> 	at org.apache.flink.streaming.runtime.tasks.StreamTaskActionExecutor$SynchronizedStreamTaskActionExecutor.runThrowing(StreamTaskActionExecutor.java:93)
> 	at org.apache.flink.streaming.runtime.tasks.mailbox.Mail.run(Mail.java:90)
> 	at org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor.processMailsWhenDefaultActionUnavailable(MailboxProcessor.java:338)
> 	at org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor.processMail(MailboxProcessor.java:324)
> 	at org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor.runMailboxLoop(MailboxProcessor.java:201)
> {code}
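For readers reproducing this, the reporter's trigger is a stop-with-savepoint call against the Flink REST API. A minimal sketch of how such a request is assembled follows; the `/jobs/<jobid>/stop` endpoint shape and JSON body fields are from my understanding of the Flink REST API, and the base URL and savepoint directory are placeholders rather than values confirmed by this issue:

```python
# Hedged sketch: build the stop-with-savepoint REST request described above.
# Endpoint shape (/jobs/<jobid>/stop with targetDirectory/drain body) is an
# assumption based on the Flink REST API; send the result with any HTTP client.
import json


def stop_with_savepoint_request(base_url: str, job_id: str,
                                target_directory: str, drain: bool = False):
    """Return (url, json_body) for a POST asking Flink to stop a job
    with a savepoint."""
    url = f"{base_url}/jobs/{job_id}/stop"
    body = json.dumps({"targetDirectory": target_directory, "drain": drain})
    return url, body


# Job id taken from the log above; the other values are placeholders.
url, body = stop_with_savepoint_request(
    "http://localhost:8081",
    "d6fed247feab1c0bcc1b0dcc2cfb4736",
    "s3://flink-data/savepoints",
)
```

The response to this POST carries a trigger id; the savepoint status endpoint can then be polled to see whether the stop completed or, as in this issue, the job declined the checkpoint and restarted.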
[jira] [Updated] (FLINK-28758) Failed to stop with savepoint
[ https://issues.apache.org/jira/browse/FLINK-28758?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

hjw updated FLINK-28758:
------------------------
    Attachment: image-2022-10-13-19-47-56-635.png

> Failed to stop with savepoint
> -----------------------------
>
>                 Key: FLINK-28758
>                 URL: https://issues.apache.org/jira/browse/FLINK-28758
[jira] [Commented] (FLINK-28758) Failed to stop with savepoint
[ https://issues.apache.org/jira/browse/FLINK-28758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17616951#comment-17616951 ]

hjw commented on FLINK-28758:
-----------------------------

Hi [~Yanfei Lei]. Sorry, I made a mistake in my description. The correct version is that I posted a "stop with savepoint" request. I have corrected the description.

> Failed to stop with savepoint
> -----------------------------
>
>                 Key: FLINK-28758
>                 URL: https://issues.apache.org/jira/browse/FLINK-28758
[jira] [Updated] (FLINK-28758) Failed to stop with savepoint
[ https://issues.apache.org/jira/browse/FLINK-28758?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

hjw updated FLINK-28758:
------------------------
    Description:
I posted a stop-with-savepoint request to the Flink job through the REST API.
An error happened while the Kafka connector was closing, and the job entered the RESTARTING state.
Triggering a savepoint alone succeeds.

{code:java}
13:33:42.857 [Kafka Fetcher for Source: nlp-kafka-source -> nlp-clean (1/1)#0] DEBUG org.apache.kafka.clients.consumer.KafkaConsumer - [Consumer clientId=consumer-hjw-3, groupId=hjw] Kafka consumer has been closed
13:33:42.857 [Kafka Fetcher for Source: cpp-kafka-source -> cpp-clean (1/1)#0] INFO org.apache.kafka.common.utils.AppInfoParser - App info kafka.consumer for consumer-hjw-4 unregistered
13:33:42.857 [Kafka Fetcher for Source: cpp-kafka-source -> cpp-clean (1/1)#0] DEBUG org.apache.kafka.clients.consumer.KafkaConsumer - [Consumer clientId=consumer-hjw-4, groupId=hjw] Kafka consumer has been closed
13:33:42.860 [Source: nlp-kafka-source -> nlp-clean (1/1)#0] DEBUG org.apache.flink.streaming.runtime.tasks.StreamTask - Cleanup StreamTask (operators closed: false, cancelled: false)
13:33:42.860 [jobmanager-io-thread-4] INFO org.apache.flink.runtime.checkpoint.CheckpointCoordinator - Decline checkpoint 5 by task eeefcb27475446241861ad8db3f33144 of job d6fed247feab1c0bcc1b0dcc2cfb4736 at 79edfa88-ccc3-4140-b0e1-7ce8a7f8669f @ 127.0.0.1 (dataPort=-1).
org.apache.flink.util.SerializedThrowable: Task name with subtask : Source: nlp-kafka-source -> nlp-clean (1/1)#0 Failure reason: Task has failed.
	at org.apache.flink.runtime.taskmanager.Task.declineCheckpoint(Task.java:1388)
	at org.apache.flink.runtime.taskmanager.Task.lambda$triggerCheckpointBarrier$3(Task.java:1331)
	at java.util.concurrent.CompletableFuture.uniHandle(CompletableFuture.java:822)
	at java.util.concurrent.CompletableFuture$UniHandle.tryFire(CompletableFuture.java:797)
	at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474)
	at java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1977)
	at org.apache.flink.streaming.runtime.tasks.SourceStreamTask$LegacySourceFunctionThread.run(SourceStreamTask.java:343)
Caused by: org.apache.flink.util.SerializedThrowable: org.apache.flink.streaming.connectors.kafka.internals.Handover$ClosedException
	at java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:292)
	at java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:308)
	at java.util.concurrent.CompletableFuture.uniCompose(CompletableFuture.java:943)
	at java.util.concurrent.CompletableFuture$UniCompose.tryFire(CompletableFuture.java:926)
	... 3 common frames omitted
Caused by: org.apache.flink.util.SerializedThrowable: null
	at org.apache.flink.streaming.connectors.kafka.internals.Handover.close(Handover.java:177)
	at org.apache.flink.streaming.connectors.kafka.internals.KafkaFetcher.cancel(KafkaFetcher.java:164)
	at org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumerBase.cancel(FlinkKafkaConsumerBase.java:945)
	at org.apache.flink.streaming.api.operators.StreamSource.stop(StreamSource.java:128)
	at org.apache.flink.streaming.runtime.tasks.SourceStreamTask.stopOperatorForStopWithSavepoint(SourceStreamTask.java:305)
	at org.apache.flink.streaming.runtime.tasks.SourceStreamTask.lambda$triggerStopWithSavepointAsync$1(SourceStreamTask.java:285)
	at org.apache.flink.streaming.runtime.tasks.StreamTaskActionExecutor$SynchronizedStreamTaskActionExecutor.runThrowing(StreamTaskActionExecutor.java:93)
	at org.apache.flink.streaming.runtime.tasks.mailbox.Mail.run(Mail.java:90)
	at org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor.processMailsWhenDefaultActionUnavailable(MailboxProcessor.java:338)
	at org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor.processMail(MailboxProcessor.java:324)
	at org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor.runMailboxLoop(MailboxProcessor.java:201)
	at org.apache.flink.streaming.runtime.tasks.StreamTask.runMailboxLoop(StreamTask.java:804)
	at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:753)
	at org.apache.flink.runtime.taskmanager.Task.runWithSystemExitMonitoring(Task.java:948)
	at org.apache.flink.runtime.taskmanager.Task.restoreAndInvoke(Task.java:927)
	at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:741)
	at org.apache.flink.runtime.taskmanager.Task.run(Task.java:563)
	at java.lang.Thread.run(Thread.java:748)
13:34:00.925 [flink-akka.actor.default-dispatcher-21] DEBUG org.apache.flink.runtime.jobmaster.JobMaster - Trigger heartbeat request.
13:34:00.926 [flink-akka.actor.default-dispatcher-22] DEBUG org.apache.flink.runtime.resourcemanager.StandaloneResourceManager - Trigger heartbeat request.
13:34:00.926 [flink-akka.actor.default-dispatcher-21] DEBUG org.apache.flink.runtime.taskex
{code}
[jira] [Updated] (FLINK-29187) mvn spotless:apply to format code in Windows
[ https://issues.apache.org/jira/browse/FLINK-29187?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] hjw updated FLINK-29187: Component/s: Build System Build System / CI > mvn spotless:apply to format code in Windows > > > Key: FLINK-29187 > URL: https://issues.apache.org/jira/browse/FLINK-29187 > Project: Flink > Issue Type: Improvement > Components: Build System, Build System / CI >Affects Versions: 1.15.2 >Reporter: hjw >Priority: Major > > When using the mvn spotless:apply command to format Flink code on Windows, the EOL > of files will be replaced with CRLF. > I think we can add a `.gitattributes` file to force the code EOL to be LF. > The related issue is [EOL CRLF vs > LF|https://github.com/diffplug/spotless/issues/39] > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Comment Edited] (FLINK-29187) mvn spotless:apply to format code in Windows
[ https://issues.apache.org/jira/browse/FLINK-29187?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17599938#comment-17599938 ] hjw edited comment on FLINK-29187 at 9/3/22 2:27 PM: - I'd like to get a ticket to fix it was (Author: JIRAUSER280998): I'd like to get a ticket to fix it if it's confirmed that this is a problem > mvn spotless:apply to format code in Windows > > > Key: FLINK-29187 > URL: https://issues.apache.org/jira/browse/FLINK-29187 > Project: Flink > Issue Type: Improvement >Affects Versions: 1.15.2 >Reporter: hjw >Priority: Major > > When using the mvn spotless:apply command to format Flink code on Windows, the EOL > of files will be replaced with CRLF. > I think we can add a `.gitattributes` file to force the code EOL to be LF. > The related issue is [EOL CRLF vs > LF|https://github.com/diffplug/spotless/issues/39]
[jira] [Commented] (FLINK-29187) mvn spotless:apply to format code in Windows
[ https://issues.apache.org/jira/browse/FLINK-29187?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17599938#comment-17599938 ] hjw commented on FLINK-29187: - I'd like to get a ticket to fix it if it's confirmed that this is a problem > mvn spotless:apply to format code in Windows > > > Key: FLINK-29187 > URL: https://issues.apache.org/jira/browse/FLINK-29187 > Project: Flink > Issue Type: Improvement >Affects Versions: 1.15.2 >Reporter: hjw >Priority: Major > > When using the mvn spotless:apply command to format Flink code on Windows, the EOL > of files will be replaced with CRLF. > I think we can add a `.gitattributes` file to force the code EOL to be LF. > The related issue is [EOL CRLF vs > LF|https://github.com/diffplug/spotless/issues/39]
[jira] [Updated] (FLINK-29187) mvn spotless:apply to format code in Windows
[ https://issues.apache.org/jira/browse/FLINK-29187?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] hjw updated FLINK-29187: Description: When using the mvn spotless:apply command to format Flink code on Windows, the EOL of files will be replaced with CRLF. I think we can add a `.gitattributes` file to force the code EOL to be LF. The related issue is [EOL CRLF vs LF|https://github.com/diffplug/spotless/issues/39] was: When using the mvn spotless:apply command to format Flink code on Windows, the EOL of files will be replaced with CRLF. I think we can add a `.gitattributes` file to force the code EOL to be LF. The related issue is [link title|https://github.com/diffplug/spotless/issues/39] > mvn spotless:apply to format code in Windows > > > Key: FLINK-29187 > URL: https://issues.apache.org/jira/browse/FLINK-29187 > Project: Flink > Issue Type: Improvement >Affects Versions: 1.15.2 >Reporter: hjw >Priority: Major > > When using the mvn spotless:apply command to format Flink code on Windows, the EOL > of files will be replaced with CRLF. > I think we can add a `.gitattributes` file to force the code EOL to be LF. > The related issue is [EOL CRLF vs > LF|https://github.com/diffplug/spotless/issues/39]
[jira] [Created] (FLINK-29187) mvn spotless:apply to format code in Windows
hjw created FLINK-29187: --- Summary: mvn spotless:apply to format code in Windows Key: FLINK-29187 URL: https://issues.apache.org/jira/browse/FLINK-29187 Project: Flink Issue Type: Improvement Affects Versions: 1.15.2 Reporter: hjw When using the mvn spotless:apply command to format Flink code on Windows, the EOL of files will be replaced with CRLF. I think we can add a `.gitattributes` file to force the code EOL to be LF. The related issue is [link title|https://github.com/diffplug/spotless/issues/39]
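The `.gitattributes` approach suggested in this issue could look roughly like the fragment below. This is only a sketch: the issue does not specify the patterns, so the globs here are assumptions for illustration.

```
# Normalize line endings to LF on checkout, overriding core.autocrlf on Windows
# (sketch; the exact patterns are assumptions, not taken from the issue).
* text=auto eol=lf
*.java text eol=lf
*.xml text eol=lf
# Windows batch scripts must keep CRLF to run under cmd.exe.
*.bat text eol=crlf
```

With such a file committed, `mvn spotless:apply` on Windows would no longer see files rewritten to CRLF on checkout, so Spotless and Git would agree on the EOL.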
[jira] [Commented] (FLINK-29089) Error when running test cases on Windows
[ https://issues.apache.org/jira/browse/FLINK-29089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17583712#comment-17583712 ] hjw commented on FLINK-29089: - [~wangyang0918] I'd like to file a PR to fix it. Please help assign the ticket to me. > Error when running test cases on Windows > --- > > Key: FLINK-29089 > URL: https://issues.apache.org/jira/browse/FLINK-29089 > Project: Flink > Issue Type: Improvement > Components: Deployment / Kubernetes >Affects Versions: 1.15.1 > Environment: deploy env: Windows10 > flink version:1.15 > >Reporter: hjw >Priority: Major > > When I run mvn clean install, it runs the Flink test cases. > However, I get these errors: > [ERROR] Failures: > [ERROR] > KubernetesClusterDescriptorTest.testDeployApplicationClusterWithNonLocalSchema:155 > Previous method call should have failed but it returned: > org.apache.flink.kubernetes.KubernetesClusterDescriptor$$Lambda$839/1619964974@70e5737f > [ERROR] > AbstractKubernetesParametersTest.testGetLocalHadoopConfigurationDirectoryFromHadoop1HomeEnv:132->runTestWithEmptyEnv:149->lambda$testGetLocalHadoopConfigurationDirectoryFromHadoop1HomeEnv$3:141 > Expected: is "C:\Users\10104\AppData\Local\Temp\junit5662202040601670287/conf" > but: was > "C:\Users\10104\AppData\Local\Temp\junit5662202040601670287\conf" > [ERROR] > AbstractKubernetesParametersTest.testGetLocalHadoopConfigurationDirectoryFromHadoop2HomeEnv:117->runTestWithEmptyEnv:149->lambda$testGetLocalHadoopConfigurationDirectoryFromHadoop2HomeEnv$2:126 > Expected: is > "C:\Users\10104\AppData\Local\Temp\junit7094401822178578683/etc/hadoop" > but: was > "C:\Users\10104\AppData\Local\Temp\junit7094401822178578683\etc\hadoop" > [ERROR] > KubernetesUtilsTest.testLoadPodFromTemplateWithNonExistPathShouldFail:110 > Expected: Expected error message is "Pod template file > /path/of/non-exist.yaml does not exist." 
> but: The throwable <Pod template file \path\of\non-exist.yaml does not exist.> does not contain the > expected error message "Pod template file /path/of/non-exist.yaml does not > exist." > > I judge that the errors occurred due to the different file-system (Unix, Windows, etc.) > path separators.
[jira] [Created] (FLINK-29089) Error when running test cases on Windows
hjw created FLINK-29089: --- Summary: Error when running test cases on Windows Key: FLINK-29089 URL: https://issues.apache.org/jira/browse/FLINK-29089 Project: Flink Issue Type: Improvement Components: Deployment / Kubernetes Affects Versions: 1.15.1 Environment: deploy env: Windows10 flink version:1.15 Reporter: hjw When I run mvn clean install, it runs the Flink test cases. However, I get these errors: [ERROR] Failures: [ERROR] KubernetesClusterDescriptorTest.testDeployApplicationClusterWithNonLocalSchema:155 Previous method call should have failed but it returned: org.apache.flink.kubernetes.KubernetesClusterDescriptor$$Lambda$839/1619964974@70e5737f [ERROR] AbstractKubernetesParametersTest.testGetLocalHadoopConfigurationDirectoryFromHadoop1HomeEnv:132->runTestWithEmptyEnv:149->lambda$testGetLocalHadoopConfigurationDirectoryFromHadoop1HomeEnv$3:141 Expected: is "C:\Users\10104\AppData\Local\Temp\junit5662202040601670287/conf" but: was "C:\Users\10104\AppData\Local\Temp\junit5662202040601670287\conf" [ERROR] AbstractKubernetesParametersTest.testGetLocalHadoopConfigurationDirectoryFromHadoop2HomeEnv:117->runTestWithEmptyEnv:149->lambda$testGetLocalHadoopConfigurationDirectoryFromHadoop2HomeEnv$2:126 Expected: is "C:\Users\10104\AppData\Local\Temp\junit7094401822178578683/etc/hadoop" but: was "C:\Users\10104\AppData\Local\Temp\junit7094401822178578683\etc\hadoop" [ERROR] KubernetesUtilsTest.testLoadPodFromTemplateWithNonExistPathShouldFail:110 Expected: Expected error message is "Pod template file /path/of/non-exist.yaml does not exist." but: The throwable does not contain the expected error message "Pod template file /path/of/non-exist.yaml does not exist." I judge that the errors occurred due to the different file-system (Unix, Windows, etc.) path separators.
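The assertion failures above all boil down to expected paths that hard-code the `/` separator while the code under test produces platform-native paths. A minimal, JDK-only sketch of the portable way to build such expectations (the directory name below is made up to mirror the JUnit temp folders in the report):

```java
import java.io.File;
import java.nio.file.Paths;

public class PathSeparatorDemo {
    public static void main(String[] args) {
        // Hypothetical temp directory, standing in for the JUnit temp folder above.
        String base = "junit5662202040601670287";

        // Hard-coding "/" yields the Unix-only form the failing assertions expected:
        String hardCoded = base + "/conf";

        // Building the expectation via Paths.get (or File.separator) matches the
        // actual value on Windows ("\") and on Unix ("/") alike:
        String portable = Paths.get(base, "conf").toString();

        System.out.println(portable.equals(base + File.separator + "conf")); // true on any OS
        System.out.println(hardCoded); // Unix-style, mismatches on Windows
    }
}
```

Rewriting the test expectations this way (rather than normalizing the production code) is one plausible direction for the PR offered in the comment above.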
[jira] [Commented] (FLINK-28915) Flink Native k8s mode jar location support s3 schema
[ https://issues.apache.org/jira/browse/FLINK-28915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17582588#comment-17582588 ] hjw commented on FLINK-28915: - [~wangyang0918] Could you tell me more about how to use the credentials for the filesystems (HDFS, S3, OSS, GFS, etc.)? I have completed the basic development and the manual testing of some of the filesystems (OSS, S3, local, file), but for the other filesystems I need an environment for testing. > Flink Native k8s mode jar location support s3 schema > -- > > Key: FLINK-28915 > URL: https://issues.apache.org/jira/browse/FLINK-28915 > Project: Flink > Issue Type: Improvement > Components: Deployment / Kubernetes, flink-contrib >Reporter: hjw >Assignee: hjw >Priority: Major > > As the Flink documentation shows, local is the only supported scheme in Native k8s > deployment. > Is there a plan to support the s3 filesystem? Thanks.
[jira] [Commented] (FLINK-28915) Flink Native k8s mode jar location support s3 schema
[ https://issues.apache.org/jira/browse/FLINK-28915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17581433#comment-17581433 ] hjw commented on FLINK-28915: - [~wangyang0918] [~aitozi] Do the test cases need to cover all Flink-supported filesystems? From [https://github.com/apache/flink-kubernetes-operator/pull/168] , only the http and https schemes are covered. Do the other filesystems require manual verification? > Flink Native k8s mode jar location support s3 schema > -- > > Key: FLINK-28915 > URL: https://issues.apache.org/jira/browse/FLINK-28915 > Project: Flink > Issue Type: Improvement > Components: Deployment / Kubernetes, flink-contrib >Affects Versions: 1.15.0, 1.15.1 >Reporter: hjw >Assignee: hjw >Priority: Major > > As the Flink documentation shows, local is the only supported scheme in Native k8s > deployment. > Is there a plan to support the s3 filesystem? Thanks.
[jira] [Commented] (FLINK-28915) Flink Native k8s mode jar location support s3 schema
[ https://issues.apache.org/jira/browse/FLINK-28915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17580362#comment-17580362 ] hjw commented on FLINK-28915: - Thanks [~aitozi] for the advice, and thanks [~wangyang0918] for assigning the ticket. > Flink Native k8s mode jar location support s3 schema > -- > > Key: FLINK-28915 > URL: https://issues.apache.org/jira/browse/FLINK-28915 > Project: Flink > Issue Type: Improvement > Components: Deployment / Kubernetes, flink-contrib >Affects Versions: 1.15.0, 1.15.1 >Reporter: hjw >Assignee: hjw >Priority: Major > > As the Flink documentation shows, local is the only supported scheme in Native k8s > deployment. > Is there a plan to support the s3 filesystem? Thanks.
[jira] [Commented] (FLINK-28915) Flink Native k8s mode jar location support s3 schema
[ https://issues.apache.org/jira/browse/FLINK-28915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17579729#comment-17579729 ] hjw commented on FLINK-28915: - [~aitozi] Thank you. I'm also interested in implementing this feature, but I have never submitted a PR before. May I be involved in the implementation of this PR, or is there something I can do in it? I want to start contributing to the Flink community. > Flink Native k8s mode jar location support s3 schema > -- > > Key: FLINK-28915 > URL: https://issues.apache.org/jira/browse/FLINK-28915 > Project: Flink > Issue Type: Improvement > Components: Deployment / Kubernetes, flink-contrib >Affects Versions: 1.15.0, 1.15.1 >Reporter: hjw >Priority: Major > > As the Flink documentation shows, local is the only supported scheme in Native k8s > deployment. > Is there a plan to support the s3 filesystem? Thanks.
[jira] [Commented] (FLINK-28915) Flink Native k8s mode jar location support s3 schema
[ https://issues.apache.org/jira/browse/FLINK-28915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17579603#comment-17579603 ] hjw commented on FLINK-28915: - I successfully modified the class KubernetesApplicationClusterEntryPoint.java to support the S3 scheme. The logic of the modification: 1. read the jar's S3 location from the pipeline.jars parameter; 2. download the jar from S3 to a local path in the pod (JobManager); 3. replace the pipeline.jars parameter with the local path (local scheme). I know such an implementation is not elegant and is not compatible with other remote DFS schemes (OSS, HDFS, etc.). I think a more elegant implementation would use the Flink FileSystem abstraction to connect to each DFS. However, I notice that the Flink FileSystem is configured while starting the Flink cluster, but the job's PackagedProgram is initialized in KubernetesApplicationClusterEntryPoint before the call "ClusterEntrypoint.runclusterEntrypoint(KubernetesApplicationClusterEntryPoint)". > Flink Native k8s mode jar location support s3 schema > -- > > Key: FLINK-28915 > URL: https://issues.apache.org/jira/browse/FLINK-28915 > Project: Flink > Issue Type: Improvement > Components: Deployment / Kubernetes, flink-contrib >Affects Versions: 1.15.0, 1.15.1 >Reporter: hjw >Priority: Major > > As the Flink documentation shows, local is the only supported scheme in Native k8s > deployment. > Is there a plan to support the s3 filesystem? Thanks.
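The three steps described in the comment above can be sketched as follows. This only illustrates steps 1 and 3 (taking the remote URI from pipeline.jars and rewriting it to a `local://` URI); the actual download in step 2 would go through Flink's FileSystem abstraction and is omitted, and the target directory `/opt/flink/usrlib` is an assumption, not something the comment specifies:

```java
import java.net.URI;
import java.nio.file.Paths;

public class JarLocationRewrite {

    // Hypothetical directory inside the JobManager pod to download the jar into.
    static final String LOCAL_DIR = "/opt/flink/usrlib";

    /**
     * Rewrites a remote jar URI taken from the pipeline.jars parameter into
     * the local:// form that the application cluster entrypoint accepts.
     */
    static String rewriteToLocal(String remoteJar) {
        URI uri = URI.create(remoteJar);
        String fileName = Paths.get(uri.getPath()).getFileName().toString();
        // Step 2 (omitted here): download `uri` to LOCAL_DIR/fileName,
        // e.g. via org.apache.flink.core.fs.FileSystem, before rewriting.
        return "local://" + LOCAL_DIR + "/" + fileName;
    }

    public static void main(String[] args) {
        System.out.println(rewriteToLocal("s3://my-bucket/jobs/my-job.jar"));
        // local:///opt/flink/usrlib/my-job.jar
    }
}
```

As the comment notes, doing this per-scheme is not elegant; routing the download through Flink's FileSystem layer would make the same rewrite work for s3, oss, hdfs, and any other registered scheme.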
[jira] [Created] (FLINK-28915) Flink Native k8s mode jar location support s3 schema
hjw created FLINK-28915: --- Summary: Flink Native k8s mode jar location support s3 schema Key: FLINK-28915 URL: https://issues.apache.org/jira/browse/FLINK-28915 Project: Flink Issue Type: Improvement Components: Deployment / Kubernetes, flink-contrib Affects Versions: 1.15.1, 1.15.0 Reporter: hjw As the Flink documentation shows, local is the only supported scheme in Native k8s deployment. Is there a plan to support the s3 filesystem? Thanks.
[jira] [Updated] (FLINK-28758) Failed to stop with savepoint
[ https://issues.apache.org/jira/browse/FLINK-28758?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] hjw updated FLINK-28758: Component/s: Runtime / Checkpointing Runtime / Task > Failed to stop with savepoint > -- > > Key: FLINK-28758 > URL: https://issues.apache.org/jira/browse/FLINK-28758 > Project: Flink > Issue Type: Bug > Components: Connectors / Kafka, Runtime / Checkpointing, Runtime / > Task >Affects Versions: 1.15.0 > Environment: Flink version: 1.15.0 > Deploy mode: K8s application mode. A local mini cluster also has this > problem. > Kafka connector: uses the Kafka SourceFunction, not the new API. >Reporter: hjw >Priority: Major > > I posted a stop-with-savepoint request to the Flink job through the REST API. > An error happened when the Kafka connector closed. > The job then enters the restarting state. > Taking a savepoint alone (with the savepoint command) succeeds. > {code:java} > 13:33:42.857 [Kafka Fetcher for Source: nlp-kafka-source -> nlp-clean > (1/1)#0] DEBUG org.apache.kafka.clients.consumer.KafkaConsumer - [Consumer > clientId=consumer-hjw-3, groupId=hjw] Kafka consumer has been closed > 13:33:42.857 [Kafka Fetcher for Source: cpp-kafka-source -> cpp-clean > (1/1)#0] INFO org.apache.kafka.common.utils.AppInfoParser - App info > kafka.consumer for consumer-hjw-4 unregistered > 13:33:42.857 [Kafka Fetcher for Source: cpp-kafka-source -> cpp-clean > (1/1)#0] DEBUG org.apache.kafka.clients.consumer.KafkaConsumer - [Consumer > clientId=consumer-hjw-4, groupId=hjw] Kafka consumer has been closed > 13:33:42.860 [Source: nlp-kafka-source -> nlp-clean (1/1)#0] DEBUG > org.apache.flink.streaming.runtime.tasks.StreamTask - Cleanup StreamTask > (operators closed: false, cancelled: false) > 13:33:42.860 [jobmanager-io-thread-4] INFO > org.apache.flink.runtime.checkpoint.CheckpointCoordinator - Decline > checkpoint 5 by task eeefcb27475446241861ad8db3f33144 of job > d6fed247feab1c0bcc1b0dcc2cfb4736 at 79edfa88-ccc3-4140-b0e1-7ce8a7f8669f @ > 127.0.0.1 (dataPort=-1). 
> org.apache.flink.util.SerializedThrowable: Task name with subtask : Source: > nlp-kafka-source -> nlp-clean (1/1)#0 Failure reason: Task has failed. > at > org.apache.flink.runtime.taskmanager.Task.declineCheckpoint(Task.java:1388) > at > org.apache.flink.runtime.taskmanager.Task.lambda$triggerCheckpointBarrier$3(Task.java:1331) > at > java.util.concurrent.CompletableFuture.uniHandle(CompletableFuture.java:822) > at > java.util.concurrent.CompletableFuture$UniHandle.tryFire(CompletableFuture.java:797) > at > java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474) > at > java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1977) > at > org.apache.flink.streaming.runtime.tasks.SourceStreamTask$LegacySourceFunctionThread.run(SourceStreamTask.java:343) > Caused by: org.apache.flink.util.SerializedThrowable: > org.apache.flink.streaming.connectors.kafka.internals.Handover$ClosedException > at > java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:292) > at > java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:308) > at > java.util.concurrent.CompletableFuture.uniCompose(CompletableFuture.java:943) > at > java.util.concurrent.CompletableFuture$UniCompose.tryFire(CompletableFuture.java:926) > ... 
3 common frames omitted > Caused by: org.apache.flink.util.SerializedThrowable: null > at > org.apache.flink.streaming.connectors.kafka.internals.Handover.close(Handover.java:177) > at > org.apache.flink.streaming.connectors.kafka.internals.KafkaFetcher.cancel(KafkaFetcher.java:164) > at > org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumerBase.cancel(FlinkKafkaConsumerBase.java:945) > at > org.apache.flink.streaming.api.operators.StreamSource.stop(StreamSource.java:128) > at > org.apache.flink.streaming.runtime.tasks.SourceStreamTask.stopOperatorForStopWithSavepoint(SourceStreamTask.java:305) > at > org.apache.flink.streaming.runtime.tasks.SourceStreamTask.lambda$triggerStopWithSavepointAsync$1(SourceStreamTask.java:285) > at > org.apache.flink.streaming.runtime.tasks.StreamTaskActionExecutor$SynchronizedStreamTaskActionExecutor.runThrowing(StreamTaskActionExecutor.java:93) > at org.apache.flink.streaming.runtime.tasks.mailbox.Mail.run(Mail.java:90) > at > org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor.processMailsWhenDefaultActionUnavailable(MailboxProcessor.java:338) > at > org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor.processMail(MailboxProcessor.java:324) > at > org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor.runMailboxLoop(MailboxProcessor.java:201) > at > org.apache.flink.streaming.runtime.tasks.StreamTask.runMailboxLoop(StreamTask.java:804) > at > org.a
[jira] [Updated] (FLINK-28758) Failed to stop with savepoint
[ https://issues.apache.org/jira/browse/FLINK-28758?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] hjw updated FLINK-28758: Description: I posted a stop-with-savepoint request to the Flink job through the REST API. An error happened when the Kafka connector closed. The job then enters the restarting state. Taking a savepoint alone (with the savepoint command) succeeds. {code:java} 13:33:42.857 [Kafka Fetcher for Source: nlp-kafka-source -> nlp-clean (1/1)#0] DEBUG org.apache.kafka.clients.consumer.KafkaConsumer - [Consumer clientId=consumer-hjw-3, groupId=hjw] Kafka consumer has been closed 13:33:42.857 [Kafka Fetcher for Source: cpp-kafka-source -> cpp-clean (1/1)#0] INFO org.apache.kafka.common.utils.AppInfoParser - App info kafka.consumer for consumer-hjw-4 unregistered 13:33:42.857 [Kafka Fetcher for Source: cpp-kafka-source -> cpp-clean (1/1)#0] DEBUG org.apache.kafka.clients.consumer.KafkaConsumer - [Consumer clientId=consumer-hjw-4, groupId=hjw] Kafka consumer has been closed 13:33:42.860 [Source: nlp-kafka-source -> nlp-clean (1/1)#0] DEBUG org.apache.flink.streaming.runtime.tasks.StreamTask - Cleanup StreamTask (operators closed: false, cancelled: false) 13:33:42.860 [jobmanager-io-thread-4] INFO org.apache.flink.runtime.checkpoint.CheckpointCoordinator - Decline checkpoint 5 by task eeefcb27475446241861ad8db3f33144 of job d6fed247feab1c0bcc1b0dcc2cfb4736 at 79edfa88-ccc3-4140-b0e1-7ce8a7f8669f @ 127.0.0.1 (dataPort=-1). org.apache.flink.util.SerializedThrowable: Task name with subtask : Source: nlp-kafka-source -> nlp-clean (1/1)#0 Failure reason: Task has failed. 
at org.apache.flink.runtime.taskmanager.Task.declineCheckpoint(Task.java:1388) at org.apache.flink.runtime.taskmanager.Task.lambda$triggerCheckpointBarrier$3(Task.java:1331) at java.util.concurrent.CompletableFuture.uniHandle(CompletableFuture.java:822) at java.util.concurrent.CompletableFuture$UniHandle.tryFire(CompletableFuture.java:797) at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474) at java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1977) at org.apache.flink.streaming.runtime.tasks.SourceStreamTask$LegacySourceFunctionThread.run(SourceStreamTask.java:343) Caused by: org.apache.flink.util.SerializedThrowable: org.apache.flink.streaming.connectors.kafka.internals.Handover$ClosedException at java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:292) at java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:308) at java.util.concurrent.CompletableFuture.uniCompose(CompletableFuture.java:943) at java.util.concurrent.CompletableFuture$UniCompose.tryFire(CompletableFuture.java:926) ... 
3 common frames omitted Caused by: org.apache.flink.util.SerializedThrowable: null at org.apache.flink.streaming.connectors.kafka.internals.Handover.close(Handover.java:177) at org.apache.flink.streaming.connectors.kafka.internals.KafkaFetcher.cancel(KafkaFetcher.java:164) at org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumerBase.cancel(FlinkKafkaConsumerBase.java:945) at org.apache.flink.streaming.api.operators.StreamSource.stop(StreamSource.java:128) at org.apache.flink.streaming.runtime.tasks.SourceStreamTask.stopOperatorForStopWithSavepoint(SourceStreamTask.java:305) at org.apache.flink.streaming.runtime.tasks.SourceStreamTask.lambda$triggerStopWithSavepointAsync$1(SourceStreamTask.java:285) at org.apache.flink.streaming.runtime.tasks.StreamTaskActionExecutor$SynchronizedStreamTaskActionExecutor.runThrowing(StreamTaskActionExecutor.java:93) at org.apache.flink.streaming.runtime.tasks.mailbox.Mail.run(Mail.java:90) at org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor.processMailsWhenDefaultActionUnavailable(MailboxProcessor.java:338) at org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor.processMail(MailboxProcessor.java:324) at org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor.runMailboxLoop(MailboxProcessor.java:201) at org.apache.flink.streaming.runtime.tasks.StreamTask.runMailboxLoop(StreamTask.java:804) at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:753) at org.apache.flink.runtime.taskmanager.Task.runWithSystemExitMonitoring(Task.java:948) at org.apache.flink.runtime.taskmanager.Task.restoreAndInvoke(Task.java:927) at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:741) at org.apache.flink.runtime.taskmanager.Task.run(Task.java:563) at java.lang.Thread.run(Thread.java:748) 13:34:00.925 [flink-akka.actor.default-dispatcher-21] DEBUG org.apache.flink.runtime.jobmaster.JobMaster - Trigger heartbeat request. 
13:34:00.926 [flink-akka.actor.default-dispatcher-22] DEBUG org.apache.flink.runtime.resourcemanager.StandaloneResourceManager - Trigger heartbeat request. 13:34:00.926 [flink-akka.actor.default-dispatcher-21] DEBUG org.apache.flink.runtime.task
[jira] [Updated] (FLINK-28758) Failed to stop with savepoint
[ https://issues.apache.org/jira/browse/FLINK-28758?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] hjw updated FLINK-28758: Description: I posted a stop-with-savepoint request to the Flink job through the REST API. An error happened when the Kafka connector closed. The job then enters the restarting state. {code:java} 13:33:42.857 [Kafka Fetcher for Source: nlp-kafka-source -> nlp-clean (1/1)#0] DEBUG org.apache.kafka.clients.consumer.KafkaConsumer - [Consumer clientId=consumer-hjw-3, groupId=hjw] Kafka consumer has been closed 13:33:42.857 [Kafka Fetcher for Source: cpp-kafka-source -> cpp-clean (1/1)#0] INFO org.apache.kafka.common.utils.AppInfoParser - App info kafka.consumer for consumer-hjw-4 unregistered 13:33:42.857 [Kafka Fetcher for Source: cpp-kafka-source -> cpp-clean (1/1)#0] DEBUG org.apache.kafka.clients.consumer.KafkaConsumer - [Consumer clientId=consumer-hjw-4, groupId=hjw] Kafka consumer has been closed 13:33:42.860 [Source: nlp-kafka-source -> nlp-clean (1/1)#0] DEBUG org.apache.flink.streaming.runtime.tasks.StreamTask - Cleanup StreamTask (operators closed: false, cancelled: false) 13:33:42.860 [jobmanager-io-thread-4] INFO org.apache.flink.runtime.checkpoint.CheckpointCoordinator - Decline checkpoint 5 by task eeefcb27475446241861ad8db3f33144 of job d6fed247feab1c0bcc1b0dcc2cfb4736 at 79edfa88-ccc3-4140-b0e1-7ce8a7f8669f @ 127.0.0.1 (dataPort=-1). org.apache.flink.util.SerializedThrowable: Task name with subtask : Source: nlp-kafka-source -> nlp-clean (1/1)#0 Failure reason: Task has failed. 
at org.apache.flink.runtime.taskmanager.Task.declineCheckpoint(Task.java:1388) at org.apache.flink.runtime.taskmanager.Task.lambda$triggerCheckpointBarrier$3(Task.java:1331) at java.util.concurrent.CompletableFuture.uniHandle(CompletableFuture.java:822) at java.util.concurrent.CompletableFuture$UniHandle.tryFire(CompletableFuture.java:797) at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474) at java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1977) at org.apache.flink.streaming.runtime.tasks.SourceStreamTask$LegacySourceFunctionThread.run(SourceStreamTask.java:343) Caused by: org.apache.flink.util.SerializedThrowable: org.apache.flink.streaming.connectors.kafka.internals.Handover$ClosedException at java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:292) at java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:308) at java.util.concurrent.CompletableFuture.uniCompose(CompletableFuture.java:943) at java.util.concurrent.CompletableFuture$UniCompose.tryFire(CompletableFuture.java:926) ... 
3 common frames omitted Caused by: org.apache.flink.util.SerializedThrowable: null at org.apache.flink.streaming.connectors.kafka.internals.Handover.close(Handover.java:177) at org.apache.flink.streaming.connectors.kafka.internals.KafkaFetcher.cancel(KafkaFetcher.java:164) at org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumerBase.cancel(FlinkKafkaConsumerBase.java:945) at org.apache.flink.streaming.api.operators.StreamSource.stop(StreamSource.java:128) at org.apache.flink.streaming.runtime.tasks.SourceStreamTask.stopOperatorForStopWithSavepoint(SourceStreamTask.java:305) at org.apache.flink.streaming.runtime.tasks.SourceStreamTask.lambda$triggerStopWithSavepointAsync$1(SourceStreamTask.java:285) at org.apache.flink.streaming.runtime.tasks.StreamTaskActionExecutor$SynchronizedStreamTaskActionExecutor.runThrowing(StreamTaskActionExecutor.java:93) at org.apache.flink.streaming.runtime.tasks.mailbox.Mail.run(Mail.java:90) at org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor.processMailsWhenDefaultActionUnavailable(MailboxProcessor.java:338) at org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor.processMail(MailboxProcessor.java:324) at org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor.runMailboxLoop(MailboxProcessor.java:201) at org.apache.flink.streaming.runtime.tasks.StreamTask.runMailboxLoop(StreamTask.java:804) at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:753) at org.apache.flink.runtime.taskmanager.Task.runWithSystemExitMonitoring(Task.java:948) at org.apache.flink.runtime.taskmanager.Task.restoreAndInvoke(Task.java:927) at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:741) at org.apache.flink.runtime.taskmanager.Task.run(Task.java:563) at java.lang.Thread.run(Thread.java:748) 13:34:00.925 [flink-akka.actor.default-dispatcher-21] DEBUG org.apache.flink.runtime.jobmaster.JobMaster - Trigger heartbeat request. 
13:34:00.926 [flink-akka.actor.default-dispatcher-22] DEBUG org.apache.flink.runtime.resourcemanager.StandaloneResourceManager - Trigger heartbeat request. 13:34:00.926 [flink-akka.actor.default-dispatcher-21] DEBUG org.apache.flink.runtime.taskexecutor.TaskExecutor - Received heartbeat reques
[jira] [Created] (FLINK-28758) Failed to stop with savepoint
hjw created FLINK-28758: --- Summary: Failed to stop with savepoint Key: FLINK-28758 URL: https://issues.apache.org/jira/browse/FLINK-28758 Project: Flink Issue Type: Bug Components: Connectors / Kafka Affects Versions: 1.15.0 Environment: Flink version: 1.15.0. Deploy mode: K8s Application Mode (a local mini cluster also has this problem). Kafka Connector: uses the Kafka SourceFunction, not the new API. Reporter: hjw I posted a stop-with-savepoint request to the Flink job through the REST API. An error happened while closing the Kafka connector. {code:java}
13:33:42.857 [Kafka Fetcher for Source: nlp-kafka-source -> nlp-clean (1/1)#0] DEBUG org.apache.kafka.clients.consumer.KafkaConsumer - [Consumer clientId=consumer-hjw-3, groupId=hjw] Kafka consumer has been closed
13:33:42.857 [Kafka Fetcher for Source: cpp-kafka-source -> cpp-clean (1/1)#0] INFO org.apache.kafka.common.utils.AppInfoParser - App info kafka.consumer for consumer-hjw-4 unregistered
13:33:42.857 [Kafka Fetcher for Source: cpp-kafka-source -> cpp-clean (1/1)#0] DEBUG org.apache.kafka.clients.consumer.KafkaConsumer - [Consumer clientId=consumer-hjw-4, groupId=hjw] Kafka consumer has been closed
13:33:42.860 [Source: nlp-kafka-source -> nlp-clean (1/1)#0] DEBUG org.apache.flink.streaming.runtime.tasks.StreamTask - Cleanup StreamTask (operators closed: false, cancelled: false)
13:33:42.860 [jobmanager-io-thread-4] INFO org.apache.flink.runtime.checkpoint.CheckpointCoordinator - Decline checkpoint 5 by task eeefcb27475446241861ad8db3f33144 of job d6fed247feab1c0bcc1b0dcc2cfb4736 at 79edfa88-ccc3-4140-b0e1-7ce8a7f8669f @ 127.0.0.1 (dataPort=-1).
org.apache.flink.util.SerializedThrowable: Task name with subtask : Source: nlp-kafka-source -> nlp-clean (1/1)#0 Failure reason: Task has failed. 
at org.apache.flink.runtime.taskmanager.Task.declineCheckpoint(Task.java:1388) at org.apache.flink.runtime.taskmanager.Task.lambda$triggerCheckpointBarrier$3(Task.java:1331) at java.util.concurrent.CompletableFuture.uniHandle(CompletableFuture.java:822) at java.util.concurrent.CompletableFuture$UniHandle.tryFire(CompletableFuture.java:797) at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474) at java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1977) at org.apache.flink.streaming.runtime.tasks.SourceStreamTask$LegacySourceFunctionThread.run(SourceStreamTask.java:343) Caused by: org.apache.flink.util.SerializedThrowable: org.apache.flink.streaming.connectors.kafka.internals.Handover$ClosedException at java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:292) at java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:308) at java.util.concurrent.CompletableFuture.uniCompose(CompletableFuture.java:943) at java.util.concurrent.CompletableFuture$UniCompose.tryFire(CompletableFuture.java:926) ... 
3 common frames omitted Caused by: org.apache.flink.util.SerializedThrowable: null at org.apache.flink.streaming.connectors.kafka.internals.Handover.close(Handover.java:177) at org.apache.flink.streaming.connectors.kafka.internals.KafkaFetcher.cancel(KafkaFetcher.java:164) at org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumerBase.cancel(FlinkKafkaConsumerBase.java:945) at org.apache.flink.streaming.api.operators.StreamSource.stop(StreamSource.java:128) at org.apache.flink.streaming.runtime.tasks.SourceStreamTask.stopOperatorForStopWithSavepoint(SourceStreamTask.java:305) at org.apache.flink.streaming.runtime.tasks.SourceStreamTask.lambda$triggerStopWithSavepointAsync$1(SourceStreamTask.java:285) at org.apache.flink.streaming.runtime.tasks.StreamTaskActionExecutor$SynchronizedStreamTaskActionExecutor.runThrowing(StreamTaskActionExecutor.java:93) at org.apache.flink.streaming.runtime.tasks.mailbox.Mail.run(Mail.java:90) at org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor.processMailsWhenDefaultActionUnavailable(MailboxProcessor.java:338) at org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor.processMail(MailboxProcessor.java:324) at org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor.runMailboxLoop(MailboxProcessor.java:201) at org.apache.flink.streaming.runtime.tasks.StreamTask.runMailboxLoop(StreamTask.java:804) at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:753) at org.apache.flink.runtime.taskmanager.Task.runWithSystemExitMonitoring(Task.java:948) at org.apache.flink.runtime.taskmanager.Task.restoreAndInvoke(Task.java:927) at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:741) at org.apache.flink.runtime.taskmanager.Task.run(Task.java:563) at java.lang.Thread.run(Thread.java:748) 13:34:00.925 [flink-akka.actor.default-dispatcher-21] DEBUG org.apache.flink.runtime.jobmaster.JobMaster - Trigger
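For context, the stop-with-savepoint request described above goes to the JobManager REST endpoint POST /jobs/<job-id>/stop. Below is a minimal sketch of building that request; the job id is copied from the log above, and the savepoint directory is a placeholder, not a verified setup:

```python
import json

def build_stop_with_savepoint_request(job_id, target_directory, drain=False):
    """Build the path and JSON body for Flink's stop-with-savepoint REST call.

    The endpoint is POST /jobs/<job-id>/stop; "targetDirectory" is where the
    savepoint is written, and drain=True additionally emits MAX_WATERMARK
    before stopping the sources.
    """
    path = f"/jobs/{job_id}/stop"
    body = json.dumps({"targetDirectory": target_directory, "drain": drain})
    return path, body

# Job id taken from the log above; the savepoint directory is a placeholder.
path, body = build_stop_with_savepoint_request(
    "d6fed247feab1c0bcc1b0dcc2cfb4736", "hdfs:///flink/savepoints")
```

The returned path and body would then be sent to the JobManager with any HTTP client; the response contains a trigger id that can be polled for the savepoint status.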
[jira] [Comment Edited] (FLINK-27539) support consuming update and delete changes In Windowing TVFs
[ https://issues.apache.org/jira/browse/FLINK-27539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17535961#comment-17535961 ] hjw edited comment on FLINK-27539 at 5/12/22 8:33 AM: -- [~martijnvisser] Yes, I am interested in the total amount of money for a day. In fact, I prefer the CUMULATE window to the TUMBLE window, because I need the CUMULATE window result at every step size. But the CUMULATE window is not supported in Group Window Aggregation. By the way, my source table is a CDC table, so updates and deletes should be handled correctly in the window. Thanks. was (Author: JIRAUSER280998): [~martijnvisser]yes, I am interested in the total amount of money for a day. In fact, I perfer Calculate Window to TUMBLE Window, because I want need the CUMULATE window result every time step size . But Calculate Window is not supported in Group Window Aggregation. By the way, My source table is a cdc table , so update and delete should be handle correct in Window. thx. 
> support consuming update and delete changes In Windowing TVFs > - > > Key: FLINK-27539 > URL: https://issues.apache.org/jira/browse/FLINK-27539 > Project: Flink > Issue Type: Improvement > Components: Table SQL / API >Affects Versions: 1.15.0 >Reporter: hjw >Priority: Major > > custom_kafka is a cdc table > sql: > {code:java} > select DATE_FORMAT(window_end,'-MM-dd') as date_str,sum(money) as > total,name > from TABLE(CUMULATE(TABLE custom_kafka,descriptor(createtime),interval '1' > MINUTES,interval '1' DAY )) > where status='1' > group by name,window_start,window_end; > {code} > Error > {code:java} > Exception in thread "main" org.apache.flink.table.api.TableException: > StreamPhysicalWindowAggregate doesn't support consuming update and delete > changes which is produced by node TableSourceScan(table=[[default_catalog, > default_database, custom_kafka, watermark=[-(createtime, 5000:INTERVAL > SECOND)]]], fields=[name, money, status, createtime, operation_ts]) > at > org.apache.flink.table.planner.plan.optimize.program.FlinkChangelogModeInferenceProgram$SatisfyModifyKindSetTraitVisitor.createNewNode(FlinkChangelogModeInferenceProgram.scala:396) > at > org.apache.flink.table.planner.plan.optimize.program.FlinkChangelogModeInferenceProgram$SatisfyModifyKindSetTraitVisitor.visit(FlinkChangelogModeInferenceProgram.scala:315) > at > org.apache.flink.table.planner.plan.optimize.program.FlinkChangelogModeInferenceProgram$SatisfyModifyKindSetTraitVisitor.visitChild(FlinkChangelogModeInferenceProgram.scala:353) > at > org.apache.flink.table.planner.plan.optimize.program.FlinkChangelogModeInferenceProgram$SatisfyModifyKindSetTraitVisitor.$anonfun$visitChildren$1(FlinkChangelogModeInferenceProgram.scala:342) > at > org.apache.flink.table.planner.plan.optimize.program.FlinkChangelogModeInferenceProgram$SatisfyModifyKindSetTraitVisitor.$anonfun$visitChildren$1$adapted(FlinkChangelogModeInferenceProgram.scala:341) > at 
scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:233) > at scala.collection.immutable.Range.foreach(Range.scala:155) > at scala.collection.TraversableLike.map(TraversableLike.scala:233) > at scala.collection.TraversableLike.map$(TraversableLike.scala:226) > at scala.collection.AbstractTraversable.map(Traversable.scala:104) > at > org.apache.flink.table.planner.plan.optimize.program.FlinkChangelogModeInferenceProgram$SatisfyModifyKindSetTraitVisitor.visitChildren(FlinkChangelogModeInferenceProgram.scala:341) > {code} > But I found Group Window Aggregation is works when use cdc table > {code:java} > select DATE_FORMAT(TUMBLE_END(createtime,interval '10' MINUTES),'-MM-dd') > as date_str,sum(money) as total,name > from custom_kafka > where status='1' > group by name,TUMBLE(createtime,interval '10' MINUTES) > {code} -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Comment Edited] (FLINK-27539) support consuming update and delete changes In Windowing TVFs
[ https://issues.apache.org/jira/browse/FLINK-27539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17535961#comment-17535961 ] hjw edited comment on FLINK-27539 at 5/12/22 8:21 AM: -- [~martijnvisser] Yes, I am interested in the total amount of money for a day. In fact, I prefer the CUMULATE window to the TUMBLE window, because I need the CUMULATE window result at every step size. But the CUMULATE window is not supported in Group Window Aggregation. By the way, my source table is a CDC table, so updates and deletes should be handled correctly in the window. Thanks. was (Author: JIRAUSER280998): [~martijnvisser]yes, I am interested in the total amount of money for a day. In fact, I perfer Calculate Window to TUMBLE Window, because I want need the CUMULATE window result every time step size . But Calculate Window is not supported in Group Window Aggregation. > support consuming update and delete changes In Windowing TVFs > - > > Key: FLINK-27539 > URL: https://issues.apache.org/jira/browse/FLINK-27539 > Project: Flink > Issue Type: Improvement > Components: Table SQL / API >Affects Versions: 1.15.0 >Reporter: hjw >Priority: Major > > custom_kafka is a cdc table > sql: > {code:java} > select DATE_FORMAT(window_end,'-MM-dd') as date_str,sum(money) as > total,name > from TABLE(CUMULATE(TABLE custom_kafka,descriptor(createtime),interval '1' > MINUTES,interval '1' DAY )) > where status='1' > group by name,window_start,window_end; > {code} > Error > {code:java} > Exception in thread "main" org.apache.flink.table.api.TableException: > StreamPhysicalWindowAggregate doesn't support consuming update and delete > changes which is produced by node TableSourceScan(table=[[default_catalog, > default_database, custom_kafka, watermark=[-(createtime, 5000:INTERVAL > SECOND)]]], fields=[name, money, status, createtime, operation_ts]) > at > 
org.apache.flink.table.planner.plan.optimize.program.FlinkChangelogModeInferenceProgram$SatisfyModifyKindSetTraitVisitor.createNewNode(FlinkChangelogModeInferenceProgram.scala:396) > at > org.apache.flink.table.planner.plan.optimize.program.FlinkChangelogModeInferenceProgram$SatisfyModifyKindSetTraitVisitor.visit(FlinkChangelogModeInferenceProgram.scala:315) > at > org.apache.flink.table.planner.plan.optimize.program.FlinkChangelogModeInferenceProgram$SatisfyModifyKindSetTraitVisitor.visitChild(FlinkChangelogModeInferenceProgram.scala:353) > at > org.apache.flink.table.planner.plan.optimize.program.FlinkChangelogModeInferenceProgram$SatisfyModifyKindSetTraitVisitor.$anonfun$visitChildren$1(FlinkChangelogModeInferenceProgram.scala:342) > at > org.apache.flink.table.planner.plan.optimize.program.FlinkChangelogModeInferenceProgram$SatisfyModifyKindSetTraitVisitor.$anonfun$visitChildren$1$adapted(FlinkChangelogModeInferenceProgram.scala:341) > at scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:233) > at scala.collection.immutable.Range.foreach(Range.scala:155) > at scala.collection.TraversableLike.map(TraversableLike.scala:233) > at scala.collection.TraversableLike.map$(TraversableLike.scala:226) > at scala.collection.AbstractTraversable.map(Traversable.scala:104) > at > org.apache.flink.table.planner.plan.optimize.program.FlinkChangelogModeInferenceProgram$SatisfyModifyKindSetTraitVisitor.visitChildren(FlinkChangelogModeInferenceProgram.scala:341) > {code} > But I found Group Window Aggregation is works when use cdc table > {code:java} > select DATE_FORMAT(TUMBLE_END(createtime,interval '10' MINUTES),'-MM-dd') > as date_str,sum(money) as total,name > from custom_kafka > where status='1' > group by name,TUMBLE(createtime,interval '10' MINUTES) > {code} -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Commented] (FLINK-27539) support consuming update and delete changes In Windowing TVFs
[ https://issues.apache.org/jira/browse/FLINK-27539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17535961#comment-17535961 ] hjw commented on FLINK-27539: - [~martijnvisser] Yes, I am interested in the total amount of money for a day. In fact, I prefer the CUMULATE window to the TUMBLE window, because I need the CUMULATE window result at every step size. But the CUMULATE window is not supported in Group Window Aggregation. > support consuming update and delete changes In Windowing TVFs > - > > Key: FLINK-27539 > URL: https://issues.apache.org/jira/browse/FLINK-27539 > Project: Flink > Issue Type: Improvement > Components: Table SQL / API >Affects Versions: 1.15.0 >Reporter: hjw >Priority: Major > > custom_kafka is a cdc table > sql: > {code:java} > select DATE_FORMAT(window_end,'-MM-dd') as date_str,sum(money) as > total,name > from TABLE(CUMULATE(TABLE custom_kafka,descriptor(createtime),interval '1' > MINUTES,interval '1' DAY )) > where status='1' > group by name,window_start,window_end; > {code} > Error > {code:java} > Exception in thread "main" org.apache.flink.table.api.TableException: > StreamPhysicalWindowAggregate doesn't support consuming update and delete > changes which is produced by node TableSourceScan(table=[[default_catalog, > default_database, custom_kafka, watermark=[-(createtime, 5000:INTERVAL > SECOND)]]], fields=[name, money, status, createtime, operation_ts]) > at > org.apache.flink.table.planner.plan.optimize.program.FlinkChangelogModeInferenceProgram$SatisfyModifyKindSetTraitVisitor.createNewNode(FlinkChangelogModeInferenceProgram.scala:396) > at > org.apache.flink.table.planner.plan.optimize.program.FlinkChangelogModeInferenceProgram$SatisfyModifyKindSetTraitVisitor.visit(FlinkChangelogModeInferenceProgram.scala:315) > at > org.apache.flink.table.planner.plan.optimize.program.FlinkChangelogModeInferenceProgram$SatisfyModifyKindSetTraitVisitor.visitChild(FlinkChangelogModeInferenceProgram.scala:353) > 
at > org.apache.flink.table.planner.plan.optimize.program.FlinkChangelogModeInferenceProgram$SatisfyModifyKindSetTraitVisitor.$anonfun$visitChildren$1(FlinkChangelogModeInferenceProgram.scala:342) > at > org.apache.flink.table.planner.plan.optimize.program.FlinkChangelogModeInferenceProgram$SatisfyModifyKindSetTraitVisitor.$anonfun$visitChildren$1$adapted(FlinkChangelogModeInferenceProgram.scala:341) > at scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:233) > at scala.collection.immutable.Range.foreach(Range.scala:155) > at scala.collection.TraversableLike.map(TraversableLike.scala:233) > at scala.collection.TraversableLike.map$(TraversableLike.scala:226) > at scala.collection.AbstractTraversable.map(Traversable.scala:104) > at > org.apache.flink.table.planner.plan.optimize.program.FlinkChangelogModeInferenceProgram$SatisfyModifyKindSetTraitVisitor.visitChildren(FlinkChangelogModeInferenceProgram.scala:341) > {code} > But I found Group Window Aggregation is works when use cdc table > {code:java} > select DATE_FORMAT(TUMBLE_END(createtime,interval '10' MINUTES),'-MM-dd') > as date_str,sum(money) as total,name > from custom_kafka > where status='1' > group by name,TUMBLE(createtime,interval '10' MINUTES) > {code} -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Commented] (FLINK-27539) support consuming update and delete changes In Windowing TVFs
[ https://issues.apache.org/jira/browse/FLINK-27539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17534193#comment-17534193 ] hjw commented on FLINK-27539: - Thanks for your response [~martijnvisser]. I want to count the total amount of money in a day, but the money of a record could be changed. Example:
name money createtime
a 100 2022-05-10 15:23:09
The window result: name=a, total=100.
After the money of that record changes from 100 to 200 (same createtime), the window result should become: name=a, total=200.
> support consuming update and delete changes In Windowing TVFs > - > > Key: FLINK-27539 > URL: https://issues.apache.org/jira/browse/FLINK-27539 > Project: Flink > Issue Type: Improvement > Components: Table SQL / API >Affects Versions: 1.15.0 >Reporter: hjw >Priority: Major > > custom_kafka is a cdc table > sql: > {code:java} > select DATE_FORMAT(window_end,'-MM-dd') as date_str,sum(money) as > total,name > from TABLE(CUMULATE(TABLE custom_kafka,descriptor(createtime),interval '1' > MINUTES,interval '1' DAY )) > where status='1' > group by name,window_start,window_end; > {code} > Error > {code:java} > Exception in thread "main" org.apache.flink.table.api.TableException: > StreamPhysicalWindowAggregate doesn't support consuming update and delete > changes which is produced by node TableSourceScan(table=[[default_catalog, > default_database, custom_kafka, watermark=[-(createtime, 5000:INTERVAL > SECOND)]]], fields=[name, money, status, createtime, operation_ts]) > at > org.apache.flink.table.planner.plan.optimize.program.FlinkChangelogModeInferenceProgram$SatisfyModifyKindSetTraitVisitor.createNewNode(FlinkChangelogModeInferenceProgram.scala:396) > at > org.apache.flink.table.planner.plan.optimize.program.FlinkChangelogModeInferenceProgram$SatisfyModifyKindSetTraitVisitor.visit(FlinkChangelogModeInferenceProgram.scala:315) > at > 
org.apache.flink.table.planner.plan.optimize.program.FlinkChangelogModeInferenceProgram$SatisfyModifyKindSetTraitVisitor.visitChild(FlinkChangelogModeInferenceProgram.scala:353) > at > org.apache.flink.table.planner.plan.optimize.program.FlinkChangelogModeInferenceProgram$SatisfyModifyKindSetTraitVisitor.$anonfun$visitChildren$1(FlinkChangelogModeInferenceProgram.scala:342) > at > org.apache.flink.table.planner.plan.optimize.program.FlinkChangelogModeInferenceProgram$SatisfyModifyKindSetTraitVisitor.$anonfun$visitChildren$1$adapted(FlinkChangelogModeInferenceProgram.scala:341) > at scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:233) > at scala.collection.immutable.Range.foreach(Range.scala:155) > at scala.collection.TraversableLike.map(TraversableLike.scala:233) > at scala.collection.TraversableLike.map$(TraversableLike.scala:226) > at scala.collection.AbstractTraversable.map(Traversable.scala:104) > at > org.apache.flink.table.planner.plan.optimize.program.FlinkChangelogModeInferenceProgram$SatisfyModifyKindSetTraitVisitor.visitChildren(FlinkChangelogModeInferenceProgram.scala:341) > {code} > But I found Group Window Aggregation is works when use cdc table > {code:java} > select DATE_FORMAT(TUMBLE_END(createtime,interval '10' MINUTES),'-MM-dd') > as date_str,sum(money) as total,name > from custom_kafka > where status='1' > group by name,TUMBLE(createtime,interval '10' MINUTES) > {code} -- This message was sent by Atlassian Jira (v8.20.7#820007)
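The update semantics the reporter describes can be sketched with a tiny changelog simulation. This is a hypothetical model, not Flink internals: in a changelog stream an update arrives as a retraction of the old row (-U) followed by the new row (+U), so an aggregate that consumes update changes moves the daily total from 100 to 200:

```python
# Minimal model of a changelog-consuming SUM aggregate (hypothetical sketch,
# not Flink code). Each event is (kind, key, amount), where kind is
# "+I" insert, "-U" retract the old value, "+U" add the updated value,
# and "-D" delete.
def aggregate(events):
    totals = {}
    for kind, key, amount in events:
        if kind in ("+I", "+U"):
            totals[key] = totals.get(key, 0) + amount
        elif kind in ("-U", "-D"):
            totals[key] = totals.get(key, 0) - amount
    return totals

# Record 'a' is inserted with money=100, then updated to 200:
events = [("+I", "a", 100), ("-U", "a", 100), ("+U", "a", 200)]
result = aggregate(events)  # {'a': 200}
```

This is exactly the behavior Group Window Aggregation provides for CDC input, and what the reporter is asking the windowing TVFs to support as well.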
[jira] [Updated] (FLINK-27539) support consuming update and delete changes In Windowing TVFs
[ https://issues.apache.org/jira/browse/FLINK-27539?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] hjw updated FLINK-27539: Description: custom_kafka is a cdc table sql: {code:java} select DATE_FORMAT(window_end,'-MM-dd') as date_str,sum(money) as total,name from TABLE(CUMULATE(TABLE custom_kafka,descriptor(createtime),interval '1' MINUTES,interval '1' DAY )) where status='1' group by name,window_start,window_end; {code} Error {code:java} Exception in thread "main" org.apache.flink.table.api.TableException: StreamPhysicalWindowAggregate doesn't support consuming update and delete changes which is produced by node TableSourceScan(table=[[default_catalog, default_database, custom_kafka, watermark=[-(createtime, 5000:INTERVAL SECOND)]]], fields=[name, money, status, createtime, operation_ts]) at org.apache.flink.table.planner.plan.optimize.program.FlinkChangelogModeInferenceProgram$SatisfyModifyKindSetTraitVisitor.createNewNode(FlinkChangelogModeInferenceProgram.scala:396) at org.apache.flink.table.planner.plan.optimize.program.FlinkChangelogModeInferenceProgram$SatisfyModifyKindSetTraitVisitor.visit(FlinkChangelogModeInferenceProgram.scala:315) at org.apache.flink.table.planner.plan.optimize.program.FlinkChangelogModeInferenceProgram$SatisfyModifyKindSetTraitVisitor.visitChild(FlinkChangelogModeInferenceProgram.scala:353) at org.apache.flink.table.planner.plan.optimize.program.FlinkChangelogModeInferenceProgram$SatisfyModifyKindSetTraitVisitor.$anonfun$visitChildren$1(FlinkChangelogModeInferenceProgram.scala:342) at org.apache.flink.table.planner.plan.optimize.program.FlinkChangelogModeInferenceProgram$SatisfyModifyKindSetTraitVisitor.$anonfun$visitChildren$1$adapted(FlinkChangelogModeInferenceProgram.scala:341) at scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:233) at scala.collection.immutable.Range.foreach(Range.scala:155) at scala.collection.TraversableLike.map(TraversableLike.scala:233) at 
scala.collection.TraversableLike.map$(TraversableLike.scala:226) at scala.collection.AbstractTraversable.map(Traversable.scala:104) at org.apache.flink.table.planner.plan.optimize.program.FlinkChangelogModeInferenceProgram$SatisfyModifyKindSetTraitVisitor.visitChildren(FlinkChangelogModeInferenceProgram.scala:341) {code} But I found that Group Window Aggregation works when using a CDC table {code:java} select DATE_FORMAT(TUMBLE_END(createtime,interval '10' MINUTES),'-MM-dd') as date_str,sum(money) as total,name from custom_kafka where status='1' group by name,TUMBLE(createtime,interval '10' MINUTES) {code} was: custom_kafka is a cdc table sql: {code:java} select DATE_FORMAT(window_end,'-MM-dd') as date_str,sum(money) as total,name from TABLE(CUMULATE(TABLE custom_kafka,descriptor(createtime),interval '1' MINUTES,interval '1' DAY )) where status='1' group by name,window_start,window_end; {code} Error {code:java} org.apache.flink.table.api.TableException: StreamPhysicalWindowAggregate doesn't support consuming update and delete changes which is produced by node TableSourceScan(table=[[default_catalog, default_database,custom_kafka]], fields=[name, money, status,createtime,operation_ts]) {code} But I found Group Window Aggregation is works when use cdc table {code:java} select DATE_FORMAT(TUMBLE_END(createtime,interval '10' MINUTES),'-MM-dd') as date_str,sum(money) as total,name from custom_kafka where status='1' group by name,TUMBLE(createtime,interval '10' MINUTES) {code} > support consuming update and delete changes In Windowing TVFs > - > > Key: FLINK-27539 > URL: https://issues.apache.org/jira/browse/FLINK-27539 > Project: Flink > Issue Type: Improvement > Components: Table SQL / API >Affects Versions: 1.15.0 >Reporter: hjw >Priority: Major > > custom_kafka is a cdc table > sql: > {code:java} > select DATE_FORMAT(window_end,'-MM-dd') as date_str,sum(money) as > total,name > from TABLE(CUMULATE(TABLE custom_kafka,descriptor(createtime),interval '1' > MINUTES,interval 
'1' DAY )) > where status='1' > group by name,window_start,window_end; > {code} > Error > {code:java} > Exception in thread "main" org.apache.flink.table.api.TableException: > StreamPhysicalWindowAggregate doesn't support consuming update and delete > changes which is produced by node TableSourceScan(table=[[default_catalog, > default_database, custom_kafka, watermark=[-(createtime, 5000:INTERVAL > SECOND)]]], fields=[name, money, status, createtime, operation_ts]) > at > org.apache.flink.table.planner.plan.optimize.program.FlinkChangelogModeInferenceProgram$SatisfyModifyKindSetTraitVisitor.createNewNode(FlinkChangelogModeInferenceProgram.scala:396) > at > org.apache.flink.table.planner.plan.optimize.program.FlinkChangelogModeInferenceProgram$SatisfyModifyK
[jira] [Updated] (FLINK-27539) support consuming update and delete changes In Windowing TVFs
[ https://issues.apache.org/jira/browse/FLINK-27539?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] hjw updated FLINK-27539: Description: custom_kafka is a cdc table sql: {code:java} select DATE_FORMAT(window_end,'-MM-dd') as date_str,sum(money) as total,name from TABLE(CUMULATE(TABLE custom_kafka,descriptor(createtime),interval '1' MINUTES,interval '1' DAY )) where status='1' group by name,window_start,window_end; {code} Error {code:java} org.apache.flink.table.api.TableException: StreamPhysicalWindowAggregate doesn't support consuming update and delete changes which is produced by node TableSourceScan(table=[[default_catalog, default_database,custom_kafka]], fields=[name, money, status,createtime,operation_ts]) {code} But I found that Group Window Aggregation works when using a CDC table {code:java} select DATE_FORMAT(TUMBLE_END(createtime,interval '10' MINUTES),'-MM-dd') as date_str,sum(money) as total,name from custom_kafka where status='1' group by name,TUMBLE(createtime,interval '10' MINUTES) {code} was: custom_kafka is a cdc table sql: {code:java} select DATE_FORMAT(window_end,'-MM-dd') as date_str,sum(money) as total,name from TABLE(CUMULATE(TABLE custom_kafka,descriptor(createtime),interval '1' MINUTES,interval '1' DAY )) where status='1' group by name,window_start,window_end; {code} Error org.apache.flink.table.api.TableException: StreamPhysicalWindowAggregate doesn't support consuming update and delete changes which is produced by node TableSourceScan(table=[[default_catalog, default_database,custom_kafka]], fields=[name, money, status,createtime,operation_ts]) But I found Group Window Aggregation is works when use cdc table select DATE_FORMAT(TUMBLE_END(createtime,interval '10' MINUTES),'-MM-dd') as date_str,sum(money) as total,name from custom_kafka where status='1' group by name,TUMBLE(createtime,interval '10' MINUTES) > support consuming update and delete changes In Windowing TVFs > - > > Key: FLINK-27539 > URL: 
https://issues.apache.org/jira/browse/FLINK-27539 > Project: Flink > Issue Type: Improvement > Components: Table SQL / API >Affects Versions: 1.15.0 >Reporter: hjw >Priority: Major > > custom_kafka is a cdc table > sql: > {code:java} > select DATE_FORMAT(window_end,'-MM-dd') as date_str,sum(money) as > total,name > from TABLE(CUMULATE(TABLE custom_kafka,descriptor(createtime),interval '1' > MINUTES,interval '1' DAY )) > where status='1' > group by name,window_start,window_end; > {code} > Error > {code:java} > org.apache.flink.table.api.TableException: StreamPhysicalWindowAggregate > doesn't support consuming update and delete changes which is produced by node > TableSourceScan(table=[[default_catalog, default_database,custom_kafka]], > fields=[name, money, status,createtime,operation_ts]) > {code} > But I found Group Window Aggregation is works when use cdc table > {code:java} > select DATE_FORMAT(TUMBLE_END(createtime,interval '10' MINUTES),'-MM-dd') > as date_str,sum(money) as total,name > from custom_kafka > where status='1' > group by name,TUMBLE(createtime,interval '10' MINUTES) > {code} -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Created] (FLINK-27539) support consuming update and delete changes In Windowing TVFs
hjw created FLINK-27539: --- Summary: support consuming update and delete changes In Windowing TVFs Key: FLINK-27539 URL: https://issues.apache.org/jira/browse/FLINK-27539 Project: Flink Issue Type: Improvement Components: Table SQL / API Affects Versions: 1.15.0 Reporter: hjw custom_kafka is a cdc table sql: {code:java} select DATE_FORMAT(window_end,'-MM-dd') as date_str,sum(money) as total,name from TABLE(CUMULATE(TABLE custom_kafka,descriptor(createtime),interval '1' MINUTES,interval '1' DAY )) where status='1' group by name,window_start,window_end; {code} Error: {code:java} org.apache.flink.table.api.TableException: StreamPhysicalWindowAggregate doesn't support consuming update and delete changes which is produced by node TableSourceScan(table=[[default_catalog, default_database,custom_kafka]], fields=[name, money, status,createtime,operation_ts]) {code} But I found that Group Window Aggregation works when using a CDC table: {code:java} select DATE_FORMAT(TUMBLE_END(createtime,interval '10' MINUTES),'-MM-dd') as date_str,sum(money) as total,name from custom_kafka where status='1' group by name,TUMBLE(createtime,interval '10' MINUTES) {code} -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Commented] (FLINK-27245) Flink job on Yarn cannot recover when ZooKeeper is in an exception state
[ https://issues.apache.org/jira/browse/FLINK-27245?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17523336#comment-17523336 ] hjw commented on FLINK-27245: - Hi [~trohrmann], can you help me look at this problem? Thank you. > Flink job on Yarn cannot recover when ZooKeeper is in an exception state > > > Key: FLINK-27245 > URL: https://issues.apache.org/jira/browse/FLINK-27245 > Project: Flink > Issue Type: Bug > Components: Runtime / Coordination >Affects Versions: 1.7.2 > Environment: Flink: 1.7.2 > Hdfs: 3.1.1 > zookeeper: 3.5.1 > HA defined in flink-conf.yaml: > flink.security.enable: true > fs.output.always-create-directory: false > fs.overwrite-files: false > high-availability.job.delay: 10 s > high-availability.storageDir: hdfs:///flink/recovery > high-availability.zookeeper.client.acl: creator > high-availability.zookeeper.client.connection-timeout: 15000 > high-availability.zookeeper.client.max-retry-attempts: 3 > high-availability.zookeeper.client.retry-wait: 5000 > high-availability.zookeeper.client.session-timeout: 6 > high-availability.zookeeper.path.root: /flink > high-availability.zookeeper.quorum: zk01:24002,zk02:24002,zk03:24002 > high-availability: zookeeper >Reporter: hjw >Priority: Major > Attachments: Job-failed.txt, Job-recover-failed.txt, > zookeeper-omm-server-a-dsj-ghficn01.2022-04-07_20-09-25.[1].log > > > The Flink job cannot recover when ZooKeeper is in an exception state. > I noticed that the data in high-availability.storageDir is deleted when the job > fails, resulting in failure when it is pulled up again. > PS: The job restart is done automatically by YARN, not manually. > {code:java} > (SmarterLeaderLatch.java:570) > 2022-04-07 19:54:29,002 | INFO | [Suspend state waiting handler] | > Connection to Zookeeper is SUSPENDED. Wait it to be back. Already waited 10 > seconds. 
| org.apache.flink.runtime.leaderelection.SmarterLeaderLatch > (SmarterLeaderLatch.java:570) > 2022-04-07 19:54:29,004 | INFO | [Suspend state waiting handler] | > Connection to Zookeeper is SUSPENDED. Wait it to be back. Already waited 10 > seconds. | org.apache.flink.runtime.leaderelection.SmarterLeaderLatch > (SmarterLeaderLatch.java:570) > 2022-04-07 19:54:29,004 | INFO | [Suspend state waiting handler] | > Connection to Zookeeper is SUSPENDED. Wait it to be back. Already waited 10 > seconds. | org.apache.flink.runtime.leaderelection.SmarterLeaderLatch > (SmarterLeaderLatch.java:570) > 2022-04-07 19:54:30,002 | INFO | [Suspend state waiting handler] | > Connection to Zookeeper is SUSPENDED. Wait it to be back. Already waited 11 > seconds. | org.apache.flink.runtime.leaderelection.SmarterLeaderLatch > (SmarterLeaderLatch.java:570) > 2022-04-07 19:54:30,002 | INFO | [Suspend state waiting handler] | > Connection to Zookeeper is SUSPENDED. Wait it to be back. Already waited 11 > seconds. | org.apache.flink.runtime.leaderelection.SmarterLeaderLatch > (SmarterLeaderLatch.java:570) > 2022-04-07 19:54:30,004 | INFO | [Suspend state waiting handler] | > Connection to Zookeeper is SUSPENDED. Wait it to be back. Already waited 11 > seconds. | org.apache.flink.runtime.leaderelection.SmarterLeaderLatch > (SmarterLeaderLatch.java:570) > 2022-04-07 19:54:30,004 | INFO | [Suspend state waiting handler] | > Connection to Zookeeper is SUSPENDED. Wait it to be back. Already waited 11 > seconds. | org.apache.flink.runtime.leaderelection.SmarterLeaderLatch > (SmarterLeaderLatch.java:570) > 2022-04-07 19:54:30,769 | INFO | [BlobServer shutdown hook] | > FileSystemBlobStore cleaning up > hdfs:/flink/recovery/application_1625720467511_45233. | > org.apache.flink.runtime.blob.FileSystemBlobStor > {code} > {code:java} > 2022-04-07 19:55:29,452 | INFO | [flink-akka.actor.default-dispatcher-4] | > Recovered SubmittedJobGraph(1898637f2d11429bd5f5767ea1daaf79, null). 
| > org.apache.flink.runtime.jobmanager.ZooKeeperSubmittedJobGraphStore > (ZooKeeperSubmittedJobGraphStore.java:215) > 2022-04-07 19:55:29,467 | ERROR | [flink-akka.actor.default-dispatcher-17] | > Fatal error occurred in the cluster entrypoint. | > org.apache.flink.runtime.entrypoint.ClusterEntrypoint > (ClusterEntrypoint.java:408) > java.lang.RuntimeException: > org.apache.flink.runtime.client.JobExecutionException: Could not set up > JobManager > at > org.apache.flink.util.function.CheckedSupplier.lambda$unchecked$0(CheckedSupplier.java:36) > at > java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1590) > at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:39) > at > akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:415) > at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260) >
[jira] [Updated] (FLINK-27245) Flink job on Yarn cannot recover when ZooKeeper is in an exceptional state
[ https://issues.apache.org/jira/browse/FLINK-27245?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] hjw updated FLINK-27245:
Description:
The Flink job cannot recover when ZooKeeper is in an exceptional state.
I noticed that the data in high-availability.storageDir was deleted when the job failed, resulting in failure when the job is pulled up again.
Ps: the job restart is done automatically by YARN, not manually.
{code:java}
2022-04-07 19:54:29,002 | INFO | [Suspend state waiting handler] | Connection to Zookeeper is SUSPENDED. Wait it to be back. Already waited 10 seconds. | org.apache.flink.runtime.leaderelection.SmarterLeaderLatch (SmarterLeaderLatch.java:570)
2022-04-07 19:54:29,004 | INFO | [Suspend state waiting handler] | Connection to Zookeeper is SUSPENDED. Wait it to be back. Already waited 10 seconds. | org.apache.flink.runtime.leaderelection.SmarterLeaderLatch (SmarterLeaderLatch.java:570)
2022-04-07 19:54:29,004 | INFO | [Suspend state waiting handler] | Connection to Zookeeper is SUSPENDED. Wait it to be back. Already waited 10 seconds. | org.apache.flink.runtime.leaderelection.SmarterLeaderLatch (SmarterLeaderLatch.java:570)
2022-04-07 19:54:30,002 | INFO | [Suspend state waiting handler] | Connection to Zookeeper is SUSPENDED. Wait it to be back. Already waited 11 seconds. | org.apache.flink.runtime.leaderelection.SmarterLeaderLatch (SmarterLeaderLatch.java:570)
2022-04-07 19:54:30,002 | INFO | [Suspend state waiting handler] | Connection to Zookeeper is SUSPENDED. Wait it to be back. Already waited 11 seconds. | org.apache.flink.runtime.leaderelection.SmarterLeaderLatch (SmarterLeaderLatch.java:570)
2022-04-07 19:54:30,004 | INFO | [Suspend state waiting handler] | Connection to Zookeeper is SUSPENDED. Wait it to be back. Already waited 11 seconds. | org.apache.flink.runtime.leaderelection.SmarterLeaderLatch (SmarterLeaderLatch.java:570)
2022-04-07 19:54:30,004 | INFO | [Suspend state waiting handler] | Connection to Zookeeper is SUSPENDED. Wait it to be back. Already waited 11 seconds. | org.apache.flink.runtime.leaderelection.SmarterLeaderLatch (SmarterLeaderLatch.java:570)
2022-04-07 19:54:30,769 | INFO | [BlobServer shutdown hook] | FileSystemBlobStore cleaning up hdfs:/flink/recovery/application_1625720467511_45233. | org.apache.flink.runtime.blob.FileSystemBlobStor
{code}
{code:java}
2022-04-07 19:55:29,452 | INFO | [flink-akka.actor.default-dispatcher-4] | Recovered SubmittedJobGraph(1898637f2d11429bd5f5767ea1daaf79, null). | org.apache.flink.runtime.jobmanager.ZooKeeperSubmittedJobGraphStore (ZooKeeperSubmittedJobGraphStore.java:215)
2022-04-07 19:55:29,467 | ERROR | [flink-akka.actor.default-dispatcher-17] | Fatal error occurred in the cluster entrypoint. | org.apache.flink.runtime.entrypoint.ClusterEntrypoint (ClusterEntrypoint.java:408)
java.lang.RuntimeException: org.apache.flink.runtime.client.JobExecutionException: Could not set up JobManager
at org.apache.flink.util.function.CheckedSupplier.lambda$unchecked$0(CheckedSupplier.java:36)
at java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1590)
at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:39)
at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:415)
at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
Caused by: org.apache.flink.runtime.client.JobExecutionException: Could not set up JobManager
at org.apache.flink.runtime.jobmaster.JobManagerRunner.<init>(JobManagerRunner.java:176)
at org.apache.flink.runtime.dispatcher.Dispatcher$DefaultJobManagerRunnerFactory.createJobManagerRunner(Dispatcher.java:1058)
at org.apache.flink.runtime.dispatcher.Dispatcher.lambda$createJobManagerRunner$5(Dispatcher.java:308)
at org.apache.flink.util.function.CheckedSupplier.lambda$unchecked$0(CheckedSupplier.java:34)
... 7 common frames omitted
Caused by: java.lang.Exception: Cannot set up the user code libraries: File does not exist: /flink/recovery/application_1625720467511_45233/blob/job_1898637f2d11429bd5f5767ea1daaf79/blob_p-7128d0ae4a06a277e3b1182c99eb616ffd45b590-c90586d4a5d4641fcc0c9e4cab31c131
at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:86)
at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:76)
at org.apache.hadoop.hdfs.server.namenode.FSDirStatAndListingOp.getBlockLocations(FSDirStatAndListingOp.java:153)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesyste
{code}
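The root cause in the trace above is an ordering problem: the BlobServer shutdown hook cleaned up the HA storage directory on the failed run, so the next recovery attempt references blob files that no longer exist ("Cannot set up the user code libraries: File does not exist"). As a generic, hypothetical illustration only (plain local filesystem instead of HDFS; the helper name and directory layout are assumptions, not Flink API), a recovery path would first verify the job's blobs still exist:

```python
from pathlib import Path

def can_recover(ha_storage_dir: str, app_id: str, job_id: str) -> bool:
    """Hypothetical pre-check mirroring the failure above: recovery is
    only possible while the job's blob directory under the HA storage
    dir still contains the uploaded user-code blobs."""
    blob_dir = Path(ha_storage_dir) / app_id / "blob" / f"job_{job_id}"
    # Directory must exist AND contain at least one blob file.
    return blob_dir.is_dir() and any(blob_dir.iterdir())
```

In the reported failure the shutdown hook ran the FileSystemBlobStore cleanup while the job graph was still registered in ZooKeeper, so a check like this would return False and the JobManager setup fails until the blobs are re-uploaded.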
[jira] [Commented] (FLINK-27245) Flink job on Yarn cannot recover when ZooKeeper is in an exceptional state
[ https://issues.apache.org/jira/browse/FLINK-27245?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17522191#comment-17522191 ] hjw commented on FLINK-27245:
-
Hi [~Zhanghao Chen], thank you for your response. I notice that this parameter was added in [FLINK-10052|https://issues.apache.org/jira/browse/FLINK-10052]. I think it was introduced so that the job does not restart while the Curator state is SUSPENDED. What confuses me is that the job exited abnormally and the data under high-availability.storageDir/cluster-id was deleted, which made the restart fail.
[jira] [Reopened] (FLINK-27245) Flink job on Yarn cannot recover when ZooKeeper is in an exceptional state
[ https://issues.apache.org/jira/browse/FLINK-27245?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] hjw reopened FLINK-27245:
-
In which version of Flink was this problem fixed? Thanks.
[jira] [Commented] (FLINK-27245) Flink job on Yarn cannot recover when ZooKeeper is in an exceptional state
[ https://issues.apache.org/jira/browse/FLINK-27245?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17522139#comment-17522139 ] hjw commented on FLINK-27245:
-
Hi [~mapohl], thank you for your response. Unfortunately, I can't upgrade for the time being. In which version of Flink was this problem fixed? The only related issue I found is https://issues.apache.org/jira/browse/FLINK-10255, but that issue was already fixed in 1.7.
[jira] [Updated] (FLINK-27245) Flink job on Yarn cannot recover when ZooKeeper is in an exceptional state
[ https://issues.apache.org/jira/browse/FLINK-27245?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] hjw updated FLINK-27245:
Description updated.
[jira] [Updated] (FLINK-27245) Flink job on Yarn cannot recover when ZooKeeper is in an exceptional state
[ https://issues.apache.org/jira/browse/FLINK-27245?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] hjw updated FLINK-27245:
Attachment: Job-recover-failed.txt
Job-failed.txt
zookeeper-omm-server-a-dsj-ghficn01.2022-04-07_20-09-25.[1].log
[jira] [Created] (FLINK-27245) Flink job on Yarn cannot recover when ZooKeeper is in an exceptional state
hjw created FLINK-27245:
---
Summary: Flink job on Yarn cannot recover when ZooKeeper is in an exceptional state
Key: FLINK-27245
URL: https://issues.apache.org/jira/browse/FLINK-27245
Project: Flink
Issue Type: Bug
Components: Runtime / Coordination
Affects Versions: 1.17.0
Environment: Flink: 1.17.2
Hdfs: 3.1.1
zookeeper: 3.5.1
HA defined in flink-conf.yaml:
flink.security.enable: true
fs.output.always-create-directory: false
fs.overwrite-files: false
high-availability.job.delay: 10 s
high-availability.storageDir: hdfs:///flink/recovery
high-availability.zookeeper.client.acl: creator
high-availability.zookeeper.client.connection-timeout: 15000
high-availability.zookeeper.client.max-retry-attempts: 3
high-availability.zookeeper.client.retry-wait: 5000
high-availability.zookeeper.client.session-timeout: 6
high-availability.zookeeper.path.root: /flink
high-availability.zookeeper.quorum: zk01:24002,zk02:24002,zk03:24002
high-availability: zookeeper
Reporter: hjw
The Flink job cannot recover when ZooKeeper is in an exceptional state. I noticed that the data in high-availability.storageDir was deleted when the job failed, resulting in failure when the job is pulled up again.
[jira] [Closed] (FLINK-26452) Flink deploy on k8s https SSLPeerUnverifiedException
[ https://issues.apache.org/jira/browse/FLINK-26452?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

hjw closed FLINK-26452.
---
Release Note: Added the line insecure-skip-tls-verify: true to ~/.kube/config.
Resolution: Fixed

> Flink deploy on k8s https SSLPeerUnverifiedException
> ----------------------------------------------------
>
> Key: FLINK-26452
> URL: https://issues.apache.org/jira/browse/FLINK-26452
> Project: Flink
> Issue Type: Bug
> Components: Deployment / Kubernetes
> Affects Versions: 1.13.6
> Reporter: hjw
> Priority: Major
>
> ~/.kube/config:
> apiVersion: v1
> kind: Config
> clusters:
> - name: "yf-dev-cluster1"
>   cluster:
>     server: "https://in-acpmanager.test.yfzx.cn/k8s/clusters/c-t5h2t"
>     certificate-authority-data: "……"
> {code:java}
> 2022-03-02 18:59:30 | OkHttp https://in-acpmanager.test.yfzx.cn/... io.fabric8.kubernetes.client.dsl.internal.WatcherWebSocketListener
> Exec Failure javax.net.ssl.SSLPeerUnverifiedException Hostname in-acpmanager.test.yfzx.cn not verified:
>     certificate: sha256/cw2T2s+Swhl7z+H35/3C1dTLxL26OOMO5VoEN9kAZCA=
>     DN: CN=in-acpmanager.test.yfzx.cn
>     subjectAltNames: []
> io.fabric8.kubernetes.client.KubernetesClientException: Failed to start websocket
>     at io.fabric8.kubernetes.client.dsl.internal.WatcherWebSocketListener.onFailure(WatcherWebSocketListener.java:77)
>     at org.apache.flink.kubernetes.shaded.okhttp3.internal.ws.RealWebSocket.failWebSocket(RealWebSocket.java:570)
>     at org.apache.flink.kubernetes.shaded.okhttp3.internal.ws.RealWebSocket$1.onFailure(RealWebSocket.java:216)
>     at org.apache.flink.kubernetes.shaded.okhttp3.RealCall$AsyncCall.execute(RealCall.java:180)
>     at org.apache.flink.kubernetes.shaded.okhttp3.internal.NamedRunnable.run(NamedRunnable.java:32)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>     at java.lang.Thread.run(Thread.java:748)
>     Suppressed: java.lang.Throwable: waiting here
>         at io.fabric8.kubernetes.client.utils.Utils.waitUntilReady(Utils.java:164)
>         at io.fabric8.kubernetes.client.utils.Utils.waitUntilReadyOrFail(Utils.java:175)
>         at io.fabric8.kubernetes.client.dsl.internal.WatcherWebSocketListener.waitUntilReady(WatcherWebSocketListener.java:120)
>         at io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager.waitUntilReady(WatchConnectionManager.java:82)
>         at io.fabric8.kubernetes.client.dsl.base.BaseOperation.watch(BaseOperation.java:705)
>         at io.fabric8.kubernetes.client.dsl.base.BaseOperation.watch(BaseOperation.java:678)
>         at io.fabric8.kubernetes.client.dsl.base.BaseOperation.watch(BaseOperation.java:
> {code}
> {code:java}
> Caused by: javax.net.ssl.SSLPeerUnverifiedException: Hostname in-acpmanager.test.yfzx.cn not verified:
>     certificate: sha256/cw2T2s+Swhl7z+H35/3C1dTLxL26OOMO5VoEN9kAZCA=
>     DN: CN=in-acpmanager.test.yfzx.cn
>     subjectAltNames: []
>     at org.apache.flink.kubernetes.shaded.okhttp3.internal.connection.RealConnection.connectTls(RealConnection.java:350)
>     at org.apache.flink.kubernetes.shaded.okhttp3.internal.connection.RealConnection.establishProtocol(RealConnection.java:300)
>     at org.apache.flink.kubernetes.shaded.okhttp3.internal.connection.RealConnection.connect(RealConnection.java:185)
>     at org.apache.flink.kubernetes.shaded.okhttp3.internal.connection.ExchangeFinder.findConnection(ExchangeFinder.java:224)
>     at org.apache.flink.kubernetes.shaded.okhttp3.internal.connection.ExchangeFinder.findHealthyConnection(ExchangeFinder.java:108)
>     at org.apache.flink.kubernetes.shaded.okhttp3.internal.connection.ExchangeFinder.find(ExchangeFinder.java:88)
>     at org.apache.flink.kubernetes.shaded.okhttp3.internal.connection.Transmitter.newExchange(Transmitter.java:169)
>     at org.apache.flink.kubernetes.shaded.okhttp3.internal.connection.ConnectInterceptor.intercept(ConnectInterceptor.java:41)
>     at org.apache.flink.kubernetes.shaded.okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:142)
>     at org.apache.flink.kubernetes.shaded.okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:117)
>     at org.apache.flink.kubernetes.shaded.okhttp3.internal.cache.CacheInterceptor.intercept(CacheInterceptor.java:94)
>     at org.apache.flink.kubernetes.shaded.okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:142)
>     at org.apache.flink.kubernetes.shaded.okhttp3.internal.http.RealIntercep
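The release note above is terse; a minimal sketch of the workaround follows, assuming the standard kubeconfig layout (cluster name and server URL are taken from the report, certificate data elided). insecure-skip-tls-verify is a standard kubeconfig cluster field, but it disables certificate verification entirely, so it is only appropriate for test clusters:

```yaml
# ~/.kube/config (sketch of the workaround from the release note)
apiVersion: v1
kind: Config
clusters:
- name: "yf-dev-cluster1"
  cluster:
    server: "https://in-acpmanager.test.yfzx.cn/k8s/clusters/c-t5h2t"
    insecure-skip-tls-verify: true
```

Note that kubectl typically rejects a cluster entry that sets both certificate-authority-data and insecure-skip-tls-verify, so the former has to be removed when applying this.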
[jira] [Updated] (FLINK-26452) Flink deploy on k8s https SSLPeerUnverifiedException
[ https://issues.apache.org/jira/browse/FLINK-26452?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

hjw updated FLINK-26452:
Summary: Flink deploy on k8s https SSLPeerUnverifiedException (was: Flink deploy on k8s when kubeconfig server is hostname not ip)
[jira] [Updated] (FLINK-26452) Flink deploy on k8s when kubeconfig server is hostname not ip
[ https://issues.apache.org/jira/browse/FLINK-26452?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] hjw updated FLINK-26452: Description: ~/.kube/config apiVersion:v1 kind:config cluster: -name: "yf-dev-cluster1" cluster: server: "https://in-acpmanager.test.yfzx.cn/k8s/clusters/c-t5h2t"; certificate-authority-data : “……" {code:java} 2022-03-02 18:59:30 | OkHttp https://in-acpmanager.test.yfzx.cn/...io.fabric8.kubernetes.client.dsl.internal.WatcherWebSocketListener Exec Failure javax.net.ssl.SSLPeerUnverifiedException Hostname in-acpmanager.test.yfzx.cn not verified: certificate: sha256/cw2T2s+Swhl7z+H35/3C1dTLxL26OOMO5VoEN9kAZCA= DN: CN=in-acpmanager.test.yfzx.cn subjectAltNames: [] io.fabric8.kubernetes.client.KubernetesClientException: Failed to start websocket at io.fabric8.kubernetes.client.dsl.internal.WatcherWebSocketListener.onFailure(WatcherWebSocketListener.java:77) at org.apache.flink.kubernetes.shaded.okhttp3.internal.ws.RealWebSocket.failWebSocket(RealWebSocket.java:570) at org.apache.flink.kubernetes.shaded.okhttp3.internal.ws.RealWebSocket$1.onFailure(RealWebSocket.java:216) at org.apache.flink.kubernetes.shaded.okhttp3.RealCall$AsyncCall.execute(RealCall.java:180) at org.apache.flink.kubernetes.shaded.okhttp3.internal.NamedRunnable.run(NamedRunnable.java:32) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748) Suppressed: java.lang.Throwable: waiting here at io.fabric8.kubernetes.client.utils.Utils.waitUntilReady(Utils.java:164) at io.fabric8.kubernetes.client.utils.Utils.waitUntilReadyOrFail(Utils.java:175) at io.fabric8.kubernetes.client.dsl.internal.WatcherWebSocketListener.waitUntilReady(WatcherWebSocketListener.java:120) at io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager.waitUntilReady(WatchConnectionManager.java:82) at 
io.fabric8.kubernetes.client.dsl.base.BaseOperation.watch(BaseOperation.java:705) at io.fabric8.kubernetes.client.dsl.base.BaseOperation.watch(BaseOperation.java:678) at io.fabric8.kubernetes.client.dsl.base.BaseOperation.watch(BaseOperation.java: {code} {code:java} Caused by: javax.net.ssl.SSLPeerUnverifiedException: Hostname in-acpmanager.test.yfzx.cn not verified: certificate: sha256/cw2T2s+Swhl7z+H35/3C1dTLxL26OOMO5VoEN9kAZCA= DN: CN=in-acpmanager.test.yfzx.cn subjectAltNames: [] at org.apache.flink.kubernetes.shaded.okhttp3.internal.connection.RealConnection.connectTls(RealConnection.java:350) at org.apache.flink.kubernetes.shaded.okhttp3.internal.connection.RealConnection.establishProtocol(RealConnection.java:300) at org.apache.flink.kubernetes.shaded.okhttp3.internal.connection.RealConnection.connect(RealConnection.java:185) at org.apache.flink.kubernetes.shaded.okhttp3.internal.connection.ExchangeFinder.findConnection(ExchangeFinder.java:224) at org.apache.flink.kubernetes.shaded.okhttp3.internal.connection.ExchangeFinder.findHealthyConnection(ExchangeFinder.java:108) at org.apache.flink.kubernetes.shaded.okhttp3.internal.connection.ExchangeFinder.find(ExchangeFinder.java:88) at org.apache.flink.kubernetes.shaded.okhttp3.internal.connection.Transmitter.newExchange(Transmitter.java:169) at org.apache.flink.kubernetes.shaded.okhttp3.internal.connection.ConnectInterceptor.intercept(ConnectInterceptor.java:41) at org.apache.flink.kubernetes.shaded.okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:142) at org.apache.flink.kubernetes.shaded.okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:117) at org.apache.flink.kubernetes.shaded.okhttp3.internal.cache.CacheInterceptor.intercept(CacheInterceptor.java:94) at org.apache.flink.kubernetes.shaded.okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:142) at 
org.apache.flink.kubernetes.shaded.okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:117) at org.apache.flink.kubernetes.shaded.okhttp3.internal.http.BridgeInterceptor.intercept(BridgeInterceptor.java:93) at org.apache.flink.kubernetes.shade {code} By the way, the "kubectl get pod -n namespace" command succeeds on this node. The node is configured with DNS.
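The detail above (DNS resolves and kubectl succeeds, yet the Java client throws SSLPeerUnverifiedException) is consistent with the empty subjectAltNames in the log: modern TLS hostname verification matches the requested hostname only against the certificate's Subject Alternative Name entries and ignores the Subject CN. A simplified, purely illustrative sketch of that rule (not Flink or OkHttp code; fnmatch wildcards stand in for the stricter RFC 6125 single-label wildcard rule):

```python
from fnmatch import fnmatch

def hostname_matches(hostname: str, subject_alt_names: list) -> bool:
    """True iff the hostname matches at least one SAN dNSName pattern.

    Simplification: real verifiers restrict '*' to a single DNS label;
    fnmatch does not, but the point holds either way.
    """
    return any(fnmatch(hostname, pattern) for pattern in subject_alt_names)

# The certificate in the log has CN=in-acpmanager.test.yfzx.cn but
# subjectAltNames: [], so verification fails no matter what the CN says.
print(hostname_matches("in-acpmanager.test.yfzx.cn", []))                  # False
print(hostname_matches("in-acpmanager.test.yfzx.cn", ["*.test.yfzx.cn"]))  # True
```

This is why re-issuing the API server certificate with the hostname in its SAN list (rather than only in the CN) is the proper fix; skipping verification is only a workaround.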
[jira] [Updated] (FLINK-26452) Flink deploy on k8s when kubeconfig server is hostname not ip
[ https://issues.apache.org/jira/browse/FLINK-26452?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] hjw updated FLINK-26452: Description: ~/.kube/config apiVersion:v1 kind:config cluster: -name: "yf-dev-cluster1" cluster: server: "https://in-acpmanager.test.yfzx.cn/k8s/clusters/c-t5h2t"; certificate-authority-data : “……" {code:java} 2022-03-02 18:59:30 | OkHttp https://in-acpmanager.test.yfzx.cn/...io.fabric8.kubernetes.client.dsl.internal.WatcherWebSocketListener Exec Failure javax.net.ssl.SSLPeerUnverifiedException Hostname in-acpmanager.test.yfzx.cn not verified: certificate: sha256/cw2T2s+Swhl7z+H35/3C1dTLxL26OOMO5VoEN9kAZCA= DN: CN=in-acpmanager.test.yfzx.cn subjectAltNames: [] io.fabric8.kubernetes.client.KubernetesClientException: Failed to start websocket at io.fabric8.kubernetes.client.dsl.internal.WatcherWebSocketListener.onFailure(WatcherWebSocketListener.java:77) at org.apache.flink.kubernetes.shaded.okhttp3.internal.ws.RealWebSocket.failWebSocket(RealWebSocket.java:570) at org.apache.flink.kubernetes.shaded.okhttp3.internal.ws.RealWebSocket$1.onFailure(RealWebSocket.java:216) at org.apache.flink.kubernetes.shaded.okhttp3.RealCall$AsyncCall.execute(RealCall.java:180) at org.apache.flink.kubernetes.shaded.okhttp3.internal.NamedRunnable.run(NamedRunnable.java:32) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748) Suppressed: java.lang.Throwable: waiting here at io.fabric8.kubernetes.client.utils.Utils.waitUntilReady(Utils.java:164) at io.fabric8.kubernetes.client.utils.Utils.waitUntilReadyOrFail(Utils.java:175) at io.fabric8.kubernetes.client.dsl.internal.WatcherWebSocketListener.waitUntilReady(WatcherWebSocketListener.java:120) at io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager.waitUntilReady(WatchConnectionManager.java:82) at 
io.fabric8.kubernetes.client.dsl.base.BaseOperation.watch(BaseOperation.java:705) at io.fabric8.kubernetes.client.dsl.base.BaseOperation.watch(BaseOperation.java:678) at io.fabric8.kubernetes.client.dsl.base.BaseOperation.watch(BaseOperation.java: {code} By the way . "kubectl get pod -n namespace" command is success in this node. was: ~/.kube/config apiVersion:v1 kind:config cluster: -name: "yf-dev-cluster1" cluster: server: "https://in-acpmanager.test.yfzx.cn/k8s/clusters/c-t5h2t"; certificate-authority-data : “……" {code:java} 2022-03-02 18:59:30 | OkHttp https://in-acpmanager.test.yfzx.cn/...io.fabric8.kubernetes.client.dsl.internal.WatcherWebSocketListener Exec Failure javax.net.ssl.SSLPeerUnverifiedException Hostname in-acpmanager.test.yfzx.cn not verified: certificate: sha256/cw2T2s+Swhl7z+H35/3C1dTLxL26OOMO5VoEN9kAZCA= DN: CN=in-acpmanager.test.yfzx.cn subjectAltNames: [] io.fabric8.kubernetes.client.KubernetesClientException: Failed to start websocket at io.fabric8.kubernetes.client.dsl.internal.WatcherWebSocketListener.onFailure(WatcherWebSocketListener.java:77) at org.apache.flink.kubernetes.shaded.okhttp3.internal.ws.RealWebSocket.failWebSocket(RealWebSocket.java:570) at org.apache.flink.kubernetes.shaded.okhttp3.internal.ws.RealWebSocket$1.onFailure(RealWebSocket.java:216) at org.apache.flink.kubernetes.shaded.okhttp3.RealCall$AsyncCall.execute(RealCall.java:180) at org.apache.flink.kubernetes.shaded.okhttp3.internal.NamedRunnable.run(NamedRunnable.java:32) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748) Suppressed: java.lang.Throwable: waiting here at io.fabric8.kubernetes.client.utils.Utils.waitUntilReady(Utils.java:164) at io.fabric8.kubernetes.client.utils.Utils.waitUntilReadyOrFail(Utils.java:175) at 
io.fabric8.kubernetes.client.dsl.internal.WatcherWebSocketListener.waitUntilReady(WatcherWebSocketListener.java:120) at io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager.waitUntilReady(WatchConnectionManager.java:82) at io.fabric8.kubernetes.client.dsl.base.BaseOperation.watch(BaseOperation.java:705) at io.fabric8.kubernetes.client.dsl.base.BaseOperation.watch(BaseOperation.java:678) at io.fabric8.kubernetes.client.dsl.base.BaseOperation.watch(BaseOperation.java: {code} By the way . > Flink deploy on k8s when kubeconfig server is hostname not ip > ---
[jira] [Updated] (FLINK-26452) Flink deploy on k8s when kubeconfig server is hostname not ip
[ https://issues.apache.org/jira/browse/FLINK-26452?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] hjw updated FLINK-26452: Description: ~/.kube/config apiVersion:v1 kind:config cluster: -name: "yf-dev-cluster1" cluster: server: "https://in-acpmanager.test.yfzx.cn/k8s/clusters/c-t5h2t"; certificate-authority-data : “……" {code:java} 2022-03-02 18:59:30 | OkHttp https://in-acpmanager.test.yfzx.cn/...io.fabric8.kubernetes.client.dsl.internal.WatcherWebSocketListener Exec Failure javax.net.ssl.SSLPeerUnverifiedException Hostname in-acpmanager.test.yfzx.cn not verified: certificate: sha256/cw2T2s+Swhl7z+H35/3C1dTLxL26OOMO5VoEN9kAZCA= DN: CN=in-acpmanager.test.yfzx.cn subjectAltNames: [] io.fabric8.kubernetes.client.KubernetesClientException: Failed to start websocket at io.fabric8.kubernetes.client.dsl.internal.WatcherWebSocketListener.onFailure(WatcherWebSocketListener.java:77) at org.apache.flink.kubernetes.shaded.okhttp3.internal.ws.RealWebSocket.failWebSocket(RealWebSocket.java:570) at org.apache.flink.kubernetes.shaded.okhttp3.internal.ws.RealWebSocket$1.onFailure(RealWebSocket.java:216) at org.apache.flink.kubernetes.shaded.okhttp3.RealCall$AsyncCall.execute(RealCall.java:180) at org.apache.flink.kubernetes.shaded.okhttp3.internal.NamedRunnable.run(NamedRunnable.java:32) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748) Suppressed: java.lang.Throwable: waiting here at io.fabric8.kubernetes.client.utils.Utils.waitUntilReady(Utils.java:164) at io.fabric8.kubernetes.client.utils.Utils.waitUntilReadyOrFail(Utils.java:175) at io.fabric8.kubernetes.client.dsl.internal.WatcherWebSocketListener.waitUntilReady(WatcherWebSocketListener.java:120) at io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager.waitUntilReady(WatchConnectionManager.java:82) at 
io.fabric8.kubernetes.client.dsl.base.BaseOperation.watch(BaseOperation.java:705) at io.fabric8.kubernetes.client.dsl.base.BaseOperation.watch(BaseOperation.java:678) at io.fabric8.kubernetes.client.dsl.base.BaseOperation.watch(BaseOperation.java: {code} By the way . "kubectl get pod -n namespace" command is success in this node. The node is configured with DNS. was: ~/.kube/config apiVersion:v1 kind:config cluster: -name: "yf-dev-cluster1" cluster: server: "https://in-acpmanager.test.yfzx.cn/k8s/clusters/c-t5h2t"; certificate-authority-data : “……" {code:java} 2022-03-02 18:59:30 | OkHttp https://in-acpmanager.test.yfzx.cn/...io.fabric8.kubernetes.client.dsl.internal.WatcherWebSocketListener Exec Failure javax.net.ssl.SSLPeerUnverifiedException Hostname in-acpmanager.test.yfzx.cn not verified: certificate: sha256/cw2T2s+Swhl7z+H35/3C1dTLxL26OOMO5VoEN9kAZCA= DN: CN=in-acpmanager.test.yfzx.cn subjectAltNames: [] io.fabric8.kubernetes.client.KubernetesClientException: Failed to start websocket at io.fabric8.kubernetes.client.dsl.internal.WatcherWebSocketListener.onFailure(WatcherWebSocketListener.java:77) at org.apache.flink.kubernetes.shaded.okhttp3.internal.ws.RealWebSocket.failWebSocket(RealWebSocket.java:570) at org.apache.flink.kubernetes.shaded.okhttp3.internal.ws.RealWebSocket$1.onFailure(RealWebSocket.java:216) at org.apache.flink.kubernetes.shaded.okhttp3.RealCall$AsyncCall.execute(RealCall.java:180) at org.apache.flink.kubernetes.shaded.okhttp3.internal.NamedRunnable.run(NamedRunnable.java:32) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748) Suppressed: java.lang.Throwable: waiting here at io.fabric8.kubernetes.client.utils.Utils.waitUntilReady(Utils.java:164) at io.fabric8.kubernetes.client.utils.Utils.waitUntilReadyOrFail(Utils.java:175) at 
io.fabric8.kubernetes.client.dsl.internal.WatcherWebSocketListener.waitUntilReady(WatcherWebSocketListener.java:120) at io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager.waitUntilReady(WatchConnectionManager.java:82) at io.fabric8.kubernetes.client.dsl.base.BaseOperation.watch(BaseOperation.java:705) at io.fabric8.kubernetes.client.dsl.base.BaseOperation.watch(BaseOperation.java:678) at io.fabric8.kubernetes.client.dsl.base.BaseOperation.watch(BaseOperation.java: {code} By the way . "kubectl get pod -n namespace" command is success in this node. > Flink deploy on k8s
[jira] [Updated] (FLINK-26452) Flink deploy on k8s when kubeconfig server is hostname not ip
[ https://issues.apache.org/jira/browse/FLINK-26452?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] hjw updated FLINK-26452: Description:
~/.kube/config
apiVersion: v1
kind: config
clusters:
- name: "yf-dev-cluster1"
  cluster:
    server: "https://in-acpmanager.test.yfzx.cn/k8s/clusters/c-t5h2t"
    certificate-authority-data: "……"
{code:java}
2022-03-02 18:59:30 | OkHttp https://in-acpmanager.test.yfzx.cn/... | io.fabric8.kubernetes.client.dsl.internal.WatcherWebSocketListener | Exec Failure
javax.net.ssl.SSLPeerUnverifiedException: Hostname in-acpmanager.test.yfzx.cn not verified:
    certificate: sha256/cw2T2s+Swhl7z+H35/3C1dTLxL26OOMO5VoEN9kAZCA=
    DN: CN=in-acpmanager.test.yfzx.cn
    subjectAltNames: []
io.fabric8.kubernetes.client.KubernetesClientException: Failed to start websocket
	at io.fabric8.kubernetes.client.dsl.internal.WatcherWebSocketListener.onFailure(WatcherWebSocketListener.java:77)
	at org.apache.flink.kubernetes.shaded.okhttp3.internal.ws.RealWebSocket.failWebSocket(RealWebSocket.java:570)
	at org.apache.flink.kubernetes.shaded.okhttp3.internal.ws.RealWebSocket$1.onFailure(RealWebSocket.java:216)
	at org.apache.flink.kubernetes.shaded.okhttp3.RealCall$AsyncCall.execute(RealCall.java:180)
	at org.apache.flink.kubernetes.shaded.okhttp3.internal.NamedRunnable.run(NamedRunnable.java:32)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
	Suppressed: java.lang.Throwable: waiting here
		at io.fabric8.kubernetes.client.utils.Utils.waitUntilReady(Utils.java:164)
		at io.fabric8.kubernetes.client.utils.Utils.waitUntilReadyOrFail(Utils.java:175)
		at io.fabric8.kubernetes.client.dsl.internal.WatcherWebSocketListener.waitUntilReady(WatcherWebSocketListener.java:120)
		at io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager.waitUntilReady(WatchConnectionManager.java:82)
		at io.fabric8.kubernetes.client.dsl.base.BaseOperation.watch(BaseOperation.java:705)
		at io.fabric8.kubernetes.client.dsl.base.BaseOperation.watch(BaseOperation.java:678)
		at io.fabric8.kubernetes.client.dsl.base.BaseOperation.watch(BaseOperation.java:
{code}
By the way .
> Flink deploy on k8s when kubeconfig server is hostname not ip
>
> Key: FLINK-26452
> URL: https
[jira] [Created] (FLINK-26452) Flink deploy on k8s when kubeconfig server is hostname not ip
hjw created FLINK-26452: --- Summary: Flink deploy on k8s when kubeconfig server is hostname not ip Key: FLINK-26452 URL: https://issues.apache.org/jira/browse/FLINK-26452 Project: Flink Issue Type: Bug Components: Deployment / Kubernetes Affects Versions: 1.13.6 Reporter: hjw
~/.kube/config
apiVersion: v1
kind: config
clusters:
- name: "yf-dev-cluster1"
  cluster:
    server: "https://in-acpmanager.test.*.cn/k8s/clusters/c-t5h2t"
    certificate-authority-data: "……"
{code:java}
2022-03-02 18:59:30 | WARN | OkHttp https://in-acpmanager.test.*.cn/... | io.fabric8.kubernetes.client.dsl.internal.WatcherWebSocketListener:76 | Exec Failure
javax.net.ssl.SSLPeerUnverifiedException: Hostname in-acpmanager.test.yfzx.cn not verified:
    certificate: sha256/cw2T2s+Swhl7z+H35/3C1dTLxL26OOMO5VoEN9kAZCA=
    DN: CN=in-acpmanager.test.yfzx.cn
    subjectAltNames: []
io.fabric8.kubernetes.client.KubernetesClientException: Failed to start websocket
	at io.fabric8.kubernetes.client.dsl.internal.WatcherWebSocketListener.onFailure(WatcherWebSocketListener.java:77)
	at org.apache.flink.kubernetes.shaded.okhttp3.internal.ws.RealWebSocket.failWebSocket(RealWebSocket.java:570)
	at org.apache.flink.kubernetes.shaded.okhttp3.internal.ws.RealWebSocket$1.onFailure(RealWebSocket.java:216)
	at org.apache.flink.kubernetes.shaded.okhttp3.RealCall$AsyncCall.execute(RealCall.java:180)
	at org.apache.flink.kubernetes.shaded.okhttp3.internal.NamedRunnable.run(NamedRunnable.java:32)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
	Suppressed: java.lang.Throwable: waiting here
		at io.fabric8.kubernetes.client.utils.Utils.waitUntilReady(Utils.java:164)
		at io.fabric8.kubernetes.client.utils.Utils.waitUntilReadyOrFail(Utils.java:175)
		at io.fabric8.kubernetes.client.dsl.internal.WatcherWebSocketListener.waitUntilReady(WatcherWebSocketListener.java:120)
		at io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager.waitUntilReady(WatchConnectionManager.java:82)
		at io.fabric8.kubernetes.client.dsl.base.BaseOperation.watch(BaseOperation.java:705)
		at io.fabric8.kubernetes.client.dsl.base.BaseOperation.watch(BaseOperation.java:678)
		at io.fabric8.kubernetes.client.dsl.base.BaseOperation.watch(BaseOperation.java:
{code}
-- This message was sent by Atlassian Jira (v8.20.1#820001)
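The failure above is hostname verification, not trust: the server certificate carries only a CN and an empty subjectAltNames list, and modern TLS clients (including the OkHttp stack shaded into flink-kubernetes) typically match the requested hostname against SAN entries only, ignoring the CN. A minimal sketch of that rule, in Python for illustration (the function name and the simplified wildcard handling are mine, not from fabric8/OkHttp):

```python
def hostname_matches(hostname: str, subject_alt_names: list) -> bool:
    """Simplified RFC 6125-style check: the hostname must match some
    dNSName SAN entry. A certificate with an empty SAN list can never
    match, no matter what its CN says."""
    host_labels = hostname.lower().split(".")
    for san in subject_alt_names:
        pattern_labels = san.lower().split(".")
        if len(pattern_labels) != len(host_labels):
            continue
        # '*' stands in for exactly one label (leftmost-label wildcard)
        if all(p == "*" or p == h
               for p, h in zip(pattern_labels, host_labels)):
            return True
    return False

# The certificate from the report: CN present, subjectAltNames: []
print(hostname_matches("in-acpmanager.test.yfzx.cn", []))                  # False
print(hostname_matches("in-acpmanager.test.yfzx.cn", ["*.test.yfzx.cn"]))  # True
```

The usual remedy is to re-issue the API server certificate with the hostname in its SAN list; the verification itself behaves the same whether the kubeconfig server is an IP or a hostname, it just needs a matching SAN entry.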
[jira] [Commented] (FLINK-26196) error when Incremental Checkpoints by RocksDb
[ https://issues.apache.org/jira/browse/FLINK-26196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17495439#comment-17495439 ] hjw commented on FLINK-26196: - [~yunta] Thanks for your response. Flink version: 1.13.2. Environment: Flink on a native Kubernetes session cluster; state backend: RocksDB; state-checkpoint-storage: filesystem (NAS).
> error when Incremental Checkpoints by RocksDb
> ---
>
> Key: FLINK-26196
> URL: https://issues.apache.org/jira/browse/FLINK-26196
> Project: Flink
> Issue Type: Bug
> Components: Runtime / Checkpointing, Runtime / State Backends
> Affects Versions: 1.13.2
> Reporter: hjw
> Priority: Critical
>
> When I use Incremental Checkpoints by RocksDb , errors happen occasionally.
> Fortunately,Flink job is running normally
> Log:
> {code:java}
> java.io.IOException: Could not perform checkpoint 2804 for operator cc-rule-keyByAndReduceStream (2/8)#1.
> at org.apache.flink.streaming.runtime.tasks.StreamTask.triggerCheckpointOnBarrier(StreamTask.java:1045)
> at org.apache.flink.streaming.runtime.io.checkpointing.CheckpointBarrierHandler.notifyCheckpoint(CheckpointBarrierHandler.java:135)
> at org.apache.flink.streaming.runtime.io.checkpointing.SingleCheckpointBarrierHandler.triggerCheckpoint(SingleCheckpointBarrierHandler.java:250)
> at org.apache.flink.streaming.runtime.io.checkpointing.SingleCheckpointBarrierHandler.access$100(SingleCheckpointBarrierHandler.java:61)
> at org.apache.flink.streaming.runtime.io.checkpointing.SingleCheckpointBarrierHandler$ControllerImpl.triggerGlobalCheckpoint(SingleCheckpointBarrierHandler.java:431)
> at org.apache.flink.streaming.runtime.io.checkpointing.AbstractAlignedBarrierHandlerState.barrierReceived(AbstractAlignedBarrierHandlerState.java:61)
> at org.apache.flink.streaming.runtime.io.checkpointing.SingleCheckpointBarrierHandler.processBarrier(SingleCheckpointBarrierHandler.java:227)
> at org.apache.flink.streaming.runtime.io.checkpointing.CheckpointedInputGate.handleEvent(CheckpointedInputGate.java:180)
> at org.apache.flink.streaming.runtime.io.checkpointing.CheckpointedInputGate.pollNext(CheckpointedInputGate.java:158)
> at org.apache.flink.streaming.runtime.io.AbstractStreamTaskNetworkInput.emitNext(AbstractStreamTaskNetworkInput.java:110)
> at org.apache.flink.streaming.runtime.io.StreamOneInputProcessor.processInput(StreamOneInputProcessor.java:66)
> at org.apache.flink.streaming.runtime.tasks.StreamTask.processInput(StreamTask.java:423)
> at org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor.runMailboxLoop(MailboxProcessor.java:204)
> at org.apache.flink.streaming.runtime.tasks.StreamTask.runMailboxLoop(StreamTask.java:681)
> at org.apache.flink.streaming.runtime.tasks.StreamTask.executeInvoke(StreamTask.java:636)
> at org.apache.flink.streaming.runtime.tasks.StreamTask.runWithCleanUpOnFail(StreamTask.java:647)
> at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:620)
> at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:779)
> at org.apache.flink.runtime.taskmanager.Task.run(Task.java:566)
> at java.lang.Thread.run(Thread.java:748)
> Caused by: org.apache.flink.runtime.checkpoint.CheckpointException: Could not complete snapshot 2804 for operator cc-rule-keyByAndReduceStream (2/8)#1. Failure reason: Checkpoint was declined.
> at org.apache.flink.streaming.api.operators.StreamOperatorStateHandler.snapshotState(StreamOperatorStateHandler.java:264)
> at org.apache.flink.streaming.api.operators.StreamOperatorStateHandler.snapshotState(StreamOperatorStateHandler.java:169)
> at org.apache.flink.streaming.api.operators.AbstractStreamOperator.snapshotState(AbstractStreamOperator.java:371)
> at org.apache.flink.streaming.runtime.tasks.SubtaskCheckpointCoordinatorImpl.checkpointStreamOperator(SubtaskCheckpointCoordinatorImpl.java:706)
> at org.apache.flink.streaming.runtime.tasks.SubtaskCheckpointCoordinatorImpl.buildOperatorSnapshotFutures(SubtaskCheckpointCoordinatorImpl.java:627)
> at org.apache.flink.streaming.runtime.tasks.SubtaskCheckpointCoordinatorImpl.takeSnapshotSync(SubtaskCheckpointCoordinatorImpl.java:590)
> at org.apache.flink.streaming.runtime.tasks.SubtaskCheckpointCoordinatorImpl.checkpointState(SubtaskCheckpointCoordinatorImpl.java:312)
> at org.apache.flink.streaming.runtime.tasks.StreamTask.lambda$performCheckpoint$8(StreamTask.java:1089)
> at org.apache.flink.streaming.runtime.tasks.StreamTaskActionExecutor$1.runThrowing(StreamTaskActionExecutor.java:50)
> at org.apache.flink.streaming.runtime.tasks.StreamTask.performCheckpoint(StreamTask.java:1073)
> at org.apache.flink.streaming.runtime.tasks.StreamTask.triggerCheckpointOnBarrier(StreamTask.java:1029)
> ...
[jira] [Commented] (FLINK-26196) error when Incremental Checkpoints by RocksDb
[ https://issues.apache.org/jira/browse/FLINK-26196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17495437#comment-17495437 ] hjw commented on FLINK-26196: - [~mayuehappy] Sorry, there was no error before this one. This error is intermittent, like a ghost.
[jira] [Created] (FLINK-26196) error when Incremental Checkpoints by RocksDb
hjw created FLINK-26196: --- Summary: error when Incremental Checkpoints by RocksDb Key: FLINK-26196 URL: https://issues.apache.org/jira/browse/FLINK-26196 Project: Flink Issue Type: Bug Components: Runtime / Checkpointing, Runtime / State Backends Affects Versions: 1.13.2 Reporter: hjw
When I use incremental checkpoints with RocksDB, errors happen occasionally. Fortunately, the Flink job keeps running normally.
Log:
{code:java}
java.io.IOException: Could not perform checkpoint 2804 for operator cc-rule-keyByAndReduceStream (2/8)#1.
	at org.apache.flink.streaming.runtime.tasks.StreamTask.triggerCheckpointOnBarrier(StreamTask.java:1045)
	at org.apache.flink.streaming.runtime.io.checkpointing.CheckpointBarrierHandler.notifyCheckpoint(CheckpointBarrierHandler.java:135)
	at org.apache.flink.streaming.runtime.io.checkpointing.SingleCheckpointBarrierHandler.triggerCheckpoint(SingleCheckpointBarrierHandler.java:250)
	at org.apache.flink.streaming.runtime.io.checkpointing.SingleCheckpointBarrierHandler.access$100(SingleCheckpointBarrierHandler.java:61)
	at org.apache.flink.streaming.runtime.io.checkpointing.SingleCheckpointBarrierHandler$ControllerImpl.triggerGlobalCheckpoint(SingleCheckpointBarrierHandler.java:431)
	at org.apache.flink.streaming.runtime.io.checkpointing.AbstractAlignedBarrierHandlerState.barrierReceived(AbstractAlignedBarrierHandlerState.java:61)
	at org.apache.flink.streaming.runtime.io.checkpointing.SingleCheckpointBarrierHandler.processBarrier(SingleCheckpointBarrierHandler.java:227)
	at org.apache.flink.streaming.runtime.io.checkpointing.CheckpointedInputGate.handleEvent(CheckpointedInputGate.java:180)
	at org.apache.flink.streaming.runtime.io.checkpointing.CheckpointedInputGate.pollNext(CheckpointedInputGate.java:158)
	at org.apache.flink.streaming.runtime.io.AbstractStreamTaskNetworkInput.emitNext(AbstractStreamTaskNetworkInput.java:110)
	at org.apache.flink.streaming.runtime.io.StreamOneInputProcessor.processInput(StreamOneInputProcessor.java:66)
	at org.apache.flink.streaming.runtime.tasks.StreamTask.processInput(StreamTask.java:423)
	at org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor.runMailboxLoop(MailboxProcessor.java:204)
	at org.apache.flink.streaming.runtime.tasks.StreamTask.runMailboxLoop(StreamTask.java:681)
	at org.apache.flink.streaming.runtime.tasks.StreamTask.executeInvoke(StreamTask.java:636)
	at org.apache.flink.streaming.runtime.tasks.StreamTask.runWithCleanUpOnFail(StreamTask.java:647)
	at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:620)
	at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:779)
	at org.apache.flink.runtime.taskmanager.Task.run(Task.java:566)
	at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.flink.runtime.checkpoint.CheckpointException: Could not complete snapshot 2804 for operator cc-rule-keyByAndReduceStream (2/8)#1. Failure reason: Checkpoint was declined.
	at org.apache.flink.streaming.api.operators.StreamOperatorStateHandler.snapshotState(StreamOperatorStateHandler.java:264)
	at org.apache.flink.streaming.api.operators.StreamOperatorStateHandler.snapshotState(StreamOperatorStateHandler.java:169)
	at org.apache.flink.streaming.api.operators.AbstractStreamOperator.snapshotState(AbstractStreamOperator.java:371)
	at org.apache.flink.streaming.runtime.tasks.SubtaskCheckpointCoordinatorImpl.checkpointStreamOperator(SubtaskCheckpointCoordinatorImpl.java:706)
	at org.apache.flink.streaming.runtime.tasks.SubtaskCheckpointCoordinatorImpl.buildOperatorSnapshotFutures(SubtaskCheckpointCoordinatorImpl.java:627)
	at org.apache.flink.streaming.runtime.tasks.SubtaskCheckpointCoordinatorImpl.takeSnapshotSync(SubtaskCheckpointCoordinatorImpl.java:590)
	at org.apache.flink.streaming.runtime.tasks.SubtaskCheckpointCoordinatorImpl.checkpointState(SubtaskCheckpointCoordinatorImpl.java:312)
	at org.apache.flink.streaming.runtime.tasks.StreamTask.lambda$performCheckpoint$8(StreamTask.java:1089)
	at org.apache.flink.streaming.runtime.tasks.StreamTaskActionExecutor$1.runThrowing(StreamTaskActionExecutor.java:50)
	at org.apache.flink.streaming.runtime.tasks.StreamTask.performCheckpoint(StreamTask.java:1073)
	at org.apache.flink.streaming.runtime.tasks.StreamTask.triggerCheckpointOnBarrier(StreamTask.java:1029)
	... 19 more
Caused by: org.rocksdb.RocksDBException: while link file to /opt/flink/log/iodir/flink-io-1c4c28bd-c5ce-4c07-9d33-81d480ec5216/job_b9a574334a212349298b39d98567b519_op_WindowOperator_306d8342cb5b2ad8b53f1be57f65bee8__2_8__uuid_c9adab75-3696-4342-9019-e8477cf0a7ca/chk-2804.tmp/000279.sst: /opt/flink/log/iodir/flink-io-1c4c28bd-c5ce-4c07-9d33-81d480ec5216/job_b9a574334a212349298b39d98567b519_op_WindowOperator_306d8342cb5b2ad8b53f1be57f65bee8__2_8__uuid_c9adab75-3696-4342-9019-e8477cf0a7ca/db/000279.sst: File exists
	at org.rocksdb.Checkpoint.cr
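The root cause line points at RocksDB's checkpoint creation: for an incremental checkpoint it hard-links each SST file from the db/ directory into the chk-NNNN.tmp/ directory, and the link call fails with "File exists" when a file of that name is already present at the target (for example, left over from an earlier attempt on the same path). The failure mode itself can be reproduced with plain file operations; the directory and file names below merely echo the log and are otherwise arbitrary:

```python
import os
import tempfile
from pathlib import Path

base = Path(tempfile.mkdtemp())
db = base / "db"
chk = base / "chk-2804.tmp"
db.mkdir()
chk.mkdir()

(db / "000279.sst").write_bytes(b"sst data")
# A stale file with the same name, e.g. from a previously aborted checkpoint:
(chk / "000279.sst").write_bytes(b"stale")

try:
    # This mirrors the per-SST-file hard-link step of a RocksDB checkpoint.
    os.link(db / "000279.sst", chk / "000279.sst")
except FileExistsError as e:
    # errno EEXIST, reported by RocksDB as "File exists" in the Jira log
    print("link failed:", e.strerror)
```

This is consistent with the job continuing to run: the checkpoint attempt is declined and retried, while the operator state itself is unaffected.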
[jira] [Comment Edited] (FLINK-15656) Support user-specified pod templates
[ https://issues.apache.org/jira/browse/FLINK-15656?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17463167#comment-17463167 ] hjw edited comment on FLINK-15656 at 12/21/21, 11:21 AM: - Regarding the note about the pod template [1]: must the file be packaged into the image and set via -Dkubernetes.pod-template-file=/opt/flink-templeate.yaml, or can the pod template be loaded from a file system? Thanks. [1]. [https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/config/#kubernetes-pod-template-file]
> Support user-specified pod templates
>
> Key: FLINK-15656
> URL: https://issues.apache.org/jira/browse/FLINK-15656
> Project: Flink
> Issue Type: Sub-task
> Components: Deployment / Kubernetes
> Reporter: Canbin Zheng
> Assignee: Yang Wang
> Priority: Major
> Labels: pull-request-available
> Fix For: 1.13.0
>
> The current approach of introducing new configuration options for each aspect of pod specification a user might wish is becoming unwieldy, we have to maintain more and more Flink side Kubernetes configuration options and users have to learn the gap between the declarative model used by Kubernetes and the configuration model used by Flink. It's a great improvement to allow users to specify pod templates as central places for all customization needs for the jobmanager and taskmanager pods.
-- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Commented] (FLINK-15656) Support user-specified pod templates
[ https://issues.apache.org/jira/browse/FLINK-15656?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17463167#comment-17463167 ]

hjw commented on FLINK-15656:
-----------------------------

Regarding the note about the pod template [1]: must the pod template file be packaged into the image and set via -Dkubernetes.pod-template-file=/opt/flink-templeate.yaml? Could the pod template be loaded from a file system instead? thx.

[1] https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/config/#kubernetes-pod-template-file
[jira] [Commented] (FLINK-15656) Support user-specified pod templates
[ https://issues.apache.org/jira/browse/FLINK-15656?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17462730#comment-17462730 ]

hjw commented on FLINK-15656:
-----------------------------

[~wangyang0918] About setting 'hostAlias' for the jobmanager and taskmanager in the pod template: does this feature support native Kubernetes, or only standalone deployments on Kubernetes?
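For context on the hostAlias question above: hostAliases in a Kubernetes pod spec add entries to the pod's /etc/hosts. If pod templates are honored by the chosen deployment mode, the setting would be expressed as below; the IP and hostname are made-up examples.

```yaml
# Hypothetical pod template fragment showing hostAliases.
apiVersion: v1
kind: Pod
spec:
  hostAliases:                       # appended to /etc/hosts in the pod
    - ip: "10.0.0.10"                # example IP, an assumption
      hostnames:
        - "kafka-broker-1.internal"  # example hostname, an assumption
  containers:
    - name: flink-main-container
```

The same template could be supplied for both jobmanager and taskmanager pods, which is what the question about native vs. standalone Kubernetes support hinges on.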
[jira] [Commented] (FLINK-25115) Why Flink Sink operator metric numRecordsOut and numRecordsOutPerSecond always equal 0
[ https://issues.apache.org/jira/browse/FLINK-25115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17451823#comment-17451823 ]

hjw commented on FLINK-25115:
-----------------------------

[~chesnay] Thank you very much.

> Why Flink Sink operator metric numRecordsOut and numRecordsOutPerSecond
> always equal 0
> -----------------------------------------------------------------------
>
>                 Key: FLINK-25115
>                 URL: https://issues.apache.org/jira/browse/FLINK-25115
>             Project: Flink
>          Issue Type: Bug
>    Affects Versions: 1.13.2
>            Reporter: hjw
>            Priority: Major
>         Attachments: image-2021-11-30-23-56-26-222.png
>
> I submitted a Flink SQL job and found that the numRecordsOut and
> numRecordsOutPerSecond metrics are always 0.
>
> !image-2021-11-30-23-56-26-222.png!
[jira] [Commented] (FLINK-25115) Why Flink Sink operator metric numRecordsOut and numRecordsOutPerSecond always equal 0
[ https://issues.apache.org/jira/browse/FLINK-25115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17451812#comment-17451812 ]

hjw commented on FLINK-25115:
-----------------------------

[~chesnay] Could you point me to the related release notes for version 1.14? thx.
[jira] [Created] (FLINK-25115) Why Flink Sink operator metric numRecordsOut and numRecordsOutPerSecond always equal 0
hjw created FLINK-25115:
---------------------------

             Summary: Why Flink Sink operator metric numRecordsOut and numRecordsOutPerSecond always equal 0
                 Key: FLINK-25115
                 URL: https://issues.apache.org/jira/browse/FLINK-25115
             Project: Flink
          Issue Type: Bug
    Affects Versions: 1.13.2
            Reporter: hjw
         Attachments: image-2021-11-30-23-56-26-222.png

I submitted a Flink SQL job and found that the numRecordsOut and numRecordsOutPerSecond metrics are always 0.

!image-2021-11-30-23-56-26-222.png!