[jira] [Commented] (FLINK-33377) When Flink version >= 1.15 and Flink Operator is used, there is a waste of resources when running Flink batch jobs.

2023-10-26 Thread hjw (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-33377?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17779984#comment-17779984
 ] 

hjw commented on FLINK-33377:
-

[~gsomogyi] Can you take a look?

> When Flink version >= 1.15 and Flink Operator is used, there is a waste of 
> resources when running Flink batch jobs.
> ---
>
> Key: FLINK-33377
> URL: https://issues.apache.org/jira/browse/FLINK-33377
> Project: Flink
>  Issue Type: Bug
>  Components: Kubernetes Operator
>Affects Versions: kubernetes-operator-1.5.0
>Reporter: hjw
>Priority: Major
>
> According to 
> [FLINK-29376|https://issues.apache.org/jira/browse/FLINK-29376], 
> SHUTDOWN_ON_APPLICATION_FINISH is always set to false when the Flink version 
> is 1.15 or above.
> However, the JobManager still exists after a Flink batch job finishes 
> normally. Isn't this a waste of resources?



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-33377) When Flink version >= 1.15 and Flink Operator is used, there is a waste of resources when running Flink batch jobs.

2023-10-26 Thread hjw (Jira)
hjw created FLINK-33377:
---

 Summary: When Flink version >= 1.15 and Flink Operator is used, 
there is a waste of resources when running Flink batch jobs.
 Key: FLINK-33377
 URL: https://issues.apache.org/jira/browse/FLINK-33377
 Project: Flink
  Issue Type: Bug
  Components: Kubernetes Operator
Affects Versions: kubernetes-operator-1.5.0
Reporter: hjw


According to 
[FLINK-29376|https://issues.apache.org/jira/browse/FLINK-29376], 
SHUTDOWN_ON_APPLICATION_FINISH is always set to false when the Flink version is 
1.15 or above.

However, the JobManager still exists after a Flink batch job finishes normally. 
Isn't this a waste of resources?
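
For context, the behaviour is reproducible with a minimal batch FlinkDeployment 
along the lines of the sketch below. This is my own illustrative example: the 
jar path is the stock WordCount example shipped in the Flink image, and the 
commented key is only my assumption about which option backs 
SHUTDOWN_ON_APPLICATION_FINISH; per FLINK-29376 the operator overrides it to 
false on Flink 1.15+ regardless of what is set here.

{code:yaml}
apiVersion: flink.apache.org/v1beta1
kind: FlinkDeployment
metadata:
  name: batch-example
spec:
  image: flink:1.16
  flinkVersion: v1_16
  serviceAccount: flink
  flinkConfiguration:
    taskmanager.numberOfTaskSlots: "2"
    # Assumed option name; the operator forces it to "false" for Flink >= 1.15,
    # so the JobManager deployment keeps running after the batch job finishes.
    execution.shutdown-on-application-finish: "false"
  jobManager:
    resource:
      memory: "2048m"
      cpu: 1
  taskManager:
    resource:
      memory: "2048m"
      cpu: 1
  job:
    jarURI: local:///opt/flink/examples/batch/WordCount.jar
    parallelism: 2
    upgradeMode: stateless
{code}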



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Comment Edited] (FLINK-31203) Application upgrade rollbacks failed in Flink Kubernetes Operator

2023-02-27 Thread hjw (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-31203?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17693865#comment-17693865
 ] 

hjw edited comment on FLINK-31203 at 2/27/23 8:03 AM:
--

[Gyula Fora|https://issues.apache.org/jira/secure/ViewProfile.jspa?name=gyfora] 
 Could you help look at this problem? Thanks.

 


was (Author: JIRAUSER280998):
@[Gyula 
Fora|https://issues.apache.org/jira/secure/ViewProfile.jspa?name=gyfora] 

 

> Application upgrade rollbacks failed in Flink Kubernetes Operator
> -
>
> Key: FLINK-31203
> URL: https://issues.apache.org/jira/browse/FLINK-31203
> Project: Flink
>  Issue Type: Bug
>  Components: Kubernetes Operator
>Affects Versions: kubernetes-operator-1.3.1
>Reporter: hjw
>Priority: Major
>
> I tested the application upgrade rollback feature, but it fails: the Flink 
> application-mode job cannot roll back to the last stable spec.
> As shown in the following example, I declare an invalid pod template without 
> a container named flink-main-container to test the rollback feature.
> However, deploying the Flink application job simply fails with the error 
> below, and no rollback happens.
>  
> Error:
> org.apache.flink.client.deployment.ClusterDeploymentException: Could not 
> create Kubernetes cluster "basic-example".
>  at 
> org.apache.flink.kubernetes.KubernetesClusterDescriptor.deployClusterInternal(KubernetesClusterDescriptor.java:292)
> Caused by: io.fabric8.kubernetes.client.KubernetesClientException: Failure 
> executing: POST at: 
> https://*/k8s/clusters/c-fwkxh/apis/apps/v1/namespaces/test-flink/deployments.
>  Message: Deployment.apps "basic-example" is invalid: 
> [spec.template.spec.containers[0].name: Required value, 
> spec.template.spec.containers[0].image: Required value]. Received status: 
> Status(apiVersion=v1, code=422, 
> details=StatusDetails(causes=[StatusCause(field=spec.template.spec.containers[0].name,
>  message=Required value, reason=FieldValueRequired, additionalProperties={}), 
> StatusCause(field=spec.template.spec.containers[0].image, message=Required 
> value, reason=FieldValueRequired, additionalProperties={})], group=apps, 
> kind=Deployment, name=basic-example, retryAfterSeconds=null, uid=null, 
> additionalProperties={}), kind=Status, message=Deployment.apps 
> "basic-example" is invalid: [spec.template.spec.containers[0].name: Required 
> value, spec.template.spec.containers[0].image: Required value], 
> metadata=ListMeta(_continue=null, remainingItemCount=null, 
> resourceVersion=null, selfLink=null, additionalProperties={}), 
> reason=Invalid, status=Failure, additionalProperties={}).
>  at 
> io.fabric8.kubernetes.client.dsl.base.OperationSupport.requestFailure(OperationSupport.java:673)
>  at 
> io.fabric8.kubernetes.client.dsl.base.OperationSupport.assertResponseCode(OperationSupport.java:612)
>  at 
> io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleResponse(OperationSupport.java:560)
>  
> Env:
> Flink version:Flink 1.16
> Flink Kubernetes Operator:1.3.1
>  
> *Last stable spec:*
> apiVersion: [flink.apache.org/v1beta1|http://flink.apache.org/v1beta1]
> kind: FlinkDeployment
> metadata:
>   name: basic-example
> spec:
>   image: flink:1.16
>   flinkVersion: v1_16
>   flinkConfiguration:
>     taskmanager.numberOfTaskSlots: "2"
>     kubernetes.operator.deployment.rollback.enabled: true
>     state.savepoints.dir: s3://flink-data/savepoints
>     state.checkpoints.dir: s3://flink-data/checkpoints
>     high-availability: 
> org.apache.flink.kubernetes.highavailability.KubernetesHaServicesFactory
>     high-availability.storageDir: s3://flink-data/ha
>   serviceAccount: flink
>   *podTemplate:*
>     *spec:*
>       *containers:*
>         *- name: flink-main-container*      
>           *env:*
>           *- name: TZ*
>             *value: Asia/Shanghai*
>   jobManager:
>     resource:
>       memory: "2048m"
>       cpu: 1
>   taskManager:
>     resource:
>       memory: "2048m"
>       cpu: 1
>   job:
>     jarURI: local:///opt/flink/examples/streaming/StateMachineExample.jar
>     parallelism: 2
>     upgradeMode: stateless
>  
> *new Spec:*
> apiVersion: [flink.apache.org/v1beta1|http://flink.apache.org/v1beta1]
> kind: FlinkDeployment
> metadata:
>   name: basic-example
> spec:
>   image: flink:1.16
>   flinkVersion: v1_16
>   flinkConfiguration:
>     taskmanager.numberOfTaskSlots: "2"
>     kubernetes.operator.deployment.rollback.enabled: true
>     state.savepoints.dir: s3://flink-data/savepoints
>     state.checkpoints.dir: s3://flink-data/checkpoints
>     high-availability: 
> org.apache.flink.kubernetes.highavailability.KubernetesHaServicesFactory
>     high-availability.storageDir: s3://flink-data/ha
>   serviceAccount: flink
>   *podTemplate:*
>     *spec:*
>

[jira] [Commented] (FLINK-31203) Application upgrade rollbacks failed in Flink Kubernetes Operator

2023-02-27 Thread hjw (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-31203?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17693865#comment-17693865
 ] 

hjw commented on FLINK-31203:
-

@[Gyula 
Fora|https://issues.apache.org/jira/secure/ViewProfile.jspa?name=gyfora] 

 

> Application upgrade rollbacks failed in Flink Kubernetes Operator
> -
>
> Key: FLINK-31203
> URL: https://issues.apache.org/jira/browse/FLINK-31203
> Project: Flink
>  Issue Type: Bug
>  Components: Kubernetes Operator
>Affects Versions: kubernetes-operator-1.3.1
>Reporter: hjw
>Priority: Major
>
> I tested the application upgrade rollback feature, but it fails: the Flink 
> application-mode job cannot roll back to the last stable spec.
> As shown in the following example, I declare an invalid pod template without 
> a container named flink-main-container to test the rollback feature.
> However, deploying the Flink application job simply fails with the error 
> below, and no rollback happens.
>  
> Error:
> org.apache.flink.client.deployment.ClusterDeploymentException: Could not 
> create Kubernetes cluster "basic-example".
>  at 
> org.apache.flink.kubernetes.KubernetesClusterDescriptor.deployClusterInternal(KubernetesClusterDescriptor.java:292)
> Caused by: io.fabric8.kubernetes.client.KubernetesClientException: Failure 
> executing: POST at: 
> https://*/k8s/clusters/c-fwkxh/apis/apps/v1/namespaces/test-flink/deployments.
>  Message: Deployment.apps "basic-example" is invalid: 
> [spec.template.spec.containers[0].name: Required value, 
> spec.template.spec.containers[0].image: Required value]. Received status: 
> Status(apiVersion=v1, code=422, 
> details=StatusDetails(causes=[StatusCause(field=spec.template.spec.containers[0].name,
>  message=Required value, reason=FieldValueRequired, additionalProperties={}), 
> StatusCause(field=spec.template.spec.containers[0].image, message=Required 
> value, reason=FieldValueRequired, additionalProperties={})], group=apps, 
> kind=Deployment, name=basic-example, retryAfterSeconds=null, uid=null, 
> additionalProperties={}), kind=Status, message=Deployment.apps 
> "basic-example" is invalid: [spec.template.spec.containers[0].name: Required 
> value, spec.template.spec.containers[0].image: Required value], 
> metadata=ListMeta(_continue=null, remainingItemCount=null, 
> resourceVersion=null, selfLink=null, additionalProperties={}), 
> reason=Invalid, status=Failure, additionalProperties={}).
>  at 
> io.fabric8.kubernetes.client.dsl.base.OperationSupport.requestFailure(OperationSupport.java:673)
>  at 
> io.fabric8.kubernetes.client.dsl.base.OperationSupport.assertResponseCode(OperationSupport.java:612)
>  at 
> io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleResponse(OperationSupport.java:560)
>  
> Env:
> Flink version:Flink 1.16
> Flink Kubernetes Operator:1.3.1
>  
> *Last stable spec:*
> apiVersion: [flink.apache.org/v1beta1|http://flink.apache.org/v1beta1]
> kind: FlinkDeployment
> metadata:
>   name: basic-example
> spec:
>   image: flink:1.16
>   flinkVersion: v1_16
>   flinkConfiguration:
>     taskmanager.numberOfTaskSlots: "2"
>     kubernetes.operator.deployment.rollback.enabled: true
>     state.savepoints.dir: s3://flink-data/savepoints
>     state.checkpoints.dir: s3://flink-data/checkpoints
>     high-availability: 
> org.apache.flink.kubernetes.highavailability.KubernetesHaServicesFactory
>     high-availability.storageDir: s3://flink-data/ha
>   serviceAccount: flink
>   *podTemplate:*
>     *spec:*
>       *containers:*
>         *- name: flink-main-container*      
>           *env:*
>           *- name: TZ*
>             *value: Asia/Shanghai*
>   jobManager:
>     resource:
>       memory: "2048m"
>       cpu: 1
>   taskManager:
>     resource:
>       memory: "2048m"
>       cpu: 1
>   job:
>     jarURI: local:///opt/flink/examples/streaming/StateMachineExample.jar
>     parallelism: 2
>     upgradeMode: stateless
>  
> *new Spec:*
> apiVersion: [flink.apache.org/v1beta1|http://flink.apache.org/v1beta1]
> kind: FlinkDeployment
> metadata:
>   name: basic-example
> spec:
>   image: flink:1.16
>   flinkVersion: v1_16
>   flinkConfiguration:
>     taskmanager.numberOfTaskSlots: "2"
>     kubernetes.operator.deployment.rollback.enabled: true
>     state.savepoints.dir: s3://flink-data/savepoints
>     state.checkpoints.dir: s3://flink-data/checkpoints
>     high-availability: 
> org.apache.flink.kubernetes.highavailability.KubernetesHaServicesFactory
>     high-availability.storageDir: s3://flink-data/ha
>   serviceAccount: flink
>   *podTemplate:*
>     *spec:*
>       *containers:*
>         *-   env:*
>           *- name: TZ*
>             *value: Asia/Shanghai*
>   jobManager:
>     resource:
>       memory: "2048m"
>       cpu: 1
>   taskManager:
>     resource:
>  

[jira] [Created] (FLINK-31203) Application upgrade rollbacks failed in Flink Kubernetes Operator

2023-02-23 Thread hjw (Jira)
hjw created FLINK-31203:
---

 Summary: Application upgrade rollbacks failed in Flink Kubernetes 
Operator
 Key: FLINK-31203
 URL: https://issues.apache.org/jira/browse/FLINK-31203
 Project: Flink
  Issue Type: Bug
  Components: Kubernetes Operator
Affects Versions: kubernetes-operator-1.3.1
Reporter: hjw


I tested the application upgrade rollback feature, but it fails: the Flink 
application-mode job cannot roll back to the last stable spec.
As shown in the following example, I declare an invalid pod template without a 
container named flink-main-container to test the rollback feature.
However, deploying the Flink application job simply fails with the error below, 
and no rollback happens.
 
Error:
org.apache.flink.client.deployment.ClusterDeploymentException: Could not create 
Kubernetes cluster "basic-example".
 at 
org.apache.flink.kubernetes.KubernetesClusterDescriptor.deployClusterInternal(KubernetesClusterDescriptor.java:292)
Caused by: io.fabric8.kubernetes.client.KubernetesClientException: Failure 
executing: POST at: 
https://*/k8s/clusters/c-fwkxh/apis/apps/v1/namespaces/test-flink/deployments. 
Message: Deployment.apps "basic-example" is invalid: 
[spec.template.spec.containers[0].name: Required value, 
spec.template.spec.containers[0].image: Required value]. Received status: 
Status(apiVersion=v1, code=422, 
details=StatusDetails(causes=[StatusCause(field=spec.template.spec.containers[0].name,
 message=Required value, reason=FieldValueRequired, additionalProperties={}), 
StatusCause(field=spec.template.spec.containers[0].image, message=Required 
value, reason=FieldValueRequired, additionalProperties={})], group=apps, 
kind=Deployment, name=basic-example, retryAfterSeconds=null, uid=null, 
additionalProperties={}), kind=Status, message=Deployment.apps "basic-example" 
is invalid: [spec.template.spec.containers[0].name: Required value, 
spec.template.spec.containers[0].image: Required value], 
metadata=ListMeta(_continue=null, remainingItemCount=null, 
resourceVersion=null, selfLink=null, additionalProperties={}), reason=Invalid, 
status=Failure, additionalProperties={}).
 at 
io.fabric8.kubernetes.client.dsl.base.OperationSupport.requestFailure(OperationSupport.java:673)
 at 
io.fabric8.kubernetes.client.dsl.base.OperationSupport.assertResponseCode(OperationSupport.java:612)
 at 
io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleResponse(OperationSupport.java:560)
 
Env:
Flink version:Flink 1.16
Flink Kubernetes Operator:1.3.1
 
*Last stable spec:*
apiVersion: [flink.apache.org/v1beta1|http://flink.apache.org/v1beta1]
kind: FlinkDeployment
metadata:
  name: basic-example
spec:
  image: flink:1.16
  flinkVersion: v1_16
  flinkConfiguration:
    taskmanager.numberOfTaskSlots: "2"
    kubernetes.operator.deployment.rollback.enabled: true
    state.savepoints.dir: s3://flink-data/savepoints
    state.checkpoints.dir: s3://flink-data/checkpoints
    high-availability: 
org.apache.flink.kubernetes.highavailability.KubernetesHaServicesFactory
    high-availability.storageDir: s3://flink-data/ha
  serviceAccount: flink
  *podTemplate:*
    *spec:*
      *containers:*
        *- name: flink-main-container*      
          *env:*
          *- name: TZ*
            *value: Asia/Shanghai*
  jobManager:
    resource:
      memory: "2048m"
      cpu: 1
  taskManager:
    resource:
      memory: "2048m"
      cpu: 1
  job:
    jarURI: local:///opt/flink/examples/streaming/StateMachineExample.jar
    parallelism: 2
    upgradeMode: stateless
 
*new Spec:*
apiVersion: [flink.apache.org/v1beta1|http://flink.apache.org/v1beta1]
kind: FlinkDeployment
metadata:
  name: basic-example
spec:
  image: flink:1.16
  flinkVersion: v1_16
  flinkConfiguration:
    taskmanager.numberOfTaskSlots: "2"
    kubernetes.operator.deployment.rollback.enabled: true
    state.savepoints.dir: s3://flink-data/savepoints
    state.checkpoints.dir: s3://flink-data/checkpoints
    high-availability: 
org.apache.flink.kubernetes.highavailability.KubernetesHaServicesFactory
    high-availability.storageDir: s3://flink-data/ha
  serviceAccount: flink
  *podTemplate:*
    *spec:*
      *containers:*
        *-   env:*
          *- name: TZ*
            *value: Asia/Shanghai*
  jobManager:
    resource:
      memory: "2048m"
      cpu: 1
  taskManager:
    resource:
      memory: "2048m"
      cpu: 1
  job:
    jarURI: local:///opt/flink/examples/streaming/StateMachineExample.jar
    parallelism: 2
    upgradeMode: stateless
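
For comparison, the new spec should deploy once the missing container name is 
restored; a minimal corrected podTemplate (my own sketch, mirroring the last 
stable spec above) would be:

{code:yaml}
podTemplate:
  spec:
    containers:
      # "flink-main-container" is the name Flink expects for the main
      # container; omitting it is what triggered the 422 Required-value
      # error quoted above.
      - name: flink-main-container
        env:
          - name: TZ
            value: Asia/Shanghai
{code}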



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-28758) Failed to stop with savepoint

2022-10-13 Thread hjw (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-28758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17616986#comment-17616986
 ] 

hjw commented on FLINK-28758:
-

Other people have also reported similar problems on the Chinese mailing list.

!image-2022-10-13-19-47-56-635.png!

> Failed to stop with savepoint 
> --
>
> Key: FLINK-28758
> URL: https://issues.apache.org/jira/browse/FLINK-28758
> Project: Flink
>  Issue Type: Bug
>  Components: Connectors / Kafka, Runtime / Checkpointing, Runtime / 
> Task
>Affects Versions: 1.15.0
> Environment: Flink version: 1.15.0
> Deploy mode: K8s application mode. A local mini cluster also has this 
> problem.
> Kafka connector: uses the Kafka SourceFunction, not the new API.
>Reporter: hjw
>Priority: Major
> Attachments: image-2022-10-13-19-47-56-635.png
>
>
> I posted a stop-with-savepoint request to the Flink job through the REST API.
> An error happened while closing the Kafka connector.
> The job then enters the RESTARTING state.
> Triggering a savepoint alone (without stopping the job) succeeds.
> {code:java}
> 13:33:42.857 [Kafka Fetcher for Source: nlp-kafka-source -> nlp-clean 
> (1/1)#0] DEBUG org.apache.kafka.clients.consumer.KafkaConsumer - [Consumer 
> clientId=consumer-hjw-3, groupId=hjw] Kafka consumer has been closed
> 13:33:42.857 [Kafka Fetcher for Source: cpp-kafka-source -> cpp-clean 
> (1/1)#0] INFO org.apache.kafka.common.utils.AppInfoParser - App info 
> kafka.consumer for consumer-hjw-4 unregistered
> 13:33:42.857 [Kafka Fetcher for Source: cpp-kafka-source -> cpp-clean 
> (1/1)#0] DEBUG org.apache.kafka.clients.consumer.KafkaConsumer - [Consumer 
> clientId=consumer-hjw-4, groupId=hjw] Kafka consumer has been closed
> 13:33:42.860 [Source: nlp-kafka-source -> nlp-clean (1/1)#0] DEBUG 
> org.apache.flink.streaming.runtime.tasks.StreamTask - Cleanup StreamTask 
> (operators closed: false, cancelled: false)
> 13:33:42.860 [jobmanager-io-thread-4] INFO 
> org.apache.flink.runtime.checkpoint.CheckpointCoordinator - Decline 
> checkpoint 5 by task eeefcb27475446241861ad8db3f33144 of job 
> d6fed247feab1c0bcc1b0dcc2cfb4736 at 79edfa88-ccc3-4140-b0e1-7ce8a7f8669f @ 
> 127.0.0.1 (dataPort=-1).
> org.apache.flink.util.SerializedThrowable: Task name with subtask : Source: 
> nlp-kafka-source -> nlp-clean (1/1)#0 Failure reason: Task has failed.
>  at 
> org.apache.flink.runtime.taskmanager.Task.declineCheckpoint(Task.java:1388)
>  at 
> org.apache.flink.runtime.taskmanager.Task.lambda$triggerCheckpointBarrier$3(Task.java:1331)
>  at 
> java.util.concurrent.CompletableFuture.uniHandle(CompletableFuture.java:822)
>  at 
> java.util.concurrent.CompletableFuture$UniHandle.tryFire(CompletableFuture.java:797)
>  at 
> java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474)
>  at 
> java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1977)
>  at 
> org.apache.flink.streaming.runtime.tasks.SourceStreamTask$LegacySourceFunctionThread.run(SourceStreamTask.java:343)
> Caused by: org.apache.flink.util.SerializedThrowable: 
> org.apache.flink.streaming.connectors.kafka.internals.Handover$ClosedException
>  at 
> java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:292)
>  at 
> java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:308)
>  at 
> java.util.concurrent.CompletableFuture.uniCompose(CompletableFuture.java:943)
>  at 
> java.util.concurrent.CompletableFuture$UniCompose.tryFire(CompletableFuture.java:926)
>  ... 3 common frames omitted
> Caused by: org.apache.flink.util.SerializedThrowable: null
>  at 
> org.apache.flink.streaming.connectors.kafka.internals.Handover.close(Handover.java:177)
>  at 
> org.apache.flink.streaming.connectors.kafka.internals.KafkaFetcher.cancel(KafkaFetcher.java:164)
>  at 
> org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumerBase.cancel(FlinkKafkaConsumerBase.java:945)
>  at 
> org.apache.flink.streaming.api.operators.StreamSource.stop(StreamSource.java:128)
>  at 
> org.apache.flink.streaming.runtime.tasks.SourceStreamTask.stopOperatorForStopWithSavepoint(SourceStreamTask.java:305)
>  at 
> org.apache.flink.streaming.runtime.tasks.SourceStreamTask.lambda$triggerStopWithSavepointAsync$1(SourceStreamTask.java:285)
>  at 
> org.apache.flink.streaming.runtime.tasks.StreamTaskActionExecutor$SynchronizedStreamTaskActionExecutor.runThrowing(StreamTaskActionExecutor.java:93)
>  at org.apache.flink.streaming.runtime.tasks.mailbox.Mail.run(Mail.java:90)
>  at 
> org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor.processMailsWhenDefaultActionUnavailable(MailboxProcessor.java:338)
>  at 
> org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor.processMail(MailboxProcessor.java:324)
>  at 
> org.apache.flink.streaming.runtime.tasks.mailbox.MailboxPro

[jira] [Updated] (FLINK-28758) Failed to stop with savepoint

2022-10-13 Thread hjw (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-28758?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

hjw updated FLINK-28758:

Attachment: image-2022-10-13-19-47-56-635.png

> Failed to stop with savepoint 
> --
>
> Key: FLINK-28758
> URL: https://issues.apache.org/jira/browse/FLINK-28758
> Project: Flink
>  Issue Type: Bug
>  Components: Connectors / Kafka, Runtime / Checkpointing, Runtime / 
> Task
>Affects Versions: 1.15.0
> Environment: Flink version: 1.15.0
> Deploy mode: K8s application mode. A local mini cluster also has this 
> problem.
> Kafka connector: uses the Kafka SourceFunction, not the new API.
>Reporter: hjw
>Priority: Major
> Attachments: image-2022-10-13-19-47-56-635.png
>
>
> I posted a stop-with-savepoint request to the Flink job through the REST API.
> An error happened while closing the Kafka connector.
> The job then enters the RESTARTING state.
> Triggering a savepoint alone (without stopping the job) succeeds.
> {code:java}
> 13:33:42.857 [Kafka Fetcher for Source: nlp-kafka-source -> nlp-clean 
> (1/1)#0] DEBUG org.apache.kafka.clients.consumer.KafkaConsumer - [Consumer 
> clientId=consumer-hjw-3, groupId=hjw] Kafka consumer has been closed
> 13:33:42.857 [Kafka Fetcher for Source: cpp-kafka-source -> cpp-clean 
> (1/1)#0] INFO org.apache.kafka.common.utils.AppInfoParser - App info 
> kafka.consumer for consumer-hjw-4 unregistered
> 13:33:42.857 [Kafka Fetcher for Source: cpp-kafka-source -> cpp-clean 
> (1/1)#0] DEBUG org.apache.kafka.clients.consumer.KafkaConsumer - [Consumer 
> clientId=consumer-hjw-4, groupId=hjw] Kafka consumer has been closed
> 13:33:42.860 [Source: nlp-kafka-source -> nlp-clean (1/1)#0] DEBUG 
> org.apache.flink.streaming.runtime.tasks.StreamTask - Cleanup StreamTask 
> (operators closed: false, cancelled: false)
> 13:33:42.860 [jobmanager-io-thread-4] INFO 
> org.apache.flink.runtime.checkpoint.CheckpointCoordinator - Decline 
> checkpoint 5 by task eeefcb27475446241861ad8db3f33144 of job 
> d6fed247feab1c0bcc1b0dcc2cfb4736 at 79edfa88-ccc3-4140-b0e1-7ce8a7f8669f @ 
> 127.0.0.1 (dataPort=-1).
> org.apache.flink.util.SerializedThrowable: Task name with subtask : Source: 
> nlp-kafka-source -> nlp-clean (1/1)#0 Failure reason: Task has failed.
>  at 
> org.apache.flink.runtime.taskmanager.Task.declineCheckpoint(Task.java:1388)
>  at 
> org.apache.flink.runtime.taskmanager.Task.lambda$triggerCheckpointBarrier$3(Task.java:1331)
>  at 
> java.util.concurrent.CompletableFuture.uniHandle(CompletableFuture.java:822)
>  at 
> java.util.concurrent.CompletableFuture$UniHandle.tryFire(CompletableFuture.java:797)
>  at 
> java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474)
>  at 
> java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1977)
>  at 
> org.apache.flink.streaming.runtime.tasks.SourceStreamTask$LegacySourceFunctionThread.run(SourceStreamTask.java:343)
> Caused by: org.apache.flink.util.SerializedThrowable: 
> org.apache.flink.streaming.connectors.kafka.internals.Handover$ClosedException
>  at 
> java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:292)
>  at 
> java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:308)
>  at 
> java.util.concurrent.CompletableFuture.uniCompose(CompletableFuture.java:943)
>  at 
> java.util.concurrent.CompletableFuture$UniCompose.tryFire(CompletableFuture.java:926)
>  ... 3 common frames omitted
> Caused by: org.apache.flink.util.SerializedThrowable: null
>  at 
> org.apache.flink.streaming.connectors.kafka.internals.Handover.close(Handover.java:177)
>  at 
> org.apache.flink.streaming.connectors.kafka.internals.KafkaFetcher.cancel(KafkaFetcher.java:164)
>  at 
> org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumerBase.cancel(FlinkKafkaConsumerBase.java:945)
>  at 
> org.apache.flink.streaming.api.operators.StreamSource.stop(StreamSource.java:128)
>  at 
> org.apache.flink.streaming.runtime.tasks.SourceStreamTask.stopOperatorForStopWithSavepoint(SourceStreamTask.java:305)
>  at 
> org.apache.flink.streaming.runtime.tasks.SourceStreamTask.lambda$triggerStopWithSavepointAsync$1(SourceStreamTask.java:285)
>  at 
> org.apache.flink.streaming.runtime.tasks.StreamTaskActionExecutor$SynchronizedStreamTaskActionExecutor.runThrowing(StreamTaskActionExecutor.java:93)
>  at org.apache.flink.streaming.runtime.tasks.mailbox.Mail.run(Mail.java:90)
>  at 
> org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor.processMailsWhenDefaultActionUnavailable(MailboxProcessor.java:338)
>  at 
> org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor.processMail(MailboxProcessor.java:324)
>  at 
> org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor.runMailboxLoop(MailboxProcessor.java:201)
>  at 
> org.apache.flink.streaming.runtime.tasks.StreamTask.runMailboxLoop

[jira] [Commented] (FLINK-28758) Failed to stop with savepoint

2022-10-13 Thread hjw (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-28758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17616951#comment-17616951
 ] 

hjw commented on FLINK-28758:
-

Hi [~Yanfei Lei]. Sorry, I made a mistake in my description.

The correct description is that I posted a "stop with savepoint" request.

I have corrected the description.
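
For reference, such a stop-with-savepoint request goes to Flink's REST endpoint 
POST /jobs/<jobID>/stop. Below is a minimal sketch using only the JDK 11 HTTP 
client; the host, port, and savepoint directory are hypothetical placeholders, 
and the job id is the one from the log.

{code:java}
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class StopWithSavepoint {
    public static void main(String[] args) throws Exception {
        String jobId = "d6fed247feab1c0bcc1b0dcc2cfb4736"; // job id from the log above
        // targetDirectory and drain form the request body of /jobs/:jobid/stop
        String body = "{\"targetDirectory\": \"s3://flink-data/savepoints\", \"drain\": false}";
        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("http://localhost:8081/jobs/" + jobId + "/stop"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(body))
                .build();
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        // On success the response carries a trigger id that can be polled via
        // GET /jobs/<jobID>/savepoints/<triggerID> to see whether the stop completed.
        System.out.println(response.statusCode() + " " + response.body());
    }
}
{code}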

> Failed to stop with savepoint 
> --
>
> Key: FLINK-28758
> URL: https://issues.apache.org/jira/browse/FLINK-28758
> Project: Flink
>  Issue Type: Bug
>  Components: Connectors / Kafka, Runtime / Checkpointing, Runtime / 
> Task
>Affects Versions: 1.15.0
> Environment: Flink version: 1.15.0
> Deploy mode: K8s application mode. A local mini cluster also has this 
> problem.
> Kafka connector: uses the Kafka SourceFunction, not the new API.
>Reporter: hjw
>Priority: Major
>
> I posted a stop-with-savepoint request to the Flink job through the REST API.
> An error happened while closing the Kafka connector.
> The job then enters the RESTARTING state.
> Triggering a savepoint alone (without stopping the job) succeeds.
> {code:java}
> 13:33:42.857 [Kafka Fetcher for Source: nlp-kafka-source -> nlp-clean 
> (1/1)#0] DEBUG org.apache.kafka.clients.consumer.KafkaConsumer - [Consumer 
> clientId=consumer-hjw-3, groupId=hjw] Kafka consumer has been closed
> 13:33:42.857 [Kafka Fetcher for Source: cpp-kafka-source -> cpp-clean 
> (1/1)#0] INFO org.apache.kafka.common.utils.AppInfoParser - App info 
> kafka.consumer for consumer-hjw-4 unregistered
> 13:33:42.857 [Kafka Fetcher for Source: cpp-kafka-source -> cpp-clean 
> (1/1)#0] DEBUG org.apache.kafka.clients.consumer.KafkaConsumer - [Consumer 
> clientId=consumer-hjw-4, groupId=hjw] Kafka consumer has been closed
> 13:33:42.860 [Source: nlp-kafka-source -> nlp-clean (1/1)#0] DEBUG 
> org.apache.flink.streaming.runtime.tasks.StreamTask - Cleanup StreamTask 
> (operators closed: false, cancelled: false)
> 13:33:42.860 [jobmanager-io-thread-4] INFO 
> org.apache.flink.runtime.checkpoint.CheckpointCoordinator - Decline 
> checkpoint 5 by task eeefcb27475446241861ad8db3f33144 of job 
> d6fed247feab1c0bcc1b0dcc2cfb4736 at 79edfa88-ccc3-4140-b0e1-7ce8a7f8669f @ 
> 127.0.0.1 (dataPort=-1).
> org.apache.flink.util.SerializedThrowable: Task name with subtask : Source: 
> nlp-kafka-source -> nlp-clean (1/1)#0 Failure reason: Task has failed.
>  at 
> org.apache.flink.runtime.taskmanager.Task.declineCheckpoint(Task.java:1388)
>  at 
> org.apache.flink.runtime.taskmanager.Task.lambda$triggerCheckpointBarrier$3(Task.java:1331)
>  at 
> java.util.concurrent.CompletableFuture.uniHandle(CompletableFuture.java:822)
>  at 
> java.util.concurrent.CompletableFuture$UniHandle.tryFire(CompletableFuture.java:797)
>  at 
> java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474)
>  at 
> java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1977)
>  at 
> org.apache.flink.streaming.runtime.tasks.SourceStreamTask$LegacySourceFunctionThread.run(SourceStreamTask.java:343)
> Caused by: org.apache.flink.util.SerializedThrowable: 
> org.apache.flink.streaming.connectors.kafka.internals.Handover$ClosedException
>  at 
> java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:292)
>  at 
> java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:308)
>  at 
> java.util.concurrent.CompletableFuture.uniCompose(CompletableFuture.java:943)
>  at 
> java.util.concurrent.CompletableFuture$UniCompose.tryFire(CompletableFuture.java:926)
>  ... 3 common frames omitted
> Caused by: org.apache.flink.util.SerializedThrowable: null
>  at 
> org.apache.flink.streaming.connectors.kafka.internals.Handover.close(Handover.java:177)
>  at 
> org.apache.flink.streaming.connectors.kafka.internals.KafkaFetcher.cancel(KafkaFetcher.java:164)
>  at 
> org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumerBase.cancel(FlinkKafkaConsumerBase.java:945)
>  at 
> org.apache.flink.streaming.api.operators.StreamSource.stop(StreamSource.java:128)
>  at 
> org.apache.flink.streaming.runtime.tasks.SourceStreamTask.stopOperatorForStopWithSavepoint(SourceStreamTask.java:305)
>  at 
> org.apache.flink.streaming.runtime.tasks.SourceStreamTask.lambda$triggerStopWithSavepointAsync$1(SourceStreamTask.java:285)
>  at 
> org.apache.flink.streaming.runtime.tasks.StreamTaskActionExecutor$SynchronizedStreamTaskActionExecutor.runThrowing(StreamTaskActionExecutor.java:93)
>  at org.apache.flink.streaming.runtime.tasks.mailbox.Mail.run(Mail.java:90)
>  at 
> org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor.processMailsWhenDefaultActionUnavailable(MailboxProcessor.java:338)
>  at 
> org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor.processMail(MailboxProcessor.java:324)
>  at 
> org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProce

[jira] [Updated] (FLINK-28758) Failed to stop with savepoint

2022-10-13 Thread hjw (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-28758?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

hjw updated FLINK-28758:

Description: 
I posted a stop-with-savepoint request to the Flink job through the REST API.
An error happened while closing the Kafka connector.
The job then enters the RESTARTING state.
Triggering a savepoint alone (without stopping the job) succeeds.
{code:java}
13:33:42.857 [Kafka Fetcher for Source: nlp-kafka-source -> nlp-clean (1/1)#0] 
DEBUG org.apache.kafka.clients.consumer.KafkaConsumer - [Consumer 
clientId=consumer-hjw-3, groupId=hjw] Kafka consumer has been closed
13:33:42.857 [Kafka Fetcher for Source: cpp-kafka-source -> cpp-clean (1/1)#0] 
INFO org.apache.kafka.common.utils.AppInfoParser - App info kafka.consumer for 
consumer-hjw-4 unregistered
13:33:42.857 [Kafka Fetcher for Source: cpp-kafka-source -> cpp-clean (1/1)#0] 
DEBUG org.apache.kafka.clients.consumer.KafkaConsumer - [Consumer 
clientId=consumer-hjw-4, groupId=hjw] Kafka consumer has been closed
13:33:42.860 [Source: nlp-kafka-source -> nlp-clean (1/1)#0] DEBUG 
org.apache.flink.streaming.runtime.tasks.StreamTask - Cleanup StreamTask 
(operators closed: false, cancelled: false)
13:33:42.860 [jobmanager-io-thread-4] INFO 
org.apache.flink.runtime.checkpoint.CheckpointCoordinator - Decline checkpoint 
5 by task eeefcb27475446241861ad8db3f33144 of job 
d6fed247feab1c0bcc1b0dcc2cfb4736 at 79edfa88-ccc3-4140-b0e1-7ce8a7f8669f @ 
127.0.0.1 (dataPort=-1).
org.apache.flink.util.SerializedThrowable: Task name with subtask : Source: 
nlp-kafka-source -> nlp-clean (1/1)#0 Failure reason: Task has failed.
 at org.apache.flink.runtime.taskmanager.Task.declineCheckpoint(Task.java:1388)
 at 
org.apache.flink.runtime.taskmanager.Task.lambda$triggerCheckpointBarrier$3(Task.java:1331)
 at java.util.concurrent.CompletableFuture.uniHandle(CompletableFuture.java:822)
 at 
java.util.concurrent.CompletableFuture$UniHandle.tryFire(CompletableFuture.java:797)
 at 
java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474)
 at 
java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1977)
 at 
org.apache.flink.streaming.runtime.tasks.SourceStreamTask$LegacySourceFunctionThread.run(SourceStreamTask.java:343)
Caused by: org.apache.flink.util.SerializedThrowable: 
org.apache.flink.streaming.connectors.kafka.internals.Handover$ClosedException
 at 
java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:292)
 at 
java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:308)
 at 
java.util.concurrent.CompletableFuture.uniCompose(CompletableFuture.java:943)
 at 
java.util.concurrent.CompletableFuture$UniCompose.tryFire(CompletableFuture.java:926)
 ... 3 common frames omitted
Caused by: org.apache.flink.util.SerializedThrowable: null
 at 
org.apache.flink.streaming.connectors.kafka.internals.Handover.close(Handover.java:177)
 at 
org.apache.flink.streaming.connectors.kafka.internals.KafkaFetcher.cancel(KafkaFetcher.java:164)
 at 
org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumerBase.cancel(FlinkKafkaConsumerBase.java:945)
 at 
org.apache.flink.streaming.api.operators.StreamSource.stop(StreamSource.java:128)
 at 
org.apache.flink.streaming.runtime.tasks.SourceStreamTask.stopOperatorForStopWithSavepoint(SourceStreamTask.java:305)
 at 
org.apache.flink.streaming.runtime.tasks.SourceStreamTask.lambda$triggerStopWithSavepointAsync$1(SourceStreamTask.java:285)
 at 
org.apache.flink.streaming.runtime.tasks.StreamTaskActionExecutor$SynchronizedStreamTaskActionExecutor.runThrowing(StreamTaskActionExecutor.java:93)
 at org.apache.flink.streaming.runtime.tasks.mailbox.Mail.run(Mail.java:90)
 at 
org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor.processMailsWhenDefaultActionUnavailable(MailboxProcessor.java:338)
 at 
org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor.processMail(MailboxProcessor.java:324)
 at 
org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor.runMailboxLoop(MailboxProcessor.java:201)
 at 
org.apache.flink.streaming.runtime.tasks.StreamTask.runMailboxLoop(StreamTask.java:804)
 at 
org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:753)
 at 
org.apache.flink.runtime.taskmanager.Task.runWithSystemExitMonitoring(Task.java:948)
 at org.apache.flink.runtime.taskmanager.Task.restoreAndInvoke(Task.java:927)
 at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:741)
 at org.apache.flink.runtime.taskmanager.Task.run(Task.java:563)
 at java.lang.Thread.run(Thread.java:748)
13:34:00.925 [flink-akka.actor.default-dispatcher-21] DEBUG 
org.apache.flink.runtime.jobmaster.JobMaster - Trigger heartbeat request.
13:34:00.926 [flink-akka.actor.default-dispatcher-22] DEBUG 
org.apache.flink.runtime.resourcemanager.StandaloneResourceManager - Trigger 
heartbeat request.
13:34:00.926 [flink-akka.actor.default-dispatcher-21] DEBUG 
org.apache.flink.runtime.taskex

[jira] [Updated] (FLINK-29187) mvn spotless:apply to format code in Windows

2022-09-24 Thread hjw (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-29187?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

hjw updated FLINK-29187:

Component/s: Build System
 Build System / CI

> mvn spotless:apply to format code in Windows
> 
>
> Key: FLINK-29187
> URL: https://issues.apache.org/jira/browse/FLINK-29187
> Project: Flink
>  Issue Type: Improvement
>  Components: Build System, Build System / CI
>Affects Versions: 1.15.2
>Reporter: hjw
>Priority: Major
>
> When using the mvn spotless:apply command to format Flink code on Windows, 
> the EOL of files is replaced with CRLF.
> I think we can add a `.gitattributes` file to force the code EOL to be LF.
> The related issue is [EOL CRLF vs 
> LF|https://github.com/diffplug/spotless/issues/39]
>  
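
For illustration, such a `.gitattributes` could look like the following minimal 
sketch (my own example, not the actual patch); `text=auto eol=lf` makes git 
normalize the files it detects as text to LF in the repository and check them 
out with LF even on Windows:

{code}
# Force LF line endings for all text files, even on Windows checkouts
* text=auto eol=lf
{code}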



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Comment Edited] (FLINK-29187) mvn spotless:apply to format code in Windows

2022-09-03 Thread hjw (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-29187?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17599938#comment-17599938
 ] 

hjw edited comment on FLINK-29187 at 9/3/22 2:27 PM:
-

I'd like to take this ticket to fix it


was (Author: JIRAUSER280998):
I'd like to take this ticket to fix it if it's confirmed that this is a problem

> mvn spotless:apply to format code in Windows
> 
>
> Key: FLINK-29187
> URL: https://issues.apache.org/jira/browse/FLINK-29187
> Project: Flink
>  Issue Type: Improvement
>Affects Versions: 1.15.2
>Reporter: hjw
>Priority: Major
>
> When using the mvn spotless:apply command to format Flink code on Windows, 
> the EOL of files is replaced with CRLF.
> I think we can add a `.gitattributes` file to force the code EOL to be LF.
> The related issue is [EOL CRLF vs 
> LF|https://github.com/diffplug/spotless/issues/39]
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-29187) mvn spotless:apply to format code in Windows

2022-09-03 Thread hjw (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-29187?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17599938#comment-17599938
 ] 

hjw commented on FLINK-29187:
-

I'd like to take this ticket to fix it if it's confirmed that this is a problem

> mvn spotless:apply to format code in Windows
> 
>
> Key: FLINK-29187
> URL: https://issues.apache.org/jira/browse/FLINK-29187
> Project: Flink
>  Issue Type: Improvement
>Affects Versions: 1.15.2
>Reporter: hjw
>Priority: Major
>
> When using the mvn spotless:apply command to format Flink code on Windows, 
> the EOL of files is replaced with CRLF.
> I think we can add a `.gitattributes` file to force the code EOL to be LF.
> The related issue is [EOL CRLF vs 
> LF|https://github.com/diffplug/spotless/issues/39]
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (FLINK-29187) mvn spotless:apply to format code in Windows

2022-09-03 Thread hjw (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-29187?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

hjw updated FLINK-29187:

Description: 
When using the mvn spotless:apply command to format Flink code on Windows, the 
EOL of files is replaced with CRLF.

I think we can add a `.gitattributes` file to force the code EOL to be LF.

The related issue is [EOL CRLF vs 
LF|https://github.com/diffplug/spotless/issues/39]

 

  was:
When using the mvn spotless:apply command to format Flink code on Windows, the 
EOL of files is replaced with CRLF.

I think we can add a `.gitattributes` file to force the code EOL to be LF.

The related issue is [Link title|https://github.com/diffplug/spotless/issues/39]

 


> mvn spotless:apply to format code in Windows
> 
>
> Key: FLINK-29187
> URL: https://issues.apache.org/jira/browse/FLINK-29187
> Project: Flink
>  Issue Type: Improvement
>Affects Versions: 1.15.2
>Reporter: hjw
>Priority: Major
>
> When using the mvn spotless:apply command to format Flink code on Windows, 
> the EOL of files is replaced with CRLF.
> I think we can add a `.gitattributes` file to force the code EOL to be LF.
> The related issue is [EOL CRLF vs 
> LF|https://github.com/diffplug/spotless/issues/39]
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-29187) mvn spotless:apply to format code in Windows

2022-09-03 Thread hjw (Jira)
hjw created FLINK-29187:
---

 Summary: mvn spotless:apply to format code in Windows
 Key: FLINK-29187
 URL: https://issues.apache.org/jira/browse/FLINK-29187
 Project: Flink
  Issue Type: Improvement
Affects Versions: 1.15.2
Reporter: hjw


When using the mvn spotless:apply command to format Flink code on Windows, the 
EOL of files is replaced with CRLF.

I think we can add a `.gitattributes` file to force the code EOL to be LF.

The related issue is [Link title|https://github.com/diffplug/spotless/issues/39]

 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-29089) Error when running test cases in Windows

2022-08-23 Thread hjw (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-29089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17583712#comment-17583712
 ] 

hjw commented on FLINK-29089:
-

[~wangyang0918] I'd like to file a PR to fix it. Please help assign the ticket 
to me.

> Error when running test cases in Windows
> ---
>
> Key: FLINK-29089
> URL: https://issues.apache.org/jira/browse/FLINK-29089
> Project: Flink
>  Issue Type: Improvement
>  Components: Deployment / Kubernetes
>Affects Versions: 1.15.1
> Environment: deploy env: Windows10
> flink version:1.15
>  
>Reporter: hjw
>Priority: Major
>
> When I run mvn clean install, it runs the Flink test cases.
> However, I get the following failures:
> [ERROR] Failures:
> [ERROR]   
> KubernetesClusterDescriptorTest.testDeployApplicationClusterWithNonLocalSchema:155
>  Previous method call should have failed but it returned: 
> org.apache.flink.kubernetes.KubernetesClusterDescriptor$$Lambda$839/1619964974@70e5737f
> [ERROR]   
> AbstractKubernetesParametersTest.testGetLocalHadoopConfigurationDirectoryFromHadoop1HomeEnv:132->runTestWithEmptyEnv:149->lambda$testGetLocalHadoopConfigurationDirectoryFromHadoop1HomeEnv$3:141
> Expected: is "C:\Users\10104\AppData\Local\Temp\junit5662202040601670287/conf"
>      but: was 
> "C:\Users\10104\AppData\Local\Temp\junit5662202040601670287\conf"
> [ERROR]   
> AbstractKubernetesParametersTest.testGetLocalHadoopConfigurationDirectoryFromHadoop2HomeEnv:117->runTestWithEmptyEnv:149->lambda$testGetLocalHadoopConfigurationDirectoryFromHadoop2HomeEnv$2:126
> Expected: is 
> "C:\Users\10104\AppData\Local\Temp\junit7094401822178578683/etc/hadoop"
>      but: was 
> "C:\Users\10104\AppData\Local\Temp\junit7094401822178578683\etc\hadoop"
> [ERROR]   
> KubernetesUtilsTest.testLoadPodFromTemplateWithNonExistPathShouldFail:110
> Expected: Expected error message is "Pod template file 
> /path/of/non-exist.yaml does not exist."
>      but: The throwable <Pod template file \path\of\non-exist.yaml does not 
> exist.> does not contain the expected error message "Pod template file 
> /path/of/non-exist.yaml does not exist."
>  
> I believe the errors occur because of the different file-system path 
> separators (Unix vs. Windows).



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-29089) Error when running test cases in Windows

2022-08-23 Thread hjw (Jira)
hjw created FLINK-29089:
---

 Summary: Error when running test cases in Windows
 Key: FLINK-29089
 URL: https://issues.apache.org/jira/browse/FLINK-29089
 Project: Flink
  Issue Type: Improvement
  Components: Deployment / Kubernetes
Affects Versions: 1.15.1
 Environment: deploy env: Windows10

flink version:1.15

 
Reporter: hjw


When I run mvn clean install, it runs the Flink test cases.
However, I get the following failures:
[ERROR] Failures:
[ERROR]   
KubernetesClusterDescriptorTest.testDeployApplicationClusterWithNonLocalSchema:155
 Previous method call should have failed but it returned: 
org.apache.flink.kubernetes.KubernetesClusterDescriptor$$Lambda$839/1619964974@70e5737f
[ERROR]   
AbstractKubernetesParametersTest.testGetLocalHadoopConfigurationDirectoryFromHadoop1HomeEnv:132->runTestWithEmptyEnv:149->lambda$testGetLocalHadoopConfigurationDirectoryFromHadoop1HomeEnv$3:141
Expected: is "C:\Users\10104\AppData\Local\Temp\junit5662202040601670287/conf"
     but: was "C:\Users\10104\AppData\Local\Temp\junit5662202040601670287\conf"
[ERROR]   
AbstractKubernetesParametersTest.testGetLocalHadoopConfigurationDirectoryFromHadoop2HomeEnv:117->runTestWithEmptyEnv:149->lambda$testGetLocalHadoopConfigurationDirectoryFromHadoop2HomeEnv$2:126
Expected: is 
"C:\Users\10104\AppData\Local\Temp\junit7094401822178578683/etc/hadoop"
     but: was 
"C:\Users\10104\AppData\Local\Temp\junit7094401822178578683\etc\hadoop"
[ERROR]   
KubernetesUtilsTest.testLoadPodFromTemplateWithNonExistPathShouldFail:110
Expected: Expected error message is "Pod template file /path/of/non-exist.yaml 
does not exist."
     but: The throwable <Pod template file \path\of\non-exist.yaml does not 
exist.> does not contain the expected error message "Pod template file 
/path/of/non-exist.yaml does not exist."
 
I believe the errors occur because of the different file-system path separators 
(Unix vs. Windows).
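
These assertion failures come from test expectations that join path segments 
with a hard-coded "/", while the code under test produces the platform-native 
separator. A small self-contained sketch of the mismatch (my own example, not 
the Flink test code):

{code:java}
import java.io.File;

public class SeparatorDemo {
    public static void main(String[] args) {
        File tmpDir = new File(System.getProperty("java.io.tmpdir"), "junit5662202040601670287");

        // A hard-coded "/" keeps the forward slash even on Windows...
        String expected = tmpDir.getAbsolutePath() + "/conf";
        // ...while File joins with the platform separator ("\" on Windows),
        // so the two strings differ although they name the same directory.
        String actual = new File(tmpDir, "conf").getAbsolutePath();
        System.out.println(expected.equals(actual)); // false on Windows, true on Unix

        // Building the expectation with File.separator removes the mismatch:
        String portable = tmpDir.getAbsolutePath() + File.separator + "conf";
        System.out.println(portable.equals(actual)); // true on both platforms
    }
}
{code}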



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-28915) Flink Native k8s mode jar location support s3 schema

2022-08-21 Thread hjw (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-28915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17582588#comment-17582588
 ] 

hjw commented on FLINK-28915:
-

[~wangyang0918] Could you tell me more about how to use the credentials for 
filesystems (HDFS, S3, OSS, GFS, etc.)? I have completed the basic development 
and the manual testing of some of the filesystems (OSS, S3, local, file), but 
for the other filesystems I need an environment for testing.

> Flink Native k8s mode jar location support s3 schema 
> --
>
> Key: FLINK-28915
> URL: https://issues.apache.org/jira/browse/FLINK-28915
> Project: Flink
>  Issue Type: Improvement
>  Components: Deployment / Kubernetes, flink-contrib
>Reporter: hjw
>Assignee: hjw
>Priority: Major
>
> As the Flink documentation shows, local is the only supported scheme in 
> native K8s deployment.
> Is there a plan to support the s3 filesystem? Thanks.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-28915) Flink Native k8s mode jar location support s3 schema

2022-08-18 Thread hjw (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-28915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17581433#comment-17581433
 ] 

hjw commented on FLINK-28915:
-

[~wangyang0918] [~aitozi] Do the test cases need to cover all Flink-supported 
filesystems? In 
[https://github.com/apache/flink-kubernetes-operator/pull/168], only the http 
and https schemes are covered. Do the other filesystems require manual 
verification?

> Flink Native k8s mode jar location support s3 schema 
> --
>
> Key: FLINK-28915
> URL: https://issues.apache.org/jira/browse/FLINK-28915
> Project: Flink
>  Issue Type: Improvement
>  Components: Deployment / Kubernetes, flink-contrib
>Affects Versions: 1.15.0, 1.15.1
>Reporter: hjw
>Assignee: hjw
>Priority: Major
>
> As the Flink documentation shows, local is the only supported scheme in 
> native K8s deployment.
> Is there a plan to support the s3 filesystem? Thanks.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-28915) Flink Native k8s mode jar location support s3 schema

2022-08-16 Thread hjw (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-28915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17580362#comment-17580362
 ] 

hjw commented on FLINK-28915:
-

Thanks [~aitozi] for the advice.

Thanks [~wangyang0918] for assigning the ticket.

> Flink Native k8s mode jar location support s3 schema 
> --
>
> Key: FLINK-28915
> URL: https://issues.apache.org/jira/browse/FLINK-28915
> Project: Flink
>  Issue Type: Improvement
>  Components: Deployment / Kubernetes, flink-contrib
>Affects Versions: 1.15.0, 1.15.1
>Reporter: hjw
>Assignee: hjw
>Priority: Major
>
> As the Flink documentation shows, local is the only supported scheme in 
> native K8s deployment.
> Is there a plan to support the s3 filesystem? Thanks.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-28915) Flink Native k8s mode jar location support s3 schema

2022-08-15 Thread hjw (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-28915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17579729#comment-17579729
 ] 

hjw commented on FLINK-28915:
-

[~aitozi] Thank you. I'm also interested in implementing this feature, but I 
have never committed a PR before. May I be involved in the implementation of 
this PR, please?

Or is there something I can do in this PR? I want to participate in 
contributing to the Flink community.

> Flink Native k8s mode jar location support s3 schema 
> --
>
> Key: FLINK-28915
> URL: https://issues.apache.org/jira/browse/FLINK-28915
> Project: Flink
>  Issue Type: Improvement
>  Components: Deployment / Kubernetes, flink-contrib
>Affects Versions: 1.15.0, 1.15.1
>Reporter: hjw
>Priority: Major
>
> As the Flink documentation shows, local is the only supported scheme in 
> native K8s deployment.
> Is there a plan to support the s3 filesystem? Thanks.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-28915) Flink Native k8s mode jar location support s3 schema

2022-08-15 Thread hjw (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-28915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17579603#comment-17579603
 ] 

hjw commented on FLINK-28915:
-

I successfully modified the class KubernetesApplicationClusterEntryPoint.java 
to support the S3 scheme.

Here is the logic of the modification:
1. Read the jar location in S3 from the pipeline.jars parameter.
2. Download the jar from S3 to a local path in the pod (JobManager).
3. Replace the pipeline.jars parameter with the local path (local scheme).

I know such an implementation is not elegant and not compatible with other 
remote DFS schemes (OSS, HDFS, etc.). I think the more elegant implementation 
is to use the Flink FileSystem abstraction to connect to each DFS scheme, as 
sketched below.

However, I notice that the Flink filesystems are configured when the Flink 
cluster starts, but the job's PackagedProgram is initialized in 
KubernetesApplicationClusterEntryPoint before the call 
"ClusterEntrypoint.runClusterEntrypoint(KubernetesApplicationClusterEntryPoint)".

> Flink Native k8s mode jar location support s3 schema 
> --
>
> Key: FLINK-28915
> URL: https://issues.apache.org/jira/browse/FLINK-28915
> Project: Flink
>  Issue Type: Improvement
>  Components: Deployment / Kubernetes, flink-contrib
>Affects Versions: 1.15.0, 1.15.1
>Reporter: hjw
>Priority: Major
>
> As the Flink documentation shows, local is the only supported scheme in 
> native K8s deployment.
> Is there a plan to support the s3 filesystem? Thanks.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-28915) Flink Native k8s mode jar location support s3 schema

2022-08-10 Thread hjw (Jira)
hjw created FLINK-28915:
---

 Summary: Flink Native k8s mode jar location support s3 schema 
 Key: FLINK-28915
 URL: https://issues.apache.org/jira/browse/FLINK-28915
 Project: Flink
  Issue Type: Improvement
  Components: Deployment / Kubernetes, flink-contrib
Affects Versions: 1.15.1, 1.15.0
Reporter: hjw


As the Flink documentation shows, local is the only supported scheme in native 
K8s deployment.
Is there a plan to support the s3 filesystem? Thanks.




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (FLINK-28758) Failed to stop with savepoint

2022-08-01 Thread hjw (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-28758?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

hjw updated FLINK-28758:

Component/s: Runtime / Checkpointing
 Runtime / Task

> Failed to stop with savepoint 
> --
>
> Key: FLINK-28758
> URL: https://issues.apache.org/jira/browse/FLINK-28758
> Project: Flink
>  Issue Type: Bug
>  Components: Connectors / Kafka, Runtime / Checkpointing, Runtime / 
> Task
>Affects Versions: 1.15.0
> Environment: Flink version: 1.15.0
> Deploy mode: K8s application mode. A local mini cluster also has this 
> problem.
> Kafka connector: uses the Kafka SourceFunction, not the new API.
>Reporter: hjw
>Priority: Major
>
> I posted a "save with savepoint" request to the Flink job through the REST API.
> An error happened while closing the Kafka connector.
> The job then enters the RESTARTING state.
> Triggering a savepoint alone (without stopping the job) succeeds.
> {code:java}
> 13:33:42.857 [Kafka Fetcher for Source: nlp-kafka-source -> nlp-clean 
> (1/1)#0] DEBUG org.apache.kafka.clients.consumer.KafkaConsumer - [Consumer 
> clientId=consumer-hjw-3, groupId=hjw] Kafka consumer has been closed
> 13:33:42.857 [Kafka Fetcher for Source: cpp-kafka-source -> cpp-clean 
> (1/1)#0] INFO org.apache.kafka.common.utils.AppInfoParser - App info 
> kafka.consumer for consumer-hjw-4 unregistered
> 13:33:42.857 [Kafka Fetcher for Source: cpp-kafka-source -> cpp-clean 
> (1/1)#0] DEBUG org.apache.kafka.clients.consumer.KafkaConsumer - [Consumer 
> clientId=consumer-hjw-4, groupId=hjw] Kafka consumer has been closed
> 13:33:42.860 [Source: nlp-kafka-source -> nlp-clean (1/1)#0] DEBUG 
> org.apache.flink.streaming.runtime.tasks.StreamTask - Cleanup StreamTask 
> (operators closed: false, cancelled: false)
> 13:33:42.860 [jobmanager-io-thread-4] INFO 
> org.apache.flink.runtime.checkpoint.CheckpointCoordinator - Decline 
> checkpoint 5 by task eeefcb27475446241861ad8db3f33144 of job 
> d6fed247feab1c0bcc1b0dcc2cfb4736 at 79edfa88-ccc3-4140-b0e1-7ce8a7f8669f @ 
> 127.0.0.1 (dataPort=-1).
> org.apache.flink.util.SerializedThrowable: Task name with subtask : Source: 
> nlp-kafka-source -> nlp-clean (1/1)#0 Failure reason: Task has failed.
>  at 
> org.apache.flink.runtime.taskmanager.Task.declineCheckpoint(Task.java:1388)
>  at 
> org.apache.flink.runtime.taskmanager.Task.lambda$triggerCheckpointBarrier$3(Task.java:1331)
>  at 
> java.util.concurrent.CompletableFuture.uniHandle(CompletableFuture.java:822)
>  at 
> java.util.concurrent.CompletableFuture$UniHandle.tryFire(CompletableFuture.java:797)
>  at 
> java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474)
>  at 
> java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1977)
>  at 
> org.apache.flink.streaming.runtime.tasks.SourceStreamTask$LegacySourceFunctionThread.run(SourceStreamTask.java:343)
> Caused by: org.apache.flink.util.SerializedThrowable: 
> org.apache.flink.streaming.connectors.kafka.internals.Handover$ClosedException
>  at 
> java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:292)
>  at 
> java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:308)
>  at 
> java.util.concurrent.CompletableFuture.uniCompose(CompletableFuture.java:943)
>  at 
> java.util.concurrent.CompletableFuture$UniCompose.tryFire(CompletableFuture.java:926)
>  ... 3 common frames omitted
> Caused by: org.apache.flink.util.SerializedThrowable: null
>  at 
> org.apache.flink.streaming.connectors.kafka.internals.Handover.close(Handover.java:177)
>  at 
> org.apache.flink.streaming.connectors.kafka.internals.KafkaFetcher.cancel(KafkaFetcher.java:164)
>  at 
> org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumerBase.cancel(FlinkKafkaConsumerBase.java:945)
>  at 
> org.apache.flink.streaming.api.operators.StreamSource.stop(StreamSource.java:128)
>  at 
> org.apache.flink.streaming.runtime.tasks.SourceStreamTask.stopOperatorForStopWithSavepoint(SourceStreamTask.java:305)
>  at 
> org.apache.flink.streaming.runtime.tasks.SourceStreamTask.lambda$triggerStopWithSavepointAsync$1(SourceStreamTask.java:285)
>  at 
> org.apache.flink.streaming.runtime.tasks.StreamTaskActionExecutor$SynchronizedStreamTaskActionExecutor.runThrowing(StreamTaskActionExecutor.java:93)
>  at org.apache.flink.streaming.runtime.tasks.mailbox.Mail.run(Mail.java:90)
>  at 
> org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor.processMailsWhenDefaultActionUnavailable(MailboxProcessor.java:338)
>  at 
> org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor.processMail(MailboxProcessor.java:324)
>  at 
> org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor.runMailboxLoop(MailboxProcessor.java:201)
>  at 
> org.apache.flink.streaming.runtime.tasks.StreamTask.runMailboxLoop(StreamTask.java:804)
>  at 
> org.a

[jira] [Updated] (FLINK-28758) Failed to stop with savepoint

2022-07-31 Thread hjw (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-28758?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

hjw updated FLINK-28758:

Description: 
I posted a stop-with-savepoint request to the Flink job through the REST API.
An error happened while the Kafka connector was closing.
The job then enters the RESTARTING state.
Triggering a savepoint alone (without stopping the job) succeeds.
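
For reference, a minimal sketch of the request being described, assuming a JobManager REST endpoint on localhost:8081; the job id is taken from the log below and the savepoint target directory is a placeholder. Stop-with-savepoint is {{POST /jobs/<jobID>/stop}}, while triggering a savepoint without stopping the job is {{POST /jobs/<jobID>/savepoints}}:

{code:java}
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class StopWithSavepoint {
    public static void main(String[] args) throws Exception {
        String jobId = "d6fed247feab1c0bcc1b0dcc2cfb4736"; // job id from the log below
        // drain=false stops at the next possible point without emitting MAX_WATERMARK
        String body = "{\"targetDirectory\": \"hdfs:///flink/savepoints\", \"drain\": false}";
        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("http://localhost:8081/jobs/" + jobId + "/stop"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(body))
                .build();
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        // The response carries a trigger id; the savepoint status can then be
        // polled under /jobs/<jobID>/savepoints/<triggerId>.
        System.out.println(response.statusCode() + " " + response.body());
    }
}
{code}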


{code:java}
13:33:42.857 [Kafka Fetcher for Source: nlp-kafka-source -> nlp-clean (1/1)#0] 
DEBUG org.apache.kafka.clients.consumer.KafkaConsumer - [Consumer 
clientId=consumer-hjw-3, groupId=hjw] Kafka consumer has been closed
13:33:42.857 [Kafka Fetcher for Source: cpp-kafka-source -> cpp-clean (1/1)#0] 
INFO org.apache.kafka.common.utils.AppInfoParser - App info kafka.consumer for 
consumer-hjw-4 unregistered
13:33:42.857 [Kafka Fetcher for Source: cpp-kafka-source -> cpp-clean (1/1)#0] 
DEBUG org.apache.kafka.clients.consumer.KafkaConsumer - [Consumer 
clientId=consumer-hjw-4, groupId=hjw] Kafka consumer has been closed
13:33:42.860 [Source: nlp-kafka-source -> nlp-clean (1/1)#0] DEBUG 
org.apache.flink.streaming.runtime.tasks.StreamTask - Cleanup StreamTask 
(operators closed: false, cancelled: false)
13:33:42.860 [jobmanager-io-thread-4] INFO 
org.apache.flink.runtime.checkpoint.CheckpointCoordinator - Decline checkpoint 
5 by task eeefcb27475446241861ad8db3f33144 of job 
d6fed247feab1c0bcc1b0dcc2cfb4736 at 79edfa88-ccc3-4140-b0e1-7ce8a7f8669f @ 
127.0.0.1 (dataPort=-1).
org.apache.flink.util.SerializedThrowable: Task name with subtask : Source: 
nlp-kafka-source -> nlp-clean (1/1)#0 Failure reason: Task has failed.
 at org.apache.flink.runtime.taskmanager.Task.declineCheckpoint(Task.java:1388)
 at 
org.apache.flink.runtime.taskmanager.Task.lambda$triggerCheckpointBarrier$3(Task.java:1331)
 at java.util.concurrent.CompletableFuture.uniHandle(CompletableFuture.java:822)
 at 
java.util.concurrent.CompletableFuture$UniHandle.tryFire(CompletableFuture.java:797)
 at 
java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474)
 at 
java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1977)
 at 
org.apache.flink.streaming.runtime.tasks.SourceStreamTask$LegacySourceFunctionThread.run(SourceStreamTask.java:343)
Caused by: org.apache.flink.util.SerializedThrowable: 
org.apache.flink.streaming.connectors.kafka.internals.Handover$ClosedException
 at 
java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:292)
 at 
java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:308)
 at 
java.util.concurrent.CompletableFuture.uniCompose(CompletableFuture.java:943)
 at 
java.util.concurrent.CompletableFuture$UniCompose.tryFire(CompletableFuture.java:926)
 ... 3 common frames omitted
Caused by: org.apache.flink.util.SerializedThrowable: null
 at 
org.apache.flink.streaming.connectors.kafka.internals.Handover.close(Handover.java:177)
 at 
org.apache.flink.streaming.connectors.kafka.internals.KafkaFetcher.cancel(KafkaFetcher.java:164)
 at 
org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumerBase.cancel(FlinkKafkaConsumerBase.java:945)
 at 
org.apache.flink.streaming.api.operators.StreamSource.stop(StreamSource.java:128)
 at 
org.apache.flink.streaming.runtime.tasks.SourceStreamTask.stopOperatorForStopWithSavepoint(SourceStreamTask.java:305)
 at 
org.apache.flink.streaming.runtime.tasks.SourceStreamTask.lambda$triggerStopWithSavepointAsync$1(SourceStreamTask.java:285)
 at 
org.apache.flink.streaming.runtime.tasks.StreamTaskActionExecutor$SynchronizedStreamTaskActionExecutor.runThrowing(StreamTaskActionExecutor.java:93)
 at org.apache.flink.streaming.runtime.tasks.mailbox.Mail.run(Mail.java:90)
 at 
org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor.processMailsWhenDefaultActionUnavailable(MailboxProcessor.java:338)
 at 
org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor.processMail(MailboxProcessor.java:324)
 at 
org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor.runMailboxLoop(MailboxProcessor.java:201)
 at 
org.apache.flink.streaming.runtime.tasks.StreamTask.runMailboxLoop(StreamTask.java:804)
 at 
org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:753)
 at 
org.apache.flink.runtime.taskmanager.Task.runWithSystemExitMonitoring(Task.java:948)
 at org.apache.flink.runtime.taskmanager.Task.restoreAndInvoke(Task.java:927)
 at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:741)
 at org.apache.flink.runtime.taskmanager.Task.run(Task.java:563)
 at java.lang.Thread.run(Thread.java:748)
13:34:00.925 [flink-akka.actor.default-dispatcher-21] DEBUG 
org.apache.flink.runtime.jobmaster.JobMaster - Trigger heartbeat request.
13:34:00.926 [flink-akka.actor.default-dispatcher-22] DEBUG 
org.apache.flink.runtime.resourcemanager.StandaloneResourceManager - Trigger 
heartbeat request.
13:34:00.926 [flink-akka.actor.default-dispatcher-21] DEBUG 
org.apache.flink.runtime.task

[jira] [Created] (FLINK-28758) Failed to stop with savepoint

2022-07-31 Thread hjw (Jira)
hjw created FLINK-28758:
---

 Summary: Failed to stop with savepoint 
 Key: FLINK-28758
 URL: https://issues.apache.org/jira/browse/FLINK-28758
 Project: Flink
  Issue Type: Bug
  Components: Connectors / Kafka
Affects Versions: 1.15.0
 Environment: Flink version: 1.15.0
deploy mode: K8s application mode. A local mini cluster also has this problem.
Kafka connector: uses the legacy Kafka SourceFunction, not the new Source API.

Reporter: hjw


I posted a stop-with-savepoint request to the Flink job through the REST API.
An error happened while the Kafka connector was closing.


{code:java}
13:33:42.857 [Kafka Fetcher for Source: nlp-kafka-source -> nlp-clean (1/1)#0] 
DEBUG org.apache.kafka.clients.consumer.KafkaConsumer - [Consumer 
clientId=consumer-hjw-3, groupId=hjw] Kafka consumer has been closed
13:33:42.857 [Kafka Fetcher for Source: cpp-kafka-source -> cpp-clean (1/1)#0] 
INFO org.apache.kafka.common.utils.AppInfoParser - App info kafka.consumer for 
consumer-hjw-4 unregistered
13:33:42.857 [Kafka Fetcher for Source: cpp-kafka-source -> cpp-clean (1/1)#0] 
DEBUG org.apache.kafka.clients.consumer.KafkaConsumer - [Consumer 
clientId=consumer-hjw-4, groupId=hjw] Kafka consumer has been closed
13:33:42.860 [Source: nlp-kafka-source -> nlp-clean (1/1)#0] DEBUG 
org.apache.flink.streaming.runtime.tasks.StreamTask - Cleanup StreamTask 
(operators closed: false, cancelled: false)
13:33:42.860 [jobmanager-io-thread-4] INFO 
org.apache.flink.runtime.checkpoint.CheckpointCoordinator - Decline checkpoint 
5 by task eeefcb27475446241861ad8db3f33144 of job 
d6fed247feab1c0bcc1b0dcc2cfb4736 at 79edfa88-ccc3-4140-b0e1-7ce8a7f8669f @ 
127.0.0.1 (dataPort=-1).
org.apache.flink.util.SerializedThrowable: Task name with subtask : Source: 
nlp-kafka-source -> nlp-clean (1/1)#0 Failure reason: Task has failed.
 at org.apache.flink.runtime.taskmanager.Task.declineCheckpoint(Task.java:1388)
 at 
org.apache.flink.runtime.taskmanager.Task.lambda$triggerCheckpointBarrier$3(Task.java:1331)
 at java.util.concurrent.CompletableFuture.uniHandle(CompletableFuture.java:822)
 at 
java.util.concurrent.CompletableFuture$UniHandle.tryFire(CompletableFuture.java:797)
 at 
java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474)
 at 
java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1977)
 at 
org.apache.flink.streaming.runtime.tasks.SourceStreamTask$LegacySourceFunctionThread.run(SourceStreamTask.java:343)
Caused by: org.apache.flink.util.SerializedThrowable: 
org.apache.flink.streaming.connectors.kafka.internals.Handover$ClosedException
 at 
java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:292)
 at 
java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:308)
 at 
java.util.concurrent.CompletableFuture.uniCompose(CompletableFuture.java:943)
 at 
java.util.concurrent.CompletableFuture$UniCompose.tryFire(CompletableFuture.java:926)
 ... 3 common frames omitted
Caused by: org.apache.flink.util.SerializedThrowable: null
 at 
org.apache.flink.streaming.connectors.kafka.internals.Handover.close(Handover.java:177)
 at 
org.apache.flink.streaming.connectors.kafka.internals.KafkaFetcher.cancel(KafkaFetcher.java:164)
 at 
org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumerBase.cancel(FlinkKafkaConsumerBase.java:945)
 at 
org.apache.flink.streaming.api.operators.StreamSource.stop(StreamSource.java:128)
 at 
org.apache.flink.streaming.runtime.tasks.SourceStreamTask.stopOperatorForStopWithSavepoint(SourceStreamTask.java:305)
 at 
org.apache.flink.streaming.runtime.tasks.SourceStreamTask.lambda$triggerStopWithSavepointAsync$1(SourceStreamTask.java:285)
 at 
org.apache.flink.streaming.runtime.tasks.StreamTaskActionExecutor$SynchronizedStreamTaskActionExecutor.runThrowing(StreamTaskActionExecutor.java:93)
 at org.apache.flink.streaming.runtime.tasks.mailbox.Mail.run(Mail.java:90)
 at 
org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor.processMailsWhenDefaultActionUnavailable(MailboxProcessor.java:338)
 at 
org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor.processMail(MailboxProcessor.java:324)
 at 
org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor.runMailboxLoop(MailboxProcessor.java:201)
 at 
org.apache.flink.streaming.runtime.tasks.StreamTask.runMailboxLoop(StreamTask.java:804)
 at 
org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:753)
 at 
org.apache.flink.runtime.taskmanager.Task.runWithSystemExitMonitoring(Task.java:948)
 at org.apache.flink.runtime.taskmanager.Task.restoreAndInvoke(Task.java:927)
 at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:741)
 at org.apache.flink.runtime.taskmanager.Task.run(Task.java:563)
 at java.lang.Thread.run(Thread.java:748)
13:34:00.925 [flink-akka.actor.default-dispatcher-21] DEBUG 
org.apache.flink.runtime.jobmaster.JobMaster - Trigger 

[jira] [Comment Edited] (FLINK-27539) support consuming update and delete changes In Windowing TVFs

2022-05-12 Thread hjw (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-27539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17535961#comment-17535961
 ] 

hjw edited comment on FLINK-27539 at 5/12/22 8:33 AM:
--

[~martijnvisser] Yes, I am interested in the total amount of money for a day. In
fact, I prefer the CUMULATE window to the TUMBLE window, because I need the
CUMULATE window result at every step size. But the CUMULATE window is not
supported in Group Window Aggregation.
By the way, my source table is a CDC table, so updates and deletes should be
handled correctly in the window.
 thx.
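
For reference, a minimal Java Table API sketch of the two variants being compared, under stated assumptions: the connector options below are placeholders, the schema and watermark are taken from the TableException quoted in the issue, and the date pattern is assumed to be 'yyyy-MM-dd' (the archive shows it truncated to '-MM-dd'):

{code:java}
import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.TableEnvironment;

public class CumulateOnCdcTable {
    public static void main(String[] args) {
        TableEnvironment tEnv = TableEnvironment.create(
                EnvironmentSettings.newInstance().inStreamingMode().build());

        // Schema and watermark taken from the TableSourceScan in the quoted error;
        // the connector options are placeholders -- any changelog-producing format
        // (e.g. debezium-json) reproduces the behaviour.
        tEnv.executeSql(
                "CREATE TABLE custom_kafka ("
              + "  name STRING, money INT, status STRING,"
              + "  createtime TIMESTAMP(3), operation_ts TIMESTAMP(3),"
              + "  WATERMARK FOR createtime AS createtime - INTERVAL '5' SECOND"
              + ") WITH ("
              + "  'connector' = 'kafka',"
              + "  'topic' = 'custom_kafka',"
              + "  'properties.bootstrap.servers' = 'localhost:9092',"
              + "  'format' = 'debezium-json'"
              + ")");

        // CUMULATE window TVF: fails at planning time on Flink 1.15 with
        // "StreamPhysicalWindowAggregate doesn't support consuming update and
        // delete changes" when the source is a changelog.
        String cumulate =
                "SELECT DATE_FORMAT(window_end, 'yyyy-MM-dd') AS date_str, SUM(money) AS total, name"
              + " FROM TABLE(CUMULATE(TABLE custom_kafka, DESCRIPTOR(createtime),"
              + "            INTERVAL '1' MINUTES, INTERVAL '1' DAY))"
              + " WHERE status = '1' GROUP BY name, window_start, window_end";

        // Legacy group-window aggregation: the same CDC input is accepted.
        String tumble =
                "SELECT DATE_FORMAT(TUMBLE_END(createtime, INTERVAL '10' MINUTES), 'yyyy-MM-dd') AS date_str,"
              + "       SUM(money) AS total, name"
              + " FROM custom_kafka WHERE status = '1'"
              + " GROUP BY name, TUMBLE(createtime, INTERVAL '10' MINUTES)";

        tEnv.executeSql(tumble).print();   // works; swap in `cumulate` to reproduce the exception
    }
}
{code}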


> support consuming update and delete changes In Windowing TVFs
> -
>
> Key: FLINK-27539
> URL: https://issues.apache.org/jira/browse/FLINK-27539
> Project: Flink
>  Issue Type: Improvement
>  Components: Table SQL / API
>Affects Versions: 1.15.0
>Reporter: hjw
>Priority: Major
>
> custom_kafka is a cdc table
> sql:
> {code:java}
> select DATE_FORMAT(window_end,'yyyy-MM-dd') as date_str,sum(money) as 
> total,name
> from TABLE(CUMULATE(TABLE custom_kafka,descriptor(createtime),interval '1' 
> MINUTES,interval '1' DAY ))
> where status='1'
> group by name,window_start,window_end;
> {code}
> Error
> {code:java}
> Exception in thread "main" org.apache.flink.table.api.TableException: 
> StreamPhysicalWindowAggregate doesn't support consuming update and delete 
> changes which is produced by node TableSourceScan(table=[[default_catalog, 
> default_database, custom_kafka, watermark=[-(createtime, 5000:INTERVAL 
> SECOND)]]], fields=[name, money, status, createtime, operation_ts])
>  at 
> org.apache.flink.table.planner.plan.optimize.program.FlinkChangelogModeInferenceProgram$SatisfyModifyKindSetTraitVisitor.createNewNode(FlinkChangelogModeInferenceProgram.scala:396)
>  at 
> org.apache.flink.table.planner.plan.optimize.program.FlinkChangelogModeInferenceProgram$SatisfyModifyKindSetTraitVisitor.visit(FlinkChangelogModeInferenceProgram.scala:315)
>  at 
> org.apache.flink.table.planner.plan.optimize.program.FlinkChangelogModeInferenceProgram$SatisfyModifyKindSetTraitVisitor.visitChild(FlinkChangelogModeInferenceProgram.scala:353)
>  at 
> org.apache.flink.table.planner.plan.optimize.program.FlinkChangelogModeInferenceProgram$SatisfyModifyKindSetTraitVisitor.$anonfun$visitChildren$1(FlinkChangelogModeInferenceProgram.scala:342)
>  at 
> org.apache.flink.table.planner.plan.optimize.program.FlinkChangelogModeInferenceProgram$SatisfyModifyKindSetTraitVisitor.$anonfun$visitChildren$1$adapted(FlinkChangelogModeInferenceProgram.scala:341)
>  at scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:233)
>  at scala.collection.immutable.Range.foreach(Range.scala:155)
>  at scala.collection.TraversableLike.map(TraversableLike.scala:233)
>  at scala.collection.TraversableLike.map$(TraversableLike.scala:226)
>  at scala.collection.AbstractTraversable.map(Traversable.scala:104)
>  at 
> org.apache.flink.table.planner.plan.optimize.program.FlinkChangelogModeInferenceProgram$SatisfyModifyKindSetTraitVisitor.visitChildren(FlinkChangelogModeInferenceProgram.scala:341)
> {code}
> But I found that Group Window Aggregation works when using a CDC table
> {code:java}
> select DATE_FORMAT(TUMBLE_END(createtime,interval '10' MINUTES),'yyyy-MM-dd') 
> as date_str,sum(money) as total,name
> from custom_kafka
> where status='1'
> group by name,TUMBLE(createtime,interval '10' MINUTES)
> {code}



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Commented] (FLINK-27539) support consuming update and delete changes In Windowing TVFs

2022-05-12 Thread hjw (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-27539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17535961#comment-17535961
 ] 

hjw commented on FLINK-27539:
-

[~martijnvisser] Yes, I am interested in the total amount of money for a day. In
fact, I prefer the CUMULATE window to the TUMBLE window, because I need the
CUMULATE window result at every step size. But the CUMULATE window is not
supported in Group Window Aggregation.

  

> support consuming update and delete changes In Windowing TVFs
> -
>
> Key: FLINK-27539
> URL: https://issues.apache.org/jira/browse/FLINK-27539
> Project: Flink
>  Issue Type: Improvement
>  Components: Table SQL / API
>Affects Versions: 1.15.0
>Reporter: hjw
>Priority: Major
>
> custom_kafka is a cdc table
> sql:
> {code:java}
> select DATE_FORMAT(window_end,'yyyy-MM-dd') as date_str,sum(money) as 
> total,name
> from TABLE(CUMULATE(TABLE custom_kafka,descriptor(createtime),interval '1' 
> MINUTES,interval '1' DAY ))
> where status='1'
> group by name,window_start,window_end;
> {code}
> Error
> {code:java}
> Exception in thread "main" org.apache.flink.table.api.TableException: 
> StreamPhysicalWindowAggregate doesn't support consuming update and delete 
> changes which is produced by node TableSourceScan(table=[[default_catalog, 
> default_database, custom_kafka, watermark=[-(createtime, 5000:INTERVAL 
> SECOND)]]], fields=[name, money, status, createtime, operation_ts])
>  at 
> org.apache.flink.table.planner.plan.optimize.program.FlinkChangelogModeInferenceProgram$SatisfyModifyKindSetTraitVisitor.createNewNode(FlinkChangelogModeInferenceProgram.scala:396)
>  at 
> org.apache.flink.table.planner.plan.optimize.program.FlinkChangelogModeInferenceProgram$SatisfyModifyKindSetTraitVisitor.visit(FlinkChangelogModeInferenceProgram.scala:315)
>  at 
> org.apache.flink.table.planner.plan.optimize.program.FlinkChangelogModeInferenceProgram$SatisfyModifyKindSetTraitVisitor.visitChild(FlinkChangelogModeInferenceProgram.scala:353)
>  at 
> org.apache.flink.table.planner.plan.optimize.program.FlinkChangelogModeInferenceProgram$SatisfyModifyKindSetTraitVisitor.$anonfun$visitChildren$1(FlinkChangelogModeInferenceProgram.scala:342)
>  at 
> org.apache.flink.table.planner.plan.optimize.program.FlinkChangelogModeInferenceProgram$SatisfyModifyKindSetTraitVisitor.$anonfun$visitChildren$1$adapted(FlinkChangelogModeInferenceProgram.scala:341)
>  at scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:233)
>  at scala.collection.immutable.Range.foreach(Range.scala:155)
>  at scala.collection.TraversableLike.map(TraversableLike.scala:233)
>  at scala.collection.TraversableLike.map$(TraversableLike.scala:226)
>  at scala.collection.AbstractTraversable.map(Traversable.scala:104)
>  at 
> org.apache.flink.table.planner.plan.optimize.program.FlinkChangelogModeInferenceProgram$SatisfyModifyKindSetTraitVisitor.visitChildren(FlinkChangelogModeInferenceProgram.scala:341)
> {code}
> But I found that Group Window Aggregation works when using a CDC table
> {code:java}
> select DATE_FORMAT(TUMBLE_END(createtime,interval '10' MINUTES),'yyyy-MM-dd') 
> as date_str,sum(money) as total,name
> from custom_kafka
> where status='1'
> group by name,TUMBLE(createtime,interval '10' MINUTES)
> {code}



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Commented] (FLINK-27539) support consuming update and delete changes In Windowing TVFs

2022-05-10 Thread hjw (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-27539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17534193#comment-17534193
 ] 

hjw commented on FLINK-27539:
-

Thanks for your response [~martijnvisser].
I want to count the total amount of money in a day, but the money of a record
could be changed.
Example:
name  money  createtime
a     100    2022-05-10 15:23:09

The window result:
name  total
a     100

After the money changes from 100 to 200:
name  money  createtime
a     200    2022-05-10 15:23:09

The window result should become:
name  total
a     200
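
In changelog terms, the change above arrives as a retraction of the old row followed by the new row, so an unwindowed group aggregation corrects its result from 100 to 200. A small sketch that makes those -U/+U corrections visible, assuming the same {{custom_kafka}} CDC table is registered in the session:

{code:java}
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.table.api.Table;
import org.apache.flink.table.api.bridge.java.StreamTableEnvironment;

public class ObserveRetractions {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        StreamTableEnvironment tEnv = StreamTableEnvironment.create(env);
        // custom_kafka: the CDC table from this issue, registered beforehand.
        Table total = tEnv.sqlQuery(
                "SELECT name, SUM(money) AS total FROM custom_kafka WHERE status = '1' GROUP BY name");
        // For the update 100 -> 200 the changelog stream prints
        //   -U[a, 100]  followed by  +U[a, 200]
        tEnv.toChangelogStream(total).print();
        env.execute("observe-retractions");
    }
}
{code}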


> support consuming update and delete changes In Windowing TVFs
> -
>
> Key: FLINK-27539
> URL: https://issues.apache.org/jira/browse/FLINK-27539
> Project: Flink
>  Issue Type: Improvement
>  Components: Table SQL / API
>Affects Versions: 1.15.0
>Reporter: hjw
>Priority: Major
>
> custom_kafka is a cdc table
> sql:
> {code:java}
> select DATE_FORMAT(window_end,'yyyy-MM-dd') as date_str,sum(money) as 
> total,name
> from TABLE(CUMULATE(TABLE custom_kafka,descriptor(createtime),interval '1' 
> MINUTES,interval '1' DAY ))
> where status='1'
> group by name,window_start,window_end;
> {code}
> Error
> {code:java}
> Exception in thread "main" org.apache.flink.table.api.TableException: 
> StreamPhysicalWindowAggregate doesn't support consuming update and delete 
> changes which is produced by node TableSourceScan(table=[[default_catalog, 
> default_database, custom_kafka, watermark=[-(createtime, 5000:INTERVAL 
> SECOND)]]], fields=[name, money, status, createtime, operation_ts])
>  at 
> org.apache.flink.table.planner.plan.optimize.program.FlinkChangelogModeInferenceProgram$SatisfyModifyKindSetTraitVisitor.createNewNode(FlinkChangelogModeInferenceProgram.scala:396)
>  at 
> org.apache.flink.table.planner.plan.optimize.program.FlinkChangelogModeInferenceProgram$SatisfyModifyKindSetTraitVisitor.visit(FlinkChangelogModeInferenceProgram.scala:315)
>  at 
> org.apache.flink.table.planner.plan.optimize.program.FlinkChangelogModeInferenceProgram$SatisfyModifyKindSetTraitVisitor.visitChild(FlinkChangelogModeInferenceProgram.scala:353)
>  at 
> org.apache.flink.table.planner.plan.optimize.program.FlinkChangelogModeInferenceProgram$SatisfyModifyKindSetTraitVisitor.$anonfun$visitChildren$1(FlinkChangelogModeInferenceProgram.scala:342)
>  at 
> org.apache.flink.table.planner.plan.optimize.program.FlinkChangelogModeInferenceProgram$SatisfyModifyKindSetTraitVisitor.$anonfun$visitChildren$1$adapted(FlinkChangelogModeInferenceProgram.scala:341)
>  at scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:233)
>  at scala.collection.immutable.Range.foreach(Range.scala:155)
>  at scala.collection.TraversableLike.map(TraversableLike.scala:233)
>  at scala.collection.TraversableLike.map$(TraversableLike.scala:226)
>  at scala.collection.AbstractTraversable.map(Traversable.scala:104)
>  at 
> org.apache.flink.table.planner.plan.optimize.program.FlinkChangelogModeInferenceProgram$SatisfyModifyKindSetTraitVisitor.visitChildren(FlinkChangelogModeInferenceProgram.scala:341)
> {code}
> But I found that Group Window Aggregation works when using a CDC table
> {code:java}
> select DATE_FORMAT(TUMBLE_END(createtime,interval '10' MINUTES),'yyyy-MM-dd') 
> as date_str,sum(money) as total,name
> from custom_kafka
> where status='1'
> group by name,TUMBLE(createtime,interval '10' MINUTES)
> {code}



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Updated] (FLINK-27539) support consuming update and delete changes In Windowing TVFs

2022-05-07 Thread hjw (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-27539?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

hjw updated FLINK-27539:

Description: 
custom_kafka is a cdc table

sql:
{code:java}
select DATE_FORMAT(window_end,'yyyy-MM-dd') as date_str,sum(money) as total,name
from TABLE(CUMULATE(TABLE custom_kafka,descriptor(createtime),interval '1' 
MINUTES,interval '1' DAY ))
where status='1'
group by name,window_start,window_end;
{code}

Error

{code:java}
Exception in thread "main" org.apache.flink.table.api.TableException: 
StreamPhysicalWindowAggregate doesn't support consuming update and delete 
changes which is produced by node TableSourceScan(table=[[default_catalog, 
default_database, custom_kafka, watermark=[-(createtime, 5000:INTERVAL 
SECOND)]]], fields=[name, money, status, createtime, operation_ts])
 at 
org.apache.flink.table.planner.plan.optimize.program.FlinkChangelogModeInferenceProgram$SatisfyModifyKindSetTraitVisitor.createNewNode(FlinkChangelogModeInferenceProgram.scala:396)
 at 
org.apache.flink.table.planner.plan.optimize.program.FlinkChangelogModeInferenceProgram$SatisfyModifyKindSetTraitVisitor.visit(FlinkChangelogModeInferenceProgram.scala:315)
 at 
org.apache.flink.table.planner.plan.optimize.program.FlinkChangelogModeInferenceProgram$SatisfyModifyKindSetTraitVisitor.visitChild(FlinkChangelogModeInferenceProgram.scala:353)
 at 
org.apache.flink.table.planner.plan.optimize.program.FlinkChangelogModeInferenceProgram$SatisfyModifyKindSetTraitVisitor.$anonfun$visitChildren$1(FlinkChangelogModeInferenceProgram.scala:342)
 at 
org.apache.flink.table.planner.plan.optimize.program.FlinkChangelogModeInferenceProgram$SatisfyModifyKindSetTraitVisitor.$anonfun$visitChildren$1$adapted(FlinkChangelogModeInferenceProgram.scala:341)
 at scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:233)
 at scala.collection.immutable.Range.foreach(Range.scala:155)
 at scala.collection.TraversableLike.map(TraversableLike.scala:233)
 at scala.collection.TraversableLike.map$(TraversableLike.scala:226)
 at scala.collection.AbstractTraversable.map(Traversable.scala:104)
 at 
org.apache.flink.table.planner.plan.optimize.program.FlinkChangelogModeInferenceProgram$SatisfyModifyKindSetTraitVisitor.visitChildren(FlinkChangelogModeInferenceProgram.scala:341)
{code}


But I found that Group Window Aggregation works when using a CDC table
{code:java}
select DATE_FORMAT(TUMBLE_END(createtime,interval '10' MINUTES),'yyyy-MM-dd') 
as date_str,sum(money) as total,name
from custom_kafka
where status='1'
group by name,TUMBLE(createtime,interval '10' MINUTES)
{code}


  was:
custom_kafka is a cdc table

sql:
{code:java}
select DATE_FORMAT(window_end,'yyyy-MM-dd') as date_str,sum(money) as total,name
from TABLE(CUMULATE(TABLE custom_kafka,descriptor(createtime),interval '1' 
MINUTES,interval '1' DAY ))
where status='1'
group by name,window_start,window_end;
{code}

Error

{code:java}
org.apache.flink.table.api.TableException: StreamPhysicalWindowAggregate 
doesn't support consuming update and delete changes which is produced by node 
TableSourceScan(table=[[default_catalog, default_database,custom_kafka]], 
fields=[name, money, status,createtime,operation_ts])
{code}


But I found that Group Window Aggregation works when using a CDC table
{code:java}
select DATE_FORMAT(TUMBLE_END(createtime,interval '10' MINUTES),'yyyy-MM-dd') 
as date_str,sum(money) as total,name
from custom_kafka
where status='1'
group by name,TUMBLE(createtime,interval '10' MINUTES)
{code}



> support consuming update and delete changes In Windowing TVFs
> -
>
> Key: FLINK-27539
> URL: https://issues.apache.org/jira/browse/FLINK-27539
> Project: Flink
>  Issue Type: Improvement
>  Components: Table SQL / API
>Affects Versions: 1.15.0
>Reporter: hjw
>Priority: Major
>
> custom_kafka is a cdc table
> sql:
> {code:java}
> select DATE_FORMAT(window_end,'yyyy-MM-dd') as date_str,sum(money) as 
> total,name
> from TABLE(CUMULATE(TABLE custom_kafka,descriptor(createtime),interval '1' 
> MINUTES,interval '1' DAY ))
> where status='1'
> group by name,window_start,window_end;
> {code}
> Error
> {code:java}
> Exception in thread "main" org.apache.flink.table.api.TableException: 
> StreamPhysicalWindowAggregate doesn't support consuming update and delete 
> changes which is produced by node TableSourceScan(table=[[default_catalog, 
> default_database, custom_kafka, watermark=[-(createtime, 5000:INTERVAL 
> SECOND)]]], fields=[name, money, status, createtime, operation_ts])
>  at 
> org.apache.flink.table.planner.plan.optimize.program.FlinkChangelogModeInferenceProgram$SatisfyModifyKindSetTraitVisitor.createNewNode(FlinkChangelogModeInferenceProgram.scala:396)
>  at 
> org.apache.flink.table.planner.plan.optimize.program.FlinkChangelogModeInferenceProgram$SatisfyModifyK

[jira] [Created] (FLINK-27539) support consuming update and delete changes In Windowing TVFs

2022-05-07 Thread hjw (Jira)
hjw created FLINK-27539:
---

 Summary: support consuming update and delete changes In Windowing 
TVFs
 Key: FLINK-27539
 URL: https://issues.apache.org/jira/browse/FLINK-27539
 Project: Flink
  Issue Type: Improvement
  Components: Table SQL / API
Affects Versions: 1.15.0
Reporter: hjw


custom_kafka is a cdc table

sql:
{code:java}

select DATE_FORMAT(window_end,'yyyy-MM-dd') as date_str,sum(money) as total,name
from TABLE(CUMULATE(TABLE custom_kafka,descriptor(createtime),interval '1' 
MINUTES,interval '1' DAY ))
where status='1'
group by name,window_start,window_end;
{code}

Error

org.apache.flink.table.api.TableException: StreamPhysicalWindowAggregate 
doesn't support consuming update and delete changes which is produced by node 
TableSourceScan(table=[[default_catalog, default_database,custom_kafka]], 
fields=[name, money, status,createtime,operation_ts])

But I found that Group Window Aggregation works when using a CDC table

select DATE_FORMAT(TUMBLE_END(createtime,interval '10' MINUTES),'yyyy-MM-dd') 
as date_str,sum(money) as total,name
from custom_kafka
where status='1'
group by name,TUMBLE(createtime,interval '10' MINUTES)



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Commented] (FLINK-27245) Flink job on Yarn cannot recover when ZooKeeper hits an exception

2022-04-17 Thread hjw (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-27245?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17523336#comment-17523336
 ] 

hjw commented on FLINK-27245:
-

Hi [~trohrmann], can you help me look at this problem? Thank you.

> Flink job on Yarn cannot recover when ZooKeeper hits an exception
> 
>
> Key: FLINK-27245
> URL: https://issues.apache.org/jira/browse/FLINK-27245
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Coordination
>Affects Versions: 1.7.2
> Environment: Flink :1.7.2
> Hdfs:3.1.1
> zookeeper:3.5.1
> HA defined in Flink-conf,yaml:
> flink.security.enable: true
> fs.output.always-create-directory: false
> fs.overwrite-files: false
> high-availability.job.delay: 10 s
> high-availability.storageDir: hdfs:///flink/recovery
> high-availability.zookeeper.client.acl: creator
> high-availability.zookeeper.client.connection-timeout: 15000
> high-availability.zookeeper.client.max-retry-attempts: 3
> high-availability.zookeeper.client.retry-wait: 5000
> high-availability.zookeeper.client.session-timeout: 6
> high-availability.zookeeper.path.root: /flink
> high-availability.zookeeper.quorum: zk01:24002,zk02:24002,zk03:24002
> high-availability: zookeeper
>Reporter: hjw
>Priority: Major
> Attachments: Job-failed.txt, Job-recover-failed.txt, 
> zookeeper-omm-server-a-dsj-ghficn01.2022-04-07_20-09-25.[1].log
>
>
> Flink job cannot recover when ZooKeeper hits an exception.
> I noticed that the data in high-availability.storageDir is deleted when the job
> fails, resulting in failure when pulling it up again.
> PS: The job restart is done automatically by YARN, not manually.
> {code:java}
> (SmarterLeaderLatch.java:570) 
> 2022-04-07 19:54:29,002 | INFO  | [Suspend state waiting handler] | 
> Connection to Zookeeper is SUSPENDED. Wait it to be back. Already waited 10 
> seconds. | org.apache.flink.runtime.leaderelection.SmarterLeaderLatch 
> (SmarterLeaderLatch.java:570) 
> 2022-04-07 19:54:29,004 | INFO  | [Suspend state waiting handler] | 
> Connection to Zookeeper is SUSPENDED. Wait it to be back. Already waited 10 
> seconds. | org.apache.flink.runtime.leaderelection.SmarterLeaderLatch 
> (SmarterLeaderLatch.java:570) 
> 2022-04-07 19:54:29,004 | INFO  | [Suspend state waiting handler] | 
> Connection to Zookeeper is SUSPENDED. Wait it to be back. Already waited 10 
> seconds. | org.apache.flink.runtime.leaderelection.SmarterLeaderLatch 
> (SmarterLeaderLatch.java:570) 
> 2022-04-07 19:54:30,002 | INFO  | [Suspend state waiting handler] | 
> Connection to Zookeeper is SUSPENDED. Wait it to be back. Already waited 11 
> seconds. | org.apache.flink.runtime.leaderelection.SmarterLeaderLatch 
> (SmarterLeaderLatch.java:570) 
> 2022-04-07 19:54:30,002 | INFO  | [Suspend state waiting handler] | 
> Connection to Zookeeper is SUSPENDED. Wait it to be back. Already waited 11 
> seconds. | org.apache.flink.runtime.leaderelection.SmarterLeaderLatch 
> (SmarterLeaderLatch.java:570) 
> 2022-04-07 19:54:30,004 | INFO  | [Suspend state waiting handler] | 
> Connection to Zookeeper is SUSPENDED. Wait it to be back. Already waited 11 
> seconds. | org.apache.flink.runtime.leaderelection.SmarterLeaderLatch 
> (SmarterLeaderLatch.java:570) 
> 2022-04-07 19:54:30,004 | INFO  | [Suspend state waiting handler] | 
> Connection to Zookeeper is SUSPENDED. Wait it to be back. Already waited 11 
> seconds. | org.apache.flink.runtime.leaderelection.SmarterLeaderLatch 
> (SmarterLeaderLatch.java:570) 
> 2022-04-07 19:54:30,769 | INFO  | [BlobServer shutdown hook] | 
> FileSystemBlobStore cleaning up 
> hdfs:/flink/recovery/application_1625720467511_45233. | 
> org.apache.flink.runtime.blob.FileSystemBlobStor
> {code}
> {code:java}
> 2022-04-07 19:55:29,452 | INFO  | [flink-akka.actor.default-dispatcher-4] | 
> Recovered SubmittedJobGraph(1898637f2d11429bd5f5767ea1daaf79, null). | 
> org.apache.flink.runtime.jobmanager.ZooKeeperSubmittedJobGraphStore 
> (ZooKeeperSubmittedJobGraphStore.java:215) 
> 2022-04-07 19:55:29,467 | ERROR | [flink-akka.actor.default-dispatcher-17] | 
> Fatal error occurred in the cluster entrypoint. | 
> org.apache.flink.runtime.entrypoint.ClusterEntrypoint 
> (ClusterEntrypoint.java:408) 
> java.lang.RuntimeException: 
> org.apache.flink.runtime.client.JobExecutionException: Could not set up 
> JobManager
>   at 
> org.apache.flink.util.function.CheckedSupplier.lambda$unchecked$0(CheckedSupplier.java:36)
>   at 
> java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1590)
>   at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:39)
>   at 
> akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:415)
>   at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
> 

[jira] [Updated] (FLINK-27245) Flink job on Yarn cannot recover when ZooKeeper hits an exception

2022-04-14 Thread hjw (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-27245?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

hjw updated FLINK-27245:

Description: 
Flink job cannot recover when ZooKeeper hits an exception.
I noticed that the data in high-availability.storageDir is deleted when the job
fails, resulting in failure when pulling it up again.
PS: The job restart is done automatically by YARN, not manually.

{code:java}
(SmarterLeaderLatch.java:570) 
2022-04-07 19:54:29,002 | INFO  | [Suspend state waiting handler] | Connection 
to Zookeeper is SUSPENDED. Wait it to be back. Already waited 10 seconds. | 
org.apache.flink.runtime.leaderelection.SmarterLeaderLatch 
(SmarterLeaderLatch.java:570) 
2022-04-07 19:54:29,004 | INFO  | [Suspend state waiting handler] | Connection 
to Zookeeper is SUSPENDED. Wait it to be back. Already waited 10 seconds. | 
org.apache.flink.runtime.leaderelection.SmarterLeaderLatch 
(SmarterLeaderLatch.java:570) 
2022-04-07 19:54:29,004 | INFO  | [Suspend state waiting handler] | Connection 
to Zookeeper is SUSPENDED. Wait it to be back. Already waited 10 seconds. | 
org.apache.flink.runtime.leaderelection.SmarterLeaderLatch 
(SmarterLeaderLatch.java:570) 
2022-04-07 19:54:30,002 | INFO  | [Suspend state waiting handler] | Connection 
to Zookeeper is SUSPENDED. Wait it to be back. Already waited 11 seconds. | 
org.apache.flink.runtime.leaderelection.SmarterLeaderLatch 
(SmarterLeaderLatch.java:570) 
2022-04-07 19:54:30,002 | INFO  | [Suspend state waiting handler] | Connection 
to Zookeeper is SUSPENDED. Wait it to be back. Already waited 11 seconds. | 
org.apache.flink.runtime.leaderelection.SmarterLeaderLatch 
(SmarterLeaderLatch.java:570) 
2022-04-07 19:54:30,004 | INFO  | [Suspend state waiting handler] | Connection 
to Zookeeper is SUSPENDED. Wait it to be back. Already waited 11 seconds. | 
org.apache.flink.runtime.leaderelection.SmarterLeaderLatch 
(SmarterLeaderLatch.java:570) 
2022-04-07 19:54:30,004 | INFO  | [Suspend state waiting handler] | Connection 
to Zookeeper is SUSPENDED. Wait it to be back. Already waited 11 seconds. | 
org.apache.flink.runtime.leaderelection.SmarterLeaderLatch 
(SmarterLeaderLatch.java:570) 
2022-04-07 19:54:30,769 | INFO  | [BlobServer shutdown hook] | 
FileSystemBlobStore cleaning up 
hdfs:/flink/recovery/application_1625720467511_45233. | 
org.apache.flink.runtime.blob.FileSystemBlobStor
{code}


{code:java}
2022-04-07 19:55:29,452 | INFO  | [flink-akka.actor.default-dispatcher-4] | 
Recovered SubmittedJobGraph(1898637f2d11429bd5f5767ea1daaf79, null). | 
org.apache.flink.runtime.jobmanager.ZooKeeperSubmittedJobGraphStore 
(ZooKeeperSubmittedJobGraphStore.java:215) 
2022-04-07 19:55:29,467 | ERROR | [flink-akka.actor.default-dispatcher-17] | 
Fatal error occurred in the cluster entrypoint. | 
org.apache.flink.runtime.entrypoint.ClusterEntrypoint 
(ClusterEntrypoint.java:408) 
java.lang.RuntimeException: 
org.apache.flink.runtime.client.JobExecutionException: Could not set up 
JobManager
at 
org.apache.flink.util.function.CheckedSupplier.lambda$unchecked$0(CheckedSupplier.java:36)
at 
java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1590)
at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:39)
at 
akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:415)
at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
at 
scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
at 
scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
at 
scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
Caused by: org.apache.flink.runtime.client.JobExecutionException: Could not set 
up JobManager
at 
org.apache.flink.runtime.jobmaster.JobManagerRunner.(JobManagerRunner.java:176)
at 
org.apache.flink.runtime.dispatcher.Dispatcher$DefaultJobManagerRunnerFactory.createJobManagerRunner(Dispatcher.java:1058)
at 
org.apache.flink.runtime.dispatcher.Dispatcher.lambda$createJobManagerRunner$5(Dispatcher.java:308)
at 
org.apache.flink.util.function.CheckedSupplier.lambda$unchecked$0(CheckedSupplier.java:34)
... 7 common frames omitted
Caused by: java.lang.Exception: Cannot set up the user code libraries: File 
does not exist: 
/flink/recovery/application_1625720467511_45233/blob/job_1898637f2d11429bd5f5767ea1daaf79/blob_p-7128d0ae4a06a277e3b1182c99eb616ffd45b590-c90586d4a5d4641fcc0c9e4cab31c131
at 
org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:86)
at 
org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:76)
at 
org.apache.hadoop.hdfs.server.namenode.FSDirStatAndListingOp.getBlockLocations(FSDirStatAndListingOp.java:153)
at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesyste

[jira] [Commented] (FLINK-27245) Flink job on Yarn cannot recover when ZooKeeper hits an exception

2022-04-14 Thread hjw (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-27245?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17522191#comment-17522191
 ] 

hjw commented on FLINK-27245:
-

Hi [~Zhanghao Chen], thank you for your response.
I notice that this parameter was added in [issue
10052|https://issues.apache.org/jira/browse/FLINK-10052].
I think this parameter is meant to solve the problem that the job will not
restart while the Curator connection state is SUSPENDED. But what confuses me is
that the job exited abnormally and the high-availability.storageDir/cluster-id
data was deleted, resulting in restart failure.
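
As a quick way to verify the deletion being described, a hedged diagnostic sketch using the Hadoop FileSystem API; the blob path is taken from the recovery failure log below, and the class is illustrative rather than part of Flink:

{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class CheckHaStorageDir {
    public static void main(String[] args) throws Exception {
        // Picks up core-site.xml/hdfs-site.xml from the classpath.
        FileSystem fs = FileSystem.get(new Configuration());
        // high-availability.storageDir + cluster id, as reported in the logs.
        Path blobDir = new Path("/flink/recovery/application_1625720467511_45233/blob");
        System.out.println(blobDir + " exists: " + fs.exists(blobDir));
    }
}
{code}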


> Flink job on Yarn cannot recover when ZooKeeper hits an exception
> 
>
> Key: FLINK-27245
> URL: https://issues.apache.org/jira/browse/FLINK-27245
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Coordination
>Affects Versions: 1.7.2
> Environment: Flink :1.7.2
> Hdfs:3.1.1
> zookeeper:3.5.1
> HA defined in Flink-conf,yaml:
> flink.security.enable: true
> fs.output.always-create-directory: false
> fs.overwrite-files: false
> high-availability.job.delay: 10 s
> high-availability.storageDir: hdfs:///flink/recovery
> high-availability.zookeeper.client.acl: creator
> high-availability.zookeeper.client.connection-timeout: 15000
> high-availability.zookeeper.client.max-retry-attempts: 3
> high-availability.zookeeper.client.retry-wait: 5000
> high-availability.zookeeper.client.session-timeout: 6
> high-availability.zookeeper.path.root: /flink
> high-availability.zookeeper.quorum: zk01:24002,zk02:24002,zk03:24002
> high-availability: zookeeper
>Reporter: hjw
>Priority: Major
> Attachments: Job-failed.txt, Job-recover-failed.txt, 
> zookeeper-omm-server-a-dsj-ghficn01.2022-04-07_20-09-25.[1].log
>
>
> Flink job cannot recover when ZooKeeper hits an exception.
> I noticed that the data in high-availability.storageDir is deleted when the job
> fails, resulting in failure when pulling it up again.
> {code:java}
> (SmarterLeaderLatch.java:570) 
> 2022-04-07 19:54:29,002 | INFO  | [Suspend state waiting handler] | 
> Connection to Zookeeper is SUSPENDED. Wait it to be back. Already waited 10 
> seconds. | org.apache.flink.runtime.leaderelection.SmarterLeaderLatch 
> (SmarterLeaderLatch.java:570) 
> 2022-04-07 19:54:29,004 | INFO  | [Suspend state waiting handler] | 
> Connection to Zookeeper is SUSPENDED. Wait it to be back. Already waited 10 
> seconds. | org.apache.flink.runtime.leaderelection.SmarterLeaderLatch 
> (SmarterLeaderLatch.java:570) 
> 2022-04-07 19:54:29,004 | INFO  | [Suspend state waiting handler] | 
> Connection to Zookeeper is SUSPENDED. Wait it to be back. Already waited 10 
> seconds. | org.apache.flink.runtime.leaderelection.SmarterLeaderLatch 
> (SmarterLeaderLatch.java:570) 
> 2022-04-07 19:54:30,002 | INFO  | [Suspend state waiting handler] | 
> Connection to Zookeeper is SUSPENDED. Wait it to be back. Already waited 11 
> seconds. | org.apache.flink.runtime.leaderelection.SmarterLeaderLatch 
> (SmarterLeaderLatch.java:570) 
> 2022-04-07 19:54:30,002 | INFO  | [Suspend state waiting handler] | 
> Connection to Zookeeper is SUSPENDED. Wait it to be back. Already waited 11 
> seconds. | org.apache.flink.runtime.leaderelection.SmarterLeaderLatch 
> (SmarterLeaderLatch.java:570) 
> 2022-04-07 19:54:30,004 | INFO  | [Suspend state waiting handler] | 
> Connection to Zookeeper is SUSPENDED. Wait it to be back. Already waited 11 
> seconds. | org.apache.flink.runtime.leaderelection.SmarterLeaderLatch 
> (SmarterLeaderLatch.java:570) 
> 2022-04-07 19:54:30,004 | INFO  | [Suspend state waiting handler] | 
> Connection to Zookeeper is SUSPENDED. Wait it to be back. Already waited 11 
> seconds. | org.apache.flink.runtime.leaderelection.SmarterLeaderLatch 
> (SmarterLeaderLatch.java:570) 
> 2022-04-07 19:54:30,769 | INFO  | [BlobServer shutdown hook] | 
> FileSystemBlobStore cleaning up 
> hdfs:/flink/recovery/application_1625720467511_45233. | 
> org.apache.flink.runtime.blob.FileSystemBlobStor
> {code}
> {code:java}
> 2022-04-07 19:55:29,452 | INFO  | [flink-akka.actor.default-dispatcher-4] | 
> Recovered SubmittedJobGraph(1898637f2d11429bd5f5767ea1daaf79, null). | 
> org.apache.flink.runtime.jobmanager.ZooKeeperSubmittedJobGraphStore 
> (ZooKeeperSubmittedJobGraphStore.java:215) 
> 2022-04-07 19:55:29,467 | ERROR | [flink-akka.actor.default-dispatcher-17] | 
> Fatal error occurred in the cluster entrypoint. | 
> org.apache.flink.runtime.entrypoint.ClusterEntrypoint 
> (ClusterEntrypoint.java:408) 
> java.lang.RuntimeException: 
> org.apache.flink.runtime.client.JobExecutionException: Could not set up 
> JobManager
>   at 
> org.apache.flink.util.function.CheckedSupplier.lambda$unchecked$0(CheckedSupplier.java:36)
>   at 
> java.util.concurrent.CompletableFuture$Async

[jira] [Reopened] (FLINK-27245) Flink job on Yarn cannot recover when ZooKeeper hits an exception

2022-04-14 Thread hjw (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-27245?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

hjw reopened FLINK-27245:
-

In which version of Flink is this problem fixed? Thanks.

> Flink job on Yarn cannot recover when ZooKeeper hits an exception
> 
>
> Key: FLINK-27245
> URL: https://issues.apache.org/jira/browse/FLINK-27245
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Coordination
>Affects Versions: 1.7.2
> Environment: Flink :1.7.2
> Hdfs:3.1.1
> zookeeper:3.5.1
> HA defined in Flink-conf,yaml:
> flink.security.enable: true
> fs.output.always-create-directory: false
> fs.overwrite-files: false
> high-availability.job.delay: 10 s
> high-availability.storageDir: hdfs:///flink/recovery
> high-availability.zookeeper.client.acl: creator
> high-availability.zookeeper.client.connection-timeout: 15000
> high-availability.zookeeper.client.max-retry-attempts: 3
> high-availability.zookeeper.client.retry-wait: 5000
> high-availability.zookeeper.client.session-timeout: 6
> high-availability.zookeeper.path.root: /flink
> high-availability.zookeeper.quorum: zk01:24002,zk02:24002,zk03:24002
> high-availability: zookeeper
>Reporter: hjw
>Priority: Major
> Attachments: Job-failed.txt, Job-recover-failed.txt, 
> zookeeper-omm-server-a-dsj-ghficn01.2022-04-07_20-09-25.[1].log
>
>
> Flink job cannot recover when ZooKeeper hits an exception.
> I noticed that the data in high-availability.storageDir is deleted when the job
> fails, resulting in failure when pulling it up again.
> {code:java}
> (SmarterLeaderLatch.java:570) 
> 2022-04-07 19:54:29,002 | INFO  | [Suspend state waiting handler] | 
> Connection to Zookeeper is SUSPENDED. Wait it to be back. Already waited 10 
> seconds. | org.apache.flink.runtime.leaderelection.SmarterLeaderLatch 
> (SmarterLeaderLatch.java:570) 
> 2022-04-07 19:54:29,004 | INFO  | [Suspend state waiting handler] | 
> Connection to Zookeeper is SUSPENDED. Wait it to be back. Already waited 10 
> seconds. | org.apache.flink.runtime.leaderelection.SmarterLeaderLatch 
> (SmarterLeaderLatch.java:570) 
> 2022-04-07 19:54:29,004 | INFO  | [Suspend state waiting handler] | 
> Connection to Zookeeper is SUSPENDED. Wait it to be back. Already waited 10 
> seconds. | org.apache.flink.runtime.leaderelection.SmarterLeaderLatch 
> (SmarterLeaderLatch.java:570) 
> 2022-04-07 19:54:30,002 | INFO  | [Suspend state waiting handler] | 
> Connection to Zookeeper is SUSPENDED. Wait it to be back. Already waited 11 
> seconds. | org.apache.flink.runtime.leaderelection.SmarterLeaderLatch 
> (SmarterLeaderLatch.java:570) 
> 2022-04-07 19:54:30,002 | INFO  | [Suspend state waiting handler] | 
> Connection to Zookeeper is SUSPENDED. Wait it to be back. Already waited 11 
> seconds. | org.apache.flink.runtime.leaderelection.SmarterLeaderLatch 
> (SmarterLeaderLatch.java:570) 
> 2022-04-07 19:54:30,004 | INFO  | [Suspend state waiting handler] | 
> Connection to Zookeeper is SUSPENDED. Wait it to be back. Already waited 11 
> seconds. | org.apache.flink.runtime.leaderelection.SmarterLeaderLatch 
> (SmarterLeaderLatch.java:570) 
> 2022-04-07 19:54:30,004 | INFO  | [Suspend state waiting handler] | 
> Connection to Zookeeper is SUSPENDED. Wait it to be back. Already waited 11 
> seconds. | org.apache.flink.runtime.leaderelection.SmarterLeaderLatch 
> (SmarterLeaderLatch.java:570) 
> 2022-04-07 19:54:30,769 | INFO  | [BlobServer shutdown hook] | 
> FileSystemBlobStore cleaning up 
> hdfs:/flink/recovery/application_1625720467511_45233. | 
> org.apache.flink.runtime.blob.FileSystemBlobStor
> {code}
> {code:java}
> 2022-04-07 19:55:29,452 | INFO  | [flink-akka.actor.default-dispatcher-4] | 
> Recovered SubmittedJobGraph(1898637f2d11429bd5f5767ea1daaf79, null). | 
> org.apache.flink.runtime.jobmanager.ZooKeeperSubmittedJobGraphStore 
> (ZooKeeperSubmittedJobGraphStore.java:215) 
> 2022-04-07 19:55:29,467 | ERROR | [flink-akka.actor.default-dispatcher-17] | 
> Fatal error occurred in the cluster entrypoint. | 
> org.apache.flink.runtime.entrypoint.ClusterEntrypoint 
> (ClusterEntrypoint.java:408) 
> java.lang.RuntimeException: 
> org.apache.flink.runtime.client.JobExecutionException: Could not set up 
> JobManager
>   at 
> org.apache.flink.util.function.CheckedSupplier.lambda$unchecked$0(CheckedSupplier.java:36)
>   at 
> java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1590)
>   at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:39)
>   at 
> akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:415)
>   at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
>   at 
> scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
>   at 
> scala.concurrent.forkjoin.ForkJo

[jira] [Commented] (FLINK-27245) Flink job on Yarn cannot recover when ZooKeeper is in an exception state

2022-04-14 Thread hjw (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-27245?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17522139#comment-17522139
 ] 

hjw commented on FLINK-27245:
-

Hi [~mapohl], thank you for your response. Unfortunately, I can't upgrade for 
the time being. In which version of Flink was this problem fixed?
The only related issue I found is: 
https://issues.apache.org/jira/browse/FLINK-10255
However, that issue was already fixed in 1.7.

> Flink job on Yarn cannot recover when ZooKeeper is in an exception state
> 
>
> Key: FLINK-27245
> URL: https://issues.apache.org/jira/browse/FLINK-27245
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Coordination
>Affects Versions: 1.7.2
> Environment: Flink: 1.7.2
> HDFS: 3.1.1
> ZooKeeper: 3.5.1
> HA defined in flink-conf.yaml:
> flink.security.enable: true
> fs.output.always-create-directory: false
> fs.overwrite-files: false
> high-availability.job.delay: 10 s
> high-availability.storageDir: hdfs:///flink/recovery
> high-availability.zookeeper.client.acl: creator
> high-availability.zookeeper.client.connection-timeout: 15000
> high-availability.zookeeper.client.max-retry-attempts: 3
> high-availability.zookeeper.client.retry-wait: 5000
> high-availability.zookeeper.client.session-timeout: 6
> high-availability.zookeeper.path.root: /flink
> high-availability.zookeeper.quorum: zk01:24002,zk02:24002,zk03:24002
> high-availability: zookeeper
>Reporter: hjw
>Priority: Major
> Attachments: Job-failed.txt, Job-recover-failed.txt, 
> zookeeper-omm-server-a-dsj-ghficn01.2022-04-07_20-09-25.[1].log
>
>
> The Flink job cannot recover when ZooKeeper is in an exception state.
> I noticed that the data in high-availability.storageDir was deleted when the 
> job failed, so recovery fails when the job is brought back up.
> {code:java}
> (SmarterLeaderLatch.java:570) 
> 2022-04-07 19:54:29,002 | INFO  | [Suspend state waiting handler] | 
> Connection to Zookeeper is SUSPENDED. Wait it to be back. Already waited 10 
> seconds. | org.apache.flink.runtime.leaderelection.SmarterLeaderLatch 
> (SmarterLeaderLatch.java:570) 
> 2022-04-07 19:54:29,004 | INFO  | [Suspend state waiting handler] | 
> Connection to Zookeeper is SUSPENDED. Wait it to be back. Already waited 10 
> seconds. | org.apache.flink.runtime.leaderelection.SmarterLeaderLatch 
> (SmarterLeaderLatch.java:570) 
> 2022-04-07 19:54:29,004 | INFO  | [Suspend state waiting handler] | 
> Connection to Zookeeper is SUSPENDED. Wait it to be back. Already waited 10 
> seconds. | org.apache.flink.runtime.leaderelection.SmarterLeaderLatch 
> (SmarterLeaderLatch.java:570) 
> 2022-04-07 19:54:30,002 | INFO  | [Suspend state waiting handler] | 
> Connection to Zookeeper is SUSPENDED. Wait it to be back. Already waited 11 
> seconds. | org.apache.flink.runtime.leaderelection.SmarterLeaderLatch 
> (SmarterLeaderLatch.java:570) 
> 2022-04-07 19:54:30,002 | INFO  | [Suspend state waiting handler] | 
> Connection to Zookeeper is SUSPENDED. Wait it to be back. Already waited 11 
> seconds. | org.apache.flink.runtime.leaderelection.SmarterLeaderLatch 
> (SmarterLeaderLatch.java:570) 
> 2022-04-07 19:54:30,004 | INFO  | [Suspend state waiting handler] | 
> Connection to Zookeeper is SUSPENDED. Wait it to be back. Already waited 11 
> seconds. | org.apache.flink.runtime.leaderelection.SmarterLeaderLatch 
> (SmarterLeaderLatch.java:570) 
> 2022-04-07 19:54:30,004 | INFO  | [Suspend state waiting handler] | 
> Connection to Zookeeper is SUSPENDED. Wait it to be back. Already waited 11 
> seconds. | org.apache.flink.runtime.leaderelection.SmarterLeaderLatch 
> (SmarterLeaderLatch.java:570) 
> 2022-04-07 19:54:30,769 | INFO  | [BlobServer shutdown hook] | 
> FileSystemBlobStore cleaning up 
> hdfs:/flink/recovery/application_1625720467511_45233. | 
> org.apache.flink.runtime.blob.FileSystemBlobStor
> {code}
> {code:java}
> 2022-04-07 19:55:29,452 | INFO  | [flink-akka.actor.default-dispatcher-4] | 
> Recovered SubmittedJobGraph(1898637f2d11429bd5f5767ea1daaf79, null). | 
> org.apache.flink.runtime.jobmanager.ZooKeeperSubmittedJobGraphStore 
> (ZooKeeperSubmittedJobGraphStore.java:215) 
> 2022-04-07 19:55:29,467 | ERROR | [flink-akka.actor.default-dispatcher-17] | 
> Fatal error occurred in the cluster entrypoint. | 
> org.apache.flink.runtime.entrypoint.ClusterEntrypoint 
> (ClusterEntrypoint.java:408) 
> java.lang.RuntimeException: 
> org.apache.flink.runtime.client.JobExecutionException: Could not set up 
> JobManager
>   at 
> org.apache.flink.util.function.CheckedSupplier.lambda$unchecked$0(CheckedSupplier.java:36)
>   at 
> java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1590)
>   at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:39)
>   at 
> akka.dispatch.ForkJoinExecutorConfigurator$Akk

[jira] [Updated] (FLINK-27245) Flink job on Yarn cannot recover when ZooKeeper is in an exception state

2022-04-14 Thread hjw (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-27245?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

hjw updated FLINK-27245:

Description: 
The Flink job cannot recover when ZooKeeper is in an exception state.
I noticed that the data in high-availability.storageDir was deleted when the 
job failed, so recovery fails when the job is brought back up.

{code:java}
(SmarterLeaderLatch.java:570) 
2022-04-07 19:54:29,002 | INFO  | [Suspend state waiting handler] | Connection 
to Zookeeper is SUSPENDED. Wait it to be back. Already waited 10 seconds. | 
org.apache.flink.runtime.leaderelection.SmarterLeaderLatch 
(SmarterLeaderLatch.java:570) 
2022-04-07 19:54:29,004 | INFO  | [Suspend state waiting handler] | Connection 
to Zookeeper is SUSPENDED. Wait it to be back. Already waited 10 seconds. | 
org.apache.flink.runtime.leaderelection.SmarterLeaderLatch 
(SmarterLeaderLatch.java:570) 
2022-04-07 19:54:29,004 | INFO  | [Suspend state waiting handler] | Connection 
to Zookeeper is SUSPENDED. Wait it to be back. Already waited 10 seconds. | 
org.apache.flink.runtime.leaderelection.SmarterLeaderLatch 
(SmarterLeaderLatch.java:570) 
2022-04-07 19:54:30,002 | INFO  | [Suspend state waiting handler] | Connection 
to Zookeeper is SUSPENDED. Wait it to be back. Already waited 11 seconds. | 
org.apache.flink.runtime.leaderelection.SmarterLeaderLatch 
(SmarterLeaderLatch.java:570) 
2022-04-07 19:54:30,002 | INFO  | [Suspend state waiting handler] | Connection 
to Zookeeper is SUSPENDED. Wait it to be back. Already waited 11 seconds. | 
org.apache.flink.runtime.leaderelection.SmarterLeaderLatch 
(SmarterLeaderLatch.java:570) 
2022-04-07 19:54:30,004 | INFO  | [Suspend state waiting handler] | Connection 
to Zookeeper is SUSPENDED. Wait it to be back. Already waited 11 seconds. | 
org.apache.flink.runtime.leaderelection.SmarterLeaderLatch 
(SmarterLeaderLatch.java:570) 
2022-04-07 19:54:30,004 | INFO  | [Suspend state waiting handler] | Connection 
to Zookeeper is SUSPENDED. Wait it to be back. Already waited 11 seconds. | 
org.apache.flink.runtime.leaderelection.SmarterLeaderLatch 
(SmarterLeaderLatch.java:570) 
2022-04-07 19:54:30,769 | INFO  | [BlobServer shutdown hook] | 
FileSystemBlobStore cleaning up 
hdfs:/flink/recovery/application_1625720467511_45233. | 
org.apache.flink.runtime.blob.FileSystemBlobStor
{code}


{code:java}
2022-04-07 19:55:29,452 | INFO  | [flink-akka.actor.default-dispatcher-4] | 
Recovered SubmittedJobGraph(1898637f2d11429bd5f5767ea1daaf79, null). | 
org.apache.flink.runtime.jobmanager.ZooKeeperSubmittedJobGraphStore 
(ZooKeeperSubmittedJobGraphStore.java:215) 
2022-04-07 19:55:29,467 | ERROR | [flink-akka.actor.default-dispatcher-17] | 
Fatal error occurred in the cluster entrypoint. | 
org.apache.flink.runtime.entrypoint.ClusterEntrypoint 
(ClusterEntrypoint.java:408) 
java.lang.RuntimeException: 
org.apache.flink.runtime.client.JobExecutionException: Could not set up 
JobManager
at 
org.apache.flink.util.function.CheckedSupplier.lambda$unchecked$0(CheckedSupplier.java:36)
at 
java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1590)
at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:39)
at 
akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:415)
at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
at 
scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
at 
scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
at 
scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
Caused by: org.apache.flink.runtime.client.JobExecutionException: Could not set 
up JobManager
at 
org.apache.flink.runtime.jobmaster.JobManagerRunner.<init>(JobManagerRunner.java:176)
at 
org.apache.flink.runtime.dispatcher.Dispatcher$DefaultJobManagerRunnerFactory.createJobManagerRunner(Dispatcher.java:1058)
at 
org.apache.flink.runtime.dispatcher.Dispatcher.lambda$createJobManagerRunner$5(Dispatcher.java:308)
at 
org.apache.flink.util.function.CheckedSupplier.lambda$unchecked$0(CheckedSupplier.java:34)
... 7 common frames omitted
Caused by: java.lang.Exception: Cannot set up the user code libraries: File 
does not exist: 
/flink/recovery/application_1625720467511_45233/blob/job_1898637f2d11429bd5f5767ea1daaf79/blob_p-7128d0ae4a06a277e3b1182c99eb616ffd45b590-c90586d4a5d4641fcc0c9e4cab31c131
at 
org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:86)
at 
org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:76)
at 
org.apache.hadoop.hdfs.server.namenode.FSDirStatAndListingOp.getBlockLocations(FSDirStatAndListingOp.java:153)
at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1951)
at 
org.apache.hadoop.hdfs.server.namenode.
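
The root cause above is that the BlobServer shutdown hook removed the HA blob 
directory from hdfs:///flink/recovery while ZooKeeper still referenced the 
submitted job graph, so the recovered job points at blobs that no longer 
exist. A minimal diagnostic sketch (not part of the report; it assumes the 
Hadoop client libraries and the cluster's core-site.xml/hdfs-site.xml are on 
the classpath, and the class name is illustrative):

{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HaBlobDirCheck {
    public static void main(String[] args) throws Exception {
        // Blob directory taken from the stack trace above.
        Path blobDir = new Path(
            "hdfs:///flink/recovery/application_1625720467511_45233/blob");
        // Resolves the NameNode address from the Hadoop configuration files.
        try (FileSystem fs = FileSystem.get(blobDir.toUri(), new Configuration())) {
            // If ZooKeeper still lists the job graph but this prints false,
            // recovery fails exactly as logged: "File does not exist".
            System.out.println("HA blob dir present: " + fs.exists(blobDir));
        }
    }
}
{code}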

[jira] [Updated] (FLINK-27245) Flink job on Yarn cannot recover when ZooKeeper is in an exception state

2022-04-14 Thread hjw (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-27245?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

hjw updated FLINK-27245:

Attachment: Job-recover-failed.txt
Job-failed.txt
zookeeper-omm-server-a-dsj-ghficn01.2022-04-07_20-09-25.[1].log

> Flink job on Yarn cannot recover when ZooKeeper is in an exception state
> 
>
> Key: FLINK-27245
> URL: https://issues.apache.org/jira/browse/FLINK-27245
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Coordination
>Affects Versions: 1.17.0
> Environment: Flink: 1.17.2
> HDFS: 3.1.1
> ZooKeeper: 3.5.1
> HA defined in flink-conf.yaml:
> flink.security.enable: true
> fs.output.always-create-directory: false
> fs.overwrite-files: false
> high-availability.job.delay: 10 s
> high-availability.storageDir: hdfs:///flink/recovery
> high-availability.zookeeper.client.acl: creator
> high-availability.zookeeper.client.connection-timeout: 15000
> high-availability.zookeeper.client.max-retry-attempts: 3
> high-availability.zookeeper.client.retry-wait: 5000
> high-availability.zookeeper.client.session-timeout: 6
> high-availability.zookeeper.path.root: /flink
> high-availability.zookeeper.quorum: zk01:24002,zk02:24002,zk03:24002
> high-availability: zookeeper
>Reporter: hjw
>Priority: Major
> Attachments: Job-failed.txt, Job-recover-failed.txt, 
> zookeeper-omm-server-a-dsj-ghficn01.2022-04-07_20-09-25.[1].log
>
>
> The Flink job cannot recover when ZooKeeper is in an exception state.
> I noticed that the data in high-availability.storageDir was deleted when the 
> job failed, so recovery fails when the job is brought back up.
> {code:java}
> (SmarterLeaderLatch.java:570) 
> 2022-04-07 19:54:29,002 | INFO  | [Suspend state waiting handler] | 
> Connection to Zookeeper is SUSPENDED. Wait it to be back. Already waited 10 
> seconds. | org.apache.flink.runtime.leaderelection.SmarterLeaderLatch 
> (SmarterLeaderLatch.java:570) 
> 2022-04-07 19:54:29,004 | INFO  | [Suspend state waiting handler] | 
> Connection to Zookeeper is SUSPENDED. Wait it to be back. Already waited 10 
> seconds. | org.apache.flink.runtime.leaderelection.SmarterLeaderLatch 
> (SmarterLeaderLatch.java:570) 
> 2022-04-07 19:54:29,004 | INFO  | [Suspend state waiting handler] | 
> Connection to Zookeeper is SUSPENDED. Wait it to be back. Already waited 10 
> seconds. | org.apache.flink.runtime.leaderelection.SmarterLeaderLatch 
> (SmarterLeaderLatch.java:570) 
> 2022-04-07 19:54:30,002 | INFO  | [Suspend state waiting handler] | 
> Connection to Zookeeper is SUSPENDED. Wait it to be back. Already waited 11 
> seconds. | org.apache.flink.runtime.leaderelection.SmarterLeaderLatch 
> (SmarterLeaderLatch.java:570) 
> 2022-04-07 19:54:30,002 | INFO  | [Suspend state waiting handler] | 
> Connection to Zookeeper is SUSPENDED. Wait it to be back. Already waited 11 
> seconds. | org.apache.flink.runtime.leaderelection.SmarterLeaderLatch 
> (SmarterLeaderLatch.java:570) 
> 2022-04-07 19:54:30,004 | INFO  | [Suspend state waiting handler] | 
> Connection to Zookeeper is SUSPENDED. Wait it to be back. Already waited 11 
> seconds. | org.apache.flink.runtime.leaderelection.SmarterLeaderLatch 
> (SmarterLeaderLatch.java:570) 
> 2022-04-07 19:54:30,004 | INFO  | [Suspend state waiting handler] | 
> Connection to Zookeeper is SUSPENDED. Wait it to be back. Already waited 11 
> seconds. | org.apache.flink.runtime.leaderelection.SmarterLeaderLatch 
> (SmarterLeaderLatch.java:570) 
> 2022-04-07 19:54:30,769 | INFO  | [BlobServer shutdown hook] | 
> FileSystemBlobStore cleaning up 
> hdfs:/flink/recovery/application_1625720467511_45233. | 
> org.apache.flink.runtime.blob.FileSystemBlobStor
> {code}
> {code:java}
> 2022-04-07 19:55:29,452 | INFO  | [flink-akka.actor.default-dispatcher-4] | 
> Recovered SubmittedJobGraph(1898637f2d11429bd5f5767ea1daaf79, null). | 
> org.apache.flink.runtime.jobmanager.ZooKeeperSubmittedJobGraphStore 
> (ZooKeeperSubmittedJobGraphStore.java:215) 
> 2022-04-07 19:55:29,467 | ERROR | [flink-akka.actor.default-dispatcher-17] | 
> Fatal error occurred in the cluster entrypoint. | 
> org.apache.flink.runtime.entrypoint.ClusterEntrypoint 
> (ClusterEntrypoint.java:408) 
> java.lang.RuntimeException: 
> org.apache.flink.runtime.client.JobExecutionException: Could not set up 
> JobManager
>   at 
> org.apache.flink.util.function.CheckedSupplier.lambda$unchecked$0(CheckedSupplier.java:36)
>   at 
> java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1590)
>   at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:39)
>   at 
> akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:415)
>   at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
>   at 
> scala.concurrent.forkjoin.For

[jira] [Created] (FLINK-27245) Flink job on Yarn cannot recover when ZooKeeper is in an exception state

2022-04-14 Thread hjw (Jira)
hjw created FLINK-27245:
---

 Summary: Flink job on Yarn cannot recover when ZooKeeper is in an 
exception state
 Key: FLINK-27245
 URL: https://issues.apache.org/jira/browse/FLINK-27245
 Project: Flink
  Issue Type: Bug
  Components: Runtime / Coordination
Affects Versions: 1.17.0
 Environment: Flink: 1.17.2
HDFS: 3.1.1
ZooKeeper: 3.5.1

HA defined in flink-conf.yaml:
flink.security.enable: true
fs.output.always-create-directory: false
fs.overwrite-files: false
high-availability.job.delay: 10 s
high-availability.storageDir: hdfs:///flink/recovery
high-availability.zookeeper.client.acl: creator
high-availability.zookeeper.client.connection-timeout: 15000
high-availability.zookeeper.client.max-retry-attempts: 3
high-availability.zookeeper.client.retry-wait: 5000
high-availability.zookeeper.client.session-timeout: 6
high-availability.zookeeper.path.root: /flink
high-availability.zookeeper.quorum: zk01:24002,zk02:24002,zk03:24002
high-availability: zookeeper
Reporter: hjw


The Flink job cannot recover when ZooKeeper is in an exception state.
I noticed that the data in high-availability.storageDir was deleted when the 
job failed, so recovery fails when the job is brought back up.

{code:java}
(SmarterLeaderLatch.java:570) 
2022-04-07 19:54:29,002 | INFO  | [Suspend state waiting handler] | Connection 
to Zookeeper is SUSPENDED. Wait it to be back. Already waited 10 seconds. | 
org.apache.flink.runtime.leaderelection.SmarterLeaderLatch 
(SmarterLeaderLatch.java:570) 
2022-04-07 19:54:29,004 | INFO  | [Suspend state waiting handler] | Connection 
to Zookeeper is SUSPENDED. Wait it to be back. Already waited 10 seconds. | 
org.apache.flink.runtime.leaderelection.SmarterLeaderLatch 
(SmarterLeaderLatch.java:570) 
2022-04-07 19:54:29,004 | INFO  | [Suspend state waiting handler] | Connection 
to Zookeeper is SUSPENDED. Wait it to be back. Already waited 10 seconds. | 
org.apache.flink.runtime.leaderelection.SmarterLeaderLatch 
(SmarterLeaderLatch.java:570) 
2022-04-07 19:54:30,002 | INFO  | [Suspend state waiting handler] | Connection 
to Zookeeper is SUSPENDED. Wait it to be back. Already waited 11 seconds. | 
org.apache.flink.runtime.leaderelection.SmarterLeaderLatch 
(SmarterLeaderLatch.java:570) 
2022-04-07 19:54:30,002 | INFO  | [Suspend state waiting handler] | Connection 
to Zookeeper is SUSPENDED. Wait it to be back. Already waited 11 seconds. | 
org.apache.flink.runtime.leaderelection.SmarterLeaderLatch 
(SmarterLeaderLatch.java:570) 
2022-04-07 19:54:30,004 | INFO  | [Suspend state waiting handler] | Connection 
to Zookeeper is SUSPENDED. Wait it to be back. Already waited 11 seconds. | 
org.apache.flink.runtime.leaderelection.SmarterLeaderLatch 
(SmarterLeaderLatch.java:570) 
2022-04-07 19:54:30,004 | INFO  | [Suspend state waiting handler] | Connection 
to Zookeeper is SUSPENDED. Wait it to be back. Already waited 11 seconds. | 
org.apache.flink.runtime.leaderelection.SmarterLeaderLatch 
(SmarterLeaderLatch.java:570) 
2022-04-07 19:54:30,769 | INFO  | [BlobServer shutdown hook] | 
FileSystemBlobStore cleaning up 
hdfs:/flink/recovery/application_1625720467511_45233. | 
org.apache.flink.runtime.blob.FileSystemBlobStor
{code}
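
The SUSPENDED messages above come from Curator's connection-state handling: 
SUSPENDED means the ZooKeeper session may still be alive and leadership should 
be retained while waiting, whereas LOST means the session is gone for good. A 
minimal sketch of observing these transitions with Apache Curator, reusing the 
quorum and retry settings from this report (class name and log strings are 
illustrative, not Flink code):

{code:java}
import org.apache.curator.framework.CuratorFramework;
import org.apache.curator.framework.CuratorFrameworkFactory;
import org.apache.curator.framework.state.ConnectionState;
import org.apache.curator.retry.ExponentialBackoffRetry;

public class SuspendedVsLostDemo {
    public static void main(String[] args) throws Exception {
        CuratorFramework client = CuratorFrameworkFactory.newClient(
            "zk01:24002,zk02:24002,zk03:24002",    // quorum from the configuration above
            new ExponentialBackoffRetry(5000, 3)); // mirrors retry-wait / max-retry-attempts
        client.getConnectionStateListenable().addListener((c, state) -> {
            if (state == ConnectionState.SUSPENDED) {
                // Session may still be alive: keep leadership and wait, as the latch does.
                System.out.println("SUSPENDED - waiting for the connection to come back");
            } else if (state == ConnectionState.LOST) {
                // Session expired: only now should leadership and HA state be given up.
                System.out.println("LOST - relinquishing leadership");
            }
        });
        client.start();
        Thread.sleep(60_000); // keep the demo alive long enough to observe transitions
        client.close();
    }
}
{code}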


{code:java}
2022-04-07 19:55:29,452 | INFO  | [flink-akka.actor.default-dispatcher-4] | 
Recovered SubmittedJobGraph(1898637f2d11429bd5f5767ea1daaf79, null). | 
org.apache.flink.runtime.jobmanager.ZooKeeperSubmittedJobGraphStore 
(ZooKeeperSubmittedJobGraphStore.java:215) 
2022-04-07 19:55:29,467 | ERROR | [flink-akka.actor.default-dispatcher-17] | 
Fatal error occurred in the cluster entrypoint. | 
org.apache.flink.runtime.entrypoint.ClusterEntrypoint 
(ClusterEntrypoint.java:408) 
java.lang.RuntimeException: 
org.apache.flink.runtime.client.JobExecutionException: Could not set up 
JobManager
at 
org.apache.flink.util.function.CheckedSupplier.lambda$unchecked$0(CheckedSupplier.java:36)
at 
java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1590)
at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:39)
at 
akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:415)
at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
at 
scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
at 
scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
at 
scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
Caused by: org.apache.flink.runtime.client.JobExecutionException: Could not set 
up JobManager
at 
org.apache.flink.runtime.jobmaster.JobManagerRunner.<init>(JobManagerRunner.java:176)
at 
org.apache.flink.runtime.dispatcher.Dispatcher$DefaultJobManagerRunnerFactory.createJobManagerRunner(Dispatcher.java:1058)
at 
org.apache.flink.runtime.dispatcher.Dispatcher.lambda$createJ

[jira] [Closed] (FLINK-26452) Flink deploy on k8s: HTTPS SSLPeerUnverifiedException

2022-03-04 Thread hjw (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-26452?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

hjw closed FLINK-26452.
---
Release Note: 
Add the following line to the cluster entry in ~/.kube/config:

insecure-skip-tls-verify: true
  Resolution: Fixed
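
For reference, the kubeconfig line above has a programmatic counterpart in the 
fabric8 client that Flink shades. A hedged sketch, assuming a fabric8 version 
that exposes both builder options below (class name and namespace are 
illustrative; the master URL is reused from this report):

{code:java}
import io.fabric8.kubernetes.client.Config;
import io.fabric8.kubernetes.client.ConfigBuilder;
import io.fabric8.kubernetes.client.DefaultKubernetesClient;
import io.fabric8.kubernetes.client.KubernetesClient;

public class InsecureTlsDemo {
    public static void main(String[] args) {
        // Roughly what "insecure-skip-tls-verify: true" does in the kubeconfig.
        Config config = new ConfigBuilder()
            .withMasterUrl("https://in-acpmanager.test.yfzx.cn/k8s/clusters/c-t5h2t")
            .withTrustCerts(true)                   // skip CA verification
            .withDisableHostnameVerification(true)  // skip hostname/SAN matching
            .build();
        try (KubernetesClient client = new DefaultKubernetesClient(config)) {
            client.pods().inNamespace("default").list().getItems()
                  .forEach(p -> System.out.println(p.getMetadata().getName()));
        }
    }
}
{code}

Either way, skipping TLS verification trades security for convenience; fixing 
the certificate to carry proper subjectAltNames is the cleaner solution.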

>  Flink deploy on k8s: HTTPS SSLPeerUnverifiedException
> -
>
> Key: FLINK-26452
> URL: https://issues.apache.org/jira/browse/FLINK-26452
> Project: Flink
>  Issue Type: Bug
>  Components: Deployment / Kubernetes
>Affects Versions: 1.13.6
>Reporter: hjw
>Priority: Major
>
> ~/.kube/config
> apiVersion: v1
> kind: Config
> clusters:
> - name: "yf-dev-cluster1"
>   cluster:
>     server: "https://in-acpmanager.test.yfzx.cn/k8s/clusters/c-t5h2t"
>     certificate-authority-data: "……"
> {code:java}
> 2022-03-02 18:59:30 | OkHttp 
> https://in-acpmanager.test.yfzx.cn/...io.fabric8.kubernetes.client.dsl.internal.WatcherWebSocketListener
>   
> Exec Failure javax.net.ssl.SSLPeerUnverifiedException Hostname 
> in-acpmanager.test.yfzx.cn not verified:
> certificate: sha256/cw2T2s+Swhl7z+H35/3C1dTLxL26OOMO5VoEN9kAZCA=
> DN: CN=in-acpmanager.test.yfzx.cn
> subjectAltNames: []
> io.fabric8.kubernetes.client.KubernetesClientException: Failed to start 
> websocket
> at 
> io.fabric8.kubernetes.client.dsl.internal.WatcherWebSocketListener.onFailure(WatcherWebSocketListener.java:77)
> at 
> org.apache.flink.kubernetes.shaded.okhttp3.internal.ws.RealWebSocket.failWebSocket(RealWebSocket.java:570)
> at 
> org.apache.flink.kubernetes.shaded.okhttp3.internal.ws.RealWebSocket$1.onFailure(RealWebSocket.java:216)
> at 
> org.apache.flink.kubernetes.shaded.okhttp3.RealCall$AsyncCall.execute(RealCall.java:180)
> at 
> org.apache.flink.kubernetes.shaded.okhttp3.internal.NamedRunnable.run(NamedRunnable.java:32)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748)
> Suppressed: java.lang.Throwable: waiting here
> at 
> io.fabric8.kubernetes.client.utils.Utils.waitUntilReady(Utils.java:164)
> at 
> io.fabric8.kubernetes.client.utils.Utils.waitUntilReadyOrFail(Utils.java:175)
> at 
> io.fabric8.kubernetes.client.dsl.internal.WatcherWebSocketListener.waitUntilReady(WatcherWebSocketListener.java:120)
> at 
> io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager.waitUntilReady(WatchConnectionManager.java:82)
> at 
> io.fabric8.kubernetes.client.dsl.base.BaseOperation.watch(BaseOperation.java:705)
> at 
> io.fabric8.kubernetes.client.dsl.base.BaseOperation.watch(BaseOperation.java:678)
> at 
> io.fabric8.kubernetes.client.dsl.base.BaseOperation.watch(BaseOperation.java:
> {code}
> {code:java}
> Caused by: javax.net.ssl.SSLPeerUnverifiedException: Hostname 
> in-acpmanager.test.yfzx.cn not verified:
> certificate: sha256/cw2T2s+Swhl7z+H35/3C1dTLxL26OOMO5VoEN9kAZCA=
> DN: CN=in-acpmanager.test.yfzx.cn
> subjectAltNames: []
> at 
> org.apache.flink.kubernetes.shaded.okhttp3.internal.connection.RealConnection.connectTls(RealConnection.java:350)
> at 
> org.apache.flink.kubernetes.shaded.okhttp3.internal.connection.RealConnection.establishProtocol(RealConnection.java:300)
> at 
> org.apache.flink.kubernetes.shaded.okhttp3.internal.connection.RealConnection.connect(RealConnection.java:185)
> at 
> org.apache.flink.kubernetes.shaded.okhttp3.internal.connection.ExchangeFinder.findConnection(ExchangeFinder.java:224)
> at 
> org.apache.flink.kubernetes.shaded.okhttp3.internal.connection.ExchangeFinder.findHealthyConnection(ExchangeFinder.java:108)
> at 
> org.apache.flink.kubernetes.shaded.okhttp3.internal.connection.ExchangeFinder.find(ExchangeFinder.java:88)
> at 
> org.apache.flink.kubernetes.shaded.okhttp3.internal.connection.Transmitter.newExchange(Transmitter.java:169)
> at 
> org.apache.flink.kubernetes.shaded.okhttp3.internal.connection.ConnectInterceptor.intercept(ConnectInterceptor.java:41)
> at 
> org.apache.flink.kubernetes.shaded.okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:142)
> at 
> org.apache.flink.kubernetes.shaded.okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:117)
> at 
> org.apache.flink.kubernetes.shaded.okhttp3.internal.cache.CacheInterceptor.intercept(CacheInterceptor.java:94)
> at 
> org.apache.flink.kubernetes.shaded.okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:142)
> at 
> org.apache.flink.kubernetes.shaded.okhttp3.internal.http.RealIntercep
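
The empty subjectAltNames list quoted above is what defeats verification: 
modern TLS clients match the hostname against the certificate's SANs, not the 
CN, so a SAN-less certificate fails for any hostname-based server URL. A small 
JDK-only diagnostic sketch to confirm this against one's own endpoint (the 
trust-all manager exists only so the handshake completes for inspection; class 
name is illustrative, never use this in production):

{code:java}
import java.net.URL;
import java.security.SecureRandom;
import java.security.cert.X509Certificate;
import javax.net.ssl.HttpsURLConnection;
import javax.net.ssl.SSLContext;
import javax.net.ssl.TrustManager;
import javax.net.ssl.X509TrustManager;

public class SanInspector {
    public static void main(String[] args) throws Exception {
        // Trust-all context so the handshake completes and the certificate is readable.
        TrustManager[] trustAll = { new X509TrustManager() {
            public X509Certificate[] getAcceptedIssuers() { return new X509Certificate[0]; }
            public void checkClientTrusted(X509Certificate[] chain, String authType) { }
            public void checkServerTrusted(X509Certificate[] chain, String authType) { }
        } };
        SSLContext ctx = SSLContext.getInstance("TLS");
        ctx.init(null, trustAll, new SecureRandom());

        HttpsURLConnection conn = (HttpsURLConnection)
            new URL("https://in-acpmanager.test.yfzx.cn/").openConnection();
        conn.setSSLSocketFactory(ctx.getSocketFactory());
        conn.setHostnameVerifier((host, session) -> true); // bypass so the cert is reachable
        conn.connect();

        X509Certificate cert = (X509Certificate) conn.getServerCertificates()[0];
        System.out.println("Subject: " + cert.getSubjectX500Principal());
        // Prints null when the certificate carries no SANs - the exact condition
        // that makes hostname verification fail here.
        System.out.println("SANs:    " + cert.getSubjectAlternativeNames());
        conn.disconnect();
    }
}
{code}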

[jira] [Updated] (FLINK-26452) Flink deploy on k8s: HTTPS SSLPeerUnverifiedException

2022-03-04 Thread hjw (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-26452?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

hjw updated FLINK-26452:

Description: 
~/.kube/config

apiVersion: v1
kind: Config
clusters:
- name: "yf-dev-cluster1"
  cluster:
    server: "https://in-acpmanager.test.yfzx.cn/k8s/clusters/c-t5h2t"
    certificate-authority-data: "……"


{code:java}
2022-03-02 18:59:30 | OkHttp 
https://in-acpmanager.test.yfzx.cn/...io.fabric8.kubernetes.client.dsl.internal.WatcherWebSocketListener
  
Exec Failure javax.net.ssl.SSLPeerUnverifiedException Hostname 
in-acpmanager.test.yfzx.cn not verified:
certificate: sha256/cw2T2s+Swhl7z+H35/3C1dTLxL26OOMO5VoEN9kAZCA=
DN: CN=in-acpmanager.test.yfzx.cn
subjectAltNames: []
io.fabric8.kubernetes.client.KubernetesClientException: Failed to start 
websocket
at 
io.fabric8.kubernetes.client.dsl.internal.WatcherWebSocketListener.onFailure(WatcherWebSocketListener.java:77)
at 
org.apache.flink.kubernetes.shaded.okhttp3.internal.ws.RealWebSocket.failWebSocket(RealWebSocket.java:570)
at 
org.apache.flink.kubernetes.shaded.okhttp3.internal.ws.RealWebSocket$1.onFailure(RealWebSocket.java:216)
at 
org.apache.flink.kubernetes.shaded.okhttp3.RealCall$AsyncCall.execute(RealCall.java:180)
at 
org.apache.flink.kubernetes.shaded.okhttp3.internal.NamedRunnable.run(NamedRunnable.java:32)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Suppressed: java.lang.Throwable: waiting here
at 
io.fabric8.kubernetes.client.utils.Utils.waitUntilReady(Utils.java:164)
at 
io.fabric8.kubernetes.client.utils.Utils.waitUntilReadyOrFail(Utils.java:175)
at 
io.fabric8.kubernetes.client.dsl.internal.WatcherWebSocketListener.waitUntilReady(WatcherWebSocketListener.java:120)
at 
io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager.waitUntilReady(WatchConnectionManager.java:82)
at 
io.fabric8.kubernetes.client.dsl.base.BaseOperation.watch(BaseOperation.java:705)
at 
io.fabric8.kubernetes.client.dsl.base.BaseOperation.watch(BaseOperation.java:678)
at 
io.fabric8.kubernetes.client.dsl.base.BaseOperation.watch(BaseOperation.java:
{code}



{code:java}
Caused by: javax.net.ssl.SSLPeerUnverifiedException: Hostname 
in-acpmanager.test.yfzx.cn not verified:
certificate: sha256/cw2T2s+Swhl7z+H35/3C1dTLxL26OOMO5VoEN9kAZCA=
DN: CN=in-acpmanager.test.yfzx.cn
subjectAltNames: []
at 
org.apache.flink.kubernetes.shaded.okhttp3.internal.connection.RealConnection.connectTls(RealConnection.java:350)
at 
org.apache.flink.kubernetes.shaded.okhttp3.internal.connection.RealConnection.establishProtocol(RealConnection.java:300)
at 
org.apache.flink.kubernetes.shaded.okhttp3.internal.connection.RealConnection.connect(RealConnection.java:185)
at 
org.apache.flink.kubernetes.shaded.okhttp3.internal.connection.ExchangeFinder.findConnection(ExchangeFinder.java:224)
at 
org.apache.flink.kubernetes.shaded.okhttp3.internal.connection.ExchangeFinder.findHealthyConnection(ExchangeFinder.java:108)
at 
org.apache.flink.kubernetes.shaded.okhttp3.internal.connection.ExchangeFinder.find(ExchangeFinder.java:88)
at 
org.apache.flink.kubernetes.shaded.okhttp3.internal.connection.Transmitter.newExchange(Transmitter.java:169)
at 
org.apache.flink.kubernetes.shaded.okhttp3.internal.connection.ConnectInterceptor.intercept(ConnectInterceptor.java:41)
at 
org.apache.flink.kubernetes.shaded.okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:142)
at 
org.apache.flink.kubernetes.shaded.okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:117)
at 
org.apache.flink.kubernetes.shaded.okhttp3.internal.cache.CacheInterceptor.intercept(CacheInterceptor.java:94)
at 
org.apache.flink.kubernetes.shaded.okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:142)
at 
org.apache.flink.kubernetes.shaded.okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:117)
at 
org.apache.flink.kubernetes.shaded.okhttp3.internal.http.BridgeInterceptor.intercept(BridgeInterceptor.java:93)
at 
org.apache.flink.kubernetes.shaded.okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:142)
at 
org.apache.flink.kubernetes.shaded.okhttp3.internal.http.RetryAndFollowUpInterceptor.intercept(RetryAndFollowUpInterceptor.java:88)
at 
org.apache.flink.kubernetes.shaded.okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:142)
at 
org.apache.flink.kubernetes.shaded.okhttp3.internal.http.RealInter

[jira] [Updated] (FLINK-26452) Flink deploy on k8s: HTTPS SSLPeerUnverifiedException

2022-03-03 Thread hjw (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-26452?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

hjw updated FLINK-26452:

Summary:  Flink deploy on k8s: HTTPS SSLPeerUnverifiedException  (was: Flink 
deploy on k8s when the kubeconfig server is a hostname, not an IP)

>  Flink deploy on k8s: HTTPS SSLPeerUnverifiedException
> -
>
> Key: FLINK-26452
> URL: https://issues.apache.org/jira/browse/FLINK-26452
> Project: Flink
>  Issue Type: Bug
>  Components: Deployment / Kubernetes
>Affects Versions: 1.13.6
>Reporter: hjw
>Priority: Major
>
> ~/.kube/config
> apiVersion: v1
> kind: Config
> clusters:
> - name: "yf-dev-cluster1"
>   cluster:
>     server: "https://in-acpmanager.test.yfzx.cn/k8s/clusters/c-t5h2t"
>     certificate-authority-data: "……"
> {code:java}
> 2022-03-02 18:59:30 | OkHttp 
> https://in-acpmanager.test.yfzx.cn/...io.fabric8.kubernetes.client.dsl.internal.WatcherWebSocketListener
>   
> Exec Failure javax.net.ssl.SSLPeerUnverifiedException Hostname 
> in-acpmanager.test.yfzx.cn not verified:
> certificate: sha256/cw2T2s+Swhl7z+H35/3C1dTLxL26OOMO5VoEN9kAZCA=
> DN: CN=in-acpmanager.test.yfzx.cn
> subjectAltNames: []
> io.fabric8.kubernetes.client.KubernetesClientException: Failed to start 
> websocket
> at 
> io.fabric8.kubernetes.client.dsl.internal.WatcherWebSocketListener.onFailure(WatcherWebSocketListener.java:77)
> at 
> org.apache.flink.kubernetes.shaded.okhttp3.internal.ws.RealWebSocket.failWebSocket(RealWebSocket.java:570)
> at 
> org.apache.flink.kubernetes.shaded.okhttp3.internal.ws.RealWebSocket$1.onFailure(RealWebSocket.java:216)
> at 
> org.apache.flink.kubernetes.shaded.okhttp3.RealCall$AsyncCall.execute(RealCall.java:180)
> at 
> org.apache.flink.kubernetes.shaded.okhttp3.internal.NamedRunnable.run(NamedRunnable.java:32)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748)
> Suppressed: java.lang.Throwable: waiting here
> at 
> io.fabric8.kubernetes.client.utils.Utils.waitUntilReady(Utils.java:164)
> at 
> io.fabric8.kubernetes.client.utils.Utils.waitUntilReadyOrFail(Utils.java:175)
> at 
> io.fabric8.kubernetes.client.dsl.internal.WatcherWebSocketListener.waitUntilReady(WatcherWebSocketListener.java:120)
> at 
> io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager.waitUntilReady(WatchConnectionManager.java:82)
> at 
> io.fabric8.kubernetes.client.dsl.base.BaseOperation.watch(BaseOperation.java:705)
> at 
> io.fabric8.kubernetes.client.dsl.base.BaseOperation.watch(BaseOperation.java:678)
> at 
> io.fabric8.kubernetes.client.dsl.base.BaseOperation.watch(BaseOperation.java:
> {code}
> {code:java}
> Caused by: javax.net.ssl.SSLPeerUnverifiedException: Hostname 
> in-acpmanager.test.yfzx.cn not verified:
> certificate: sha256/cw2T2s+Swhl7z+H35/3C1dTLxL26OOMO5VoEN9kAZCA=
> DN: CN=in-acpmanager.test.yfzx.cn
> subjectAltNames: []
> at 
> org.apache.flink.kubernetes.shaded.okhttp3.internal.connection.RealConnection.connectTls(RealConnection.java:350)
> at 
> org.apache.flink.kubernetes.shaded.okhttp3.internal.connection.RealConnection.establishProtocol(RealConnection.java:300)
> at 
> org.apache.flink.kubernetes.shaded.okhttp3.internal.connection.RealConnection.connect(RealConnection.java:185)
> at 
> org.apache.flink.kubernetes.shaded.okhttp3.internal.connection.ExchangeFinder.findConnection(ExchangeFinder.java:224)
> at 
> org.apache.flink.kubernetes.shaded.okhttp3.internal.connection.ExchangeFinder.findHealthyConnection(ExchangeFinder.java:108)
> at 
> org.apache.flink.kubernetes.shaded.okhttp3.internal.connection.ExchangeFinder.find(ExchangeFinder.java:88)
> at 
> org.apache.flink.kubernetes.shaded.okhttp3.internal.connection.Transmitter.newExchange(Transmitter.java:169)
> at 
> org.apache.flink.kubernetes.shaded.okhttp3.internal.connection.ConnectInterceptor.intercept(ConnectInterceptor.java:41)
> at 
> org.apache.flink.kubernetes.shaded.okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:142)
> at 
> org.apache.flink.kubernetes.shaded.okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:117)
> at 
> org.apache.flink.kubernetes.shaded.okhttp3.internal.cache.CacheInterceptor.intercept(CacheInterceptor.java:94)
> at 
> org.apache.flink.kubernetes.shaded.okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:142)
> at 
> org.apache.flink.kubernete

[jira] [Updated] (FLINK-26452) Flink deploy on k8s when the kubeconfig server is a hostname, not an IP

2022-03-02 Thread hjw (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-26452?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

hjw updated FLINK-26452:

Description: 
~/.kube/config

apiVersion: v1
kind: Config
clusters:
- name: "yf-dev-cluster1"
  cluster:
    server: "https://in-acpmanager.test.yfzx.cn/k8s/clusters/c-t5h2t"
    certificate-authority-data: "……"


{code:java}
2022-03-02 18:59:30 | OkHttp 
https://in-acpmanager.test.yfzx.cn/...io.fabric8.kubernetes.client.dsl.internal.WatcherWebSocketListener
  
Exec Failure javax.net.ssl.SSLPeerUnverifiedException Hostname 
in-acpmanager.test.yfzx.cn not verified:
certificate: sha256/cw2T2s+Swhl7z+H35/3C1dTLxL26OOMO5VoEN9kAZCA=
DN: CN=in-acpmanager.test.yfzx.cn
subjectAltNames: []
io.fabric8.kubernetes.client.KubernetesClientException: Failed to start 
websocket
at 
io.fabric8.kubernetes.client.dsl.internal.WatcherWebSocketListener.onFailure(WatcherWebSocketListener.java:77)
at 
org.apache.flink.kubernetes.shaded.okhttp3.internal.ws.RealWebSocket.failWebSocket(RealWebSocket.java:570)
at 
org.apache.flink.kubernetes.shaded.okhttp3.internal.ws.RealWebSocket$1.onFailure(RealWebSocket.java:216)
at 
org.apache.flink.kubernetes.shaded.okhttp3.RealCall$AsyncCall.execute(RealCall.java:180)
at 
org.apache.flink.kubernetes.shaded.okhttp3.internal.NamedRunnable.run(NamedRunnable.java:32)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Suppressed: java.lang.Throwable: waiting here
at 
io.fabric8.kubernetes.client.utils.Utils.waitUntilReady(Utils.java:164)
at 
io.fabric8.kubernetes.client.utils.Utils.waitUntilReadyOrFail(Utils.java:175)
at 
io.fabric8.kubernetes.client.dsl.internal.WatcherWebSocketListener.waitUntilReady(WatcherWebSocketListener.java:120)
at 
io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager.waitUntilReady(WatchConnectionManager.java:82)
at 
io.fabric8.kubernetes.client.dsl.base.BaseOperation.watch(BaseOperation.java:705)
at 
io.fabric8.kubernetes.client.dsl.base.BaseOperation.watch(BaseOperation.java:678)
at 
io.fabric8.kubernetes.client.dsl.base.BaseOperation.watch(BaseOperation.java:
{code}



{code:java}
Caused by: javax.net.ssl.SSLPeerUnverifiedException: Hostname 
in-acpmanager.test.yfzx.cn not verified:
certificate: sha256/cw2T2s+Swhl7z+H35/3C1dTLxL26OOMO5VoEN9kAZCA=
DN: CN=in-acpmanager.test.yfzx.cn
subjectAltNames: []
at 
org.apache.flink.kubernetes.shaded.okhttp3.internal.connection.RealConnection.connectTls(RealConnection.java:350)
at 
org.apache.flink.kubernetes.shaded.okhttp3.internal.connection.RealConnection.establishProtocol(RealConnection.java:300)
at 
org.apache.flink.kubernetes.shaded.okhttp3.internal.connection.RealConnection.connect(RealConnection.java:185)
at 
org.apache.flink.kubernetes.shaded.okhttp3.internal.connection.ExchangeFinder.findConnection(ExchangeFinder.java:224)
at 
org.apache.flink.kubernetes.shaded.okhttp3.internal.connection.ExchangeFinder.findHealthyConnection(ExchangeFinder.java:108)
at 
org.apache.flink.kubernetes.shaded.okhttp3.internal.connection.ExchangeFinder.find(ExchangeFinder.java:88)
at 
org.apache.flink.kubernetes.shaded.okhttp3.internal.connection.Transmitter.newExchange(Transmitter.java:169)
at 
org.apache.flink.kubernetes.shaded.okhttp3.internal.connection.ConnectInterceptor.intercept(ConnectInterceptor.java:41)
at 
org.apache.flink.kubernetes.shaded.okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:142)
at 
org.apache.flink.kubernetes.shaded.okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:117)
at 
org.apache.flink.kubernetes.shaded.okhttp3.internal.cache.CacheInterceptor.intercept(CacheInterceptor.java:94)
at 
org.apache.flink.kubernetes.shaded.okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:142)
at 
org.apache.flink.kubernetes.shaded.okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:117)
at 
org.apache.flink.kubernetes.shaded.okhttp3.internal.http.BridgeInterceptor.intercept(BridgeInterceptor.java:93)
at org.apache.flink.kubernetes.shade
{code}


By the way . "kubectl get pod -n namespace" command is success in this node.  
The node is configured with DNS.


  was:
~/.kube/config

apiVersion: v1
kind: Config
clusters:
- name: "yf-dev-cluster1"
  cluster:
    server: "https://in-acpmanager.test.yfzx.cn/k8s/clusters/c-t5h2t"
    certificate-authority-data: "……"


{code:java}
2022-03-02 18:59:30 | OkHttp 
https://in-acpmanager.test.yfzx.cn/...io.fabric8.kubernetes.clien

[jira] [Updated] (FLINK-26452) Flink deploy on k8s when the kubeconfig server is a hostname, not an IP

2022-03-02 Thread hjw (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-26452?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

hjw updated FLINK-26452:

Description: 
~/.kube/config

apiVersion: v1
kind: Config
clusters:
- name: "yf-dev-cluster1"
  cluster:
    server: "https://in-acpmanager.test.yfzx.cn/k8s/clusters/c-t5h2t"
    certificate-authority-data: "……"


{code:java}
2022-03-02 18:59:30 | OkHttp 
https://in-acpmanager.test.yfzx.cn/...io.fabric8.kubernetes.client.dsl.internal.WatcherWebSocketListener
  
Exec Failure javax.net.ssl.SSLPeerUnverifiedException Hostname 
in-acpmanager.test.yfzx.cn not verified:
certificate: sha256/cw2T2s+Swhl7z+H35/3C1dTLxL26OOMO5VoEN9kAZCA=
DN: CN=in-acpmanager.test.yfzx.cn
subjectAltNames: []
io.fabric8.kubernetes.client.KubernetesClientException: Failed to start 
websocket
at 
io.fabric8.kubernetes.client.dsl.internal.WatcherWebSocketListener.onFailure(WatcherWebSocketListener.java:77)
at 
org.apache.flink.kubernetes.shaded.okhttp3.internal.ws.RealWebSocket.failWebSocket(RealWebSocket.java:570)
at 
org.apache.flink.kubernetes.shaded.okhttp3.internal.ws.RealWebSocket$1.onFailure(RealWebSocket.java:216)
at 
org.apache.flink.kubernetes.shaded.okhttp3.RealCall$AsyncCall.execute(RealCall.java:180)
at 
org.apache.flink.kubernetes.shaded.okhttp3.internal.NamedRunnable.run(NamedRunnable.java:32)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Suppressed: java.lang.Throwable: waiting here
at 
io.fabric8.kubernetes.client.utils.Utils.waitUntilReady(Utils.java:164)
at 
io.fabric8.kubernetes.client.utils.Utils.waitUntilReadyOrFail(Utils.java:175)
at 
io.fabric8.kubernetes.client.dsl.internal.WatcherWebSocketListener.waitUntilReady(WatcherWebSocketListener.java:120)
at 
io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager.waitUntilReady(WatchConnectionManager.java:82)
at 
io.fabric8.kubernetes.client.dsl.base.BaseOperation.watch(BaseOperation.java:705)
at 
io.fabric8.kubernetes.client.dsl.base.BaseOperation.watch(BaseOperation.java:678)
at 
io.fabric8.kubernetes.client.dsl.base.BaseOperation.watch(BaseOperation.java:
{code}


By the way . "kubectl get pod -n namespace" command is success in this node.


  was:
~/.kube/config

apiVersion: v1
kind: Config
clusters:
- name: "yf-dev-cluster1"
  cluster:
    server: "https://in-acpmanager.test.yfzx.cn/k8s/clusters/c-t5h2t"
    certificate-authority-data: "……"


{code:java}
2022-03-02 18:59:30 | OkHttp 
https://in-acpmanager.test.yfzx.cn/...io.fabric8.kubernetes.client.dsl.internal.WatcherWebSocketListener
  
Exec Failure javax.net.ssl.SSLPeerUnverifiedException Hostname 
in-acpmanager.test.yfzx.cn not verified:
certificate: sha256/cw2T2s+Swhl7z+H35/3C1dTLxL26OOMO5VoEN9kAZCA=
DN: CN=in-acpmanager.test.yfzx.cn
subjectAltNames: []
io.fabric8.kubernetes.client.KubernetesClientException: Failed to start 
websocket
at 
io.fabric8.kubernetes.client.dsl.internal.WatcherWebSocketListener.onFailure(WatcherWebSocketListener.java:77)
at 
org.apache.flink.kubernetes.shaded.okhttp3.internal.ws.RealWebSocket.failWebSocket(RealWebSocket.java:570)
at 
org.apache.flink.kubernetes.shaded.okhttp3.internal.ws.RealWebSocket$1.onFailure(RealWebSocket.java:216)
at 
org.apache.flink.kubernetes.shaded.okhttp3.RealCall$AsyncCall.execute(RealCall.java:180)
at 
org.apache.flink.kubernetes.shaded.okhttp3.internal.NamedRunnable.run(NamedRunnable.java:32)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Suppressed: java.lang.Throwable: waiting here
at 
io.fabric8.kubernetes.client.utils.Utils.waitUntilReady(Utils.java:164)
at 
io.fabric8.kubernetes.client.utils.Utils.waitUntilReadyOrFail(Utils.java:175)
at 
io.fabric8.kubernetes.client.dsl.internal.WatcherWebSocketListener.waitUntilReady(WatcherWebSocketListener.java:120)
at 
io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager.waitUntilReady(WatchConnectionManager.java:82)
at 
io.fabric8.kubernetes.client.dsl.base.BaseOperation.watch(BaseOperation.java:705)
at 
io.fabric8.kubernetes.client.dsl.base.BaseOperation.watch(BaseOperation.java:678)
at 
io.fabric8.kubernetes.client.dsl.base.BaseOperation.watch(BaseOperation.java:
{code}


By the way . 



>  Flink deploy on k8s when the kubeconfig server is a hostname, not an IP
> ---

[jira] [Updated] (FLINK-26452) Flink deploy on k8s when the kubeconfig server is a hostname, not an IP

2022-03-02 Thread hjw (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-26452?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

hjw updated FLINK-26452:

Description: 
~/.kube/config

apiVersion: v1
kind: Config
clusters:
- name: "yf-dev-cluster1"
  cluster:
    server: "https://in-acpmanager.test.yfzx.cn/k8s/clusters/c-t5h2t"
    certificate-authority-data: "……"


{code:java}
2022-03-02 18:59:30 | OkHttp 
https://in-acpmanager.test.yfzx.cn/...io.fabric8.kubernetes.client.dsl.internal.WatcherWebSocketListener
  
Exec Failure javax.net.ssl.SSLPeerUnverifiedException Hostname 
in-acpmanager.test.yfzx.cn not verified:
certificate: sha256/cw2T2s+Swhl7z+H35/3C1dTLxL26OOMO5VoEN9kAZCA=
DN: CN=in-acpmanager.test.yfzx.cn
subjectAltNames: []
io.fabric8.kubernetes.client.KubernetesClientException: Failed to start 
websocket
at 
io.fabric8.kubernetes.client.dsl.internal.WatcherWebSocketListener.onFailure(WatcherWebSocketListener.java:77)
at 
org.apache.flink.kubernetes.shaded.okhttp3.internal.ws.RealWebSocket.failWebSocket(RealWebSocket.java:570)
at 
org.apache.flink.kubernetes.shaded.okhttp3.internal.ws.RealWebSocket$1.onFailure(RealWebSocket.java:216)
at 
org.apache.flink.kubernetes.shaded.okhttp3.RealCall$AsyncCall.execute(RealCall.java:180)
at 
org.apache.flink.kubernetes.shaded.okhttp3.internal.NamedRunnable.run(NamedRunnable.java:32)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Suppressed: java.lang.Throwable: waiting here
at 
io.fabric8.kubernetes.client.utils.Utils.waitUntilReady(Utils.java:164)
at 
io.fabric8.kubernetes.client.utils.Utils.waitUntilReadyOrFail(Utils.java:175)
at 
io.fabric8.kubernetes.client.dsl.internal.WatcherWebSocketListener.waitUntilReady(WatcherWebSocketListener.java:120)
at 
io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager.waitUntilReady(WatchConnectionManager.java:82)
at 
io.fabric8.kubernetes.client.dsl.base.BaseOperation.watch(BaseOperation.java:705)
at 
io.fabric8.kubernetes.client.dsl.base.BaseOperation.watch(BaseOperation.java:678)
at 
io.fabric8.kubernetes.client.dsl.base.BaseOperation.watch(BaseOperation.java:
{code}


By the way . "kubectl get pod -n namespace" command is success in this node.  
The node is configured with DNS.


  was:
~/.kube/config

apiVersion: v1
kind: Config
clusters:
- name: "yf-dev-cluster1"
  cluster:
    server: "https://in-acpmanager.test.yfzx.cn/k8s/clusters/c-t5h2t"
    certificate-authority-data: "……"


{code:java}
2022-03-02 18:59:30 | OkHttp 
https://in-acpmanager.test.yfzx.cn/...io.fabric8.kubernetes.client.dsl.internal.WatcherWebSocketListener
  
Exec Failure javax.net.ssl.SSLPeerUnverifiedException Hostname 
in-acpmanager.test.yfzx.cn not verified:
certificate: sha256/cw2T2s+Swhl7z+H35/3C1dTLxL26OOMO5VoEN9kAZCA=
DN: CN=in-acpmanager.test.yfzx.cn
subjectAltNames: []
io.fabric8.kubernetes.client.KubernetesClientException: Failed to start 
websocket
at 
io.fabric8.kubernetes.client.dsl.internal.WatcherWebSocketListener.onFailure(WatcherWebSocketListener.java:77)
at 
org.apache.flink.kubernetes.shaded.okhttp3.internal.ws.RealWebSocket.failWebSocket(RealWebSocket.java:570)
at 
org.apache.flink.kubernetes.shaded.okhttp3.internal.ws.RealWebSocket$1.onFailure(RealWebSocket.java:216)
at 
org.apache.flink.kubernetes.shaded.okhttp3.RealCall$AsyncCall.execute(RealCall.java:180)
at 
org.apache.flink.kubernetes.shaded.okhttp3.internal.NamedRunnable.run(NamedRunnable.java:32)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Suppressed: java.lang.Throwable: waiting here
at 
io.fabric8.kubernetes.client.utils.Utils.waitUntilReady(Utils.java:164)
at 
io.fabric8.kubernetes.client.utils.Utils.waitUntilReadyOrFail(Utils.java:175)
at 
io.fabric8.kubernetes.client.dsl.internal.WatcherWebSocketListener.waitUntilReady(WatcherWebSocketListener.java:120)
at 
io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager.waitUntilReady(WatchConnectionManager.java:82)
at 
io.fabric8.kubernetes.client.dsl.base.BaseOperation.watch(BaseOperation.java:705)
at 
io.fabric8.kubernetes.client.dsl.base.BaseOperation.watch(BaseOperation.java:678)
at 
io.fabric8.kubernetes.client.dsl.base.BaseOperation.watch(BaseOperation.java:
{code}


By the way . "kubectl get pod -n namespace" command is success in this node.



>  Flink deploy on k8s

[jira] [Updated] (FLINK-26452) Flink deploy on k8s when the kubeconfig server is a hostname, not an IP

2022-03-02 Thread hjw (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-26452?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

hjw updated FLINK-26452:

Description: 
~/.kube/config

apiVersion: v1
kind: Config
clusters:
- name: "yf-dev-cluster1"
  cluster:
    server: "https://in-acpmanager.test.yfzx.cn/k8s/clusters/c-t5h2t"
    certificate-authority-data: "……"


{code:java}
2022-03-02 18:59:30 | OkHttp 
https://in-acpmanager.test.yfzx.cn/...io.fabric8.kubernetes.client.dsl.internal.WatcherWebSocketListener
  
Exec Failure javax.net.ssl.SSLPeerUnverifiedException Hostname 
in-acpmanager.test.yfzx.cn not verified:
certificate: sha256/cw2T2s+Swhl7z+H35/3C1dTLxL26OOMO5VoEN9kAZCA=
DN: CN=in-acpmanager.test.yfzx.cn
subjectAltNames: []
io.fabric8.kubernetes.client.KubernetesClientException: Failed to start 
websocket
at 
io.fabric8.kubernetes.client.dsl.internal.WatcherWebSocketListener.onFailure(WatcherWebSocketListener.java:77)
at 
org.apache.flink.kubernetes.shaded.okhttp3.internal.ws.RealWebSocket.failWebSocket(RealWebSocket.java:570)
at 
org.apache.flink.kubernetes.shaded.okhttp3.internal.ws.RealWebSocket$1.onFailure(RealWebSocket.java:216)
at 
org.apache.flink.kubernetes.shaded.okhttp3.RealCall$AsyncCall.execute(RealCall.java:180)
at 
org.apache.flink.kubernetes.shaded.okhttp3.internal.NamedRunnable.run(NamedRunnable.java:32)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Suppressed: java.lang.Throwable: waiting here
at 
io.fabric8.kubernetes.client.utils.Utils.waitUntilReady(Utils.java:164)
at 
io.fabric8.kubernetes.client.utils.Utils.waitUntilReadyOrFail(Utils.java:175)
at 
io.fabric8.kubernetes.client.dsl.internal.WatcherWebSocketListener.waitUntilReady(WatcherWebSocketListener.java:120)
at 
io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager.waitUntilReady(WatchConnectionManager.java:82)
at 
io.fabric8.kubernetes.client.dsl.base.BaseOperation.watch(BaseOperation.java:705)
at 
io.fabric8.kubernetes.client.dsl.base.BaseOperation.watch(BaseOperation.java:678)
at 
io.fabric8.kubernetes.client.dsl.base.BaseOperation.watch(BaseOperation.java:
{code}


By the way . 


  was:
~/.kube/config

apiVersion: v1
kind: Config
clusters:
- name: "yf-dev-cluster1"
  cluster:
    server: "https://in-acpmanager.test.yfzx.cn/k8s/clusters/c-t5h2t"
    certificate-authority-data: "……"


{code:java}
2022-03-02 18:59:30 | OkHttp 
https://in-acpmanager.test.yfzx.cn/...io.fabric8.kubernetes.client.dsl.internal.WatcherWebSocketListener
  
Exec Failure javax.net.ssl.SSLPeerUnverifiedException Hostname 
in-acpmanager.test.yfzx.cn not verified:
certificate: sha256/cw2T2s+Swhl7z+H35/3C1dTLxL26OOMO5VoEN9kAZCA=
DN: CN=in-acpmanager.test.yfzx.cn
subjectAltNames: []
io.fabric8.kubernetes.client.KubernetesClientException: Failed to start 
websocket
at 
io.fabric8.kubernetes.client.dsl.internal.WatcherWebSocketListener.onFailure(WatcherWebSocketListener.java:77)
at 
org.apache.flink.kubernetes.shaded.okhttp3.internal.ws.RealWebSocket.failWebSocket(RealWebSocket.java:570)
at 
org.apache.flink.kubernetes.shaded.okhttp3.internal.ws.RealWebSocket$1.onFailure(RealWebSocket.java:216)
at 
org.apache.flink.kubernetes.shaded.okhttp3.RealCall$AsyncCall.execute(RealCall.java:180)
at 
org.apache.flink.kubernetes.shaded.okhttp3.internal.NamedRunnable.run(NamedRunnable.java:32)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Suppressed: java.lang.Throwable: waiting here
at 
io.fabric8.kubernetes.client.utils.Utils.waitUntilReady(Utils.java:164)
at 
io.fabric8.kubernetes.client.utils.Utils.waitUntilReadyOrFail(Utils.java:175)
at 
io.fabric8.kubernetes.client.dsl.internal.WatcherWebSocketListener.waitUntilReady(WatcherWebSocketListener.java:120)
at 
io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager.waitUntilReady(WatchConnectionManager.java:82)
at 
io.fabric8.kubernetes.client.dsl.base.BaseOperation.watch(BaseOperation.java:705)
at 
io.fabric8.kubernetes.client.dsl.base.BaseOperation.watch(BaseOperation.java:678)
at 
io.fabric8.kubernetes.client.dsl.base.BaseOperation.watch(BaseOperation.java:
{code}



>  Flink deploy on k8s when the kubeconfig server is a hostname, not an IP
> 
>
> Key: FLINK-26452
> URL: https

[jira] [Updated] (FLINK-26452) Flink deploy on k8s when the kubeconfig server is a hostname, not an IP

2022-03-02 Thread hjw (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-26452?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

hjw updated FLINK-26452:

Description: 
~/.kube/config

apiVersion: v1
kind: Config
clusters:
- name: "yf-dev-cluster1"
  cluster:
    server: "https://in-acpmanager.test.yfzx.cn/k8s/clusters/c-t5h2t"
    certificate-authority-data: "……"


{code:java}
2022-03-02 18:59:30 | OkHttp 
https://in-acpmanager.test.yfzx.cn/...io.fabric8.kubernetes.client.dsl.internal.WatcherWebSocketListener
  
Exec Failure javax.net.ssl.SSLPeerUnverifiedException Hostname 
in-acpmanager.test.yfzx.cn not verified:
certificate: sha256/cw2T2s+Swhl7z+H35/3C1dTLxL26OOMO5VoEN9kAZCA=
DN: CN=in-acpmanager.test.yfzx.cn
subjectAltNames: []
io.fabric8.kubernetes.client.KubernetesClientException: Failed to start 
websocket
at 
io.fabric8.kubernetes.client.dsl.internal.WatcherWebSocketListener.onFailure(WatcherWebSocketListener.java:77)
at 
org.apache.flink.kubernetes.shaded.okhttp3.internal.ws.RealWebSocket.failWebSocket(RealWebSocket.java:570)
at 
org.apache.flink.kubernetes.shaded.okhttp3.internal.ws.RealWebSocket$1.onFailure(RealWebSocket.java:216)
at 
org.apache.flink.kubernetes.shaded.okhttp3.RealCall$AsyncCall.execute(RealCall.java:180)
at 
org.apache.flink.kubernetes.shaded.okhttp3.internal.NamedRunnable.run(NamedRunnable.java:32)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Suppressed: java.lang.Throwable: waiting here
at 
io.fabric8.kubernetes.client.utils.Utils.waitUntilReady(Utils.java:164)
at 
io.fabric8.kubernetes.client.utils.Utils.waitUntilReadyOrFail(Utils.java:175)
at 
io.fabric8.kubernetes.client.dsl.internal.WatcherWebSocketListener.waitUntilReady(WatcherWebSocketListener.java:120)
at 
io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager.waitUntilReady(WatchConnectionManager.java:82)
at 
io.fabric8.kubernetes.client.dsl.base.BaseOperation.watch(BaseOperation.java:705)
at 
io.fabric8.kubernetes.client.dsl.base.BaseOperation.watch(BaseOperation.java:678)
at 
io.fabric8.kubernetes.client.dsl.base.BaseOperation.watch(BaseOperation.java:
{code}


  was:
~/.kube/config

apiVersion: v1
kind: Config
clusters:
- name: "yf-dev-cluster1"
  cluster:
    server: "https://in-acpmanager.test.yfzx.cn/k8s/clusters/c-t5h2t"
    certificate-authority-data: "……"


{code:java}
2022-03-02 18:59:30 | WARN | OkHttp https://in-acpmanager.test.yfzx.cn/... | io.fabric8.kubernetes.client.dsl.internal.WatcherWebSocketListener:76] Exec Failure javax.net.ssl.SSLPeerUnverifiedException Hostname in-acpmanager.test.yfzx.cn not verified:
certificate: sha256/cw2T2s+Swhl7z+H35/3C1dTLxL26OOMO5VoEN9kAZCA=
DN: CN=in-acpmanager.test.yfzx.cn
subjectAltNames: []
io.fabric8.kubernetes.client.KubernetesClientException: Failed to start 
websocket
at 
io.fabric8.kubernetes.client.dsl.internal.WatcherWebSocketListener.onFailure(WatcherWebSocketListener.java:77)
at 
org.apache.flink.kubernetes.shaded.okhttp3.internal.ws.RealWebSocket.failWebSocket(RealWebSocket.java:570)
at 
org.apache.flink.kubernetes.shaded.okhttp3.internal.ws.RealWebSocket$1.onFailure(RealWebSocket.java:216)
at 
org.apache.flink.kubernetes.shaded.okhttp3.RealCall$AsyncCall.execute(RealCall.java:180)
at 
org.apache.flink.kubernetes.shaded.okhttp3.internal.NamedRunnable.run(NamedRunnable.java:32)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Suppressed: java.lang.Throwable: waiting here
at 
io.fabric8.kubernetes.client.utils.Utils.waitUntilReady(Utils.java:164)
at 
io.fabric8.kubernetes.client.utils.Utils.waitUntilReadyOrFail(Utils.java:175)
at 
io.fabric8.kubernetes.client.dsl.internal.WatcherWebSocketListener.waitUntilReady(WatcherWebSocketListener.java:120)
at 
io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager.waitUntilReady(WatchConnectionManager.java:82)
at 
io.fabric8.kubernetes.client.dsl.base.BaseOperation.watch(BaseOperation.java:705)
at 
io.fabric8.kubernetes.client.dsl.base.BaseOperation.watch(BaseOperation.java:678)
at 
io.fabric8.kubernetes.client.dsl.base.BaseOperation.watch(BaseOperation.java:
{code}



>  Flink deploy on k8s when  kubeconfig  server is hostname not ip
> 
>
> K

[jira] [Updated] (FLINK-26452) Flink deploy on k8s when kubeconfig server is hostname not ip

2022-03-02 Thread hjw (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-26452?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

hjw updated FLINK-26452:

Description: 
~/.kube/config

apiVersion: v1
kind: Config
clusters:
- name: "yf-dev-cluster1"
  cluster:
    server: "https://in-acpmanager.test.yfzx.cn/k8s/clusters/c-t5h2t"
    certificate-authority-data: "……"


{code:java}
2022-03-02 18:59:30 | WARN | OkHttp https://in-acpmanager.test.yfzx.cn/... | io.fabric8.kubernetes.client.dsl.internal.WatcherWebSocketListener:76] Exec Failure javax.net.ssl.SSLPeerUnverifiedException Hostname in-acpmanager.test.yfzx.cn not verified:
certificate: sha256/cw2T2s+Swhl7z+H35/3C1dTLxL26OOMO5VoEN9kAZCA=
DN: CN=in-acpmanager.test.yfzx.cn
subjectAltNames: []
io.fabric8.kubernetes.client.KubernetesClientException: Failed to start 
websocket
at 
io.fabric8.kubernetes.client.dsl.internal.WatcherWebSocketListener.onFailure(WatcherWebSocketListener.java:77)
at 
org.apache.flink.kubernetes.shaded.okhttp3.internal.ws.RealWebSocket.failWebSocket(RealWebSocket.java:570)
at 
org.apache.flink.kubernetes.shaded.okhttp3.internal.ws.RealWebSocket$1.onFailure(RealWebSocket.java:216)
at 
org.apache.flink.kubernetes.shaded.okhttp3.RealCall$AsyncCall.execute(RealCall.java:180)
at 
org.apache.flink.kubernetes.shaded.okhttp3.internal.NamedRunnable.run(NamedRunnable.java:32)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Suppressed: java.lang.Throwable: waiting here
at 
io.fabric8.kubernetes.client.utils.Utils.waitUntilReady(Utils.java:164)
at 
io.fabric8.kubernetes.client.utils.Utils.waitUntilReadyOrFail(Utils.java:175)
at 
io.fabric8.kubernetes.client.dsl.internal.WatcherWebSocketListener.waitUntilReady(WatcherWebSocketListener.java:120)
at 
io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager.waitUntilReady(WatchConnectionManager.java:82)
at 
io.fabric8.kubernetes.client.dsl.base.BaseOperation.watch(BaseOperation.java:705)
at 
io.fabric8.kubernetes.client.dsl.base.BaseOperation.watch(BaseOperation.java:678)
at 
io.fabric8.kubernetes.client.dsl.base.BaseOperation.watch(BaseOperation.java:
{code}


  was:
~/.kube/config

apiVersion: v1
kind: Config
clusters:
- name: "yf-dev-cluster1"
  cluster:
    server: "https://in-acpmanager.test.*.cn/k8s/clusters/c-t5h2t"
    certificate-authority-data: "……"


{code:java}
2022-03-02 18:59:30 | WARN | OkHttp https://in-acpmanager.test.*.cn/... | io.fabric8.kubernetes.client.dsl.internal.WatcherWebSocketListener:76] Exec Failure javax.net.ssl.SSLPeerUnverifiedException Hostname in-acpmanager.test.yfzx.cn not verified:
certificate: sha256/cw2T2s+Swhl7z+H35/3C1dTLxL26OOMO5VoEN9kAZCA=
DN: CN=in-acpmanager.test.*.cn
subjectAltNames: []
io.fabric8.kubernetes.client.KubernetesClientException: Failed to start 
websocket
at 
io.fabric8.kubernetes.client.dsl.internal.WatcherWebSocketListener.onFailure(WatcherWebSocketListener.java:77)
at 
org.apache.flink.kubernetes.shaded.okhttp3.internal.ws.RealWebSocket.failWebSocket(RealWebSocket.java:570)
at 
org.apache.flink.kubernetes.shaded.okhttp3.internal.ws.RealWebSocket$1.onFailure(RealWebSocket.java:216)
at 
org.apache.flink.kubernetes.shaded.okhttp3.RealCall$AsyncCall.execute(RealCall.java:180)
at 
org.apache.flink.kubernetes.shaded.okhttp3.internal.NamedRunnable.run(NamedRunnable.java:32)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Suppressed: java.lang.Throwable: waiting here
at 
io.fabric8.kubernetes.client.utils.Utils.waitUntilReady(Utils.java:164)
at 
io.fabric8.kubernetes.client.utils.Utils.waitUntilReadyOrFail(Utils.java:175)
at 
io.fabric8.kubernetes.client.dsl.internal.WatcherWebSocketListener.waitUntilReady(WatcherWebSocketListener.java:120)
at 
io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager.waitUntilReady(WatchConnectionManager.java:82)
at 
io.fabric8.kubernetes.client.dsl.base.BaseOperation.watch(BaseOperation.java:705)
at 
io.fabric8.kubernetes.client.dsl.base.BaseOperation.watch(BaseOperation.java:678)
at 
io.fabric8.kubernetes.client.dsl.base.BaseOperation.watch(BaseOperation.java:
{code}



>  Flink deploy on k8s when  kubeconfig  server is hostname not ip
> ---

[jira] [Updated] (FLINK-26452) Flink deploy on k8s when kubeconfig server is hostname not ip

2022-03-02 Thread hjw (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-26452?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

hjw updated FLINK-26452:

Description: 
~/.kube/config

apiVersion: v1
kind: Config
clusters:
- name: "yf-dev-cluster1"
  cluster:
    server: "https://in-acpmanager.test.*.cn/k8s/clusters/c-t5h2t"
    certificate-authority-data: "……"


{code:java}
2022-03-02 18:59:30 | WARN | OkHttp https://in-acpmanager.test.*.cn/... | io.fabric8.kubernetes.client.dsl.internal.WatcherWebSocketListener:76] Exec Failure javax.net.ssl.SSLPeerUnverifiedException Hostname in-acpmanager.test.yfzx.cn not verified:
certificate: sha256/cw2T2s+Swhl7z+H35/3C1dTLxL26OOMO5VoEN9kAZCA=
DN: CN=in-acpmanager.test.*.cn
subjectAltNames: []
io.fabric8.kubernetes.client.KubernetesClientException: Failed to start 
websocket
at 
io.fabric8.kubernetes.client.dsl.internal.WatcherWebSocketListener.onFailure(WatcherWebSocketListener.java:77)
at 
org.apache.flink.kubernetes.shaded.okhttp3.internal.ws.RealWebSocket.failWebSocket(RealWebSocket.java:570)
at 
org.apache.flink.kubernetes.shaded.okhttp3.internal.ws.RealWebSocket$1.onFailure(RealWebSocket.java:216)
at 
org.apache.flink.kubernetes.shaded.okhttp3.RealCall$AsyncCall.execute(RealCall.java:180)
at 
org.apache.flink.kubernetes.shaded.okhttp3.internal.NamedRunnable.run(NamedRunnable.java:32)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Suppressed: java.lang.Throwable: waiting here
at 
io.fabric8.kubernetes.client.utils.Utils.waitUntilReady(Utils.java:164)
at 
io.fabric8.kubernetes.client.utils.Utils.waitUntilReadyOrFail(Utils.java:175)
at 
io.fabric8.kubernetes.client.dsl.internal.WatcherWebSocketListener.waitUntilReady(WatcherWebSocketListener.java:120)
at 
io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager.waitUntilReady(WatchConnectionManager.java:82)
at 
io.fabric8.kubernetes.client.dsl.base.BaseOperation.watch(BaseOperation.java:705)
at 
io.fabric8.kubernetes.client.dsl.base.BaseOperation.watch(BaseOperation.java:678)
at 
io.fabric8.kubernetes.client.dsl.base.BaseOperation.watch(BaseOperation.java:
{code}


  was:
~/.kube/config

apiVersion: v1
kind: Config
clusters:
- name: "yf-dev-cluster1"
  cluster:
    server: "https://in-acpmanager.test.*.cn/k8s/clusters/c-t5h2t"
    certificate-authority-data: "……"


{code:java}
2022-03-02 18:59:30 | WARN | OkHttp https://in-acpmanager.test.*.cn/... | io.fabric8.kubernetes.client.dsl.internal.WatcherWebSocketListener:76] Exec Failure javax.net.ssl.SSLPeerUnverifiedException Hostname in-acpmanager.test.yfzx.cn not verified:
certificate: sha256/cw2T2s+Swhl7z+H35/3C1dTLxL26OOMO5VoEN9kAZCA=
DN: CN=in-acpmanager.test.yfzx.cn
subjectAltNames: []
io.fabric8.kubernetes.client.KubernetesClientException: Failed to start 
websocket
at 
io.fabric8.kubernetes.client.dsl.internal.WatcherWebSocketListener.onFailure(WatcherWebSocketListener.java:77)
at 
org.apache.flink.kubernetes.shaded.okhttp3.internal.ws.RealWebSocket.failWebSocket(RealWebSocket.java:570)
at 
org.apache.flink.kubernetes.shaded.okhttp3.internal.ws.RealWebSocket$1.onFailure(RealWebSocket.java:216)
at 
org.apache.flink.kubernetes.shaded.okhttp3.RealCall$AsyncCall.execute(RealCall.java:180)
at 
org.apache.flink.kubernetes.shaded.okhttp3.internal.NamedRunnable.run(NamedRunnable.java:32)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Suppressed: java.lang.Throwable: waiting here
at 
io.fabric8.kubernetes.client.utils.Utils.waitUntilReady(Utils.java:164)
at 
io.fabric8.kubernetes.client.utils.Utils.waitUntilReadyOrFail(Utils.java:175)
at 
io.fabric8.kubernetes.client.dsl.internal.WatcherWebSocketListener.waitUntilReady(WatcherWebSocketListener.java:120)
at 
io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager.waitUntilReady(WatchConnectionManager.java:82)
at 
io.fabric8.kubernetes.client.dsl.base.BaseOperation.watch(BaseOperation.java:705)
at 
io.fabric8.kubernetes.client.dsl.base.BaseOperation.watch(BaseOperation.java:678)
at 
io.fabric8.kubernetes.client.dsl.base.BaseOperation.watch(BaseOperation.java:
{code}



>  Flink deploy on k8s when  kubeconfig  server is hostname not ip
> -

[jira] [Created] (FLINK-26452) Flink deploy on k8s when kubeconfig server is hostname not ip

2022-03-02 Thread hjw (Jira)
hjw created FLINK-26452:
---

 Summary:  Flink deploy on k8s when  kubeconfig  server is hostname 
not ip
 Key: FLINK-26452
 URL: https://issues.apache.org/jira/browse/FLINK-26452
 Project: Flink
  Issue Type: Bug
  Components: Deployment / Kubernetes
Affects Versions: 1.13.6
Reporter: hjw


~/.kube/config

apiVersion: v1
kind: Config
clusters:
- name: "yf-dev-cluster1"
  cluster:
    server: "https://in-acpmanager.test.*.cn/k8s/clusters/c-t5h2t"
    certificate-authority-data: "……"


{code:java}
2022-03-02 18:59:30 | WARN | OkHttp https://in-acpmanager.test.*.cn/... | io.fabric8.kubernetes.client.dsl.internal.WatcherWebSocketListener:76] Exec Failure javax.net.ssl.SSLPeerUnverifiedException Hostname in-acpmanager.test.yfzx.cn not verified:
certificate: sha256/cw2T2s+Swhl7z+H35/3C1dTLxL26OOMO5VoEN9kAZCA=
DN: CN=in-acpmanager.test.yfzx.cn
subjectAltNames: []
io.fabric8.kubernetes.client.KubernetesClientException: Failed to start 
websocket
at 
io.fabric8.kubernetes.client.dsl.internal.WatcherWebSocketListener.onFailure(WatcherWebSocketListener.java:77)
at 
org.apache.flink.kubernetes.shaded.okhttp3.internal.ws.RealWebSocket.failWebSocket(RealWebSocket.java:570)
at 
org.apache.flink.kubernetes.shaded.okhttp3.internal.ws.RealWebSocket$1.onFailure(RealWebSocket.java:216)
at 
org.apache.flink.kubernetes.shaded.okhttp3.RealCall$AsyncCall.execute(RealCall.java:180)
at 
org.apache.flink.kubernetes.shaded.okhttp3.internal.NamedRunnable.run(NamedRunnable.java:32)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Suppressed: java.lang.Throwable: waiting here
at 
io.fabric8.kubernetes.client.utils.Utils.waitUntilReady(Utils.java:164)
at 
io.fabric8.kubernetes.client.utils.Utils.waitUntilReadyOrFail(Utils.java:175)
at 
io.fabric8.kubernetes.client.dsl.internal.WatcherWebSocketListener.waitUntilReady(WatcherWebSocketListener.java:120)
at 
io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager.waitUntilReady(WatchConnectionManager.java:82)
at 
io.fabric8.kubernetes.client.dsl.base.BaseOperation.watch(BaseOperation.java:705)
at 
io.fabric8.kubernetes.client.dsl.base.BaseOperation.watch(BaseOperation.java:678)
at 
io.fabric8.kubernetes.client.dsl.base.BaseOperation.watch(BaseOperation.java:
{code}
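
The verifier fails here because the server certificate carries an empty subjectAltNames list, so no SAN can match the hostname from the kubeconfig (modern TLS clients no longer fall back to the CN). Below is a minimal diagnostic sketch using plain JDK APIs (not part of Flink) for confirming what the API server actually presents; the trust-all manager and relaxed hostname verifier exist only so the handshake survives long enough to inspect the certificate and must never be used in production:

{code:java}
import java.net.URL;
import java.security.SecureRandom;
import java.security.cert.X509Certificate;
import javax.net.ssl.HttpsURLConnection;
import javax.net.ssl.SSLContext;
import javax.net.ssl.TrustManager;
import javax.net.ssl.X509TrustManager;

public class InspectServerCert {
    public static void main(String[] args) throws Exception {
        // Accept any chain so the private-CA certificate can be read at all.
        TrustManager[] trustAll = { new X509TrustManager() {
            public void checkClientTrusted(X509Certificate[] chain, String authType) {}
            public void checkServerTrusted(X509Certificate[] chain, String authType) {}
            public X509Certificate[] getAcceptedIssuers() { return new X509Certificate[0]; }
        }};
        SSLContext ctx = SSLContext.getInstance("TLS");
        ctx.init(null, trustAll, new SecureRandom());

        HttpsURLConnection conn = (HttpsURLConnection)
                new URL("https://in-acpmanager.test.yfzx.cn/").openConnection();
        conn.setSSLSocketFactory(ctx.getSocketFactory());
        // Skip hostname verification for inspection purposes only.
        conn.setHostnameVerifier((hostname, session) -> true);
        conn.connect();

        for (java.security.cert.Certificate c : conn.getServerCertificates()) {
            X509Certificate x509 = (X509Certificate) c;
            System.out.println("DN:   " + x509.getSubjectX500Principal());
            // Prints null when the certificate has no subjectAltNames --
            // exactly the condition that makes the hostname verifier fail.
            System.out.println("SANs: " + x509.getSubjectAlternativeNames());
        }
        conn.disconnect();
    }
}
{code}

If the SAN list really is empty, the durable fix is to reissue the API server certificate with the kubeconfig hostname in subjectAltNames rather than to relax verification on the client.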




--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Commented] (FLINK-26196) error when Incremental Checkpoints by RocksDb

2022-02-21 Thread hjw (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-26196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17495439#comment-17495439
 ] 

hjw commented on FLINK-26196:
-

[~yunta] 
Thanks for your response.
Flink version: 1.13.2
Environment: Flink on native Kubernetes (session mode)
State backend: RocksDB
Checkpoint storage: filesystem (NAS)
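
For reference, a minimal sketch of the setup described above, assuming Flink 1.13 APIs; the checkpoint path is hypothetical:

{code:java}
import org.apache.flink.contrib.streaming.state.EmbeddedRocksDBStateBackend;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class CheckpointSetupSketch {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        // RocksDB state backend with incremental checkpoints enabled.
        env.setStateBackend(new EmbeddedRocksDBStateBackend(true));
        // Checkpoint data goes to the NAS-backed filesystem (hypothetical mount).
        env.getCheckpointConfig().setCheckpointStorage("file:///mnt/nas/checkpoints");
        env.enableCheckpointing(60_000);
        // ... build the job graph and call env.execute(...) ...
    }
}
{code}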

> error when Incremental Checkpoints  by RocksDb 
> ---
>
> Key: FLINK-26196
> URL: https://issues.apache.org/jira/browse/FLINK-26196
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Checkpointing, Runtime / State Backends
>Affects Versions: 1.13.2
>Reporter: hjw
>Priority: Critical
>
> When I use incremental checkpoints with RocksDB, errors happen occasionally.
> Fortunately, the Flink job keeps running normally.
> Log:
> {code:java}
> java.io.IOException: Could not perform checkpoint 2804 for operator 
> cc-rule-keyByAndReduceStream (2/8)#1.
>  at 
> org.apache.flink.streaming.runtime.tasks.StreamTask.triggerCheckpointOnBarrier(StreamTask.java:1045)
>  at 
> org.apache.flink.streaming.runtime.io.checkpointing.CheckpointBarrierHandler.notifyCheckpoint(CheckpointBarrierHandler.java:135)
>  at 
> org.apache.flink.streaming.runtime.io.checkpointing.SingleCheckpointBarrierHandler.triggerCheckpoint(SingleCheckpointBarrierHandler.java:250)
>  at 
> org.apache.flink.streaming.runtime.io.checkpointing.SingleCheckpointBarrierHandler.access$100(SingleCheckpointBarrierHandler.java:61)
>  at 
> org.apache.flink.streaming.runtime.io.checkpointing.SingleCheckpointBarrierHandler$ControllerImpl.triggerGlobalCheckpoint(SingleCheckpointBarrierHandler.java:431)
>  at 
> org.apache.flink.streaming.runtime.io.checkpointing.AbstractAlignedBarrierHandlerState.barrierReceived(AbstractAlignedBarrierHandlerState.java:61)
>  at 
> org.apache.flink.streaming.runtime.io.checkpointing.SingleCheckpointBarrierHandler.processBarrier(SingleCheckpointBarrierHandler.java:227)
>  at 
> org.apache.flink.streaming.runtime.io.checkpointing.CheckpointedInputGate.handleEvent(CheckpointedInputGate.java:180)
>  at 
> org.apache.flink.streaming.runtime.io.checkpointing.CheckpointedInputGate.pollNext(CheckpointedInputGate.java:158)
>  at 
> org.apache.flink.streaming.runtime.io.AbstractStreamTaskNetworkInput.emitNext(AbstractStreamTaskNetworkInput.java:110)
>  at 
> org.apache.flink.streaming.runtime.io.StreamOneInputProcessor.processInput(StreamOneInputProcessor.java:66)
>  at 
> org.apache.flink.streaming.runtime.tasks.StreamTask.processInput(StreamTask.java:423)
>  at 
> org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor.runMailboxLoop(MailboxProcessor.java:204)
>  at 
> org.apache.flink.streaming.runtime.tasks.StreamTask.runMailboxLoop(StreamTask.java:681)
>  at 
> org.apache.flink.streaming.runtime.tasks.StreamTask.executeInvoke(StreamTask.java:636)
>  at 
> org.apache.flink.streaming.runtime.tasks.StreamTask.runWithCleanUpOnFail(StreamTask.java:647)
>  at 
> org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:620)
>  at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:779)
>  at org.apache.flink.runtime.taskmanager.Task.run(Task.java:566)
>  at java.lang.Thread.run(Thread.java:748)
> Caused by: org.apache.flink.runtime.checkpoint.CheckpointException: Could not 
> complete snapshot 2804 for operator cc-rule-keyByAndReduceStream (2/8)#1. 
> Failure reason: Checkpoint was declined.
>  at 
> org.apache.flink.streaming.api.operators.StreamOperatorStateHandler.snapshotState(StreamOperatorStateHandler.java:264)
>  at 
> org.apache.flink.streaming.api.operators.StreamOperatorStateHandler.snapshotState(StreamOperatorStateHandler.java:169)
>  at 
> org.apache.flink.streaming.api.operators.AbstractStreamOperator.snapshotState(AbstractStreamOperator.java:371)
>  at 
> org.apache.flink.streaming.runtime.tasks.SubtaskCheckpointCoordinatorImpl.checkpointStreamOperator(SubtaskCheckpointCoordinatorImpl.java:706)
>  at 
> org.apache.flink.streaming.runtime.tasks.SubtaskCheckpointCoordinatorImpl.buildOperatorSnapshotFutures(SubtaskCheckpointCoordinatorImpl.java:627)
>  at 
> org.apache.flink.streaming.runtime.tasks.SubtaskCheckpointCoordinatorImpl.takeSnapshotSync(SubtaskCheckpointCoordinatorImpl.java:590)
>  at 
> org.apache.flink.streaming.runtime.tasks.SubtaskCheckpointCoordinatorImpl.checkpointState(SubtaskCheckpointCoordinatorImpl.java:312)
>  at 
> org.apache.flink.streaming.runtime.tasks.StreamTask.lambda$performCheckpoint$8(StreamTask.java:1089)
>  at 
> org.apache.flink.streaming.runtime.tasks.StreamTaskActionExecutor$1.runThrowing(StreamTaskActionExecutor.java:50)
>  at 
> org.apache.flink.streaming.runtime.tasks.StreamTask.performCheckpoint(StreamTask.java:1073)
>  at 
> org.apache.flink.streaming.runtime.tasks.StreamTask.triggerCheckpointOnBarrier(StreamTask.java:1029)
>  ... 

[jira] [Commented] (FLINK-26196) error when Incremental Checkpoints by RocksDb

2022-02-21 Thread hjw (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-26196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17495437#comment-17495437
 ] 

hjw commented on FLINK-26196:
-

[~mayuehappy] Sorry, there were no errors before this one. This error appears and disappears like a ghost.

> error when Incremental Checkpoints  by RocksDb 
> ---
>
> Key: FLINK-26196
> URL: https://issues.apache.org/jira/browse/FLINK-26196
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Checkpointing, Runtime / State Backends
>Affects Versions: 1.13.2
>Reporter: hjw
>Priority: Critical
>
> When I use incremental checkpoints with RocksDB, errors happen occasionally.
> Fortunately, the Flink job keeps running normally.
> Log:
> {code:java}
> java.io.IOException: Could not perform checkpoint 2804 for operator 
> cc-rule-keyByAndReduceStream (2/8)#1.
>  at 
> org.apache.flink.streaming.runtime.tasks.StreamTask.triggerCheckpointOnBarrier(StreamTask.java:1045)
>  at 
> org.apache.flink.streaming.runtime.io.checkpointing.CheckpointBarrierHandler.notifyCheckpoint(CheckpointBarrierHandler.java:135)
>  at 
> org.apache.flink.streaming.runtime.io.checkpointing.SingleCheckpointBarrierHandler.triggerCheckpoint(SingleCheckpointBarrierHandler.java:250)
>  at 
> org.apache.flink.streaming.runtime.io.checkpointing.SingleCheckpointBarrierHandler.access$100(SingleCheckpointBarrierHandler.java:61)
>  at 
> org.apache.flink.streaming.runtime.io.checkpointing.SingleCheckpointBarrierHandler$ControllerImpl.triggerGlobalCheckpoint(SingleCheckpointBarrierHandler.java:431)
>  at 
> org.apache.flink.streaming.runtime.io.checkpointing.AbstractAlignedBarrierHandlerState.barrierReceived(AbstractAlignedBarrierHandlerState.java:61)
>  at 
> org.apache.flink.streaming.runtime.io.checkpointing.SingleCheckpointBarrierHandler.processBarrier(SingleCheckpointBarrierHandler.java:227)
>  at 
> org.apache.flink.streaming.runtime.io.checkpointing.CheckpointedInputGate.handleEvent(CheckpointedInputGate.java:180)
>  at 
> org.apache.flink.streaming.runtime.io.checkpointing.CheckpointedInputGate.pollNext(CheckpointedInputGate.java:158)
>  at 
> org.apache.flink.streaming.runtime.io.AbstractStreamTaskNetworkInput.emitNext(AbstractStreamTaskNetworkInput.java:110)
>  at 
> org.apache.flink.streaming.runtime.io.StreamOneInputProcessor.processInput(StreamOneInputProcessor.java:66)
>  at 
> org.apache.flink.streaming.runtime.tasks.StreamTask.processInput(StreamTask.java:423)
>  at 
> org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor.runMailboxLoop(MailboxProcessor.java:204)
>  at 
> org.apache.flink.streaming.runtime.tasks.StreamTask.runMailboxLoop(StreamTask.java:681)
>  at 
> org.apache.flink.streaming.runtime.tasks.StreamTask.executeInvoke(StreamTask.java:636)
>  at 
> org.apache.flink.streaming.runtime.tasks.StreamTask.runWithCleanUpOnFail(StreamTask.java:647)
>  at 
> org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:620)
>  at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:779)
>  at org.apache.flink.runtime.taskmanager.Task.run(Task.java:566)
>  at java.lang.Thread.run(Thread.java:748)
> Caused by: org.apache.flink.runtime.checkpoint.CheckpointException: Could not 
> complete snapshot 2804 for operator cc-rule-keyByAndReduceStream (2/8)#1. 
> Failure reason: Checkpoint was declined.
>  at 
> org.apache.flink.streaming.api.operators.StreamOperatorStateHandler.snapshotState(StreamOperatorStateHandler.java:264)
>  at 
> org.apache.flink.streaming.api.operators.StreamOperatorStateHandler.snapshotState(StreamOperatorStateHandler.java:169)
>  at 
> org.apache.flink.streaming.api.operators.AbstractStreamOperator.snapshotState(AbstractStreamOperator.java:371)
>  at 
> org.apache.flink.streaming.runtime.tasks.SubtaskCheckpointCoordinatorImpl.checkpointStreamOperator(SubtaskCheckpointCoordinatorImpl.java:706)
>  at 
> org.apache.flink.streaming.runtime.tasks.SubtaskCheckpointCoordinatorImpl.buildOperatorSnapshotFutures(SubtaskCheckpointCoordinatorImpl.java:627)
>  at 
> org.apache.flink.streaming.runtime.tasks.SubtaskCheckpointCoordinatorImpl.takeSnapshotSync(SubtaskCheckpointCoordinatorImpl.java:590)
>  at 
> org.apache.flink.streaming.runtime.tasks.SubtaskCheckpointCoordinatorImpl.checkpointState(SubtaskCheckpointCoordinatorImpl.java:312)
>  at 
> org.apache.flink.streaming.runtime.tasks.StreamTask.lambda$performCheckpoint$8(StreamTask.java:1089)
>  at 
> org.apache.flink.streaming.runtime.tasks.StreamTaskActionExecutor$1.runThrowing(StreamTaskActionExecutor.java:50)
>  at 
> org.apache.flink.streaming.runtime.tasks.StreamTask.performCheckpoint(StreamTask.java:1073)
>  at 
> org.apache.flink.streaming.runtime.tasks.StreamTask.triggerCheckpointOnBarrier(StreamTask.java:1029)
>  ... 19 more
> Caused by: org.rocksdb.RocksDBException: while link file to 

[jira] [Created] (FLINK-26196) error when Incremental Checkpoints by RocksDb

2022-02-16 Thread hjw (Jira)
hjw created FLINK-26196:
---

 Summary: error when Incremental Checkpoints  by RocksDb 
 Key: FLINK-26196
 URL: https://issues.apache.org/jira/browse/FLINK-26196
 Project: Flink
  Issue Type: Bug
  Components: Runtime / Checkpointing, Runtime / State Backends
Affects Versions: 1.13.2
Reporter: hjw


When I use incremental checkpoints with RocksDB, errors happen occasionally.
Fortunately, the Flink job keeps running normally.
Log:
{code:java}
java.io.IOException: Could not perform checkpoint 2804 for operator 
cc-rule-keyByAndReduceStream (2/8)#1.
 at 
org.apache.flink.streaming.runtime.tasks.StreamTask.triggerCheckpointOnBarrier(StreamTask.java:1045)
 at 
org.apache.flink.streaming.runtime.io.checkpointing.CheckpointBarrierHandler.notifyCheckpoint(CheckpointBarrierHandler.java:135)
 at 
org.apache.flink.streaming.runtime.io.checkpointing.SingleCheckpointBarrierHandler.triggerCheckpoint(SingleCheckpointBarrierHandler.java:250)
 at 
org.apache.flink.streaming.runtime.io.checkpointing.SingleCheckpointBarrierHandler.access$100(SingleCheckpointBarrierHandler.java:61)
 at 
org.apache.flink.streaming.runtime.io.checkpointing.SingleCheckpointBarrierHandler$ControllerImpl.triggerGlobalCheckpoint(SingleCheckpointBarrierHandler.java:431)
 at 
org.apache.flink.streaming.runtime.io.checkpointing.AbstractAlignedBarrierHandlerState.barrierReceived(AbstractAlignedBarrierHandlerState.java:61)
 at 
org.apache.flink.streaming.runtime.io.checkpointing.SingleCheckpointBarrierHandler.processBarrier(SingleCheckpointBarrierHandler.java:227)
 at 
org.apache.flink.streaming.runtime.io.checkpointing.CheckpointedInputGate.handleEvent(CheckpointedInputGate.java:180)
 at 
org.apache.flink.streaming.runtime.io.checkpointing.CheckpointedInputGate.pollNext(CheckpointedInputGate.java:158)
 at 
org.apache.flink.streaming.runtime.io.AbstractStreamTaskNetworkInput.emitNext(AbstractStreamTaskNetworkInput.java:110)
 at 
org.apache.flink.streaming.runtime.io.StreamOneInputProcessor.processInput(StreamOneInputProcessor.java:66)
 at 
org.apache.flink.streaming.runtime.tasks.StreamTask.processInput(StreamTask.java:423)
 at 
org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor.runMailboxLoop(MailboxProcessor.java:204)
 at 
org.apache.flink.streaming.runtime.tasks.StreamTask.runMailboxLoop(StreamTask.java:681)
 at 
org.apache.flink.streaming.runtime.tasks.StreamTask.executeInvoke(StreamTask.java:636)
 at 
org.apache.flink.streaming.runtime.tasks.StreamTask.runWithCleanUpOnFail(StreamTask.java:647)
 at 
org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:620)
 at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:779)
 at org.apache.flink.runtime.taskmanager.Task.run(Task.java:566)
 at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.flink.runtime.checkpoint.CheckpointException: Could not 
complete snapshot 2804 for operator cc-rule-keyByAndReduceStream (2/8)#1. 
Failure reason: Checkpoint was declined.
 at 
org.apache.flink.streaming.api.operators.StreamOperatorStateHandler.snapshotState(StreamOperatorStateHandler.java:264)
 at 
org.apache.flink.streaming.api.operators.StreamOperatorStateHandler.snapshotState(StreamOperatorStateHandler.java:169)
 at 
org.apache.flink.streaming.api.operators.AbstractStreamOperator.snapshotState(AbstractStreamOperator.java:371)
 at 
org.apache.flink.streaming.runtime.tasks.SubtaskCheckpointCoordinatorImpl.checkpointStreamOperator(SubtaskCheckpointCoordinatorImpl.java:706)
 at 
org.apache.flink.streaming.runtime.tasks.SubtaskCheckpointCoordinatorImpl.buildOperatorSnapshotFutures(SubtaskCheckpointCoordinatorImpl.java:627)
 at 
org.apache.flink.streaming.runtime.tasks.SubtaskCheckpointCoordinatorImpl.takeSnapshotSync(SubtaskCheckpointCoordinatorImpl.java:590)
 at 
org.apache.flink.streaming.runtime.tasks.SubtaskCheckpointCoordinatorImpl.checkpointState(SubtaskCheckpointCoordinatorImpl.java:312)
 at 
org.apache.flink.streaming.runtime.tasks.StreamTask.lambda$performCheckpoint$8(StreamTask.java:1089)
 at 
org.apache.flink.streaming.runtime.tasks.StreamTaskActionExecutor$1.runThrowing(StreamTaskActionExecutor.java:50)
 at 
org.apache.flink.streaming.runtime.tasks.StreamTask.performCheckpoint(StreamTask.java:1073)
 at 
org.apache.flink.streaming.runtime.tasks.StreamTask.triggerCheckpointOnBarrier(StreamTask.java:1029)
 ... 19 more
Caused by: org.rocksdb.RocksDBException: while link file to 
/opt/flink/log/iodir/flink-io-1c4c28bd-c5ce-4c07-9d33-81d480ec5216/job_b9a574334a212349298b39d98567b519_op_WindowOperator_306d8342cb5b2ad8b53f1be57f65bee8__2_8__uuid_c9adab75-3696-4342-9019-e8477cf0a7ca/chk-2804.tmp/000279.sst:
 
/opt/flink/log/iodir/flink-io-1c4c28bd-c5ce-4c07-9d33-81d480ec5216/job_b9a574334a212349298b39d98567b519_op_WindowOperator_306d8342cb5b2ad8b53f1be57f65bee8__2_8__uuid_c9adab75-3696-4342-9019-e8477cf0a7ca/db/000279.sst:
 File exists
 at org.rocksdb.Checkpoint.cr
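// Editor's note (hedged): the exception above is RocksDB's native checkpoint
// hard-linking an SST file into the per-checkpoint tmp directory and finding
// the target path already present ("File exists") -- typically residue from an
// earlier checkpoint attempt in the same instance directory, which matches the
// report that the job itself keeps running. A commonly suggested hygiene step
// is to keep the RocksDB working directory on fast local disk rather than a
// shared mount, e.g. (Flink 1.13 APIs assumed, path hypothetical):
//
//   EmbeddedRocksDBStateBackend backend = new EmbeddedRocksDBStateBackend(true);
//   backend.setDbStoragePath("/tmp/flink-rocksdb");
//   env.setStateBackend(backend);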

[jira] [Comment Edited] (FLINK-15656) Support user-specified pod templates

2021-12-21 Thread hjw (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-15656?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17463167#comment-17463167
 ] 

hjw edited comment on FLINK-15656 at 12/21/21, 11:21 AM:
-

Regarding the note about the pod template [1]: must the template file be packaged into the image and set via

-Dkubernetes.pod-template-file=/opt/flink-template.yaml?

Or could the pod template be loaded from a file system?

Thanks.

 

[1]. 
[https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/config/#kubernetes-pod-template-file]


was (Author: JIRAUSER280998):
The note about pod template [1] file path must be packaged in image  and set by 

 -Dkubernetes.pod-template-file=/opt/flink-templeate.yaml? 

The pod template could be load by fileSystem? 

thx.

 

[1]. 
https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/config/#kubernetes-pod-template-file

> Support user-specified pod templates
> 
>
> Key: FLINK-15656
> URL: https://issues.apache.org/jira/browse/FLINK-15656
> Project: Flink
>  Issue Type: Sub-task
>  Components: Deployment / Kubernetes
>Reporter: Canbin Zheng
>Assignee: Yang Wang
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.13.0
>
>
> The current approach of introducing new configuration options for each aspect 
> of pod specification a user might wish is becoming unwieldy, we have to 
> maintain more and more Flink side Kubernetes configuration options and users 
> have to learn the gap between the declarative model used by Kubernetes and 
> the configuration model used by Flink. It's a great improvement to allow 
> users to specify pod templates as central places for all customization needs 
> for the jobmanager and taskmanager pods.
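
As a follow-up to the question above, a minimal sketch of how the template option could be set programmatically. Hedged: the option key kubernetes.pod-template-file comes from the documentation linked at [1], the local path is hypothetical, and the expectation that the file is read on the submitting client (so it need not be baked into the image) is an assumption to be confirmed:

{code:java}
import org.apache.flink.configuration.Configuration;

public class PodTemplateOptionSketch {
    public static void main(String[] args) {
        Configuration conf = new Configuration();
        // Option key per the linked docs [1]; the path is hypothetical and,
        // by assumption, resolved on the client that submits the deployment.
        conf.setString("kubernetes.pod-template-file", "/opt/flink-template.yaml");
        // conf would then be handed to the Kubernetes cluster deployer.
    }
}
{code}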



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Commented] (FLINK-15656) Support user-specified pod templates

2021-12-21 Thread hjw (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-15656?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17463167#comment-17463167
 ] 

hjw commented on FLINK-15656:
-

Regarding the note about the pod template [1]: must the template file be packaged into the image and set via

-Dkubernetes.pod-template-file=/opt/flink-template.yaml?

Or could the pod template be loaded from a file system?

Thanks.

 

[1]. 
https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/config/#kubernetes-pod-template-file

> Support user-specified pod templates
> 
>
> Key: FLINK-15656
> URL: https://issues.apache.org/jira/browse/FLINK-15656
> Project: Flink
>  Issue Type: Sub-task
>  Components: Deployment / Kubernetes
>Reporter: Canbin Zheng
>Assignee: Yang Wang
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.13.0
>
>
> The current approach of introducing new configuration options for each aspect 
> of pod specification a user might wish is becoming unwieldy, we have to 
> maintain more and more Flink side Kubernetes configuration options and users 
> have to learn the gap between the declarative model used by Kubernetes and 
> the configuration model used by Flink. It's a great improvement to allow 
> users to specify pod templates as central places for all customization needs 
> for the jobmanager and taskmanager pods.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Commented] (FLINK-15656) Support user-specified pod templates

2021-12-20 Thread hjw (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-15656?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17462730#comment-17462730
 ] 

hjw commented on FLINK-15656:
-

[~wangyang0918] Can 'hostAliases' be set for the jobmanager and taskmanager in the pod template?
Does this feature support native Kubernetes, or only standalone deployments on Kubernetes?

> Support user-specified pod templates
> 
>
> Key: FLINK-15656
> URL: https://issues.apache.org/jira/browse/FLINK-15656
> Project: Flink
>  Issue Type: Sub-task
>  Components: Deployment / Kubernetes
>Reporter: Canbin Zheng
>Assignee: Yang Wang
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.13.0
>
>
> The current approach of introducing new configuration options for each aspect 
> of pod specification a user might wish is becoming unwieldy, we have to 
> maintain more and more Flink side Kubernetes configuration options and users 
> have to learn the gap between the declarative model used by Kubernetes and 
> the configuration model used by Flink. It's a great improvement to allow 
> users to specify pod templates as central places for all customization needs 
> for the jobmanager and taskmanager pods.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Commented] (FLINK-25115) Why Flink Sink operator metric numRecordsOut and numRecordsOutPerSecond always equal 0

2021-12-01 Thread hjw (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-25115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17451823#comment-17451823
 ] 

hjw commented on FLINK-25115:
-

[~chesnay] thank you very much.

> Why Flink Sink operator metric numRecordsOut and numRecordsOutPerSecond 
> always equal 0
> --
>
> Key: FLINK-25115
> URL: https://issues.apache.org/jira/browse/FLINK-25115
> Project: Flink
>  Issue Type: Bug
>Affects Versions: 1.13.2
>Reporter: hjw
>Priority: Major
> Attachments: image-2021-11-30-23-56-26-222.png
>
>
> I submitted a Flink SQL job and found that the numRecordsOut and
> numRecordsOutPerSecond metrics are always 0.
>  
> !image-2021-11-30-23-56-26-222.png!



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Commented] (FLINK-25115) Why Flink Sink operator metric numRecordsOut and numRecordsOutPerSecond always equal 0

2021-12-01 Thread hjw (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-25115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17451812#comment-17451812
 ] 

hjw commented on FLINK-25115:
-

[~chesnay] Could you point me to the related release notes for version 1.14? Thanks.

> Why Flink Sink operator metric numRecordsOut and numRecordsOutPerSecond 
> always equal 0
> --
>
> Key: FLINK-25115
> URL: https://issues.apache.org/jira/browse/FLINK-25115
> Project: Flink
>  Issue Type: Bug
>Affects Versions: 1.13.2
>Reporter: hjw
>Priority: Major
> Attachments: image-2021-11-30-23-56-26-222.png
>
>
> I submitted a Flink SQL job and found that the numRecordsOut and
> numRecordsOutPerSecond metrics are always 0.
>  
> !image-2021-11-30-23-56-26-222.png!



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Created] (FLINK-25115) Why Flink Sink operator metric numRecordsOut and numRecordsOutPerSecond always equal 0

2021-11-30 Thread hjw (Jira)
hjw created FLINK-25115:
---

 Summary: Why Flink Sink operator metric numRecordsOut and 
numRecordsOutPerSecond always equal 0
 Key: FLINK-25115
 URL: https://issues.apache.org/jira/browse/FLINK-25115
 Project: Flink
  Issue Type: Bug
Affects Versions: 1.13.2
Reporter: hjw
 Attachments: image-2021-11-30-23-56-26-222.png

I submitted a Flink SQL job and found that the numRecordsOut and
numRecordsOutPerSecond metrics are always 0.

 

!image-2021-11-30-23-56-26-222.png!
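
Until the built-in numRecordsOut for sinks reports meaningful values, one workaround is to count records in a thin sink wrapper with a user-scope metric. A minimal sketch, assuming Flink 1.13 APIs; CountingSink and the metric name myNumRecordsOut are hypothetical:

{code:java}
import org.apache.flink.configuration.Configuration;
import org.apache.flink.metrics.Counter;
import org.apache.flink.streaming.api.functions.sink.RichSinkFunction;

// Wraps the real write with a user-scope counter so record throughput stays
// visible even when the operator-scope numRecordsOut shows 0.
public class CountingSink<T> extends RichSinkFunction<T> {

    private transient Counter out;

    @Override
    public void open(Configuration parameters) {
        out = getRuntimeContext().getMetricGroup().counter("myNumRecordsOut");
    }

    @Override
    public void invoke(T value, Context context) {
        out.inc();
        // ... hand the record to the actual sink/client here ...
    }
}
{code}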



--
This message was sent by Atlassian Jira
(v8.20.1#820001)