Re: [flink native k8s] TaskManager pod keeps restarting with HA configuration
I can't find the TM logs: the pod dies before the TM has even started. Let me check whether this is the cause; indeed I have not added the -Dkubernetes.taskmanager.service-account parameter. Is -Dkubernetes.taskmanager.service-account supposed to be added when starting the session cluster with ./bin/kubernetes-session.sh?

On 2022/8/31 16:10, "Yang Wang" wrote:
> My guess is that you have not set a service account for the TM, so the TM has no permission to fetch the leader from the K8s ConfigMap and therefore cannot register with the RM/JM.
> -Dkubernetes.taskmanager.service-account=wuzhiheng \
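On the missing TM logs: when the container exits before the TaskManager writes any log file, Kubernetes usually still holds the stdout of the previous container attempt. A sketch of retrieving it, using the namespace and pod names that appear in this thread (these commands are not part of the original mails):

```shell
# Show exit code, reason, and events for the crash-looping TM pod
kubectl -n monitor describe pod realtime-monitor-taskmanager-1-12

# Fetch stdout/stderr of the previous (crashed) container instance
kubectl -n monitor logs realtime-monitor-taskmanager-1-12 --previous
```

If the pod has already been deleted by the ResourceManager, `kubectl -n monitor get events --sort-by=.lastTimestamp` may still show why it failed.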
Re: [flink native k8s] TaskManager pod keeps restarting with HA configuration
My guess is that you have not set a service account for the TM, so the TM has no permission to fetch the leader from the K8s ConfigMap and therefore cannot register with the RM/JM. Try adding:

-Dkubernetes.taskmanager.service-account=wuzhiheng \

Best,
Yang

Xuyang wrote on Tue, Aug 30, 2022, 23:22:
> Hi, could you paste the TM logs? Judging from the WARN entries, the TM never manages to come up.
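Yang's suggestion applied to the session start command from the original post might look like the sketch below. It reuses the wuzhiheng service account already granted to the JM; that account must exist in the monitor namespace with ConfigMap permissions, and the secrets/pod-template options from the original command are left out here for brevity:

```shell
./bin/kubernetes-session.sh \
  -Dkubernetes.cluster-id=realtime-monitor \
  -Dkubernetes.namespace=monitor \
  -Dkubernetes.jobmanager.service-account=wuzhiheng \
  -Dkubernetes.taskmanager.service-account=wuzhiheng \
  -Dtaskmanager.numberOfTaskSlots=6 \
  -Dtaskmanager.memory.process.size=8192m \
  -Djobmanager.memory.process.size=2048m
```

The single option kubernetes.service-account can also set the same account for both JM and TM at once; check the configuration docs of your Flink version.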
Re: [flink native k8s] TaskManager pod keeps restarting with HA configuration
Hi, could you paste the TM logs? Judging from the WARN entries, the TM never manages to come up.

On 2022-08-30 03:45:43, "Wu,Zhiheng" wrote:
> [Problem description]
> After enabling the HA configuration, the taskmanager pods are stuck in a create-terminate-create loop and the job cannot start.
[flink native k8s] TaskManager pod keeps restarting with HA configuration
[Problem description]
After enabling the HA configuration, the taskmanager pods are stuck in a create-terminate-create loop and the job cannot start.

1. Job configuration and startup procedure

a) Modify the conf/flink.yaml configuration file to add the HA settings:

kubernetes.cluster-id: realtime-monitor
high-availability: org.apache.flink.kubernetes.highavailability.KubernetesHaServicesFactory
high-availability.storageDir: file:///opt/flink/checkpoint/recovery/monitor    // an NFS path, mounted into the pod via a PVC

b) First bring up a session cluster as a stateless deployment with the following command:

./bin/kubernetes-session.sh \
-Dkubernetes.secrets=cdn-res-bd-keystore:/opt/flink/kafka/res/keystore/bd,cdn-res-bd-truststore:/opt/flink/kafka/res/truststore/bd,cdn-res-bj-keystore://opt/flink/kafka/res/keystore/bj,cdn-res-bj-truststore:/opt/flink/kafka/res/truststore/bj \
-Dkubernetes.pod-template-file=./conf/pod-template.yaml \
-Dkubernetes.cluster-id=realtime-monitor \
-Dkubernetes.jobmanager.service-account=wuzhiheng \
-Dkubernetes.namespace=monitor \
-Dtaskmanager.numberOfTaskSlots=6 \
-Dtaskmanager.memory.process.size=8192m \
-Djobmanager.memory.process.size=2048m

c) Finally, submit a jar job through the web UI; the jobmanager then logs the following:

2022-08-29 23:49:04,150 INFO  org.apache.flink.kubernetes.KubernetesResourceManagerDriver [] - Pod realtime-monitor-taskmanager-1-13 is created.
2022-08-29 23:49:04,152 INFO  org.apache.flink.kubernetes.KubernetesResourceManagerDriver [] - Pod realtime-monitor-taskmanager-1-12 is created.
2022-08-29 23:49:04,161 INFO  org.apache.flink.kubernetes.KubernetesResourceManagerDriver [] - Received new TaskManager pod: realtime-monitor-taskmanager-1-12
2022-08-29 23:49:04,162 INFO  org.apache.flink.runtime.resourcemanager.active.ActiveResourceManager [] - Requested worker realtime-monitor-taskmanager-1-12 with resource spec WorkerResourceSpec {cpuCores=6.0, taskHeapSize=6.005gb (6447819631 bytes), taskOffHeapSize=0 bytes, networkMemSize=711.680mb (746250577 bytes), managedMemSize=0 bytes, numSlots=6}.
2022-08-29 23:49:04,162 INFO  org.apache.flink.kubernetes.KubernetesResourceManagerDriver [] - Received new TaskManager pod: realtime-monitor-taskmanager-1-13
2022-08-29 23:49:04,162 INFO  org.apache.flink.runtime.resourcemanager.active.ActiveResourceManager [] - Requested worker realtime-monitor-taskmanager-1-13 with resource spec WorkerResourceSpec {cpuCores=6.0, taskHeapSize=6.005gb (6447819631 bytes), taskOffHeapSize=0 bytes, networkMemSize=711.680mb (746250577 bytes), managedMemSize=0 bytes, numSlots=6}.
2022-08-29 23:49:07,176 WARN  org.apache.flink.runtime.resourcemanager.active.ActiveResourceManager [] - Reaching max start worker failure rate: 12 events detected in the recent interval, reaching the threshold 10.00.
2022-08-29 23:49:07,176 INFO  org.apache.flink.runtime.resourcemanager.active.ActiveResourceManager [] - Will not retry creating worker in 3000 ms.
2022-08-29 23:49:07,176 INFO  org.apache.flink.runtime.resourcemanager.active.ActiveResourceManager [] - Worker realtime-monitor-taskmanager-1-12 with resource spec WorkerResourceSpec {cpuCores=6.0, taskHeapSize=6.005gb (6447819631 bytes), taskOffHeapSize=0 bytes, networkMemSize=711.680mb (746250577 bytes), managedMemSize=0 bytes, numSlots=6} was requested in current attempt and has not registered. Current pending count after removing: 1.
2022-08-29 23:49:07,176 INFO  org.apache.flink.runtime.resourcemanager.active.ActiveResourceManager [] - Worker realtime-monitor-taskmanager-1-12 is terminated. Diagnostics: Pod terminated, container termination statuses: [flink-main-container(exitCode=1, reason=Error, message=null)], pod status: Failed(reason=null, message=null)
2022-08-29 23:49:07,176 INFO  org.apache.flink.runtime.resourcemanager.active.ActiveResourceManager [] - Requesting new worker with resource spec WorkerResourceSpec {cpuCores=6.0, taskHeapSize=6.005gb (6447819631 bytes), taskOffHeapSize=0 bytes, networkMemSize=711.680mb (746250577 bytes), managedMemSize=0 bytes, numSlots=6}, current pending count: 2.
2022-08-29 23:49:07,514 WARN  org.apache.flink.runtime.resourcemanager.active.ActiveResourceManager [] - Reaching max start worker failure rate: 13 events detected in the recent interval, reaching the threshold 10.00.
2022-08-29 23:49:07,514 INFO  org.apache.flink.runtime.resourcemanager.active.ActiveResourceManager [] - Worker realtime-monitor-taskmanager-1-13 with resource spec WorkerResourceSpec {cpuCores=6.0, taskHeapSize=6.005gb (6447819631 bytes), taskOffHeapSize=0 bytes, networkMemSize=711.680mb (746250577 bytes), managedMemSize=0 bytes, numSlots=6} was requested in current attempt and has not registered. Current pending count after removing: 1.
2022-08-29 23:49:07,514 INFO  org.apache.flink.runtime.resourcemanager.active.ActiveResourceManager [] - Worker realtime-monitor-taskmanager-1-13 is terminated. Diagnostics: Pod terminated, container termination statuses: [flink-main-container(exitCode=1, reason=Error, message=null)], pod status: Failed(reason=null, message=null)
2022-08-29 23:49:07,515 INFO  org.apache.flink.runtime.resourcemanager.active.Activ
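Kubernetes HA stores leader information in ConfigMaps, so the service account used by the JM and TM pods needs RBAC permissions on them. A minimal sketch following the pattern shown in the Flink native-Kubernetes docs, with the account name and namespace from this thread (the edit ClusterRole is broader than strictly required; tighten it to your cluster's policy):

```shell
# Create the service account in the job's namespace (skip if it already exists)
kubectl create serviceaccount wuzhiheng -n monitor

# Grant it permission to create/edit resources such as ConfigMaps
kubectl create clusterrolebinding flink-role-binding-monitor \
  --clusterrole=edit \
  --serviceaccount=monitor:wuzhiheng
```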
Re: flink native k8s: submitting a job per the docs cannot find the corresponding cluster
Your understanding is correct.

The reason FlinkSessionJob is split out as a separate CR is that this better matches K8s semantics: within a session cluster each job can be managed as a K8s resource of its own, and job state changes are promptly reflected in its Status.

Best,
Yang

yidan zhao wrote on Thu, Jul 14, 2022, 23:01:
> A follow-up question about flink-k8s-operator.
> From the docs it provides two CRDs, FlinkDeployment and FlinkSessionJob. Is the following understanding correct?
> (1) For jobs run in application mode, use FlinkDeployment with the job section configured. The Flink cluster is created automatically and the job runs according to that configuration; cluster creation and job submission are a single step.
> (2) To create a session cluster, also use FlinkDeployment, just without a job section.
> (3) Against a session cluster created as in (2), use FlinkSessionJob to submit jobs.
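The split Yang describes can be sketched as two CRs. Field names follow the operator's v1beta1 API, but the resource names, image tag, resource values, and jar URI below are illustrative placeholders, not taken from the thread:

```shell
kubectl apply -f - <<'EOF'
# Session cluster: a FlinkDeployment with no job section
apiVersion: flink.apache.org/v1beta1
kind: FlinkDeployment
metadata:
  name: my-session
spec:
  image: flink:1.15
  flinkVersion: v1_15
  serviceAccount: flink
  jobManager:
    resource:
      memory: "2048m"
      cpu: 1
  taskManager:
    resource:
      memory: "2048m"
      cpu: 1
---
# A job submitted into that session cluster, managed as its own resource
apiVersion: flink.apache.org/v1beta1
kind: FlinkSessionJob
metadata:
  name: my-session-job
spec:
  deploymentName: my-session   # binds the job to the FlinkDeployment above
  job:
    jarURI: https://example.com/path/to/job.jar   # placeholder
    parallelism: 2
EOF
```

Because the job is its own resource, `kubectl get flinksessionjob my-session-job -o yaml` shows the job's state in its Status, independently of the session cluster's lifecycle.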
Re: flink native k8s: submitting a job per the docs cannot find the corresponding cluster
A follow-up question about flink-k8s-operator.
From the docs it provides two CRDs, FlinkDeployment and FlinkSessionJob. Is the following understanding correct?
(1) For jobs run in application mode, use FlinkDeployment with the job section configured. The Flink cluster is created automatically and the job runs according to that configuration; cluster creation and job submission are a single step.
(2) To create a session cluster, also use FlinkDeployment, just without a job section.
(3) Against a session cluster created as in (2), use FlinkSessionJob to submit jobs.

Yang Wang wrote on Tue, Jul 12, 2022, 17:10:
> If the DNS server configured on the machines inside your K8s cluster is also coredns, then the service behind the clusterIP can be resolved normally.
> ClusterIP was also originally designed for use by job-management pods, e.g. flink-kubernetes-operator [1].
> [1]. https://github.com/apache/flink-kubernetes-operator
Re: flink native k8s: submitting a job per the docs cannot find the corresponding cluster
If the machines inside your K8s cluster are also configured to use CoreDNS as their DNS server, then the Service hostname behind the ClusterIP will resolve normally. ClusterIP was originally designed to be used by management Pods running inside the cluster, e.g. flink-kubernetes-operator [1].

[1] https://github.com/apache/flink-kubernetes-operator

Best,
Yang

yidan zhao wrote on Tue, 12 Jul 2022 at 13:17:
> Submitting the job with flink run -m <clusterIp> does work.
> So when I use --target kubernetes-session
> -Dkubernetes.cluster-id=my-first-flink-cluster, why can't the client be
> smart enough to look up the ClusterIP of the cluster's svc and submit
> through that?
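As a workaround while the client sits outside the cluster's DNS, the thread above suggests resolving the svc's ClusterIP yourself and passing it with -m. A minimal sketch under the thread's values (cluster-id my-first-flink-cluster, namespace test); it assumes a working kubectl context and that the client host can route to the ClusterIP:

```
# Look up the ClusterIP of the session cluster's rest Service.
REST_IP=$(kubectl -n test get svc my-first-flink-cluster-rest \
  -o jsonpath='{.spec.clusterIP}')

# Submit against the resolved IP instead of the generated DNS name.
./bin/flink run -m "${REST_IP}:8081" \
  ./examples/streaming/TopSpeedWindowing.jar
```

Note that routing to a ClusterIP from outside the cluster only works in some network setups (e.g. on a node, or with appropriate routes); otherwise NodePort or LoadBalancer is still needed.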
Re: flink native k8s: job submitted following the docs cannot find the cluster
Submitting the job with flink run -m <clusterIp> does work. So when I use --target kubernetes-session -Dkubernetes.cluster-id=my-first-flink-cluster, why can't the client be smart enough to look up the ClusterIP of the cluster's svc and submit through that?

yidan zhao wrote on Tue, 12 Jul 2022 at 12:50:
>
> If the client is on a k8s master node, can it use the ClusterIp directly?
>
> Also, I roughly understand NodePort, but I have never quite understood how
> the LoadBalancer approach actually works.
Re: flink native k8s: job submitted following the docs cannot find the cluster
If the client is on a k8s master node, can it use the ClusterIp directly?

Also, I roughly understand NodePort, but I have never quite understood how the LoadBalancer approach actually works.

yidan zhao wrote on Tue, 12 Jul 2022 at 12:48:
>
> By "inside the k8s cluster" I understood the machines that make up the
> cluster. Does the client have to run inside a Pod, so that running it on a
> k8s node does not work either?
Re: flink native k8s: job submitted following the docs cannot find the cluster
By "inside the k8s cluster" I understood the machines that make up the cluster. Does the client have to run inside a Pod, so that running it on a k8s node does not work either?

Yang Wang wrote on Tue, 12 Jul 2022 at 12:07:
>
> The log already explains it fairly clearly: if you use the ClusterIP
> exposed type, your Flink client must be inside the k8s cluster to submit
> successfully. For example, start a Pod and run flink run inside that Pod.
> Otherwise you need the NodePort or LoadBalancer exposed type.
>
> 2022-07-12 10:23:23,021 WARN
> org.apache.flink.kubernetes.KubernetesClusterDescriptor [] -
> Please note that Flink client operations (e.g. cancel, list, stop,
> savepoint, etc.) won't work from outside the Kubernetes cluster since
> 'kubernetes.rest-service.exposed.type' has been set to ClusterIP.
Re: flink native k8s: job submitted following the docs cannot find the cluster
The log already explains it fairly clearly: if you use the ClusterIP exposed type, your Flink client must be inside the k8s cluster to submit successfully. For example, start a Pod and run flink run inside that Pod. Otherwise you need the NodePort or LoadBalancer exposed type.

2022-07-12 10:23:23,021 WARN  org.apache.flink.kubernetes.KubernetesClusterDescriptor [] - Please note that Flink client operations (e.g. cancel, list, stop, savepoint, etc.) won't work from outside the Kubernetes cluster since 'kubernetes.rest-service.exposed.type' has been set to ClusterIP.

Best,
Yang

yidan zhao wrote on Tue, 12 Jul 2022 at 10:40:
> The steps below follow this doc:
> https://nightlies.apache.org/flink/flink-docs-release-1.15/docs/deployment/resource-providers/native_kubernetes/#starting-a-flink-session-on-kubernetes
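The exposed type mentioned above is fixed when the session cluster is created. A hedged sketch of creating the session with NodePort instead, so an external client can reach the REST endpoint (cluster-id and namespace reuse the values from this thread; other options kept at defaults):

```
./bin/kubernetes-session.sh \
  -Dkubernetes.cluster-id=my-first-flink-cluster \
  -Dkubernetes.namespace=test \
  -Dkubernetes.rest-service.exposed.type=NodePort
```

With NodePort, the client connects via any node's address plus the allocated port, so no in-cluster DNS resolution is required.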
flink native k8s: job submitted following the docs cannot find the cluster
The steps below follow this doc:
https://nightlies.apache.org/flink/flink-docs-release-1.15/docs/deployment/resource-providers/native_kubernetes/#starting-a-flink-session-on-kubernetes

Version: 1.15

(1) Create the cluster:
https://nightlies.apache.org/flink/flink-docs-release-1.15/docs/deployment/resource-providers/native_kubernetes/#starting-a-flink-session-on-kubernetes

(2) Submit a job:
./bin/flink run \
    --target kubernetes-session \
    -Dkubernetes.cluster-id=my-first-flink-cluster \
    ./examples/streaming/TopSpeedWindowing.jar

The svc is of ClusterIp type.

Step (2) prints the following:

Executing example with default input data.
Use --input to specify file input.
Printing result to stdout. Use --output to specify output path.
2022-07-12 10:23:23,021 WARN  org.apache.flink.kubernetes.KubernetesClusterDescriptor [] - Please note that Flink client operations (e.g. cancel, list, stop, savepoint, etc.) won't work from outside the Kubernetes cluster since 'kubernetes.rest-service.exposed.type' has been set to ClusterIP.
2022-07-12 10:23:23,027 INFO  org.apache.flink.kubernetes.KubernetesClusterDescriptor [] - Retrieve flink cluster my-first-flink-cluster successfully, JobManager Web Interface: http://my-first-flink-cluster-rest.test:8081
2022-07-12 10:23:23,044 WARN  org.apache.flink.kubernetes.KubernetesClusterDescriptor [] - Please note that Flink client operations (e.g. cancel, list, stop, savepoint, etc.) won't work from outside the Kubernetes cluster since 'kubernetes.rest-service.exposed.type' has been set to ClusterIP.

The program finished with the following exception:
org.apache.flink.client.program.ProgramInvocationException: The main method caused an error: Failed to execute job 'CarTopSpeedWindowingExample'.
...
Caused by: org.apache.flink.util.FlinkException: Failed to execute job 'CarTopSpeedWindowingExample'.
...
Caused by: org.apache.flink.runtime.client.JobSubmissionException: Failed to submit JobGraph.
...
Caused by: org.apache.flink.util.concurrent.FutureUtils$RetryException: Could not complete the operation. Number of retries has been exhausted.
...
Caused by: java.util.concurrent.CompletionException: java.net.UnknownHostException: my-first-flink-cluster-rest.test: Name or service not known
...
Caused by: java.net.UnknownHostException: my-first-flink-cluster-rest.test: Name or service not known

So with --target kubernetes-session -Dkubernetes.cluster-id=my-first-flink-cluster, the submission endpoint was resolved as my-first-flink-cluster-rest.test. This looks like the DNS name generated by k8s, with test being Flink's namespace.

And indeed my local machine cannot resolve my-first-flink-cluster-rest.test.
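Judging from the log and exception above, the hostname the client tries to reach appears to follow the pattern <cluster-id>-rest.<namespace>, which only resolves where the cluster's DNS (e.g. CoreDNS) is in effect. A minimal sketch of that name construction (the pattern is inferred from the log lines above, not from Flink source):

```shell
# Rebuild the REST hostname the client appears to use, from the
# cluster id and namespace seen in the thread.
CLUSTER_ID=my-first-flink-cluster
NAMESPACE=test
REST_HOST="${CLUSTER_ID}-rest.${NAMESPACE}"
echo "${REST_HOST}"   # my-first-flink-cluster-rest.test
```

This explains the UnknownHostException: a machine outside the cluster has no DNS entry for that generated name.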
flink native k8s
flink 1.12.2 Native K8s:

./bin/kubernetes-session.sh \
    -Dkubernetes.namespace=flink-session-cluster \
    -Dkubernetes.jobmanager.service-account=flink \
    -Dkubernetes.cluster-id=session001 \
    -Dtaskmanager.memory.process.size=1024m \
    -Dkubernetes.taskmanager.cpu=1 \
    -Dtaskmanager.numberOfTaskSlots=4 \
    -Dresourcemanager.taskmanager-timeout=360

The svc is of cluster-Ip type.

./bin/flink run -d \
    -e kubernetes-session \
    -Dkubernetes.namespace=flink-session-cluster \
    -Dkubernetes.cluster-id=session001 \
    examples/streaming/WindowJoin.jar
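The -e flag used above is the older executor alias; a hedged sketch of the same submission with the newer --target flag (values reuse the message's namespace and cluster-id; behavior should be equivalent, assuming -e here stands for the deprecated --executor option):

```
./bin/flink run -d \
  --target kubernetes-session \
  -Dkubernetes.namespace=flink-session-cluster \
  -Dkubernetes.cluster-id=session001 \
  examples/streaming/WindowJoin.jar
```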
Re: flink native k8s: any plan to support hostAlias configuration?
For features like this that are not used very often, the community plans to support them uniformly through pod templates, which I think should cover your need. That approach is more flexible and more extensible; see this JIRA [1] for details. There is already a draft PR, and a formal PR will be submitted for review soon after it is finished. You are welcome to try it out early and report any issues.

[1] https://issues.apache.org/jira/browse/FLINK-15656

Best,
Yang

高函 wrote on Mon, 18 Jan 2021 at 11:13:
>
> Does the community plan to support configuring hostAlias in native k8s
> mode? If I extend it myself, do I need to add the corresponding hostAlias
> config option in the module and build a custom docker image?
> Thanks~
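Once pod template support lands, hostAliases should be expressible through a standard Kubernetes pod spec. A sketch of what such a template could look like (file name, option name kubernetes.pod-template-file, IP, hostname, and image tag are all illustrative; the schema follows the generic Kubernetes hostAliases field, not a finalized Flink interface):

```yaml
# pod-template.yaml: a hypothetical template file to pass to Flink
apiVersion: v1
kind: Pod
metadata:
  name: flink-pod-template
spec:
  hostAliases:
    - ip: "10.0.0.10"            # illustrative address
      hostnames:
        - "my.internal.host"     # illustrative hostname
  containers:
    - name: flink-main-container
      image: flink:1.12.2        # illustrative image tag
```

The hostAliases entries are injected into each Pod's /etc/hosts by Kubernetes itself, so no custom Docker image should be needed once templates are supported.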
flink native k8s: any plan to support hostAlias configuration?
Does the community plan to support configuring hostAlias in native k8s mode? If I extend it myself, do I need to add the corresponding hostAlias config option in the module and build a custom docker image?
Thanks~