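Re: Re: flink 1.12.0 kubernetes-session deployment problem

2021-01-04 Posted by Yang Wang
Native mode exposes the REST service as a LoadBalancer by default, so the client prints an address that you cannot reach.
You can add -Dkubernetes.rest-service.exposed.type=NodePort to expose it through a NodePort instead.
The address printed by the Flink client will then be the correct one.

Alternatively, you can get the IP address with minikube ip, look up the NodePort of the svc that belongs to
your Flink cluster with kubectl get svc, and join the two together.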
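A minimal sketch of the manual lookup described above, assuming minikube and kubectl are on the PATH. The namespace flink-session-cluster and the cluster-id session001 come from the question; the service name <cluster-id>-rest and the function names are assumptions made for illustration, so confirm the real service name with kubectl get svc first.

```shell
#!/bin/sh
# Sketch only: assemble the JobManager REST URL from the node IP and the NodePort.

# Join a node IP and a port into an http URL.
rest_url() {
  printf 'http://%s:%s\n' "$1" "$2"
}

# Look up both pieces on a live cluster.
# $1 = namespace, $2 = cluster-id; "<cluster-id>-rest" is an assumed service name.
flink_rest_url() {
  node_ip="$(minikube ip)" || return 1
  node_port="$(kubectl get svc "$2-rest" -n "$1" \
    -o jsonpath='{.spec.ports[0].nodePort}')" || return 1
  rest_url "$node_ip" "$node_port"
}

# Usage on a running minikube cluster (not executed here):
#   flink_rest_url flink-session-cluster session001
```

The lookup is split from the string assembly so that the URL format can be checked without a cluster.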


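As for the NoResourceAvailableException you mention, check whether the TaskManager Pod has been created but is stuck in the Pending state.
If it is, your minikube does not have enough resources. Either give minikube more resources or reduce the resources requested by the JobManager and TaskManager Pods.
If it is not, please post the complete JobManager log so that the problem is easier to track down.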
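One way to run the check described above, as a sketch. It assumes kubectl and minikube are on the PATH; the helper name pending_pods and the sizes shown in the comments are illustrative values, not recommendations.

```shell
#!/bin/sh
# Sketch only: list pods that are stuck in Pending.

# Reads `kubectl get pods --no-headers` output on stdin (columns: NAME READY STATUS ...)
# and prints the names of the pods whose STATUS column is Pending.
pending_pods() {
  awk '$3 == "Pending" { print $1 }'
}

# Usage on a live cluster (not executed here):
#   kubectl get pods -n flink-session-cluster --no-headers | pending_pods
#   kubectl describe pod <pending-pod-name> -n flink-session-cluster
# If the Events section reports insufficient cpu or memory, either restart
# minikube with more resources, for example
#   minikube start --cpus 4 --memory 10240
# or request less per TaskManager, for example
#   -Dtaskmanager.memory.process.size=2048m
```

With some drivers an existing minikube VM keeps its original size, so it may have to be deleted and recreated before new values take effect.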


Best,
Yang


Re:Re: flink 1.12.0 kubernetes-session deployment problem

2021-01-01 Posted by 陈帅
Environment: a single MacBook Pro with minikube v1.15.1 and kubernetes v1.19.4 installed.
I ran the following commands from the flink v1.11.3 distribution:

kubectl create namespace flink-session-cluster

kubectl create serviceaccount flink -n flink-session-cluster

kubectl create clusterrolebinding flink-role-binding-flink \
  --clusterrole=edit \
  --serviceaccount=flink-session-cluster:flink

./bin/kubernetes-session.sh \
  -Dkubernetes.namespace=flink-session-cluster \
  -Dkubernetes.jobmanager.service-account=flink \
  -Dkubernetes.cluster-id=session001 \
  -Dtaskmanager.memory.process.size=8192m \
  -Dkubernetes.taskmanager.cpu=1 \
  -Dtaskmanager.numberOfTaskSlots=4 \
  -Dresourcemanager.taskmanager-timeout=360

The output shows the flink web UI started at http://192.168.64.2:8081 rather than on a five-digit port such as http://192.168.50.135:31753. What is wrong here? Is the host IP supposed to be the minikube ip? My local browser cannot reach http://192.168.64.2:8081.

2021-01-02 10:28:04,177 INFO  org.apache.flink.runtime.util.config.memory.ProcessMemoryUtils [] - The derived from fraction jvm overhead memory (160.000mb (167772162 bytes)) is less than its min value 192.000mb (201326592 bytes), min value will be used instead

2021-01-02 10:28:04,907 INFO  org.apache.flink.kubernetes.KubernetesClusterDescriptor  [] - Create flink session cluster session001 successfully, JobManager Web Interface: http://192.168.64.2:8081

I checked the pods, service and deployment. All of them started normally and show green.

Next I submitted a job:

./bin/flink run -d \
  -e kubernetes-session \
  -Dkubernetes.namespace=flink-session-cluster \
  -Dkubernetes.cluster-id=session001 \
  examples/streaming/WindowJoin.jar

Using windowSize=2000, data rate=3

To customize example, use: WindowJoin [--windowSize ] [--rate ]

2021-01-02 10:21:48,658 INFO  org.apache.flink.kubernetes.KubernetesClusterDescriptor  [] - Retrieve flink cluster session001 successfully, JobManager Web Interface: http://10.106.136.236:8081

I can reach the http://10.106.136.236:8081 shown here from my browser. The UI shows the job as running, but available slots shows 0, and the JM log contains the following error:

Caused by: org.apache.flink.runtime.jobmanager.scheduler.NoResourceAvailableException: Could not allocate the required slot within slot request timeout. Please make sure that the cluster has enough resources.
    at org.apache.flink.runtime.scheduler.DefaultScheduler.maybeWrapWithNoResourceAvailableException(DefaultScheduler.java:441) ~[flink-dist_2.12-1.11.3.jar:1.11.3]
    ... 47 more
Caused by: java.util.concurrent.CompletionException: java.util.concurrent.TimeoutException
    at java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:292) ~[?:1.8.0_275]
    at java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:308) ~[?:1.8.0_275]
    at java.util.concurrent.CompletableFuture.uniApply(CompletableFuture.java:607) ~[?:1.8.0_275]
    at java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:591) ~[?:1.8.0_275]
    ... 27 more
Caused by: java.util.concurrent.TimeoutException
    ... 25 more

Why does it report this insufficient-resources error? Thanks for your help!









Re:Re: flink 1.12.0 kubernetes-session deployment problem

2020-12-30 Posted by 陈帅
I set up MiniKube on a MacBook Pro with VirtualBox installed. What are the correct steps to start Flink v1.11.3 on K8S?
These are the steps I followed:

minikube start
cd /Users/admin/dev/flink-1.11.3
./bin/kubernetes-session.sh

At this point the image being pulled is shown as flink:1.11.3-scala_2.12, not the flink:1.11.3-scala_2.12-java8 that the official flink repository on dockerhub provides.
So I ran the command again as

./bin/kubernetes-session.sh \
  -Dkubernetes.cluster-id=my-flink-cluster \
  -Dkubernetes.container.image=flink:1.11.3-scala_2.12-java8

After waiting a while for the image to be pulled, get pod shows

SJ-DN0393:flink-1.11.3 admin$ kubectl get pods
NAME                                               READY   STATUS             RESTARTS   AGE
kubernetes-dashboard-1608509744-6bc8455756-mp47w   1/1     Running            3          10d
my-flink-cluster-77c6f85879-9vcx8                  0/1     CrashLoopBackOff   5          29m

The describe pod command shows

Events:
  Type     Reason       Age                    From               Message
  ----     ------       ----                   ----               -------
  Normal   Scheduled    29m                    default-scheduler  Successfully assigned default/my-flink-cluster-77c6f85879-9vcx8 to minikube
  Warning  FailedMount  29m                    kubelet            MountVolume.SetUp failed for volume "flink-config-volume" : configmap "flink-config-my-flink-cluster" not found
  Warning  FailedMount  29m                    kubelet            MountVolume.SetUp failed for volume "hadoop-config-volume" : configmap "hadoop-config-my-flink-cluster" not found
  Normal   Pulling      29m                    kubelet            Pulling image "flink:1.11.3-scala_2.12-java8"
  Normal   Pulled       2m41s (x5 over 4m34s)  kubelet            Container image "flink:1.11.3-scala_2.12-java8" already present on machine
  Normal   Created      2m41s (x5 over 4m33s)  kubelet            Created container flink-job-manager
  Normal   Started      2m41s (x5 over 4m33s)  kubelet            Started container flink-job-manager
  Warning  BackOff      2m8s (x10 over 4m18s)  kubelet            Back-off restarting failed container




















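Re: flink 1.12.0 kubernetes-session deployment problem

2020-12-28 Posted by Yang Wang
The ConfigMaps do not need to be created in advance. That Warning can be ignored; it is expected, because the deployment is created first and the ConfigMap afterwards.
You can follow the community documentation [1] to have the JM log printed to the console and take a look.

I suspect the cause is that you have not created a service account [2].

[1].
https://ci.apache.org/projects/flink/flink-docs-release-1.11/ops/deployment/native_kubernetes.html#log-files
[2].
https://ci.apache.org/projects/flink/flink-docs-release-1.11/ops/deployment/native_kubernetes.html#rbac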
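
A hedged sketch of the service account setup that [2] refers to. The helper rbac_commands is hypothetical and only prints the kubectl commands so that they can be reviewed first. The namespace rtdp comes from the quoted command; the account name flink is an assumption chosen for illustration.

```shell
#!/bin/sh
# Sketch only: print the kubectl commands that create a service account and
# grant it the "edit" cluster role, so they can be reviewed before running.
# $1 = namespace, $2 = service account name (both chosen by the user).
rbac_commands() {
  printf 'kubectl create serviceaccount %s -n %s\n' "$2" "$1"
  printf 'kubectl create clusterrolebinding flink-role-binding-%s --clusterrole=edit --serviceaccount=%s:%s\n' \
    "$2" "$1" "$2"
}

# Usage (not executed here): review the output, then pipe it to sh.
#   rbac_commands rtdp flink
#   rbac_commands rtdp flink | sh
# Then start the session with -Dkubernetes.jobmanager.service-account=flink.
# To read the log of a JobManager container that keeps restarting:
#   kubectl logs <jobmanager-pod-name> -n rtdp --previous
```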

Best,
Yang

陈帅 wrote on Mon, Dec 28, 2020 at 5:54 PM:

> Today I switched to the latest officially released flink image, version 1.11.3, and it still fails to start.
> This is my command:
> ./bin/kubernetes-session.sh \
>   -Dkubernetes.cluster-id=rtdp \
>   -Dtaskmanager.memory.process.size=4096m \
>   -Dkubernetes.taskmanager.cpu=2 \
>   -Dtaskmanager.numberOfTaskSlots=4 \
>   -Dresourcemanager.taskmanager-timeout=360 \
>   -Dkubernetes.container.image=flink:1.11.3-scala_2.12-java8 \
>   -Dkubernetes.namespace=rtdp
>
> Events:
>   Type     Reason          Age                From               Message
>   ----     ------          ----               ----               -------
>   Normal   Scheduled       88s                default-scheduler  Successfully assigned rtdp/rtdp-6d7794d65d-g6mb5 to cn-shanghai.192.168.16.130
>   Warning  FailedMount     88s                kubelet            MountVolume.SetUp failed for volume "flink-config-volume" : configmap "flink-config-rtdp" not found
>   Warning  FailedMount     88s                kubelet            MountVolume.SetUp failed for volume "hadoop-config-volume" : configmap "hadoop-config-rtdp" not found
>   Normal   AllocIPSucceed  87s                terway-daemon      Alloc IP 192.168.32.25/22 for Pod
>   Normal   Pulling         87s                kubelet            Pulling image "flink:1.11.3-scala_2.12-java8"
>   Normal   Pulled          31s                kubelet            Successfully pulled image "flink:1.11.3-scala_2.12-java8"
>   Normal   Created         18s (x2 over 26s)  kubelet            Created container flink-job-manager
>   Normal   Started         18s (x2 over 26s)  kubelet            Started container flink-job-manager
>   Normal   Pulled          18s                kubelet            Container image "flink:1.11.3-scala_2.12-java8" already present on machine
>   Warning  BackOff         10s                kubelet            Back-off restarting failed container
>
> Two ConfigMaps were not found here. Do they need to be created in advance? The official documentation does not mention it, or did I miss it?
>
> https://ci.apache.org/projects/flink/flink-docs-release-1.11/ops/deployment/native_kubernetes.html#start-flink-session

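Re: flink 1.12.0 kubernetes-session deployment problem

2020-12-27 Posted by Yang Wang
There are two problems in your overall procedure:

1. The image cannot be found
This is most likely related to your minikube driver setting. If you use hyperkit or another VM-based driver, you need to
minikube ssh into the VM and check whether the image is actually there.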
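
A sketch of the image check described in point 1, assuming the minikube VM uses Docker as its container runtime. The helper name has_image is hypothetical; the image tag is the one from the question.

```shell
#!/bin/sh
# Sketch only: check whether an image tag is present in a list of local images.

# $1 = image reference such as flink:1.12.0-scala_2.12-java8
# stdin = one "repository:tag" per line; carriage returns are dropped because
# output passed through an ssh session may carry them.
has_image() {
  tr -d '\r' | grep -Fxq -- "$1"
}

# Usage (not executed here):
#   minikube ssh -- docker images --format '{{.Repository}}:{{.Tag}}' \
#     | has_image flink:1.12.0-scala_2.12-java8 && echo present
# If the image is missing, building it against minikube's Docker daemon is one option:
#   eval "$(minikube docker-env)"
#   docker build --tag flink:1.12.0-scala_2.12-java8 .
```

An image built with the host's Docker daemon is not visible inside the VM, which would be consistent with the ImagePullBackOff in the question.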

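2. The JM link is unreachable
2020-12-27 22:08:12,387 INFO  org.apache.flink.kubernetes.KubernetesClusterDescriptor  [] - Create flink session cluster session001 successfully, JobManager Web Interface: http://192.168.99.100:8081

I guess the log line above was not printed by the command you pasted: that command uses NodePort, so the printed JM address should not be on port 8081.
As long as you add kubernetes.rest-service.exposed.type=NodePort when submitting on minikube and the JM comes up, the printed JM address will be reachable.

Of course, you can also assemble the link by hand: minikube ip gives you the APIServer address, then use kubectl get svc to look up the NodePort of the rest svc
that belongs to your Flink Session Cluster, and join the two to access it.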


Best,
Yang

flink 1.12.0 kubernetes-session deployment problem

2020-12-27 Posted by 陈帅
This is my first attempt at deploying flink on k8s. I am using version 1.12.0 with jdk 1.8.0_275 and scala 2.12.12, on my mac with a single-node minikube environment. These are the steps I tried:

git clone https://github.com/apache/flink-docker
cd flink-docker/1.12/scala_2.12-java8-debian
docker build --tag flink:1.12.0-scala_2.12-java8 .

cd flink-1.12.0
./bin/kubernetes-session.sh \
  -Dkubernetes.container.image=flink:1.12.0-scala_2.12-java8 \
  -Dkubernetes.rest-service.exposed.type=NodePort \
  -Dtaskmanager.numberOfTaskSlots=2 \
  -Dkubernetes.cluster-id=flink-session-cluster

It reports that the JM has started, but I cannot reach it through the web UI.

2020-12-27 22:08:12,387 INFO  org.apache.flink.kubernetes.KubernetesClusterDescriptor  [] - Create flink session cluster session001 successfully, JobManager Web Interface: http://192.168.99.100:8081

The `kubectl get pods` command shows that the pod stays in the ContainerCreating state:

NAME                                               READY   STATUS              RESTARTS   AGE
flink-session-cluster-858bd55dff-bzjk2             0/1     ContainerCreating   0          5m59s
kubernetes-dashboard-1608509744-6bc8455756-mp47w   1/1     Running             0          6d14h

So I looked at the details with `kubectl describe pod flink-session-cluster-858bd55dff-bzjk2`. The output is:

Name:         flink-session-cluster-858bd55dff-bzjk2
Namespace:    default
Priority:     0
Node:         minikube/192.168.99.100
Start Time:   Sun, 27 Dec 2020 22:21:56 +0800
Labels:       app=flink-session-cluster
              component=jobmanager
              pod-template-hash=858bd55dff
              type=flink-native-kubernetes
Annotations:
Status:       Pending
IP:           172.17.0.4
IPs:
  IP:           172.17.0.4
Controlled By:  ReplicaSet/flink-session-cluster-858bd55dff
Containers:
  flink-job-manager:
    Container ID:
    Image:         flink:1.12.0-scala_2.12-java8
    Image ID:
    Ports:         8081/TCP, 6123/TCP, 6124/TCP
    Host Ports:    0/TCP, 0/TCP, 0/TCP
    Command:
      /docker-entrypoint.sh
    Args:
      native-k8s
      $JAVA_HOME/bin/java -classpath $FLINK_CLASSPATH -Xmx1073741824 -Xms1073741824 -XX:MaxMetaspaceSize=268435456 -Dlog.file=/opt/flink/log/jobmanager.log -Dlogback.configurationFile=file:/opt/flink/conf/logback-console.xml -Dlog4j.configuration=file:/opt/flink/conf/log4j-console.properties -Dlog4j.configurationFile=file:/opt/flink/conf/log4j-console.properties org.apache.flink.kubernetes.entrypoint.KubernetesSessionClusterEntrypoint -D jobmanager.memory.off-heap.size=134217728b -D jobmanager.memory.jvm-overhead.min=201326592b -D jobmanager.memory.jvm-metaspace.size=268435456b -D jobmanager.memory.heap.size=1073741824b -D jobmanager.memory.jvm-overhead.max=201326592b
    State:          Waiting
      Reason:       ImagePullBackOff
    Ready:          False
    Restart Count:  0
    Limits:
      cpu:     1
      memory:  1600Mi
    Requests:
      cpu:     1
      memory:  1600Mi
    Environment:
      _POD_IP_ADDRESS:   (v1:status.podIP)
      HADOOP_CONF_DIR:  /opt/hadoop/conf
    Mounts:
      /opt/flink/conf from flink-config-volume (rw)
      /opt/hadoop/conf from hadoop-config-volume (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-s47ht (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             False
  ContainersReady   False
  PodScheduled      True
Volumes:
  hadoop-config-volume:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      hadoop-config-flink-session-cluster
    Optional:  false
  flink-config-volume:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      flink-config-flink-session-cluster
    Optional:  false
  default-token-s47ht:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-s47ht
    Optional:    false
QoS Class:       Guaranteed
Node-Selectors:
Tolerations:     node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                 node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason       Age                From               Message
  ----     ------       ----               ----               -------
  Normal   Scheduled    21m                default-scheduler  Successfully assigned default/flink-session-cluster-858bd55dff-bzjk2 to minikube
  Warning  FailedMount  21m (x2 over 21m)  kubelet            MountVolume.SetUp failed for volume "flink-config-volume" : configmap "flink-config-flink-session-cluster" not found
  Warning  FailedMount  21m (x2 over 21m)  kubelet            MountVolume.SetUp failed for volume "hadoop-config-volume" : configmap "hadoop-config-flink-session-cluster" not found
  Normal   Pulling      13m (x4 over 21m)  kubelet            Pulling image "flink:1.12.0-scala_2.12-java8"
  Warning  Failed       13m (x4 over 15m)  kubelet            Failed to pull image "flink:1.12.0-scala_2.12-java8": rpc error: code = Unknown desc = Error response from daemon: manifest for flink:1.12.0-scala_2.12-j