native k8s flink: failed to acquire the lock when updating the ConfigMap

2021-04-15 Thread 1120344670
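Hello,
   One pod in our production Flink cluster reported an error while updating a ConfigMap. We run two pods for Kubernetes-native high availability.

Logs from pod1 (which was also the leader pod recorded in the ConfigMap at the time, IP 10.20.0.39):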

2021-04-15 20:42:26,058 INFO 
org.apache.flink.kubernetes.kubeclient.resources.KubernetesLeaderElector []
- New leader elected 7d4a9b5c-39aa-4103-963b-eaf24ea6435a for
tuiwen-flink-restserver-leader.
2021-04-15 20:42:26,069 INFO 
org.apache.flink.runtime.rpc.akka.AkkaRpcService [] - Starting
RPC endpoint for
org.apache.flink.runtime.resourcemanager.active.ActiveResourceManager at
akka://flink/user/rpc/resourcemanager_0 .
2021-04-15 20:42:26,069 INFO 
org.apache.flink.runtime.dispatcher.DispatcherRestEndpoint   [] -
http://10.20.0.39:8081 was granted leadership with
leaderSessionID=a314d756-aa7c-4be4-a2a0-14267465d648
2021-04-15 20:42:26,261 INFO 
org.apache.flink.kubernetes.kubeclient.resources.KubernetesLeaderElector []
- Create KubernetesLeaderElector tuiwen-flink-dispatcher-leader with lock
identity 7d4a9b5c-39aa-4103-963b-eaf24ea6435a.
2021-04-15 20:42:26,660 INFO 
org.apache.flink.kubernetes.kubeclient.resources.KubernetesLeaderElector []
- New leader elected 6b1aac24-cf40-4aac-bb50-6812290a1f34 for
tuiwen-flink-dispatcher-leader.
2021-04-15 20:42:26,765 INFO 
org.apache.flink.runtime.leaderelection.DefaultLeaderElectionService [] -
Starting DefaultLeaderElectionService with
KubernetesLeaderElectionDriver{configMapName='tuiwen-flink-dispatcher-leader'}.
2021-04-15 20:42:26,960 INFO 
org.apache.flink.runtime.leaderretrieval.DefaultLeaderRetrievalService [] -
Starting DefaultLeaderRetrievalService with
KubernetesLeaderRetrievalDriver{configMapName='tuiwen-flink-resourcemanager-leader'}.
2021-04-15 20:42:27,258 INFO 
org.apache.flink.runtime.leaderretrieval.DefaultLeaderRetrievalService [] -
Starting DefaultLeaderRetrievalService with
KubernetesLeaderRetrievalDriver{configMapName='tuiwen-flink-dispatcher-leader'}.
2021-04-15 20:42:30,457 INFO 
org.apache.flink.kubernetes.KubernetesResourceManagerDriver  [] - Recovered
2 pods from previous attempts, current attempt id is 2.
2021-04-15 20:42:30,458 INFO 
org.apache.flink.runtime.resourcemanager.active.ActiveResourceManager [] -
Recovered 2 workers from previous attempt.
2021-04-15 20:42:30,458 INFO 
org.apache.flink.runtime.resourcemanager.active.ActiveResourceManager [] -
Worker tuiwen-flink-taskmanager-1-12 recovered from previous attempt.
2021-04-15 20:42:30,458 INFO 
org.apache.flink.runtime.resourcemanager.active.ActiveResourceManager [] -
Worker tuiwen-flink-taskmanager-1-2 recovered from previous attempt.
2021-04-15 20:42:30,458 INFO 
org.apache.flink.kubernetes.kubeclient.resources.KubernetesLeaderElector []
- Create KubernetesLeaderElector tuiwen-flink-resourcemanager-leader with
lock identity 7d4a9b5c-39aa-4103-963b-eaf24ea6435a.
2021-04-15 20:42:30,959 INFO 
org.apache.flink.runtime.leaderelection.DefaultLeaderElectionService [] -
Starting DefaultLeaderElectionService with
KubernetesLeaderElectionDriver{configMapName='tuiwen-flink-resourcemanager-leader'}.
2021-04-15 20:42:30,978 INFO 
org.apache.flink.kubernetes.kubeclient.resources.KubernetesLeaderElector []
- New leader elected 6b1aac24-cf40-4aac-bb50-6812290a1f34 for
tuiwen-flink-resourcemanager-leader.
2021-04-15 23:11:15,866 WARN 
org.apache.flink.runtime.webmonitor.retriever.impl.RpcGatewayRetriever [] -
Error while retrieving the leader gateway. Retrying to connect to
akka.tcp://flink@10.20.0.39:6123/user/rpc/dispatcher_1.
2021-04-15 23:11:30,626 WARN 
org.apache.flink.runtime.webmonitor.retriever.impl.RpcGatewayRetriever [] -
Error while retrieving the leader gateway. Retrying to connect to
akka.tcp://flink@10.20.0.39:6123/user/rpc/dispatcher_1.
2021-04-15 23:11:32,438 WARN 
org.apache.flink.runtime.webmonitor.retriever.impl.RpcGatewayRetriever [] -
Error while retrieving the leader gateway. Retrying to connect to
akka.tcp://flink@10.20.0.39:6123/user/rpc/dispatcher_1.
2021-04-15 23:11:33,325 WARN 
org.apache.flink.runtime.webmonitor.retriever.impl.RpcGatewayRetriever [] -
Error while retrieving the leader gateway. Retrying to connect to
akka.tcp://flink@10.20.0.39:6123/user/rpc/dispatcher_1.
2021-04-15 23:11:35,948 WARN 
org.apache.flink.runtime.webmonitor.retriever.impl.RpcGatewayRetriever [] -
Error while retrieving the leader gateway. Retrying to connect to
akka.tcp://flink@10.20.0.39:6123/user/rpc/dispatcher_1.
2021-04-15 23:11:39,387 WARN 
org.apache.flink.runtime.webmonitor.retriever.impl.RpcGatewayRetriever [] -
Error while retrieving the leader gateway. Retrying to connect to
akka.tcp://flink@10.20.0.39:6123/user/rpc/dispatcher_1.
2021-04-15 23:11:40,336 WARN 
org.apache.flink.runtime.webmonitor.retriever.impl.RpcGatewayRetriever [] -
Error while retrieving the leader gateway. Retrying to connect to
akka.tcp://flink@10.20.0.39:6123/user/rpc/dispatcher_1.
2021-04-15 23:11:41,485 WARN 
org.apache.flink.runtime.webmonitor.retriever.impl.RpcGatewayRetriever [] -
Error while retrieving the leader gateway. Retrying to connect to
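
Because the log lines above are wrapped, it is hard to see at a glance which lock identity won each election. The following is a minimal sketch, assuming POSIX awk and sed and the log layout shown above; the function names join_records and summarize_leaders are illustrative and not part of Flink.

```shell
#!/bin/sh
# Re-join wrapped log lines: a new record starts at a line that begins
# with a date such as "2021-04-15 "; other lines are continuations.
join_records() {
  awk '
    /^[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9] / {
      if (rec != "") print rec
      rec = $0
      next
    }
    { rec = rec " " $0 }
    END { if (rec != "") print rec }
  '
}

# Print "<configmap-name> <leader-identity>" for every
# "New leader elected <id> for <name>." record read from stdin.
summarize_leaders() {
  join_records |
    sed -n 's/.*New leader elected \([0-9a-f-]*\) for \([^ ]*\)\.[[:space:]]*$/\2 \1/p'
}
```

For example, a saved copy of the log can be piped through it with `summarize_leaders < pod1.log` (the file name is a placeholder). Applied to the pod1 log above, it lists 7d4a9b5c-39aa-4103-963b-eaf24ea6435a for tuiwen-flink-restserver-leader and 6b1aac24-cf40-4aac-bb50-6812290a1f34 for both tuiwen-flink-dispatcher-leader and tuiwen-flink-resourcemanager-leader, while pod1 creates its own electors with lock identity 7d4a9b5c-39aa-4103-963b-eaf24ea6435a.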

Re: flink-1.12.2 TM cannot use a custom serviceAccount to access the ConfigMap

2021-03-31 Thread 1120344670
Hello, this is the error reported by the TM:
<http://apache-flink.147419.n8.nabble.com/file/t1260/1617171749995.jpg> 

The startup command is as follows:
./bin/kubernetes-session.sh -Dkubernetes.cluster-id=tuiwen-flink
-Dtaskmanager.memory.process.size=2200m -Dkubernetes.taskmanager.cpu=0.3
-Dkubernetes.jobmanager.cpu=0.3 -Dtaskmanager.numberOfTaskSlots=2 
-Dkubernetes.rest-service.exposed.type=ClusterIP
-Dkubernetes.jobmanager.service-account=flink-service-account
-Dresourcemanager.taskmanager-timeout=345600   -Dkubernetes.namespace=flink

We built the image ourselves, based on apache/flink:1.12.2-scala_2.12.
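
The command in this message sets only kubernetes.jobmanager.service-account. Flink's Kubernetes configuration reference also lists kubernetes.taskmanager.service-account and kubernetes.service-account. The following is a dry-run sketch, assuming the TaskManager pods should run under the same flink-service-account as the JobManager; it only prints the command, and the thread does not confirm whether this resolves the reported error.

```shell
#!/bin/sh
# Dry-run sketch: assemble the session command from this message with one
# extra option so that TaskManager pods also use the custom service account.
# Assumption: the TM should run as the same account as the JM.
SERVICE_ACCOUNT="flink-service-account"

set -- ./bin/kubernetes-session.sh \
  -Dkubernetes.cluster-id=tuiwen-flink \
  -Dkubernetes.namespace=flink \
  -Dtaskmanager.memory.process.size=2200m \
  -Dkubernetes.taskmanager.cpu=0.3 \
  -Dkubernetes.jobmanager.cpu=0.3 \
  -Dtaskmanager.numberOfTaskSlots=2 \
  -Dkubernetes.rest-service.exposed.type=ClusterIP \
  -Dresourcemanager.taskmanager-timeout=345600 \
  -Dkubernetes.jobmanager.service-account="$SERVICE_ACCOUNT" \
  -Dkubernetes.taskmanager.service-account="$SERVICE_ACCOUNT"

# Keep the assembled command line for inspection; replace the echo with
# "$@" to actually launch the session.
SESSION_CMD="$*"
echo "$SESSION_CMD"
```

Setting -Dkubernetes.service-account instead would apply one account to both JobManager and TaskManager pods.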



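Yang Wang wrote
> I can confirm that this has been fixed in 1.12.1 and 1.12.2. If it still does not work, please send the startup command and the corresponding TM error log.
> 
> Best,
> Yang
> 
> On Mon, Mar 29, 2021 at 5:09 PM, 1120344670 <1120344670@...> wrote:
> 
>> Hello,
>> I previously reported an issue about this; the link is:
>> http://apache-flink.147419.n8.nabble.com/flink1-12-k8s-session-TM-td10153.html
>> As far as I can tell, the issue has still not been fixed.
>>
>> The error is as follows:
>>
>>
>> The issue on JIRA appears to be closed; please confirm whether it has been fixed.
>>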







--
Sent from: http://apache-flink.147419.n8.nabble.com/


flink-1.12.2 TM cannot use a custom serviceAccount to access the ConfigMap

2021-03-29 Thread 1120344670
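Hello,
I previously reported an issue about this; the link is: http://apache-flink.147419.n8.nabble.com/flink1-12-k8s-session-TM-td10153.html
As far as I can tell, the issue has still not been fixed.

The error is as follows:



The issue on JIRA appears to be closed; please confirm whether it has been fixed.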

flink1.12 k8s session [...] TM [...]

2021-01-13 Thread 1120344670
flink: 1.12
kubernetes: 1.17
TM [...], [...]:



TM [...] namespace [...] configmap [...]
system:serviceaccount:flink-test:default [...]
[...] flink [...] "taskmanager.service-account",
"jobmanager.service-account", "kubernetes.service-account" [...]
service account. [...]


[...] default [...]
kubectl create clusterrolebinding flink-role-binding-flink-defalut 
--clusterrole=edit --serviceaccount=namespace:service-account.
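
The kubectl command above uses the placeholders namespace:service-account. As a hedged illustration, the dry-run sketch below prints concrete commands, assuming the namespace flink and the service account flink-service-account that appear in the kubernetes-session.sh command elsewhere in this thread. The binding name flink-role-binding-flink is hypothetical. The last command uses kubectl auth can-i with --as to check whether that account may read ConfigMaps.

```shell
#!/bin/sh
# Dry-run sketch: print the RBAC commands instead of running them.
# Assumptions: namespace and account names come from the session command
# in this thread; the binding name is made up for illustration.
NAMESPACE="flink"
SERVICE_ACCOUNT="flink-service-account"
BINDING_NAME="flink-role-binding-flink"

RBAC_CMDS="kubectl create serviceaccount $SERVICE_ACCOUNT -n $NAMESPACE
kubectl create clusterrolebinding $BINDING_NAME --clusterrole=edit --serviceaccount=$NAMESPACE:$SERVICE_ACCOUNT
kubectl auth can-i get configmaps -n $NAMESPACE --as=system:serviceaccount:$NAMESPACE:$SERVICE_ACCOUNT"

echo "$RBAC_CMDS"
```

Binding the role only grants permissions to the account; the pods still have to be started under that account for the grant to take effect.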