[ 
https://issues.apache.org/jira/browse/FLINK-31775?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergio Sainz updated FLINK-31775:
---------------------------------
    Description: 
When using native Kubernetes deployment mode, and when a new TaskManager pod is
started to process a job, the TaskManager pod attempts to register itself with
the resource manager (JobManager). The TaskManager looks up the resource
manager by IP address
(akka.tcp://flink@192.168.140.164:6123/user/rpc/resourcemanager_1).

However, when Istio is enabled, resolution by IP address is blocked, so the job
cannot start because the TaskManager cannot register with the resource manager:

2023-04-10 23:24:19,752 INFO  org.apache.flink.runtime.taskexecutor.TaskExecutor           [] - Could not resolve ResourceManager address akka.tcp://flink@192.168.140.164:6123/user/rpc/resourcemanager_1, retrying in 10000 ms: Could not connect to rpc endpoint under address akka.tcp://flink@192.168.140.164:6123/user/rpc/resourcemanager_1.

Note that when HA is disabled, the resource manager is resolved by service
name, so the TaskManager can find it:

2023-04-11 00:49:34,162 INFO  org.apache.flink.runtime.taskexecutor.TaskExecutor           [] - Successful registration at resource manager akka.tcp://flink@myenv-dev-flink-cluster.myenv-dev:6123/user/rpc/resourcemanager_* under registration id 83ad942597f86aa880ee96f1c2b8b923.
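
For illustration only, here is a minimal Python sketch (not Flink code) that splits the two addresses from the logs above into their parts and checks whether the host is an IP literal. The helper describe_rpc_address is hypothetical; it assumes the addresses follow the akka.tcp://<system>@<host>:<port>/<path> shape seen in the logs, and it only makes visible that the HA-enabled address carries the IP 192.168.140.164 while the HA-disabled one carries the service name.

```python
import ipaddress
from urllib.parse import urlsplit


def describe_rpc_address(address: str) -> dict:
    """Hypothetical helper: split an Akka RPC URL such as
    akka.tcp://flink@host:6123/user/rpc/resourcemanager_1 into its parts
    and report whether the host is a literal IP address."""
    parts = urlsplit(address)  # "akka.tcp" is a valid URL scheme for urlsplit
    host = parts.hostname
    try:
        ipaddress.ip_address(host)  # raises ValueError for DNS names
        host_is_ip = True
    except ValueError:
        host_is_ip = False
    return {
        "system": parts.username,  # actor system name before "@", e.g. "flink"
        "host": host,
        "port": parts.port,
        "path": parts.path,
        "host_is_ip": host_is_ip,
    }


# Addresses copied from the two log lines in this report.
ha = describe_rpc_address(
    "akka.tcp://flink@192.168.140.164:6123/user/rpc/resourcemanager_1")
non_ha = describe_rpc_address(
    "akka.tcp://flink@myenv-dev-flink-cluster.myenv-dev:6123/user/rpc/resourcemanager_*")
print(ha["host"], ha["host_is_ip"])          # 192.168.140.164 True
print(non_ha["host"], non_ha["host_is_ip"])  # myenv-dev-flink-cluster.myenv-dev False
```

The only difference between the two addresses is the host part; per this report, it is the IP-literal form that fails to resolve when Istio is enabled.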

 

Note that in my case it is not possible to disable Istio as described here:
[https://doc.akka.io/docs/akka-management/current/bootstrap/istio.html]

Although this is similar to https://issues.apache.org/jira/browse/FLINK-28171,
I am logging it as a separate defect because I believe the fix for FLINK-28171
won't fix this case: FLINK-28171 is about the Flink Kubernetes Operator.

  was:
When using native Kubernetes deployment mode, and when a new TaskManager is
started to process a job, the TaskManager attempts to register itself with
the resource manager (job manager). The TaskManager looks up the resource
manager by IP address
(akka.tcp://flink@192.168.140.164:6123/user/rpc/resourcemanager_1).

However, when Istio is enabled, resolution by IP address is blocked, so the job
cannot start because the TaskManager cannot register with the resource manager:

2023-04-10 23:24:19,752 INFO  org.apache.flink.runtime.taskexecutor.TaskExecutor           [] - Could not resolve ResourceManager address akka.tcp://flink@192.168.140.164:6123/user/rpc/resourcemanager_1, retrying in 10000 ms: Could not connect to rpc endpoint under address akka.tcp://flink@192.168.140.164:6123/user/rpc/resourcemanager_1.

Note that when HA is disabled, the resource manager is resolved by service
name, so the TaskManager can find it:

2023-04-11 00:49:34,162 INFO  org.apache.flink.runtime.taskexecutor.TaskExecutor           [] - Successful registration at resource manager akka.tcp://fl...@local-mci-ar32a-dev-flink-cluster.mstr-env-mci-ar32a-dev:6123/user/rpc/resourcemanager_* under registration id 83ad942597f86aa880ee96f1c2b8b923.

Note that it is not possible to disable Istio (as explained here:
https://doc.akka.io/docs/akka-management/current/bootstrap/istio.html).

Although this is similar to https://issues.apache.org/jira/browse/FLINK-28171,
I am logging it as a separate defect because I believe the fix for FLINK-28171
won't fix this case: FLINK-28171 is about the Flink Kubernetes Operator.


> High-Availability not supported in kubernetes when istio enabled
> ----------------------------------------------------------------
>
>                 Key: FLINK-31775
>                 URL: https://issues.apache.org/jira/browse/FLINK-31775
>             Project: Flink
>          Issue Type: Bug
>          Components: Deployment / Kubernetes
>    Affects Versions: 1.16.1
>            Reporter: Sergio Sainz
>            Priority: Major



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
