[jira] [Commented] (FLINK-29572) Flink Task Manager skip loopback interface for resource manager registration

2022-10-28 Thread Kevin Li (Jira)


[https://issues.apache.org/jira/browse/FLINK-29572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17625932#comment-17625932]

Kevin Li commented on FLINK-29572:
--

Hi Xintong, first of all, thanks for your help. However, this is not some 
obscure proxy software; it is part of a service mesh implementation, which has 
become very popular, especially in the Kubernetes world. 
https://medium.com/microservices-in-practice/service-mesh-for-microservices-2953109a3c9a

Keep in mind that FLINK-24474 is not present in versions before 1.15. Its 
original purpose was to make a Flink cluster more secure when the JM and TMs run 
on the same node/computer, which is not really the case for production 
deployments. Also, the way it probes the location of the Job Manager is wrong 
when such a proxy is present. That's why I recommended adding an option to 
disable/skip the loopback check, since we know the JM is not running on the same 
node as the TM. So, in my opinion, it is a bug.

> Flink Task Manager skip loopback interface for resource manager registration
> 
>
> Key: FLINK-29572
> URL: https://issues.apache.org/jira/browse/FLINK-29572
> Project: Flink
>  Issue Type: Bug
>  Components: API / Core
>Affects Versions: 1.15.2
> Environment: Flink 1.15.2
> Kubernetes with Istio Proxy
>Reporter: Kevin Li
>Priority: Major
>
> Currently, the Flink Task Manager tries different local interfaces to bind to 
> when connecting to the Resource Manager, starting with the loopback interface. 
> Normally, if the Job Manager is running on a remote host/container, connecting 
> via the loopback interface fails and the Task Manager picks up the correct IP 
> address.
> However, if the Task Manager is running with a proxy, the loopback interface 
> can connect to the remote host as well. As a result, 127.0.0.1 is reported to 
> the Resource Manager during registration even though the Job Manager/Resource 
> Manager runs on a remote host, which causes problems. In our case, only one 
> Task Manager can register.
> I suggest adding a configuration option to skip the loopback interface check 
> when we know the Job/Resource Manager is running on a remote host/container.
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-29572) Flink Task Manager skip loopback interface for resource manager registration

2022-10-27 Thread Kevin Li (Jira)


[https://issues.apache.org/jira/browse/FLINK-29572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17625277#comment-17625277]

Kevin Li commented on FLINK-29572:
--

It would work if we configured a different port for each task manager, but that 
is cumbersome. If you have 10 task managers, you need to create 10 separate 
deployments, one for each of them, rather than one deployment with 10 replicas 
that can scale up and down. Autoscaling could be an issue too.

I downgraded Flink to 1.14.6 and it works fine. It looks like the problem was 
introduced by FLINK-24474.






[jira] [Commented] (FLINK-29572) Flink Task Manager skip loopback interface for resource manager registration

2022-10-26 Thread Kevin Li (Jira)


[https://issues.apache.org/jira/browse/FLINK-29572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17624793#comment-17624793]

Kevin Li commented on FLINK-29572:
--

The sidecar proxy allows an application bound to 127.0.0.1 to connect to a 
remote IP address (where the Job Manager runs), which it normally shouldn't be 
able to do. This makes the Task Manager report its IP to the Job Manager as 
127.0.0.1 instead of its real IP, such as 1.2.3.4. It has nothing to do with the 
port.

In this situation, all TMs report their IP as 127.0.0.1, which confuses the Job 
Manager, and eventually no TM can communicate with the JM.






[jira] [Commented] (FLINK-29572) Flink Task Manager skip loopback interface for resource manager registration

2022-10-16 Thread Kevin Li (Jira)


[https://issues.apache.org/jira/browse/FLINK-29572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17618407#comment-17618407]

Kevin Li commented on FLINK-29572:
--

1. It is called a service mesh: basically, all ingress/egress traffic is 
captured by proxies, and the proxies are connected into a service mesh so that 
service discovery (and much more) is transparent to the apps. 
https://istio.io/latest/docs/ops/deployment/architecture/

2. With a service mesh proxy deployed, a TM can connect to the JM using the 
loopback address. If this works, the TM reports its address as 127.0.0.1:6223, 
and the JM can RPC this address as well. But as soon as you have multiple TMs, 
all of them report their address as 127.0.0.1:6223, and obviously only one can 
succeed. As a result, the JM can only connect to one TM, the one that 
succeeded.

3. Capturing loopback traffic and forwarding it to the remote host is how the 
proxy works. Disabling this would make the proxy useless. Please check the link 
in No. 1.
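Point 2 above can be illustrated with a toy Java model. It only groups Task Managers by the endpoint they advertise; the class AdvertisedEndpoints, the tm-N identifiers, and the 10.0.0.x pod addresses are hypothetical, and this is not a claim about how Flink's Resource Manager stores registrations. It shows why identical advertised endpoints leave the Job Manager with a single address to dial.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Toy model (hypothetical names): group Task Managers by the endpoint they advertise. */
final class AdvertisedEndpoints {

    /** A Task Manager and the host:port it reported during registration. */
    static final class TaskManagerInfo {
        final String id;
        final String advertisedEndpoint;

        TaskManagerInfo(String id, String advertisedEndpoint) {
            this.id = id;
            this.advertisedEndpoint = advertisedEndpoint;
        }
    }

    private AdvertisedEndpoints() {}

    /** Maps each distinct endpoint the Job Manager could dial to the Task Managers behind it. */
    static Map<String, List<String>> byEndpoint(List<TaskManagerInfo> taskManagers) {
        Map<String, List<String>> grouped = new TreeMap<>();
        for (TaskManagerInfo tm : taskManagers) {
            // Task Managers advertising the same endpoint end up in the same bucket.
            grouped.computeIfAbsent(tm.advertisedEndpoint, k -> new ArrayList<>()).add(tm.id);
        }
        return grouped;
    }
}
```

With three Task Managers all advertising 127.0.0.1:6223, byEndpoint returns a single entry listing all three IDs, so there is only one address for the Job Manager to reach them at; with distinct pod IPs it returns three entries.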






[jira] [Commented] (FLINK-29572) Flink Task Manager skip loopback interface for resource manager registration

2022-10-14 Thread Kevin Li (Jira)


[https://issues.apache.org/jira/browse/FLINK-29572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17617811#comment-17617811]

Kevin Li commented on FLINK-29572:
--

No, it wouldn't. This problem happens in K8s deployments. In K8s, all task 
managers share the same configuration, which is generated from a ConfigMap. I 
think we just need a configuration flag to skip the loopback check, since we 
know the Job Manager is not running on localhost.

As the documentation for taskmanager.host says:

{code:java}
The external address of the network interface where the TaskManager is exposed. 
Because different TaskManagers need different values for this option, usually 
it is specified in an additional non-shared TaskManager-specific config file.
{code}







[jira] [Updated] (FLINK-29572) Flink Task Manager skip loopback interface for resource manager registration

2022-10-10 Thread Kevin Li (Jira)


[https://issues.apache.org/jira/browse/FLINK-29572?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel]

Kevin Li updated FLINK-29572:
-
Description: 
Currently, the Flink Task Manager tries different local interfaces to bind to 
when connecting to the Resource Manager, starting with the loopback interface. 
Normally, if the Job Manager is running on a remote host/container, connecting 
via the loopback interface fails and the Task Manager picks up the correct IP 
address.

However, if the Task Manager is running with a proxy, the loopback interface 
can connect to the remote host as well. As a result, 127.0.0.1 is reported to 
the Resource Manager during registration even though the Job Manager/Resource 
Manager runs on a remote host, which causes problems. In our case, only one 
Task Manager can register.

I suggest adding a configuration option to skip the loopback interface check 
when we know the Job/Resource Manager is running on a remote host/container.

 

  was:
Currently, the Flink Task Manager tries different local interfaces to bind to 
when connecting to the Resource Manager, starting with the loopback interface. 
Normally, if the Job Manager is running on a remote host/container, connecting 
via the loopback interface fails and the Task Manager picks up the correct IP 
address.

 

However, if the Task Manager is running with a proxy, the loopback interface 
can connect to the remote host as well. As a result, 127.0.0.1 is reported to 
the Resource Manager during registration even though the Job Manager/Resource 
Manager runs on a remote host, which causes problems. In our case, only one 
Task Manager can register.

 

 

 

I suggest adding a configuration option to skip the loopback interface check 
when we know the Job/Resource Manager is running on a remote host/container.

 







[jira] [Commented] (FLINK-29572) Flink Task Manager skip loopback interface for resource manager registration

2022-10-10 Thread Kevin Li (Jira)


[https://issues.apache.org/jira/browse/FLINK-29572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17615214#comment-17615214]

Kevin Li commented on FLINK-29572:
--

{quote}Task Manager Log:
2022-10-08 17:22:32,983 INFO  org.apache.flink.runtime.util.LeaderRetrievalUtils   [] - Trying to select the network interface and address to use by connecting to the leading JobManager.
2022-10-08 17:22:32,984 INFO  org.apache.flink.runtime.util.LeaderRetrievalUtils   [] - TaskManager will try to connect for PT10S before falling back to heuristics
2022-10-08 17:22:33,356 DEBUG org.apache.flink.runtime.net.ConnectionUtils     [] - Retrieved new target address flink-jobmanager/172.20.133.241:6123 for akka URL [akka.tcp://flink@flink-jobmanager:6123/user/rpc/resourcemanager_*].
2022-10-08 17:22:33,357 DEBUG org.apache.flink.runtime.net.ConnectionUtils     [] - Trying to connect to [flink-jobmanager/172.20.133.241:6123] from local address [localhost/127.0.0.1] with timeout [100]
2022-10-08 17:22:33,361 DEBUG org.apache.flink.runtime.net.ConnectionUtils     [] - Using InetAddress.getLoopbackAddress() immediately for connecting address
2022-10-08 17:22:33,361 INFO  org.apache.flink.runtime.taskexecutor.TaskManagerRunner  [] - TaskManager will use hostname/address 'localhost' (127.0.0.1) for communication.
2022-10-08 17:22:33,416 INFO  org.apache.flink.runtime.rpc.akka.AkkaRpcServiceUtils[] - Trying to start actor system, external address 127.0.0.1:6122, bind address 0.0.0.0:6122.{quote}






[jira] [Created] (FLINK-29572) Flink Task Manager skip loopback interface for resource manager registration

2022-10-10 Thread Kevin Li (Jira)
Kevin Li created FLINK-29572:


 Summary: Flink Task Manager skip loopback interface for resource 
manager registration
 Key: FLINK-29572
 URL: https://issues.apache.org/jira/browse/FLINK-29572
 Project: Flink
  Issue Type: Improvement
  Components: API / Core
Affects Versions: 1.15.2
 Environment: Flink 1.15.2

Kubernetes with Istio Proxy
Reporter: Kevin Li


Currently, the Flink Task Manager tries different local interfaces to bind to 
when connecting to the Resource Manager, starting with the loopback interface. 
Normally, if the Job Manager is running on a remote host/container, connecting 
via the loopback interface fails and the Task Manager picks up the correct IP 
address.

However, if the Task Manager is running with a proxy, the loopback interface 
can connect to the remote host as well. As a result, 127.0.0.1 is reported to 
the Resource Manager during registration even though the Job Manager/Resource 
Manager runs on a remote host, which causes problems. In our case, only one 
Task Manager can register.

I suggest adding a configuration option to skip the loopback interface check 
when we know the Job/Resource Manager is running on a remote host/container.
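The address-selection behavior described above can be sketched in Java as follows. This is a minimal, hypothetical model, not Flink's actual ConnectionUtils code: the class LocalAddressProbe, its methods, and the skipLoopback parameter (standing in for the proposed configuration option) are illustrative names only. It assumes the Task Manager probes the loopback address first, then its interface addresses, and advertises whichever local address first reaches the Job Manager.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.net.UnknownHostException;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

/**
 * Hypothetical sketch: probe candidate local addresses in order and advertise
 * the first one that can reach the Job Manager. Not Flink's implementation;
 * skipLoopback models the option proposed in this issue, not an existing setting.
 */
final class LocalAddressProbe {

    private LocalAddressProbe() {}

    /** Builds the probe order: loopback first, unless the proposed skip flag is set. */
    static List<InetAddress> candidates(List<InetAddress> interfaceAddresses, boolean skipLoopback) {
        List<InetAddress> ordered = new ArrayList<>();
        if (!skipLoopback) {
            ordered.add(InetAddress.getLoopbackAddress());
        }
        ordered.addAll(interfaceAddresses);
        return ordered;
    }

    /** Returns the first candidate the probe accepts, or null to signal "fall back to heuristics". */
    static InetAddress select(List<InetAddress> candidates, Predicate<InetAddress> canReachJobManager) {
        for (InetAddress local : candidates) {
            if (canReachJobManager.test(local)) {
                return local; // this is the address that would be reported during registration
            }
        }
        return null;
    }

    /** A real probe: bind a socket to the local address and try to connect within the timeout. */
    static boolean tryConnect(InetAddress local, InetSocketAddress jobManager, int timeoutMs) {
        try (Socket socket = new Socket()) {
            socket.bind(new InetSocketAddress(local, 0)); // ephemeral local port
            socket.connect(jobManager, timeoutMs);
            return true; // with a sidecar intercepting traffic, loopback can succeed here
        } catch (IOException e) {
            return false; // without a sidecar, loopback to a remote IP normally fails
        }
    }

    /** Parses an IP literal; literals are not resolved via DNS. */
    static InetAddress ipLiteral(String literal) {
        try {
            return InetAddress.getByName(literal);
        } catch (UnknownHostException e) {
            throw new IllegalArgumentException("not an IP literal: " + literal, e);
        }
    }
}
```

With a sidecar that lets every outbound connection through (modeled by a probe such as a -> true), select(candidates(podAddresses, false), probe) returns 127.0.0.1, which matches the log in this issue; with skipLoopback set to true, the same call returns the pod's own address. In a real deployment, tryConnect would play the role of the probe.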

 


