Moshe Elisha created FLINK-28171:
------------------------------------

             Summary: Adjust Job and Task manager port definitions to work with Istio+mTLS
                 Key: FLINK-28171
                 URL: https://issues.apache.org/jira/browse/FLINK-28171
             Project: Flink
          Issue Type: Improvement
          Components: Deployment / Kubernetes
    Affects Versions: 1.14.4
         Environment: flink-kubernetes-operator 1.0.0
                      Flink 1.14-java11
                      Kubernetes v1.19.5
                      Istio 1.7.6
            Reporter: Moshe Elisha


Hello,

 

We are launching Flink deployments using the [Flink Kubernetes Operator|https://nightlies.apache.org/flink/flink-kubernetes-operator-docs-stable/] on a Kubernetes cluster with Istio and mTLS enabled.

 

We found that the TaskManager is unable to communicate with the JobManager on 
the jobmanager-rpc port:

 

{{2022-06-15 15:25:40,508 WARN  akka.remote.ReliableDeliverySupervisor [] - Association with remote system [akka.tcp://flink@amf-events-to-inference-and-central.nwdaf-edge:6123] has failed, address is now gated for [50] ms. Reason: [Association failed with [akka.tcp://flink@amf-events-to-inference-and-central.nwdaf-edge:6123]] Caused by: [The remote system explicitly disassociated (reason unknown).]}}

 

The cause is that the JobManager service port definitions do not follow the Istio protocol selection guidelines [https://istio.io/latest/docs/ops/configuration/traffic-management/protocol-selection/] (see the example below).

 

This topic was also discussed on the user mailing list under the subject "Flink Kubernetes Operator with K8S + Istio + mTLS - port definitions".

With the help of the community we were able to work around the issue, but it was very difficult and forced us to bypass the Istio proxy, which is not ideal.

 

We would like you to consider changing the default port definitions in one of the following ways:
 # Rename the ports. I understand this is an Istio-specific guideline, but it may be better to be aligned with at least one (popular) vendor guideline than with none at all.
 # Add the "appProtocol" property[1], which is not specific to any vendor but requires Kubernetes >= 1.19, where it was introduced as beta (it became stable in 1.20). The option to set the appProtocol property was added to the fabric8 kubernetes-client only in [https://github.com/fabric8io/kubernetes-client/releases/tag/v5.10.0] with [#3570|https://github.com/fabric8io/kubernetes-client/issues/3570].
 # Allow a way to override the defaults.

 

[1] [https://kubernetes.io/docs/concepts/services-networking/_print/#application-protocol]

 

 

{{# k get service inference-results-to-analytics-engine -o yaml}}

{{apiVersion: v1}}

{{kind: Service}}

{{...}}

{{spec:}}

{{  clusterIP: None}}

{{  ports:}}

{{  - name: jobmanager-rpc *# should start with "tcp-" or add the "appProtocol" property*}}

{{    port: 6123}}

{{    protocol: TCP}}

{{    targetPort: 6123}}

{{  - name: blobserver *# should start with "tcp-" or add the "appProtocol" property*}}

{{    port: 6124}}

{{    protocol: TCP}}

{{    targetPort: 6124}}

{{...}}
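
For comparison, here is a rough sketch of how the same ports section might look with options 1 and 2 from the list above applied together. It is illustrative only: the names "tcp-jobmanager-rpc" and "tcp-blobserver" are hypothetical and are not names that Flink generates today, and as far as I understand the Istio guidelines, either the prefix or the appProtocol field alone should be enough.

```yaml
# Sketch only, not the output of any current Flink or operator version.
# "tcp-jobmanager-rpc" and "tcp-blobserver" are hypothetical port names.
apiVersion: v1
kind: Service
metadata:
  name: inference-results-to-analytics-engine
spec:
  # selector and remaining fields omitted
  clusterIP: None
  ports:
  - name: tcp-jobmanager-rpc   # option 1: "tcp-" prefix per the Istio naming guideline
    appProtocol: tcp           # option 2: vendor-neutral hint, requires Kubernetes >= 1.19
    port: 6123
    protocol: TCP
    targetPort: 6123
  - name: tcp-blobserver       # option 1
    appProtocol: tcp           # option 2
    port: 6124
    protocol: TCP
    targetPort: 6124
```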



