[ 
https://issues.apache.org/jira/browse/SPARK-42411?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Puneet updated SPARK-42411:
---------------------------
    Description: 
h3. Support for Strict MTLS

In strict MTLS Peer Authentication mode, Istio requires each pod to be 
associated with a service identity (this allows listeners to use the correct 
cert and chain). Without a service identity, communication goes through the 
passthrough cluster, which is not permitted in strict mode. The Istio community 
is still investigating communication through IPs with strict MTLS 
[https://github.com/istio/istio/issues/37431#issuecomment-1412831780]. Today, 
the Spark backend creates a service record for the driver, but executor pods 
register with the driver using their Pod IPs. In this model, therefore, the TLS 
handshake would fail between the driver and the executors, and also between 
executors. As part of this Jira we want to add service records for the executor 
pods as well. This can be achieved by adding an ExecutorServiceFeatureStep 
similar to the existing DriverServiceFeatureStep.
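
To make the proposal concrete, here is a minimal sketch of the kind of per-executor service record such a step might emit. It is written in Python for brevity and is not the Spark implementation. The function name, the service naming scheme, and the port list are hypothetical, and the label keys (spark-app-selector, spark-role, spark-exec-id) are assumed to match the labels Spark attaches to executor pods.

```python
from typing import Any, Dict


def executor_service_manifest(
    app_id: str, executor_id: int, ports: Dict[str, int]
) -> Dict[str, Any]:
    """Build a headless Service manifest that selects exactly one executor pod.

    Hypothetical helper: the naming scheme and label keys are assumptions.
    """
    # Labels assumed to be present on the executor pod; the selector must
    # match them so the Service resolves to that single pod.
    selector = {
        "spark-app-selector": app_id,
        "spark-role": "executor",
        "spark-exec-id": str(executor_id),
    }
    return {
        "apiVersion": "v1",
        "kind": "Service",
        "metadata": {"name": f"{app_id}-exec-{executor_id}-svc"},
        "spec": {
            # Headless service: DNS resolves straight to the pod, but the
            # traffic is now associated with a service rather than a bare IP.
            "clusterIP": "None",
            "selector": selector,
            "ports": [
                {"name": name, "port": port, "targetPort": port}
                for name, port in ports.items()
            ],
        },
    }
```

An executor would then register with the driver under the service's DNS name instead of its Pod IP, so that traffic no longer falls through to the passthrough cluster.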
h3. Allowing binding to all IPs

Before Istio 1.10, the istio-proxy sidecar forwarded inbound traffic to the 
localhost of the pod. Thus, if the application container bound only to the Pod 
IP, the traffic would not be forwarded to it. This was addressed in 1.10 
[https://istio.io/latest/blog/2021/upcoming-networking-changes]. However, the 
old behavior is still accessible by disabling the feature flag 
PILOT_ENABLE_INBOUND_PASSTHROUGH. A request to remove it has met some pushback 
[https://github.com/istio/istio/issues/37642]. In the current implementation, 
the Spark K8s backend does not allow a bind address to be passed for the driver 
[https://github.com/apache/spark/blob/master/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/features/DriverServiceFeatureStep.scala#L35]. 
As part of this Jira we want to allow passing a bind address even in 
Kubernetes mode, so long as the bind address is 0.0.0.0. This lets the user 
choose the behavior depending on the state of PILOT_ENABLE_INBOUND_PASSTHROUGH 
in her Istio cluster.
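
The relaxed check could look roughly like the sketch below. It is illustrative Python rather than the actual Scala change, and the function name and the dictionary-based configuration are hypothetical. It assumes the bind address is supplied through spark.driver.bindAddress and that the driver falls back to the Pod IP when the setting is absent.

```python
from typing import Dict

ALL_INTERFACES = "0.0.0.0"


def resolve_driver_bind_address(conf: Dict[str, str], pod_ip: str) -> str:
    """Pick the driver bind address under the proposed rule (hypothetical helper)."""
    requested = conf.get("spark.driver.bindAddress")
    if requested is None:
        # Unchanged default: bind to the address of the driver pod.
        return pod_ip
    if requested != ALL_INTERFACES:
        # Any other explicit address stays rejected in Kubernetes mode.
        raise ValueError(
            "spark.driver.bindAddress must be unset or 0.0.0.0 in Kubernetes "
            f"mode, got {requested!r}"
        )
    return requested
```

With the flag disabled (sidecar forwarding to localhost) a user would set the address to 0.0.0.0; otherwise the setting can be left out and the Pod IP is used.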
h3. Better support for istio-proxy sidecar lifecycle management

In an Istio-enabled cluster, istio-proxy sidecars are auto-injected into 
driver/executor pods. If the application is ephemeral, the driver and executor 
containers exit when they finish, but the istio-proxy container continues to 
run. This causes the driver/executor pods to enter the NotReady state. As part 
of this Jira we want the ability to run a post-stop cleanup after the 
driver/executor container has completed. Similarly, we want to add support for 
a pre-start script, which can ensure, for example, that the istio-proxy sidecar 
is up before the executor/driver container is started.
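
One way to picture the pre-start and post-stop hooks is a small wrapper around the main process, sketched below in Python. It is an illustration only, not the proposed Spark mechanism. The function names are hypothetical, and the two URLs are assumptions about the endpoints istio-proxy exposes (readiness on port 15021, shutdown on port 15020); check them against the Istio version in use.

```python
import time
import urllib.error
import urllib.request
from typing import Callable

# Assumed istio-proxy endpoints; verify for your Istio version.
READY_URL = "http://localhost:15021/healthz/ready"
QUIT_URL = "http://localhost:15020/quitquitquit"


def http_ok(url: str, method: str = "GET") -> bool:
    """Return True when the URL answers with HTTP 200, False on any failure."""
    request = urllib.request.Request(url, method=method)
    try:
        with urllib.request.urlopen(request, timeout=2) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        return False


def run_with_sidecar(
    main: Callable[[], int],
    is_ready: Callable[[], bool] = lambda: http_ok(READY_URL),
    stop_sidecar: Callable[[], object] = lambda: http_ok(QUIT_URL, "POST"),
    attempts: int = 30,
    delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Wait for the sidecar, run the main process, then ask the sidecar to exit."""
    # Pre-start: poll until the sidecar reports ready or attempts run out.
    for _ in range(attempts):
        if is_ready():
            break
        sleep(delay)
    else:
        raise TimeoutError("istio-proxy did not become ready")
    try:
        return main()
    finally:
        # Post-stop: runs whether main() returned or raised, so the pod
        # does not linger in NotReady with only the sidecar alive.
        stop_sidecar()
```

The probe and shutdown callables are parameters so that the same wrapper can be exercised without a running sidecar, or pointed at a different cleanup action.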

  was:
h3. Support for Strict MTLS


In strict MTLS Peer Authentication Istio requires each pod to be associated 
with a service identity (as this allows listeners to use the correct cert and 
chain). Without the service identity communication goes through passthrough 
cluster which is not permitted in strict mode. Community is still investigating 
communication through IPs with strict MTLS 
https://github.com/istio/istio/issues/37431#issuecomment-1412831780. Today 
Spark backend creates a service record for driver however executor pods 
register with pod ip with driver. In this model therefore, TLS handshake would 
fail between driver and executor and also between executors. As part of this 
jira we want to similarly add service records for the executor pods as well. 
This can be achieved by adding a ExecutorServiceFeatureStep similar to existing 
DriverServiceFeatureStep
h3. Allowing binding to all IPs

Before Istio 1.10 the istio-proxy sidecar was forwarding outside traffic to 
localhost of the pod. Thus is the application container is binding only to Pod 
IP the traffic would not be forwarded to it. This was addressed in 1.10 
https://istio.io/latest/blog/2021/upcoming-networking-changes. However the old 
behavior is still accessible through disabling the feature flag 
PILOT_ENABLE_INBOUND_PASSTHROUGH. Request to remove it has had some push back 
https://github.com/istio/istio/issues/37642. In current implementation Spark 
K8s backend does not allow to pass bind address for driver 
https://github.com/apache/spark/blob/master/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/features/DriverServiceFeatureStep.scala#L35
 however as part of this jira we want to allow passing of bind address even in 
Kubernetes mode so long as the bind address is 0.0.0.0. This lets user choose 
the behavior dependening on state of PILOT_ENABLE_INBOUND_PASSTHROUGH in her 
Istio cluster.
h3. Better support for istio-proxy sidecar lifecycle management

In istio-enabled cluster istio-proxy sidecars would be auto-injected to 
driver/executor pods. If the application is ephemeral then driver and executor 
containers would exit, however istio-proxy container would continue to run. 
This causes driver/executor pods to enter NotReady state. As part of this jira 
we want ability to run a post stop cleanup after driver/executor container is 
completed. Similarly we also want to add support for a pre start up script, 
which can ensure for example that istio-sidecar is up before executor/driver 
container gets started.


> Better support for Istio service mesh while running Spark on Kubernetes
> -----------------------------------------------------------------------
>
>                 Key: SPARK-42411
>                 URL: https://issues.apache.org/jira/browse/SPARK-42411
>             Project: Spark
>          Issue Type: New Feature
>          Components: Kubernetes
>    Affects Versions: 3.2.3
>            Reporter: Puneet
>            Priority: Major


