Hi, I've been playing around with Spark Kubernetes deployments over the past week, and I'm curious why Spark deploys as a driver pod that then creates its own worker (executor) pods.
I've read that it's normal to use Kubernetes Deployments to create a distributed service, so I am wondering why Spark creates bare Pods instead. I suppose the driver program is 'the odd one out', so it doesn't belong in a Deployment or ReplicaSet, but could the workers be managed by a Deployment? Is this something to do with data locality?

I haven't tried Streaming pipelines on Kubernetes yet; are these also Pods that create Pods, rather than Deployments? It seems more important for a streaming pipeline to be 'durable'[1], as the Kubernetes documentation might put it. I ask partly because the Kubernetes deployment of Spark is still experimental, and I am wondering whether this aspect of it might change.

I had a look at the Flink[2] documentation, and it does seem to use Deployments, but these appear to be lightweight job/task managers that accept Flink jobs. That actually sounds like running a lightweight version of YARN inside containers on Kubernetes.

Thanks,
Frank

[1] https://kubernetes.io/docs/concepts/workloads/pods/pod/#durability-of-pods-or-lack-thereof
[2] https://ci.apache.org/projects/flink/flink-docs-stable/ops/deployment/kubernetes.html
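P.S. For reference, the kind of submission I've been experimenting with looks roughly like this (the API server address, image name, and example jar path are placeholders for my setup, per the Spark on Kubernetes docs). It's this command that creates the driver pod, which in turn requests the executor pods:

```shell
# Cluster-mode submission to Kubernetes: spark-submit talks to the
# k8s API server, which starts a driver pod; the driver then creates
# the executor pods itself (no Deployment/ReplicaSet involved).
./bin/spark-submit \
  --master k8s://https://<k8s-apiserver-host>:<port> \
  --deploy-mode cluster \
  --name spark-pi \
  --class org.apache.spark.examples.SparkPi \
  --conf spark.executor.instances=3 \
  --conf spark.kubernetes.container.image=<my-spark-image> \
  local:///opt/spark/examples/jars/spark-examples_2.11-2.4.0.jar
```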