Hi Wilson,

The behavior of a Deployment doesn't fit the way Spark executor pods
are run and managed. Executor pods are created and deleted dynamically
at the driver's request, and they normally run to completion. A
Deployment assumes uniformity and statelessness across the set of Pods
it manages, which is not the case for Spark executors: each executor
Pod carries a unique executor ID, for example. Dynamic resource
allocation also doesn't map well onto a Deployment: the driver needs to
add and remove specific executor Pods on demand, whereas a Deployment
only lets you change the replica count (and decides itself which Pods
to create or delete), and any change to the Pod template triggers a
rolling update that restarts all the executor Pods. In Kubernetes mode,
the driver is effectively a custom controller for executor Pods: it
creates and deletes Pods as the job requires and watches their status.
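
As a rough illustration of that controller pattern (this is not Spark's
actual code; Spark's Kubernetes backend does use the fabric8 Kubernetes
client, but the names, labels and image below are made up), creating
and watching executor-style Pods directly looks roughly like this:

import io.fabric8.kubernetes.api.model.{Pod, PodBuilder}
import io.fabric8.kubernetes.client.{DefaultKubernetesClient, KubernetesClientException, Watcher}

object ExecutorPodSketch {
  def main(args: Array[String]): Unit = {
    // Reads kubeconfig or in-cluster config.
    val client = new DefaultKubernetesClient()

    // Build a bare executor-style Pod. Names and labels are illustrative,
    // not the ones Spark itself uses.
    val executorPod: Pod = new PodBuilder()
      .withNewMetadata()
        .withName("my-app-exec-1")
        .addToLabels("app", "my-spark-app")
        .addToLabels("executor-id", "1")
      .endMetadata()
      .withNewSpec()
        .withRestartPolicy("Never")   // run to completion, no self-healing
        .addNewContainer()
          .withName("executor")
          .withImage("my-spark-image:latest")
        .endContainer()
      .endSpec()
      .build()

    // Create the Pod directly -- no Deployment or ReplicaSet in between.
    client.pods().inNamespace("default").create(executorPod)

    // Watch the Pods we own and react to status changes, as a controller would.
    client.pods().inNamespace("default").withLabel("app", "my-spark-app")
      .watch(new Watcher[Pod] {
        override def eventReceived(action: Watcher.Action, pod: Pod): Unit =
          println(s"$action ${pod.getMetadata.getName} phase=${pod.getStatus.getPhase}")
        override def onClose(cause: KubernetesClientException): Unit =
          println(s"watch closed: $cause")
      })

    // Removing one specific executor Pod when it is no longer needed:
    // client.pods().inNamespace("default").withName("my-app-exec-1").delete()
  }
}

A Deployment gives you none of that fine-grained control over which Pod
is added or removed; that is exactly what the driver needs.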

The way Flink on Kubernetes works, as you said, is basically to run the
Flink job/task managers as Deployments. The Spark equivalent would be
running a standalone Spark cluster on top of Kubernetes. If you want
auto-restart for Spark streaming jobs, I would suggest taking a look at
the Kubernetes Spark Operator
<https://github.com/GoogleCloudPlatform/spark-on-k8s-operator>.
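
For what it's worth, with the operator you describe the job
declaratively as a SparkApplication resource and the operator handles
submission and restarts. A minimal sketch, assuming the v1beta1 CRD
(field names may differ between operator versions, and the image, main
class and jar path below are placeholders):

apiVersion: "sparkoperator.k8s.io/v1beta1"
kind: SparkApplication
metadata:
  name: my-streaming-app
  namespace: default
spec:
  type: Scala
  mode: cluster
  image: "my-registry/my-spark-image:latest"
  mainClass: com.example.MyStreamingJob
  mainApplicationFile: "local:///opt/spark/jars/my-streaming-job.jar"
  sparkVersion: "2.4.0"
  restartPolicy:
    type: Always        # operator re-submits the application when it exits
  driver:
    cores: 1
    memory: "1g"
    serviceAccount: spark
  executor:
    cores: 1
    instances: 2
    memory: "1g"

With a restartPolicy of type Always the operator re-submits the
application whenever it terminates, which is the durability you would
want for a long-running streaming job.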

On Tue, Jan 29, 2019 at 5:53 AM WILSON Frank <
frank.wil...@uk.thalesgroup.com> wrote:

> Hi,
>
>
>
> I’ve been playing around with Spark Kubernetes deployments over the past
> week and I’m curious to know why Spark deploys as a driver pod that creates
> more worker pods.
>
>
>
> I’ve read that it’s normal to use Kubernetes Deployments to create a
> distributed service, so I am wondering why Spark just creates Pods. I
> suppose the driver program is ‘the odd one out’ so it doesn’t belong
> in a Deployment or ReplicaSet, but maybe the workers could be a
> Deployment? Is this something to do with data locality?
>
>
>
> I haven’t tried Streaming pipelines on Kubernetes yet; are these also
> Pods that create Pods rather than Deployments? It seems more important
> for a streaming pipeline to be ‘durable’[1], as the Kubernetes
> documentation might say.
>
>
>
> I ask this question partly because the Kubernetes deployment of Spark is
> still experimental and I am wondering whether this aspect of the deployment
> might change.
>
>
>
> I had a look at the Flink[2] documentation and it does seem to use
> Deployments; however, these seem to be lightweight job/task managers
> that accept Flink jobs. It actually sounds like running a lightweight
> version of YARN inside containers on Kubernetes.
>
>
>
>
>
> Thanks,
>
>
>
>
>
> Frank
>
>
>
> [1]
> https://kubernetes.io/docs/concepts/workloads/pods/pod/#durability-of-pods-or-lack-thereof
>
> [2]
> https://ci.apache.org/projects/flink/flink-docs-stable/ops/deployment/kubernetes.html
>
