Hi,
maybe this is useful in case someone is testing Spark in containers for
Spark development.

*From a production-scale point of view:*
If I am in AWS, I will just use Glue if I want to run Spark in
containers, without unnecessarily increasing my operational costs.

Also, unless I am mistaken, GCP already runs Spark in serverless
mode. Personally, I would never burden my clients with the additional
costs and issues of deploying Spark myself when such solutions are
already available from the cloud vendors. In fact, that is precisely one
of the reasons people use the cloud: to reduce operational costs.

Sorry, just trying to understand the scope of this work.


Regards,
Gourav Sengupta

On Fri, Feb 11, 2022 at 8:35 PM Mich Talebzadeh <mich.talebza...@gmail.com>
wrote:

> The equivalent of Google GKE Autopilot
> <https://cloud.google.com/kubernetes-engine/docs/concepts/autopilot-overview>
> in AWS is AWS Fargate <https://aws.amazon.com/fargate/>.
>
>
> I have not used AWS Fargate, so I can only comment on Google's GKE
> Autopilot.
>
>
> This is built on the concepts of containerization and microservices. In
> the standard mode of creating a GKE cluster, users can customize the
> configuration to their requirements: GKE manages the control plane,
> while users provision and manage their own node infrastructure. So you
> choose the hardware type and memory/CPU where your Spark containers will
> run, and the nodes show up as VM hosts in your account. In GKE Autopilot
> mode, GKE manages the nodes and pre-configures the cluster with add-ons
> for auto-scaling, auto-upgrades, maintenance, Day 2 operations and
> security hardening. So there is a lot there. You don't choose your nodes
> or their sizes; you effectively pay only for the pods you use.
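>
> As an aside, a minimal sketch of creating a cluster in each mode with
> the gcloud CLI (cluster names, zone/region and machine type below are
> placeholders, not the values I used):
>
> # Standard mode: you choose machine type and node count yourself
> gcloud container clusters create spark-standard \
>     --zone europe-west2-a \
>     --machine-type e2-standard-4 \
>     --num-nodes 3
>
> # Autopilot mode: no node configuration; GKE provisions nodes for the pods
> gcloud container clusters create-auto spark-autopilot \
>     --region europe-west2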
>
>
> With spark-submit, you still need to specify the number of executors,
> the driver and executor memory, and the cores for each driver and
> executor. The theory is that the k8s cluster will deploy suitable nodes
> and create enough pods on those nodes. With a standard k8s cluster you
> choose your nodes, and you should ensure that one core on each node is
> reserved for the OS itself. Otherwise, if you allocate all cores to
> Spark with --conf spark.executor.cores, you will receive this error:
>
>
> kubectl describe pods -n spark
>
> ...
>
> Events:
>   Type     Reason            Age                From               Message
>   ----     ------            ----               ----               -------
>   Warning  FailedScheduling  9s (x17 over 15m)  default-scheduler  0/3 nodes are available: 3 Insufficient cpu.
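>
> For reference, a spark-submit against a k8s cluster looks something like
> the sketch below (the master URL, container image and application file
> here are placeholders, not the actual values I used):
>
> # sketch only; executor.cores=3 leaves one core per 4-core node for the OS
> spark-submit \
>     --master k8s://https://<k8s-api-server>:443 \
>     --deploy-mode cluster \
>     --name sparkbq \
>     --conf spark.executor.instances=6 \
>     --conf spark.executor.cores=3 \
>     --conf spark.executor.memory=4g \
>     --conf spark.kubernetes.namespace=spark \
>     --conf spark.kubernetes.container.image=<your-spark-image> \
>     local:///opt/spark/work-dir/<your_app>.py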
>
> So with standard k8s you have the choice of selecting your core sizes.
> With Autopilot, node selection is left to Autopilot to deploy suitable
> nodes, and this will be trial and error at the start (to get the
> configuration right). You may be lucky if the history of executions is
> kept current and the same job can be repeated. However, in my
> experience, getting the driver pod into the "running" state is expensive
> time-wise, and without an executor in the running state there is no
> chance of the Spark job doing anything.
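>
> These are snapshots of kubectl get pods -n spark; with the watch flag
> (kubectl get pods -n spark -w) you can follow the state transitions
> live.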
>
>
> NAME                                         READY   STATUS    RESTARTS   AGE
> randomdatabigquery-cebab77eea6de971-exec-1   0/1     Pending   0          31s
> randomdatabigquery-cebab77eea6de971-exec-2   0/1     Pending   0          31s
> randomdatabigquery-cebab77eea6de971-exec-3   0/1     Pending   0          31s
> randomdatabigquery-cebab77eea6de971-exec-4   0/1     Pending   0          31s
> randomdatabigquery-cebab77eea6de971-exec-5   0/1     Pending   0          31s
> randomdatabigquery-cebab77eea6de971-exec-6   0/1     Pending   0          31s
> sparkbq-37405a7eea6b9468-driver              1/1     Running   0          3m4s
>
> NAME                                         READY   STATUS              RESTARTS   AGE
> randomdatabigquery-cebab77eea6de971-exec-6   0/1     ContainerCreating   0          112s
> sparkbq-37405a7eea6b9468-driver              1/1     Running             0          4m25s
>
> NAME                                         READY   STATUS    RESTARTS   AGE
> randomdatabigquery-cebab77eea6de971-exec-6   1/1     Running   0          114s
> sparkbq-37405a7eea6b9468-driver              1/1     Running   0          4m27s
>
> Basically, I told Spark to use 6 executors, but only one executor could
> be brought into the running state after the driver pod had been spinning
> for 4 minutes.
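>
> On Autopilot, executors typically sit in Pending while new nodes are
> being provisioned; kubectl get events -n spark --sort-by=.lastTimestamp
> shows the scale-up activity. The driver log for this run is below: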
>
> 22/02/11 20:16:18 INFO SparkKubernetesClientFactory: Auto-configuring K8S
> client using current context from users K8S config file
>
> 22/02/11 20:16:19 INFO Utils: Using initial executors = 6, max of
> spark.dynamicAllocation.initialExecutors,
> spark.dynamicAllocation.minExecutors and spark.executor.instances
>
> 22/02/11 20:16:19 INFO ExecutorPodsAllocator: Going to request 3 executors
> from Kubernetes for ResourceProfile Id: 0, target: 6 running: 0.
>
> 22/02/11 20:16:20 INFO BasicExecutorFeatureStep: Decommissioning not
> enabled, skipping shutdown script
>
> 22/02/11 20:16:20 INFO Utils: Successfully started service
> 'org.apache.spark.network.netty.NettyBlockTransferService' on port 7079.
>
> 22/02/11 20:16:20 INFO NettyBlockTransferService: Server created on
> sparkbq-37405a7eea6b9468-driver-svc.spark.svc:7079
>
> 22/02/11 20:16:20 INFO BlockManager: Using
> org.apache.spark.storage.RandomBlockReplicationPolicy for block replication
> policy
>
> 22/02/11 20:16:20 INFO BlockManagerMaster: Registering BlockManager
> BlockManagerId(driver, sparkbq-37405a7eea6b9468-driver-svc.spark.svc, 7079,
> None)
>
> 22/02/11 20:16:20 INFO BlockManagerMasterEndpoint: Registering block
> manager sparkbq-37405a7eea6b9468-driver-svc.spark.svc:7079 with 366.3 MiB
> RAM, BlockManagerId(driver,
> sparkbq-37405a7eea6b9468-driver-svc.spark.svc, 7079, None)
>
> 22/02/11 20:16:20 INFO BlockManagerMaster: Registered BlockManager
> BlockManagerId(driver, sparkbq-37405a7eea6b9468-driver-svc.spark.svc, 7079,
> None)
>
> 22/02/11 20:16:20 INFO BlockManager: Initialized BlockManager:
> BlockManagerId(driver, sparkbq-37405a7eea6b9468-driver-svc.spark.svc, 7079,
> None)
>
> 22/02/11 20:16:20 INFO Utils: Using initial executors = 6, max of
> spark.dynamicAllocation.initialExecutors,
> spark.dynamicAllocation.minExecutors and spark.executor.instances
>
> 22/02/11 20:16:20 WARN ExecutorAllocationManager: Dynamic allocation
> without a shuffle service is an experimental feature.
>
> 22/02/11 20:16:20 INFO BasicExecutorFeatureStep: Decommissioning not
> enabled, skipping shutdown script
>
> 22/02/11 20:16:20 INFO BasicExecutorFeatureStep: Decommissioning not
> enabled, skipping shutdown script
>
> 22/02/11 20:16:20 INFO ExecutorPodsAllocator: Going to request 3 executors
> from Kubernetes for ResourceProfile Id: 0, target: 6 running: 3.
>
> 22/02/11 20:16:20 INFO BasicExecutorFeatureStep: Decommissioning not
> enabled, skipping shutdown script
>
> 22/02/11 20:16:20 INFO BasicExecutorFeatureStep: Decommissioning not
> enabled, skipping shutdown script
>
> 22/02/11 20:16:20 INFO BasicExecutorFeatureStep: Decommissioning not
> enabled, skipping shutdown script
>
> 22/02/11 20:16:49 INFO KubernetesClusterSchedulerBackend: SchedulerBackend
> is ready for scheduling beginning after waiting
> maxRegisteredResourcesWaitingTime: 30000000000(ns)
>
> 22/02/11 20:16:49 INFO SharedState: Setting hive.metastore.warehouse.dir
> ('null') to the value of spark.sql.warehouse.dir
> ('file:/opt/spark/work-dir/spark-warehouse').
>
> 22/02/11 20:16:49 INFO SharedState: Warehouse path is
> 'file:/opt/spark/work-dir/spark-warehouse'.
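>
> As the WARN line above says, dynamic allocation without an external
> shuffle service is experimental. On k8s it is normally enabled with
> something like the following, since there is no external shuffle service
> to fall back on (executor bounds here are just an example):
>
>     --conf spark.dynamicAllocation.enabled=true \
>     --conf spark.dynamicAllocation.shuffleTracking.enabled=true \
>     --conf spark.dynamicAllocation.minExecutors=1 \
>     --conf spark.dynamicAllocation.maxExecutors=6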
>
> OK, there is a lot to digest here, and I would appreciate feedback from
> other members who have experimented with GKE Autopilot or AWS Fargate,
> or who are familiar with k8s.
>
> Thanks
>
>
>    view my Linkedin profile
> <https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/>
>
>
>
> *Disclaimer:* Use it at your own risk. Any and all responsibility for any
> loss, damage or destruction of data or any other property which may arise
> from relying on this email's technical content is explicitly disclaimed.
> The author will in no case be liable for any monetary damages arising from
> such loss, damage or destruction.
>
>
>
