I'm not a big expert on Spark, but I don't really understand what you are 
going to compare, and how. Reading from and writing to HDFS? How is that 
related to YARN and k8s? These are resource managers (YARN stands for Yet 
Another Resource Negotiator): they decide what to allocate, how much, and 
when (CPU, RAM).
Local disk spilling? That depends on disk throughput.
So what are you going to measure?
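
If the answer is end-to-end job time under the same CPU and RAM allocation, 
a minimal sketch of such a measurement could look like the one below. The 
helper names (time_runs, summarize) and the stand-in workload are 
hypothetical and not part of any Spark, YARN or k8s API. In a real 
comparison the stand-in would be replaced by whatever submits the same job 
to each cluster.

```python
import statistics
import time
from typing import Callable, Dict, List


def time_runs(job: Callable[[], object], repeats: int = 5) -> List[float]:
    """Run `job` `repeats` times; return wall-clock durations in seconds."""
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        job()
        durations.append(time.perf_counter() - start)
    return durations


def summarize(durations: List[float]) -> Dict[str, float]:
    """Reduce a list of durations to a few comparable numbers."""
    return {
        "runs": len(durations),
        "median_s": statistics.median(durations),
        "min_s": min(durations),
        "max_s": max(durations),
    }


def stand_in_job() -> int:
    # Hypothetical placeholder workload. A real benchmark would submit the
    # same Spark job with identical executor CPU/RAM settings on each platform.
    return sum(i * i for i in range(100_000))


if __name__ == "__main__":
    print(summarize(time_runs(stand_in_job, repeats=3)))
```

Reporting the median over several runs, rather than a single run, reduces 
the effect of one-off noise such as a cold cache or container start-up.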




Best regards

> On 5 Jul 2021, at 20:43, Mich Talebzadeh <mich.talebza...@gmail.com> wrote:
> 
> 
> 
> I was curious to know if there are any benchmarks around comparing Spark 
> on YARN with Spark on Kubernetes.
> 
> This question arose because traditionally in Google Cloud we have been using 
> Spark on Dataproc clusters. Dataproc provides Spark, Hadoop plus others 
> (optional install) for data and analytic processing. It is a PaaS.
> 
> Now they have GKE clusters as well, and have also introduced Apache Spark 
> with Cloud Dataproc on Kubernetes, which allows one to submit Spark jobs to 
> k8s using the Dataproc stub as the platform for submitting the job, as 
> below, from the cloud console or locally:
> 
> gcloud dataproc jobs submit pyspark --cluster="dataproc-for-gke" 
> gs://bucket/testme.py --region="europe-west2" --py-files gs://bucket/DSBQ.zip
> Job [e5fc19b62cf744f0b13f3e6d9cc66c19] submitted.
> Waiting for job output...
> 
> At the moment I struggle to see what the merits of using k8s instead of 
> Dataproc are, bar notebooks etc. There is actually not much literature 
> around on PySpark on k8s.
> 
> For me, Spark on bare metal is the preferred option, as I cannot see how one 
> can pigeonhole Spark into a container and make it performant, but I may be 
> totally wrong.
> 
> Thanks
> 
> Disclaimer: Use it at your own risk. Any and all responsibility for any loss, 
> damage or destruction of data or any other property which may arise from 
> relying on this email's technical content is explicitly disclaimed. The 
> author will in no case be liable for any monetary damages arising from such 
> loss, damage or destruction.
>  
