Re: Automated setup of a multi-node cluster for Apache Spark

2021-04-10 Thread Hariharan
> 1. Writing scripts for automated setup of a multi-node cluster for Apache Spark with Hadoop File System (HDFS). This is required since I don’t have a fixed set of machines to run my Spark experiments and hence, need an easy, quick and automated way to do the entire Spark setup. Where will you

Re: Dynamic Allocation Backlog Property in Spark on Kubernetes

2021-04-10 Thread ranju goel
Hi Attila, Thanks for your guidance on how to use dynamic allocation effectively for a Spark job. Now I am a bit more confident about setting schedulerBacklogTimeout wisely. In your statement "If there is no more available new resource for Spark then the existing ones will be used (even the min

Re: Dynamic Allocation Backlog Property in Spark on Kubernetes

2021-04-10 Thread Attila Zsolt Piros
Hi Ranju! > But if there are no extra resources available, then go for static allocation rather than dynamic. Is that correct? I think there is no such rule. If there is no more available new resource for Spark then the existing ones will be used (even the min executors is not guaranteed to be reached

Re: Dynamic Allocation Backlog Property in Spark on Kubernetes

2021-04-10 Thread ranju goel
Hi Attila, I understand what you mean: use the extra resources, if available, for running the Spark job, using schedulerBacklogTimeout (dynamic allocation). This will speed up the job. But if there are no extra resources available, then go for static allocation rather than dynamic. Is that correct?
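
For readers following this thread, here is a minimal sketch of how the dynamic allocation settings under discussion might be assembled into `spark-submit` flags. The helper names (`dynamic_allocation_conf`, `to_submit_flags`) and the example values are hypothetical and not taken from the thread; the property keys are the standard `spark.dynamicAllocation.*` settings. Shuffle tracking is enabled here on the assumption that no external shuffle service is available on Kubernetes.

```python
# Hypothetical helpers for illustration only; not part of Spark itself.

def dynamic_allocation_conf(min_executors, max_executors, backlog_timeout="1s"):
    """Return Spark properties that enable dynamic allocation.

    backlog_timeout maps to spark.dynamicAllocation.schedulerBacklogTimeout:
    how long tasks must stay pending before Spark requests more executors.
    """
    return {
        "spark.dynamicAllocation.enabled": "true",
        # Assumption: no external shuffle service, so track shuffle files instead.
        "spark.dynamicAllocation.shuffleTracking.enabled": "true",
        "spark.dynamicAllocation.minExecutors": str(min_executors),
        "spark.dynamicAllocation.maxExecutors": str(max_executors),
        "spark.dynamicAllocation.schedulerBacklogTimeout": backlog_timeout,
    }


def to_submit_flags(conf):
    """Flatten a property dict into a list of --conf key=value arguments."""
    flags = []
    for key, value in conf.items():
        flags += ["--conf", f"{key}={value}"]
    return flags


if __name__ == "__main__":
    # Example values are arbitrary.
    print(" ".join(to_submit_flags(dynamic_allocation_conf(2, 10, "5s"))))
```

Requesting executors this way does not reserve them: if the cluster has no free capacity, the job keeps running on whatever executors it already has.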

Re: Tasks are skewed to one executor

2021-04-10 Thread Mich Talebzadeh
Hi, Can you provide a bit more info please? How are you running this job and what is the streaming framework (Kafka, files, etc.)? HTH Mich

Automated setup of a multi-node cluster for Apache Spark

2021-04-10 Thread Dhruv Kumar
Hello, I am new to Apache Spark and am looking for some close guidance or collaboration for my Spark project, which has the following main components: 1. Writing scripts for automated setup of a multi-node cluster for Apache Spark with Hadoop File System (HDFS). This is required since I don’t

Tasks are skewed to one executor

2021-04-10 Thread András Kolbert
Hi, I have a streaming job and quite often executors die (due to memory errors, "unable to find location for shuffle", etc.) during processing. I started digging and found that some of the tasks are concentrated on one executor, just as below: [image: image.png] Can this be the reason? Should I