Hard to say without a lot more info, but 76.5K tasks is very large. How big
are the tasks, and how long do they take? If they are very short, you should
repartition down.
Do you end up with 800 executors? If so, why 2 per machine? That is generally
a loss at this scale of worker. I'm confused because you have
Hi,
I am working on a Spark job. It takes 10 minutes just to run the count()
function. Question is: how can I make it faster?
From the above image, what I understood is that 4,001 tasks are running in
parallel. The total number of tasks is 76,553.
Here are the parameters that I am using for