Re: Tuning spark job to make count faster.

2021-04-06 Thread Sean Owen
Hard to say without a lot more info, but 76.5K tasks is very large. How big are the tasks / how long do they take? If very short, you should repartition down. Do you end up with 800 executors? If so, why 2 per machine? That is generally a loss at workers of this size. I'm confused because you have
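A quick back-of-the-envelope check on the numbers in this thread (76,553 total tasks, 4,001 running at once) shows why the task count matters: the job needs roughly 20 sequential "waves" of tasks, so any fixed per-task scheduling overhead is paid about 20 times over.

```python
import math

total_tasks = 76_553    # total task count reported in the thread
parallel_slots = 4_001  # tasks observed running in parallel

# Number of sequential "waves" needed to drain the task queue:
# each wave fills every available slot once.
waves = math.ceil(total_tasks / parallel_slots)
print(waves)  # 20
```

With tasks that finish in a few seconds, repartitioning down (as suggested above) shrinks both the wave count and the per-task overhead.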

Tuning spark job to make count faster.

2021-04-06 Thread Krishna Chakka
Hi, I am working on a Spark job. The count() function alone takes 10 minutes. Question is: how can I make it faster? From the above image, what I understood is that 4,001 tasks are running in parallel. Total tasks are 76,553. Here are the parameters that I am using for
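The message truncates before the actual parameters, so purely as a hypothetical illustration (none of these values are from the thread): the two knobs most relevant to the reply above are the partition count and the executor layout. A sketch of the kind of settings involved might look like:

```shell
# Hypothetical values for illustration only -- not the poster's real settings.
# Fewer, larger partitions reduce per-task scheduling overhead, and one
# larger executor per machine often beats two small ones at this worker size.
spark-submit \
  --num-executors 400 \
  --executor-cores 8 \
  --conf spark.sql.shuffle.partitions=4000 \
  my_job.py
```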