Thanks for the answer, Sean!
On Sun, May 3, 2020 at 10:35 AM Sean Owen wrote:
> Spark will by default assume each task needs 1 CPU. On an executor
> with 16 cores and 16 slots, you'd schedule 16 tasks. If each is using
> 4 cores, then 64 threads are trying to run. If you're CPU-bound, that
> could slow things down. But to the extent some of the tasks spend
> time blocking on I/O, it may not hurt, and could even help.
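In case it helps anyone finding this thread later, here's a minimal
sketch of accounting for those extra threads. spark.task.cpus is the
real setting; the app name and builder boilerplate are just
illustrative, and the 16/4 figures follow the example above:

    import org.apache.spark.sql.SparkSession

    // Tell the scheduler each task will use 4 threads, so a 16-core
    // executor gets 16 / 4 = 4 task slots instead of 16.
    val spark = SparkSession.builder()
      .appName("multithreaded-tasks")        // illustrative name
      .config("spark.executor.cores", "16")
      .config("spark.task.cpus", "4")
      .getOrCreate()

In practice these would usually be passed on the spark-submit command
line, but the effect is the same: 4 tasks x 4 threads = 16 runnable
threads on 16 cores, with no oversubscription.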
Hi all,
We have a Spark job (Spark 2.4.4, Hadoop 2.7, Scala 2.11.12) in which
we use semaphores / parallel collections (roughly the pattern sketched
below). We definitely notice a huge speedup in our job from doing this,
but we were wondering whether it could cause any unintended side
effects. In particular, I'm worried …
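A sketch of the pattern, with illustrative names only (rdd, the pool
size, and slowIoCall stand in for our real code):

    import java.util.concurrent.Semaphore
    import scala.collection.parallel.ForkJoinTaskSupport
    import scala.concurrent.forkjoin.ForkJoinPool
    import org.apache.spark.rdd.RDD

    // Stand-in for the real blocking call.
    def slowIoCall(s: String): String = s

    def processPartitions(rdd: RDD[String]): RDD[String] =
      rdd.mapPartitions { rows =>
        val permits = new Semaphore(4)  // cap concurrent calls per task
        val par = rows.toVector.par     // note: materializes the partition
        // Give this collection its own 4-thread pool instead of the
        // shared global ForkJoinPool.
        par.tasksupport = new ForkJoinTaskSupport(new ForkJoinPool(4))
        par.map { row =>
          permits.acquire()
          try slowIoCall(row)
          finally permits.release()
        }.iterator
      }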