Re: Good idea to do multi-threading in spark job?

2020-05-06 Thread Ruijing Li
Thanks for the answer Sean!

On Sun, May 3, 2020 at 10:35 AM Sean Owen wrote:
> Spark will by default assume each task needs 1 CPU. On an executor
> with 16 cores and 16 slots, you'd schedule 16 tasks. If each is using
> 4 cores, then 64 threads are trying to run. If you're CPU-bound, that
> …

Re: Good idea to do multi-threading in spark job?

2020-05-03 Thread Sean Owen
Spark will by default assume each task needs 1 CPU. On an executor with 16 cores and 16 slots, you'd schedule 16 tasks. If each is using 4 cores, then 64 threads are trying to run. If you're CPU-bound, that could slow things down. But to the extent some of the tasks spend some time blocking on I/O, it …
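[For reference, a minimal sketch, not from the thread itself, of how the scheduler can be told about those extra threads: spark.task.cpus is a standard Spark config that makes each task reserve that many scheduling slots, so a 16-core executor would run only 4 such tasks at once. The app name is a hypothetical placeholder.

    import org.apache.spark.SparkConf
    import org.apache.spark.sql.SparkSession

    // Each task reserves 4 of the executor's 16 cores, so the scheduler
    // runs 16 / 4 = 4 concurrent tasks per executor. If each task spawns
    // 4 threads, that is 4 x 4 = 16 threads, matching the physical cores.
    val conf = new SparkConf()
      .set("spark.executor.cores", "16")
      .set("spark.task.cpus", "4")

    val spark = SparkSession.builder
      .config(conf)
      .appName("task-cpus-demo") // hypothetical app name
      .getOrCreate()

The trade-off is that spark.task.cpus applies to every stage of the job, so stages that don't multi-thread will leave cores idle.]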

Good idea to do multi-threading in spark job?

2020-05-03 Thread Ruijing Li
Hi all,

We have a Spark job (Spark 2.4.4, Hadoop 2.7, Scala 2.11.12) that uses semaphores / parallel collections internally. We definitely notice a huge speedup from doing this, but we were wondering if it could cause any unintended side effects. Particularly I'm worried …
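[For context, a minimal sketch, assumed rather than the poster's actual code, of the parallel-collections pattern being described, with the per-task thread count bounded through a ForkJoinTaskSupport so the total thread count stays predictable (Scala 2.11 / Spark 2.4 APIs; slowIoCall is a hypothetical stand-in for the blocking work):

    import org.apache.spark.sql.SparkSession
    import scala.collection.parallel.ForkJoinTaskSupport
    import scala.concurrent.forkjoin.ForkJoinPool

    object ParallelInTask {
      // Hypothetical stand-in for a blocking I/O call.
      def slowIoCall(x: Int): Int = { Thread.sleep(100); x * 2 }

      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder.appName("par-in-task-demo").getOrCreate()
        val rdd = spark.sparkContext.parallelize(1 to 1000, numSlices = 16)

        val doubled = rdd.mapPartitions { iter =>
          val items = iter.toVector.par
          // Cap each task at 4 worker threads; with 16 concurrent tasks
          // that is at most 64 threads per executor, as Sean describes.
          items.tasksupport = new ForkJoinTaskSupport(new ForkJoinPool(4))
          items.map(slowIoCall).seq.iterator
        }.collect()

        println(doubled.length)
        spark.stop()
      }
    }

Because the extra threads here mostly block on I/O, oversubscribing cores this way can pay off; for CPU-bound work it would just add contention.]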