Thank you, Adrian,

The dataset is indeed skewed. My concern was that some executors do not
participate in the computation at all. My understanding is that executors
finish tasks sequentially, so using more executors allows for better
parallelism.
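To put numbers on that reasoning, here is a minimal sketch, assuming every task takes the same time and tasks are spread as evenly as possible over the executors that actually get work. `completion_waves` is a hypothetical helper, not a Spark API. `slots_per_executor` defaults to 1 to match the sequential picture above. Setting it to the number of cores per executor models Spark running several tasks on one executor at once.

```python
import math


def completion_waves(num_tasks: int, num_executors: int, slots_per_executor: int = 1) -> int:
    """Rough count of equal-length task 'waves' before a stage finishes.

    Hypothetical model: tasks are spread evenly over the executors that
    receive work, all tasks take the same time, and each executor runs at
    most `slots_per_executor` tasks concurrently (1 = strictly sequential).
    """
    # The busiest executor receives ceil(num_tasks / num_executors) tasks.
    busiest = math.ceil(num_tasks / num_executors)
    # It works through them `slots_per_executor` at a time.
    return math.ceil(busiest / slots_per_executor)


if __name__ == "__main__":
    # 48 partitions, with all 16 executors working vs. two of them idle.
    for executors in (16, 14):
        print(executors, "executors:", completion_waves(48, executors), "waves")
```

In this model, with 48 tasks, losing two of 16 executors raises the busiest executor's load from 3 to 4 tasks, roughly 33% longer. With 192 tasks it goes from 12 to 14, roughly 17% longer. Finer partitioning therefore shrinks the penalty when placement is uneven.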

I managed to force all executors to participate by increasing the number of
partitions. My guess is that the scheduler preferred to reduce the number of
machines participating in the computation to decrease network overhead.

Do you think my analysis is correct? How should one decide on the number of
partitions? Does it depend on the workload, the dataset, or both?
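A rough sketch of both sides of that question follows. It assumes 16 worker machines with one executor each. `suggested_partitions` applies the Spark tuning guide's rule of thumb of roughly 2-3 tasks per CPU core. That would give 16 × 4 × 3 = 192 here, and even 48 partitions is fewer than the 64 cores available. `partition_sizes` is a hypothetical helper that imitates hash partitioning with `key % n` on integer keys. It is not Spark's actual partitioner. The skewed key distribution is made up for illustration.

```python
from collections import Counter


def suggested_partitions(num_executors: int, cores_per_executor: int, tasks_per_core: int = 3) -> int:
    """Rule-of-thumb partition count: a few tasks per CPU core in the cluster."""
    return num_executors * cores_per_executor * tasks_per_core


def partition_sizes(keys, num_partitions: int) -> list[int]:
    """Records per partition when non-negative integer keys are placed by key % n.

    Illustrates the idea of hash partitioning; not Spark's HashPartitioner code.
    """
    counts = Counter(k % num_partitions for k in keys)
    return [counts.get(p, 0) for p in range(num_partitions)]


if __name__ == "__main__":
    print(suggested_partitions(16, 4))  # 192
    # Hypothetical skewed data: key 0 appears 1000 times, keys 1..999 once each.
    keys = [0] * 1000 + list(range(1, 1000))
    for n in (48, 192):
        sizes = partition_sizes(keys, n)
        print(n, "partitions, largest:", max(sizes), "mean:", round(sum(sizes) / n, 1))
```

In this toy run, going from 48 to 192 partitions cuts the mean partition size by a factor of four. The partition holding the hot key stays at about 1000 records (1020 vs. 1005). The cluster size sets a sensible lower bound on the partition count, but key skew in the dataset is not fixed by adding partitions alone.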

Thanks,
-Khaled


On Wed, Nov 4, 2015 at 7:21 AM, Adrian Tanase <atan...@adobe.com> wrote:

> If some of the operations required involve shuffling and partitioning, it
> might mean that the data set is skewed toward specific partitions, which will
> create hot spots on certain executors.
>
> -adrian
>
> From: Khaled Ammar
> Date: Tuesday, November 3, 2015 at 11:43 PM
> To: "user@spark.apache.org"
> Subject: Why are some executors lazy?
>
> Hi,
>
> I'm using the most recent Spark version on a standalone setup of 16+1
> machines.
>
> While running GraphX workloads, I found that some executors are lazy: they
> *rarely* participate in computation, so other executors end up doing their
> work. This behavior is consistent across all iterations, and even in the
> data loading step. Only two specific executors do not participate in most
> computations.
>
>
> Does anyone know how to fix this?
>
>
> *More details:*
> Each machine has 4 cores. I set the number of partitions to 3*16. Each
> executor was supposed to run 3 tasks, but a few of them end up working on 4
> tasks instead, which delays the computation.
>
>
>
> --
> Thanks,
> -Khaled
>
