By any chance, does this thread look similar to your issue: http://apache-spark-developers-list.1001551.n3.nabble.com/Lost-executor-on-YARN-ALS-iterations-td7916.html ?
On Tue, Mar 24, 2015 at 5:23 AM Harut Martirosyan <harut.martiros...@gmail.com> wrote:

> What is the performance overhead caused by YARN, or which configurations
> are changed when the app is run through YARN?
>
> The following example:
>
> sqlContext.sql("SELECT dayStamp(date),
>   count(distinct deviceId) AS c
>   FROM full
>   GROUP BY dayStamp(date)
>   ORDER BY c DESC
>   LIMIT 10")
>   .collect()
>
> runs in the shell when we use the standalone scheduler:
>
> ./spark-shell --master spark://sparkmaster:7077 --executor-memory 20g \
>   --executor-cores 10 --driver-memory 10g --num-executors 8
>
> but fails by losing an executor when we run it through YARN:
>
> ./spark-shell --master yarn-client --executor-memory 20g \
>   --executor-cores 10 --driver-memory 10g --num-executors 8
>
> There are no telling logs, just messages that executors are being lost,
> and connection-refused errors (apparently due to the executor failures).
> The cluster is the same: 8 nodes with 64 GB of RAM each.
> The data format is Parquet.
>
> --
> RGRDZ Harut
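For context: in threads like the one linked above, the common cause is that YARN enforces a per-container memory limit, and an executor requesting `--executor-memory 20g` of heap also uses off-heap memory, so the NodeManager kills the container when total usage exceeds the limit. The standalone scheduler does no such enforcement, which would explain why the same job succeeds there. A possible mitigation, sketched under the assumption that this is the cause (the 4096 MB overhead value is illustrative, not a tested setting):

```shell
# Ask YARN for extra off-heap headroom per executor container, so the
# NodeManager does not kill executors whose total memory use exceeds
# the 20g heap request (spark.yarn.executor.memoryOverhead is in MB).
./spark-shell --master yarn-client \
  --executor-memory 20g \
  --executor-cores 10 \
  --driver-memory 10g \
  --num-executors 8 \
  --conf spark.yarn.executor.memoryOverhead=4096
```

To confirm whether this is the problem, check the YARN NodeManager logs on the affected hosts for a message about a container "running beyond physical memory limits" around the time the executor is reported lost.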