Standalone Scheduler vs. YARN Performance

2015-03-24 Thread Harut Martirosyan
What performance overhead is caused by YARN, and which configurations
change when the app is run through YARN?

The following example:

sqlContext.sql("""SELECT dayStamp(date),
count(distinct deviceId) AS c
FROM full
GROUP BY dayStamp(date)
ORDER BY c DESC
LIMIT 10""")
.collect()

runs in the shell when we use the standalone scheduler:
./spark-shell --master spark://sparkmaster:7077 --executor-memory 20g
--executor-cores 10 --driver-memory 10g --num-executors 8

and fails, with a lost executor, when we run it through YARN:
./spark-shell --master yarn-client --executor-memory 20g --executor-cores
10 --driver-memory 10g --num-executors 8

There are no informative logs, just messages that executors are being lost,
plus connection-refused errors (apparently a consequence of the executor
failures).
The cluster is the same: 8 nodes with 64 GB RAM each.
The data format is Parquet.
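
If it helps diagnose this: when YARN itself kills a container, the reason
(typically a line saying the container "is running beyond physical memory
limits") shows up in the aggregated container logs, which can be pulled after
the application finishes. The application ID below is a placeholder:

yarn logs -applicationId <application_id>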

-- 
RGRDZ Harut


Re: Standalone Scheduler vs. YARN Performance

2015-03-24 Thread Denny Lee
By any chance, does this thread look similar?
http://apache-spark-developers-list.1001551.n3.nabble.com/Lost-executor-on-YARN-ALS-iterations-td7916.html
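
If it is the same problem, the likely mechanism is that YARN (unlike the
standalone scheduler) enforces a hard memory cap per container, equal to the
executor heap plus spark.yarn.executor.memoryOverhead, and kills any executor
process that grows past it. A minimal sketch of the workaround, with the
overhead value purely illustrative, is to leave YARN more headroom:

./spark-shell --master yarn-client --executor-memory 20g --executor-cores 10 \
  --driver-memory 10g --num-executors 8 \
  --conf spark.yarn.executor.memoryOverhead=4096

Shrinking --executor-memory instead achieves the same headroom if the nodes
cannot fit the larger container.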


