Hello.

We are using ALS (collaborative filtering) from Spark MLlib on YARN.
The Spark version is 1.2.0, as included in CDH 5.3.1.

We train ALS on 1,000,000,000 rating records (data for 5,000,000 users and
5,000,000 items).
This large amount of data increases virtual memory usage, and the YARN node
manager kills the Spark worker process. Spark launches the worker process
again after it is killed, but the new process is killed as well.
As a result, the whole Spark application is terminated.

# It seems that the worker process is killed because its virtual memory
# usage, increased by shuffling or writing to disk, goes over the YARN
# threshold.
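
For context, here is roughly how we run ALS (a minimal sketch; the input
path, record format, rank, iterations, lambda, and blocks values are
placeholders, not our actual settings):

    import org.apache.spark.mllib.recommendation.{ALS, Rating}

    // sc is an existing SparkContext; the path and record layout are hypothetical.
    // Input: one "user \t item \t rating" line per record, about 1,000,000,000 lines.
    val ratings = sc.textFile("hdfs:///path/to/ratings").map { line =>
      val Array(user, item, rating) = line.split("\t")
      Rating(user.toInt, item.toInt, rating.toDouble)
    }

    // rank, iterations, lambda and blocks below are placeholder values.
    val model = ALS.train(ratings, rank = 10, iterations = 10, lambda = 0.01, blocks = 100)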

To avoid this, we set 'yarn.nodemanager.vmem-check-enabled' to false, and the
job then finishes successfully.
However, this does not seem like an appropriate solution.
If you know a better way to tune Spark for this case, please let me know.
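
For reference, the workaround above is the following change in yarn-site.xml
on each node manager (it simply disables the virtual memory check rather than
tuning anything):

    <!-- yarn-site.xml: disable the node manager's virtual memory check -->
    <property>
      <name>yarn.nodemanager.vmem-check-enabled</name>
      <value>false</value>
    </property>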

The machine specifications and Spark settings are as follows.
1) Six machines, each with 32 GB of physical memory.
2) Spark settings (a SparkConf sketch of these follows the list):
- spark.executor.memory=16g
- spark.closure.serializer=org.apache.spark.serializer.KryoSerializer
- spark.rdd.compress=true
- spark.shuffle.memoryFraction=0.4
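
As a sketch, this is how those settings map onto SparkConf in the driver (the
application name is just an example; the property values are copied from the
list above):

    import org.apache.spark.{SparkConf, SparkContext}

    val conf = new SparkConf()
      .setAppName("als-recommendation")  // example name only
      .set("spark.executor.memory", "16g")
      .set("spark.closure.serializer", "org.apache.spark.serializer.KryoSerializer")
      .set("spark.rdd.compress", "true")
      .set("spark.shuffle.memoryFraction", "0.4")
    val sc = new SparkContext(conf)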

Thanks,
Yuichiro Sakamoto


