Re: How to restrict disk space for spark caches on yarn?

2015-07-13 Thread Peter Rudenko
Hi Andrew, here's what I found. Maybe it will be relevant for people with the same issue: 1) There are three types of local resources in YARN (public, private, application). More about it here: http://hortonworks.com/blog/management-of-application-dependencies-in-yarn/ 2) Spark cache is of
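For reference, YARN's localized-resource cache (the public/private resources mentioned above) can be bounded in yarn-site.xml. A sketch using standard YARN property names (verify the defaults against your Hadoop version); note this bounds only localized resources, not data Spark itself writes to disk:

```xml
<!-- yarn-site.xml: caps the localizer cache for PUBLIC/PRIVATE
     resources; APPLICATION resources and Spark's on-disk cached
     blocks are not affected by these settings -->
<property>
  <name>yarn.nodemanager.localizer.cache.target-size-mb</name>
  <value>10240</value>
</property>
<property>
  <name>yarn.nodemanager.localizer.cache.cleanup.interval-ms</name>
  <value>600000</value>
</property>
```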

Re: How to restrict disk space for spark caches on yarn?

2015-07-13 Thread Sandy Ryza
To clear one thing up: the space taken up by data that Spark caches on disk is not related to YARN's local resource / application cache concept. The latter is a way that YARN provides for distributing bits to worker nodes. The former is just usage of disk by Spark, which happens to be in a local

How to restrict disk space for spark caches on yarn?

2015-07-10 Thread Peter Rudenko
Hi, I have a Spark ML workflow that uses some persist calls. When I launch it with a 1 TB dataset, it brings down the whole cluster because it fills all the disk space at /yarn/nm/usercache/root/appcache: http://i.imgur.com/qvRUrOp.png I found a yarn settings:
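Setting questions aside, one mitigation on the Spark side is to unpersist intermediate datasets once downstream stages no longer need them, since disk-persisted blocks land under the NodeManager's local dirs (the appcache path shown above). A minimal sketch, assuming an existing SparkContext `sc` and a hypothetical HDFS input path:

```scala
import org.apache.spark.storage.StorageLevel

// Hypothetical input; on YARN, DISK_ONLY blocks are written under
// yarn.nodemanager.local-dirs, e.g. /yarn/nm/usercache/<user>/appcache/<appId>
val features = sc.textFile("hdfs:///data/features")
  .persist(StorageLevel.DISK_ONLY)

// Stand-in for the ML pipeline stages that reuse the persisted data
val total = features.map(_.length.toLong).sum()

// Blocking unpersist removes the on-disk blocks before the next
// heavy stage runs, instead of leaving them until application exit
features.unpersist(blocking = true)
```

Releasing each intermediate RDD as soon as its consumers finish keeps peak appcache usage closer to one stage's working set rather than the whole workflow's.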