Hi Andrew, here's what I found. Maybe it will be relevant for people with
the same issue:
1) There are three types of local resources in YARN (public, private, and
application-scoped). More about it here:
http://hortonworks.com/blog/management-of-application-dependencies-in-yarn/
2) Spark cache is of
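For context on how application-scoped local resources come about, here is a hedged sketch: files shipped with `spark-submit` are distributed through YARN's local-resource mechanism and localized per application under the NodeManager's usercache on each node. The file and script names below are hypothetical.

```shell
# --files and --archives are handed to YARN as application-scoped
# local resources; YARN localizes them on each worker node under
# the NodeManager's usercache for this application.
spark-submit \
  --master yarn \
  --files app.conf \
  --archives deps.zip#deps \
  my_job.py
```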

To clear one thing up: the space taken up by data that Spark caches on disk
is not related to YARN's local resource / application cache concept.
The latter is a mechanism YARN provides for distributing files to worker
nodes. The former is just Spark's own disk usage, which happens to land in a
local directory managed by YARN.
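To illustrate the distinction, here is a minimal PySpark sketch (it assumes a running YARN cluster, and the dataset path is hypothetical): blocks persisted with a disk storage level are written by Spark itself into its local directories, which on YARN default to the NodeManager's local dirs, hence the `appcache` path filling up. Calling `unpersist()` releases that space.

```python
from pyspark.sql import SparkSession
from pyspark import StorageLevel

spark = SparkSession.builder.appName("persist-demo").getOrCreate()

# Hypothetical dataset path for illustration.
df = spark.read.parquet("hdfs:///data/big_dataset")

# Spark writes these blocks itself to its local dirs; on YARN that is
# typically .../usercache/<user>/appcache/<app-id> on each node.
df.persist(StorageLevel.DISK_ONLY)
df.count()      # materialize the cached blocks

df.unpersist()  # frees the on-disk blocks once they are no longer needed
spark.stop()
```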

Hi, I have a Spark ML workflow that uses some persist calls. When I
launch it on a 1 TB dataset, it brings down the whole cluster because it
fills all the disk space at /yarn/nm/usercache/root/appcache:
http://i.imgur.com/qvRUrOp.png
I found a YARN setting: