[ https://issues.apache.org/jira/browse/SPARK-2121?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14036565#comment-14036565 ]
Xiangrui Meng commented on SPARK-2121:
--------------------------------------

Had an offline discussion with [~coderxiang]. The root cause was that the containers were killed by YARN for some reason, so the RDDs became partially cached even though they were marked as MEMORY_AND_DISK. Things become messy when Spark tries to recover the RDDs while new containers are also being killed by YARN. [~coderxiang] is looking into the real cause of the container failures.

> Not fully cached when there is enough memory
> --------------------------------------------
>
>                 Key: SPARK-2121
>                 URL: https://issues.apache.org/jira/browse/SPARK-2121
>             Project: Spark
>          Issue Type: Bug
>          Components: Block Manager, MLlib, Spark Core
>    Affects Versions: 1.0.0
>            Reporter: Shuo Xiang
>
> While factorizing a large matrix using the latest Alternating Least Squares
> (ALS) in MLlib, the Spark UI shows that Spark fails to cache all the
> partitions of some RDDs even though memory is sufficient. Please see [this
> post](http://apache-spark-user-list.1001560.n3.nabble.com/Not-fully-cached-when-there-is-enough-memory-tt7429.html)
> for screenshots. This may cause subsequent job failures while executing
> `userOut.count()` or `productsOut.count()`.

--
This message was sent by Atlassian JIRA
(v6.2#6252)
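
For anyone who wants to check for the partial caching described in the comment above without relying on the web UI, one option is to ask the driver for storage info after materializing the RDD. This is a minimal sketch, not the ALS job from the report: the object name `CacheCheck`, the helper `cachedFraction`, the local master, and the toy data are all hypothetical. `SparkContext.getRDDStorageInfo` is a developer API, so the available fields may differ between Spark versions.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.storage.StorageLevel

object CacheCheck {
  // Hypothetical helper: fraction of an RDD's partitions that are cached.
  def cachedFraction(numCached: Int, numPartitions: Int): Double =
    if (numPartitions == 0) 0.0 else numCached.toDouble / numPartitions

  def main(args: Array[String]): Unit = {
    // Assumption: local master for illustration; the report ran on YARN.
    val conf = new SparkConf().setAppName("cache-check").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      // Toy data standing in for the RDDs built by the ALS job.
      val rdd = sc.parallelize(1 to 1000000, 8).map(i => (i % 1000, i.toDouble))
      rdd.setName("demo")
      rdd.persist(StorageLevel.MEMORY_AND_DISK)
      rdd.count() // run an action so the partitions are actually stored

      // Report how many partitions the block manager currently holds.
      sc.getRDDStorageInfo.filter(_.id == rdd.id).foreach { info =>
        val frac = cachedFraction(info.numCachedPartitions, info.numPartitions)
        println(s"${info.name}: ${info.numCachedPartitions}/${info.numPartitions} " +
          s"partitions cached (fraction $frac), " +
          s"mem=${info.memSize} B, disk=${info.diskSize} B")
      }
    } finally {
      sc.stop()
    }
  }
}
```

If the reported fraction stays below 1.0 after an action has completed, that is consistent with executors (and their cached blocks) having been lost, as in the container kills described above. It does not by itself identify the cause.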