[ https://issues.apache.org/jira/browse/SPARK-2121?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14036565#comment-14036565 ]

Xiangrui Meng commented on SPARK-2121:
--------------------------------------

Had an offline discussion with [~coderxiang]. The root cause was that the 
containers were killed by YARN for some reason, so the RDDs became partially 
cached even though they were marked as MEMORY_AND_DISK. Things get messy when 
Spark tries to recover the RDDs while new containers keep getting killed by 
YARN. [~coderxiang] is looking into the actual cause of the container failures.
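
For anyone hitting this, one way to confirm partial caching from the driver is 
to compare cached vs. total partitions per persisted RDD. A minimal sketch 
(the app name and input path below are hypothetical):

{code}
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.storage.StorageLevel

// Hypothetical setup: persist an RDD as MEMORY_AND_DISK and materialize it.
val sc = new SparkContext(new SparkConf().setAppName("cache-check"))
val rdd = sc.textFile("hdfs:///path/to/input").persist(StorageLevel.MEMORY_AND_DISK)
rdd.count()

// If YARN killed some containers, numCachedPartitions can stay below
// numPartitions even for MEMORY_AND_DISK, until the lost blocks are recomputed.
sc.getRDDStorageInfo.foreach { info =>
  println(s"RDD ${info.id} '${info.name}': " +
    s"${info.numCachedPartitions}/${info.numPartitions} partitions cached")
}
{code}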

> Not fully cached when there is enough memory
> --------------------------------------------
>
>                 Key: SPARK-2121
>                 URL: https://issues.apache.org/jira/browse/SPARK-2121
>             Project: Spark
>          Issue Type: Bug
>          Components: Block Manager, MLlib, Spark Core
>    Affects Versions: 1.0.0
>            Reporter: Shuo Xiang
>
> While factorizing a large matrix using the latest Alternating Least Squares 
> (ALS) in MLlib, the Spark UI shows that Spark fails to cache all the 
> partitions of some RDDs even though memory is sufficient. Please see [this 
> post](http://apache-spark-user-list.1001560.n3.nabble.com/Not-fully-cached-when-there-is-enough-memory-tt7429.html)
> for screenshots. This may cause subsequent job failures while executing 
> `userOut.count()` or `productsOut.count()`.
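
For reference, a minimal ALS run of the shape described above, assuming an 
existing SparkContext `sc` (the ratings path, rank, iteration count, and 
lambda below are hypothetical):

{code}
import org.apache.spark.mllib.recommendation.{ALS, Rating}

// Hypothetical reproduction sketch: factorize a large ratings matrix with MLlib ALS.
val ratings = sc.textFile("hdfs:///path/to/ratings").map { line =>
  val Array(user, product, rating) = line.split(',')
  Rating(user.toInt, product.toInt, rating.toDouble)
}
val model = ALS.train(ratings, 50, 10, 0.01)  // rank, iterations, lambda

// The learned factor RDDs are where partial caching shows up; counting them
// forces evaluation and can trigger the failures described above.
println(model.userFeatures.count())
println(model.productFeatures.count())
{code}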


