How to avoid being killed by YARN node manager?

2015-03-24 Thread Yuichiro Sakamoto
…and Spark settings are as follows.

1) Six machines, each with 32 GB of physical memory.
2) Spark settings:
   - spark.executor.memory=16g
   - spark.closure.serializer=org.apache.spark.serializer.KryoSerializer
   - spark.rdd.compress=true
   - spark.shuffle.memoryFraction=0.4

Thanks,
Yuichiro Sakamoto
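For reference, a minimal sketch (Scala) of how the settings above could be applied through SparkConf; the application name is a placeholder, and the same values can equally go in spark-defaults.conf or on the spark-submit command line:

    import org.apache.spark.{SparkConf, SparkContext}

    // Sketch only: mirrors the settings quoted above.
    val conf = new SparkConf()
      .setAppName("ALSRecommender") // placeholder name
      .set("spark.executor.memory", "16g")
      .set("spark.closure.serializer", "org.apache.spark.serializer.KryoSerializer")
      .set("spark.rdd.compress", "true")
      .set("spark.shuffle.memoryFraction", "0.4")
    val sc = new SparkContext(conf)

On YARN, the NodeManager kills a container whose physical memory use exceeds its allocation, so the 16g executor heap plus the off-heap overhead (spark.yarn.executor.memoryOverhead in Spark 1.x) has to fit within the container request on each 32 GB node.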

Re: Can't cache RDD of collaborative filtering on MLlib

2015-03-12 Thread Yuichiro Sakamoto
I got an answer from a mail posted to the ML.

--- Summary ---
cache() is lazy, so you can call `RDD.count()` explicitly to load the RDD into memory.
---

I tried this, and with the two RDDs cached, the computation became faster. Thank you.
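As a minimal sketch of the fix, assuming `model` is the trained MatrixFactorizationModel from the original question quoted below:

    // cache() only marks the RDDs for caching; nothing is materialized
    // until an action runs. count() is such an action, so after these
    // calls the factor matrices are actually resident in memory.
    model.userFeatures.cache()
    model.productFeatures.cache()
    model.userFeatures.count()    // forces materialization
    model.productFeatures.count()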

Re: Can't cache RDD of collaborative filtering on MLlib

2015-03-10 Thread Yuichiro Sakamoto
…`userFeatures.getStorageLevel()`. I printed the return value of getStorageLevel() for userFeatures and productFeatures; both were "Memory Deserialized 1x Replicated". I think the two variables were configured to cache, but were not actually cached at that point (delayed?).

Thanks,
Yuichiro Sakamoto
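For anyone checking the same thing, a small sketch (again assuming `model` is the trained MatrixFactorizationModel). The "Memory Deserialized 1x Replicated" description corresponds to StorageLevel.MEMORY_ONLY, and it only records that caching was requested, not that the blocks are already resident:

    import org.apache.spark.storage.StorageLevel

    // Prints true if the RDDs are marked with MEMORY_ONLY, the level whose
    // description reads "Memory Deserialized 1x Replicated".
    println(model.userFeatures.getStorageLevel == StorageLevel.MEMORY_ONLY)
    println(model.productFeatures.getStorageLevel == StorageLevel.MEMORY_ONLY)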

Can't cache RDD of collaborative filtering on MLlib

2015-03-08 Thread Yuichiro Sakamoto
Hello. I created a collaborative filtering program using Spark, but I am having trouble with its computation speed. I want to implement a recommendation program using ALS (MLlib) that runs as a separate process from Spark. But access to the MatrixFactorizationModel object on HDFS is slow, so I want to cache it,
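As a rough sketch of the setup being described (Scala; the HDFS path, input format, and ALS hyperparameters are placeholders, not values from the original post):

    import org.apache.spark.mllib.recommendation.{ALS, MatrixFactorizationModel, Rating}

    // Hypothetical input: one "user,product,rating" triple per line.
    val ratings = sc.textFile("hdfs:///path/to/ratings")
      .map { line =>
        val Array(user, product, rating) = line.split(',')
        Rating(user.toInt, product.toInt, rating.toDouble)
      }

    // Placeholder hyperparameters.
    val model: MatrixFactorizationModel =
      ALS.train(ratings, /* rank */ 10, /* iterations */ 10, /* lambda */ 0.01)

    // Keep the factor matrices in memory and force materialization so the
    // serving side does not go back to HDFS (see the replies above).
    model.userFeatures.cache()
    model.productFeatures.cache()
    model.userFeatures.count()
    model.productFeatures.count()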