and the Spark settings are as follows.
1) Six machines, each with 32GB of physical memory.
2) Spark settings (a configuration sketch follows the list):
- spark.executor.memory=16g
- spark.closure.serializer=org.apache.spark.serializer.KryoSerializer
- spark.rdd.compress=true
- spark.shuffle.memoryFraction=0.4
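
For reference, a minimal sketch of how these settings could be applied
when constructing the context. The app name and master URL are
hypothetical placeholders, not part of the original setup:

    import org.apache.spark.{SparkConf, SparkContext}

    // Sketch only: app name and master URL are placeholders.
    val conf = new SparkConf()
      .setAppName("ALSRecommender")
      .setMaster("spark://master:7077")
      .set("spark.executor.memory", "16g")
      .set("spark.closure.serializer", "org.apache.spark.serializer.KryoSerializer")
      .set("spark.rdd.compress", "true")
      .set("spark.shuffle.memoryFraction", "0.4")
    val sc = new SparkContext(conf)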
Thanks,
Yuichiro Sakamoto
I got an answer to the mail I posted to the ML.
--- Summary ---
cache() is lazy, so you should call an action such as `RDD.count()`
explicitly to force the data to be loaded into memory.
---
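
To illustrate the suggestion, here is a minimal sketch of that pattern.
It reuses the `sc` context from the configuration sketch above; the
ratings data and ALS parameters are hypothetical toy values:

    import org.apache.spark.mllib.recommendation.{ALS, Rating}

    // Hypothetical toy input; in practice this would be the real ratings RDD.
    val ratings = sc.parallelize(Seq(
      Rating(1, 1, 5.0), Rating(1, 2, 3.0), Rating(2, 1, 4.0)))
    val model = ALS.train(ratings, /* rank = */ 10, /* iterations = */ 5)

    // cache() only marks the RDDs for caching; it does not load them.
    model.userFeatures.cache()
    model.productFeatures.cache()

    // count() is an action, so it forces the RDDs to be materialized in memory.
    model.userFeatures.count()
    model.productFeatures.count()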
I tried this; the two RDDs were then cached and the computation became faster.
Thank you.
I printed the return value of `getStorageLevel()` for `userFeatures` and
`productFeatures`; both were `Memory Deserialized 1x Replicated`.
I think the two variables were configured to be cached,
but had not actually been cached at that point. (Is caching delayed?)
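
For reference, a sketch of how that check might look, continuing with the
hypothetical `model` variable from the sketch above:

    // Print the storage level of both feature RDDs.
    // "Memory Deserialized 1x Replicated" is the description of
    // StorageLevel.MEMORY_ONLY, i.e. the default cache() level.
    println(model.userFeatures.getStorageLevel)
    println(model.productFeatures.getStorageLevel)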
Thanks,
Yuichiro Sakamoto
Hello.
I am writing a collaborative filtering program using Spark,
but I am having trouble with its calculation speed.
I want to implement a recommendation program using ALS (MLlib),
which runs as a separate process from Spark.
But the access speed of the MatrixFactorizationModel object on HDFS is slow,
so I want to cache it,