Hello, I am trying to gather performance figures for Spark versus various other frameworks for an ALS-based recommender system. I am using the MovieLens 20-million-ratings dataset. The test environment is a single large machine with 30 cores and 132 GB of memory. I am using the Scala version of the example provided here: http://spark.apache.org/docs/latest/mllib-collaborative-filtering.html
I am not a Spark expert, and I assume that varying n when invoking Spark with --master local[n] is supposed to provide ideal scaling. My initial observations did not favour Spark, though only by small margins; but, as I said, since I am not a Spark expert, I would comment only after being assured that this is the optimal way of running the ALS snippet. Could the experts please advise me on the best way to get optimal timings out of Spark's ALS example in the environment described above? Thanks. -- Best regards, Abhijith
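For reference, a launch command along the following lines is what I have in mind; the class name, jar name, and the numeric values are only guesses for a 30-core / 132 GB box, not tuned recommendations:

```shell
# Sketch only: class name, jar path, and all numbers below are assumptions.
# In local mode the entire job runs inside the driver JVM, so the driver
# should get most of the machine's memory; parallelism is set to roughly
# twice the core count to keep shuffle stages busy.
spark-submit \
  --master "local[30]" \
  --driver-memory 100g \
  --conf spark.default.parallelism=60 \
  --class MovieLensALS \
  movielens-als.jar ratings.dat
```

If something like this is not the right way to scale, I would also be curious whether the number of blocks passed to ALS (e.g. via ALS.train's blocks argument) matters more than local[n] here.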