Hi all,

I'm trying to build a recommendation system using Spark MLlib's ALS.
Currently, we're trying to pre-build recommendations for all users on a daily basis. We're using simple implicit feedback and ALS. The problem is that we have 20M users and 30M products, and to call the main predict() method we need the cartesian join of users and products, which is far too large (roughly 600 trillion pairs) and may take days to generate on its own. Is there a way to avoid the cartesian join and make the process faster? Currently we have 8 nodes with 64 GB of RAM each; I think that should be enough for the data.

    val users: RDD[Int] = ???      // RDD with 20M userIds
    val products: RDD[Int] = ???   // RDD with 30M productIds
    val ratings: RDD[Rating] = ??? // RDD with all user->product feedback

    val model = new ALS()
      .setRank(10)
      .setIterations(10)
      .setLambda(0.0001)
      .setImplicitPrefs(true)
      .setAlpha(40)
      .run(ratings)

    val usersProducts = users.cartesian(products)
    val recommendations = model.predict(usersProducts)

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Apache-Spark-ALS-recommendations-approach-tp22116.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
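For context on why the cartesian join is avoidable at all: a user's predicted score for every product is just a dot product between that user's factor vector and each product's factor vector (both exposed by the trained model as userFeatures and productFeatures), so top-N per user can be computed block-by-block, keeping only the N best candidates, without ever materializing all 20M x 30M pairs. Below is a minimal, Spark-free sketch of that per-user scoring step; the object name, sizes, and factor values are illustrative only, not from the original post. (If I understand correctly, newer MLlib versions do a blocked version of this internally in MatrixFactorizationModel.recommendProductsForUsers.)

```scala
// Sketch of per-user top-N scoring against a block of product factors.
// This is the inner step of a blocked alternative to the full cartesian join:
// for each user, score one block of products at a time and keep only the N best.
// All names and numbers here are illustrative assumptions.
object TopNSketch {
  // Dot product of two factor vectors of equal length (the ALS "rank").
  def dot(a: Array[Double], b: Array[Double]): Double =
    a.zip(b).map { case (x, y) => x * y }.sum

  // Top-N (productId, score) pairs for one user factor against a product block.
  def topN(userFactor: Array[Double],
           productBlock: Seq[(Int, Array[Double])],
           n: Int): Seq[(Int, Double)] =
    productBlock
      .map { case (pid, pf) => (pid, dot(userFactor, pf)) }
      .sortBy(-_._2)
      .take(n)

  def main(args: Array[String]): Unit = {
    val userFactor = Array(1.0, 0.5) // rank-2 example factor
    val productBlock = Seq(
      1 -> Array(0.9, 0.1),
      2 -> Array(0.1, 0.9),
      3 -> Array(1.0, 1.0))
    // Highest-scoring product ids first.
    println(topN(userFactor, productBlock, 2))
  }
}
```

In a real Spark job the product factors would be grouped into blocks and joined (or broadcast, if a block fits in memory) against user factor partitions, with the per-user top-N lists merged across blocks.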