So model.recommendProducts can only be called from the master, then? I am running the test on 20% of the users, and those users are in an RDD. If I have to collect them all to the master node and then call model.recommendProducts, that's an issue...
Any idea how to optimize this so that we can calculate MAP statistics on large samples of data?

On Thu, Nov 6, 2014 at 4:41 PM, Xiangrui Meng <men...@gmail.com> wrote:

> ALS model contains RDDs. So you cannot put `model.recommendProducts`
> inside a RDD closure `userProductsRDD.map`. -Xiangrui
>
> On Thu, Nov 6, 2014 at 4:39 PM, Debasish Das <debasish.da...@gmail.com> wrote:
> > I reproduced the problem in mllib tests ALSSuite.scala using the following
> > functions:
> >
> >     val arrayPredict = userProductsRDD.map{case(user,product) =>
> >       val recommendedProducts = model.recommendProducts(user, products)
> >       val productScore = recommendedProducts.find{x=>x.product == product}
> >       require(productScore != None)
> >       productScore.get
> >     }.collect
> >
> >     arrayPredict.foreach { elem =>
> >       if (allRatings.get(elem.user, elem.product) != elem.rating)
> >         fail("Prediction APIs don't match")
> >     }
> >
> > If the usage of model.recommendProducts is correct, the test fails with the
> > same error I sent before...
> >
> > org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in
> > stage 316.0 failed 1 times, most recent failure: Lost task 0.0 in stage
> > 316.0 (TID 79, localhost): scala.MatchError: null
> >
> > org.apache.spark.rdd.PairRDDFunctions.lookup(PairRDDFunctions.scala:825)
> > org.apache.spark.mllib.recommendation.MatrixFactorizationModel.recommendProducts(MatrixFactorizationModel.scala:81)
> >
> > It is a blocker for me and I am debugging it. I will open up a JIRA if this
> > is indeed a bug...
> >
> > Do I have to cache the models to make userFeatures.lookup(user).head to work ?
> >
> > On Mon, Nov 3, 2014 at 9:24 PM, Xiangrui Meng <men...@gmail.com> wrote:
> >>
> >> Was "user" presented in training? We can put a check there and return
> >> NaN if the user is not included in the model.
> >> -Xiangrui
> >>
> >> On Mon, Nov 3, 2014 at 5:25 PM, Debasish Das <debasish.da...@gmail.com> wrote:
> >> > Hi,
> >> >
> >> > I am testing MatrixFactorizationModel.predict(user: Int, product: Int) but
> >> > the code fails on userFeatures.lookup(user).head
> >> >
> >> > In computeRmse MatrixFactorizationModel.predict(RDD[(Int, Int)]) has been
> >> > called and in all the test-cases that API has been used...
> >> >
> >> > I can perhaps refactor my code to do the same but I was wondering whether
> >> > people test the lookup(user) version of the code..
> >> >
> >> > Do I need to cache the model to make it work ? I think right now default is
> >> > STORAGE_AND_DISK...
> >> >
> >> > Thanks.
> >> > Deb
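
A minimal sketch of one possible way to get top-k recommendations for a large set of test users without calling model.recommendProducts inside an RDD closure: broadcast the product factors and join the test users against the user factors. The names TopKSketch and recommendForUsers are hypothetical (they are not part of MLlib), and the userFeatures and productFeatures arguments are meant to be model.userFeatures and model.productFeatures. It assumes the product factors are small enough to collect on the driver and broadcast.

```scala
import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._ // pair-RDD implicits on Spark 1.x
import org.apache.spark.rdd.RDD
import org.apache.spark.mllib.recommendation.Rating

// Hypothetical helper object, not part of MLlib.
object TopKSketch {

  /** Dot product of two factor vectors of equal length. */
  def dot(a: Array[Double], b: Array[Double]): Double = {
    var s = 0.0
    var i = 0
    while (i < a.length) { s += a(i) * b(i); i += 1 }
    s
  }

  /**
   * Top-k products for each test user, computed on the executors.
   * Assumption: productFeatures fits in memory once collected.
   */
  def recommendForUsers(
      sc: SparkContext,
      userFeatures: RDD[(Int, Array[Double])],
      productFeatures: RDD[(Int, Array[Double])],
      testUsers: RDD[Int],
      k: Int): RDD[(Int, Array[Rating])] = {
    // Collect the product factors once and ship them to every executor.
    val products = sc.broadcast(productFeatures.collect())
    // Inner join: only test users that have factors in the model survive.
    userFeatures.join(testUsers.distinct().map(u => (u, ()))).map {
      case (user, (uf, _)) =>
        val top = products.value
          .map { case (p, pf) => Rating(user, p, dot(uf, pf)) }
          .sortBy(-_.rating) // highest predicted score first
          .take(k)
        (user, top)
    }
  }
}
```

Called as TopKSketch.recommendForUsers(sc, model.userFeatures, model.productFeatures, testUsers, k), the result stays distributed, so MAP could be computed by joining it with the held-out ratings instead of collecting users to the master. Test users that have no factors in the model are dropped by the inner join.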