I reproduced the problem in the MLlib test suite ALSSuite.scala using the following code:

val arrayPredict = userProductsRDD.map { case (user, product) =>
  val recommendedProducts = model.recommendProducts(user, products)
  val productScore = recommendedProducts.find { x => x.product == product }
  require(productScore != None)
  productScore.get
}.collect

arrayPredict.foreach { elem =>
  if (allRatings.get(elem.user, elem.product) != elem.rating)
    fail("Prediction APIs don't match")
}

If the usage of model.recommendProducts is correct, the test fails with the same error I sent before...

org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 316.0 failed 1 times, most recent failure: Lost task 0.0 in stage 316.0 (TID 79, localhost): scala.MatchError: null
org.apache.spark.rdd.PairRDDFunctions.lookup(PairRDDFunctions.scala:825)
org.apache.spark.mllib.recommendation.MatrixFactorizationModel.recommendProducts(MatrixFactorizationModel.scala:81)

It is a blocker for me and I am debugging it. I will open a JIRA if this is indeed a bug...

Do I have to cache the model for userFeatures.lookup(user).head to work?

On Mon, Nov 3, 2014 at 9:24 PM, Xiangrui Meng <men...@gmail.com> wrote:

> Was "user" presented in training? We can put a check there and return
> NaN if the user is not included in the model. -Xiangrui
>
> On Mon, Nov 3, 2014 at 5:25 PM, Debasish Das <debasish.da...@gmail.com> wrote:
> > Hi,
> >
> > I am testing MatrixFactorizationModel.predict(user: Int, product: Int) but
> > the code fails on userFeatures.lookup(user).head
> >
> > In computeRmse MatrixFactorizationModel.predict(RDD[(Int, Int)]) has been
> > called and in all the test-cases that API has been used...
> >
> > I can perhaps refactor my code to do the same but I was wondering whether
> > people test the lookup(user) version of the code..
> >
> > Do I need to cache the model to make it work ? I think right now default is
> > STORAGE_AND_DISK...
> >
> > Thanks.
> > Deb
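
For reference, the check suggested in the quoted reply (return NaN when the user is not in the model) could look roughly like the sketch below. This is not MLlib code: the object name SafePredictSketch is hypothetical, and plain Scala Maps stand in for the userFeatures and productFeatures RDDs, so it only illustrates the idea of replacing lookup(user).head with a lookup that can fail gracefully.

```scala
// Hypothetical sketch: Maps stand in for the model's feature RDDs.
object SafePredictSketch {

  // Returns the dot product of the two factor vectors, or NaN when either
  // the user or the product has no factors (i.e. was not seen in training).
  def predict(
      userFeatures: Map[Int, Array[Double]],
      productFeatures: Map[Int, Array[Double]],
      user: Int,
      product: Int): Double =
    (userFeatures.get(user), productFeatures.get(product)) match {
      case (Some(u), Some(p)) => u.zip(p).map { case (a, b) => a * b }.sum
      case _                  => Double.NaN
    }

  def main(args: Array[String]): Unit = {
    val userFeatures = Map(1 -> Array(1.0, 2.0))
    val productFeatures = Map(10 -> Array(3.0, 4.0))

    println(predict(userFeatures, productFeatures, 1, 10)) // known user and product
    println(predict(userFeatures, productFeatures, 2, 10)) // unknown user
  }
}
```

With a check of this kind, a missing user yields NaN instead of an exception from calling head on an empty result.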