[ https://issues.apache.org/jira/browse/SPARK-13857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15822021#comment-15822021 ]
Danilo Ascione commented on SPARK-13857:
----------------------------------------

I have a pipeline similar to [~abudd2014]'s. I have implemented a DataFrame-API-based RankingEvaluator that takes care of getting the top K recommendations at the evaluation phase of the pipeline, and it can be used in a model selection pipeline (cross-validation).

Sample usage code:

{code}
import org.apache.spark.ml.recommendation.ALS
import org.apache.spark.ml.tuning.{CrossValidator, ParamGridBuilder}
// RankingEvaluator is the implementation from the commit linked below.

// Input dataframe: (userId, itemId, clicked)
val als = new ALS()
  .setUserCol("userId")
  .setItemCol("itemId")
  .setRatingCol("clicked")
  .setImplicitPrefs(true)

val paramGrid = new ParamGridBuilder()
  .addGrid(als.regParam, Array(0.01, 0.1))
  .addGrid(als.alpha, Array(40.0, 1.0))
  .build()

val evaluator = new RankingEvaluator()
  .setMetricName("mpr") // Mean Percentile Rank
  .setLabelCol("itemId")
  .setPredictionCol("prediction")
  .setQueryCol("userId")
  .setK(5) // Top K

val cv = new CrossValidator()
  .setEstimator(als)
  .setEvaluator(evaluator)
  .setEstimatorParamMaps(paramGrid)
  .setNumFolds(3)

val crossValidatorModel = cv.fit(inputDF)

// Average metric per ParamGrid entry
val avgMetricsParamGrid = crossValidatorModel.avgMetrics

// Combine with paramGrid to see how each parameter combination affects the overall metric
val combined = paramGrid.zip(avgMetricsParamGrid)
{code}

The resulting "bestModel" from the cross-validation model is then used to generate the top K recommendations in batches.

The RankingEvaluator code is here: [https://github.com/daniloascione/spark/commit/c93ab86d35984e9f70a3b4f543fb88f5541333f0]

I would appreciate any feedback. Thanks!
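For readers unfamiliar with the "mpr" metric, here is a minimal plain-Scala sketch of how a Mean Percentile Rank could be computed, assuming the usual definition for implicit feedback: the feedback-weighted average of each item's percentile position in the user's ranked list, where lower is better. This is not the code from the linked RankingEvaluator commit. The object name, the method names, and the choice to treat items missing from a user's list as ranked last are all assumptions made for illustration.

```scala
object MeanPercentileRankSketch {

  /** Percentile position of each item in a ranked list: 0.0 for the first item, 1.0 for the last. */
  def percentileRanks[A](ranked: Seq[A]): Map[A, Double] = {
    // Guard against single-item lists so we never divide by zero.
    val denom = math.max(ranked.size - 1, 1).toDouble
    ranked.zipWithIndex.map { case (item, idx) => item -> idx / denom }.toMap
  }

  /**
   * MPR = sum(r_ui * rank_ui) / sum(r_ui)
   *
   * @param recommendations user -> items ordered from most to least recommended
   * @param feedback        (user, item, weight) triples, e.g. weight = 1.0 for a click
   */
  def meanPercentileRank[U, I](
      recommendations: Map[U, Seq[I]],
      feedback: Seq[(U, I, Double)]): Double = {
    val ranksByUser = recommendations.map { case (u, items) => u -> percentileRanks(items) }
    val weighted = feedback.map { case (u, i, r) =>
      // Assumption: an item absent from the user's list counts as ranked last (1.0).
      val rank = ranksByUser.get(u).flatMap(_.get(i)).getOrElse(1.0)
      (r * rank, r)
    }
    val totalWeight = weighted.map(_._2).sum
    if (totalWeight == 0.0) Double.NaN else weighted.map(_._1).sum / totalWeight
  }

  def main(args: Array[String]): Unit = {
    val recs = Map("u1" -> Seq("a", "b", "c"), "u2" -> Seq("c", "a", "b"))
    val clicks = Seq(("u1", "a", 1.0), ("u2", "b", 1.0))
    // u1 clicked its top item (rank 0.0), u2 clicked its last item (rank 1.0).
    println(meanPercentileRank(recs, clicks))
  }
}
```

Under this definition a value near 0.0 means clicked items tend to sit at the top of the recommendation lists, and a value around 0.5 is what a random ordering would be expected to give. Because smaller is better, an evaluator built on it would need to tell CrossValidator not to maximise the metric.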
> Feature parity for ALS ML with MLLIB
> ------------------------------------
>
>                 Key: SPARK-13857
>                 URL: https://issues.apache.org/jira/browse/SPARK-13857
>             Project: Spark
>          Issue Type: Sub-task
>          Components: ML
>            Reporter: Nick Pentreath
>            Assignee: Nick Pentreath
>
> Currently {{mllib.recommendation.MatrixFactorizationModel}} has methods
> {{recommendProducts/recommendUsers}} for recommending top K to a given user /
> item, as well as {{recommendProductsForUsers/recommendUsersForProducts}} to
> recommend top K across all users/items.
> Additionally, SPARK-10802 is for adding the ability to do
> {{recommendProductsForUsers}} for a subset of users (or vice versa).
> Look at exposing or porting (as appropriate) these methods to ALS in ML.
> Investigate if efficiency can be improved at the same time (see SPARK-11968).