[ https://issues.apache.org/jira/browse/SPARK-13857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15822021#comment-15822021 ]

Danilo Ascione commented on SPARK-13857:
----------------------------------------

I have a pipeline similar to [~abudd2014]'s. I have implemented a RankingEvaluator based on the
DataFrame API that extracts the top K recommendations per query during the evaluation phase
of the pipeline, so it can be used in a model-selection pipeline (cross-validation).
Sample usage code:
{code}
import org.apache.spark.ml.recommendation.ALS
import org.apache.spark.ml.tuning.{CrossValidator, ParamGridBuilder}
// RankingEvaluator is not part of Spark; it comes from the commit linked below.

// inputDF: DataFrame with columns (userId, itemId, clicked)
val als = new ALS()
  .setUserCol("userId")
  .setItemCol("itemId")
  .setRatingCol("clicked")
  .setImplicitPrefs(true)

val paramGrid = new ParamGridBuilder()
  .addGrid(als.regParam, Array(0.01, 0.1))
  .addGrid(als.alpha, Array(40.0, 1.0))
  .build()

val evaluator = new RankingEvaluator()
  .setMetricName("mpr") // Mean Percentile Rank
  .setLabelCol("itemId")
  .setPredictionCol("prediction")
  .setQueryCol("userId")
  .setK(5) // top K

val cv = new CrossValidator()
  .setEstimator(als)
  .setEvaluator(evaluator)
  .setEstimatorParamMaps(paramGrid)
  .setNumFolds(3)

val crossValidatorModel = cv.fit(inputDF)

// Average metric per ParamGrid entry (one value per ParamMap, averaged over the folds)
val avgMetricsParamGrid = crossValidatorModel.avgMetrics

// Pair each ParamMap with its average metric to see how the parameters affect it
val combined = paramGrid.zip(avgMetricsParamGrid)
combined.foreach { case (params, metric) => println(s"$metric\t$params") }
{code}
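For reference, here is a minimal plain-Scala sketch of the "mpr" metric, assuming the implicit-feedback definition from Hu, Koren and Volinsky's ALS paper (the weighted average of each held-out item's percentile position in the user's ranked list, 0.0 = top) and predictions shaped like ALS output (userId, itemId, prediction). It is not the code from the linked commit: it ranks each user's full candidate list instead of only the top K, and the names {{Scored}}, {{percentileRanks}} and {{meanPercentileRank}} are hypothetical.

```scala
// Hypothetical sketch of mean percentile rank (MPR); not the linked RankingEvaluator.
object MeanPercentileRankSketch {

  // One scored (user, item) pair, mirroring the (userId, itemId, prediction) columns.
  final case class Scored(userId: Int, itemId: Int, prediction: Double)

  // Percentile position of every item within its user's list, sorted by descending
  // prediction: 0.0 for the top item, 1.0 for the last (single-item lists get 0.0).
  def percentileRanks(scored: Seq[Scored]): Map[(Int, Int), Double] =
    scored.groupBy(_.userId).toSeq.flatMap { case (user, rows) =>
      val ranked = rows.sortBy(r => -r.prediction)
      val denom = math.max(ranked.size - 1, 1).toDouble
      ranked.zipWithIndex.map { case (r, i) => (user, r.itemId) -> i / denom }
    }.toMap

  // MPR = sum(r_ui * rank_ui) / sum(r_ui) over observed (held-out) pairs.
  // Observed pairs with no prediction are skipped. Lower is better.
  def meanPercentileRank(scored: Seq[Scored], observed: Map[(Int, Int), Double]): Double = {
    val ranks = percentileRanks(scored)
    val terms = observed.toSeq.flatMap { case (key, r) =>
      ranks.get(key).map(rank => (r * rank, r))
    }
    val num = terms.map(_._1).sum
    val den = terms.map(_._2).sum
    if (den == 0.0) Double.NaN else num / den
  }

  def main(args: Array[String]): Unit = {
    val scored = Seq(
      Scored(1, 10, 0.9), Scored(1, 11, 0.5), Scored(1, 12, 0.1),
      Scored(2, 10, 0.2), Scored(2, 11, 0.8), Scored(2, 12, 0.4))
    // Held-out clicks: user 1 clicked item 10, user 2 clicked item 12.
    val observed = Map((1, 10) -> 1.0, (2, 12) -> 1.0)
    println(meanPercentileRank(scored, observed)) // prints 0.25
  }
}
```

Since lower MPR is better (a random ranking scores about 0.5), an evaluator that reports it to {{CrossValidator}} has to override {{isLargerBetter}} to return false; otherwise the worst ParamMap would be selected as {{bestModel}}.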

The {{bestModel}} of the resulting CrossValidatorModel is then used to generate
the top K recommendations in batches.
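A plain-Scala sketch of what batched top-K generation can look like, assuming the user and item factor vectors have already been pulled into local collections (for example from {{ALSModel.userFactors}} / {{itemFactors}} and converted to {{Array[Double]}}). Each user is scored against every item with a dot product, the same scoring that {{recommendProductsForUsers}} in the issue description relies on. The names {{recommendAll}} and {{batchSize}} are hypothetical, and this is not the author's batch job.

```scala
// Hypothetical sketch of batched top-K recommendation from ALS-style factors.
object BatchedTopKSketch {

  // Dot product of two equal-length factor vectors (the ALS predicted score).
  def dot(a: Array[Double], b: Array[Double]): Double = {
    var s = 0.0
    var i = 0
    while (i < a.length) { s += a(i) * b(i); i += 1 }
    s
  }

  // Top-k (itemId, score) pairs for one user, highest score first.
  def topK(userVec: Array[Double], items: Seq[(Int, Array[Double])], k: Int): Seq[(Int, Double)] =
    items.map { case (id, v) => (id, dot(userVec, v)) }.sortBy(-_._2).take(k)

  // Processes users in fixed-size batches; each batch is scored independently,
  // which is what would let a distributed job split the work.
  def recommendAll(
      users: Seq[(Int, Array[Double])],
      items: Seq[(Int, Array[Double])],
      k: Int,
      batchSize: Int): Map[Int, Seq[(Int, Double)]] =
    users.grouped(batchSize).flatMap { batch =>
      batch.map { case (uid, uVec) => uid -> topK(uVec, items, k) }
    }.toMap

  def main(args: Array[String]): Unit = {
    val users = Seq(1 -> Array(1.0, 0.0), 2 -> Array(0.0, 1.0), 3 -> Array(0.0, 2.0))
    val items = Seq(10 -> Array(0.9, 0.1), 11 -> Array(0.1, 0.9), 12 -> Array(0.5, 0.5))
    recommendAll(users, items, k = 2, batchSize = 2).toSeq.sortBy(_._1).foreach {
      case (uid, recs) => println(s"user $uid -> ${recs.map(_._1).mkString(", ")}")
    }
  }
}
```

Sorting every item per user is the simplest choice; with large catalogs a bounded priority queue of size k would avoid the full sort.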

The RankingEvaluator code is here:
[https://github.com/daniloascione/spark/commit/c93ab86d35984e9f70a3b4f543fb88f5541333f0]

I would appreciate any feedback. Thanks!


> Feature parity for ALS ML with MLLIB
> ------------------------------------
>
>                 Key: SPARK-13857
>                 URL: https://issues.apache.org/jira/browse/SPARK-13857
>             Project: Spark
>          Issue Type: Sub-task
>          Components: ML
>            Reporter: Nick Pentreath
>            Assignee: Nick Pentreath
>
> Currently {{mllib.recommendation.MatrixFactorizationModel}} has methods 
> {{recommendProducts/recommendUsers}} for recommending top K to a given user / 
> item, as well as {{recommendProductsForUsers/recommendUsersForProducts}} to 
> recommend top K across all users/items.
> Additionally, SPARK-10802 is for adding the ability to do 
> {{recommendProductsForUsers}} for a subset of users (or vice versa).
> Look at exposing or porting (as appropriate) these methods to ALS in ML. 
> Investigate if efficiency can be improved at the same time (see SPARK-11968).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
