[ https://issues.apache.org/jira/browse/SPARK-14409?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15880324#comment-15880324 ]
Nick Pentreath commented on SPARK-14409:
----------------------------------------

[~roberto.mirizzi] If using the current {{ALS.transform}} output as input to the {{RankingEvaluator}}, as envisaged here, the model will predict a score for each {{user-item}} pair in the evaluation set. For each user, the ground truth is exactly this distinct set of items. By definition the top-k items ranked by predicted score will be in the ground truth set, since {{ALS}} is only scoring {{user-item}} pairs *that already exist in the evaluation set*. So how is it possible *not* to get a perfect score, since all top-k recommended items will be "relevant"? Unless you are also cutting off the ground truth set at {{k}} - in which case that does not sound like a correct computation to me.

By contrast, if {{ALS.transform}} output a set of top-k items for each user, where the items are scored from *the entire set of possible candidate items*, then computing the ranking metric of that top-k set against the actual ground truth for each user is correct.

> Investigate adding a RankingEvaluator to ML
> -------------------------------------------
>
>                 Key: SPARK-14409
>                 URL: https://issues.apache.org/jira/browse/SPARK-14409
>             Project: Spark
>          Issue Type: New Feature
>          Components: ML
>            Reporter: Nick Pentreath
>            Priority: Minor
>
> {{mllib.evaluation}} contains a {{RankingMetrics}} class, while there is no
> {{RankingEvaluator}} in {{ml.evaluation}}. Such an evaluator can be useful
> for recommendation evaluation (and can be useful in other settings
> potentially).
> Should be thought about in conjunction with adding the "recommendAll" methods
> in SPARK-13857, so that top-k ranking metrics can be used in cross-validators.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
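The degenerate case described in the comment can be sketched in a few lines. This is plain Python with made-up item IDs and scores, not the Spark API: when the candidate set equals the ground-truth set, precision@k is perfect by construction, whereas ranking over the full catalogue makes the metric informative.

```python
# Hypothetical illustration (plain Python, not Spark) of the degenerate case:
# precision@k is trivially 1.0 whenever the scored candidates are exactly the
# ground-truth items, as happens when ALS only scores existing eval pairs.

def precision_at_k(ranked_items, ground_truth, k):
    """Fraction of the top-k ranked items that appear in the ground truth."""
    top_k = ranked_items[:k]
    hits = sum(1 for item in top_k if item in ground_truth)
    return hits / k

# One user's evaluation set: the only user-item pairs being scored.
eval_scores = {"a": 0.9, "b": 0.1, "c": 0.5}   # item -> predicted score
ground_truth = set(eval_scores)                 # ground truth IS this set

ranked = sorted(eval_scores, key=eval_scores.get, reverse=True)
print(precision_at_k(ranked, ground_truth, k=2))      # always 1.0

# By contrast, scoring the entire candidate catalogue lets non-relevant
# items ("d", "e") outrank relevant ones, so the metric can drop below 1.
all_item_scores = {"a": 0.9, "b": 0.1, "c": 0.5, "d": 0.95, "e": 0.8}
ranked_all = sorted(all_item_scores, key=all_item_scores.get, reverse=True)
print(precision_at_k(ranked_all, ground_truth, k=2))  # 0.5 in this example
```

With these invented scores the top-2 over the full catalogue is ("d", "a"), of which only "a" is relevant, giving 0.5; restricted to the evaluation set the top-2 is ("a", "c"), both relevant by definition, giving 1.0 regardless of the scores.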