[ https://issues.apache.org/jira/browse/SPARK-14409?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15898855#comment-15898855 ]
Nick Pentreath commented on SPARK-14409:
----------------------------------------

[~josephkb] the proposed input schema above encompasses that: the {{labelCol}} is the true relevance score (rating, confidence, etc.), while the {{predictionCol}} is the predicted relevance (rating, confidence, etc.). Note we can name these columns something more specific ({{labelCol}} and {{predictionCol}} are really just re-used from the other evaluators). This also allows "weighted" forms of ranking metrics later (e.g. some metrics can incorporate the true relevance score into the computation, which serves as a form of weighting of the metric); the metrics we currently have don't do that.

So for now the true relevance can serve as a filter. For example, when computing the ranking metric for recommendation, we *don't* want to include negative ratings in the "ground truth set of relevant documents"; hence the {{goodThreshold}} param above (I would rather call it something like {{relevanceThreshold}} myself).

*Note* that there are 2 formats I detail in my comment above. The first is the actual schema of the {{DataFrame}} used as input to the {{RankingEvaluator}}; this must therefore be the schema of the DF output by {{model.transform}} (whether that is ALS for recommendation, a logistic regression for ad prediction, or whatever). The second format simply illustrates the *intermediate internal transformation* that the evaluator will do in the {{evaluate}} method. You can see a rough draft of it in Danilo's PR [here|https://github.com/apache/spark/pull/16618/files#diff-0345c4cb1878d3bb0d84297202fdc95fR93].

> Investigate adding a RankingEvaluator to ML
> -------------------------------------------
>
>                 Key: SPARK-14409
>                 URL: https://issues.apache.org/jira/browse/SPARK-14409
>             Project: Spark
>          Issue Type: New Feature
>          Components: ML
>            Reporter: Nick Pentreath
>            Priority: Minor
>
> {{mllib.evaluation}} contains a {{RankingMetrics}} class, while there is no
> {{RankingEvaluator}} in {{ml.evaluation}}. Such an evaluator can be useful
> for recommendation evaluation (and potentially in other settings too).
> Should be thought about in conjunction with adding the "recommendAll" methods
> in SPARK-13857, so that top-k ranking metrics can be used in cross-validators.
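To make the second format concrete, below is a rough Scala sketch (not the actual PR code) of the intermediate transformation the evaluator might perform inside {{evaluate}}. The column names ({{user}}, {{item}}, {{label}}, {{prediction}}), the {{relevanceThreshold}} parameter, and the assumption of integer item IDs are all illustrative, not a settled API:

{code:scala}
import org.apache.spark.mllib.evaluation.RankingMetrics
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions._

// Hypothetical sketch only: column names, the relevanceThreshold param,
// and Int item IDs are assumptions for illustration, not the final API.
def meanAveragePrecisionSketch(
    predictions: DataFrame,
    queryCol: String = "user",
    itemCol: String = "item",
    labelCol: String = "label",
    predictionCol: String = "prediction",
    relevanceThreshold: Double = 0.0): Double = {

  // Ground-truth relevant items per query: keep only items whose true
  // relevance passes the threshold (e.g. drop negative ratings).
  val relevant = predictions
    .filter(col(labelCol) > relevanceThreshold)
    .groupBy(queryCol)
    .agg(collect_list(col(itemCol)).as("relevant"))

  // Predicted ranking per query: items sorted by predicted relevance,
  // highest first (sort_array on the struct orders by its first field).
  val ranked = predictions
    .groupBy(queryCol)
    .agg(sort_array(collect_list(struct(col(predictionCol), col(itemCol))),
      asc = false).as("scored"))
    .select(col(queryCol), col(s"scored.$itemCol").as("predicted"))

  // Pair (predicted ranking, ground truth) per query and delegate to the
  // existing mllib RankingMetrics. Note the inner join drops queries with
  // no relevant items; the real evaluator would need a policy for those.
  val pairs = ranked.join(relevant, queryCol).rdd.map { row =>
    (row.getAs[Seq[Int]]("predicted").toArray,
     row.getAs[Seq[Int]]("relevant").toArray)
  }
  new RankingMetrics(pairs).meanAveragePrecision
}
{code}

The design point is simply that once the per-query (predicted ranking, ground truth) pairs are assembled, the new {{ml}} evaluator can delegate the metric computation to the existing {{mllib.evaluation.RankingMetrics}}.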