[ https://issues.apache.org/jira/browse/SPARK-14409?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15898855#comment-15898855 ]
Nick Pentreath commented on SPARK-14409:
----------------------------------------

[~josephkb] the proposed input schema above encompasses that: the {{labelCol}} is the true relevance score (rating, confidence, etc.), while the {{predictionCol}} is the predicted relevance (rating, confidence, etc.). Note we can name these columns something more specific ({{labelCol}} and {{predictionCol}} are really just re-used from the other evaluators). This also allows "weighted" forms of ranking metrics later (e.g. some metrics can incorporate the true relevance score into the computation, which serves as a form of weighting of the metric); the metrics we currently have don't do that.

So for now the true relevance can serve as a filter. For example, when computing the ranking metric for recommendation, we *don't* want to include negative ratings in the "ground truth set of relevant documents"; hence the {{goodThreshold}} param above (I would rather call it something like {{relevanceThreshold}} myself).

*Note* that there are 2 formats I detail in my comment above. The first is the actual schema of the {{DataFrame}} used as input to the {{RankingEvaluator}}; this must therefore be the schema of the DF output by {{model.transform}} (whether that is ALS for recommendation, a logistic regression for ad prediction, or whatever). The second format simply illustrates the *intermediate internal transformation* that the evaluator will do in the {{evaluate}} method. You can see a rough draft of it in Danilo's PR [here|https://github.com/apache/spark/pull/16618/files#diff-0345c4cb1878d3bb0d84297202fdc95fR93].

> Investigate adding a RankingEvaluator to ML
> -------------------------------------------
>
>                 Key: SPARK-14409
>                 URL: https://issues.apache.org/jira/browse/SPARK-14409
>             Project: Spark
>          Issue Type: New Feature
>          Components: ML
>            Reporter: Nick Pentreath
>            Priority: Minor
>
> {{mllib.evaluation}} contains a {{RankingMetrics}} class, while there is no
> {{RankingEvaluator}} in {{ml.evaluation}}. Such an evaluator can be useful
> for recommendation evaluation (and potentially in other settings too).
> Should be thought about in conjunction with adding the "recommendAll" methods
> in SPARK-13857, so that top-k ranking metrics can be used in cross-validators.
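To make the second format concrete, below is a rough Scala sketch (not the actual PR code) of the intermediate transformation the evaluator might perform inside {{evaluate}}. The column names ({{user}}, {{item}}, {{label}}, {{prediction}}), the {{relevanceThreshold}} parameter, and the assumption of integer item IDs are all illustrative, not a settled API:

{code:scala}
import org.apache.spark.mllib.evaluation.RankingMetrics
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions._

// Hypothetical sketch only: column names, the relevanceThreshold param,
// and Int item IDs are assumptions for illustration, not the final API.
def meanAveragePrecisionSketch(
    predictions: DataFrame,
    queryCol: String = "user",
    itemCol: String = "item",
    labelCol: String = "label",
    predictionCol: String = "prediction",
    relevanceThreshold: Double = 0.0): Double = {

  // Ground-truth relevant items per query: keep only items whose true
  // relevance passes the threshold (e.g. drop negative ratings).
  val relevant = predictions
    .filter(col(labelCol) > relevanceThreshold)
    .groupBy(queryCol)
    .agg(collect_list(col(itemCol)).as("relevant"))

  // Predicted ranking per query: items sorted by predicted relevance,
  // highest first (sort_array on the struct orders by its first field).
  val ranked = predictions
    .groupBy(queryCol)
    .agg(sort_array(collect_list(struct(col(predictionCol), col(itemCol))),
      asc = false).as("scored"))
    .select(col(queryCol), col(s"scored.$itemCol").as("predicted"))

  // Pair (predicted ranking, ground truth) per query and delegate to the
  // existing mllib RankingMetrics. Note the inner join drops queries with
  // no relevant items; the real evaluator would need a policy for those.
  val pairs = ranked.join(relevant, queryCol).rdd.map { row =>
    (row.getAs[Seq[Int]]("predicted").toArray,
     row.getAs[Seq[Int]]("relevant").toArray)
  }
  new RankingMetrics(pairs).meanAveragePrecision
}
{code}

The design point is simply that once the per-query (predicted ranking, ground truth) pairs are assembled, the new {{ml}} evaluator can delegate the metric computation to the existing {{mllib.evaluation.RankingMetrics}}.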