[ 
https://issues.apache.org/jira/browse/SPARK-14409?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15826901#comment-15826901
 ] 

Roberto Mirizzi edited comment on SPARK-14409 at 1/17/17 9:55 PM:
------------------------------------------------------------------

[~srowen] I've updated the code above to generalize K. 
I've also added a couple of lines to deal with NaN (it probably could be 
further generalized, but it's a good start).

In the code I propose I simply re-use the class 
*org.apache.spark.mllib.evaluation.RankingMetrics* already available in Spark 
since 1.2.0. The class only offers *p@k*, *ndcg@k* and *map* (as you can also 
see here: 
https://spark.apache.org/docs/2.1.0/mllib-evaluation-metrics.html#ranking-systems).
That's why they are the only ones available in my implementation.
AUC and ROC are available under *BinaryClassificationMetrics*. I haven't wrapped them yet,
but I could do that later as well by creating a *BinaryClassificationEvaluator* as
I've done for *RankingEvaluator*. 
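
For reference, here is a minimal sketch of how the underlying *RankingMetrics* class exposes those three metrics (assuming an existing SparkContext {{sc}}; the item ID arrays are just made-up examples, not data from my patch):

{code:scala}
import org.apache.spark.mllib.evaluation.RankingMetrics

// Each element pairs the ranked list of recommended item IDs with the
// set of items the user actually interacted with in the test set.
val predictionAndLabels = sc.parallelize(Seq(
  (Array(1, 2, 3, 4, 5), Array(1, 2, 5)),
  (Array(4, 1, 6, 2, 3), Array(1, 2, 3))
))

val metrics = new RankingMetrics(predictionAndLabels)

val k = 3
println(s"Precision@$k = ${metrics.precisionAt(k)}")
println(s"NDCG@$k = ${metrics.ndcgAt(k)}")
println(s"MAP = ${metrics.meanAveragePrecision}")
{code}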

The motivation behind *goodThreshold* is that the test set may also contain
items that the user doesn't like. However, when you compute an accuracy metric, you
want to make sure you compare only against the set of items that the user
likes. As you can see in my code, it's set to 0 by default, so unless specified,
everything in the user profile will be considered.
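
A minimal sketch of the kind of filtering I mean (the DataFrame and the column names {{user}}, {{item}}, {{rating}} are only illustrative, not the actual names in my code):

{code:scala}
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.{col, collect_list}

// Keep only the test items each user rated above goodThreshold, i.e. the items
// the user "likes". With the default goodThreshold = 0.0, every rated item in
// the user profile is kept as ground truth for the ranking metrics.
def relevantItemsPerUser(testRatings: DataFrame, goodThreshold: Double = 0.0): DataFrame =
  testRatings
    .filter(col("rating") > goodThreshold)
    .groupBy("user")
    .agg(collect_list("item").as("relevantItems"))
{code}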



> Investigate adding a RankingEvaluator to ML
> -------------------------------------------
>
>                 Key: SPARK-14409
>                 URL: https://issues.apache.org/jira/browse/SPARK-14409
>             Project: Spark
>          Issue Type: New Feature
>          Components: ML
>            Reporter: Nick Pentreath
>            Priority: Minor
>
> {{mllib.evaluation}} contains a {{RankingMetrics}} class, while there is no 
> {{RankingEvaluator}} in {{ml.evaluation}}. Such an evaluator can be useful 
> for recommendation evaluation (and can be useful in other settings 
> potentially).
> Should be thought about in conjunction with adding the "recommendAll" methods 
> in SPARK-13857, so that top-k ranking metrics can be used in cross-validators.


