[ https://issues.apache.org/jira/browse/SPARK-14409?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15902649#comment-15902649 ]
Nick Pentreath commented on SPARK-14409:
----------------------------------------

[~josephkb] in reference to your [PR comment|https://github.com/apache/spark/pull/17090#issuecomment-284827573]: the input schema for evaluation is really fairly simple - for each query (/user), a set of ground-truth ids and a (rank-ordered) set of predicted ids. The exact format (arrays as in the {{mllib}} version, or the "exploded" format proposed in this JIRA) is not important in itself. Rather, the format is effectively dictated by the {{Pipeline}} API - specifically, a model's prediction output schema from {{transform}} must be compatible with the evaluator's input schema for {{evaluate}}. The schema proposed above is - I believe - the only one compatible both with "linear model" style estimators such as {{LogisticRegression}} (for ad CTR prediction and learning-to-rank settings) and with recommendation tasks. A rough sketch of the two formats is shown below the quoted issue.

> Investigate adding a RankingEvaluator to ML
> -------------------------------------------
>
>                 Key: SPARK-14409
>                 URL: https://issues.apache.org/jira/browse/SPARK-14409
>             Project: Spark
>          Issue Type: New Feature
>          Components: ML
>            Reporter: Nick Pentreath
>            Priority: Minor
>
> {{mllib.evaluation}} contains a {{RankingMetrics}} class, while there is no
> {{RankingEvaluator}} in {{ml.evaluation}}. Such an evaluator can be useful
> for recommendation evaluation (and can be useful in other settings potentially).
> Should be thought about in conjunction with adding the "recommendAll" methods
> in SPARK-13857, so that top-k ranking metrics can be used in cross-validators.
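A minimal sketch of the two input formats discussed above. The array-style input matches the existing {{mllib.evaluation.RankingMetrics}} API; the "exploded" DataFrame is only an illustration of the per-(query, item) shape a {{Pipeline}} model's {{transform}} might emit - the column names ("query", "item", "label", "prediction") and the object name are assumptions here, not a settled schema for the proposed {{RankingEvaluator}}.

{code:scala}
import org.apache.spark.mllib.evaluation.RankingMetrics
import org.apache.spark.sql.SparkSession

object RankingEvalSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("ranking-eval-sketch")
      .getOrCreate()
    import spark.implicits._

    // Array-style input, as the existing mllib RankingMetrics expects:
    // one element per query/user, holding (predicted ids in rank order, ground-truth ids).
    val arrayStyle = spark.sparkContext.parallelize(Seq(
      (Array(1, 2, 3, 4), Array(2, 4)),   // user A
      (Array(5, 6, 7), Array(7))          // user B
    ))
    val metrics = new RankingMetrics(arrayStyle)
    println(s"MAP = ${metrics.meanAveragePrecision}, NDCG@3 = ${metrics.ndcgAt(3)}")

    // "Exploded" format discussed in the JIRA: one row per (query, item) pair,
    // the shape a Pipeline model's transform would naturally produce.
    // Column names are illustrative only.
    val exploded = Seq(
      ("userA", 2, 1.0, 0.9),   // (query id, item id, label, prediction score)
      ("userA", 1, 0.0, 0.8),
      ("userB", 7, 1.0, 0.7)
    ).toDF("query", "item", "label", "prediction")
    exploded.show()

    // Collapsing the exploded form back to the array form is a groupBy away,
    // so an ml RankingEvaluator could accept the exploded schema and reuse
    // the mllib metric computations underneath.

    spark.stop()
  }
}
{code}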