[ https://issues.apache.org/jira/browse/SPARK-14886?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15256096#comment-15256096 ]

Sean Owen commented on SPARK-14886:
-----------------------------------

No, I don't think this concerns maxDcg in particular. maxDcg is evaluated up to
the label set size (or k, if k is smaller, of course). It does affect dcg,
because the number of predictions may be less than the label set size. The
remaining predictions conceptually don't exist and do not match the label set,
so they don't add to dcg.
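Sean's reading can be sketched per query in plain Scala. This is a minimal sketch, assuming `Array` inputs instead of Spark's `RDD[(Array[T], Array[T])]`. `NdcgSketch` and `ndcgForQuery` are hypothetical names, and the `i < pred.length` guard is one possible fix consistent with the comment, not necessarily the patch committed for this issue.

```scala
// Hypothetical per-query helper (not a Spark API): NDCG@k for one
// (predictions, labels) pair, with ranks past the end of `pred` treated as
// non-matching instead of being read from the array.
object NdcgSketch {
  def ndcgForQuery[T](pred: Array[T], lab: Array[T], k: Int): Double = {
    require(k > 0, "ranking position k should be positive")
    val labSet = lab.toSet
    if (labSet.isEmpty) {
      0.0 // ndcgAt logs a warning and returns 0.0 in this case
    } else {
      val labSetSize = labSet.size
      // Same bound as ndcgAt: maxDcg must accumulate up to min(labSetSize, k).
      val n = math.min(math.max(pred.length, labSetSize), k)
      var maxDcg = 0.0
      var dcg = 0.0
      var i = 0
      while (i < n) {
        val gain = 1.0 / math.log(i + 2)
        // A prediction that does not exist cannot match the label set.
        if (i < pred.length && labSet.contains(pred(i))) {
          dcg += gain
        }
        if (i < labSetSize) {
          maxDcg += gain
        }
        i += 1
      }
      dcg / maxDcg
    }
  }

  def main(args: Array[String]): Unit = {
    val labels = (1 to 10).toArray  // 10 relevant items
    val pred = Array(1, 2, 3, 4, 5) // only 5 predictions
    println(ndcgForQuery(pred, labels, k = 20)) // strictly between 0 and 1
  }
}
```

With 10 relevant items and 5 predictions, rank positions 5 to 9 still count toward maxDcg but add nothing to dcg, so the missing predictions lower the score instead of raising an exception.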

> RankingMetrics.ndcgAt throws java.lang.ArrayIndexOutOfBoundsException
> ---------------------------------------------------------------------
>
>                 Key: SPARK-14886
>                 URL: https://issues.apache.org/jira/browse/SPARK-14886
>             Project: Spark
>          Issue Type: Bug
>          Components: MLlib
>    Affects Versions: 1.6.1
>            Reporter: lichenglin
>            Priority: Minor
>
> {code} 
> @Since("1.2.0")
>   def ndcgAt(k: Int): Double = {
>     require(k > 0, "ranking position k should be positive")
>     predictionAndLabels.map { case (pred, lab) =>
>       val labSet = lab.toSet
>       if (labSet.nonEmpty) {
>         val labSetSize = labSet.size
>         val n = math.min(math.max(pred.length, labSetSize), k)
>         var maxDcg = 0.0
>         var dcg = 0.0
>         var i = 0
>         while (i < n) {
>           val gain = 1.0 / math.log(i + 2)
>           if (labSet.contains(pred(i))) {
>             dcg += gain
>           }
>           if (i < labSetSize) {
>             maxDcg += gain
>           }
>           i += 1
>         }
>         dcg / maxDcg
>       } else {
>         logWarning("Empty ground truth set, check input data")
>         0.0
>       }
>     }.mean()
>   }
> {code}
> "if (labSet.contains(pred(i)))" throws ArrayIndexOutOfBoundsException
> when pred is shorter than both k and the label set size, because the loop
> bound n is then larger than pred.length.
> To reproduce, try this with sample_movielens_data.txt.
> For example, with pred.size = 5, labSetSize = 10 and k = 20,
> n = min(max(5, 10), 20) = 10, so the loop reads pred(5) through pred(9),
> which do not exist (pred only has indices 0 to 4).
> precisionAt is not affected because it bounds its loop with
> val n = math.min(pred.length, k)
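The arithmetic in the report can be checked without Spark. The sketch below uses the report's sizes (5 predictions, 10 labels, k = 20) with arbitrary item ids, compares the two loop bounds, and reads `pred(i)` for every `i < n`, as the unguarded `ndcgAt` loop does. `BoundsDemo` and `readAll` are hypothetical names.

```scala
import scala.util.Try

object BoundsDemo {
  // Sizes from the report's example; the item ids themselves are arbitrary.
  val pred: Array[Int] = Array(101, 102, 103, 104, 105) // pred.length = 5
  val labSetSize = 10
  val k = 20

  // Loop bound used by ndcgAt in 1.6.1 (quoted above).
  val nNdcg: Int = math.min(math.max(pred.length, labSetSize), k)
  // Loop bound used by precisionAt.
  val nPrecision: Int = math.min(pred.length, k)

  // Read pred(i) for every i < n, as an unguarded loop does.
  def readAll(n: Int): Try[Seq[Int]] = Try((0 until n).map(i => pred(i)))

  def main(args: Array[String]): Unit = {
    println(s"ndcgAt bound = $nNdcg, precisionAt bound = $nPrecision")
    println(s"ndcgAt-style reads succeed: ${readAll(nNdcg).isSuccess}")
    println(s"precisionAt-style reads succeed: ${readAll(nPrecision).isSuccess}")
  }
}
```

Running it prints bounds of 10 and 5. The ndcgAt-style reads fail at pred(5), while the precisionAt-style reads succeed.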



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)