[ https://issues.apache.org/jira/browse/SPARK-14886?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15256096#comment-15256096 ]
Sean Owen commented on SPARK-14886:
-----------------------------------

No, I don't think this concerns maxDcg in particular. It is evaluated through to the label set size (or k, if k is smaller, of course). It does affect dcg, because the number of predictions may be less than the label set size. The remaining predictions conceptually don't exist and do not match the label set, and so don't add to dcg.

> RankingMetrics.ndcgAt throws java.lang.ArrayIndexOutOfBoundsException
> ---------------------------------------------------------------------
>
>                 Key: SPARK-14886
>                 URL: https://issues.apache.org/jira/browse/SPARK-14886
>             Project: Spark
>          Issue Type: Bug
>          Components: MLlib
>    Affects Versions: 1.6.1
>            Reporter: lichenglin
>            Priority: Minor
>
> {code}
> @Since("1.2.0")
> def ndcgAt(k: Int): Double = {
>   require(k > 0, "ranking position k should be positive")
>   predictionAndLabels.map { case (pred, lab) =>
>     val labSet = lab.toSet
>     if (labSet.nonEmpty) {
>       val labSetSize = labSet.size
>       val n = math.min(math.max(pred.length, labSetSize), k)
>       var maxDcg = 0.0
>       var dcg = 0.0
>       var i = 0
>       while (i < n) {
>         val gain = 1.0 / math.log(i + 2)
>         if (labSet.contains(pred(i))) {
>           dcg += gain
>         }
>         if (i < labSetSize) {
>           maxDcg += gain
>         }
>         i += 1
>       }
>       dcg / maxDcg
>     } else {
>       logWarning("Empty ground truth set, check input data")
>       0.0
>     }
>   }.mean()
> }
> {code}
> "if (labSet.contains(pred(i)))" will throw an ArrayIndexOutOfBoundsException
> when pred has fewer elements than n, i.e. when the set of true relevant
> documents is larger than the prediction array and k is large enough.
> Just try this with sample_movielens_data.txt.
> For example, with pred.size = 5, labSetSize = 10, and k = 20, n is 10,
> but indices 5 through 9 of pred do not exist.
> precisionAt is fine only because it has
> val n = math.min(pred.length, k)

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
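The crash described above can be avoided by guarding the prediction lookup so that loop indices past pred.length are treated as non-matches, as Sean Owen's comment suggests. Below is a minimal standalone sketch (object and method names are illustrative, not Spark's actual patch or API) that reproduces the reporter's scenario without throwing:

```scala
// Hypothetical standalone sketch of a bounds-safe ndcgAt for a single
// (prediction, label) pair. Names are illustrative; this is not the
// actual Spark fix, just the guarding idea applied to the quoted code.
object NdcgSketch {
  def ndcgAt(pred: Array[Int], lab: Array[Int], k: Int): Double = {
    require(k > 0, "ranking position k should be positive")
    val labSet = lab.toSet
    if (labSet.isEmpty) return 0.0
    val labSetSize = labSet.size
    // Same bound as the quoted code: n may exceed pred.length
    // when the label set is larger than the prediction array.
    val n = math.min(math.max(pred.length, labSetSize), k)
    var maxDcg = 0.0
    var dcg = 0.0
    var i = 0
    while (i < n) {
      val gain = 1.0 / math.log(i + 2)
      // Guard the lookup: indices past pred.length cannot match,
      // so they contribute nothing to dcg instead of throwing.
      if (i < pred.length && labSet.contains(pred(i))) {
        dcg += gain
      }
      if (i < labSetSize) {
        maxDcg += gain
      }
      i += 1
    }
    dcg / maxDcg
  }

  def main(args: Array[String]): Unit = {
    // The reporter's scenario: pred.size = 5, labSetSize = 10, k = 20, so n = 10.
    // Without the guard this loop would throw ArrayIndexOutOfBoundsException at i = 5.
    val pred = Array(1, 2, 3, 4, 5)
    val lab = (1 to 10).toArray
    val ndcg = ndcgAt(pred, lab, 20)
    assert(ndcg > 0.0 && ndcg < 1.0)
    println(f"ndcg = $ndcg%.4f")
  }
}
```

With the guard in place the truncated prediction list simply earns no gain for the missing positions, which matches the semantics described in the comment above.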