[ https://issues.apache.org/jira/browse/SPARK-14886?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
lichenglin updated SPARK-14886:
-------------------------------
    Description: 
{code}
@Since("1.2.0")
def ndcgAt(k: Int): Double = {
  require(k > 0, "ranking position k should be positive")
  predictionAndLabels.map { case (pred, lab) =>
    val labSet = lab.toSet
    if (labSet.nonEmpty) {
      val labSetSize = labSet.size
      val n = math.min(math.max(pred.length, labSetSize), k)
      var maxDcg = 0.0
      var dcg = 0.0
      var i = 0
      while (i < n) {
        val gain = 1.0 / math.log(i + 2)
        if (labSet.contains(pred(i))) {
          dcg += gain
        }
        if (i < labSetSize) {
          maxDcg += gain
        }
        i += 1
      }
      dcg / maxDcg
    } else {
      logWarning("Empty ground truth set, check input data")
      0.0
    }
  }.mean()
}
{code}
The line {{if (labSet.contains(pred(i)))}} throws an ArrayIndexOutOfBoundsException when pred's size is less than n, i.e. when fewer documents are predicted than the label set size (up to the parameter k). You can reproduce this with sample_movielens_data.txt.
For example, with pred.length = 5, labSetSize = 10, and k = 20, n becomes min(max(5, 10), 20) = 10, so the loop tries to read pred(5) through pred(9), which do not exist.
precisionAt is fine only because it bounds the loop with
{code}
val n = math.min(pred.length, k)
{code}
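For illustration, here is a minimal, self-contained Scala sketch (no Spark dependency; the object name and the `ndcgBuggy`/`ndcgFixed` functions are hypothetical, not the actual Spark patch) that reproduces the out-of-bounds access on a single (pred, lab) pair and shows one possible guard, analogous to the pred.length bound used by precisionAt:

```scala
object NdcgSketch {
  // Mirrors the loop from RankingMetrics.ndcgAt for a single (pred, lab) pair.
  def ndcgBuggy(pred: Array[Int], lab: Array[Int], k: Int): Double = {
    require(k > 0, "ranking position k should be positive")
    val labSet = lab.toSet
    val labSetSize = labSet.size
    val n = math.min(math.max(pred.length, labSetSize), k)
    var maxDcg = 0.0
    var dcg = 0.0
    var i = 0
    while (i < n) {
      val gain = 1.0 / math.log(i + 2)
      if (labSet.contains(pred(i))) dcg += gain // pred(i) can be out of bounds
      if (i < labSetSize) maxDcg += gain
      i += 1
    }
    dcg / maxDcg
  }

  // One possible fix: only index into pred while i < pred.length, but keep
  // accumulating the ideal DCG up to n so short prediction lists are penalized.
  def ndcgFixed(pred: Array[Int], lab: Array[Int], k: Int): Double = {
    require(k > 0, "ranking position k should be positive")
    val labSet = lab.toSet
    val labSetSize = labSet.size
    val n = math.min(math.max(pred.length, labSetSize), k)
    var maxDcg = 0.0
    var dcg = 0.0
    var i = 0
    while (i < n) {
      val gain = 1.0 / math.log(i + 2)
      if (i < pred.length && labSet.contains(pred(i))) dcg += gain
      if (i < labSetSize) maxDcg += gain
      i += 1
    }
    dcg / maxDcg
  }

  def main(args: Array[String]): Unit = {
    val pred = Array(1, 2, 3, 4, 5)  // 5 predictions
    val lab = (1 to 10).toArray      // 10 relevant documents
    // n = min(max(5, 10), 20) = 10, so the buggy loop reads pred(5).
    val crashed =
      try { ndcgBuggy(pred, lab, 20); false }
      catch { case _: ArrayIndexOutOfBoundsException => true }
    println(s"buggy version crashed: $crashed")
    println(s"fixed version: ${ndcgFixed(pred, lab, 20)}")
  }
}
```

With the guard in place, the example above returns a value strictly between 0 and 1: the five valid predictions contribute to the DCG while the ideal DCG still covers all ten relevant documents.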
> RankingMetrics.ndcgAt throws java.lang.ArrayIndexOutOfBoundsException
> ----------------------------------------------------------------------
>
>                 Key: SPARK-14886
>                 URL: https://issues.apache.org/jira/browse/SPARK-14886
>             Project: Spark
>          Issue Type: Bug
>          Components: MLlib
>    Affects Versions: 1.6.1
>            Reporter: lichenglin
>            Priority: Minor

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)