[spark] branch master updated: [SPARK-44585][MLLIB] Fix warning condition in MLLib RankingMetrics ndcgAk

srowen Fri, 28 Jul 2023 15:30:10 -0700

This is an automated email from the ASF dual-hosted git repository.

srowen pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git



The following commit(s) were added to refs/heads/master by this push:
     new 72af2c0fbc6 [SPARK-44585][MLLIB] Fix warning condition in MLLib RankingMetrics ndcgAk
72af2c0fbc6 is described below

commit 72af2c0fbc6673a5e49f1fd6693fe2c90141a84f
Author: Guilhem Vuillier <101632595+guilhem-de...@users.noreply.github.com>
AuthorDate: Fri Jul 28 17:29:47 2023 -0500

    [SPARK-44585][MLLIB] Fix warning condition in MLLib RankingMetrics ndcgAk
    
    ### What changes were proposed in this pull request?
    
    This PR fixes the condition that triggers the following warning in MLLib's RankingMetrics `ndcgAt` function: "# of ground truth set and # of relevance value set should be equal, check input data"
    
    The current condition is inverted: it logs the warning only when the `rel` input is empty and `lab.size` differs from `rel.size`, which is always the case for a non-empty `lab`.
    
    The warning should instead be logged when the `rel` input is **not empty** and `lab.size` differs from `rel.size`.
    
    This warning was added in the following PR: https://github.com/apache/spark/pull/36843
    
    ### Why are the changes needed?
    
    With the current logic, RankingMetrics will:
    - raise an incorrect warning when a user is using it in the "binary" mode (i.e. no relevance values in the input)
    - not raise a warning (that could be necessary) when the user is using it in the "non-binary" mode (i.e. with relevance values in the input)
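
    The two cases above can be checked with a small stand-alone sketch. It is not the Spark source: the object `NdcgWarningSketch` and the helpers `warnsBefore` and `warnsAfter` are hypothetical names that only mirror the old and new `if` conditions, with `lab` and `rel` as plain arrays.

```scala
// Hypothetical stand-alone sketch of the warning condition; not the Spark source.
object NdcgWarningSketch {
  // Condition before the fix: can only fire when `rel` is empty (binary mode).
  def warnsBefore(lab: Array[Int], rel: Array[Double]): Boolean = {
    val useBinary = rel.isEmpty
    useBinary && lab.size != rel.size
  }

  // Condition after the fix: can only fire when `rel` is non-empty.
  def warnsAfter(lab: Array[Int], rel: Array[Double]): Boolean = {
    val useBinary = rel.isEmpty
    !useBinary && lab.size != rel.size
  }

  def main(args: Array[String]): Unit = {
    val lab = Array(1, 2, 3)
    val cases = Seq(
      "binary (no relevance values)" -> Array.empty[Double],
      "non-binary, size mismatch" -> Array(3.0, 1.0),
      "non-binary, sizes match" -> Array(3.0, 1.0, 2.0)
    )
    // Print, for each case, whether the old and the new condition would warn.
    cases.foreach { case (name, rel) =>
      println(s"$name: before=${warnsBefore(lab, rel)}, after=${warnsAfter(lab, rel)}")
    }
  }
}
```

    In this sketch the old condition warns only in the binary case, while the new one warns only when relevance values are present and their count differs from the ground truth set.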
    
    ### Does this PR introduce _any_ user-facing change?
    No.
    
    ### How was this patch tested?
    No changes were made to the test suite for RankingMetrics: https://github.com/uchiiii/spark/blob/a172172329cc78b50f716924f2a344517deb71fc/mllib/src/test/scala/org/apache/spark/mllib/evaluation/RankingMetricsSuite.scala
    
    Closes #42207 from guilhem-depop/patch-1.
    
    Authored-by: Guilhem Vuillier <101632595+guilhem-de...@users.noreply.github.com>
    Signed-off-by: Sean Owen <sro...@gmail.com>
---
 .../scala/org/apache/spark/mllib/evaluation/RankingMetrics.scala     | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/mllib/src/main/scala/org/apache/spark/mllib/evaluation/RankingMetrics.scala b/mllib/src/main/scala/org/apache/spark/mllib/evaluation/RankingMetrics.scala
index 37e57736574..a3316d8a8fa 100644
--- a/mllib/src/main/scala/org/apache/spark/mllib/evaluation/RankingMetrics.scala
+++ b/mllib/src/main/scala/org/apache/spark/mllib/evaluation/RankingMetrics.scala
@@ -140,6 +140,9 @@ class RankingMetrics[T: ClassTag] @Since("1.2.0") (predictionAndLabels: RDD[_ <:
    * and the NDCG is obtained by dividing the DCG value on the ground truth set. In the current
    * implementation, the relevance value is binary if the relevance value is empty.
 
+   * If the relevance value is not empty but its size doesn't match the ground truth set size,
+   * a log warning is generated.
+   *
    * If a query has an empty ground truth set, zero will be used as ndcg together with
    * a log warning.
    *
@@ -157,7 +160,7 @@ class RankingMetrics[T: ClassTag] @Since("1.2.0") (predictionAndLabels: RDD[_ <:
       val useBinary = rel.isEmpty
       val labSet = lab.toSet
       val relMap = Utils.toMap(lab, rel)
-      if (useBinary && lab.size != rel.size) {
+      if (!useBinary && lab.size != rel.size) {
         logWarning(
           "# of ground truth set and # of relevance value set should be equal, " +
             "check input data")


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org
