spark git commit: [SPARK-15617][ML][DOC] Clarify that fMeasure in MulticlassMetrics is "micro" f1_score

srowen Sat, 04 Jun 2016 05:57:07 -0700

Repository: spark
Updated Branches:
  refs/heads/branch-2.0 cf8782116 -> 729730159



[SPARK-15617][ML][DOC] Clarify that fMeasure in MulticlassMetrics is "micro" f1_score

## What changes were proposed in this pull request?
1. Remove the `precision` and `recall` metrics from `ml.MulticlassClassificationEvaluator`
2. Update the user guide for `mllib.weightedFMeasure`
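
For context, a minimal sketch (plain Python, not Spark code; the function name `micro_scores` and the sample data are hypothetical) of why the removed `precision` and `recall` options were redundant. In single-label multiclass data, every misclassified row counts as one false positive for the predicted class and one false negative for the true class, so micro-averaged precision and recall both reduce to accuracy:

```python
def micro_scores(labels, predictions):
    """Return (micro precision, micro recall, accuracy) for single-label data."""
    classes = set(labels) | set(predictions)
    tp = fp = fn = 0
    # Pool true positives, false positives and false negatives over all classes.
    for c in classes:
        for y, p in zip(labels, predictions):
            if p == c and y == c:
                tp += 1
            elif p == c and y != c:
                fp += 1
            elif p != c and y == c:
                fn += 1
    correct = sum(1 for y, p in zip(labels, predictions) if y == p)
    return tp / (tp + fp), tp / (tp + fn), correct / len(labels)


# Hypothetical sample: 6 rows, 4 predicted correctly.
labels = [0, 0, 1, 1, 2, 2]
predictions = [0, 1, 1, 1, 2, 0]
precision, recall, accuracy = micro_scores(labels, predictions)
print(precision, recall, accuracy)
```

All three values coincide here, which matches the removed match arms that mapped both `"precision"` and `"recall"` to `metrics.accuracy`.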

## How was this patch tested?
local build

Author: Ruifeng Zheng <ruife...@foxmail.com>

Closes #13390 from zhengruifeng/clarify_f1.

(cherry picked from commit 2099e05f93067937cdf6cedcf493afd66e212abe)
Signed-off-by: Sean Owen <so...@cloudera.com>


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/72973015
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/72973015
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/72973015

Branch: refs/heads/branch-2.0
Commit: 729730159c6236cb437d215388d444f16849f405
Parents: cf87821
Author: Ruifeng Zheng <ruife...@foxmail.com>
Authored: Sat Jun 4 13:56:04 2016 +0100
Committer: Sean Owen <so...@cloudera.com>
Committed: Sat Jun 4 13:56:16 2016 +0100

----------------------------------------------------------------------
 docs/mllib-evaluation-metrics.md                    | 16 +++-------------
 .../MulticlassClassificationEvaluator.scala         | 12 +++++-------
 .../MulticlassClassificationEvaluatorSuite.scala    |  2 +-
 python/pyspark/ml/evaluation.py                     |  4 +---
 4 files changed, 10 insertions(+), 24 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/spark/blob/72973015/docs/mllib-evaluation-metrics.md
----------------------------------------------------------------------
diff --git a/docs/mllib-evaluation-metrics.md b/docs/mllib-evaluation-metrics.md
index a269dbf..c49bc4f 100644
--- a/docs/mllib-evaluation-metrics.md
+++ b/docs/mllib-evaluation-metrics.md
@@ -140,7 +140,7 @@ definitions of positive and negative labels is straightforward.
 #### Label based metrics
 
 Opposed to binary classification where there are only two possible labels, multiclass classification problems have many
-possible labels and so the concept of label-based metrics is introduced. Overall precision measures precision across all
+possible labels and so the concept of label-based metrics is introduced. Accuracy measures precision across all
 labels -  the number of times any class was predicted correctly (true positives) normalized by the number of data
 points. Precision by label considers only one class, and measures the number of time a specific label was predicted
 correctly normalized by the number of times that label appears in the output.
@@ -182,21 +182,11 @@ $$\hat{\delta}(x) = \begin{cases}1 & \text{if $x = 0$}, \\ 0 & \text{otherwise}.
       </td>
     </tr>
     <tr>
-      <td>Overall Precision</td>
-      <td>$PPV = \frac{TP}{TP + FP} = \frac{1}{N}\sum_{i=0}^{N-1} \hat{\delta}\left(\hat{\mathbf{y}}_i -
-        \mathbf{y}_i\right)$</td>
-    </tr>
-    <tr>
-      <td>Overall Recall</td>
-      <td>$TPR = \frac{TP}{TP + FN} = \frac{1}{N}\sum_{i=0}^{N-1} \hat{\delta}\left(\hat{\mathbf{y}}_i -
+      <td>Accuracy</td>
+      <td>$ACC = \frac{TP}{TP + FP} = \frac{1}{N}\sum_{i=0}^{N-1} \hat{\delta}\left(\hat{\mathbf{y}}_i -
         \mathbf{y}_i\right)$</td>
     </tr>
     <tr>
-      <td>Overall F1-measure</td>
-      <td>$F1 = 2 \cdot \left(\frac{PPV \cdot TPR}
-          {PPV + TPR}\right)$</td>
-    </tr>
-    <tr>
       <td>Precision by label</td>
       <td>$PPV(\ell) = \frac{TP}{TP + FP} =
           \frac{\sum_{i=0}^{N-1} \hat{\delta}(\hat{\mathbf{y}}_i - \ell) \cdot \hat{\delta}(\mathbf{y}_i - \ell)}

http://git-wip-us.apache.org/repos/asf/spark/blob/72973015/mllib/src/main/scala/org/apache/spark/ml/evaluation/MulticlassClassificationEvaluator.scala
----------------------------------------------------------------------
diff --git a/mllib/src/main/scala/org/apache/spark/ml/evaluation/MulticlassClassificationEvaluator.scala b/mllib/src/main/scala/org/apache/spark/ml/evaluation/MulticlassClassificationEvaluator.scala
index 0b84e0a..794b1e7 100644
--- a/mllib/src/main/scala/org/apache/spark/ml/evaluation/MulticlassClassificationEvaluator.scala
+++ b/mllib/src/main/scala/org/apache/spark/ml/evaluation/MulticlassClassificationEvaluator.scala
@@ -39,16 +39,16 @@ class MulticlassClassificationEvaluator @Since("1.5.0") (@Since("1.5.0") overrid
   def this() = this(Identifiable.randomUID("mcEval"))
 
   /**
-   * param for metric name in evaluation (supports `"f1"` (default), `"precision"`, `"recall"`,
-   * `"weightedPrecision"`, `"weightedRecall"`, `"accuracy"`)
+   * param for metric name in evaluation (supports `"f1"` (default), `"weightedPrecision"`,
+   * `"weightedRecall"`, `"accuracy"`)
    * @group param
    */
   @Since("1.5.0")
   val metricName: Param[String] = {
-    val allowedParams = ParamValidators.inArray(Array("f1", "precision",
-      "recall", "weightedPrecision", "weightedRecall", "accuracy"))
+    val allowedParams = ParamValidators.inArray(Array("f1", "weightedPrecision",
+      "weightedRecall", "accuracy"))
     new Param(this, "metricName", "metric name in evaluation " +
-      "(f1|precision|recall|weightedPrecision|weightedRecall|accuracy)", allowedParams)
+      "(f1|weightedPrecision|weightedRecall|accuracy)", allowedParams)
   }
 
   /** @group getParam */
@@ -82,8 +82,6 @@ class MulticlassClassificationEvaluator @Since("1.5.0") (@Since("1.5.0") overrid
     val metrics = new MulticlassMetrics(predictionAndLabels)
     val metric = $(metricName) match {
       case "f1" => metrics.weightedFMeasure
-      case "precision" => metrics.accuracy
-      case "recall" => metrics.accuracy
       case "weightedPrecision" => metrics.weightedPrecision
       case "weightedRecall" => metrics.weightedRecall
       case "accuracy" => metrics.accuracy

http://git-wip-us.apache.org/repos/asf/spark/blob/72973015/mllib/src/test/scala/org/apache/spark/ml/evaluation/MulticlassClassificationEvaluatorSuite.scala
----------------------------------------------------------------------
diff --git a/mllib/src/test/scala/org/apache/spark/ml/evaluation/MulticlassClassificationEvaluatorSuite.scala b/mllib/src/test/scala/org/apache/spark/ml/evaluation/MulticlassClassificationEvaluatorSuite.scala
index 522f667..1a3a8a1 100644
--- a/mllib/src/test/scala/org/apache/spark/ml/evaluation/MulticlassClassificationEvaluatorSuite.scala
+++ b/mllib/src/test/scala/org/apache/spark/ml/evaluation/MulticlassClassificationEvaluatorSuite.scala
@@ -33,7 +33,7 @@ class MulticlassClassificationEvaluatorSuite
     val evaluator = new MulticlassClassificationEvaluator()
       .setPredictionCol("myPrediction")
       .setLabelCol("myLabel")
-      .setMetricName("recall")
+      .setMetricName("accuracy")
     testDefaultReadWrite(evaluator)
   }
 

http://git-wip-us.apache.org/repos/asf/spark/blob/72973015/python/pyspark/ml/evaluation.py
----------------------------------------------------------------------
diff --git a/python/pyspark/ml/evaluation.py b/python/pyspark/ml/evaluation.py
index b8b2b37..c480525 100644
--- a/python/pyspark/ml/evaluation.py
+++ b/python/pyspark/ml/evaluation.py
@@ -258,9 +258,7 @@ class MulticlassClassificationEvaluator(JavaEvaluator, HasLabelCol, HasPredictio
     >>> evaluator = MulticlassClassificationEvaluator(predictionCol="prediction")
     >>> evaluator.evaluate(dataset)
     0.66...
-    >>> evaluator.evaluate(dataset, {evaluator.metricName: "precision"})
-    0.66...
-    >>> evaluator.evaluate(dataset, {evaluator.metricName: "recall"})
+    >>> evaluator.evaluate(dataset, {evaluator.metricName: "accuracy"})
     0.66...
 
     .. versionadded:: 1.5.0
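
As an illustration of the remaining `"f1"` option, which the evaluator maps to `metrics.weightedFMeasure`: a minimal plain-Python sketch, assuming the weighted F-measure is the per-label F1 averaged with weights equal to each label's share of the true labels. The function name `weighted_f1`, the zero-division handling, and the sample data are assumptions for illustration, not Spark's implementation.

```python
from collections import Counter


def weighted_f1(labels, predictions):
    """Per-label F1 averaged by each label's frequency among the true labels."""
    n = len(labels)
    total = 0.0
    for c, count in Counter(labels).items():
        tp = sum(1 for y, p in zip(labels, predictions) if y == c and p == c)
        predicted = sum(1 for p in predictions if p == c)
        # Assumption: a label that is never predicted gets precision 0.
        precision = tp / predicted if predicted else 0.0
        recall = tp / count
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        total += f1 * count / n
    return total


# Hypothetical sample: 6 rows, 4 predicted correctly (accuracy = 4/6).
labels = [0, 0, 1, 1, 2, 2]
predictions = [0, 1, 1, 1, 2, 0]
print(weighted_f1(labels, predictions))
```

On this sample the weighted F1 differs from the accuracy of 4/6, which is why `"f1"` and `"accuracy"` stay separate metric names while `"precision"` and `"recall"` did not need to.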

