spark git commit: [SPARK-15707][SQL] Make Code Neat - Use map instead of if check.
Repository: spark
Updated Branches:
  refs/heads/branch-2.0  7e4c9dd55 -> 32a64d8fc

[SPARK-15707][SQL] Make Code Neat - Use map instead of if check.

## What changes were proposed in this pull request?

In the `forType` function of object `RandomDataGenerator`, the following code:

    if (maybeSqlTypeGenerator.isDefined) {
      Some(generator)
    } else {
      None
    }

is replaced with a call to `maybeSqlTypeGenerator.map`.

## How was this patch tested?

All of the current unit tests passed.

Author: Weiqing Yang

Closes #13448 from Sherry302/master.

(cherry picked from commit 0f307db5e17e1e8a655cfa751218ac4ed88717a7)
Signed-off-by: Sean Owen

Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/32a64d8f
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/32a64d8f
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/32a64d8f

Branch: refs/heads/branch-2.0
Commit: 32a64d8fc9e7ddaf993bdd7e679113dc605a69a7
Parents: 7e4c9dd
Author: Weiqing Yang
Authored: Sat Jun 4 22:44:03 2016 +0100
Committer: Sean Owen
Committed: Sat Jun 4 22:44:12 2016 +0100

--
 .../scala/org/apache/spark/sql/RandomDataGenerator.scala | 8 ++--
 1 file changed, 2 insertions(+), 6 deletions(-)
--

http://git-wip-us.apache.org/repos/asf/spark/blob/32a64d8f/sql/catalyst/src/test/scala/org/apache/spark/sql/RandomDataGenerator.scala

diff --git a/sql/catalyst/src/test/scala/org/apache/spark/sql/RandomDataGenerator.scala b/sql/catalyst/src/test/scala/org/apache/spark/sql/RandomDataGenerator.scala
index 711e870..8508697 100644
--- a/sql/catalyst/src/test/scala/org/apache/spark/sql/RandomDataGenerator.scala
+++ b/sql/catalyst/src/test/scala/org/apache/spark/sql/RandomDataGenerator.scala
@@ -236,9 +236,8 @@ object RandomDataGenerator {
         // convert it to catalyst value to call udt's deserialize.
         val toCatalystType = CatalystTypeConverters.createToCatalystConverter(udt.sqlType)
-        if (maybeSqlTypeGenerator.isDefined) {
-          val sqlTypeGenerator = maybeSqlTypeGenerator.get
-          val generator = () => {
+        maybeSqlTypeGenerator.map { sqlTypeGenerator =>
+          () => {
             val generatedScalaValue = sqlTypeGenerator.apply()
             if (generatedScalaValue == null) {
               null
@@ -246,9 +245,6 @@ object RandomDataGenerator {
               udt.deserialize(toCatalystType(generatedScalaValue))
             }
           }
-          Some(generator)
-        } else {
-          None
         }
       case unsupportedType => None
     }

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org
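The refactoring in this commit is an instance of a general Scala idiom: `Option.map` subsumes the `isDefined`/`get` check plus the explicit `Some`/`None` wrapping. A minimal standalone sketch of the before/after shapes (the names here are illustrative, not Spark's actual API):

```scala
object OptionMapSketch {
  // The pattern the patch removes: manual isDefined/get plus explicit
  // Some/None wrapping of the result.
  def verbose(maybeGen: Option[() => Int]): Option[() => Int] =
    if (maybeGen.isDefined) {
      val gen = maybeGen.get
      val wrapped = () => gen() * 2
      Some(wrapped)
    } else {
      None
    }

  // The pattern the patch introduces: map handles the Some case, the None
  // case, and the re-wrapping in one expression.
  def idiomatic(maybeGen: Option[() => Int]): Option[() => Int] =
    maybeGen.map { gen => () => gen() * 2 }

  def main(args: Array[String]): Unit = {
    val gen: Option[() => Int] = Some(() => 21)
    assert(verbose(gen).get.apply() == 42)
    assert(idiomatic(gen).get.apply() == 42)
    assert(idiomatic(None).isEmpty)
    println("ok")
  }
}
```

Both forms are equivalent; `map` is shorter and cannot accidentally call `get` on an empty `Option`.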
spark git commit: [SPARK-15707][SQL] Make Code Neat - Use map instead of if check.
Repository: spark
Updated Branches:
  refs/heads/master  091f81e1f -> 0f307db5e

[SPARK-15707][SQL] Make Code Neat - Use map instead of if check.

## What changes were proposed in this pull request?

In the `forType` function of object `RandomDataGenerator`, the following code:

    if (maybeSqlTypeGenerator.isDefined) {
      Some(generator)
    } else {
      None
    }

is replaced with a call to `maybeSqlTypeGenerator.map`.

## How was this patch tested?

All of the current unit tests passed.

Author: Weiqing Yang

Closes #13448 from Sherry302/master.

Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/0f307db5
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/0f307db5
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/0f307db5

Branch: refs/heads/master
Commit: 0f307db5e17e1e8a655cfa751218ac4ed88717a7
Parents: 091f81e
Author: Weiqing Yang
Authored: Sat Jun 4 22:44:03 2016 +0100
Committer: Sean Owen
Committed: Sat Jun 4 22:44:03 2016 +0100

--
 .../scala/org/apache/spark/sql/RandomDataGenerator.scala | 8 ++--
 1 file changed, 2 insertions(+), 6 deletions(-)
--

http://git-wip-us.apache.org/repos/asf/spark/blob/0f307db5/sql/catalyst/src/test/scala/org/apache/spark/sql/RandomDataGenerator.scala

diff --git a/sql/catalyst/src/test/scala/org/apache/spark/sql/RandomDataGenerator.scala b/sql/catalyst/src/test/scala/org/apache/spark/sql/RandomDataGenerator.scala
index 711e870..8508697 100644
--- a/sql/catalyst/src/test/scala/org/apache/spark/sql/RandomDataGenerator.scala
+++ b/sql/catalyst/src/test/scala/org/apache/spark/sql/RandomDataGenerator.scala
@@ -236,9 +236,8 @@ object RandomDataGenerator {
         // convert it to catalyst value to call udt's deserialize.
         val toCatalystType = CatalystTypeConverters.createToCatalystConverter(udt.sqlType)
-        if (maybeSqlTypeGenerator.isDefined) {
-          val sqlTypeGenerator = maybeSqlTypeGenerator.get
-          val generator = () => {
+        maybeSqlTypeGenerator.map { sqlTypeGenerator =>
+          () => {
             val generatedScalaValue = sqlTypeGenerator.apply()
             if (generatedScalaValue == null) {
               null
@@ -246,9 +245,6 @@ object RandomDataGenerator {
               udt.deserialize(toCatalystType(generatedScalaValue))
             }
           }
-          Some(generator)
-        } else {
-          None
         }
       case unsupportedType => None
     }
spark git commit: [SPARK-15762][SQL] Cache Metadata & StructType hashCodes; use singleton Metadata.empty
Repository: spark
Updated Branches:
  refs/heads/branch-2.0  ed1e20207 -> 7e4c9dd55

[SPARK-15762][SQL] Cache Metadata & StructType hashCodes; use singleton Metadata.empty

We should cache `Metadata.hashCode` and use a singleton for `Metadata.empty` because calculating metadata hashCodes appears to be a bottleneck for certain workloads. We should also cache `StructType.hashCode`.

In an optimizer stress-test benchmark run by ericl, these `hashCode` calls accounted for roughly 40% of the total CPU time and this bottleneck was completely eliminated by the caching added by this patch.

Author: Josh Rosen

Closes #13504 from JoshRosen/metadata-fix.

(cherry picked from commit 091f81e1f7ef1581376c71e3872ce06f4c1713bd)
Signed-off-by: Josh Rosen

Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/7e4c9dd5
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/7e4c9dd5
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/7e4c9dd5

Branch: refs/heads/branch-2.0
Commit: 7e4c9dd55532b35030f6542f6640521596eb13f3
Parents: ed1e202
Author: Josh Rosen
Authored: Sat Jun 4 14:14:50 2016 -0700
Committer: Josh Rosen
Committed: Sat Jun 4 14:15:03 2016 -0700

--
 .../src/main/scala/org/apache/spark/sql/types/Metadata.scala | 7 +--
 .../main/scala/org/apache/spark/sql/types/StructType.scala   | 3 ++-
 2 files changed, 7 insertions(+), 3 deletions(-)
--

http://git-wip-us.apache.org/repos/asf/spark/blob/7e4c9dd5/sql/catalyst/src/main/scala/org/apache/spark/sql/types/Metadata.scala

diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/types/Metadata.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/types/Metadata.scala
index 1fb2e24..657bd86 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/types/Metadata.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/types/Metadata.scala
@@ -104,7 +104,8 @@ sealed class Metadata private[types] (private[types] val map: Map[String, Any])
     }
   }

-  override def hashCode: Int = Metadata.hash(this)
+  private lazy val _hashCode: Int = Metadata.hash(this)
+  override def hashCode: Int = _hashCode

   private def get[T](key: String): T = {
     map(key).asInstanceOf[T]
@@ -115,8 +116,10 @@ sealed class Metadata private[types] (private[types] val map: Map[String, Any])

 object Metadata {

+  private[this] val _empty = new Metadata(Map.empty)
+
   /** Returns an empty Metadata. */
-  def empty: Metadata = new Metadata(Map.empty)
+  def empty: Metadata = _empty

   /** Creates a Metadata instance from JSON. */
   def fromJson(json: String): Metadata = {

http://git-wip-us.apache.org/repos/asf/spark/blob/7e4c9dd5/sql/catalyst/src/main/scala/org/apache/spark/sql/types/StructType.scala

diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/types/StructType.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/types/StructType.scala
index fd2b524..9a92373 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/types/StructType.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/types/StructType.scala
@@ -112,7 +112,8 @@ case class StructType(fields: Array[StructField]) extends DataType with Seq[Stru
     }
   }

-  override def hashCode(): Int = java.util.Arrays.hashCode(fields.asInstanceOf[Array[AnyRef]])
+  private lazy val _hashCode: Int = java.util.Arrays.hashCode(fields.asInstanceOf[Array[AnyRef]])
+  override def hashCode(): Int = _hashCode

   /**
    * Creates a new [[StructType]] by adding a new field.
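The pattern in this patch, memoizing an expensive `hashCode` in a `lazy val` and reusing a single instance for the common empty value, applies to any immutable class. A standalone sketch under those assumptions (`Fields` is a made-up stand-in, not a Spark class):

```scala
// Sketch of the caching idiom in this patch, using a hypothetical
// immutable wrapper instead of Spark's Metadata/StructType.
final class Fields(val names: Array[String]) {
  // Computed at most once, on first use. Caching is safe only because the
  // class is immutable -- the same assumption the Spark patch relies on.
  private lazy val _hashCode: Int =
    java.util.Arrays.hashCode(names.asInstanceOf[Array[AnyRef]])

  override def hashCode(): Int = _hashCode

  override def equals(other: Any): Boolean = other match {
    case that: Fields =>
      java.util.Arrays.equals(
        names.asInstanceOf[Array[AnyRef]],
        that.names.asInstanceOf[Array[AnyRef]])
    case _ => false
  }
}

object Fields {
  // Singleton for the common empty value, mirroring Metadata.empty.
  private[this] val _empty = new Fields(Array.empty)
  def empty: Fields = _empty
}
```

After the change, repeated `hashCode` calls cost a field read instead of a full array scan, which is where the roughly 40% of CPU time reported above was going.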
spark git commit: [SPARK-15762][SQL] Cache Metadata & StructType hashCodes; use singleton Metadata.empty
Repository: spark
Updated Branches:
  refs/heads/master  681387b2d -> 091f81e1f

[SPARK-15762][SQL] Cache Metadata & StructType hashCodes; use singleton Metadata.empty

We should cache `Metadata.hashCode` and use a singleton for `Metadata.empty` because calculating metadata hashCodes appears to be a bottleneck for certain workloads. We should also cache `StructType.hashCode`.

In an optimizer stress-test benchmark run by ericl, these `hashCode` calls accounted for roughly 40% of the total CPU time and this bottleneck was completely eliminated by the caching added by this patch.

Author: Josh Rosen

Closes #13504 from JoshRosen/metadata-fix.

Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/091f81e1
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/091f81e1
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/091f81e1

Branch: refs/heads/master
Commit: 091f81e1f7ef1581376c71e3872ce06f4c1713bd
Parents: 681387b
Author: Josh Rosen
Authored: Sat Jun 4 14:14:50 2016 -0700
Committer: Josh Rosen
Committed: Sat Jun 4 14:14:50 2016 -0700

--
 .../src/main/scala/org/apache/spark/sql/types/Metadata.scala | 7 +--
 .../main/scala/org/apache/spark/sql/types/StructType.scala   | 3 ++-
 2 files changed, 7 insertions(+), 3 deletions(-)
--

http://git-wip-us.apache.org/repos/asf/spark/blob/091f81e1/sql/catalyst/src/main/scala/org/apache/spark/sql/types/Metadata.scala

diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/types/Metadata.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/types/Metadata.scala
index 1fb2e24..657bd86 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/types/Metadata.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/types/Metadata.scala
@@ -104,7 +104,8 @@ sealed class Metadata private[types] (private[types] val map: Map[String, Any])
     }
   }

-  override def hashCode: Int = Metadata.hash(this)
+  private lazy val _hashCode: Int = Metadata.hash(this)
+  override def hashCode: Int = _hashCode

   private def get[T](key: String): T = {
     map(key).asInstanceOf[T]
@@ -115,8 +116,10 @@ sealed class Metadata private[types] (private[types] val map: Map[String, Any])

 object Metadata {

+  private[this] val _empty = new Metadata(Map.empty)
+
   /** Returns an empty Metadata. */
-  def empty: Metadata = new Metadata(Map.empty)
+  def empty: Metadata = _empty

   /** Creates a Metadata instance from JSON. */
   def fromJson(json: String): Metadata = {

http://git-wip-us.apache.org/repos/asf/spark/blob/091f81e1/sql/catalyst/src/main/scala/org/apache/spark/sql/types/StructType.scala

diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/types/StructType.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/types/StructType.scala
index fd2b524..9a92373 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/types/StructType.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/types/StructType.scala
@@ -112,7 +112,8 @@ case class StructType(fields: Array[StructField]) extends DataType with Seq[Stru
     }
   }

-  override def hashCode(): Int = java.util.Arrays.hashCode(fields.asInstanceOf[Array[AnyRef]])
+  private lazy val _hashCode: Int = java.util.Arrays.hashCode(fields.asInstanceOf[Array[AnyRef]])
+  override def hashCode(): Int = _hashCode

   /**
    * Creates a new [[StructType]] by adding a new field.
spark git commit: [MINOR][BUILD] Add modernizr MIT license; specify "2014 and onwards" in license copyright
Repository: spark
Updated Branches:
  refs/heads/master  2099e05f9 -> 681387b2d

[MINOR][BUILD] Add modernizr MIT license; specify "2014 and onwards" in license copyright

## What changes were proposed in this pull request?

Per conversation on dev list, add missing modernizr license. Specify "2014 and onwards" in copyright statement.

## How was this patch tested?

(none required)

Author: Sean Owen

Closes #13510 from srowen/ModernizrLicense.

Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/681387b2
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/681387b2
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/681387b2

Branch: refs/heads/master
Commit: 681387b2dc9a094cfba84188a1dd1ac9192bb99c
Parents: 2099e05
Author: Sean Owen
Authored: Sat Jun 4 21:41:27 2016 +0100
Committer: Sean Owen
Committed: Sat Jun 4 21:41:27 2016 +0100

--
 LICENSE                        |  1 +
 NOTICE                         |  2 +-
 licenses/LICENSE-modernizr.txt | 21 +
 3 files changed, 23 insertions(+), 1 deletion(-)
--

http://git-wip-us.apache.org/repos/asf/spark/blob/681387b2/LICENSE

diff --git a/LICENSE b/LICENSE
index f403640..94fd46f 100644
--- a/LICENSE
+++ b/LICENSE
@@ -296,3 +296,4 @@ The text of each license is also included at licenses/LICENSE-[project].txt.
      (MIT License) blockUI (http://jquery.malsup.com/block/)
      (MIT License) RowsGroup (http://datatables.net/license/mit)
      (MIT License) jsonFormatter (http://www.jqueryscript.net/other/jQuery-Plugin-For-Pretty-JSON-Formatting-jsonFormatter.html)
+     (MIT License) modernizr (https://github.com/Modernizr/Modernizr/blob/master/LICENSE)

http://git-wip-us.apache.org/repos/asf/spark/blob/681387b2/NOTICE

diff --git a/NOTICE b/NOTICE
index f4b1260..69b513e 100644
--- a/NOTICE
+++ b/NOTICE
@@ -1,5 +1,5 @@
 Apache Spark
-Copyright 2014 The Apache Software Foundation.
+Copyright 2014 and onwards The Apache Software Foundation.

 This product includes software developed at
 The Apache Software Foundation (http://www.apache.org/).

http://git-wip-us.apache.org/repos/asf/spark/blob/681387b2/licenses/LICENSE-modernizr.txt

diff --git a/licenses/LICENSE-modernizr.txt b/licenses/LICENSE-modernizr.txt
new file mode 100644
index 000..2bf24b9
--- /dev/null
+++ b/licenses/LICENSE-modernizr.txt
@@ -0,0 +1,21 @@
+The MIT License (MIT)
+
+Copyright (c)
+
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+
+The above copyright notice and this permission notice shall be included in
+all copies or substantial portions of the Software.
+
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+THE SOFTWARE.
\ No newline at end of file
spark git commit: [MINOR][BUILD] Add modernizr MIT license; specify "2014 and onwards" in license copyright
Repository: spark
Updated Branches:
  refs/heads/branch-2.0  729730159 -> ed1e20207

[MINOR][BUILD] Add modernizr MIT license; specify "2014 and onwards" in license copyright

## What changes were proposed in this pull request?

Per conversation on dev list, add missing modernizr license. Specify "2014 and onwards" in copyright statement.

## How was this patch tested?

(none required)

Author: Sean Owen

Closes #13510 from srowen/ModernizrLicense.

(cherry picked from commit 681387b2dc9a094cfba84188a1dd1ac9192bb99c)
Signed-off-by: Sean Owen

Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/ed1e2020
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/ed1e2020
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/ed1e2020

Branch: refs/heads/branch-2.0
Commit: ed1e20207c1c2e503a22d5ad2cdf505ef6ecbcad
Parents: 7297301
Author: Sean Owen
Authored: Sat Jun 4 21:41:27 2016 +0100
Committer: Sean Owen
Committed: Sat Jun 4 21:41:35 2016 +0100

--
 LICENSE                        |  1 +
 NOTICE                         |  2 +-
 licenses/LICENSE-modernizr.txt | 21 +
 3 files changed, 23 insertions(+), 1 deletion(-)
--

http://git-wip-us.apache.org/repos/asf/spark/blob/ed1e2020/LICENSE

diff --git a/LICENSE b/LICENSE
index f403640..94fd46f 100644
--- a/LICENSE
+++ b/LICENSE
@@ -296,3 +296,4 @@ The text of each license is also included at licenses/LICENSE-[project].txt.
      (MIT License) blockUI (http://jquery.malsup.com/block/)
      (MIT License) RowsGroup (http://datatables.net/license/mit)
      (MIT License) jsonFormatter (http://www.jqueryscript.net/other/jQuery-Plugin-For-Pretty-JSON-Formatting-jsonFormatter.html)
+     (MIT License) modernizr (https://github.com/Modernizr/Modernizr/blob/master/LICENSE)

http://git-wip-us.apache.org/repos/asf/spark/blob/ed1e2020/NOTICE

diff --git a/NOTICE b/NOTICE
index f4b1260..69b513e 100644
--- a/NOTICE
+++ b/NOTICE
@@ -1,5 +1,5 @@
 Apache Spark
-Copyright 2014 The Apache Software Foundation.
+Copyright 2014 and onwards The Apache Software Foundation.

 This product includes software developed at
 The Apache Software Foundation (http://www.apache.org/).

http://git-wip-us.apache.org/repos/asf/spark/blob/ed1e2020/licenses/LICENSE-modernizr.txt

diff --git a/licenses/LICENSE-modernizr.txt b/licenses/LICENSE-modernizr.txt
new file mode 100644
index 000..2bf24b9
--- /dev/null
+++ b/licenses/LICENSE-modernizr.txt
@@ -0,0 +1,21 @@
+The MIT License (MIT)
+
+Copyright (c)
+
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+
+The above copyright notice and this permission notice shall be included in
+all copies or substantial portions of the Software.
+
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+THE SOFTWARE.
\ No newline at end of file
spark git commit: [SPARK-15617][ML][DOC] Clarify that fMeasure in MulticlassMetrics is "micro" f1_score
Repository: spark
Updated Branches:
  refs/heads/branch-2.0  cf8782116 -> 729730159

[SPARK-15617][ML][DOC] Clarify that fMeasure in MulticlassMetrics is "micro" f1_score

## What changes were proposed in this pull request?

1. Remove `precision` and `recall` from `ml.MulticlassClassificationEvaluator`.
2. Update the user guide for `mllib.weightedFMeasure`.

## How was this patch tested?

Local build.

Author: Ruifeng Zheng

Closes #13390 from zhengruifeng/clarify_f1.

(cherry picked from commit 2099e05f93067937cdf6cedcf493afd66e212abe)
Signed-off-by: Sean Owen

Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/72973015
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/72973015
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/72973015

Branch: refs/heads/branch-2.0
Commit: 729730159c6236cb437d215388d444f16849f405
Parents: cf87821
Author: Ruifeng Zheng
Authored: Sat Jun 4 13:56:04 2016 +0100
Committer: Sean Owen
Committed: Sat Jun 4 13:56:16 2016 +0100

--
 docs/mllib-evaluation-metrics.md                 | 16 +++-
 .../MulticlassClassificationEvaluator.scala      | 12 +---
 .../MulticlassClassificationEvaluatorSuite.scala |  2 +-
 python/pyspark/ml/evaluation.py                  |  4 +---
 4 files changed, 10 insertions(+), 24 deletions(-)
--

http://git-wip-us.apache.org/repos/asf/spark/blob/72973015/docs/mllib-evaluation-metrics.md

diff --git a/docs/mllib-evaluation-metrics.md b/docs/mllib-evaluation-metrics.md
index a269dbf..c49bc4f 100644
--- a/docs/mllib-evaluation-metrics.md
+++ b/docs/mllib-evaluation-metrics.md
@@ -140,7 +140,7 @@ definitions of positive and negative labels is straightforward.
 Label based metrics

 Opposed to binary classification where there are only two possible labels, multiclass classification problems have many
-possible labels and so the concept of label-based metrics is introduced. Overall precision measures precision across all
+possible labels and so the concept of label-based metrics is introduced. Accuracy measures precision across all
 labels - the number of times any class was predicted correctly (true positives) normalized by the number of data points.
 Precision by label considers only one class, and measures the number of time a specific label was predicted correctly
 normalized by the number of times that label appears in the output.
@@ -182,21 +182,11 @@ $$\hat{\delta}(x) = \begin{cases}1 & \text{if $x = 0$}, \\ 0 & \text{otherwise}.
-      Overall Precision
-      $PPV = \frac{TP}{TP + FP} = \frac{1}{N}\sum_{i=0}^{N-1} \hat{\delta}\left(\hat{\mathbf{y}}_i -
-        \mathbf{y}_i\right)$
-
-      Overall Recall
-      $TPR = \frac{TP}{TP + FN} = \frac{1}{N}\sum_{i=0}^{N-1} \hat{\delta}\left(\hat{\mathbf{y}}_i -
+      Accuracy
+      $ACC = \frac{TP}{TP + FP} = \frac{1}{N}\sum_{i=0}^{N-1} \hat{\delta}\left(\hat{\mathbf{y}}_i -
         \mathbf{y}_i\right)$

-      Overall F1-measure
-      $F1 = 2 \cdot \left(\frac{PPV \cdot TPR}
-        {PPV + TPR}\right)$
-
       Precision by label
       $PPV(\ell) = \frac{TP}{TP + FP} = \frac{\sum_{i=0}^{N-1} \hat{\delta}(\hat{\mathbf{y}}_i - \ell) \cdot \hat{\delta}(\mathbf{y}_i - \ell)}

http://git-wip-us.apache.org/repos/asf/spark/blob/72973015/mllib/src/main/scala/org/apache/spark/ml/evaluation/MulticlassClassificationEvaluator.scala

diff --git a/mllib/src/main/scala/org/apache/spark/ml/evaluation/MulticlassClassificationEvaluator.scala b/mllib/src/main/scala/org/apache/spark/ml/evaluation/MulticlassClassificationEvaluator.scala
index 0b84e0a..794b1e7 100644
--- a/mllib/src/main/scala/org/apache/spark/ml/evaluation/MulticlassClassificationEvaluator.scala
+++ b/mllib/src/main/scala/org/apache/spark/ml/evaluation/MulticlassClassificationEvaluator.scala
@@ -39,16 +39,16 @@ class MulticlassClassificationEvaluator @Since("1.5.0") (@Since("1.5.0") overrid
   def this() = this(Identifiable.randomUID("mcEval"))

   /**
-   * param for metric name in evaluation (supports `"f1"` (default), `"precision"`, `"recall"`,
-   * `"weightedPrecision"`, `"weightedRecall"`, `"accuracy"`)
+   * param for metric name in evaluation (supports `"f1"` (default), `"weightedPrecision"`,
+   * `"weightedRecall"`, `"accuracy"`)
    * @group param
    */
   @Since("1.5.0")
   val metricName: Param[String] = {
-    val allowedParams = ParamValidators.inArray(Array("f1", "precision",
-      "recall", "weightedPrecision", "weightedRecall", "accuracy"))
+    val allowedParams = ParamValidators.inArray(Array("f1", "weightedPrecision",
+      "weightedRecall", "accuracy"))
     new Param(this,
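The reason these "overall" metrics collapse into one: in single-label multiclass classification, every prediction produces either exactly one true positive or exactly one false-positive/false-negative pair, so TP + FP = TP + FN = N and micro-averaged precision, recall, and F1 all equal accuracy. A small sketch of that identity (a hypothetical helper, not Spark's `MulticlassMetrics`):

```scala
object MicroF1Sketch {
  // tp counts predictions that match their label; every miss is
  // simultaneously one FP (for the predicted class) and one FN (for the
  // true class), so TP + FP == TP + FN == N and
  // micro precision == micro recall == micro F1 == accuracy.
  def accuracy(labels: Seq[Int], preds: Seq[Int]): Double = {
    require(labels.size == preds.size, "labels and predictions must align")
    val tp = labels.zip(preds).count { case (y, p) => y == p }
    tp.toDouble / labels.size
  }

  def main(args: Array[String]): Unit = {
    val labels = Seq(0, 1, 2, 1)
    val preds  = Seq(0, 1, 1, 1)
    // 3 of 4 predictions match: TP = 3, FP = 1, FN = 1, N = 4
    assert(accuracy(labels, preds) == 0.75)
    println(accuracy(labels, preds)) // 0.75
  }
}
```

This is why the removed "overall precision/recall/F1" entries were redundant with accuracy, and why per-label and weighted variants remain the informative metrics.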
spark git commit: [SPARK-15617][ML][DOC] Clarify that fMeasure in MulticlassMetrics is "micro" f1_score
Repository: spark
Updated Branches:
  refs/heads/master  2ca563cc4 -> 2099e05f9

[SPARK-15617][ML][DOC] Clarify that fMeasure in MulticlassMetrics is "micro" f1_score

## What changes were proposed in this pull request?

1. Remove `precision` and `recall` from `ml.MulticlassClassificationEvaluator`.
2. Update the user guide for `mllib.weightedFMeasure`.

## How was this patch tested?

Local build.

Author: Ruifeng Zheng

Closes #13390 from zhengruifeng/clarify_f1.

Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/2099e05f
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/2099e05f
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/2099e05f

Branch: refs/heads/master
Commit: 2099e05f93067937cdf6cedcf493afd66e212abe
Parents: 2ca563c
Author: Ruifeng Zheng
Authored: Sat Jun 4 13:56:04 2016 +0100
Committer: Sean Owen
Committed: Sat Jun 4 13:56:04 2016 +0100

--
 docs/mllib-evaluation-metrics.md                 | 16 +++-
 .../MulticlassClassificationEvaluator.scala      | 12 +---
 .../MulticlassClassificationEvaluatorSuite.scala |  2 +-
 python/pyspark/ml/evaluation.py                  |  4 +---
 4 files changed, 10 insertions(+), 24 deletions(-)
--

http://git-wip-us.apache.org/repos/asf/spark/blob/2099e05f/docs/mllib-evaluation-metrics.md

diff --git a/docs/mllib-evaluation-metrics.md b/docs/mllib-evaluation-metrics.md
index a269dbf..c49bc4f 100644
--- a/docs/mllib-evaluation-metrics.md
+++ b/docs/mllib-evaluation-metrics.md
@@ -140,7 +140,7 @@ definitions of positive and negative labels is straightforward.
 Label based metrics

 Opposed to binary classification where there are only two possible labels, multiclass classification problems have many
-possible labels and so the concept of label-based metrics is introduced. Overall precision measures precision across all
+possible labels and so the concept of label-based metrics is introduced. Accuracy measures precision across all
 labels - the number of times any class was predicted correctly (true positives) normalized by the number of data points.
 Precision by label considers only one class, and measures the number of time a specific label was predicted correctly
 normalized by the number of times that label appears in the output.
@@ -182,21 +182,11 @@ $$\hat{\delta}(x) = \begin{cases}1 & \text{if $x = 0$}, \\ 0 & \text{otherwise}.
-      Overall Precision
-      $PPV = \frac{TP}{TP + FP} = \frac{1}{N}\sum_{i=0}^{N-1} \hat{\delta}\left(\hat{\mathbf{y}}_i -
-        \mathbf{y}_i\right)$
-
-      Overall Recall
-      $TPR = \frac{TP}{TP + FN} = \frac{1}{N}\sum_{i=0}^{N-1} \hat{\delta}\left(\hat{\mathbf{y}}_i -
+      Accuracy
+      $ACC = \frac{TP}{TP + FP} = \frac{1}{N}\sum_{i=0}^{N-1} \hat{\delta}\left(\hat{\mathbf{y}}_i -
         \mathbf{y}_i\right)$

-      Overall F1-measure
-      $F1 = 2 \cdot \left(\frac{PPV \cdot TPR}
-        {PPV + TPR}\right)$
-
       Precision by label
       $PPV(\ell) = \frac{TP}{TP + FP} = \frac{\sum_{i=0}^{N-1} \hat{\delta}(\hat{\mathbf{y}}_i - \ell) \cdot \hat{\delta}(\mathbf{y}_i - \ell)}

http://git-wip-us.apache.org/repos/asf/spark/blob/2099e05f/mllib/src/main/scala/org/apache/spark/ml/evaluation/MulticlassClassificationEvaluator.scala

diff --git a/mllib/src/main/scala/org/apache/spark/ml/evaluation/MulticlassClassificationEvaluator.scala b/mllib/src/main/scala/org/apache/spark/ml/evaluation/MulticlassClassificationEvaluator.scala
index 0b84e0a..794b1e7 100644
--- a/mllib/src/main/scala/org/apache/spark/ml/evaluation/MulticlassClassificationEvaluator.scala
+++ b/mllib/src/main/scala/org/apache/spark/ml/evaluation/MulticlassClassificationEvaluator.scala
@@ -39,16 +39,16 @@ class MulticlassClassificationEvaluator @Since("1.5.0") (@Since("1.5.0") overrid
   def this() = this(Identifiable.randomUID("mcEval"))

   /**
-   * param for metric name in evaluation (supports `"f1"` (default), `"precision"`, `"recall"`,
-   * `"weightedPrecision"`, `"weightedRecall"`, `"accuracy"`)
+   * param for metric name in evaluation (supports `"f1"` (default), `"weightedPrecision"`,
+   * `"weightedRecall"`, `"accuracy"`)
    * @group param
    */
   @Since("1.5.0")
   val metricName: Param[String] = {
-    val allowedParams = ParamValidators.inArray(Array("f1", "precision",
-      "recall", "weightedPrecision", "weightedRecall", "accuracy"))
+    val allowedParams = ParamValidators.inArray(Array("f1", "weightedPrecision",
+      "weightedRecall", "accuracy"))
     new Param(this, "metricName", "metric name in evaluation " +
-      "(f1|precision|recall|weightedPrecision|weightedReca