spark git commit: [SPARK-15707][SQL] Make Code Neat - Use map instead of if check.
Repository: spark
Updated Branches:
  refs/heads/branch-2.0  7e4c9dd55 -> 32a64d8fc

[SPARK-15707][SQL] Make Code Neat - Use map instead of if check.

## What changes were proposed in this pull request?

In the `forType` function of object `RandomDataGenerator`, the following code:

    if (maybeSqlTypeGenerator.isDefined) {
      Some(generator)
    } else {
      None
    }

is replaced with a call to `maybeSqlTypeGenerator.map`.

## How was this patch tested?

All of the current unit tests passed.

Author: Weiqing Yang

Closes #13448 from Sherry302/master.

(cherry picked from commit 0f307db5e17e1e8a655cfa751218ac4ed88717a7)
Signed-off-by: Sean Owen

Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/32a64d8f
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/32a64d8f
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/32a64d8f

Branch: refs/heads/branch-2.0
Commit: 32a64d8fc9e7ddaf993bdd7e679113dc605a69a7
Parents: 7e4c9dd
Author: Weiqing Yang
Authored: Sat Jun 4 22:44:03 2016 +0100
Committer: Sean Owen
Committed: Sat Jun 4 22:44:12 2016 +0100

--
 .../scala/org/apache/spark/sql/RandomDataGenerator.scala | 8 ++--
 1 file changed, 2 insertions(+), 6 deletions(-)
--

http://git-wip-us.apache.org/repos/asf/spark/blob/32a64d8f/sql/catalyst/src/test/scala/org/apache/spark/sql/RandomDataGenerator.scala

diff --git a/sql/catalyst/src/test/scala/org/apache/spark/sql/RandomDataGenerator.scala b/sql/catalyst/src/test/scala/org/apache/spark/sql/RandomDataGenerator.scala
index 711e870..8508697 100644
--- a/sql/catalyst/src/test/scala/org/apache/spark/sql/RandomDataGenerator.scala
+++ b/sql/catalyst/src/test/scala/org/apache/spark/sql/RandomDataGenerator.scala
@@ -236,9 +236,8 @@ object RandomDataGenerator {
         // convert it to catalyst value to call udt's deserialize.
         val toCatalystType = CatalystTypeConverters.createToCatalystConverter(udt.sqlType)
-        if (maybeSqlTypeGenerator.isDefined) {
-          val sqlTypeGenerator = maybeSqlTypeGenerator.get
-          val generator = () => {
+        maybeSqlTypeGenerator.map { sqlTypeGenerator =>
+          () => {
             val generatedScalaValue = sqlTypeGenerator.apply()
             if (generatedScalaValue == null) {
               null
@@ -246,9 +245,6 @@ object RandomDataGenerator {
               udt.deserialize(toCatalystType(generatedScalaValue))
             }
           }
-          Some(generator)
-        } else {
-          None
         }
       case unsupportedType => None
     }

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org
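The refactoring in this commit is an instance of a general Scala idiom: `Option.map` subsumes the `isDefined`/`get` check plus the explicit `Some`/`None` wrapping. A minimal standalone sketch of the before/after shapes (the names here are illustrative, not Spark's actual API):

```scala
object OptionMapSketch {
  // The pattern the patch removes: manual isDefined/get plus explicit
  // Some/None wrapping of the result.
  def verbose(maybeGen: Option[() => Int]): Option[() => Int] =
    if (maybeGen.isDefined) {
      val gen = maybeGen.get
      val wrapped = () => gen() * 2
      Some(wrapped)
    } else {
      None
    }

  // The pattern the patch introduces: map handles the Some case, the None
  // case, and the re-wrapping in one expression.
  def idiomatic(maybeGen: Option[() => Int]): Option[() => Int] =
    maybeGen.map { gen => () => gen() * 2 }

  def main(args: Array[String]): Unit = {
    val gen: Option[() => Int] = Some(() => 21)
    assert(verbose(gen).get.apply() == 42)
    assert(idiomatic(gen).get.apply() == 42)
    assert(idiomatic(None).isEmpty)
    println("ok")
  }
}
```

Both forms are equivalent; `map` is shorter and cannot accidentally call `get` on an empty `Option`.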
spark git commit: [SPARK-15707][SQL] Make Code Neat - Use map instead of if check.
Repository: spark
Updated Branches:
  refs/heads/master  091f81e1f -> 0f307db5e

[SPARK-15707][SQL] Make Code Neat - Use map instead of if check.

## What changes were proposed in this pull request?

In the `forType` function of object `RandomDataGenerator`, the following code:

    if (maybeSqlTypeGenerator.isDefined) {
      Some(generator)
    } else {
      None
    }

is replaced with a call to `maybeSqlTypeGenerator.map`.

## How was this patch tested?

All of the current unit tests passed.

Author: Weiqing Yang

Closes #13448 from Sherry302/master.

Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/0f307db5
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/0f307db5
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/0f307db5

Branch: refs/heads/master
Commit: 0f307db5e17e1e8a655cfa751218ac4ed88717a7
Parents: 091f81e
Author: Weiqing Yang
Authored: Sat Jun 4 22:44:03 2016 +0100
Committer: Sean Owen
Committed: Sat Jun 4 22:44:03 2016 +0100

--
 .../scala/org/apache/spark/sql/RandomDataGenerator.scala | 8 ++--
 1 file changed, 2 insertions(+), 6 deletions(-)
--

http://git-wip-us.apache.org/repos/asf/spark/blob/0f307db5/sql/catalyst/src/test/scala/org/apache/spark/sql/RandomDataGenerator.scala

diff --git a/sql/catalyst/src/test/scala/org/apache/spark/sql/RandomDataGenerator.scala b/sql/catalyst/src/test/scala/org/apache/spark/sql/RandomDataGenerator.scala
index 711e870..8508697 100644
--- a/sql/catalyst/src/test/scala/org/apache/spark/sql/RandomDataGenerator.scala
+++ b/sql/catalyst/src/test/scala/org/apache/spark/sql/RandomDataGenerator.scala
@@ -236,9 +236,8 @@ object RandomDataGenerator {
         // convert it to catalyst value to call udt's deserialize.
         val toCatalystType = CatalystTypeConverters.createToCatalystConverter(udt.sqlType)
-        if (maybeSqlTypeGenerator.isDefined) {
-          val sqlTypeGenerator = maybeSqlTypeGenerator.get
-          val generator = () => {
+        maybeSqlTypeGenerator.map { sqlTypeGenerator =>
+          () => {
             val generatedScalaValue = sqlTypeGenerator.apply()
             if (generatedScalaValue == null) {
               null
@@ -246,9 +245,6 @@ object RandomDataGenerator {
               udt.deserialize(toCatalystType(generatedScalaValue))
             }
           }
-          Some(generator)
-        } else {
-          None
         }
       case unsupportedType => None
     }
spark git commit: [SPARK-15762][SQL] Cache Metadata & StructType hashCodes; use singleton Metadata.empty
Repository: spark
Updated Branches:
  refs/heads/branch-2.0  ed1e20207 -> 7e4c9dd55

[SPARK-15762][SQL] Cache Metadata & StructType hashCodes; use singleton Metadata.empty

We should cache `Metadata.hashCode` and use a singleton for `Metadata.empty` because calculating metadata hashCodes appears to be a bottleneck for certain workloads. We should also cache `StructType.hashCode`.

In an optimizer stress-test benchmark run by ericl, these `hashCode` calls accounted for roughly 40% of the total CPU time and this bottleneck was completely eliminated by the caching added by this patch.

Author: Josh Rosen

Closes #13504 from JoshRosen/metadata-fix.

(cherry picked from commit 091f81e1f7ef1581376c71e3872ce06f4c1713bd)
Signed-off-by: Josh Rosen

Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/7e4c9dd5
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/7e4c9dd5
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/7e4c9dd5

Branch: refs/heads/branch-2.0
Commit: 7e4c9dd55532b35030f6542f6640521596eb13f3
Parents: ed1e202
Author: Josh Rosen
Authored: Sat Jun 4 14:14:50 2016 -0700
Committer: Josh Rosen
Committed: Sat Jun 4 14:15:03 2016 -0700

--
 .../src/main/scala/org/apache/spark/sql/types/Metadata.scala | 7 +--
 .../main/scala/org/apache/spark/sql/types/StructType.scala   | 3 ++-
 2 files changed, 7 insertions(+), 3 deletions(-)
--

http://git-wip-us.apache.org/repos/asf/spark/blob/7e4c9dd5/sql/catalyst/src/main/scala/org/apache/spark/sql/types/Metadata.scala

diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/types/Metadata.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/types/Metadata.scala
index 1fb2e24..657bd86 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/types/Metadata.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/types/Metadata.scala
@@ -104,7 +104,8 @@ sealed class Metadata private[types] (private[types] val map: Map[String, Any])
     }
   }

-  override def hashCode: Int = Metadata.hash(this)
+  private lazy val _hashCode: Int = Metadata.hash(this)
+  override def hashCode: Int = _hashCode

   private def get[T](key: String): T = {
     map(key).asInstanceOf[T]
@@ -115,8 +116,10 @@ sealed class Metadata private[types] (private[types] val map: Map[String, Any])

 object Metadata {

+  private[this] val _empty = new Metadata(Map.empty)
+
   /** Returns an empty Metadata. */
-  def empty: Metadata = new Metadata(Map.empty)
+  def empty: Metadata = _empty

   /** Creates a Metadata instance from JSON. */
   def fromJson(json: String): Metadata = {

http://git-wip-us.apache.org/repos/asf/spark/blob/7e4c9dd5/sql/catalyst/src/main/scala/org/apache/spark/sql/types/StructType.scala

diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/types/StructType.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/types/StructType.scala
index fd2b524..9a92373 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/types/StructType.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/types/StructType.scala
@@ -112,7 +112,8 @@ case class StructType(fields: Array[StructField]) extends DataType with Seq[Stru
     }
   }

-  override def hashCode(): Int = java.util.Arrays.hashCode(fields.asInstanceOf[Array[AnyRef]])
+  private lazy val _hashCode: Int = java.util.Arrays.hashCode(fields.asInstanceOf[Array[AnyRef]])
+  override def hashCode(): Int = _hashCode

   /**
    * Creates a new [[StructType]] by adding a new field.
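The pattern in this patch, memoizing an expensive `hashCode` in a `lazy val` and reusing a single instance for the common empty value, applies to any immutable class. A standalone sketch under those assumptions (`Fields` is a made-up stand-in, not a Spark class):

```scala
// Sketch of the caching idiom in this patch, using a hypothetical
// immutable wrapper instead of Spark's Metadata/StructType.
final class Fields(val names: Array[String]) {
  // Computed at most once, on first use. Caching is safe only because the
  // class is immutable -- the same assumption the Spark patch relies on.
  private lazy val _hashCode: Int =
    java.util.Arrays.hashCode(names.asInstanceOf[Array[AnyRef]])

  override def hashCode(): Int = _hashCode

  override def equals(other: Any): Boolean = other match {
    case that: Fields =>
      java.util.Arrays.equals(
        names.asInstanceOf[Array[AnyRef]],
        that.names.asInstanceOf[Array[AnyRef]])
    case _ => false
  }
}

object Fields {
  // Singleton for the common empty value, mirroring Metadata.empty.
  private[this] val _empty = new Fields(Array.empty)
  def empty: Fields = _empty
}
```

After the change, repeated `hashCode` calls cost a field read instead of a full array scan, which is where the roughly 40% of CPU time reported above was going.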
spark git commit: [SPARK-15762][SQL] Cache Metadata & StructType hashCodes; use singleton Metadata.empty
Repository: spark
Updated Branches:
  refs/heads/master  681387b2d -> 091f81e1f

[SPARK-15762][SQL] Cache Metadata & StructType hashCodes; use singleton Metadata.empty

We should cache `Metadata.hashCode` and use a singleton for `Metadata.empty` because calculating metadata hashCodes appears to be a bottleneck for certain workloads. We should also cache `StructType.hashCode`.

In an optimizer stress-test benchmark run by ericl, these `hashCode` calls accounted for roughly 40% of the total CPU time and this bottleneck was completely eliminated by the caching added by this patch.

Author: Josh Rosen

Closes #13504 from JoshRosen/metadata-fix.

Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/091f81e1
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/091f81e1
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/091f81e1

Branch: refs/heads/master
Commit: 091f81e1f7ef1581376c71e3872ce06f4c1713bd
Parents: 681387b
Author: Josh Rosen
Authored: Sat Jun 4 14:14:50 2016 -0700
Committer: Josh Rosen
Committed: Sat Jun 4 14:14:50 2016 -0700

--
 .../src/main/scala/org/apache/spark/sql/types/Metadata.scala | 7 +--
 .../main/scala/org/apache/spark/sql/types/StructType.scala   | 3 ++-
 2 files changed, 7 insertions(+), 3 deletions(-)
--

http://git-wip-us.apache.org/repos/asf/spark/blob/091f81e1/sql/catalyst/src/main/scala/org/apache/spark/sql/types/Metadata.scala

diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/types/Metadata.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/types/Metadata.scala
index 1fb2e24..657bd86 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/types/Metadata.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/types/Metadata.scala
@@ -104,7 +104,8 @@ sealed class Metadata private[types] (private[types] val map: Map[String, Any])
     }
   }

-  override def hashCode: Int = Metadata.hash(this)
+  private lazy val _hashCode: Int = Metadata.hash(this)
+  override def hashCode: Int = _hashCode

   private def get[T](key: String): T = {
     map(key).asInstanceOf[T]
@@ -115,8 +116,10 @@ sealed class Metadata private[types] (private[types] val map: Map[String, Any])

 object Metadata {

+  private[this] val _empty = new Metadata(Map.empty)
+
   /** Returns an empty Metadata. */
-  def empty: Metadata = new Metadata(Map.empty)
+  def empty: Metadata = _empty

   /** Creates a Metadata instance from JSON. */
   def fromJson(json: String): Metadata = {

http://git-wip-us.apache.org/repos/asf/spark/blob/091f81e1/sql/catalyst/src/main/scala/org/apache/spark/sql/types/StructType.scala

diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/types/StructType.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/types/StructType.scala
index fd2b524..9a92373 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/types/StructType.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/types/StructType.scala
@@ -112,7 +112,8 @@ case class StructType(fields: Array[StructField]) extends DataType with Seq[Stru
     }
   }

-  override def hashCode(): Int = java.util.Arrays.hashCode(fields.asInstanceOf[Array[AnyRef]])
+  private lazy val _hashCode: Int = java.util.Arrays.hashCode(fields.asInstanceOf[Array[AnyRef]])
+  override def hashCode(): Int = _hashCode

   /**
    * Creates a new [[StructType]] by adding a new field.
spark git commit: [MINOR][BUILD] Add modernizr MIT license; specify "2014 and onwards" in license copyright
Repository: spark
Updated Branches:
  refs/heads/master  2099e05f9 -> 681387b2d

[MINOR][BUILD] Add modernizr MIT license; specify "2014 and onwards" in license copyright

## What changes were proposed in this pull request?

Per conversation on dev list, add missing modernizr license. Specify "2014 and onwards" in copyright statement.

## How was this patch tested?

(none required)

Author: Sean Owen

Closes #13510 from srowen/ModernizrLicense.

Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/681387b2
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/681387b2
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/681387b2

Branch: refs/heads/master
Commit: 681387b2dc9a094cfba84188a1dd1ac9192bb99c
Parents: 2099e05
Author: Sean Owen
Authored: Sat Jun 4 21:41:27 2016 +0100
Committer: Sean Owen
Committed: Sat Jun 4 21:41:27 2016 +0100

--
 LICENSE                        |  1 +
 NOTICE                         |  2 +-
 licenses/LICENSE-modernizr.txt | 21 +
 3 files changed, 23 insertions(+), 1 deletion(-)
--

http://git-wip-us.apache.org/repos/asf/spark/blob/681387b2/LICENSE

diff --git a/LICENSE b/LICENSE
index f403640..94fd46f 100644
--- a/LICENSE
+++ b/LICENSE
@@ -296,3 +296,4 @@ The text of each license is also included at licenses/LICENSE-[project].txt.
      (MIT License) blockUI (http://jquery.malsup.com/block/)
      (MIT License) RowsGroup (http://datatables.net/license/mit)
      (MIT License) jsonFormatter (http://www.jqueryscript.net/other/jQuery-Plugin-For-Pretty-JSON-Formatting-jsonFormatter.html)
+     (MIT License) modernizr (https://github.com/Modernizr/Modernizr/blob/master/LICENSE)

http://git-wip-us.apache.org/repos/asf/spark/blob/681387b2/NOTICE

diff --git a/NOTICE b/NOTICE
index f4b1260..69b513e 100644
--- a/NOTICE
+++ b/NOTICE
@@ -1,5 +1,5 @@
 Apache Spark
-Copyright 2014 The Apache Software Foundation.
+Copyright 2014 and onwards The Apache Software Foundation.

 This product includes software developed at
 The Apache Software Foundation (http://www.apache.org/).

http://git-wip-us.apache.org/repos/asf/spark/blob/681387b2/licenses/LICENSE-modernizr.txt

diff --git a/licenses/LICENSE-modernizr.txt b/licenses/LICENSE-modernizr.txt
new file mode 100644
index 000..2bf24b9
--- /dev/null
+++ b/licenses/LICENSE-modernizr.txt
@@ -0,0 +1,21 @@
+The MIT License (MIT)
+
+Copyright (c)
+
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+
+The above copyright notice and this permission notice shall be included in
+all copies or substantial portions of the Software.
+
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+THE SOFTWARE.
\ No newline at end of file
spark git commit: [MINOR][BUILD] Add modernizr MIT license; specify "2014 and onwards" in license copyright
Repository: spark
Updated Branches:
  refs/heads/branch-2.0  729730159 -> ed1e20207

[MINOR][BUILD] Add modernizr MIT license; specify "2014 and onwards" in license copyright

## What changes were proposed in this pull request?

Per conversation on dev list, add missing modernizr license. Specify "2014 and onwards" in copyright statement.

## How was this patch tested?

(none required)

Author: Sean Owen

Closes #13510 from srowen/ModernizrLicense.

(cherry picked from commit 681387b2dc9a094cfba84188a1dd1ac9192bb99c)
Signed-off-by: Sean Owen

Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/ed1e2020
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/ed1e2020
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/ed1e2020

Branch: refs/heads/branch-2.0
Commit: ed1e20207c1c2e503a22d5ad2cdf505ef6ecbcad
Parents: 7297301
Author: Sean Owen
Authored: Sat Jun 4 21:41:27 2016 +0100
Committer: Sean Owen
Committed: Sat Jun 4 21:41:35 2016 +0100

--
 LICENSE                        |  1 +
 NOTICE                         |  2 +-
 licenses/LICENSE-modernizr.txt | 21 +
 3 files changed, 23 insertions(+), 1 deletion(-)
--

http://git-wip-us.apache.org/repos/asf/spark/blob/ed1e2020/LICENSE

diff --git a/LICENSE b/LICENSE
index f403640..94fd46f 100644
--- a/LICENSE
+++ b/LICENSE
@@ -296,3 +296,4 @@ The text of each license is also included at licenses/LICENSE-[project].txt.
      (MIT License) blockUI (http://jquery.malsup.com/block/)
      (MIT License) RowsGroup (http://datatables.net/license/mit)
      (MIT License) jsonFormatter (http://www.jqueryscript.net/other/jQuery-Plugin-For-Pretty-JSON-Formatting-jsonFormatter.html)
+     (MIT License) modernizr (https://github.com/Modernizr/Modernizr/blob/master/LICENSE)

http://git-wip-us.apache.org/repos/asf/spark/blob/ed1e2020/NOTICE

diff --git a/NOTICE b/NOTICE
index f4b1260..69b513e 100644
--- a/NOTICE
+++ b/NOTICE
@@ -1,5 +1,5 @@
 Apache Spark
-Copyright 2014 The Apache Software Foundation.
+Copyright 2014 and onwards The Apache Software Foundation.

 This product includes software developed at
 The Apache Software Foundation (http://www.apache.org/).

http://git-wip-us.apache.org/repos/asf/spark/blob/ed1e2020/licenses/LICENSE-modernizr.txt

diff --git a/licenses/LICENSE-modernizr.txt b/licenses/LICENSE-modernizr.txt
new file mode 100644
index 000..2bf24b9
--- /dev/null
+++ b/licenses/LICENSE-modernizr.txt
@@ -0,0 +1,21 @@
+The MIT License (MIT)
+
+Copyright (c)
+
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+
+The above copyright notice and this permission notice shall be included in
+all copies or substantial portions of the Software.
+
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+THE SOFTWARE.
\ No newline at end of file
spark git commit: [SPARK-15617][ML][DOC] Clarify that fMeasure in MulticlassMetrics is "micro" f1_score
Repository: spark
Updated Branches:
  refs/heads/branch-2.0  cf8782116 -> 729730159

[SPARK-15617][ML][DOC] Clarify that fMeasure in MulticlassMetrics is "micro" f1_score

## What changes were proposed in this pull request?

1. Remove `precision` and `recall` from `ml.MulticlassClassificationEvaluator`.
2. Update the user guide for `mllib.weightedFMeasure`.

## How was this patch tested?

Local build.

Author: Ruifeng Zheng

Closes #13390 from zhengruifeng/clarify_f1.

(cherry picked from commit 2099e05f93067937cdf6cedcf493afd66e212abe)
Signed-off-by: Sean Owen

Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/72973015
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/72973015
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/72973015

Branch: refs/heads/branch-2.0
Commit: 729730159c6236cb437d215388d444f16849f405
Parents: cf87821
Author: Ruifeng Zheng
Authored: Sat Jun 4 13:56:04 2016 +0100
Committer: Sean Owen
Committed: Sat Jun 4 13:56:16 2016 +0100

--
 docs/mllib-evaluation-metrics.md                 | 16 +++-
 .../MulticlassClassificationEvaluator.scala      | 12 +---
 .../MulticlassClassificationEvaluatorSuite.scala |  2 +-
 python/pyspark/ml/evaluation.py                  |  4 +---
 4 files changed, 10 insertions(+), 24 deletions(-)
--

http://git-wip-us.apache.org/repos/asf/spark/blob/72973015/docs/mllib-evaluation-metrics.md

diff --git a/docs/mllib-evaluation-metrics.md b/docs/mllib-evaluation-metrics.md
index a269dbf..c49bc4f 100644
--- a/docs/mllib-evaluation-metrics.md
+++ b/docs/mllib-evaluation-metrics.md
@@ -140,7 +140,7 @@ definitions of positive and negative labels is straightforward.
 Label based metrics

 Opposed to binary classification where there are only two possible labels, multiclass classification problems have many
-possible labels and so the concept of label-based metrics is introduced. Overall precision measures precision across all
+possible labels and so the concept of label-based metrics is introduced. Accuracy measures precision across all
 labels - the number of times any class was predicted correctly (true positives) normalized by the number of data points.
 Precision by label considers only one class, and measures the number of time a specific label was predicted correctly
 normalized by the number of times that label appears in the output.
@@ -182,21 +182,11 @@ $$\hat{\delta}(x) = \begin{cases}1 & \text{if $x = 0$}, \\ 0 & \text{otherwise}.
-      Overall Precision
-      $PPV = \frac{TP}{TP + FP} = \frac{1}{N}\sum_{i=0}^{N-1} \hat{\delta}\left(\hat{\mathbf{y}}_i -
-        \mathbf{y}_i\right)$
-
-      Overall Recall
-      $TPR = \frac{TP}{TP + FN} = \frac{1}{N}\sum_{i=0}^{N-1} \hat{\delta}\left(\hat{\mathbf{y}}_i -
+      Accuracy
+      $ACC = \frac{TP}{TP + FP} = \frac{1}{N}\sum_{i=0}^{N-1} \hat{\delta}\left(\hat{\mathbf{y}}_i -
         \mathbf{y}_i\right)$

-      Overall F1-measure
-      $F1 = 2 \cdot \left(\frac{PPV \cdot TPR}
-        {PPV + TPR}\right)$
-
       Precision by label
       $PPV(\ell) = \frac{TP}{TP + FP} = \frac{\sum_{i=0}^{N-1} \hat{\delta}(\hat{\mathbf{y}}_i - \ell) \cdot \hat{\delta}(\mathbf{y}_i - \ell)}

http://git-wip-us.apache.org/repos/asf/spark/blob/72973015/mllib/src/main/scala/org/apache/spark/ml/evaluation/MulticlassClassificationEvaluator.scala

diff --git a/mllib/src/main/scala/org/apache/spark/ml/evaluation/MulticlassClassificationEvaluator.scala b/mllib/src/main/scala/org/apache/spark/ml/evaluation/MulticlassClassificationEvaluator.scala
index 0b84e0a..794b1e7 100644
--- a/mllib/src/main/scala/org/apache/spark/ml/evaluation/MulticlassClassificationEvaluator.scala
+++ b/mllib/src/main/scala/org/apache/spark/ml/evaluation/MulticlassClassificationEvaluator.scala
@@ -39,16 +39,16 @@ class MulticlassClassificationEvaluator @Since("1.5.0") (@Since("1.5.0") overrid
   def this() = this(Identifiable.randomUID("mcEval"))

   /**
-   * param for metric name in evaluation (supports `"f1"` (default), `"precision"`, `"recall"`,
-   * `"weightedPrecision"`, `"weightedRecall"`, `"accuracy"`)
+   * param for metric name in evaluation (supports `"f1"` (default), `"weightedPrecision"`,
+   * `"weightedRecall"`, `"accuracy"`)
    * @group param
    */
   @Since("1.5.0")
   val metricName: Param[String] = {
-    val allowedParams = ParamValidators.inArray(Array("f1", "precision",
-      "recall", "weightedPrecision", "weightedRecall", "accuracy"))
+    val allowedParams = ParamValidators.inArray(Array("f1", "weightedPrecision",
+      "weightedRecall", "accuracy"))
     new Param(this,
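The reason these "overall" metrics collapse into one: in single-label multiclass classification, every prediction produces either exactly one true positive or exactly one false-positive/false-negative pair, so TP + FP = TP + FN = N and micro-averaged precision, recall, and F1 all equal accuracy. A small sketch of that identity (a hypothetical helper, not Spark's `MulticlassMetrics`):

```scala
object MicroF1Sketch {
  // tp counts predictions that match their label; every miss is
  // simultaneously one FP (for the predicted class) and one FN (for the
  // true class), so TP + FP == TP + FN == N and
  // micro precision == micro recall == micro F1 == accuracy.
  def accuracy(labels: Seq[Int], preds: Seq[Int]): Double = {
    require(labels.size == preds.size, "labels and predictions must align")
    val tp = labels.zip(preds).count { case (y, p) => y == p }
    tp.toDouble / labels.size
  }

  def main(args: Array[String]): Unit = {
    val labels = Seq(0, 1, 2, 1)
    val preds  = Seq(0, 1, 1, 1)
    // 3 of 4 predictions match: TP = 3, FP = 1, FN = 1, N = 4
    assert(accuracy(labels, preds) == 0.75)
    println(accuracy(labels, preds)) // 0.75
  }
}
```

This is why the removed "overall precision/recall/F1" entries were redundant with accuracy, and why per-label and weighted variants remain the informative metrics.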
spark git commit: [SPARK-15617][ML][DOC] Clarify that fMeasure in MulticlassMetrics is "micro" f1_score
Repository: spark
Updated Branches:
  refs/heads/master  2ca563cc4 -> 2099e05f9

[SPARK-15617][ML][DOC] Clarify that fMeasure in MulticlassMetrics is "micro" f1_score

## What changes were proposed in this pull request?

1. Remove `precision` and `recall` from `ml.MulticlassClassificationEvaluator`.
2. Update the user guide for `mllib.weightedFMeasure`.

## How was this patch tested?

Local build.

Author: Ruifeng Zheng

Closes #13390 from zhengruifeng/clarify_f1.

Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/2099e05f
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/2099e05f
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/2099e05f

Branch: refs/heads/master
Commit: 2099e05f93067937cdf6cedcf493afd66e212abe
Parents: 2ca563c
Author: Ruifeng Zheng
Authored: Sat Jun 4 13:56:04 2016 +0100
Committer: Sean Owen
Committed: Sat Jun 4 13:56:04 2016 +0100

--
 docs/mllib-evaluation-metrics.md                 | 16 +++-
 .../MulticlassClassificationEvaluator.scala      | 12 +---
 .../MulticlassClassificationEvaluatorSuite.scala |  2 +-
 python/pyspark/ml/evaluation.py                  |  4 +---
 4 files changed, 10 insertions(+), 24 deletions(-)
--

http://git-wip-us.apache.org/repos/asf/spark/blob/2099e05f/docs/mllib-evaluation-metrics.md

diff --git a/docs/mllib-evaluation-metrics.md b/docs/mllib-evaluation-metrics.md
index a269dbf..c49bc4f 100644
--- a/docs/mllib-evaluation-metrics.md
+++ b/docs/mllib-evaluation-metrics.md
@@ -140,7 +140,7 @@ definitions of positive and negative labels is straightforward.
 Label based metrics

 Opposed to binary classification where there are only two possible labels, multiclass classification problems have many
-possible labels and so the concept of label-based metrics is introduced. Overall precision measures precision across all
+possible labels and so the concept of label-based metrics is introduced. Accuracy measures precision across all
 labels - the number of times any class was predicted correctly (true positives) normalized by the number of data points.
 Precision by label considers only one class, and measures the number of time a specific label was predicted correctly
 normalized by the number of times that label appears in the output.
@@ -182,21 +182,11 @@ $$\hat{\delta}(x) = \begin{cases}1 & \text{if $x = 0$}, \\ 0 & \text{otherwise}.
-      Overall Precision
-      $PPV = \frac{TP}{TP + FP} = \frac{1}{N}\sum_{i=0}^{N-1} \hat{\delta}\left(\hat{\mathbf{y}}_i -
-        \mathbf{y}_i\right)$
-
-      Overall Recall
-      $TPR = \frac{TP}{TP + FN} = \frac{1}{N}\sum_{i=0}^{N-1} \hat{\delta}\left(\hat{\mathbf{y}}_i -
+      Accuracy
+      $ACC = \frac{TP}{TP + FP} = \frac{1}{N}\sum_{i=0}^{N-1} \hat{\delta}\left(\hat{\mathbf{y}}_i -
         \mathbf{y}_i\right)$

-      Overall F1-measure
-      $F1 = 2 \cdot \left(\frac{PPV \cdot TPR}
-        {PPV + TPR}\right)$
-
       Precision by label
       $PPV(\ell) = \frac{TP}{TP + FP} = \frac{\sum_{i=0}^{N-1} \hat{\delta}(\hat{\mathbf{y}}_i - \ell) \cdot \hat{\delta}(\mathbf{y}_i - \ell)}

http://git-wip-us.apache.org/repos/asf/spark/blob/2099e05f/mllib/src/main/scala/org/apache/spark/ml/evaluation/MulticlassClassificationEvaluator.scala

diff --git a/mllib/src/main/scala/org/apache/spark/ml/evaluation/MulticlassClassificationEvaluator.scala b/mllib/src/main/scala/org/apache/spark/ml/evaluation/MulticlassClassificationEvaluator.scala
index 0b84e0a..794b1e7 100644
--- a/mllib/src/main/scala/org/apache/spark/ml/evaluation/MulticlassClassificationEvaluator.scala
+++ b/mllib/src/main/scala/org/apache/spark/ml/evaluation/MulticlassClassificationEvaluator.scala
@@ -39,16 +39,16 @@ class MulticlassClassificationEvaluator @Since("1.5.0") (@Since("1.5.0") overrid
   def this() = this(Identifiable.randomUID("mcEval"))

   /**
-   * param for metric name in evaluation (supports `"f1"` (default), `"precision"`, `"recall"`,
-   * `"weightedPrecision"`, `"weightedRecall"`, `"accuracy"`)
+   * param for metric name in evaluation (supports `"f1"` (default), `"weightedPrecision"`,
+   * `"weightedRecall"`, `"accuracy"`)
    * @group param
    */
   @Since("1.5.0")
   val metricName: Param[String] = {
-    val allowedParams = ParamValidators.inArray(Array("f1", "precision",
-      "recall", "weightedPrecision", "weightedRecall", "accuracy"))
+    val allowedParams = ParamValidators.inArray(Array("f1", "weightedPrecision",
+      "weightedRecall", "accuracy"))
     new Param(this, "metricName", "metric name in evaluation " +
-      "(f1|precision|recall|weightedPrecision|weightedReca