[spark] branch master updated: [R] update package description

2019-02-21 Thread gurwls223
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 927081d  [R] update package description
927081d is described below

commit 927081dd959217ed6bf014557db20026d7e22672
Author: Felix Cheung 
AuthorDate: Thu Feb 21 19:00:36 2019 +0800

[R] update package description

## What changes were proposed in this pull request?

update package description

Closes #23852 from felixcheung/rdesccran.

Authored-by: Felix Cheung 
Signed-off-by: Hyukjin Kwon 
---
 R/pkg/DESCRIPTION | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/R/pkg/DESCRIPTION b/R/pkg/DESCRIPTION
index 736da46..4d48cd7 100644
--- a/R/pkg/DESCRIPTION
+++ b/R/pkg/DESCRIPTION
@@ -1,8 +1,8 @@
 Package: SparkR
 Type: Package
 Version: 3.0.0
-Title: R Frontend for Apache Spark
-Description: Provides an R Frontend for Apache Spark.
+Title: R Front end for 'Apache Spark'
+Description: Provides an R Front end for 'Apache Spark' 
.
 Authors@R: c(person("Shivaram", "Venkataraman", role = c("aut", "cre"),
 email = "shiva...@cs.berkeley.edu"),
  person("Xiangrui", "Meng", role = "aut",
@@ -11,8 +11,8 @@ Authors@R: c(person("Shivaram", "Venkataraman", role = 
c("aut", "cre"),
 email = "felixche...@apache.org"),
  person(family = "The Apache Software Foundation", role = c("aut", 
"cph")))
 License: Apache License (== 2.0)
-URL: http://www.apache.org/ http://spark.apache.org/
-BugReports: http://spark.apache.org/contributing.html
+URL: https://www.apache.org/ https://spark.apache.org/
+BugReports: https://spark.apache.org/contributing.html
 SystemRequirements: Java (== 8)
 Depends:
 R (>= 3.1),


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated: [SPARK-26917][SQL] Cache lock recache by condition

2019-02-21 Thread srowen
This is an automated email from the ASF dual-hosted git repository.

srowen pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 17d0cfc  [SPARK-26917][SQL] Cache lock recache by condition
17d0cfc is described below

commit 17d0cfcaa4a43fd55b81065d907538a9c1bf569b
Author: Dave DeCaprio 
AuthorDate: Thu Feb 21 09:04:50 2019 -0600

[SPARK-26917][SQL] Cache lock recache by condition

## What changes were proposed in this pull request?

Related to SPARK-26617 and SPARK-26548. We found another place where we were still
seeing lock contention and traced it to the recacheByCondition function. In this PR
I have changed that function so that the writeLock is not held while the condition
is being evaluated.
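
A minimal, self-contained Scala sketch of the locking pattern described above: evaluate
the (possibly expensive) condition under a read lock only, then take the write lock just
long enough to remove each matching entry. `SimpleCache` and `Entry` are illustrative
names, not Spark's CacheManager/CachedData classes.

```scala
import java.util.concurrent.locks.ReentrantReadWriteLock
import scala.collection.mutable

// Illustrative stand-in for a cached entry (the real CachedData holds a plan).
case class Entry(name: String)

class SimpleCache {
  private val lock = new ReentrantReadWriteLock()
  private val entries = mutable.Buffer[Entry]()

  private def readLocked[T](f: => T): T = {
    val l = lock.readLock(); l.lock()
    try f finally l.unlock()
  }

  private def writeLocked[T](f: => T): T = {
    val l = lock.writeLock(); l.lock()
    try f finally l.unlock()
  }

  def add(e: Entry): Unit = writeLocked { entries += e }

  // The condition is evaluated while holding only the read lock; the write lock is
  // taken briefly per removal, so a slow predicate cannot block other readers.
  def removeIf(condition: Entry => Boolean): Seq[Entry] = {
    val toRemove = readLocked { entries.filter(condition).toList }
    toRemove.foreach { e => writeLocked { entries -= e } }
    toRemove
  }
}
```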

cloud-fan & gatorsmile This is a further tweak to the other cache PRs we 
have done (which have helped us tremendously).

## How was this patch tested?

Has been tested on a live system where the blocking was causing major 
issues and it is working well.
CacheManager has no explicit unit test but is used in many places 
internally as part of the SharedState.

Closes #23833 from DaveDeCaprio/cache-lock-recacheByCondition.

Lead-authored-by: Dave DeCaprio 
Co-authored-by: David DeCaprio 
Signed-off-by: Sean Owen 
---
 .../org/apache/spark/sql/execution/CacheManager.scala| 16 ++--
 1 file changed, 10 insertions(+), 6 deletions(-)

diff --git 
a/sql/core/src/main/scala/org/apache/spark/sql/execution/CacheManager.scala 
b/sql/core/src/main/scala/org/apache/spark/sql/execution/CacheManager.scala
index 398d7b4..c6ee735 100644
--- a/sql/core/src/main/scala/org/apache/spark/sql/execution/CacheManager.scala
+++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/CacheManager.scala
@@ -145,17 +145,19 @@ class CacheManager extends Logging {
 _.sameResult(plan)
   }
 val plansToUncache = mutable.Buffer[CachedData]()
-writeLock {
+readLock {
   val it = cachedData.iterator()
   while (it.hasNext) {
 val cd = it.next()
 if (shouldRemove(cd.plan)) {
   plansToUncache += cd
-  it.remove()
 }
   }
 }
 plansToUncache.foreach { cd =>
+  writeLock {
+cachedData.remove(cd)
+  }
   cd.cachedRepresentation.cacheBuilder.clearCache(blocking)
 }
 // Re-compile dependent cached queries after removing the cached query.
@@ -193,19 +195,21 @@ class CacheManager extends Logging {
   spark: SparkSession,
   condition: CachedData => Boolean): Unit = {
 val needToRecache = scala.collection.mutable.ArrayBuffer.empty[CachedData]
-writeLock {
+readLock {
   val it = cachedData.iterator()
   while (it.hasNext) {
 val cd = it.next()
 if (condition(cd)) {
   needToRecache += cd
-  // Remove the cache entry before we create a new one, so that we can 
have a different
-  // physical plan.
-  it.remove()
 }
   }
 }
 needToRecache.map { cd =>
+  writeLock {
+// Remove the cache entry before we create a new one, so that we can 
have a different
+// physical plan.
+cachedData.remove(cd)
+  }
   cd.cachedRepresentation.cacheBuilder.clearCache()
   val plan = spark.sessionState.executePlan(cd.plan).executedPlan
   val newCache = InMemoryRelation(


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated: [DOCS] MINOR Complement the document of stringOrderType for StringIndexer in PySpark

2019-02-21 Thread holden
This is an automated email from the ASF dual-hosted git repository.

holden pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 91caf0b  [DOCS] MINOR Complement the document of stringOrderType for 
StringIndexer in PySpark
91caf0b is described below

commit 91caf0bfce4706a264fcfe222fa500354ce69ff1
Author: Liang-Chi Hsieh 
AuthorDate: Thu Feb 21 08:36:48 2019 -0800

[DOCS] MINOR Complement the document of stringOrderType for StringIndexer 
in PySpark

## What changes were proposed in this pull request?

We revised the behavior of the `stringOrderType` param of `StringIndexer` for the case
of equal frequencies under frequencyDesc/Asc. This isn't reflected in PySpark's
documentation, so this PR updates it.
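
A small Scala example of the behavior being documented (the same param exists in the
Scala and Python APIs); the data and session setup here are illustrative:

```scala
import org.apache.spark.ml.feature.StringIndexer
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local[*]").appName("stringOrderType").getOrCreate()
import spark.implicits._

// "a" and "b" both occur twice, "c" once. Under the default frequencyDesc ordering,
// the tie between "a" and "b" is broken alphabetically as described above, so
// "a" -> 0.0, "b" -> 1.0, and the least frequent "c" -> 2.0.
val df = Seq("a", "b", "a", "b", "c").toDF("category")

val indexer = new StringIndexer()
  .setInputCol("category")
  .setOutputCol("categoryIndex")
  .setStringOrderType("frequencyDesc")

indexer.fit(df).transform(df).show()
```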

## How was this patch tested?

Only document change.

Closes #23849 from viirya/py-stringindexer-doc.

Authored-by: Liang-Chi Hsieh 
Signed-off-by: Holden Karau 
---
 python/pyspark/ml/feature.py | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/python/pyspark/ml/feature.py b/python/pyspark/ml/feature.py
index 0d1e9bd..8583046 100755
--- a/python/pyspark/ml/feature.py
+++ b/python/pyspark/ml/feature.py
@@ -2299,7 +2299,10 @@ class _StringIndexerParams(JavaParams, HasHandleInvalid, 
HasInputCol, HasOutputC
 stringOrderType = Param(Params._dummy(), "stringOrderType",
 "How to order labels of string column. The first 
label after " +
 "ordering is assigned an index of 0. Supported 
options: " +
-"frequencyDesc, frequencyAsc, alphabetDesc, 
alphabetAsc.",
+"frequencyDesc, frequencyAsc, alphabetDesc, 
alphabetAsc. " +
+"Default is frequencyDesc. In case of equal 
frequency when " +
+"under frequencyDesc/Asc, the strings are further 
sorted " +
+"alphabetically",
 typeConverter=TypeConverters.toString)
 
 handleInvalid = Param(Params._dummy(), "handleInvalid", "how to handle 
invalid data (unseen " +


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch branch-2.4 updated: [R][BACKPORT-2.4] update package description

2019-02-21 Thread felixcheung
This is an automated email from the ASF dual-hosted git repository.

felixcheung pushed a commit to branch branch-2.4
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-2.4 by this push:
 new d8576301 [R][BACKPORT-2.4] update package description
d8576301 is described below

commit d8576301fd1d33675a9542791e58e7963081ce04
Author: Felix Cheung 
AuthorDate: Thu Feb 21 08:42:15 2019 -0800

[R][BACKPORT-2.4] update package description

#23852

doesn't port cleanly to 2.4. we need this in branch-2.4 and branch-2.3

Author: Felix Cheung 

Closes #23860 from felixcheung/2.4rdesc.
---
 R/pkg/DESCRIPTION | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/R/pkg/DESCRIPTION b/R/pkg/DESCRIPTION
index 2361289..5e3d186 100644
--- a/R/pkg/DESCRIPTION
+++ b/R/pkg/DESCRIPTION
@@ -1,8 +1,8 @@
 Package: SparkR
 Type: Package
 Version: 2.4.2
-Title: R Frontend for Apache Spark
-Description: Provides an R Frontend for Apache Spark.
+Title: R Front end for 'Apache Spark'
+Description: Provides an R Front end for 'Apache Spark' 
.
 Authors@R: c(person("Shivaram", "Venkataraman", role = c("aut", "cre"),
 email = "shiva...@cs.berkeley.edu"),
  person("Xiangrui", "Meng", role = "aut",
@@ -11,8 +11,8 @@ Authors@R: c(person("Shivaram", "Venkataraman", role = 
c("aut", "cre"),
 email = "felixche...@apache.org"),
  person(family = "The Apache Software Foundation", role = c("aut", 
"cph")))
 License: Apache License (== 2.0)
-URL: http://www.apache.org/ http://spark.apache.org/
-BugReports: http://spark.apache.org/contributing.html
+URL: https://www.apache.org/ https://spark.apache.org/
+BugReports: https://spark.apache.org/contributing.html
 SystemRequirements: Java (== 8)
 Depends:
 R (>= 3.0),


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch branch-2.3 updated: [R][BACKPORT-2.4] update package description

2019-02-21 Thread felixcheung
This is an automated email from the ASF dual-hosted git repository.

felixcheung pushed a commit to branch branch-2.3
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-2.3 by this push:
 new 6691c04  [R][BACKPORT-2.4] update package description
6691c04 is described below

commit 6691c041dcab4c19d362aaff74f56e5beeda85cd
Author: Felix Cheung 
AuthorDate: Thu Feb 21 08:42:15 2019 -0800

[R][BACKPORT-2.4] update package description

doesn't port cleanly to 2.4. we need this in branch-2.4 and branch-2.3

Author: Felix Cheung 

Closes #23860 from felixcheung/2.4rdesc.

(cherry picked from commit d8576301fd1d33675a9542791e58e7963081ce04)
Signed-off-by: Felix Cheung 
---
 R/pkg/DESCRIPTION | 6 ++
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/R/pkg/DESCRIPTION b/R/pkg/DESCRIPTION
index a82446e..136d782 100644
--- a/R/pkg/DESCRIPTION
+++ b/R/pkg/DESCRIPTION
@@ -1,8 +1,6 @@
 Package: SparkR
 Type: Package
 Version: 2.3.4
-Title: R Frontend for Apache Spark
-Description: Provides an R Frontend for Apache Spark.
 Authors@R: c(person("Shivaram", "Venkataraman", role = c("aut", "cre"),
 email = "shiva...@cs.berkeley.edu"),
  person("Xiangrui", "Meng", role = "aut",
@@ -11,8 +9,8 @@ Authors@R: c(person("Shivaram", "Venkataraman", role = 
c("aut", "cre"),
 email = "felixche...@apache.org"),
  person(family = "The Apache Software Foundation", role = c("aut", 
"cph")))
 License: Apache License (== 2.0)
-URL: http://www.apache.org/ http://spark.apache.org/
-BugReports: http://spark.apache.org/contributing.html
+URL: https://www.apache.org/ https://spark.apache.org/
+BugReports: https://spark.apache.org/contributing.html
 SystemRequirements: Java (== 8)
 Depends:
 R (>= 3.0),


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] tag v2.4.1-rc4 created (now 79c1f7e)

2019-02-21 Thread dbtsai
This is an automated email from the ASF dual-hosted git repository.

dbtsai pushed a change to tag v2.4.1-rc4
in repository https://gitbox.apache.org/repos/asf/spark.git.


  at 79c1f7e  (commit)
This tag includes the following new commits:

 new 79c1f7e  Preparing Spark release v2.4.1-rc4

The 1 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails.  The revisions
listed as "add" were already present in the repository and have only
been added to this reference.



-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] 01/01: Preparing Spark release v2.4.1-rc4

2019-02-21 Thread dbtsai
This is an automated email from the ASF dual-hosted git repository.

dbtsai pushed a commit to tag v2.4.1-rc4
in repository https://gitbox.apache.org/repos/asf/spark.git

commit 79c1f7e89c6c15704d046fa4d334cacce3d19217
Author: DB Tsai 
AuthorDate: Thu Feb 21 23:01:58 2019 +

Preparing Spark release v2.4.1-rc4
---
 R/pkg/DESCRIPTION  | 2 +-
 assembly/pom.xml   | 2 +-
 common/kvstore/pom.xml | 2 +-
 common/network-common/pom.xml  | 2 +-
 common/network-shuffle/pom.xml | 2 +-
 common/network-yarn/pom.xml| 2 +-
 common/sketch/pom.xml  | 2 +-
 common/tags/pom.xml| 2 +-
 common/unsafe/pom.xml  | 2 +-
 core/pom.xml   | 2 +-
 docs/_config.yml   | 4 ++--
 examples/pom.xml   | 2 +-
 external/avro/pom.xml  | 2 +-
 external/docker-integration-tests/pom.xml  | 2 +-
 external/flume-assembly/pom.xml| 2 +-
 external/flume-sink/pom.xml| 2 +-
 external/flume/pom.xml | 2 +-
 external/kafka-0-10-assembly/pom.xml   | 2 +-
 external/kafka-0-10-sql/pom.xml| 2 +-
 external/kafka-0-10/pom.xml| 2 +-
 external/kafka-0-8-assembly/pom.xml| 2 +-
 external/kafka-0-8/pom.xml | 2 +-
 external/kinesis-asl-assembly/pom.xml  | 2 +-
 external/kinesis-asl/pom.xml   | 2 +-
 external/spark-ganglia-lgpl/pom.xml| 2 +-
 graphx/pom.xml | 2 +-
 hadoop-cloud/pom.xml   | 2 +-
 launcher/pom.xml   | 2 +-
 mllib-local/pom.xml| 2 +-
 mllib/pom.xml  | 2 +-
 pom.xml| 2 +-
 python/pyspark/version.py  | 2 +-
 repl/pom.xml   | 2 +-
 resource-managers/kubernetes/core/pom.xml  | 2 +-
 resource-managers/kubernetes/integration-tests/pom.xml | 2 +-
 resource-managers/mesos/pom.xml| 2 +-
 resource-managers/yarn/pom.xml | 2 +-
 sql/catalyst/pom.xml   | 2 +-
 sql/core/pom.xml   | 2 +-
 sql/hive-thriftserver/pom.xml  | 2 +-
 sql/hive/pom.xml   | 2 +-
 streaming/pom.xml  | 2 +-
 tools/pom.xml  | 2 +-
 43 files changed, 44 insertions(+), 44 deletions(-)

diff --git a/R/pkg/DESCRIPTION b/R/pkg/DESCRIPTION
index 5e3d186..be924c9 100644
--- a/R/pkg/DESCRIPTION
+++ b/R/pkg/DESCRIPTION
@@ -1,6 +1,6 @@
 Package: SparkR
 Type: Package
-Version: 2.4.2
+Version: 2.4.1
 Title: R Front end for 'Apache Spark'
 Description: Provides an R Front end for 'Apache Spark' 
.
 Authors@R: c(person("Shivaram", "Venkataraman", role = c("aut", "cre"),
diff --git a/assembly/pom.xml b/assembly/pom.xml
index cdf..8e11fd6 100644
--- a/assembly/pom.xml
+++ b/assembly/pom.xml
@@ -21,7 +21,7 @@
   
 org.apache.spark
 spark-parent_2.11
-2.4.2-SNAPSHOT
+2.4.1
 ../pom.xml
   
 
diff --git a/common/kvstore/pom.xml b/common/kvstore/pom.xml
index 092f85b..f0eee07 100644
--- a/common/kvstore/pom.xml
+++ b/common/kvstore/pom.xml
@@ -22,7 +22,7 @@
   
 org.apache.spark
 spark-parent_2.11
-2.4.2-SNAPSHOT
+2.4.1
 ../../pom.xml
   
 
diff --git a/common/network-common/pom.xml b/common/network-common/pom.xml
index 5236fd6..8c8bdf4 100644
--- a/common/network-common/pom.xml
+++ b/common/network-common/pom.xml
@@ -22,7 +22,7 @@
   
 org.apache.spark
 spark-parent_2.11
-2.4.2-SNAPSHOT
+2.4.1
 ../../pom.xml
   
 
diff --git a/common/network-shuffle/pom.xml b/common/network-shuffle/pom.xml
index b70dadf..663f41d 100644
--- a/common/network-shuffle/pom.xml
+++ b/common/network-shuffle/pom.xml
@@ -22,7 +22,7 @@
   
 org.apache.spark
 spark-parent_2.11
-2.4.2-SNAPSHOT
+2.4.1
 ../../pom.xml
   
 
diff --git a/common/network-yarn/pom.xml b/common/network-yarn/pom.xml
index e9ae143..9acade1 100644
--- a/common/network-yarn/pom.xml
+++ b/common/network-yarn/pom.xml
@@ -22,7 +22,7 @@
   
 org.apache.spark
 spark-parent_2.11
-2.4.2-SNAPSHOT
+2.4.1
 ../../pom.xml
   
 
diff --git a/common/sketch/pom.xml b/common/sketch/pom.xml
index 2ae4fcb..1a31a39 100644
--

[spark] 01/01: Preparing development version 2.4.2-SNAPSHOT

2019-02-21 Thread dbtsai
This is an automated email from the ASF dual-hosted git repository.

dbtsai pushed a commit to branch branch-2.4
in repository https://gitbox.apache.org/repos/asf/spark.git

commit 32825442283fcd4c244c59ad7c4c9331538e790d
Author: DB Tsai 
AuthorDate: Thu Feb 21 23:02:17 2019 +

Preparing development version 2.4.2-SNAPSHOT
---
 R/pkg/DESCRIPTION  | 2 +-
 assembly/pom.xml   | 2 +-
 common/kvstore/pom.xml | 2 +-
 common/network-common/pom.xml  | 2 +-
 common/network-shuffle/pom.xml | 2 +-
 common/network-yarn/pom.xml| 2 +-
 common/sketch/pom.xml  | 2 +-
 common/tags/pom.xml| 2 +-
 common/unsafe/pom.xml  | 2 +-
 core/pom.xml   | 2 +-
 docs/_config.yml   | 4 ++--
 examples/pom.xml   | 2 +-
 external/avro/pom.xml  | 2 +-
 external/docker-integration-tests/pom.xml  | 2 +-
 external/flume-assembly/pom.xml| 2 +-
 external/flume-sink/pom.xml| 2 +-
 external/flume/pom.xml | 2 +-
 external/kafka-0-10-assembly/pom.xml   | 2 +-
 external/kafka-0-10-sql/pom.xml| 2 +-
 external/kafka-0-10/pom.xml| 2 +-
 external/kafka-0-8-assembly/pom.xml| 2 +-
 external/kafka-0-8/pom.xml | 2 +-
 external/kinesis-asl-assembly/pom.xml  | 2 +-
 external/kinesis-asl/pom.xml   | 2 +-
 external/spark-ganglia-lgpl/pom.xml| 2 +-
 graphx/pom.xml | 2 +-
 hadoop-cloud/pom.xml   | 2 +-
 launcher/pom.xml   | 2 +-
 mllib-local/pom.xml| 2 +-
 mllib/pom.xml  | 2 +-
 pom.xml| 2 +-
 python/pyspark/version.py  | 2 +-
 repl/pom.xml   | 2 +-
 resource-managers/kubernetes/core/pom.xml  | 2 +-
 resource-managers/kubernetes/integration-tests/pom.xml | 2 +-
 resource-managers/mesos/pom.xml| 2 +-
 resource-managers/yarn/pom.xml | 2 +-
 sql/catalyst/pom.xml   | 2 +-
 sql/core/pom.xml   | 2 +-
 sql/hive-thriftserver/pom.xml  | 2 +-
 sql/hive/pom.xml   | 2 +-
 streaming/pom.xml  | 2 +-
 tools/pom.xml  | 2 +-
 43 files changed, 44 insertions(+), 44 deletions(-)

diff --git a/R/pkg/DESCRIPTION b/R/pkg/DESCRIPTION
index be924c9..5e3d186 100644
--- a/R/pkg/DESCRIPTION
+++ b/R/pkg/DESCRIPTION
@@ -1,6 +1,6 @@
 Package: SparkR
 Type: Package
-Version: 2.4.1
+Version: 2.4.2
 Title: R Front end for 'Apache Spark'
 Description: Provides an R Front end for 'Apache Spark' 
.
 Authors@R: c(person("Shivaram", "Venkataraman", role = c("aut", "cre"),
diff --git a/assembly/pom.xml b/assembly/pom.xml
index 8e11fd6..cdf 100644
--- a/assembly/pom.xml
+++ b/assembly/pom.xml
@@ -21,7 +21,7 @@
   
 org.apache.spark
 spark-parent_2.11
-2.4.1
+2.4.2-SNAPSHOT
 ../pom.xml
   
 
diff --git a/common/kvstore/pom.xml b/common/kvstore/pom.xml
index f0eee07..092f85b 100644
--- a/common/kvstore/pom.xml
+++ b/common/kvstore/pom.xml
@@ -22,7 +22,7 @@
   
 org.apache.spark
 spark-parent_2.11
-2.4.1
+2.4.2-SNAPSHOT
 ../../pom.xml
   
 
diff --git a/common/network-common/pom.xml b/common/network-common/pom.xml
index 8c8bdf4..5236fd6 100644
--- a/common/network-common/pom.xml
+++ b/common/network-common/pom.xml
@@ -22,7 +22,7 @@
   
 org.apache.spark
 spark-parent_2.11
-2.4.1
+2.4.2-SNAPSHOT
 ../../pom.xml
   
 
diff --git a/common/network-shuffle/pom.xml b/common/network-shuffle/pom.xml
index 663f41d..b70dadf 100644
--- a/common/network-shuffle/pom.xml
+++ b/common/network-shuffle/pom.xml
@@ -22,7 +22,7 @@
   
 org.apache.spark
 spark-parent_2.11
-2.4.1
+2.4.2-SNAPSHOT
 ../../pom.xml
   
 
diff --git a/common/network-yarn/pom.xml b/common/network-yarn/pom.xml
index 9acade1..e9ae143 100644
--- a/common/network-yarn/pom.xml
+++ b/common/network-yarn/pom.xml
@@ -22,7 +22,7 @@
   
 org.apache.spark
 spark-parent_2.11
-2.4.1
+2.4.2-SNAPSHOT
 ../../pom.xml
   
 
diff --git a/common/sketch/pom.xml b/common/sketch/pom.xml
index 1a31a39..2ae4

[spark] branch branch-2.4 updated (d8576301 -> 3282544)

2019-02-21 Thread dbtsai
This is an automated email from the ASF dual-hosted git repository.

dbtsai pushed a change to branch branch-2.4
in repository https://gitbox.apache.org/repos/asf/spark.git.


from d8576301 [R][BACKPORT-2.4] update package description
 add 79c1f7e  Preparing Spark release v2.4.1-rc4
 new 3282544  Preparing development version 2.4.2-SNAPSHOT

The 1 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails.  The revisions
listed as "add" were already present in the repository and have only
been added to this reference.


Summary of changes:


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated: [SPARK-26958][SQL][TEST] Add NestedSchemaPruningBenchmark

2019-02-21 Thread dbtsai
This is an automated email from the ASF dual-hosted git repository.

dbtsai pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 6bd995b  [SPARK-26958][SQL][TEST] Add NestedSchemaPruningBenchmark
6bd995b is described below

commit 6bd995b1013f3753bf1df68d75b4bd9cd2337fae
Author: Dongjoon Hyun 
AuthorDate: Thu Feb 21 23:39:36 2019 +

[SPARK-26958][SQL][TEST] Add NestedSchemaPruningBenchmark

## What changes were proposed in this pull request?

This adds `NestedSchemaPruningBenchmark` to show nested schema pruning performance
clearly, to verify the performance benefit of new PRs, and to help prevent future
performance regressions.
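
For context, a rough Scala sketch of the query shapes such a benchmark exercises:
selecting a single top-level column versus a single leaf of a nested struct, where
schema pruning can skip the rest of the struct. The schema, row count, and path below
are illustrative, not the benchmark's exact setup.

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local[*]").appName("pruning-sketch").getOrCreate()

// A top-level column plus a struct column that mirrors it.
val df = spark.range(1000000)
  .selectExpr("id AS col1", "named_struct('col1', id, 'col2', id + 1) AS nested")
df.write.mode("overwrite").parquet("/tmp/pruning-sketch")

val loaded = spark.read.parquet("/tmp/pruning-sketch")
loaded.select("col1").count()         // top-level column case
loaded.select("nested.col1").count()  // nested column case, where pruning pays off
```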

## How was this patch tested?

Manually run the benchmark.

```
SPARK_GENERATE_BENCHMARK_FILES=1 build/sbt "sql/test:runMain org.apache.spark.sql.execution.benchmark.NestedSchemaPruningBenchmark"
```

Closes #23862 from dongjoon-hyun/SPARK-NESTED-SCHEMA-PRUNING-BM.

Lead-authored-by: Dongjoon Hyun 
Co-authored-by: DB Tsai 
Signed-off-by: DB Tsai 
---
 .../NestedSchemaPruningBenchmark-results.txt   |  40 +
 .../benchmark/NestedSchemaPruningBenchmark.scala   | 163 +
 2 files changed, 203 insertions(+)

diff --git a/sql/core/benchmarks/NestedSchemaPruningBenchmark-results.txt 
b/sql/core/benchmarks/NestedSchemaPruningBenchmark-results.txt
new file mode 100644
index 000..7585cae
--- /dev/null
+++ b/sql/core/benchmarks/NestedSchemaPruningBenchmark-results.txt
@@ -0,0 +1,40 @@
+
+Nested Schema Pruning Benchmark
+
+
+Java HotSpot(TM) 64-Bit Server VM 1.8.0_201-b09 on Mac OS X 10.14.3
+Intel(R) Core(TM) i9-8950HK CPU @ 2.90GHz
+Selection:                         Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
+
+Top-level column                          59 /   68         16.9          59.1       1.0X
+Nested column                            180 /  186          5.6         179.7       0.3X
+
+Java HotSpot(TM) 64-Bit Server VM 1.8.0_201-b09 on Mac OS X 10.14.3
+Intel(R) Core(TM) i9-8950HK CPU @ 2.90GHz
+Limiting:                          Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
+
+Top-level column                         241 /  246          4.2         240.9       1.0X
+Nested column                           1828 / 1904          0.5        1827.5       0.1X
+
+Java HotSpot(TM) 64-Bit Server VM 1.8.0_201-b09 on Mac OS X 10.14.3
+Intel(R) Core(TM) i9-8950HK CPU @ 2.90GHz
+Repartitioning:                    Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
+
+Top-level column                         201 /  208          5.0         200.8       1.0X
+Nested column                           1811 / 1864          0.6        1811.4       0.1X
+
+Java HotSpot(TM) 64-Bit Server VM 1.8.0_201-b09 on Mac OS X 10.14.3
+Intel(R) Core(TM) i9-8950HK CPU @ 2.90GHz
+Repartitioning by exprs:           Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
+
+Top-level column                         206 /  212          4.9         205.8       1.0X
+Nested column                           1814 / 1863          0.6        1814.3       0.1X
+
+Java HotSpot(TM) 64-Bit Server VM 1.8.0_201-b09 on Mac OS X 10.14.3
+Intel(R) Core(TM) i9-8950HK CPU @ 2.90GHz
+Sorting:                           Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
+
+Top-level column                         282 /  302          3.5         281.7       1.0X
+Nested column                           2093 / 2199          0.5        2093.1       0.1X
+
+
diff --git 
a/sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/NestedSchemaPruningBenchmark.scala
 
b/sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/NestedSchemaPruningBenchmark.scala
new file mode 100644
index 000..ddfc8ae
--- /dev/null
+++ 
b/sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/NestedSchemaPruningBenchmark.scala
@@ -0,0 +1,163 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for addition


[spark] branch master updated: [SPARK-26960][ML] Wait for listener bus to clear in MLEventsSuite to reduce test flakiness

2019-02-21 Thread gurwls223
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new be1cadf  [SPARK-26960][ML] Wait for listener bus to clear in 
MLEventsSuite to reduce test flakiness
be1cadf is described below

commit be1cadf16dc70e22eae144b3dfce9e269ef95acc
Author: Joseph K. Bradley 
AuthorDate: Fri Feb 22 10:08:16 2019 +0800

[SPARK-26960][ML] Wait for listener bus to clear in MLEventsSuite to reduce 
test flakiness

## What changes were proposed in this pull request?

This patch aims to address flakiness I've observed in MLEventsSuite in 
these tests:
*  test("pipeline read/write events")
*  test("pipeline model read/write events")

The issue is in the "read/write events" tests, which work as follows:
* write
* wait until we see at least 1 write-related SparkListenerEvent
* read
* wait until we see at least 1 read-related SparkListenerEvent

The problem is that the last step does NOT allow any write-related 
SparkListenerEvents, but some of those events may be delayed enough that they 
are seen in this last step. We should ideally add logic before "read" to wait 
until the listener events are cleared/complete. Looking into other 
SparkListener tests, we need to use `sc.listenerBus.waitUntilEmpty(TIMEOUT)`.

This patch adds the waitUntilEmpty() call.
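
A minimal Scala sketch of that wait-then-proceed pattern in a listener-based test (not
MLEventsSuite itself): record events with a listener, then drain the bus before starting
the next phase so delayed events cannot leak into the next assertion window. The 10000 ms
timeout is illustrative, and `listenerBus` is `private[spark]`, so code like this only
compiles from a package under org.apache.spark, as the suite's does.

```scala
import scala.collection.mutable

import org.apache.spark.SparkContext
import org.apache.spark.scheduler.{SparkListener, SparkListenerEvent}

// Runs one test phase (e.g. a write), collects the events it produced, and waits for
// the listener bus to drain before the caller moves on to the next phase.
def collectPhaseEvents(sc: SparkContext)(phase: => Unit): Seq[SparkListenerEvent] = {
  val events = mutable.ArrayBuffer[SparkListenerEvent]()
  val listener = new SparkListener {
    override def onOtherEvent(event: SparkListenerEvent): Unit = events += event
  }
  sc.addSparkListener(listener)
  try {
    phase                                  // e.g. pipeline.write.save(path)
    sc.listenerBus.waitUntilEmpty(10000L)  // let delayed events arrive first
  } finally {
    sc.removeSparkListener(listener)
  }
  events.toSeq
}
```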

## How was this patch tested?

It's a test!

Closes #23863 from jkbradley/SPARK-26960.

Authored-by: Joseph K. Bradley 
Signed-off-by: Hyukjin Kwon 
---
 mllib/src/test/scala/org/apache/spark/ml/MLEventsSuite.scala | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/mllib/src/test/scala/org/apache/spark/ml/MLEventsSuite.scala 
b/mllib/src/test/scala/org/apache/spark/ml/MLEventsSuite.scala
index 80ae0c7..4fe69b6 100644
--- a/mllib/src/test/scala/org/apache/spark/ml/MLEventsSuite.scala
+++ b/mllib/src/test/scala/org/apache/spark/ml/MLEventsSuite.scala
@@ -239,6 +239,7 @@ class MLEventsSuite
   events.map(JsonProtocol.sparkEventToJson).foreach { event =>
 assert(JsonProtocol.sparkEventFromJson(event).isInstanceOf[MLEvent])
   }
+  sc.listenerBus.waitUntilEmpty(timeoutMillis = 1)
 
   events.clear()
   val pipelineReader = Pipeline.read
@@ -297,6 +298,7 @@ class MLEventsSuite
   events.map(JsonProtocol.sparkEventToJson).foreach { event =>
 assert(JsonProtocol.sparkEventFromJson(event).isInstanceOf[MLEvent])
   }
+  sc.listenerBus.waitUntilEmpty(timeoutMillis = 1)
 
   events.clear()
   val pipelineModelReader = PipelineModel.read


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch branch-2.3 updated: [R][BACKPORT-2.3] update package description

2019-02-21 Thread gurwls223
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch branch-2.3
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-2.3 by this push:
 new 36db45d  [R][BACKPORT-2.3] update package description
36db45d is described below

commit 36db45d5b90ddc3ce54febff2ed41cd29c0a8a04
Author: Felix Cheung 
AuthorDate: Fri Feb 22 10:12:38 2019 +0800

[R][BACKPORT-2.3] update package description

## What changes were proposed in this pull request?

#23852

doesn't port cleanly to 2.3. we need this in branch-2.4 and branch-2.3

Closes #23861 from felixcheung/2.3rdesc.

Authored-by: Felix Cheung 
Signed-off-by: Hyukjin Kwon 
---
 R/pkg/DESCRIPTION | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/R/pkg/DESCRIPTION b/R/pkg/DESCRIPTION
index 136d782..464caf1 100644
--- a/R/pkg/DESCRIPTION
+++ b/R/pkg/DESCRIPTION
@@ -1,6 +1,8 @@
 Package: SparkR
 Type: Package
 Version: 2.3.4
+Title: R Front end for 'Apache Spark'
+Description: Provides an R Front end for 'Apache Spark' 
.
 Authors@R: c(person("Shivaram", "Venkataraman", role = c("aut", "cre"),
 email = "shiva...@cs.berkeley.edu"),
  person("Xiangrui", "Meng", role = "aut",


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch branch-2.4 updated: [R][BACKPORT-2.3] update package description

2019-02-21 Thread gurwls223
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch branch-2.4
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-2.4 by this push:
 new 8d68d54  [R][BACKPORT-2.3] update package description
8d68d54 is described below

commit 8d68d54f2e2cbbe55a4bb87c2216cff896add517
Author: Felix Cheung 
AuthorDate: Fri Feb 22 10:12:38 2019 +0800

[R][BACKPORT-2.3] update package description

doesn't port cleanly to 2.3. we need this in branch-2.4 and branch-2.3

Closes #23861 from felixcheung/2.3rdesc.

Authored-by: Felix Cheung 
Signed-off-by: Hyukjin Kwon 
(cherry picked from commit 36db45d5b90ddc3ce54febff2ed41cd29c0a8a04)
Signed-off-by: Hyukjin Kwon 
---
 R/append/commits/0   | 2 ++
 R/append/commits/1   | 2 ++
 R/append/metadata| 1 +
 R/append/offsets/0   | 3 +++
 R/append/offsets/1   | 3 +++
 R/append/sources/0/0 | 2 ++
 R/append/sources/0/1 | 2 ++
 7 files changed, 15 insertions(+)

diff --git a/R/append/commits/0 b/R/append/commits/0
new file mode 100644
index 000..9c1e302
--- /dev/null
+++ b/R/append/commits/0
@@ -0,0 +1,2 @@
+v1
+{"nextBatchWatermarkMs":0}
\ No newline at end of file
diff --git a/R/append/commits/1 b/R/append/commits/1
new file mode 100644
index 000..9c1e302
--- /dev/null
+++ b/R/append/commits/1
@@ -0,0 +1,2 @@
+v1
+{"nextBatchWatermarkMs":0}
\ No newline at end of file
diff --git a/R/append/metadata b/R/append/metadata
new file mode 100644
index 000..e10d274
--- /dev/null
+++ b/R/append/metadata
@@ -0,0 +1 @@
+{"id":"816b9eb3-4e0e-4419-aa6b-042fe770fe9e"}
\ No newline at end of file
diff --git a/R/append/offsets/0 b/R/append/offsets/0
new file mode 100644
index 000..f725b7e
--- /dev/null
+++ b/R/append/offsets/0
@@ -0,0 +1,3 @@
+v1
+{"batchWatermarkMs":0,"batchTimestampMs":1550545145189,"conf":{"spark.sql.streaming.stateStore.providerClass":"org.apache.spark.sql.execution.streaming.state.HDFSBackedStateStoreProvider","spark.sql.streaming.flatMapGroupsWithState.stateFormatVersion":"2","spark.sql.streaming.multipleWatermarkPolicy":"min","spark.sql.streaming.aggregation.stateFormatVersion":"2","spark.sql.shuffle.partitions":"200"}}
+{"logOffset":0}
\ No newline at end of file
diff --git a/R/append/offsets/1 b/R/append/offsets/1
new file mode 100644
index 000..6a8b0cf
--- /dev/null
+++ b/R/append/offsets/1
@@ -0,0 +1,3 @@
+v1
+{"batchWatermarkMs":0,"batchTimestampMs":1550546700082,"conf":{"spark.sql.streaming.stateStore.providerClass":"org.apache.spark.sql.execution.streaming.state.HDFSBackedStateStoreProvider","spark.sql.streaming.flatMapGroupsWithState.stateFormatVersion":"2","spark.sql.streaming.multipleWatermarkPolicy":"min","spark.sql.streaming.aggregation.stateFormatVersion":"2","spark.sql.shuffle.partitions":"200"}}
+{"logOffset":1}
\ No newline at end of file
diff --git a/R/append/sources/0/0 b/R/append/sources/0/0
new file mode 100644
index 000..72abd6c
--- /dev/null
+++ b/R/append/sources/0/0
@@ -0,0 +1,2 @@
+v1
+{"path":"file:///var/folders/71/484zt4z10ks1vydt03bhp6hrgp/T/RtmpYrC5NR/sparkr-test403b46ee34f0.parquet/part-0-b8e0fa75-2067-4518-abc9-9f187ef289c4-c000.snappy.parquet","timestamp":1550545144000,"batchId":0}
\ No newline at end of file
diff --git a/R/append/sources/0/1 b/R/append/sources/0/1
new file mode 100644
index 000..b336c6b
--- /dev/null
+++ b/R/append/sources/0/1
@@ -0,0 +1,2 @@
+v1
+{"path":"file:///var/folders/71/484zt4z10ks1vydt03bhp6hrgp/T/RtmpDDmJpK/sparkr-testb1994d9aae56.parquet/part-0-9f3a8856-ef41-47d7-86a1-6f5a9ae8501d-c000.snappy.parquet","timestamp":1550546699000,"batchId":1}
\ No newline at end of file


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch branch-2.4 updated: Revert "[R][BACKPORT-2.3] update package description"

2019-02-21 Thread gurwls223
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch branch-2.4
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-2.4 by this push:
 new b403612  Revert "[R][BACKPORT-2.3] update package description"
b403612 is described below

commit b40361249f8b9dae2ade1e8579c9a271e248b5a9
Author: Hyukjin Kwon 
AuthorDate: Fri Feb 22 10:14:56 2019 +0800

Revert "[R][BACKPORT-2.3] update package description"

This reverts commit 8d68d54f2e2cbbe55a4bb87c2216cff896add517.
---
 R/append/commits/0   | 2 --
 R/append/commits/1   | 2 --
 R/append/metadata| 1 -
 R/append/offsets/0   | 3 ---
 R/append/offsets/1   | 3 ---
 R/append/sources/0/0 | 2 --
 R/append/sources/0/1 | 2 --
 7 files changed, 15 deletions(-)

diff --git a/R/append/commits/0 b/R/append/commits/0
deleted file mode 100644
index 9c1e302..000
--- a/R/append/commits/0
+++ /dev/null
@@ -1,2 +0,0 @@
-v1
-{"nextBatchWatermarkMs":0}
\ No newline at end of file
diff --git a/R/append/commits/1 b/R/append/commits/1
deleted file mode 100644
index 9c1e302..000
--- a/R/append/commits/1
+++ /dev/null
@@ -1,2 +0,0 @@
-v1
-{"nextBatchWatermarkMs":0}
\ No newline at end of file
diff --git a/R/append/metadata b/R/append/metadata
deleted file mode 100644
index e10d274..000
--- a/R/append/metadata
+++ /dev/null
@@ -1 +0,0 @@
-{"id":"816b9eb3-4e0e-4419-aa6b-042fe770fe9e"}
\ No newline at end of file
diff --git a/R/append/offsets/0 b/R/append/offsets/0
deleted file mode 100644
index f725b7e..000
--- a/R/append/offsets/0
+++ /dev/null
@@ -1,3 +0,0 @@
-v1
-{"batchWatermarkMs":0,"batchTimestampMs":1550545145189,"conf":{"spark.sql.streaming.stateStore.providerClass":"org.apache.spark.sql.execution.streaming.state.HDFSBackedStateStoreProvider","spark.sql.streaming.flatMapGroupsWithState.stateFormatVersion":"2","spark.sql.streaming.multipleWatermarkPolicy":"min","spark.sql.streaming.aggregation.stateFormatVersion":"2","spark.sql.shuffle.partitions":"200"}}
-{"logOffset":0}
\ No newline at end of file
diff --git a/R/append/offsets/1 b/R/append/offsets/1
deleted file mode 100644
index 6a8b0cf..000
--- a/R/append/offsets/1
+++ /dev/null
@@ -1,3 +0,0 @@
-v1
-{"batchWatermarkMs":0,"batchTimestampMs":1550546700082,"conf":{"spark.sql.streaming.stateStore.providerClass":"org.apache.spark.sql.execution.streaming.state.HDFSBackedStateStoreProvider","spark.sql.streaming.flatMapGroupsWithState.stateFormatVersion":"2","spark.sql.streaming.multipleWatermarkPolicy":"min","spark.sql.streaming.aggregation.stateFormatVersion":"2","spark.sql.shuffle.partitions":"200"}}
-{"logOffset":1}
\ No newline at end of file
diff --git a/R/append/sources/0/0 b/R/append/sources/0/0
deleted file mode 100644
index 72abd6c..000
--- a/R/append/sources/0/0
+++ /dev/null
@@ -1,2 +0,0 @@
-v1
-{"path":"file:///var/folders/71/484zt4z10ks1vydt03bhp6hrgp/T/RtmpYrC5NR/sparkr-test403b46ee34f0.parquet/part-0-b8e0fa75-2067-4518-abc9-9f187ef289c4-c000.snappy.parquet","timestamp":1550545144000,"batchId":0}
\ No newline at end of file
diff --git a/R/append/sources/0/1 b/R/append/sources/0/1
deleted file mode 100644
index b336c6b..000
--- a/R/append/sources/0/1
+++ /dev/null
@@ -1,2 +0,0 @@
-v1
-{"path":"file:///var/folders/71/484zt4z10ks1vydt03bhp6hrgp/T/RtmpDDmJpK/sparkr-testb1994d9aae56.parquet/part-0-9f3a8856-ef41-47d7-86a1-6f5a9ae8501d-c000.snappy.parquet","timestamp":1550546699000,"batchId":1}
\ No newline at end of file


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated: [MINOR][SQL] Fix typo in exception about set table properties.

2019-02-21 Thread srowen
This is an automated email from the ASF dual-hosted git repository.

srowen pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new f9776e3  [MINOR][SQL] Fix typo in exception about set table properties.
f9776e3 is described below

commit f9776e389215255dc61efaa2eddd92a1fa754b48
Author: gengjiaan 
AuthorDate: Thu Feb 21 22:13:47 2019 -0600

[MINOR][SQL] Fix typo in exception about set table properties.

## What changes were proposed in this pull request?

The documented behavior of the verifyTableProperties method is:

`If the given table properties contains datasource properties, throw an 
exception. We will do this check when create or alter a table, i.e. when we try 
to write table metadata to Hive metastore.`

But the AnalysisException message in verifyTableProperties contains a typo and an
ill-suited word, so this PR changes the exception message from

`Cannot persistent ${table.qualifiedName} into hive metastore`

to

`Cannot persist ${table.qualifiedName} into Hive metastore`

## How was this patch tested?

Please review http://spark.apache.org/contributing.html before opening a 
pull request.

Closes #23574 from beliefer/incorrect-analysis-exception.

Authored-by: gengjiaan 
Signed-off-by: Sean Owen 
---
 .../src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala  | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git 
a/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala 
b/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala
index c1178ad..11a2192 100644
--- 
a/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala
+++ 
b/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala
@@ -128,7 +128,7 @@ private[spark] class HiveExternalCatalog(conf: SparkConf, 
hadoopConf: Configurat
   private def verifyTableProperties(table: CatalogTable): Unit = {
 val invalidKeys = 
table.properties.keys.filter(_.startsWith(SPARK_SQL_PREFIX))
 if (invalidKeys.nonEmpty) {
-  throw new AnalysisException(s"Cannot persistent ${table.qualifiedName} 
into hive metastore " +
+  throw new AnalysisException(s"Cannot persist ${table.qualifiedName} into 
Hive metastore " +
 s"as table property keys may not start with '$SPARK_SQL_PREFIX': " +
 invalidKeys.mkString("[", ", ", "]"))
 }


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated: [SPARK-26955][CORE] Align Spark's TimSort to jdk11 implementation

2019-02-21 Thread srowen
This is an automated email from the ASF dual-hosted git repository.

srowen pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 1304974  [SPARK-26955][CORE] Align Spark's TimSort to jdk11 
implementation
1304974 is described below

commit 13049745391f97601ebc1402e77210638d908e40
Author: Maxim Gekk 
AuthorDate: Thu Feb 21 22:18:23 2019 -0600

[SPARK-26955][CORE] Align Spark's TimSort to jdk11 implementation

## What changes were proposed in this pull request?

Spark's TimSort deviates from the JDK 11 TimSort in a couple of places:
- `stackLen` was increased in the JDK
- `mergeCollapse` gained an additional break condition: `n < 0`

In this PR, I propose to align Spark's TimSort with the JDK implementation.
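
    For context, a minimal standalone sketch of the stack invariant that `mergeCollapse` is meant to restore after each run is pushed; the object and method names here are illustrative, not Spark's or the JDK's code.

```scala
object TimSortInvariantSketch {
  // The run-length stack satisfies the TimSort invariant when, for every i:
  //   runLen(i-2) > runLen(i-1) + runLen(i)  and  runLen(i-1) > runLen(i).
  // mergeCollapse keeps merging adjacent runs until this holds again.
  def holds(runLen: IndexedSeq[Int]): Boolean =
    runLen.indices.forall { i =>
      (i < 2 || runLen(i - 2) > runLen(i - 1) + runLen(i)) &&
      (i < 1 || runLen(i - 1) > runLen(i))
    }

  def main(args: Array[String]): Unit = {
    println(holds(Vector(120, 80, 25)))  // true: no merge needed
    println(holds(Vector(120, 80, 50)))  // false: 120 <= 80 + 50, so a merge is due
  }
}
```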

## How was this patch tested?

By existing test suites, in particular, `SorterSuite`.

Closes #23858 from MaxGekk/timsort-java-alignment.

Authored-by: Maxim Gekk 
Signed-off-by: Sean Owen 
---
 .../java/org/apache/spark/util/collection/TimSort.java  | 17 +
 1 file changed, 13 insertions(+), 4 deletions(-)

diff --git a/core/src/main/java/org/apache/spark/util/collection/TimSort.java 
b/core/src/main/java/org/apache/spark/util/collection/TimSort.java
index 40b5fb7..3142866 100644
--- a/core/src/main/java/org/apache/spark/util/collection/TimSort.java
+++ b/core/src/main/java/org/apache/spark/util/collection/TimSort.java
@@ -409,10 +409,14 @@ class TimSort {
* large) stack lengths for smaller arrays.  The "magic numbers" in the
* computation below must be changed if MIN_MERGE is decreased.  See
* the MIN_MERGE declaration above for more information.
+   * The maximum value of 49 allows for an array up to length
+   * Integer.MAX_VALUE-4, if array is filled by the worst case stack size
+   * increasing scenario. More explanations are given in section 4 of:
+   * http://envisage-project.eu/wp-content/uploads/2015/02/sorting.pdf
*/
   int stackLen = (len <120  ?  5 :
   len <   1542  ? 10 :
-  len < 119151  ? 19 : 40);
+  len < 119151  ? 24 : 49);
   runBase = new int[stackLen];
   runLen = new int[stackLen];
 }
@@ -439,15 +443,20 @@ class TimSort {
  * This method is called each time a new run is pushed onto the stack,
  * so the invariants are guaranteed to hold for i < stackSize upon
  * entry to the method.
+ *
+ * Thanks to Stijn de Gouw, Jurriaan Rot, Frank S. de Boer,
+ * Richard Bubel and Reiner Hahnle, this is fixed with respect to
+ * the analysis in "On the Worst-Case Complexity of TimSort" by
+ * Nicolas Auger, Vincent Jug, Cyril Nicaud, and Carine Pivoteau.
  */
 private void mergeCollapse() {
   while (stackSize > 1) {
 int n = stackSize - 2;
-if ( (n >= 1 && runLen[n-1] <= runLen[n] + runLen[n+1])
-  || (n >= 2 && runLen[n-2] <= runLen[n] + runLen[n-1])) {
+if (n > 0 && runLen[n-1] <= runLen[n] + runLen[n+1] ||
+n > 1 && runLen[n-2] <= runLen[n] + runLen[n-1]) {
   if (runLen[n - 1] < runLen[n + 1])
 n--;
-} else if (runLen[n] > runLen[n + 1]) {
+} else if (n < 0 || runLen[n] > runLen[n + 1]) {
   break; // Invariant is established
 }
 mergeAt(n);


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org




[spark] branch master updated: [SPARK-25097][ML] Support prediction on single instance in KMeans/BiKMeans/GMM

2019-02-21 Thread srowen
This is an automated email from the ASF dual-hosted git repository.

srowen pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 89d42dc  [SPARK-25097][ML] Support prediction on single instance in 
KMeans/BiKMeans/GMM
89d42dc is described below

commit 89d42dc6d38c9508b7009652323d6b343742c5b8
Author: zhengruifeng 
AuthorDate: Thu Feb 21 22:21:28 2019 -0600

[SPARK-25097][ML] Support prediction on single instance in 
KMeans/BiKMeans/GMM

## What changes were proposed in this pull request?
Expose the `predict` method (and `predictProbability` on GaussianMixtureModel) in KMeans/BiKMeans/GMM so a single instance can be scored directly.
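
    As a usage illustration, a hedged sketch assuming Spark 3.0.0 (where these methods become public): a single feature vector can be scored directly on the model, without building a one-row DataFrame. The training data and application name are made up.

```scala
import org.apache.spark.ml.clustering.KMeans
import org.apache.spark.ml.linalg.Vectors
import org.apache.spark.sql.SparkSession

object SingleInstancePredictSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[1]").appName("predict-sketch").getOrCreate()
    import spark.implicits._

    // Toy training data with two obvious clusters (made-up values).
    val train = Seq(
      Vectors.dense(0.0, 0.0), Vectors.dense(0.1, 0.1),
      Vectors.dense(9.0, 9.0), Vectors.dense(9.1, 9.1)
    ).map(Tuple1.apply).toDF("features")

    val model = new KMeans().setK(2).setSeed(1L).fit(train)

    // New in SPARK-25097: score one instance directly on the model.
    val clusterId = model.predict(Vectors.dense(0.05, 0.05))
    println(s"single-instance cluster id: $clusterId")

    spark.stop()
  }
}
```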

## How was this patch tested?
Added test suites.

Closes #22087 from zhengruifeng/clu_pre_instance.

Authored-by: zhengruifeng 
Signed-off-by: Sean Owen 
---
 .../spark/ml/clustering/BisectingKMeans.scala  |  6 ++---
 .../spark/ml/clustering/GaussianMixture.scala  |  6 +++--
 .../org/apache/spark/ml/clustering/KMeans.scala|  7 +++---
 .../spark/ml/clustering/BisectingKMeansSuite.scala |  7 ++
 .../spark/ml/clustering/GaussianMixtureSuite.scala | 10 
 .../apache/spark/ml/clustering/KMeansSuite.scala   |  7 ++
 .../scala/org/apache/spark/ml/util/MLTest.scala| 28 +-
 7 files changed, 61 insertions(+), 10 deletions(-)

diff --git 
a/mllib/src/main/scala/org/apache/spark/ml/clustering/BisectingKMeans.scala 
b/mllib/src/main/scala/org/apache/spark/ml/clustering/BisectingKMeans.scala
index d846f17..03afdbe 100644
--- a/mllib/src/main/scala/org/apache/spark/ml/clustering/BisectingKMeans.scala
+++ b/mllib/src/main/scala/org/apache/spark/ml/clustering/BisectingKMeans.scala
@@ -19,7 +19,6 @@ package org.apache.spark.ml.clustering
 
 import org.apache.hadoop.fs.Path
 
-import org.apache.spark.SparkException
 import org.apache.spark.annotation.{Experimental, Since}
 import org.apache.spark.ml.{Estimator, Model}
 import org.apache.spark.ml.linalg.Vector
@@ -30,7 +29,7 @@ import org.apache.spark.ml.util.Instrumentation.instrumented
 import org.apache.spark.mllib.clustering.{BisectingKMeans => 
MLlibBisectingKMeans,
   BisectingKMeansModel => MLlibBisectingKMeansModel}
 import org.apache.spark.mllib.linalg.VectorImplicits._
-import org.apache.spark.sql.{DataFrame, Dataset, Row}
+import org.apache.spark.sql.{DataFrame, Dataset}
 import org.apache.spark.sql.functions.udf
 import org.apache.spark.sql.types.{IntegerType, StructType}
 
@@ -118,7 +117,8 @@ class BisectingKMeansModel private[ml] (
 validateAndTransformSchema(schema)
   }
 
-  private[clustering] def predict(features: Vector): Int = 
parentModel.predict(features)
+  @Since("3.0.0")
+  def predict(features: Vector): Int = parentModel.predict(features)
 
   @Since("2.0.0")
   def clusterCenters: Array[Vector] = parentModel.clusterCenters.map(_.asML)
diff --git 
a/mllib/src/main/scala/org/apache/spark/ml/clustering/GaussianMixture.scala 
b/mllib/src/main/scala/org/apache/spark/ml/clustering/GaussianMixture.scala
index c27ba55..3d6d1e3 100644
--- a/mllib/src/main/scala/org/apache/spark/ml/clustering/GaussianMixture.scala
+++ b/mllib/src/main/scala/org/apache/spark/ml/clustering/GaussianMixture.scala
@@ -121,12 +121,14 @@ class GaussianMixtureModel private[ml] (
 validateAndTransformSchema(schema)
   }
 
-  private[clustering] def predict(features: Vector): Int = {
+  @Since("3.0.0")
+  def predict(features: Vector): Int = {
 val r = predictProbability(features)
 r.argmax
   }
 
-  private[clustering] def predictProbability(features: Vector): Vector = {
+  @Since("3.0.0")
+  def predictProbability(features: Vector): Vector = {
 val probs: Array[Double] =
   
GaussianMixtureModel.computeProbabilities(features.asBreeze.toDenseVector, 
gaussians, weights)
 Vectors.dense(probs)
diff --git a/mllib/src/main/scala/org/apache/spark/ml/clustering/KMeans.scala 
b/mllib/src/main/scala/org/apache/spark/ml/clustering/KMeans.scala
index 319747d..b48a966 100644
--- a/mllib/src/main/scala/org/apache/spark/ml/clustering/KMeans.scala
+++ b/mllib/src/main/scala/org/apache/spark/ml/clustering/KMeans.scala
@@ -21,7 +21,6 @@ import scala.collection.mutable
 
 import org.apache.hadoop.fs.Path
 
-import org.apache.spark.SparkException
 import org.apache.spark.annotation.{Experimental, Since}
 import org.apache.spark.ml.{Estimator, Model, PipelineStage}
 import org.apache.spark.ml.linalg.Vector
@@ -32,8 +31,7 @@ import org.apache.spark.ml.util.Instrumentation.instrumented
 import org.apache.spark.mllib.clustering.{DistanceMeasure, KMeans => 
MLlibKMeans, KMeansModel => MLlibKMeansModel}
 import org.apache.spark.mllib.linalg.{Vector => OldVector, Vectors => 
OldVectors}
 import org.apache.spark.mllib.linalg.VectorImplicits._
-import org.apache.spark.rdd.RDD
-import org.apache.spark.sql.{DataFrame, Dataset, Row, SparkSession}
+import org.apache.spark.sql.{DataFrame, Dataset, SparkSession}
 import org.apache.spark.sql.func

[spark] branch master updated: [SPARK-26950][SQL][TEST] Make RandomDataGenerator use Float.NaN or Double.NaN for all NaN values

2019-02-21 Thread wenchen
This is an automated email from the ASF dual-hosted git repository.

wenchen pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new ffef3d4  [SPARK-26950][SQL][TEST] Make RandomDataGenerator use 
Float.NaN or Double.NaN for all NaN values
ffef3d4 is described below

commit ffef3d40741b0be321421aa52a6e17a26d89f541
Author: Dongjoon Hyun 
AuthorDate: Fri Feb 22 12:25:26 2019 +0800

[SPARK-26950][SQL][TEST] Make RandomDataGenerator use Float.NaN or 
Double.NaN for all NaN values

## What changes were proposed in this pull request?

Apache Spark uses the predefined `Float.NaN` and `Double.NaN` for NaN values, but there exist more NaN values with different binary representations.

```scala
scala> java.nio.ByteBuffer.allocate(4).putFloat(Float.NaN).array
res1: Array[Byte] = Array(127, -64, 0, 0)

scala> val x = java.lang.Float.intBitsToFloat(-6966608)
x: Float = NaN

scala> java.nio.ByteBuffer.allocate(4).putFloat(x).array
res2: Array[Byte] = Array(-1, -107, -78, -80)
```

Since users can have such values, `RandomDataGenerator` generates these NaN values too. However, this causes `checkEvaluationWithUnsafeProjection` failures due to differences in the `UnsafeRow` binary representation. The following is an instance of the UT failure. This PR aims to fix this UT flakiness.

- 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/102528/testReport/
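
    A small standalone sketch (plain Java/Scala APIs only, not Spark's internal code path) of the mismatch and of the canonicalization idea the fix relies on:

```scala
import java.nio.ByteBuffer

object NaNBitsSketch {
  // Same idea as the wrapper added to RandomDataGenerator: map every NaN payload
  // onto the single canonical Float.NaN.
  def canonicalize(f: Float): Float = if (f.isNaN) Float.NaN else f

  def main(args: Array[String]): Unit = {
    val oddNaN = java.lang.Float.intBitsToFloat(-6966608)  // a NaN with a non-default payload
    println(oddNaN.isNaN)  // true, yet ...

    // ... its raw bits, and therefore its serialized bytes, differ from Float.NaN:
    println(java.lang.Float.floatToRawIntBits(oddNaN).toHexString)     // ff95b2b0
    println(java.lang.Float.floatToRawIntBits(Float.NaN).toHexString)  // 7fc00000
    println(ByteBuffer.allocate(4).putFloat(oddNaN).array.toSeq)
    println(ByteBuffer.allocate(4).putFloat(Float.NaN).array.toSeq)

    // After canonicalization the byte-level representation is stable:
    val fixed = canonicalize(oddNaN)
    println(java.lang.Float.floatToRawIntBits(fixed) ==
      java.lang.Float.floatToRawIntBits(Float.NaN))  // true
  }
}
```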

## How was this patch tested?

Pass the Jenkins with the newly added test cases.

Closes #23851 from dongjoon-hyun/SPARK-26950.

Authored-by: Dongjoon Hyun 
Signed-off-by: Wenchen Fan 
---
 .../org/apache/spark/sql/RandomDataGenerator.scala | 24 +++--
 .../spark/sql/RandomDataGeneratorSuite.scala   | 31 ++
 2 files changed, 53 insertions(+), 2 deletions(-)

diff --git 
a/sql/catalyst/src/test/scala/org/apache/spark/sql/RandomDataGenerator.scala 
b/sql/catalyst/src/test/scala/org/apache/spark/sql/RandomDataGenerator.scala
index 8ae3ff5..d361e62 100644
--- a/sql/catalyst/src/test/scala/org/apache/spark/sql/RandomDataGenerator.scala
+++ b/sql/catalyst/src/test/scala/org/apache/spark/sql/RandomDataGenerator.scala
@@ -17,8 +17,6 @@
 
 package org.apache.spark.sql
 
-import java.lang.Double.longBitsToDouble
-import java.lang.Float.intBitsToFloat
 import java.math.MathContext
 
 import scala.collection.mutable
@@ -70,6 +68,28 @@ object RandomDataGenerator {
   }
 
   /**
+   * A wrapper of Float.intBitsToFloat to use a unique NaN value for all NaN 
values.
+   * This prevents `checkEvaluationWithUnsafeProjection` from failing due to
+   * the difference between `UnsafeRow` binary presentation for NaN.
+   * This is visible for testing.
+   */
+  def intBitsToFloat(bits: Int): Float = {
+val value = java.lang.Float.intBitsToFloat(bits)
+if (value.isNaN) Float.NaN else value
+  }
+
+  /**
+   * A wrapper of Double.longBitsToDouble to use a unique NaN value for all 
NaN values.
+   * This prevents `checkEvaluationWithUnsafeProjection` from failing due to
+   * the difference between `UnsafeRow` binary presentation for NaN.
+   * This is visible for testing.
+   */
+  def longBitsToDouble(bits: Long): Double = {
+val value = java.lang.Double.longBitsToDouble(bits)
+if (value.isNaN) Double.NaN else value
+  }
+
+  /**
* Returns a randomly generated schema, based on the given accepted types.
*
* @param numFields the number of fields in this schema
diff --git 
a/sql/catalyst/src/test/scala/org/apache/spark/sql/RandomDataGeneratorSuite.scala
 
b/sql/catalyst/src/test/scala/org/apache/spark/sql/RandomDataGeneratorSuite.scala
index 3c2f8a2..3e62ca0 100644
--- 
a/sql/catalyst/src/test/scala/org/apache/spark/sql/RandomDataGeneratorSuite.scala
+++ 
b/sql/catalyst/src/test/scala/org/apache/spark/sql/RandomDataGeneratorSuite.scala
@@ -17,6 +17,9 @@
 
 package org.apache.spark.sql
 
+import java.nio.ByteBuffer
+import java.util.Arrays
+
 import scala.util.Random
 
 import org.apache.spark.SparkFunSuite
@@ -106,4 +109,32 @@ class RandomDataGeneratorSuite extends SparkFunSuite {
   assert(deviation.toDouble / expectedTotalElements < 2e-1)
 }
   }
+
+  test("Use Float.NaN for all NaN values") {
+val bits = -6966608
+val nan1 = java.lang.Float.intBitsToFloat(bits)
+val nan2 = RandomDataGenerator.intBitsToFloat(bits)
+assert(nan1.isNaN)
+assert(nan2.isNaN)
+
+val arrayExpected = ByteBuffer.allocate(4).putFloat(Float.NaN).array
+val array1 = ByteBuffer.allocate(4).putFloat(nan1).array
+val array2 = ByteBuffer.allocate(4).putFloat(nan2).array
+assert(!Arrays.equals(array1, arrayExpected))
+assert(Arrays.equals(array2, arrayExpected))
+  }
+
+  test("Use Double.NaN for all NaN values") {
+val bits = -6966608
+val nan1 = java.lang.Double.longBi

[spark] branch branch-2.4 updated: [SPARK-26950][SQL][TEST] Make RandomDataGenerator use Float.NaN or Double.NaN for all NaN values

2019-02-21 Thread wenchen
This is an automated email from the ASF dual-hosted git repository.

wenchen pushed a commit to branch branch-2.4
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-2.4 by this push:
 new ef67be3  [SPARK-26950][SQL][TEST] Make RandomDataGenerator use 
Float.NaN or Double.NaN for all NaN values
ef67be3 is described below

commit ef67be363be6d6b6954b55ef1c243a0672b84abb
Author: Dongjoon Hyun 
AuthorDate: Fri Feb 22 12:25:26 2019 +0800

[SPARK-26950][SQL][TEST] Make RandomDataGenerator use Float.NaN or 
Double.NaN for all NaN values

## What changes were proposed in this pull request?

Apache Spark uses the predefined `Float.NaN` and `Double.NaN` for NaN values, but there exist more NaN values with different binary representations.

```scala
scala> java.nio.ByteBuffer.allocate(4).putFloat(Float.NaN).array
res1: Array[Byte] = Array(127, -64, 0, 0)

scala> val x = java.lang.Float.intBitsToFloat(-6966608)
x: Float = NaN

scala> java.nio.ByteBuffer.allocate(4).putFloat(x).array
res2: Array[Byte] = Array(-1, -107, -78, -80)
```

Since users can have such values, `RandomDataGenerator` generates these NaN values too. However, this causes `checkEvaluationWithUnsafeProjection` failures due to differences in the `UnsafeRow` binary representation. The following is an instance of the UT failure. This PR aims to fix this UT flakiness.

- 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/102528/testReport/

## How was this patch tested?

Pass the Jenkins with the newly added test cases.

Closes #23851 from dongjoon-hyun/SPARK-26950.

Authored-by: Dongjoon Hyun 
Signed-off-by: Wenchen Fan 
(cherry picked from commit ffef3d40741b0be321421aa52a6e17a26d89f541)
Signed-off-by: Wenchen Fan 
---
 .../org/apache/spark/sql/RandomDataGenerator.scala | 24 +++--
 .../spark/sql/RandomDataGeneratorSuite.scala   | 31 ++
 2 files changed, 53 insertions(+), 2 deletions(-)

diff --git 
a/sql/catalyst/src/test/scala/org/apache/spark/sql/RandomDataGenerator.scala 
b/sql/catalyst/src/test/scala/org/apache/spark/sql/RandomDataGenerator.scala
index 8ae3ff5..d361e62 100644
--- a/sql/catalyst/src/test/scala/org/apache/spark/sql/RandomDataGenerator.scala
+++ b/sql/catalyst/src/test/scala/org/apache/spark/sql/RandomDataGenerator.scala
@@ -17,8 +17,6 @@
 
 package org.apache.spark.sql
 
-import java.lang.Double.longBitsToDouble
-import java.lang.Float.intBitsToFloat
 import java.math.MathContext
 
 import scala.collection.mutable
@@ -70,6 +68,28 @@ object RandomDataGenerator {
   }
 
   /**
+   * A wrapper of Float.intBitsToFloat to use a unique NaN value for all NaN 
values.
+   * This prevents `checkEvaluationWithUnsafeProjection` from failing due to
+   * the difference between `UnsafeRow` binary presentation for NaN.
+   * This is visible for testing.
+   */
+  def intBitsToFloat(bits: Int): Float = {
+val value = java.lang.Float.intBitsToFloat(bits)
+if (value.isNaN) Float.NaN else value
+  }
+
+  /**
+   * A wrapper of Double.longBitsToDouble to use a unique NaN value for all 
NaN values.
+   * This prevents `checkEvaluationWithUnsafeProjection` from failing due to
+   * the difference between `UnsafeRow` binary presentation for NaN.
+   * This is visible for testing.
+   */
+  def longBitsToDouble(bits: Long): Double = {
+val value = java.lang.Double.longBitsToDouble(bits)
+if (value.isNaN) Double.NaN else value
+  }
+
+  /**
* Returns a randomly generated schema, based on the given accepted types.
*
* @param numFields the number of fields in this schema
diff --git 
a/sql/catalyst/src/test/scala/org/apache/spark/sql/RandomDataGeneratorSuite.scala
 
b/sql/catalyst/src/test/scala/org/apache/spark/sql/RandomDataGeneratorSuite.scala
index 3c2f8a2..3e62ca0 100644
--- 
a/sql/catalyst/src/test/scala/org/apache/spark/sql/RandomDataGeneratorSuite.scala
+++ 
b/sql/catalyst/src/test/scala/org/apache/spark/sql/RandomDataGeneratorSuite.scala
@@ -17,6 +17,9 @@
 
 package org.apache.spark.sql
 
+import java.nio.ByteBuffer
+import java.util.Arrays
+
 import scala.util.Random
 
 import org.apache.spark.SparkFunSuite
@@ -106,4 +109,32 @@ class RandomDataGeneratorSuite extends SparkFunSuite {
   assert(deviation.toDouble / expectedTotalElements < 2e-1)
 }
   }
+
+  test("Use Float.NaN for all NaN values") {
+val bits = -6966608
+val nan1 = java.lang.Float.intBitsToFloat(bits)
+val nan2 = RandomDataGenerator.intBitsToFloat(bits)
+assert(nan1.isNaN)
+assert(nan2.isNaN)
+
+val arrayExpected = ByteBuffer.allocate(4).putFloat(Float.NaN).array
+val array1 = ByteBuffer.allocate(4).putFloat(nan1).array
+val array2 = ByteBuffer.allocate(4).putFloat(nan2).array
+assert(!Arrays.equals(array1, arrayExpected))
+assert(Arrays.equals(array2, arrayExpected))
+  }

[spark] branch master updated: [SPARK-26930][SQL] Tests in ParquetFilterSuite don't verify filter class

2019-02-21 Thread gurwls223
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 0663797  [SPARK-26930][SQL] Tests in ParquetFilterSuite don't verify 
filter class
0663797 is described below

commit 066379783af154f1c9e2fae6daaf444b6e383ab0
Author: nandorKollar 
AuthorDate: Fri Feb 22 14:07:55 2019 +0800

[SPARK-26930][SQL] Tests in ParquetFilterSuite don't verify filter class

## What changes were proposed in this pull request?

Add an assert that verifies the class of the pushed-down predicate in ParquetFilterSuite.
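
    To make the intent concrete, a minimal standalone sketch (made-up Filter classes, not the Parquet filter API) of the bug class this fixes: a boolean check that was computed but never asserted on.

```scala
object AssertedCheckSketch {
  sealed trait Filter
  case class Eq(col: String) extends Filter
  case class Not(child: Filter) extends Filter

  def main(args: Array[String]): Unit = {
    val pushed: Seq[Filter] = Seq(Eq("a"), Not(Eq("b")))
    val expectedClass: Class[_] = classOf[Not]

    // Before: the result of `exists` was silently discarded, so the test could
    // not fail even if no filter of the expected class had been pushed.
    pushed.exists(_.getClass == expectedClass)

    // After: the same check is wrapped in an assert with a descriptive message.
    assert(pushed.exists(_.getClass == expectedClass),
      s"${pushed.map(_.getClass).toList} did not contain $expectedClass")
    println("assertion passed")
  }
}
```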

## How was this patch tested?

Ran ParquetFilterSuite; the tests passed.

Closes #23855 from nandorKollar/SPARK-26930.

Lead-authored-by: nandorKollar 
Co-authored-by: Hyukjin Kwon 
Co-authored-by: Nandor Kollar 
Signed-off-by: Hyukjin Kwon 
---
 .../datasources/parquet/ParquetFilterSuite.scala | 20 ++--
 1 file changed, 14 insertions(+), 6 deletions(-)

diff --git 
a/sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFilterSuite.scala
 
b/sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFilterSuite.scala
index 9cfc943..255f7db 100644
--- 
a/sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFilterSuite.scala
+++ 
b/sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFilterSuite.scala
@@ -29,6 +29,7 @@ import org.apache.spark.SparkException
 import org.apache.spark.sql._
 import org.apache.spark.sql.catalyst.dsl.expressions._
 import org.apache.spark.sql.catalyst.expressions._
+import org.apache.spark.sql.catalyst.optimizer.InferFiltersFromConstraints
 import org.apache.spark.sql.catalyst.planning.PhysicalOperation
 import org.apache.spark.sql.execution.datasources.{DataSourceStrategy, 
HadoopFsRelation, LogicalRelation}
 import org.apache.spark.sql.functions._
@@ -91,6 +92,10 @@ class ParquetFilterSuite extends QueryTest with ParquetTest 
with SharedSQLContex
   SQLConf.PARQUET_FILTER_PUSHDOWN_TIMESTAMP_ENABLED.key -> "true",
   SQLConf.PARQUET_FILTER_PUSHDOWN_DECIMAL_ENABLED.key -> "true",
   SQLConf.PARQUET_FILTER_PUSHDOWN_STRING_STARTSWITH_ENABLED.key -> "true",
+  // Disable adding filters from constraints because it adds, for instance,
+  // is-not-null to pushed filters, which makes it hard to test if the 
pushed
+  // filter is expected or not (this had to be fixed with SPARK-13495).
+  SQLConf.OPTIMIZER_EXCLUDED_RULES.key -> 
InferFiltersFromConstraints.ruleName,
   SQLConf.PARQUET_VECTORIZED_READER_ENABLED.key -> "false") {
 val query = df
   .select(output.map(e => Column(e)): _*)
@@ -109,13 +114,16 @@ class ParquetFilterSuite extends QueryTest with 
ParquetTest with SharedSQLContex
   DataSourceStrategy.selectFilters(maybeRelation.get, 
maybeAnalyzedPredicate.toSeq)
 assert(selectedFilters.nonEmpty, "No filter is pushed down")
 
-selectedFilters.foreach { pred =>
+val pushedParquetFilters = selectedFilters.map { pred =>
   val maybeFilter = parquetFilters.createFilter(
 new SparkToParquetSchemaConverter(conf).convert(df.schema), pred)
   assert(maybeFilter.isDefined, s"Couldn't generate filter predicate 
for $pred")
-  // Doesn't bother checking type parameters here (e.g. `Eq[Integer]`)
-  maybeFilter.exists(_.getClass === filterClass)
+  maybeFilter.get
 }
+// Doesn't bother checking type parameters here (e.g. `Eq[Integer]`)
+assert(pushedParquetFilters.exists(_.getClass === filterClass),
+  s"${pushedParquetFilters.map(_.getClass).toList} did not contain 
${filterClass}.")
+
 checker(stripSparkFilter(query), expected)
 }
   }
@@ -1073,20 +1081,20 @@ class ParquetFilterSuite extends QueryTest with 
ParquetTest with SharedSQLContex
 
   checkFilterPredicate(
 !'_1.startsWith("").asInstanceOf[Predicate],
-classOf[UserDefinedByInstance[_, _]],
+classOf[Operators.Not],
 Seq().map(Row(_)))
 
   Seq("2", "2s", "2st", "2str", "2str2").foreach { prefix =>
 checkFilterPredicate(
   !'_1.startsWith(prefix).asInstanceOf[Predicate],
-  classOf[UserDefinedByInstance[_, _]],
+  classOf[Operators.Not],
   Seq("1str1", "3str3", "4str4").map(Row(_)))
   }
 
   Seq("2S", "null", "2str22").foreach { prefix =>
 checkFilterPredicate(
   !'_1.startsWith(prefix).asInstanceOf[Predicate],
-  classOf[UserDefinedByInstance[_, _]],
+  classOf[Operators.Not],
   Seq("1str1", "2str2", "3str3", "4str4").map(Row(_)))
   }
 


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional c

[spark] branch master updated: [SPARK-26851][SQL][FOLLOWUP] Fix cachedColumnBuffers field for Scala 2.11 build

2019-02-21 Thread yamamuro
This is an automated email from the ASF dual-hosted git repository.

yamamuro pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 95bb012  [SPARK-26851][SQL][FOLLOWUP] Fix cachedColumnBuffers field 
for Scala 2.11 build
95bb012 is described below

commit 95bb01282cc94f95bbc69aafcbc1550b137238be
Author: Sean Owen 
AuthorDate: Fri Feb 22 15:22:52 2019 +0900

[SPARK-26851][SQL][FOLLOWUP] Fix cachedColumnBuffers field for Scala 2.11 
build

## What changes were proposed in this pull request?

Per https://github.com/apache/spark/pull/23768/files#r259083019, the last change to this line caused the Scala 2.11 build to fail. It's worked around by making `_cachedColumnBuffers` an ordinary field, as callers never set it to anything other than its default of null.
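
    A simplified before/after sketch of the shape change (class and field names shortened, not Spark's actual classes); the point is that the mutable, always-null buffer no longer lives in a second constructor parameter list, so construction and `copy` need no extra argument list.

```scala
// Before: the always-null buffer sat in a second, non-case parameter list,
// so constructing the builder required an explicit extra argument list.
case class BuilderBefore(plan: String)(private var buf: AnyRef = null)

// After: an ordinary private field, initialized to null, outside the constructor.
case class BuilderAfter(plan: String) {
  @volatile private var buf: AnyRef = null
}

object CopySketch {
  def main(args: Array[String]): Unit = {
    val before = BuilderBefore("scan")()                          // extra () needed
    val after  = BuilderAfter("scan").copy(plan = "scan+filter")  // plain copy
    println((before, after))
  }
}
```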

## How was this patch tested?

Existing tests.

Closes #23864 from srowen/SPARK-26851.2.

Authored-by: Sean Owen 
Signed-off-by: Takeshi Yamamuro 
---
 .../main/scala/org/apache/spark/sql/execution/CacheManager.scala   | 3 +--
 .../org/apache/spark/sql/execution/columnar/InMemoryRelation.scala | 7 ---
 2 files changed, 5 insertions(+), 5 deletions(-)

diff --git 
a/sql/core/src/main/scala/org/apache/spark/sql/execution/CacheManager.scala 
b/sql/core/src/main/scala/org/apache/spark/sql/execution/CacheManager.scala
index c6ee735..f7a78ea 100644
--- a/sql/core/src/main/scala/org/apache/spark/sql/execution/CacheManager.scala
+++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/CacheManager.scala
@@ -213,8 +213,7 @@ class CacheManager extends Logging {
   cd.cachedRepresentation.cacheBuilder.clearCache()
   val plan = spark.sessionState.executePlan(cd.plan).executedPlan
   val newCache = InMemoryRelation(
-cacheBuilder = cd.cachedRepresentation
-  .cacheBuilder.copy(cachedPlan = plan)(_cachedColumnBuffers = null),
+cacheBuilder = cd.cachedRepresentation.cacheBuilder.copy(cachedPlan = 
plan),
 logicalPlan = cd.plan)
   val recomputedPlan = cd.copy(cachedRepresentation = newCache)
   writeLock {
diff --git 
a/sql/core/src/main/scala/org/apache/spark/sql/execution/columnar/InMemoryRelation.scala
 
b/sql/core/src/main/scala/org/apache/spark/sql/execution/columnar/InMemoryRelation.scala
index bc6e958..7180853 100644
--- 
a/sql/core/src/main/scala/org/apache/spark/sql/execution/columnar/InMemoryRelation.scala
+++ 
b/sql/core/src/main/scala/org/apache/spark/sql/execution/columnar/InMemoryRelation.scala
@@ -48,8 +48,9 @@ case class CachedRDDBuilder(
 batchSize: Int,
 storageLevel: StorageLevel,
 @transient cachedPlan: SparkPlan,
-tableName: Option[String])(
-@transient @volatile private var _cachedColumnBuffers: RDD[CachedBatch] = 
null) {
+tableName: Option[String]) {
+
+  @transient @volatile private var _cachedColumnBuffers: RDD[CachedBatch] = 
null
 
   val sizeInBytesStats: LongAccumulator = 
cachedPlan.sqlContext.sparkContext.longAccumulator
 
@@ -143,7 +144,7 @@ object InMemoryRelation {
   child: SparkPlan,
   tableName: Option[String],
   logicalPlan: LogicalPlan): InMemoryRelation = {
-val cacheBuilder = CachedRDDBuilder(useCompression, batchSize, 
storageLevel, child, tableName)()
+val cacheBuilder = CachedRDDBuilder(useCompression, batchSize, 
storageLevel, child, tableName)
 new InMemoryRelation(child.output, cacheBuilder, 
logicalPlan.outputOrdering)(
   statsOfPlanToCache = logicalPlan.stats)
   }


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org


