[spark] 01/01: Preparing development version 2.4.2-SNAPSHOT

2019-03-09 Thread dbtsai
This is an automated email from the ASF dual-hosted git repository.

dbtsai pushed a commit to branch branch-2.4
in repository https://gitbox.apache.org/repos/asf/spark.git

commit dba5bac66760cf8b47052c94f3d79ffeb00e2eb3
Author: DB Tsai 
AuthorDate: Sun Mar 10 06:34:15 2019 +

Preparing development version 2.4.2-SNAPSHOT
---
 R/pkg/DESCRIPTION  | 2 +-
 assembly/pom.xml   | 2 +-
 common/kvstore/pom.xml | 2 +-
 common/network-common/pom.xml  | 2 +-
 common/network-shuffle/pom.xml | 2 +-
 common/network-yarn/pom.xml| 2 +-
 common/sketch/pom.xml  | 2 +-
 common/tags/pom.xml| 2 +-
 common/unsafe/pom.xml  | 2 +-
 core/pom.xml   | 2 +-
 docs/_config.yml   | 4 ++--
 examples/pom.xml   | 2 +-
 external/avro/pom.xml  | 2 +-
 external/docker-integration-tests/pom.xml  | 2 +-
 external/flume-assembly/pom.xml| 2 +-
 external/flume-sink/pom.xml| 2 +-
 external/flume/pom.xml | 2 +-
 external/kafka-0-10-assembly/pom.xml   | 2 +-
 external/kafka-0-10-sql/pom.xml| 2 +-
 external/kafka-0-10/pom.xml| 2 +-
 external/kafka-0-8-assembly/pom.xml| 2 +-
 external/kafka-0-8/pom.xml | 2 +-
 external/kinesis-asl-assembly/pom.xml  | 2 +-
 external/kinesis-asl/pom.xml   | 2 +-
 external/spark-ganglia-lgpl/pom.xml| 2 +-
 graphx/pom.xml | 2 +-
 hadoop-cloud/pom.xml   | 2 +-
 launcher/pom.xml   | 2 +-
 mllib-local/pom.xml| 2 +-
 mllib/pom.xml  | 2 +-
 pom.xml| 2 +-
 python/pyspark/version.py  | 2 +-
 repl/pom.xml   | 2 +-
 resource-managers/kubernetes/core/pom.xml  | 2 +-
 resource-managers/kubernetes/integration-tests/pom.xml | 2 +-
 resource-managers/mesos/pom.xml| 2 +-
 resource-managers/yarn/pom.xml | 2 +-
 sql/catalyst/pom.xml   | 2 +-
 sql/core/pom.xml   | 2 +-
 sql/hive-thriftserver/pom.xml  | 2 +-
 sql/hive/pom.xml   | 2 +-
 streaming/pom.xml  | 2 +-
 tools/pom.xml  | 2 +-
 43 files changed, 44 insertions(+), 44 deletions(-)

diff --git a/R/pkg/DESCRIPTION b/R/pkg/DESCRIPTION
index be924c9..5e3d186 100644
--- a/R/pkg/DESCRIPTION
+++ b/R/pkg/DESCRIPTION
@@ -1,6 +1,6 @@
 Package: SparkR
 Type: Package
-Version: 2.4.1
+Version: 2.4.2
 Title: R Front end for 'Apache Spark'
 Description: Provides an R Front end for 'Apache Spark' <https://spark.apache.org>.
 Authors@R: c(person("Shivaram", "Venkataraman", role = c("aut", "cre"),
diff --git a/assembly/pom.xml b/assembly/pom.xml
index 8e11fd6..cdf 100644
--- a/assembly/pom.xml
+++ b/assembly/pom.xml
@@ -21,7 +21,7 @@
   <parent>
     <groupId>org.apache.spark</groupId>
     <artifactId>spark-parent_2.11</artifactId>
-    <version>2.4.1</version>
+    <version>2.4.2-SNAPSHOT</version>
     <relativePath>../pom.xml</relativePath>
   </parent>
 
diff --git a/common/kvstore/pom.xml b/common/kvstore/pom.xml
index f0eee07..092f85b 100644
--- a/common/kvstore/pom.xml
+++ b/common/kvstore/pom.xml
@@ -22,7 +22,7 @@
   <parent>
     <groupId>org.apache.spark</groupId>
     <artifactId>spark-parent_2.11</artifactId>
-    <version>2.4.1</version>
+    <version>2.4.2-SNAPSHOT</version>
     <relativePath>../../pom.xml</relativePath>
   </parent>
 
diff --git a/common/network-common/pom.xml b/common/network-common/pom.xml
index 8c8bdf4..5236fd6 100644
--- a/common/network-common/pom.xml
+++ b/common/network-common/pom.xml
@@ -22,7 +22,7 @@
   <parent>
     <groupId>org.apache.spark</groupId>
     <artifactId>spark-parent_2.11</artifactId>
-    <version>2.4.1</version>
+    <version>2.4.2-SNAPSHOT</version>
     <relativePath>../../pom.xml</relativePath>
   </parent>
 
diff --git a/common/network-shuffle/pom.xml b/common/network-shuffle/pom.xml
index 663f41d..b70dadf 100644
--- a/common/network-shuffle/pom.xml
+++ b/common/network-shuffle/pom.xml
@@ -22,7 +22,7 @@
   <parent>
     <groupId>org.apache.spark</groupId>
     <artifactId>spark-parent_2.11</artifactId>
-    <version>2.4.1</version>
+    <version>2.4.2-SNAPSHOT</version>
     <relativePath>../../pom.xml</relativePath>
   </parent>
 
diff --git a/common/network-yarn/pom.xml b/common/network-yarn/pom.xml
index 9acade1..e9ae143 100644
--- a/common/network-yarn/pom.xml
+++ b/common/network-yarn/pom.xml
@@ -22,7 +22,7 @@
   <parent>
     <groupId>org.apache.spark</groupId>
     <artifactId>spark-parent_2.11</artifactId>
-    <version>2.4.1</version>
+    <version>2.4.2-SNAPSHOT</version>
     <relativePath>../../pom.xml</relativePath>
   </parent>
 
diff --git a/common/sketch/pom.xml b/common/sketch/pom.xml
index 

[spark] branch branch-2.4 updated (a017a1c -> dba5bac)

2019-03-09 Thread dbtsai
This is an automated email from the ASF dual-hosted git repository.

dbtsai pushed a change to branch branch-2.4
in repository https://gitbox.apache.org/repos/asf/spark.git.


from a017a1c  [SPARK-27097][CHERRY-PICK 2.4] Avoid embedding 
platform-dependent offsets literally in whole-stage generated code
 add 746b3dd  Preparing Spark release v2.4.1-rc8
 new dba5bac  Preparing development version 2.4.2-SNAPSHOT

The 1 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails.  The revisions
listed as "add" were already present in the repository and have only
been added to this reference.


Summary of changes:





[spark] tag v2.4.1-rc8 created (now 746b3dd)

2019-03-09 Thread dbtsai
This is an automated email from the ASF dual-hosted git repository.

dbtsai pushed a change to tag v2.4.1-rc8
in repository https://gitbox.apache.org/repos/asf/spark.git.


  at 746b3dd  (commit)
This tag includes the following new commits:

 new 746b3dd  Preparing Spark release v2.4.1-rc8

The 1 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails.  The revisions
listed as "add" were already present in the repository and have only
been added to this reference.






[spark] 01/01: Preparing Spark release v2.4.1-rc8

2019-03-09 Thread dbtsai
This is an automated email from the ASF dual-hosted git repository.

dbtsai pushed a commit to tag v2.4.1-rc8
in repository https://gitbox.apache.org/repos/asf/spark.git

commit 746b3ddee6f7ad3464e326228ea226f5b1f39a41
Author: DB Tsai 
AuthorDate: Sun Mar 10 06:33:54 2019 +

Preparing Spark release v2.4.1-rc8
---
 R/pkg/DESCRIPTION  | 2 +-
 assembly/pom.xml   | 2 +-
 common/kvstore/pom.xml | 2 +-
 common/network-common/pom.xml  | 2 +-
 common/network-shuffle/pom.xml | 2 +-
 common/network-yarn/pom.xml| 2 +-
 common/sketch/pom.xml  | 2 +-
 common/tags/pom.xml| 2 +-
 common/unsafe/pom.xml  | 2 +-
 core/pom.xml   | 2 +-
 docs/_config.yml   | 4 ++--
 examples/pom.xml   | 2 +-
 external/avro/pom.xml  | 2 +-
 external/docker-integration-tests/pom.xml  | 2 +-
 external/flume-assembly/pom.xml| 2 +-
 external/flume-sink/pom.xml| 2 +-
 external/flume/pom.xml | 2 +-
 external/kafka-0-10-assembly/pom.xml   | 2 +-
 external/kafka-0-10-sql/pom.xml| 2 +-
 external/kafka-0-10/pom.xml| 2 +-
 external/kafka-0-8-assembly/pom.xml| 2 +-
 external/kafka-0-8/pom.xml | 2 +-
 external/kinesis-asl-assembly/pom.xml  | 2 +-
 external/kinesis-asl/pom.xml   | 2 +-
 external/spark-ganglia-lgpl/pom.xml| 2 +-
 graphx/pom.xml | 2 +-
 hadoop-cloud/pom.xml   | 2 +-
 launcher/pom.xml   | 2 +-
 mllib-local/pom.xml| 2 +-
 mllib/pom.xml  | 2 +-
 pom.xml| 2 +-
 python/pyspark/version.py  | 2 +-
 repl/pom.xml   | 2 +-
 resource-managers/kubernetes/core/pom.xml  | 2 +-
 resource-managers/kubernetes/integration-tests/pom.xml | 2 +-
 resource-managers/mesos/pom.xml| 2 +-
 resource-managers/yarn/pom.xml | 2 +-
 sql/catalyst/pom.xml   | 2 +-
 sql/core/pom.xml   | 2 +-
 sql/hive-thriftserver/pom.xml  | 2 +-
 sql/hive/pom.xml   | 2 +-
 streaming/pom.xml  | 2 +-
 tools/pom.xml  | 2 +-
 43 files changed, 44 insertions(+), 44 deletions(-)

diff --git a/R/pkg/DESCRIPTION b/R/pkg/DESCRIPTION
index 5e3d186..be924c9 100644
--- a/R/pkg/DESCRIPTION
+++ b/R/pkg/DESCRIPTION
@@ -1,6 +1,6 @@
 Package: SparkR
 Type: Package
-Version: 2.4.2
+Version: 2.4.1
 Title: R Front end for 'Apache Spark'
 Description: Provides an R Front end for 'Apache Spark' <https://spark.apache.org>.
 Authors@R: c(person("Shivaram", "Venkataraman", role = c("aut", "cre"),
diff --git a/assembly/pom.xml b/assembly/pom.xml
index cdf..8e11fd6 100644
--- a/assembly/pom.xml
+++ b/assembly/pom.xml
@@ -21,7 +21,7 @@
   <parent>
     <groupId>org.apache.spark</groupId>
     <artifactId>spark-parent_2.11</artifactId>
-    <version>2.4.2-SNAPSHOT</version>
+    <version>2.4.1</version>
     <relativePath>../pom.xml</relativePath>
   </parent>
 
diff --git a/common/kvstore/pom.xml b/common/kvstore/pom.xml
index 092f85b..f0eee07 100644
--- a/common/kvstore/pom.xml
+++ b/common/kvstore/pom.xml
@@ -22,7 +22,7 @@
   <parent>
     <groupId>org.apache.spark</groupId>
     <artifactId>spark-parent_2.11</artifactId>
-    <version>2.4.2-SNAPSHOT</version>
+    <version>2.4.1</version>
     <relativePath>../../pom.xml</relativePath>
   </parent>
 
diff --git a/common/network-common/pom.xml b/common/network-common/pom.xml
index 5236fd6..8c8bdf4 100644
--- a/common/network-common/pom.xml
+++ b/common/network-common/pom.xml
@@ -22,7 +22,7 @@
   <parent>
     <groupId>org.apache.spark</groupId>
     <artifactId>spark-parent_2.11</artifactId>
-    <version>2.4.2-SNAPSHOT</version>
+    <version>2.4.1</version>
     <relativePath>../../pom.xml</relativePath>
   </parent>
 
diff --git a/common/network-shuffle/pom.xml b/common/network-shuffle/pom.xml
index b70dadf..663f41d 100644
--- a/common/network-shuffle/pom.xml
+++ b/common/network-shuffle/pom.xml
@@ -22,7 +22,7 @@
   <parent>
     <groupId>org.apache.spark</groupId>
     <artifactId>spark-parent_2.11</artifactId>
-    <version>2.4.2-SNAPSHOT</version>
+    <version>2.4.1</version>
     <relativePath>../../pom.xml</relativePath>
   </parent>
 
diff --git a/common/network-yarn/pom.xml b/common/network-yarn/pom.xml
index e9ae143..9acade1 100644
--- a/common/network-yarn/pom.xml
+++ b/common/network-yarn/pom.xml
@@ -22,7 +22,7 @@
   <parent>
     <groupId>org.apache.spark</groupId>
     <artifactId>spark-parent_2.11</artifactId>
-    <version>2.4.2-SNAPSHOT</version>
+    <version>2.4.1</version>
     <relativePath>../../pom.xml</relativePath>
   </parent>
 
diff --git a/common/sketch/pom.xml b/common/sketch/pom.xml
index 2ae4fcb..1a31a39 100644

[spark] branch master updated: [SPARK-27102][R][PYTHON][CORE] Remove the references to Python's Scala codes in R's Scala codes

2019-03-09 Thread gurwls223
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 28d0030  [SPARK-27102][R][PYTHON][CORE] Remove the references to 
Python's Scala codes in R's Scala codes
28d0030 is described below

commit 28d003097b114623b891e0ae5dbe1709a54da891
Author: Hyukjin Kwon 
AuthorDate: Sun Mar 10 15:08:23 2019 +0900

[SPARK-27102][R][PYTHON][CORE] Remove the references to Python's Scala 
codes in R's Scala codes

## What changes were proposed in this pull request?

Currently, R's Scala code refers to Python's Scala code for the sake of code deduplication, which is a bit odd. For instance, when an exception is raised from R, it shows a Python-related code path, which makes debugging confusing. There should instead be one code base that both R and Python share.

This PR proposes:

1. Make a `SocketAuthServer` and move `PythonServer` so that `PythonRDD` 
and `RRDD` can share it.
2. Move `readRDDFromFile` and `readRDDFromInputStream` into `JavaRDD`.
3. Reuse `RAuthHelper` and remove `RSocketAuthHelper` in `RRDD`.
4. Rename `getEncryptionEnabled` to `isEncryptionEnabled` while I am here.

So, now, the places below:

- `sql/core/src/main/scala/org/apache/spark/sql/api/r`
- `core/src/main/scala/org/apache/spark/api/r`
- `mllib/src/main/scala/org/apache/spark/ml/r`

no longer refer to Python's Scala code.
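
For illustration only, here is a minimal sketch of the sharing pattern described above. The names and signatures below are hypothetical, not Spark's actual `SocketAuthServer` API: the socket-serving logic lives once in a shared base, and each language binding only supplies its payload, so an R failure no longer surfaces a Python code path.

```scala
import java.io.DataOutputStream
import java.net.{InetAddress, ServerSocket, Socket}

// Hypothetical shared base: binds a loopback socket once; bindings only override writeTo.
abstract class LocalSocketServer(threadName: String) {
  protected def writeTo(out: DataOutputStream): Unit   // binding-specific payload

  def start(): Int = {
    val server = new ServerSocket(0, 1, InetAddress.getLoopbackAddress)
    val thread = new Thread(threadName) {
      override def run(): Unit = {
        val sock: Socket = server.accept()
        try writeTo(new DataOutputStream(sock.getOutputStream))
        finally { sock.close(); server.close() }
      }
    }
    thread.setDaemon(true)
    thread.start()
    server.getLocalPort                                 // the client process connects here
  }
}

// The R-side and Python-side servers would both extend the same base.
class RPayloadServer extends LocalSocketServer("serve-r") {
  protected def writeTo(out: DataOutputStream): Unit = out.writeInt(42)
}
```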

## How was this patch tested?

Existing tests should cover this.

Closes #24023 from HyukjinKwon/SPARK-27102.

Authored-by: Hyukjin Kwon 
Signed-off-by: Hyukjin Kwon 
---
 R/pkg/R/context.R  |   2 +-
 R/pkg/tests/fulltests/test_Serde.R |   2 +-
 .../scala/org/apache/spark/api/java/JavaRDD.scala  |  33 +
 .../org/apache/spark/api/python/PythonRDD.scala| 135 +++--
 .../org/apache/spark/api/python/PythonUtils.scala  |   2 +-
 .../main/scala/org/apache/spark/api/r/RRDD.scala   |  28 ++---
 .../main/scala/org/apache/spark/api/r/RUtils.scala |   5 +-
 .../apache/spark/security/SocketAuthHelper.scala   |  18 ++-
 .../apache/spark/security/SocketAuthServer.scala   | 108 +
 .../apache/spark/api/python/PythonRDDSuite.scala   |   4 +-
 python/pyspark/context.py  |   2 +-
 11 files changed, 188 insertions(+), 151 deletions(-)

diff --git a/R/pkg/R/context.R b/R/pkg/R/context.R
index 1c064a6..6191536 100644
--- a/R/pkg/R/context.R
+++ b/R/pkg/R/context.R
@@ -175,7 +175,7 @@ parallelize <- function(sc, coll, numSlices = 1) {
   if (objectSize < sizeLimit) {
 jrdd <- callJStatic("org.apache.spark.api.r.RRDD", "createRDDFromArray", 
sc, serializedSlices)
   } else {
-if (callJStatic("org.apache.spark.api.r.RUtils", "getEncryptionEnabled", 
sc)) {
+if (callJStatic("org.apache.spark.api.r.RUtils", "isEncryptionEnabled", 
sc)) {
   connectionTimeout <- 
as.numeric(Sys.getenv("SPARKR_BACKEND_CONNECTION_TIMEOUT", "6000"))
   # the length of slices here is the parallelism to use in the jvm's 
sc.parallelize()
   parallelism <- as.integer(numSlices)
diff --git a/R/pkg/tests/fulltests/test_Serde.R 
b/R/pkg/tests/fulltests/test_Serde.R
index 1525bdb..e01f6ee 100644
--- a/R/pkg/tests/fulltests/test_Serde.R
+++ b/R/pkg/tests/fulltests/test_Serde.R
@@ -138,7 +138,7 @@ test_that("createDataFrame large objects", {
 enableHiveSupport = FALSE))
 
 sc <- getSparkContext()
-actual <- callJStatic("org.apache.spark.api.r.RUtils", 
"getEncryptionEnabled", sc)
+actual <- callJStatic("org.apache.spark.api.r.RUtils", 
"isEncryptionEnabled", sc)
 expected <- as.logical(encryptionEnabled)
 expect_equal(actual, expected)
 
diff --git a/core/src/main/scala/org/apache/spark/api/java/JavaRDD.scala 
b/core/src/main/scala/org/apache/spark/api/java/JavaRDD.scala
index 41b5cab..6f01822 100644
--- a/core/src/main/scala/org/apache/spark/api/java/JavaRDD.scala
+++ b/core/src/main/scala/org/apache/spark/api/java/JavaRDD.scala
@@ -17,6 +17,9 @@
 
 package org.apache.spark.api.java
 
+import java.io.{DataInputStream, EOFException, FileInputStream, InputStream}
+
+import scala.collection.mutable
 import scala.language.implicitConversions
 import scala.reflect.ClassTag
 
@@ -213,4 +216,34 @@ object JavaRDD {
   implicit def fromRDD[T: ClassTag](rdd: RDD[T]): JavaRDD[T] = new 
JavaRDD[T](rdd)
 
   implicit def toRDD[T](rdd: JavaRDD[T]): RDD[T] = rdd.rdd
+
+  private[api] def readRDDFromFile(
+  sc: JavaSparkContext,
+  filename: String,
+  parallelism: Int): JavaRDD[Array[Byte]] = {
+readRDDFromInputStream(sc.sc, new FileInputStream(filename), parallelism)
+  }
+
+  private[api] def readRDDFromInputStream(
+  sc: SparkContext,
+  in: InputStream,
+  parallelism: Int): 

[spark] branch branch-2.4 updated: [SPARK-27097][CHERRY-PICK 2.4] Avoid embedding platform-dependent offsets literally in whole-stage generated code

2019-03-09 Thread dbtsai
This is an automated email from the ASF dual-hosted git repository.

dbtsai pushed a commit to branch branch-2.4
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-2.4 by this push:
 new a017a1c  [SPARK-27097][CHERRY-PICK 2.4] Avoid embedding 
platform-dependent offsets literally in whole-stage generated code
a017a1c is described below

commit a017a1c1afc2e49e61df7c1d23c9c5058708fac8
Author: Kris Mok 
AuthorDate: Sun Mar 10 06:00:36 2019 +

[SPARK-27097][CHERRY-PICK 2.4] Avoid embedding platform-dependent offsets 
literally in whole-stage generated code

## What changes were proposed in this pull request?

Spark SQL performs whole-stage code generation to speed up query execution. 
There are two steps to it:
- Java source code is generated from the physical query plan on the driver. 
A single version of the source code is generated from a query plan, and sent to 
all executors.
  - It's compiled to bytecode on the driver to catch compilation errors 
before sending to executors, but currently only the generated source code gets 
sent to the executors. The bytecode compilation is for fail-fast only.
- Executors receive the generated source code and compile to bytecode, then 
the query runs like a hand-written Java program.

In this model, there's an implicit assumption about the driver and 
executors being run on similar platforms. Some code paths accidentally embedded 
platform-dependent object layout information into the generated code, such as:
```java
Platform.putLong(buffer, /* offset */ 24, /* value */ 1);
```
This code expects a field to be at offset +24 of the `buffer` object, and 
sets a value to that field.
But whole-stage code generation generally uses platform-dependent 
information from the driver. If the object layout is significantly different on 
the driver and executors, the generated code can be reading/writing to wrong 
offsets on the executors, causing all kinds of data corruption.

One code pattern that leads to such problem is the use of `Platform.XXX` 
constants in generated code, e.g. `Platform.BYTE_ARRAY_OFFSET`.

Bad:
```scala
val baseOffset = Platform.BYTE_ARRAY_OFFSET
// codegen template:
s"Platform.putLong($buffer, $baseOffset, $value);"
```
This will embed the value of `Platform.BYTE_ARRAY_OFFSET` on the driver 
into the generated code.

Good:
```scala
val baseOffset = "Platform.BYTE_ARRAY_OFFSET"
// codegen template:
s"Platform.putLong($buffer, $baseOffset, $value);"
```
This will generate the offset symbolically -- `Platform.putLong(buffer, 
Platform.BYTE_ARRAY_OFFSET, value)`, which will be able to pick up the correct 
value on the executors.

Caveat: these offset constants are declared as runtime-initialized `static 
final` in Java, so they're not compile-time constants from the Java language's 
perspective. It does lead to a slightly increased size of the generated code, 
but this is necessary for correctness.

NOTE: there can be other patterns that generate platform-dependent code on 
the driver which is invalid on the executors. e.g. if the endianness is 
different between the driver and the executors, and if some generated code 
makes strong assumption about endianness, it would also be problematic.

## How was this patch tested?

Added a new test suite, `WholeStageCodegenSparkSubmitSuite`. This test suite 
needs to set the driver's extraJavaOptions to force the driver and executor to 
use different Java object layouts, so it's run as an actual SparkSubmit job.
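
As a hedged aside (not part of the patch): `Platform.BYTE_ARRAY_OFFSET` is resolved when the JVM starts, so two JVMs launched with different object-layout flags can disagree on its value, which is the situation the test suite provokes via extraJavaOptions. A minimal probe, assuming a Spark build on the classpath:

```scala
import org.apache.spark.unsafe.Platform

// Run this once with default JVM flags and once with -XX:-UseCompressedOops:
// the printed offset differs, so a value captured literally on the driver would
// be wrong on an executor started with the other flag.
object OffsetProbe {
  def main(args: Array[String]): Unit = {
    println(s"Platform.BYTE_ARRAY_OFFSET = ${Platform.BYTE_ARRAY_OFFSET}")
  }
}
```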

Authored-by: Kris Mok 

Closes #24032 from gatorsmile/testFailure.

Lead-authored-by: Kris Mok 
Co-authored-by: gatorsmile 
Signed-off-by: DB Tsai 
---
 .../unsafe/sort/UnsafeSorterSpillReader.java   |  3 +-
 .../codegen/GenerateUnsafeRowJoiner.scala  | 20 ++---
 .../expressions/collectionOperations.scala |  6 +-
 .../catalyst/expressions/complexTypeCreator.scala  |  2 +-
 .../WholeStageCodegenSparkSubmitSuite.scala| 93 ++
 5 files changed, 108 insertions(+), 16 deletions(-)

diff --git 
a/core/src/main/java/org/apache/spark/util/collection/unsafe/sort/UnsafeSorterSpillReader.java
 
b/core/src/main/java/org/apache/spark/util/collection/unsafe/sort/UnsafeSorterSpillReader.java
index fb179d0..bfca670 100644
--- 
a/core/src/main/java/org/apache/spark/util/collection/unsafe/sort/UnsafeSorterSpillReader.java
+++ 
b/core/src/main/java/org/apache/spark/util/collection/unsafe/sort/UnsafeSorterSpillReader.java
@@ -51,7 +51,6 @@ public final class UnsafeSorterSpillReader extends 
UnsafeSorterIterator implemen
 
   private byte[] arr = new byte[1024 * 1024];
   private Object baseObject = arr;
-  private final long baseOffset = Platform.BYTE_ARRAY_OFFSET;
   private final TaskContext taskContext = 

[spark] branch master updated: [SPARK-27118][SQL] Upgrade Hive Metastore Client to the latest versions for Hive 1.0.x/1.1.x

2019-03-09 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 470313e  [SPARK-27118][SQL] Upgrade Hive Metastore Client to the 
latest versions for Hive 1.0.x/1.1.x
470313e is described below

commit 470313e6602600d9f81d954741a3fa108790d449
Author: Yuming Wang 
AuthorDate: Sat Mar 9 16:50:10 2019 -0800

[SPARK-27118][SQL] Upgrade Hive Metastore Client to the latest versions for 
Hive 1.0.x/1.1.x

## What changes were proposed in this pull request?

Hive 1.1.1 and Hive 1.0.1 have been released. We should upgrade the Hive 
Metastore Client versions accordingly.


https://issues.apache.org/jira/secure/ReleaseNote.jspa?version=12329444&styleName=Text&projectId=12310843

https://issues.apache.org/jira/secure/ReleaseNote.jspa?version=12329557&styleName=Text&projectId=12310843
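
For reference, a minimal configuration sketch (not part of the patch) showing how the new versions are picked up; after this change, requesting a 1.1.x metastore resolves to the 1.1.1 client in `IsolatedClientLoader`. The application name and jar source below are arbitrary:

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical application code: the two configs below are standard Spark settings.
val spark = SparkSession.builder()
  .appName("hive-1.1.x-metastore")
  .config("spark.sql.hive.metastore.version", "1.1.1") // now matched to hive.v1_1
  .config("spark.sql.hive.metastore.jars", "maven")    // fetch the matching client jars
  .enableHiveSupport()
  .getOrCreate()
```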

## How was this patch tested?

N/A

Closes #24040 from wangyum/SPARK-27118.

Authored-by: Yuming Wang 
Signed-off-by: Dongjoon Hyun 
---
 .../scala/org/apache/spark/sql/hive/client/IsolatedClientLoader.scala | 4 ++--
 .../src/main/scala/org/apache/spark/sql/hive/client/package.scala | 4 ++--
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git 
a/sql/hive/src/main/scala/org/apache/spark/sql/hive/client/IsolatedClientLoader.scala
 
b/sql/hive/src/main/scala/org/apache/spark/sql/hive/client/IsolatedClientLoader.scala
index 1f7ab9b..efa97b2 100644
--- 
a/sql/hive/src/main/scala/org/apache/spark/sql/hive/client/IsolatedClientLoader.scala
+++ 
b/sql/hive/src/main/scala/org/apache/spark/sql/hive/client/IsolatedClientLoader.scala
@@ -93,8 +93,8 @@ private[hive] object IsolatedClientLoader extends Logging {
 case "12" | "0.12" | "0.12.0" => hive.v12
 case "13" | "0.13" | "0.13.0" | "0.13.1" => hive.v13
 case "14" | "0.14" | "0.14.0" => hive.v14
-case "1.0" | "1.0.0" => hive.v1_0
-case "1.1" | "1.1.0" => hive.v1_1
+case "1.0" | "1.0.0" | "1.0.1" => hive.v1_0
+case "1.1" | "1.1.0" | "1.1.1" => hive.v1_1
 case "1.2" | "1.2.0" | "1.2.1" | "1.2.2" => hive.v1_2
 case "2.0" | "2.0.0" | "2.0.1" => hive.v2_0
 case "2.1" | "2.1.0" | "2.1.1" => hive.v2_1
diff --git 
a/sql/hive/src/main/scala/org/apache/spark/sql/hive/client/package.scala 
b/sql/hive/src/main/scala/org/apache/spark/sql/hive/client/package.scala
index 70d042e..040f445 100644
--- a/sql/hive/src/main/scala/org/apache/spark/sql/hive/client/package.scala
+++ b/sql/hive/src/main/scala/org/apache/spark/sql/hive/client/package.scala
@@ -39,7 +39,7 @@ package object client {
 "org.apache.calcite:calcite-avatica",
 "org.pentaho:pentaho-aggdesigner-algorithm"))
 
-case object v1_0 extends HiveVersion("1.0.0",
+case object v1_0 extends HiveVersion("1.0.1",
   exclusions = Seq("eigenbase:eigenbase-properties",
 "org.apache.calcite:calcite-core",
 "org.apache.calcite:calcite-avatica",
@@ -50,7 +50,7 @@ package object client {
 // The curator dependency was added to the exclusions here because it 
seems to confuse the ivy
 // library. org.apache.curator:curator is a pom dependency but ivy tries 
to find the jar for it,
 // and fails.
-case object v1_1 extends HiveVersion("1.1.0",
+case object v1_1 extends HiveVersion("1.1.1",
   exclusions = Seq("eigenbase:eigenbase-properties",
 "org.apache.calcite:calcite-core",
 "org.apache.calcite:calcite-avatica",





[spark] branch master updated: [SPARK-27054][BUILD][SQL] Remove the Calcite dependency

2019-03-09 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new f732647  [SPARK-27054][BUILD][SQL] Remove the Calcite dependency
f732647 is described below

commit f732647ae4f486531440dd1239719085c367181e
Author: Yuming Wang 
AuthorDate: Sat Mar 9 16:34:24 2019 -0800

[SPARK-27054][BUILD][SQL] Remove the Calcite dependency

## What changes were proposed in this pull request?

Calcite is only used for 
[runSqlHive](https://github.com/apache/spark/blob/02bbe977abaf7006b845a7e99d612b0235aa0025/sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClientImpl.scala#L699-L705)
 when 
`hive.cbo.enable=true`([SemanticAnalyzer](https://github.com/apache/hive/blob/release-1.2.1/ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzerFactory.java#L278-L280)).
So we can disable `hive.cbo.enable` and remove the Calcite dependency.
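
A hedged sketch of the idea (the actual change lives in `HiveClientImpl.scala` and may differ in detail): with CBO forced off, Hive never reaches its Calcite-based analyzer, so the calcite jars can be dropped from the build. The helper name below is hypothetical:

```scala
import org.apache.hadoop.hive.conf.HiveConf

// Illustrative helper, not Spark's code: runSqlHive only needs Calcite when
// hive.cbo.enable=true, so force it off in the conf handed to the Hive client.
def hiveConfWithoutCbo(base: HiveConf): HiveConf = {
  val conf = new HiveConf(base)
  conf.setBoolean("hive.cbo.enable", false)
  conf
}
```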

## How was this patch tested?

Existing tests

Closes #23970 from wangyum/SPARK-27054.

Lead-authored-by: Yuming Wang 
Co-authored-by: Yuming Wang 
Signed-off-by: Dongjoon Hyun 
---
 LICENSE-binary |  3 -
 NOTICE-binary  |  9 ---
 dev/deps/spark-deps-hadoop-2.7 |  4 --
 dev/deps/spark-deps-hadoop-3.1 |  4 --
 pom.xml| 72 ++
 sql/hive/pom.xml   |  8 ---
 .../spark/sql/hive/client/HiveClientImpl.scala |  2 +
 .../org/apache/spark/sql/hive/client/package.scala | 41 
 8 files changed, 37 insertions(+), 106 deletions(-)

diff --git a/LICENSE-binary b/LICENSE-binary
index 541cc4c..0c157cf 100644
--- a/LICENSE-binary
+++ b/LICENSE-binary
@@ -260,9 +260,6 @@ net.sf.supercsv:super-csv
 org.apache.arrow:arrow-format
 org.apache.arrow:arrow-memory
 org.apache.arrow:arrow-vector
-org.apache.calcite:calcite-avatica
-org.apache.calcite:calcite-core
-org.apache.calcite:calcite-linq4j
 org.apache.commons:commons-crypto
 org.apache.commons:commons-lang3
 org.apache.hadoop:hadoop-annotations
diff --git a/NOTICE-binary b/NOTICE-binary
index b707c43..df41618 100644
--- a/NOTICE-binary
+++ b/NOTICE-binary
@@ -792,15 +792,6 @@ Copyright 2005-2006 The Apache Software Foundation
 Apache Jakarta HttpClient
 Copyright 1999-2007 The Apache Software Foundation
 
-Calcite Avatica
-Copyright 2012-2015 The Apache Software Foundation
-
-Calcite Core
-Copyright 2012-2015 The Apache Software Foundation
-
-Calcite Linq4j
-Copyright 2012-2015 The Apache Software Foundation
-
 Apache HttpClient
 Copyright 1999-2017 The Apache Software Foundation
 
diff --git a/dev/deps/spark-deps-hadoop-2.7 b/dev/deps/spark-deps-hadoop-2.7
index d53039f..53267ea 100644
--- a/dev/deps/spark-deps-hadoop-2.7
+++ b/dev/deps/spark-deps-hadoop-2.7
@@ -24,9 +24,6 @@ avro-mapred-1.8.2-hadoop2.jar
 bonecp-0.8.0.RELEASE.jar
 breeze-macros_2.12-0.13.2.jar
 breeze_2.12-0.13.2.jar
-calcite-avatica-1.2.0-incubating.jar
-calcite-core-1.2.0-incubating.jar
-calcite-linq4j-1.2.0-incubating.jar
 chill-java-0.9.3.jar
 chill_2.12-0.9.3.jar
 commons-beanutils-1.7.0.jar
@@ -57,7 +54,6 @@ datanucleus-api-jdo-3.2.6.jar
 datanucleus-core-3.2.10.jar
 datanucleus-rdbms-3.2.9.jar
 derby-10.12.1.1.jar
-eigenbase-properties-1.1.5.jar
 flatbuffers-java-1.9.0.jar
 generex-1.0.1.jar
 gson-2.2.4.jar
diff --git a/dev/deps/spark-deps-hadoop-3.1 b/dev/deps/spark-deps-hadoop-3.1
index d1a6b27..367bd45 100644
--- a/dev/deps/spark-deps-hadoop-3.1
+++ b/dev/deps/spark-deps-hadoop-3.1
@@ -22,9 +22,6 @@ avro-mapred-1.8.2-hadoop2.jar
 bonecp-0.8.0.RELEASE.jar
 breeze-macros_2.12-0.13.2.jar
 breeze_2.12-0.13.2.jar
-calcite-avatica-1.2.0-incubating.jar
-calcite-core-1.2.0-incubating.jar
-calcite-linq4j-1.2.0-incubating.jar
 chill-java-0.9.3.jar
 chill_2.12-0.9.3.jar
 commons-beanutils-1.9.3.jar
@@ -56,7 +53,6 @@ datanucleus-rdbms-3.2.9.jar
 derby-10.12.1.1.jar
 dnsjava-2.1.7.jar
 ehcache-3.3.1.jar
-eigenbase-properties-1.1.5.jar
 flatbuffers-java-1.9.0.jar
 generex-1.0.1.jar
 geronimo-jcache_1.0_spec-1.0-alpha-1.jar
diff --git a/pom.xml b/pom.xml
index 0e1913a..1608309 100644
--- a/pom.xml
+++ b/pom.xml
@@ -168,7 +168,6 @@
 2.9.8
 1.1.7.1
 1.1.2
-1.2.0-incubating
 1.10
 2.4
 
@@ -1467,12 +1466,16 @@
 org.apache.avro
 avro-mapred
   
-  
+  
   
 org.apache.calcite
 calcite-core
   
   
+org.apache.calcite
+calcite-avatica
+  
+  
 org.apache.curator
 apache-curator
   
@@ -1842,71 +1845,6 @@
 compile
   
   
-org.apache.calcite
-calcite-core
-${calcite.version}
-
-  
-

[spark] branch branch-2.3 updated: [SPARK-27111][SS] Fix a race that a continuous query may fail with InterruptedException

2019-03-09 Thread zsxwing
This is an automated email from the ASF dual-hosted git repository.

zsxwing pushed a commit to branch branch-2.3
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-2.3 by this push:
 new 4d1d0a4  [SPARK-27111][SS] Fix a race that a continuous query may fail 
with InterruptedException
4d1d0a4 is described below

commit 4d1d0a41a862c234acb9b8b68e96da7bf079eb8d
Author: Shixiong Zhu 
AuthorDate: Sat Mar 9 14:26:58 2019 -0800

[SPARK-27111][SS] Fix a race that a continuous query may fail with 
InterruptedException

Before a Kafka consumer is assigned partitions, its offset will contain 0 
partitions. However, runContinuous will still run and launch a Spark job with 
0 partitions. In this case, there is a race in which the epoch may interrupt 
the query execution thread after `lastExecution.toRdd`, and either 
`epochEndpoint.askSync[Unit](StopContinuousExecutionWrites)` or the next 
`runContinuous` will get interrupted unintentionally.

To handle this case, this PR has the following changes:

- Clean up the resources in `queryExecutionThread.runUninterruptibly`. This 
may increase the waiting time of `stop` but should be minor because the 
operations here are very fast (just sending an RPC message in the same process 
and stopping a very simple thread).
- Clear the interrupted status at the end so that it won't impact the 
`runContinuous` call. We may clear the interrupted status set by `stop`, but it 
doesn't affect the query termination because `runActivatedStream` will check 
`state` and exit accordingly.

I also updated the cleanup code to make sure that exceptions thrown from 
`epochEndpoint.askSync[Unit](StopContinuousExecutionWrites)` won't stop the 
cleanup.

Jenkins

Closes #24034 from zsxwing/SPARK-27111.

Authored-by: Shixiong Zhu 
Signed-off-by: Shixiong Zhu 
(cherry picked from commit 6e1c0827ece1cdc615196e60cb11c76b917b8eeb)
Signed-off-by: Shixiong Zhu 
---
 .../streaming/continuous/ContinuousExecution.scala | 32 --
 1 file changed, 24 insertions(+), 8 deletions(-)

diff --git 
a/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/continuous/ContinuousExecution.scala
 
b/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/continuous/ContinuousExecution.scala
index 62adedb..dad7f9b 100644
--- 
a/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/continuous/ContinuousExecution.scala
+++ 
b/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/continuous/ContinuousExecution.scala
@@ -270,14 +270,30 @@ class ContinuousExecution(
 logInfo(s"Query $id ignoring exception from reconfiguring: $t")
 // interrupted by reconfiguration - swallow exception so we can 
restart the query
 } finally {
-  epochEndpoint.askSync[Unit](StopContinuousExecutionWrites)
-  SparkEnv.get.rpcEnv.stop(epochEndpoint)
-
-  epochUpdateThread.interrupt()
-  epochUpdateThread.join()
-
-  stopSources()
-  sparkSession.sparkContext.cancelJobGroup(runId.toString)
+  // The above execution may finish before getting interrupted, for 
example, a Spark job having
+  // 0 partitions will complete immediately. Then the interrupted status 
will sneak here.
+  //
+  // To handle this case, we do the two things here:
+  //
+  // 1. Clean up the resources in 
`queryExecutionThread.runUninterruptibly`. This may increase
+  //the waiting time of `stop` but should be minor because the 
operations here are very fast
+  //(just sending an RPC message in the same process and stopping a 
very simple thread).
+  // 2. Clear the interrupted status at the end so that it won't impact 
the `runContinuous`
+  //call. We may clear the interrupted status set by `stop`, but it 
doesn't affect the query
+  //termination because `runActivatedStream` will check `state` and 
exit accordingly.
+  queryExecutionThread.runUninterruptibly {
+try {
+  epochEndpoint.askSync[Unit](StopContinuousExecutionWrites)
+} finally {
+  SparkEnv.get.rpcEnv.stop(epochEndpoint)
+  epochUpdateThread.interrupt()
+  epochUpdateThread.join()
+  stopSources()
+  // The following line must be the last line because it may fail if 
SparkContext is stopped
+  sparkSession.sparkContext.cancelJobGroup(runId.toString)
+}
+  }
+  Thread.interrupted()
 }
   }
 





[spark] branch branch-2.4 updated: [SPARK-27111][SS] Fix a race that a continuous query may fail with InterruptedException

2019-03-09 Thread zsxwing
This is an automated email from the ASF dual-hosted git repository.

zsxwing pushed a commit to branch branch-2.4
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-2.4 by this push:
 new 53590f2  [SPARK-27111][SS] Fix a race that a continuous query may fail 
with InterruptedException
53590f2 is described below

commit 53590f275a7ebcd015120b576905ce999e50331e
Author: Shixiong Zhu 
AuthorDate: Sat Mar 9 14:26:58 2019 -0800

[SPARK-27111][SS] Fix a race that a continuous query may fail with 
InterruptedException

Before a Kafka consumer is assigned partitions, its offset will contain 0 
partitions. However, runContinuous will still run and launch a Spark job with 
0 partitions. In this case, there is a race in which the epoch may interrupt 
the query execution thread after `lastExecution.toRdd`, and either 
`epochEndpoint.askSync[Unit](StopContinuousExecutionWrites)` or the next 
`runContinuous` will get interrupted unintentionally.

To handle this case, this PR has the following changes:

- Clean up the resources in `queryExecutionThread.runUninterruptibly`. This 
may increase the waiting time of `stop` but should be minor because the 
operations here are very fast (just sending an RPC message in the same process 
and stopping a very simple thread).
- Clear the interrupted status at the end so that it won't impact the 
`runContinuous` call. We may clear the interrupted status set by `stop`, but it 
doesn't affect the query termination because `runActivatedStream` will check 
`state` and exit accordingly.

I also updated the cleanup code to make sure that exceptions thrown from 
`epochEndpoint.askSync[Unit](StopContinuousExecutionWrites)` won't stop the 
cleanup.

Jenkins

Closes #24034 from zsxwing/SPARK-27111.

Authored-by: Shixiong Zhu 
Signed-off-by: Shixiong Zhu 
(cherry picked from commit 6e1c0827ece1cdc615196e60cb11c76b917b8eeb)
Signed-off-by: Shixiong Zhu 
---
 .../streaming/continuous/ContinuousExecution.scala | 32 --
 1 file changed, 24 insertions(+), 8 deletions(-)

diff --git 
a/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/continuous/ContinuousExecution.scala
 
b/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/continuous/ContinuousExecution.scala
index 2e24fa6..3037c01 100644
--- 
a/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/continuous/ContinuousExecution.scala
+++ 
b/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/continuous/ContinuousExecution.scala
@@ -272,14 +272,30 @@ class ContinuousExecution(
 logInfo(s"Query $id ignoring exception from reconfiguring: $t")
 // interrupted by reconfiguration - swallow exception so we can 
restart the query
 } finally {
-  epochEndpoint.askSync[Unit](StopContinuousExecutionWrites)
-  SparkEnv.get.rpcEnv.stop(epochEndpoint)
-
-  epochUpdateThread.interrupt()
-  epochUpdateThread.join()
-
-  stopSources()
-  sparkSession.sparkContext.cancelJobGroup(runId.toString)
+  // The above execution may finish before getting interrupted, for 
example, a Spark job having
+  // 0 partitions will complete immediately. Then the interrupted status 
will sneak here.
+  //
+  // To handle this case, we do the two things here:
+  //
+  // 1. Clean up the resources in 
`queryExecutionThread.runUninterruptibly`. This may increase
+  //the waiting time of `stop` but should be minor because the 
operations here are very fast
+  //(just sending an RPC message in the same process and stopping a 
very simple thread).
+  // 2. Clear the interrupted status at the end so that it won't impact 
the `runContinuous`
+  //call. We may clear the interrupted status set by `stop`, but it 
doesn't affect the query
+  //termination because `runActivatedStream` will check `state` and 
exit accordingly.
+  queryExecutionThread.runUninterruptibly {
+try {
+  epochEndpoint.askSync[Unit](StopContinuousExecutionWrites)
+} finally {
+  SparkEnv.get.rpcEnv.stop(epochEndpoint)
+  epochUpdateThread.interrupt()
+  epochUpdateThread.join()
+  stopSources()
+  // The following line must be the last line because it may fail if 
SparkContext is stopped
+  sparkSession.sparkContext.cancelJobGroup(runId.toString)
+}
+  }
+  Thread.interrupted()
 }
   }
 





[spark] branch master updated: [SPARK-27111][SS] Fix a race that a continuous query may fail with InterruptedException

2019-03-09 Thread zsxwing
This is an automated email from the ASF dual-hosted git repository.

zsxwing pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 6e1c082  [SPARK-27111][SS] Fix a race that a continuous query may fail 
with InterruptedException
6e1c082 is described below

commit 6e1c0827ece1cdc615196e60cb11c76b917b8eeb
Author: Shixiong Zhu 
AuthorDate: Sat Mar 9 14:26:58 2019 -0800

[SPARK-27111][SS] Fix a race that a continuous query may fail with 
InterruptedException

## What changes were proposed in this pull request?

Before a Kafka consumer is assigned partitions, its offset will contain 0 
partitions. However, runContinuous will still run and launch a Spark job with 
0 partitions. In this case, there is a race in which the epoch may interrupt 
the query execution thread after `lastExecution.toRdd`, and either 
`epochEndpoint.askSync[Unit](StopContinuousExecutionWrites)` or the next 
`runContinuous` will get interrupted unintentionally.

To handle this case, this PR has the following changes:

- Clean up the resources in `queryExecutionThread.runUninterruptibly`. This 
may increase the waiting time of `stop` but should be minor because the 
operations here are very fast (just sending an RPC message in the same process 
and stopping a very simple thread).
- Clear the interrupted status at the end so that it won't impact the 
`runContinuous` call. We may clear the interrupted status set by `stop`, but it 
doesn't affect the query termination because `runActivatedStream` will check 
`state` and exit accordingly.

I also updated the cleanup code to make sure that exceptions thrown from 
`epochEndpoint.askSync[Unit](StopContinuousExecutionWrites)` won't stop the 
cleanup.
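
A minimal, self-contained sketch of the interrupt handling described above; the class and method names are illustrative, not the `ContinuousExecution` API:

```scala
// Hypothetical runner: in Spark the cleanup additionally executes inside
// queryExecutionThread.runUninterruptibly so that `stop` cannot abort it.
class EpochRunner(runBody: () => Unit, cleanUp: () => Unit) {
  def runOnce(): Unit = {
    try {
      runBody()                    // a zero-partition job may finish before the interrupt lands
    } finally {
      try cleanUp()                // stop RPC endpoint, join helper threads, stop sources
      finally Thread.interrupted() // clear a leaked interrupt so the next run is not aborted
    }
  }
}
```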

## How was this patch tested?

Jenkins

Closes #24034 from zsxwing/SPARK-27111.

Authored-by: Shixiong Zhu 
Signed-off-by: Shixiong Zhu 
---
 .../streaming/continuous/ContinuousExecution.scala | 30 +-
 1 file changed, 23 insertions(+), 7 deletions(-)

diff --git 
a/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/continuous/ContinuousExecution.scala
 
b/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/continuous/ContinuousExecution.scala
index 26b5642..aef556d 100644
--- 
a/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/continuous/ContinuousExecution.scala
+++ 
b/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/continuous/ContinuousExecution.scala
@@ -268,13 +268,29 @@ class ContinuousExecution(
 logInfo(s"Query $id ignoring exception from reconfiguring: $t")
 // interrupted by reconfiguration - swallow exception so we can 
restart the query
 } finally {
-  epochEndpoint.askSync[Unit](StopContinuousExecutionWrites)
-  SparkEnv.get.rpcEnv.stop(epochEndpoint)
-
-  epochUpdateThread.interrupt()
-  epochUpdateThread.join()
-
-  sparkSession.sparkContext.cancelJobGroup(runId.toString)
+  // The above execution may finish before getting interrupted, for 
example, a Spark job having
+  // 0 partitions will complete immediately. Then the interrupted status 
will sneak here.
+  //
+  // To handle this case, we do the two things here:
+  //
+  // 1. Clean up the resources in 
`queryExecutionThread.runUninterruptibly`. This may increase
+  //the waiting time of `stop` but should be minor because the 
operations here are very fast
+  //(just sending an RPC message in the same process and stopping a 
very simple thread).
+  // 2. Clear the interrupted status at the end so that it won't impact 
the `runContinuous`
+  //call. We may clear the interrupted status set by `stop`, but it 
doesn't affect the query
+  //termination because `runActivatedStream` will check `state` and 
exit accordingly.
+  queryExecutionThread.runUninterruptibly {
+try {
+  epochEndpoint.askSync[Unit](StopContinuousExecutionWrites)
+} finally {
+  SparkEnv.get.rpcEnv.stop(epochEndpoint)
+  epochUpdateThread.interrupt()
+  epochUpdateThread.join()
+  // The following line must be the last line because it may fail if 
SparkContext is stopped
+  sparkSession.sparkContext.cancelJobGroup(runId.toString)
+}
+  }
+  Thread.interrupted()
 }
   }
 





[spark] branch master updated: [SPARK-25838][ML] Remove formatVersion from Saveable

2019-03-09 Thread srowen
This is an automated email from the ASF dual-hosted git repository.

srowen pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 25bcf59  [SPARK-25838][ML] Remove formatVersion from Saveable
25bcf59 is described below

commit 25bcf59b3b566b77bfc8a40a4f4253b81f340aa4
Author: Marco Gaido 
AuthorDate: Sat Mar 9 09:44:20 2019 -0600

[SPARK-25838][ML] Remove formatVersion from Saveable

## What changes were proposed in this pull request?

The `Saveable` interface introduces `formatVersion`, which is protected and 
used nowhere, so the PR proposes to remove it.
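
For context, a simplified before/after sketch of the trait (the real one lives in `org.apache.spark.mllib.util`; only the relevant members are shown):

```scala
import org.apache.spark.SparkContext

// Before: every model had to override formatVersion even though nothing read it.
trait SaveableBefore {
  def save(sc: SparkContext, path: String): Unit
  protected def formatVersion: String
}

// After: the unused member is gone.
trait SaveableAfter {
  def save(sc: SparkContext, path: String): Unit
}
```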

## How was this patch tested?

existing tests

Closes #22830 from mgaido91/SPARK-25838.

Authored-by: Marco Gaido 
Signed-off-by: Sean Owen 
---
 .../mllib/classification/LogisticRegression.scala   |  2 --
 .../spark/mllib/classification/NaiveBayes.scala |  2 --
 .../org/apache/spark/mllib/classification/SVM.scala |  2 --
 .../mllib/clustering/BisectingKMeansModel.scala |  2 --
 .../mllib/clustering/GaussianMixtureModel.scala |  2 --
 .../apache/spark/mllib/clustering/KMeansModel.scala |  2 --
 .../apache/spark/mllib/clustering/LDAModel.scala|  4 
 .../mllib/clustering/PowerIterationClustering.scala |  2 --
 .../apache/spark/mllib/feature/ChiSqSelector.scala  |  2 --
 .../org/apache/spark/mllib/feature/Word2Vec.scala   |  2 --
 .../scala/org/apache/spark/mllib/fpm/FPGrowth.scala |  2 --
 .../org/apache/spark/mllib/fpm/PrefixSpan.scala |  2 --
 .../recommendation/MatrixFactorizationModel.scala   |  2 --
 .../spark/mllib/regression/IsotonicRegression.scala |  2 --
 .../org/apache/spark/mllib/regression/Lasso.scala   |  2 --
 .../spark/mllib/regression/LinearRegression.scala   |  2 --
 .../spark/mllib/regression/RidgeRegression.scala|  2 --
 .../spark/mllib/tree/model/DecisionTreeModel.scala  |  4 
 .../spark/mllib/tree/model/treeEnsembleModels.scala |  8 
 .../org/apache/spark/mllib/util/modelSaveLoad.scala |  3 ---
 project/MimaExcludes.scala  | 21 +
 21 files changed, 21 insertions(+), 51 deletions(-)

diff --git 
a/mllib/src/main/scala/org/apache/spark/mllib/classification/LogisticRegression.scala
 
b/mllib/src/main/scala/org/apache/spark/mllib/classification/LogisticRegression.scala
index 4b65000..d86aa01 100644
--- 
a/mllib/src/main/scala/org/apache/spark/mllib/classification/LogisticRegression.scala
+++ 
b/mllib/src/main/scala/org/apache/spark/mllib/classification/LogisticRegression.scala
@@ -163,8 +163,6 @@ class LogisticRegressionModel @Since("1.3.0") (
   numFeatures, numClasses, weights, intercept, threshold)
   }
 
-  override protected def formatVersion: String = "1.0"
-
   override def toString: String = {
 s"${super.toString}, numClasses = ${numClasses}, threshold = 
${threshold.getOrElse("None")}"
   }
diff --git 
a/mllib/src/main/scala/org/apache/spark/mllib/classification/NaiveBayes.scala 
b/mllib/src/main/scala/org/apache/spark/mllib/classification/NaiveBayes.scala
index 16ba6ca..79bb4ad 100644
--- 
a/mllib/src/main/scala/org/apache/spark/mllib/classification/NaiveBayes.scala
+++ 
b/mllib/src/main/scala/org/apache/spark/mllib/classification/NaiveBayes.scala
@@ -170,8 +170,6 @@ class NaiveBayesModel private[spark] (
 val data = NaiveBayesModel.SaveLoadV2_0.Data(labels, pi, theta, modelType)
 NaiveBayesModel.SaveLoadV2_0.save(sc, path, data)
   }
-
-  override protected def formatVersion: String = "2.0"
 }
 
 @Since("1.3.0")
diff --git 
a/mllib/src/main/scala/org/apache/spark/mllib/classification/SVM.scala 
b/mllib/src/main/scala/org/apache/spark/mllib/classification/SVM.scala
index 5fb04ed..087c2c2 100644
--- a/mllib/src/main/scala/org/apache/spark/mllib/classification/SVM.scala
+++ b/mllib/src/main/scala/org/apache/spark/mllib/classification/SVM.scala
@@ -85,8 +85,6 @@ class SVMModel @Since("1.1.0") (
   numFeatures = weights.size, numClasses = 2, weights, intercept, 
threshold)
   }
 
-  override protected def formatVersion: String = "1.0"
-
   override def toString: String = {
 s"${super.toString}, numClasses = 2, threshold = 
${threshold.getOrElse("None")}"
   }
diff --git 
a/mllib/src/main/scala/org/apache/spark/mllib/clustering/BisectingKMeansModel.scala
 
b/mllib/src/main/scala/org/apache/spark/mllib/clustering/BisectingKMeansModel.scala
index b54b891..c397911 100644
--- 
a/mllib/src/main/scala/org/apache/spark/mllib/clustering/BisectingKMeansModel.scala
+++ 
b/mllib/src/main/scala/org/apache/spark/mllib/clustering/BisectingKMeansModel.scala
@@ -112,8 +112,6 @@ class BisectingKMeansModel private[clustering] (
   override def save(sc: SparkContext, path: String): Unit = {
 BisectingKMeansModel.SaveLoadV3_0.save(sc, this, path)
   }
-
-  override protected def formatVersion: String = "3.0"
 }
 
 @Since("2.0.0")
diff --git 

[spark] branch branch-2.3 updated: [SPARK-27080][SQL] bug fix: mergeWithMetastoreSchema with uniform lower case comparison

2019-03-09 Thread wenchen
This is an automated email from the ASF dual-hosted git repository.

wenchen pushed a commit to branch branch-2.3
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-2.3 by this push:
 new b6d5b0a  [SPARK-27080][SQL] bug fix: mergeWithMetastoreSchema with 
uniform lower case comparison
b6d5b0a is described below

commit b6d5b0a6347faf4dd95321c9646e78d8bb6bb00d
Author: CodeGod <>
AuthorDate: Sat Mar 9 21:28:10 2019 +0800

[SPARK-27080][SQL] bug fix: mergeWithMetastoreSchema with uniform lower 
case comparison

When reading a Parquet file and merging the metastore schema with the file 
schema, we should compare field names using a uniform case. The current 
implementation lowercases the names but misses one spot; this patch fixes it.

Unit test

Closes #24001 from codeborui/mergeSchemaBugFix.

Authored-by: CodeGod <>
Signed-off-by: Wenchen Fan 
(cherry picked from commit a29df5fa02111f57965be2ab5e208f5c815265fe)
Signed-off-by: Wenchen Fan 
---
 .../spark/sql/hive/HiveMetastoreCatalog.scala  |  2 +-
 .../spark/sql/hive/HiveSchemaInferenceSuite.scala  | 26 ++
 2 files changed, 27 insertions(+), 1 deletion(-)

diff --git 
a/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveMetastoreCatalog.scala 
b/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveMetastoreCatalog.scala
index 8adfda0..0138de6 100644
--- 
a/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveMetastoreCatalog.scala
+++ 
b/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveMetastoreCatalog.scala
@@ -282,7 +282,7 @@ private[hive] object HiveMetastoreCatalog {
 // Merge missing nullable fields to inferred schema and build a 
case-insensitive field map.
 val inferredFields = StructType(inferredSchema ++ missingNullables)
   .map(f => f.name.toLowerCase -> f).toMap
-StructType(metastoreSchema.map(f => f.copy(name = 
inferredFields(f.name).name)))
+StructType(metastoreSchema.map(f => f.copy(name = 
inferredFields(f.name.toLowerCase).name)))
   } catch {
 case NonFatal(_) =>
   val msg = s"""Detected conflicting schemas when merging the schema 
obtained from the Hive
diff --git 
a/sql/hive/src/test/scala/org/apache/spark/sql/hive/HiveSchemaInferenceSuite.scala
 
b/sql/hive/src/test/scala/org/apache/spark/sql/hive/HiveSchemaInferenceSuite.scala
index f2d2767..784434f 100644
--- 
a/sql/hive/src/test/scala/org/apache/spark/sql/hive/HiveSchemaInferenceSuite.scala
+++ 
b/sql/hive/src/test/scala/org/apache/spark/sql/hive/HiveSchemaInferenceSuite.scala
@@ -262,6 +262,32 @@ class HiveSchemaInferenceSuite
 StructType(Seq(StructField("lowerCase", BinaryType))))
 }
 
+// Parquet schema is subset of metaStore schema and has uppercase field 
name
+assertResult(
+  StructType(Seq(
+StructField("UPPERCase", DoubleType, nullable = true),
+StructField("lowerCase", BinaryType, nullable = true {
+
+  HiveMetastoreCatalog.mergeWithMetastoreSchema(
+StructType(Seq(
+  StructField("UPPERCase", DoubleType, nullable = true),
+  StructField("lowerCase", BinaryType, nullable = true))),
+
+StructType(Seq(
+  StructField("lowerCase", BinaryType, nullable = true
+}
+
+// Metastore schema contains additional nullable fields.
+assert(intercept[Throwable] {
+  HiveMetastoreCatalog.mergeWithMetastoreSchema(
+StructType(Seq(
+  StructField("UPPERCase", DoubleType, nullable = false),
+  StructField("lowerCase", BinaryType, nullable = true))),
+
+StructType(Seq(
+  StructField("lowerCase", BinaryType, nullable = true
+}.getMessage.contains("Detected conflicting schemas"))
+
 // Check that merging missing nullable fields works as expected.
 assertResult(
   StructType(Seq(





[spark] branch branch-2.4 updated: [SPARK-27080][SQL] bug fix: mergeWithMetastoreSchema with uniform lower case comparison

2019-03-09 Thread wenchen
This is an automated email from the ASF dual-hosted git repository.

wenchen pushed a commit to branch branch-2.4
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-2.4 by this push:
 new c1b6fe4  [SPARK-27080][SQL] bug fix: mergeWithMetastoreSchema with 
uniform lower case comparison
c1b6fe4 is described below

commit c1b6fe479649c482947dfce6b6db67b159bd78a3
Author: CodeGod <>
AuthorDate: Sat Mar 9 21:28:10 2019 +0800

[SPARK-27080][SQL] bug fix: mergeWithMetastoreSchema with uniform lower 
case comparison

When reading a Parquet file and merging the metastore schema with the file 
schema, we should compare field names using a uniform case. The current 
implementation lowercases the names but misses one spot; this patch fixes it.

Unit test

Closes #24001 from codeborui/mergeSchemaBugFix.

Authored-by: CodeGod <>
Signed-off-by: Wenchen Fan 
(cherry picked from commit a29df5fa02111f57965be2ab5e208f5c815265fe)
Signed-off-by: Wenchen Fan 
---
 .../spark/sql/hive/HiveMetastoreCatalog.scala  |  2 +-
 .../spark/sql/hive/HiveSchemaInferenceSuite.scala  | 26 ++
 2 files changed, 27 insertions(+), 1 deletion(-)

diff --git 
a/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveMetastoreCatalog.scala 
b/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveMetastoreCatalog.scala
index 8adfda0..0138de6 100644
--- 
a/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveMetastoreCatalog.scala
+++ 
b/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveMetastoreCatalog.scala
@@ -282,7 +282,7 @@ private[hive] object HiveMetastoreCatalog {
 // Merge missing nullable fields to inferred schema and build a 
case-insensitive field map.
 val inferredFields = StructType(inferredSchema ++ missingNullables)
   .map(f => f.name.toLowerCase -> f).toMap
-StructType(metastoreSchema.map(f => f.copy(name = 
inferredFields(f.name).name)))
+StructType(metastoreSchema.map(f => f.copy(name = 
inferredFields(f.name.toLowerCase).name)))
   } catch {
 case NonFatal(_) =>
   val msg = s"""Detected conflicting schemas when merging the schema 
obtained from the Hive
diff --git 
a/sql/hive/src/test/scala/org/apache/spark/sql/hive/HiveSchemaInferenceSuite.scala
 
b/sql/hive/src/test/scala/org/apache/spark/sql/hive/HiveSchemaInferenceSuite.scala
index 51a48a2..b62c3c7 100644
--- 
a/sql/hive/src/test/scala/org/apache/spark/sql/hive/HiveSchemaInferenceSuite.scala
+++ 
b/sql/hive/src/test/scala/org/apache/spark/sql/hive/HiveSchemaInferenceSuite.scala
@@ -263,6 +263,32 @@ class HiveSchemaInferenceSuite
 StructType(Seq(StructField("lowerCase", BinaryType))))
 }
 
+// Parquet schema is subset of metaStore schema and has uppercase field 
name
+assertResult(
+  StructType(Seq(
+StructField("UPPERCase", DoubleType, nullable = true),
+StructField("lowerCase", BinaryType, nullable = true {
+
+  HiveMetastoreCatalog.mergeWithMetastoreSchema(
+StructType(Seq(
+  StructField("UPPERCase", DoubleType, nullable = true),
+  StructField("lowerCase", BinaryType, nullable = true))),
+
+StructType(Seq(
+  StructField("lowerCase", BinaryType, nullable = true
+}
+
+// Metastore schema contains additional nullable fields.
+assert(intercept[Throwable] {
+  HiveMetastoreCatalog.mergeWithMetastoreSchema(
+StructType(Seq(
+  StructField("UPPERCase", DoubleType, nullable = false),
+  StructField("lowerCase", BinaryType, nullable = true))),
+
+StructType(Seq(
+  StructField("lowerCase", BinaryType, nullable = true
+}.getMessage.contains("Detected conflicting schemas"))
+
 // Check that merging missing nullable fields works as expected.
 assertResult(
   StructType(Seq(





[spark] branch master updated: [SPARK-27080][SQL] bug fix: mergeWithMetastoreSchema with uniform lower case comparison

2019-03-09 Thread wenchen
This is an automated email from the ASF dual-hosted git repository.

wenchen pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new a29df5f  [SPARK-27080][SQL] bug fix: mergeWithMetastoreSchema with 
uniform lower case comparison
a29df5f is described below

commit a29df5fa02111f57965be2ab5e208f5c815265fe
Author: CodeGod <>
AuthorDate: Sat Mar 9 21:28:10 2019 +0800

[SPARK-27080][SQL] bug fix: mergeWithMetastoreSchema with uniform lower 
case comparison

## What changes were proposed in this pull request?
When reading a Parquet file and merging the metastore schema with the file 
schema, we should compare field names using a uniform case. The current 
implementation lowercases the names but misses one spot; this patch fixes it.
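
A self-contained sketch of the rule the one-line fix enforces; it mirrors the lookup in `HiveMetastoreCatalog.mergeWithMetastoreSchema` but is not the Spark implementation itself:

```scala
import org.apache.spark.sql.types._

// Parquet-inferred schema keeps the original casing; the metastore lowercases names.
val inferred = StructType(Seq(
  StructField("UPPERCase", DoubleType),
  StructField("lowerCase", BinaryType)))
val metastore = StructType(Seq(
  StructField("uppercase", DoubleType),
  StructField("lowercase", BinaryType)))

// Build the case-insensitive map with lower-cased keys...
val inferredFields = inferred.map(f => f.name.toLowerCase -> f).toMap
// ...and look it up with a lower-cased key, which is exactly the one-line fix above.
val merged = StructType(metastore.map(f => f.copy(name = inferredFields(f.name.toLowerCase).name)))
// merged.fieldNames is Array("UPPERCase", "lowerCase")
```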

## How was this patch tested?
Unit test

Closes #24001 from codeborui/mergeSchemaBugFix.

Authored-by: CodeGod <>
Signed-off-by: Wenchen Fan 
---
 .../spark/sql/hive/HiveMetastoreCatalog.scala  |  2 +-
 .../spark/sql/hive/HiveSchemaInferenceSuite.scala  | 26 ++
 2 files changed, 27 insertions(+), 1 deletion(-)

diff --git 
a/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveMetastoreCatalog.scala 
b/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveMetastoreCatalog.scala
index 03f4b8d..d6b2945 100644
--- 
a/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveMetastoreCatalog.scala
+++ 
b/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveMetastoreCatalog.scala
@@ -326,8 +326,8 @@ private[hive] object HiveMetastoreCatalog {
 // Merge missing nullable fields to inferred schema and build a 
case-insensitive field map.
 val inferredFields = StructType(inferredSchema ++ missingNullables)
   .map(f => f.name.toLowerCase -> f).toMap
+StructType(metastoreSchema.map(f => f.copy(name = 
inferredFields(f.name.toLowerCase).name)))
 // scalastyle:on caselocale
-StructType(metastoreSchema.map(f => f.copy(name = 
inferredFields(f.name).name)))
   } catch {
 case NonFatal(_) =>
   val msg = s"""Detected conflicting schemas when merging the schema 
obtained from the Hive
diff --git 
a/sql/hive/src/test/scala/org/apache/spark/sql/hive/HiveSchemaInferenceSuite.scala
 
b/sql/hive/src/test/scala/org/apache/spark/sql/hive/HiveSchemaInferenceSuite.scala
index aa4fc13..590ef94 100644
--- 
a/sql/hive/src/test/scala/org/apache/spark/sql/hive/HiveSchemaInferenceSuite.scala
+++ 
b/sql/hive/src/test/scala/org/apache/spark/sql/hive/HiveSchemaInferenceSuite.scala
@@ -264,6 +264,32 @@ class HiveSchemaInferenceSuite
 StructType(Seq(StructField("lowerCase", BinaryType))))
 }
 
+// Parquet schema is subset of metaStore schema and has uppercase field 
name
+assertResult(
+  StructType(Seq(
+StructField("UPPERCase", DoubleType, nullable = true),
+StructField("lowerCase", BinaryType, nullable = true {
+
+  HiveMetastoreCatalog.mergeWithMetastoreSchema(
+StructType(Seq(
+  StructField("UPPERCase", DoubleType, nullable = true),
+  StructField("lowerCase", BinaryType, nullable = true))),
+
+StructType(Seq(
+  StructField("lowerCase", BinaryType, nullable = true
+}
+
+// Metastore schema contains additional nullable fields.
+assert(intercept[Throwable] {
+  HiveMetastoreCatalog.mergeWithMetastoreSchema(
+StructType(Seq(
+  StructField("UPPERCase", DoubleType, nullable = false),
+  StructField("lowerCase", BinaryType, nullable = true))),
+
+StructType(Seq(
+  StructField("lowerCase", BinaryType, nullable = true
+}.getMessage.contains("Detected conflicting schemas"))
+
 // Check that merging missing nullable fields works as expected.
 assertResult(
   StructType(Seq(

