svn commit: r29533 - in /dev/spark/2.4.1-SNAPSHOT-2018_09_19_22_02-dfcff38-docs: ./ _site/ _site/api/ _site/api/R/ _site/api/java/ _site/api/java/lib/ _site/api/java/org/ _site/api/java/org/apache/ _s
Author: pwendell Date: Thu Sep 20 05:19:10 2018 New Revision: 29533 Log: Apache Spark 2.4.1-SNAPSHOT-2018_09_19_22_02-dfcff38 docs [This commit notification would consist of 1475 parts, which exceeds the limit of 50 parts, so it was shortened to the summary.]
svn commit: r29532 - in /dev/spark/2.3.3-SNAPSHOT-2018_09_19_22_02-e319a62-docs: ./ _site/ _site/api/ _site/api/R/ _site/api/java/ _site/api/java/lib/ _site/api/java/org/ _site/api/java/org/apache/ _s
Author: pwendell Date: Thu Sep 20 05:17:54 2018 New Revision: 29532 Log: Apache Spark 2.3.3-SNAPSHOT-2018_09_19_22_02-e319a62 docs [This commit notification would consist of 1443 parts, which exceeds the limit of 50 parts, so it was shortened to the summary.]
spark git commit: [SPARK-25339][TEST] Refactor FilterPushdownBenchmark
Repository: spark Updated Branches: refs/heads/master 95b177c8f -> 0e31a6f25 [SPARK-25339][TEST] Refactor FilterPushdownBenchmark ## What changes were proposed in this pull request? Refactor `FilterPushdownBenchmark` to use a `main` method. This benchmark can now be run in three ways: 1. bin/spark-submit --class org.apache.spark.sql.execution.benchmark.FilterPushdownBenchmark spark-sql_2.11-2.5.0-SNAPSHOT-tests.jar 2. build/sbt "sql/test:runMain org.apache.spark.sql.execution.benchmark.FilterPushdownBenchmark" 3. SPARK_GENERATE_BENCHMARK_FILES=1 build/sbt "sql/test:runMain org.apache.spark.sql.execution.benchmark.FilterPushdownBenchmark" Methods 2 and 3 do not require compiling the `spark-sql_*-tests.jar` package, so they are mainly for developers who want to run benchmarks quickly. ## How was this patch tested? Manual tests. Closes #22443 from wangyum/SPARK-25339. Authored-by: Yuming Wang Signed-off-by: Wenchen Fan Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/0e31a6f2 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/0e31a6f2 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/0e31a6f2 Branch: refs/heads/master Commit: 0e31a6f25e0263b144255a6630e1d381fe2d27a7 Parents: 95b177c Author: Yuming Wang Authored: Thu Sep 20 12:34:39 2018 +0800 Committer: Wenchen Fan Committed: Thu Sep 20 12:34:39 2018 +0800 -- .../org/apache/spark/util/BenchmarkBase.scala | 57 .../benchmark/FilterPushdownBenchmark.scala | 333 +-- 2 files changed, 206 insertions(+), 184 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/0e31a6f2/core/src/main/scala/org/apache/spark/util/BenchmarkBase.scala -- diff --git a/core/src/main/scala/org/apache/spark/util/BenchmarkBase.scala b/core/src/main/scala/org/apache/spark/util/BenchmarkBase.scala new file mode 100644 index 000..c84032b --- /dev/null +++ b/core/src/main/scala/org/apache/spark/util/BenchmarkBase.scala @@ -0,0 +1,57 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.util + +import java.io.{File, FileOutputStream, OutputStream} + +/** + * A base class for generating benchmark results to a file. 
+ */ +abstract class BenchmarkBase { + var output: Option[OutputStream] = None + + def benchmark(): Unit + + final def runBenchmark(benchmarkName: String)(func: => Any): Unit = { +val separator = "=" * 96 +val testHeader = (separator + '\n' + benchmarkName + '\n' + separator + '\n' + '\n').getBytes +output.foreach(_.write(testHeader)) +func +output.foreach(_.write('\n')) + } + + def main(args: Array[String]): Unit = { +val regenerateBenchmarkFiles: Boolean = System.getenv("SPARK_GENERATE_BENCHMARK_FILES") == "1" +if (regenerateBenchmarkFiles) { + val resultFileName = s"${this.getClass.getSimpleName.replace("$", "")}-results.txt" + val file = new File(s"benchmarks/$resultFileName") + if (!file.exists()) { +file.createNewFile() + } + output = Some(new FileOutputStream(file)) +} + +benchmark() + +output.foreach { o => + if (o != null) { +o.close() + } +} + } +} http://git-wip-us.apache.org/repos/asf/spark/blob/0e31a6f2/sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/FilterPushdownBenchmark.scala -- diff --git a/sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/FilterPushdownBenchmark.scala b/sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/FilterPushdownBenchmark.scala index d6dfdec..9ecea99 100644 --- a/sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/FilterPushdownBenchmark.scala +++ b/sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/FilterPushdownBenchmark.scala @@ -17,29 +17,28 @@ package org.apache.spark.sql.execution.benchmark -import java.io.{File, FileOutputStream, OutputStream} +import
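For illustration, a minimal sketch of how a benchmark would plug into the new base class (the object name `MyBenchmark` and its workload are hypothetical; `BenchmarkBase` and the `SPARK_GENERATE_BENCHMARK_FILES` handling come from the diff above):

```scala
import org.apache.spark.util.BenchmarkBase

// Hypothetical benchmark: `main` is inherited from BenchmarkBase, so this object
// can be launched with spark-submit or `build/sbt "sql/test:runMain ..."` exactly
// as listed in the PR description above.
object MyBenchmark extends BenchmarkBase {
  override def benchmark(): Unit = {
    // runBenchmark writes a "=" * 96 separator header to the optional results file
    // (benchmarks/MyBenchmark-results.txt when SPARK_GENERATE_BENCHMARK_FILES=1)
    // and then executes the measurement body.
    runBenchmark("stand-in workload") {
      val start = System.nanoTime()
      (1 to 1000000).count(_ % 7 == 0)
      println(s"took ${(System.nanoTime() - start) / 1e6} ms")
    }
  }
}
```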
spark git commit: [SPARK-23648][R][SQL] Adds more types for hint in SparkR
Repository: spark Updated Branches: refs/heads/master 76399d75e -> 95b177c8f [SPARK-23648][R][SQL] Adds more types for hint in SparkR ## What changes were proposed in this pull request? Addition of numeric and list hints for SparkR. ## How was this patch tested? Added a test in test_sparkSQL.R. Author: Huaxin Gao Closes #21649 from huaxingao/spark-23648. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/95b177c8 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/95b177c8 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/95b177c8 Branch: refs/heads/master Commit: 95b177c8f0862c6965a7c3cd76b3935c975adee9 Parents: 76399d7 Author: Huaxin Gao Authored: Wed Sep 19 21:27:30 2018 -0700 Committer: Felix Cheung Committed: Wed Sep 19 21:27:30 2018 -0700 -- R/pkg/R/DataFrame.R | 12 +++- R/pkg/tests/fulltests/test_sparkSQL.R | 9 + 2 files changed, 20 insertions(+), 1 deletion(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/95b177c8/R/pkg/R/DataFrame.R -- diff --git a/R/pkg/R/DataFrame.R b/R/pkg/R/DataFrame.R index 458deca..a1cb478 100644 --- a/R/pkg/R/DataFrame.R +++ b/R/pkg/R/DataFrame.R @@ -3985,7 +3985,17 @@ setMethod("hint", signature(x = "SparkDataFrame", name = "character"), function(x, name, ...) { parameters <- list(...) -stopifnot(all(sapply(parameters, is.character))) +if (!all(sapply(parameters, function(y) { + if (is.character(y) || is.numeric(y)) { +TRUE + } else if (is.list(y)) { +all(sapply(y, function(z) { is.character(z) || is.numeric(z) })) + } else { +FALSE + } +}))) { + stop("sql hint should be character, numeric, or list with character or numeric.") +} jdf <- callJMethod(x@sdf, "hint", name, parameters) dataFrame(jdf) }) http://git-wip-us.apache.org/repos/asf/spark/blob/95b177c8/R/pkg/tests/fulltests/test_sparkSQL.R -- diff --git a/R/pkg/tests/fulltests/test_sparkSQL.R b/R/pkg/tests/fulltests/test_sparkSQL.R index 0c4bdb3..40d8f80 100644 --- a/R/pkg/tests/fulltests/test_sparkSQL.R +++ b/R/pkg/tests/fulltests/test_sparkSQL.R @@ -2419,6 +2419,15 @@ test_that("join(), crossJoin() and merge() on a DataFrame", { expect_true(any(grepl("BroadcastHashJoin", execution_plan_broadcast))) }) +test_that("test hint", { + df <- sql("SELECT * FROM range(10e10)") + hintList <- list("hint2", "hint3", "hint4") + execution_plan_hint <- capture.output( +explain(hint(df, "hint1", 1.23456, "aa", hintList), TRUE) + ) + expect_true(any(grepl("1.23456, aa", execution_plan_hint))) +}) + test_that("toJSON() on DataFrame", { df <- as.DataFrame(cars) df_json <- toJSON(df)
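On the JVM side, the R `hint()` delegates via `callJMethod` to `Dataset.hint(name, parameters*)`, so the new R behavior mirrors what Scala already allows. A hedged Scala sketch of the same call pattern exercised by the new R test (assumes an active `SparkSession` named `spark`; the hint names are made up, and an unresolved hint is simply dropped by the analyzer with a warning):

```scala
val df = spark.sql("SELECT * FROM range(10)")
// Character, numeric, and list parameters, matching what SparkR now accepts.
val hinted = df.hint("hint1", 1.23456, "aa", Seq("hint2", "hint3", "hint4"))
hinted.explain(true)  // the parameters appear in the parsed logical plan
```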
spark git commit: [SPARK-4502][SQL] Rename to spark.sql.optimizer.nestedSchemaPruning.enabled
Repository: spark Updated Branches: refs/heads/branch-2.4 06efed290 -> dfcff3839 [SPARK-4502][SQL] Rename to spark.sql.optimizer.nestedSchemaPruning.enabled ## What changes were proposed in this pull request? This patch adds an "optimizer" prefix to nested schema pruning. ## How was this patch tested? Should be covered by existing tests. Closes #22475 from rxin/SPARK-4502. Authored-by: Reynold Xin Signed-off-by: gatorsmile (cherry picked from commit 76399d75e23f2c7d6c2a1fb77a4387c5e15c809b) Signed-off-by: gatorsmile Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/dfcff383 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/dfcff383 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/dfcff383 Branch: refs/heads/branch-2.4 Commit: dfcff38394929970fee454c69864d0e10d59f8d4 Parents: 06efed2 Author: Reynold Xin Authored: Wed Sep 19 21:23:35 2018 -0700 Committer: gatorsmile Committed: Wed Sep 19 21:23:49 2018 -0700 -- .../src/main/scala/org/apache/spark/sql/internal/SQLConf.scala | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/dfcff383/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala -- diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala index 8b82fe1..c5901ee 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala @@ -1437,7 +1437,7 @@ object SQLConf { .createWithDefault(true) val NESTED_SCHEMA_PRUNING_ENABLED = -buildConf("spark.sql.nestedSchemaPruning.enabled") +buildConf("spark.sql.optimizer.nestedSchemaPruning.enabled") .internal() .doc("Prune nested fields from a logical relation's output which are unnecessary in " + "satisfying a query. This optimization allows columnar file format readers to avoid " +
spark git commit: [SPARK-4502][SQL] Rename to spark.sql.optimizer.nestedSchemaPruning.enabled
Repository: spark Updated Branches: refs/heads/master 47d6e80a2 -> 76399d75e [SPARK-4502][SQL] Rename to spark.sql.optimizer.nestedSchemaPruning.enabled ## What changes were proposed in this pull request? This patch adds an "optimizer" prefix to nested schema pruning. ## How was this patch tested? Should be covered by existing tests. Closes #22475 from rxin/SPARK-4502. Authored-by: Reynold Xin Signed-off-by: gatorsmile Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/76399d75 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/76399d75 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/76399d75 Branch: refs/heads/master Commit: 76399d75e23f2c7d6c2a1fb77a4387c5e15c809b Parents: 47d6e80 Author: Reynold Xin Authored: Wed Sep 19 21:23:35 2018 -0700 Committer: gatorsmile Committed: Wed Sep 19 21:23:35 2018 -0700 -- .../src/main/scala/org/apache/spark/sql/internal/SQLConf.scala | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/76399d75/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala -- diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala index 907221c..a01e87c 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala @@ -1457,7 +1457,7 @@ object SQLConf { .createWithDefault(true) val NESTED_SCHEMA_PRUNING_ENABLED = -buildConf("spark.sql.nestedSchemaPruning.enabled") +buildConf("spark.sql.optimizer.nestedSchemaPruning.enabled") .internal() .doc("Prune nested fields from a logical relation's output which are unnecessary in " + "satisfying a query. This optimization allows columnar file format readers to avoid " +
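A hedged sketch of the option under its new name (the Parquet path and column are made up; the config is internal, so setting it explicitly is shown only to illustrate the rename):

```scala
spark.conf.set("spark.sql.optimizer.nestedSchemaPruning.enabled", "true")
// With pruning enabled, selecting one leaf of a struct lets columnar readers
// (e.g. Parquet) skip the sibling nested fields, per the doc string above.
spark.read.parquet("/tmp/contacts").select("name.first").explain()
```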
svn commit: r29531 - in /dev/spark/2.5.0-SNAPSHOT-2018_09_19_20_02-47d6e80-docs: ./ _site/ _site/api/ _site/api/R/ _site/api/java/ _site/api/java/lib/ _site/api/java/org/ _site/api/java/org/apache/ _s
Author: pwendell Date: Thu Sep 20 03:16:55 2018 New Revision: 29531 Log: Apache Spark 2.5.0-SNAPSHOT-2018_09_19_20_02-47d6e80 docs [This commit notification would consist of 1484 parts, which exceeds the limit of 50 parts, so it was shortened to the summary.]
spark git commit: [SPARK-25457][SQL] IntegralDivide returns data type of the operands
Repository: spark Updated Branches: refs/heads/master 8aae49afc -> 47d6e80a2 [SPARK-25457][SQL] IntegralDivide returns data type of the operands ## What changes were proposed in this pull request? The PR proposes to return the data type of the operands as a result for the `div` operator. Before the PR, `bigint` was always returned. It also introduces a `spark.sql.legacy.integralDivide.returnBigint` config to let users restore the legacy behavior. ## How was this patch tested? Added UTs. Closes #22465 from mgaido91/SPARK-25457. Authored-by: Marco Gaido Signed-off-by: Wenchen Fan Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/47d6e80a Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/47d6e80a Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/47d6e80a Branch: refs/heads/master Commit: 47d6e80a2e64823fabb596503fb6a6cc6f51f713 Parents: 8aae49a Author: Marco Gaido Authored: Thu Sep 20 10:23:37 2018 +0800 Committer: Wenchen Fan Committed: Thu Sep 20 10:23:37 2018 +0800 -- .../sql/catalyst/expressions/arithmetic.scala | 17 +- .../org/apache/spark/sql/internal/SQLConf.scala | 9 + .../expressions/ArithmeticExpressionSuite.scala | 26 +- .../resources/sql-tests/inputs/operator-div.sql | 14 ++ .../resources/sql-tests/inputs/operators.sql| 6 +- .../sql-tests/results/operator-div.sql.out | 82 ++ .../sql-tests/results/operators.sql.out | 248 --- 7 files changed, 246 insertions(+), 156 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/47d6e80a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/arithmetic.scala -- diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/arithmetic.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/arithmetic.scala index 1b1808f..f59b2a2 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/arithmetic.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/arithmetic.scala @@ -22,6 +22,7 @@ import org.apache.spark.sql.catalyst.analysis.{TypeCheckResult, TypeCoercion} import org.apache.spark.sql.catalyst.expressions.codegen._ import org.apache.spark.sql.catalyst.expressions.codegen.Block._ import org.apache.spark.sql.catalyst.util.TypeUtils +import org.apache.spark.sql.internal.SQLConf import org.apache.spark.sql.types._ import org.apache.spark.unsafe.types.CalendarInterval @@ -327,16 +328,24 @@ case class Divide(left: Expression, right: Expression) extends DivModLike { case class IntegralDivide(left: Expression, right: Expression) extends DivModLike { override def inputType: AbstractDataType = IntegralType - override def dataType: DataType = LongType + override def dataType: DataType = if (SQLConf.get.integralDivideReturnLong) { +LongType + } else { +left.dataType + } override def symbol: String = "/" override def sqlOperator: String = "div" - private lazy val div: (Any, Any) => Long = left.dataType match { + private lazy val div: (Any, Any) => Any = left.dataType match { case i: IntegralType => val divide = i.integral.asInstanceOf[Integral[Any]].quot _ - val toLong = i.integral.asInstanceOf[Integral[Any]].toLong _ - (x, y) => toLong(divide(x, y)) + if (SQLConf.get.integralDivideReturnLong) { +val toLong = i.integral.asInstanceOf[Integral[Any]].toLong _ +(x, y) => toLong(divide(x, y)) + } else { +divide + } } override def evalOperation(left: Any, right: Any): Any = div(left, right) 
http://git-wip-us.apache.org/repos/asf/spark/blob/47d6e80a/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala -- diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala index c3328a6..907221c 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala @@ -1561,6 +1561,13 @@ object SQLConf { "are performed before any UNION, EXCEPT and MINUS operations.") .booleanConf .createWithDefault(false) + + val LEGACY_INTEGRALDIVIDE_RETURN_LONG = buildConf("spark.sql.legacy.integralDivide.returnBigint") +.doc("If it is set to true, the div operator returns always a bigint. This behavior was " + + "inherited from Hive. Otherwise, the return type is the data type of the operands.") +.internal() +.booleanConf +.createWithDefault(false) } /** @@ -1973,6 +1980,8 @@
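A hedged illustration of the new semantics (literal values chosen arbitrarily): with the default setting, the result type of `div` follows the operands, and the new legacy flag restores the Hive-inherited `bigint` result:

```scala
// Default: the result type of `div` now follows the operands (int here).
spark.sql("SELECT CAST(7 AS INT) div CAST(2 AS INT)").printSchema()
// Legacy flag: restore the old behavior where div always returns bigint.
spark.conf.set("spark.sql.legacy.integralDivide.returnBigint", "true")
spark.sql("SELECT CAST(7 AS INT) div CAST(2 AS INT)").printSchema()
```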
spark git commit: [SPARK-24341][FOLLOWUP][DOCS] Add migration note for IN subqueries behavior
Repository: spark Updated Branches: refs/heads/branch-2.4 535bf1cc9 -> 06efed290 [SPARK-24341][FOLLOWUP][DOCS] Add migration note for IN subqueries behavior ## What changes were proposed in this pull request? The PR updates the migration guide to explain the changes introduced in the behavior of the IN operator with subqueries, in particular the improved handling of struct attributes in these situations. ## How was this patch tested? NA Closes #22469 from mgaido91/SPARK-24341_followup. Authored-by: Marco Gaido Signed-off-by: Wenchen Fan (cherry picked from commit 8aae49afc7997aa1da61029409ef6d8ce0ab256a) Signed-off-by: Wenchen Fan Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/06efed29 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/06efed29 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/06efed29 Branch: refs/heads/branch-2.4 Commit: 06efed29047041096910f300cb14e8cfff540efc Parents: 535bf1c Author: Marco Gaido Authored: Thu Sep 20 10:10:20 2018 +0800 Committer: Wenchen Fan Committed: Thu Sep 20 10:10:36 2018 +0800 -- docs/sql-programming-guide.md | 1 + 1 file changed, 1 insertion(+) -- http://git-wip-us.apache.org/repos/asf/spark/blob/06efed29/docs/sql-programming-guide.md -- diff --git a/docs/sql-programming-guide.md b/docs/sql-programming-guide.md index 2fa29a0..c76f2e3 100644 --- a/docs/sql-programming-guide.md +++ b/docs/sql-programming-guide.md @@ -1879,6 +1879,7 @@ working with timestamps in `pandas_udf`s to get the best performance, see ## Upgrading From Spark SQL 2.3 to 2.4 + - Since Spark 2.4, when there is a struct field in front of the IN operator before a subquery, the inner query must contain a struct field as well. In previous versions, instead, the fields of the struct were compared to the output of the inner query. E.g. if `a` is a `struct(a string, b int)`, in Spark 2.4 `a in (select (1 as a, 'a' as b) from range(1))` is a valid query, while `a in (select 1, 'a' from range(1))` is not. In previous versions it was the opposite. - In versions 2.2.1+ and 2.3, if `spark.sql.caseSensitive` is set to true, then the `CURRENT_DATE` and `CURRENT_TIMESTAMP` functions incorrectly became case-sensitive and would resolve to columns (unless typed in lower case). In Spark 2.4 this has been fixed and the functions are no longer case-sensitive. - Since Spark 2.4, Spark will evaluate the set operations referenced in a query by following a precedence rule as per the SQL standard. If the order is not specified by parentheses, set operations are performed from left to right with the exception that all INTERSECT operations are performed before any UNION, EXCEPT or MINUS operations. The old behaviour of giving equal precedence to all the set operations are preserved under a newly added configuration `spark.sql.legacy.setopsPrecedence.enabled` with a default value of `false`. When this property is set to `true`, spark will evaluate the set operators from left to right as they appear in the query given no explicit ordering is enforced by usage of parenthesis. - Since Spark 2.4, Spark will display table description column Last Access value as UNKNOWN when the value was Jan 01 1970.
spark git commit: [SPARK-24341][FOLLOWUP][DOCS] Add migration note for IN subqueries behavior
Repository: spark Updated Branches: refs/heads/master 936c92034 -> 8aae49afc [SPARK-24341][FOLLOWUP][DOCS] Add migration note for IN subqueries behavior ## What changes were proposed in this pull request? The PR updates the migration guide to explain the changes introduced in the behavior of the IN operator with subqueries, in particular the improved handling of struct attributes in these situations. ## How was this patch tested? NA Closes #22469 from mgaido91/SPARK-24341_followup. Authored-by: Marco Gaido Signed-off-by: Wenchen Fan Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/8aae49af Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/8aae49af Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/8aae49af Branch: refs/heads/master Commit: 8aae49afc7997aa1da61029409ef6d8ce0ab256a Parents: 936c920 Author: Marco Gaido Authored: Thu Sep 20 10:10:20 2018 +0800 Committer: Wenchen Fan Committed: Thu Sep 20 10:10:20 2018 +0800 -- docs/sql-programming-guide.md | 1 + 1 file changed, 1 insertion(+) -- http://git-wip-us.apache.org/repos/asf/spark/blob/8aae49af/docs/sql-programming-guide.md -- diff --git a/docs/sql-programming-guide.md b/docs/sql-programming-guide.md index 2fa29a0..c76f2e3 100644 --- a/docs/sql-programming-guide.md +++ b/docs/sql-programming-guide.md @@ -1879,6 +1879,7 @@ working with timestamps in `pandas_udf`s to get the best performance, see ## Upgrading From Spark SQL 2.3 to 2.4 + - Since Spark 2.4, when there is a struct field in front of the IN operator before a subquery, the inner query must contain a struct field as well. In previous versions, instead, the fields of the struct were compared to the output of the inner query. E.g. if `a` is a `struct(a string, b int)`, in Spark 2.4 `a in (select (1 as a, 'a' as b) from range(1))` is a valid query, while `a in (select 1, 'a' from range(1))` is not. In previous versions it was the opposite. - In versions 2.2.1+ and 2.3, if `spark.sql.caseSensitive` is set to true, then the `CURRENT_DATE` and `CURRENT_TIMESTAMP` functions incorrectly became case-sensitive and would resolve to columns (unless typed in lower case). In Spark 2.4 this has been fixed and the functions are no longer case-sensitive. - Since Spark 2.4, Spark will evaluate the set operations referenced in a query by following a precedence rule as per the SQL standard. If the order is not specified by parentheses, set operations are performed from left to right with the exception that all INTERSECT operations are performed before any UNION, EXCEPT or MINUS operations. The old behaviour of giving equal precedence to all the set operations are preserved under a newly added configuration `spark.sql.legacy.setopsPrecedence.enabled` with a default value of `false`. When this property is set to `true`, spark will evaluate the set operators from left to right as they appear in the query given no explicit ordering is enforced by usage of parenthesis. - Since Spark 2.4, Spark will display table description column Last Access value as UNKNOWN when the value was Jan 01 1970.
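The new migration note, condensed into a hedged runnable form (the table `t` and its struct column `a` are made up, with field types chosen to match the subquery literal):

```scala
// Assume a table t with a column `a` of type struct<a: int, b: string>.
// Valid since 2.4: a struct field on both sides of IN.
spark.sql("SELECT a IN (SELECT (1 AS a, 'a' AS b) FROM range(1)) FROM t")
// Only valid before 2.4, where the struct's fields were compared one by one
// to the subquery's output columns:
// spark.sql("SELECT a IN (SELECT 1, 'a' FROM range(1)) FROM t")
```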
spark git commit: [SPARK-24157][SS][FOLLOWUP] Rename to spark.sql.streaming.noDataMicroBatches.enabled
Repository: spark Updated Branches: refs/heads/branch-2.4 99ae693b3 -> 535bf1cc9 [SPARK-24157][SS][FOLLOWUP] Rename to spark.sql.streaming.noDataMicroBatches.enabled ## What changes were proposed in this pull request? This patch changes the config option `spark.sql.streaming.noDataMicroBatchesEnabled` to `spark.sql.streaming.noDataMicroBatches.enabled` to be more consistent with the rest of the configs. Unfortunately there is one streaming config called `spark.sql.streaming.metricsEnabled`. For that one we should just use a fallback config and change it in a separate patch. ## How was this patch tested? Made sure no other references to this config are in the code base: ``` > git grep "noDataMicro" sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala: buildConf("spark.sql.streaming.noDataMicroBatches.enabled") ``` Closes #22476 from rxin/SPARK-24157. Authored-by: Reynold Xin Signed-off-by: Reynold Xin (cherry picked from commit 936c920347e196381b48bc3656ca81a06f2ff46d) Signed-off-by: Reynold Xin Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/535bf1cc Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/535bf1cc Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/535bf1cc Branch: refs/heads/branch-2.4 Commit: 535bf1cc9e6b54df7059ac3109b8cba30057d040 Parents: 99ae693 Author: Reynold Xin Authored: Wed Sep 19 18:51:20 2018 -0700 Committer: Reynold Xin Committed: Wed Sep 19 18:51:31 2018 -0700 -- .../src/main/scala/org/apache/spark/sql/internal/SQLConf.scala | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/535bf1cc/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala -- diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala index 3e9cde4..8b82fe1 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala @@ -1056,7 +1056,7 @@ object SQLConf { .createWithDefault(1L) val STREAMING_NO_DATA_MICRO_BATCHES_ENABLED = -buildConf("spark.sql.streaming.noDataMicroBatchesEnabled") +buildConf("spark.sql.streaming.noDataMicroBatches.enabled") .doc( "Whether streaming micro-batch engine will execute batches without data " + "for eager state management for stateful streaming queries.")
spark git commit: [SPARK-24157][SS][FOLLOWUP] Rename to spark.sql.streaming.noDataMicroBatches.enabled
Repository: spark Updated Branches: refs/heads/master 90e3955f3 -> 936c92034 [SPARK-24157][SS][FOLLOWUP] Rename to spark.sql.streaming.noDataMicroBatches.enabled ## What changes were proposed in this pull request? This patch changes the config option `spark.sql.streaming.noDataMicroBatchesEnabled` to `spark.sql.streaming.noDataMicroBatches.enabled` to be more consistent with the rest of the configs. Unfortunately there is one streaming config called `spark.sql.streaming.metricsEnabled`. For that one we should just use a fallback config and change it in a separate patch. ## How was this patch tested? Made sure no other references to this config are in the code base: ``` > git grep "noDataMicro" sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala: buildConf("spark.sql.streaming.noDataMicroBatches.enabled") ``` Closes #22476 from rxin/SPARK-24157. Authored-by: Reynold Xin Signed-off-by: Reynold Xin Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/936c9203 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/936c9203 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/936c9203 Branch: refs/heads/master Commit: 936c920347e196381b48bc3656ca81a06f2ff46d Parents: 90e3955 Author: Reynold Xin Authored: Wed Sep 19 18:51:20 2018 -0700 Committer: Reynold Xin Committed: Wed Sep 19 18:51:20 2018 -0700 -- .../src/main/scala/org/apache/spark/sql/internal/SQLConf.scala | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/936c9203/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala -- diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala index b1e9b17..c3328a6 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala @@ -1076,7 +1076,7 @@ object SQLConf { .createWithDefault(1L) val STREAMING_NO_DATA_MICRO_BATCHES_ENABLED = -buildConf("spark.sql.streaming.noDataMicroBatchesEnabled") +buildConf("spark.sql.streaming.noDataMicroBatches.enabled") .doc( "Whether streaming micro-batch engine will execute batches without data " + "for eager state management for stateful streaming queries.")
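A hedged sketch of the renamed flag in use (the default value is not shown in the hunk above; toggling it off here is only to illustrate the new key):

```scala
// Under the new, consistent key:
spark.conf.set("spark.sql.streaming.noDataMicroBatches.enabled", "false")
// With the flag off, the micro-batch engine stops scheduling batches without
// data, so stateful operators advance their state only when new input arrives.
```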
spark git commit: [SPARK-25471][PYTHON][TEST] Fix pyspark-sql test error when using Python 3.6 and Pandas 0.23
Repository: spark Updated Branches: refs/heads/branch-2.3 7b5da37c0 -> e319a624e [SPARK-25471][PYTHON][TEST] Fix pyspark-sql test error when using Python 3.6 and Pandas 0.23 ## What changes were proposed in this pull request? Fix test that constructs a Pandas DataFrame by specifying the column order. Previously this test assumed the columns would be sorted alphabetically; however, when using Python 3.6 with Pandas 0.23 or higher, the original column order is maintained. This causes the columns to get mixed up and the test fails. Manually tested with `python/run-tests` using Python 3.6.6 and Pandas 0.23.4 Closes #22477 from BryanCutler/pyspark-tests-py36-pd23-SPARK-25471. Authored-by: Bryan Cutler Signed-off-by: hyukjinkwon (cherry picked from commit 90e3955f384ca07bdf24faa6cdb60ded944cf0d8) Signed-off-by: hyukjinkwon Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/e319a624 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/e319a624 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/e319a624 Branch: refs/heads/branch-2.3 Commit: e319a624e2f366a941bd92a685e1b48504c887b1 Parents: 7b5da37 Author: Bryan Cutler Authored: Thu Sep 20 09:29:29 2018 +0800 Committer: hyukjinkwon Committed: Thu Sep 20 09:30:06 2018 +0800 -- python/pyspark/sql/tests.py | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/e319a624/python/pyspark/sql/tests.py -- diff --git a/python/pyspark/sql/tests.py b/python/pyspark/sql/tests.py index 6bfb329..3c5fc97 100644 --- a/python/pyspark/sql/tests.py +++ b/python/pyspark/sql/tests.py @@ -2885,7 +2885,7 @@ class SQLTests(ReusedSQLTestCase): import pandas as pd from datetime import datetime pdf = pd.DataFrame({"ts": [datetime(2017, 10, 31, 1, 1, 1)], -"d": [pd.Timestamp.now().date()]}) +"d": [pd.Timestamp.now().date()]}, columns=["d", "ts"]) # test types are inferred correctly without specifying schema df = self.spark.createDataFrame(pdf) self.assertTrue(isinstance(df.schema['ts'].dataType, TimestampType))
spark git commit: [SPARK-25471][PYTHON][TEST] Fix pyspark-sql test error when using Python 3.6 and Pandas 0.23
Repository: spark Updated Branches: refs/heads/branch-2.4 a9a8d3a4b -> 99ae693b3 [SPARK-25471][PYTHON][TEST] Fix pyspark-sql test error when using Python 3.6 and Pandas 0.23 ## What changes were proposed in this pull request? Fix test that constructs a Pandas DataFrame by specifying the column order. Previously this test assumed the columns would be sorted alphabetically; however, when using Python 3.6 with Pandas 0.23 or higher, the original column order is maintained. This causes the columns to get mixed up and the test fails. Manually tested with `python/run-tests` using Python 3.6.6 and Pandas 0.23.4 Closes #22477 from BryanCutler/pyspark-tests-py36-pd23-SPARK-25471. Authored-by: Bryan Cutler Signed-off-by: hyukjinkwon (cherry picked from commit 90e3955f384ca07bdf24faa6cdb60ded944cf0d8) Signed-off-by: hyukjinkwon Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/99ae693b Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/99ae693b Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/99ae693b Branch: refs/heads/branch-2.4 Commit: 99ae693b3722db6e01825b8cf2c3f2ef74a65ddb Parents: a9a8d3a Author: Bryan Cutler Authored: Thu Sep 20 09:29:29 2018 +0800 Committer: hyukjinkwon Committed: Thu Sep 20 09:29:49 2018 +0800 -- python/pyspark/sql/tests.py | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/99ae693b/python/pyspark/sql/tests.py -- diff --git a/python/pyspark/sql/tests.py b/python/pyspark/sql/tests.py index 08d7cfa..603f994 100644 --- a/python/pyspark/sql/tests.py +++ b/python/pyspark/sql/tests.py @@ -3266,7 +3266,7 @@ class SQLTests(ReusedSQLTestCase): import pandas as pd from datetime import datetime pdf = pd.DataFrame({"ts": [datetime(2017, 10, 31, 1, 1, 1)], -"d": [pd.Timestamp.now().date()]}) +"d": [pd.Timestamp.now().date()]}, columns=["d", "ts"]) # test types are inferred correctly without specifying schema df = self.spark.createDataFrame(pdf) self.assertTrue(isinstance(df.schema['ts'].dataType, TimestampType))
spark git commit: [SPARK-25471][PYTHON][TEST] Fix pyspark-sql test error when using Python 3.6 and Pandas 0.23
Repository: spark Updated Branches: refs/heads/master 6f681d429 -> 90e3955f3 [SPARK-25471][PYTHON][TEST] Fix pyspark-sql test error when using Python 3.6 and Pandas 0.23 ## What changes were proposed in this pull request? Fix test that constructs a Pandas DataFrame by specifying the column order. Previously this test assumed the columns would be sorted alphabetically; however, when using Python 3.6 with Pandas 0.23 or higher, the original column order is maintained. This causes the columns to get mixed up and the test fails. Manually tested with `python/run-tests` using Python 3.6.6 and Pandas 0.23.4 Closes #22477 from BryanCutler/pyspark-tests-py36-pd23-SPARK-25471. Authored-by: Bryan Cutler Signed-off-by: hyukjinkwon Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/90e3955f Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/90e3955f Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/90e3955f Branch: refs/heads/master Commit: 90e3955f384ca07bdf24faa6cdb60ded944cf0d8 Parents: 6f681d4 Author: Bryan Cutler Authored: Thu Sep 20 09:29:29 2018 +0800 Committer: hyukjinkwon Committed: Thu Sep 20 09:29:29 2018 +0800 -- python/pyspark/sql/tests.py | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/90e3955f/python/pyspark/sql/tests.py -- diff --git a/python/pyspark/sql/tests.py b/python/pyspark/sql/tests.py index 08d7cfa..603f994 100644 --- a/python/pyspark/sql/tests.py +++ b/python/pyspark/sql/tests.py @@ -3266,7 +3266,7 @@ class SQLTests(ReusedSQLTestCase): import pandas as pd from datetime import datetime pdf = pd.DataFrame({"ts": [datetime(2017, 10, 31, 1, 1, 1)], -"d": [pd.Timestamp.now().date()]}) +"d": [pd.Timestamp.now().date()]}, columns=["d", "ts"]) # test types are inferred correctly without specifying schema df = self.spark.createDataFrame(pdf) self.assertTrue(isinstance(df.schema['ts'].dataType, TimestampType))
svn commit: r29529 - in /dev/spark/2.4.1-SNAPSHOT-2018_09_19_18_03-a9a8d3a-docs: ./ _site/ _site/api/ _site/api/R/ _site/api/java/ _site/api/java/lib/ _site/api/java/org/ _site/api/java/org/apache/ _s
Author: pwendell Date: Thu Sep 20 01:17:31 2018 New Revision: 29529 Log: Apache Spark 2.4.1-SNAPSHOT-2018_09_19_18_03-a9a8d3a docs [This commit notification would consist of 1475 parts, which exceeds the limit of 50 parts, so it was shortened to the summary.]
spark git commit: [SPARK-25425][SQL][BACKPORT-2.4] Extra options should override session options in DataSource V2
Repository: spark Updated Branches: refs/heads/branch-2.4 9031c7848 -> a9a8d3a4b [SPARK-25425][SQL][BACKPORT-2.4] Extra options should override session options in DataSource V2 ## What changes were proposed in this pull request? In the PR, I propose overriding session options with extra options in DataSource V2. Extra options are more specific and set via `.option()`, and should overwrite more generic session options. ## How was this patch tested? Added tests for read and write paths. Closes #22474 from MaxGekk/session-options-2.4. Authored-by: Maxim Gekk Signed-off-by: Dongjoon Hyun Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/a9a8d3a4 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/a9a8d3a4 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/a9a8d3a4 Branch: refs/heads/branch-2.4 Commit: a9a8d3a4b92be89defd82d5f2eeb3f9af45c687d Parents: 9031c78 Author: Maxim Gekk Authored: Wed Sep 19 16:53:26 2018 -0700 Committer: Dongjoon Hyun Committed: Wed Sep 19 16:53:26 2018 -0700 -- .../org/apache/spark/sql/DataFrameReader.scala | 2 +- .../org/apache/spark/sql/DataFrameWriter.scala | 8 +++-- .../sql/sources/v2/DataSourceV2Suite.scala | 33 .../sources/v2/SimpleWritableDataSource.scala | 7 - 4 files changed, 45 insertions(+), 5 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/a9a8d3a4/sql/core/src/main/scala/org/apache/spark/sql/DataFrameReader.scala -- diff --git a/sql/core/src/main/scala/org/apache/spark/sql/DataFrameReader.scala b/sql/core/src/main/scala/org/apache/spark/sql/DataFrameReader.scala index 371ec70..27a1af2 100644 --- a/sql/core/src/main/scala/org/apache/spark/sql/DataFrameReader.scala +++ b/sql/core/src/main/scala/org/apache/spark/sql/DataFrameReader.scala @@ -202,7 +202,7 @@ class DataFrameReader private[sql](sparkSession: SparkSession) extends Logging { DataSourceOptions.PATHS_KEY -> objectMapper.writeValueAsString(paths.toArray) } Dataset.ofRows(sparkSession, DataSourceV2Relation.create( - ds, extraOptions.toMap ++ sessionOptions + pathsOption, + ds, sessionOptions ++ extraOptions.toMap + pathsOption, userSpecifiedSchema = userSpecifiedSchema)) } else { loadV1Source(paths: _*) http://git-wip-us.apache.org/repos/asf/spark/blob/a9a8d3a4/sql/core/src/main/scala/org/apache/spark/sql/DataFrameWriter.scala -- diff --git a/sql/core/src/main/scala/org/apache/spark/sql/DataFrameWriter.scala b/sql/core/src/main/scala/org/apache/spark/sql/DataFrameWriter.scala index 4aeddfd..80ade7c 100644 --- a/sql/core/src/main/scala/org/apache/spark/sql/DataFrameWriter.scala +++ b/sql/core/src/main/scala/org/apache/spark/sql/DataFrameWriter.scala @@ -241,10 +241,12 @@ final class DataFrameWriter[T] private[sql](ds: Dataset[T]) { val source = cls.newInstance().asInstanceOf[DataSourceV2] source match { case ws: WriteSupport => - val options = extraOptions ++ - DataSourceV2Utils.extractSessionConfigs(source, df.sparkSession.sessionState.conf) + val sessionOptions = DataSourceV2Utils.extractSessionConfigs( +source, +df.sparkSession.sessionState.conf) + val options = sessionOptions ++ extraOptions + val relation = DataSourceV2Relation.create(source, options) - val relation = DataSourceV2Relation.create(source, options.toMap) if (mode == SaveMode.Append) { runCommand(df.sparkSession, "save") { AppendData.byName(relation, df.logicalPlan) http://git-wip-us.apache.org/repos/asf/spark/blob/a9a8d3a4/sql/core/src/test/scala/org/apache/spark/sql/sources/v2/DataSourceV2Suite.scala -- diff --git 
a/sql/core/src/test/scala/org/apache/spark/sql/sources/v2/DataSourceV2Suite.scala b/sql/core/src/test/scala/org/apache/spark/sql/sources/v2/DataSourceV2Suite.scala index 12beca2..bafde50 100644 --- a/sql/core/src/test/scala/org/apache/spark/sql/sources/v2/DataSourceV2Suite.scala +++ b/sql/core/src/test/scala/org/apache/spark/sql/sources/v2/DataSourceV2Suite.scala @@ -17,6 +17,7 @@ package org.apache.spark.sql.sources.v2 +import java.io.File import java.util.{ArrayList, List => JList} import test.org.apache.spark.sql.sources.v2._ @@ -322,6 +323,38 @@ class DataSourceV2Suite extends QueryTest with SharedSQLContext { checkCanonicalizedOutput(df, 2, 2) checkCanonicalizedOutput(df.select('i), 2, 1) } + + test("SPARK-25425: extra options should override session options during reading") { +val
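A hedged sketch of the precedence being fixed. For a v2 source implementing `SessionConfigSupport`, session-level configs are read from the `spark.datasource.<prefix>.` namespace; the source name `mysource` and key `region` here are made up:

```scala
// Session-level default for the source, picked up via extractSessionConfigs:
spark.conf.set("spark.datasource.mysource.region", "us-east")
val df = spark.read
  .format("mysource")
  .option("region", "eu-west")  // more specific: now wins over the session value
  .load()
```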
svn commit: r29527 - in /dev/spark/2.5.0-SNAPSHOT-2018_09_19_16_02-6f681d4-docs: ./ _site/ _site/api/ _site/api/R/ _site/api/java/ _site/api/java/lib/ _site/api/java/org/ _site/api/java/org/apache/ _s
Author: pwendell Date: Wed Sep 19 23:17:10 2018 New Revision: 29527 Log: Apache Spark 2.5.0-SNAPSHOT-2018_09_19_16_02-6f681d4 docs [This commit notification would consist of 1484 parts, which exceeds the limit of 50 parts, so it was shortened to the summary.]
spark git commit: [SPARK-25021][K8S][BACKPORT] Add spark.executor.pyspark.memory limit for K8S
Repository: spark Updated Branches: refs/heads/branch-2.4 83a75a83c -> 9031c7848 [SPARK-25021][K8S][BACKPORT] Add spark.executor.pyspark.memory limit for K8S ## What changes were proposed in this pull request? Add spark.executor.pyspark.memory limit for K8S [BACKPORT] ## How was this patch tested? Unit and Integration tests Closes #22376 from ifilonenko/SPARK-25021-2.4. Authored-by: Ilan Filonenko Signed-off-by: Holden Karau Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/9031c784 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/9031c784 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/9031c784 Branch: refs/heads/branch-2.4 Commit: 9031c784847353051bc0978f63ef4146ae9095ff Parents: 83a75a8 Author: Ilan Filonenko Authored: Wed Sep 19 15:37:56 2018 -0700 Committer: Holden Karau Committed: Wed Sep 19 15:37:56 2018 -0700 -- dev/make-distribution.sh| 1 + docs/configuration.md | 2 +- examples/src/main/python/py_container_checks.py | 32 - examples/src/main/python/pyfiles.py | 38 .../org/apache/spark/deploy/k8s/Config.scala| 7 +++ .../k8s/features/BasicExecutorFeatureStep.scala | 14 +- .../bindings/JavaDriverFeatureStep.scala| 4 +- .../bindings/PythonDriverFeatureStep.scala | 4 +- .../features/bindings/RDriverFeatureStep.scala | 4 +- .../features/BasicDriverFeatureStepSuite.scala | 1 - .../BasicExecutorFeatureStepSuite.scala | 24 ++ .../bindings/JavaDriverFeatureStepSuite.scala | 1 - .../src/main/dockerfiles/spark/Dockerfile | 1 + .../dockerfiles/spark/bindings/R/Dockerfile | 2 +- .../spark/bindings/python/Dockerfile| 2 +- .../k8s/integrationtest/KubernetesSuite.scala | 33 ++ .../k8s/integrationtest/PythonTestsSuite.scala | 34 +++--- .../tests/py_container_checks.py| 32 + .../integration-tests/tests/pyfiles.py | 38 .../tests/worker_memory_check.py| 47 20 files changed, 235 insertions(+), 86 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/9031c784/dev/make-distribution.sh -- diff --git a/dev/make-distribution.sh b/dev/make-distribution.sh index 126d39d..668682f 100755 --- a/dev/make-distribution.sh +++ b/dev/make-distribution.sh @@ -192,6 +192,7 @@ fi if [ -d "$SPARK_HOME"/resource-managers/kubernetes/core/target/ ]; then mkdir -p "$DISTDIR/kubernetes/" cp -a "$SPARK_HOME"/resource-managers/kubernetes/docker/src/main/dockerfiles "$DISTDIR/kubernetes/" + cp -a "$SPARK_HOME"/resource-managers/kubernetes/integration-tests/tests "$DISTDIR/kubernetes/" fi # Copy examples and dependencies http://git-wip-us.apache.org/repos/asf/spark/blob/9031c784/docs/configuration.md -- diff --git a/docs/configuration.md b/docs/configuration.md index a3e59a0..782ccff 100644 --- a/docs/configuration.md +++ b/docs/configuration.md @@ -188,7 +188,7 @@ of the most common options to set are: unless otherwise specified. If set, PySpark memory for an executor will be limited to this amount. If not set, Spark will not limit Python's memory use and it is up to the application to avoid exceeding the overhead memory space -shared with other non-JVM processes. When PySpark is run in YARN, this memory +shared with other non-JVM processes. When PySpark is run in YARN or Kubernetes, this memory is added to executor resource requests. 
http://git-wip-us.apache.org/repos/asf/spark/blob/9031c784/examples/src/main/python/py_container_checks.py -- diff --git a/examples/src/main/python/py_container_checks.py b/examples/src/main/python/py_container_checks.py deleted file mode 100644 index f6b3be2..000 --- a/examples/src/main/python/py_container_checks.py +++ /dev/null @@ -1,32 +0,0 @@ -# -# Licensed to the Apache Software Foundation (ASF) under one or more -# contributor license agreements. See the NOTICE file distributed with -# this work for additional information regarding copyright ownership. -# The ASF licenses this file to You under the Apache License, Version 2.0 -# (the "License"); you may not use this file except in compliance with -# the License. You may obtain a copy of the License at -# -#http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express
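A hedged configuration sketch (the values are illustrative): on K8S, as on YARN, the Python memory now contributes to the executor's resource request:

```scala
import org.apache.spark.SparkConf

val conf = new SparkConf()
  .set("spark.executor.memory", "2g")
  // After this patch, also counted toward the executor pod's memory request on K8S.
  .set("spark.executor.pyspark.memory", "1g")
```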
spark git commit: [SPARK-22666][ML][FOLLOW-UP] Improve testcase to tolerate different schema representation
Repository: spark Updated Branches: refs/heads/master cb1b55cf7 -> 6f681d429 [SPARK-22666][ML][FOLLOW-UP] Improve testcase to tolerate different schema representation ## What changes were proposed in this pull request? Improve testcase "image datasource test: read non image" to tolerate different schema representation. Because file:/path and file:///path are both valid URI-ifications, the testcase would fail in some environments. ## How was this patch tested? Manual. Closes #22449 from WeichenXu123/image_url. Authored-by: WeichenXu Signed-off-by: Xiangrui Meng Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/6f681d42 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/6f681d42 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/6f681d42 Branch: refs/heads/master Commit: 6f681d42964884d19bf22deb614550d712223117 Parents: cb1b55c Author: WeichenXu Authored: Wed Sep 19 15:16:20 2018 -0700 Committer: Xiangrui Meng Committed: Wed Sep 19 15:16:20 2018 -0700 -- .../spark/ml/source/image/ImageFileFormatSuite.scala | 11 +-- 1 file changed, 9 insertions(+), 2 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/6f681d42/mllib/src/test/scala/org/apache/spark/ml/source/image/ImageFileFormatSuite.scala -- diff --git a/mllib/src/test/scala/org/apache/spark/ml/source/image/ImageFileFormatSuite.scala b/mllib/src/test/scala/org/apache/spark/ml/source/image/ImageFileFormatSuite.scala index 1a6a8d6..38e2513 100644 --- a/mllib/src/test/scala/org/apache/spark/ml/source/image/ImageFileFormatSuite.scala +++ b/mllib/src/test/scala/org/apache/spark/ml/source/image/ImageFileFormatSuite.scala @@ -17,6 +17,7 @@ package org.apache.spark.ml.source.image +import java.net.URI import java.nio.file.Paths import org.apache.spark.SparkFunSuite @@ -58,8 +59,14 @@ class ImageFileFormatSuite extends SparkFunSuite with MLlibTestSparkContext { .load(filePath) assert(df2.count() === 1) val result = df2.head() -assert(result === invalidImageRow( - Paths.get(filePath).toAbsolutePath().normalize().toUri().toString)) + +val resultOrigin = result.getStruct(0).getString(0) +// convert `origin` to a `java.net.URI` object and then compare. +// because `file:/path` and `file:///path` are both valid URI-ifications +assert(new URI(resultOrigin) === Paths.get(filePath).toAbsolutePath().normalize().toUri()) + +// Compare other columns in the row to be the same with the `invalidImageRow` +assert(result === invalidImageRow(resultOrigin)) } test("image datasource partition test") {
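Why comparing `java.net.URI` values tolerates both renderings, as a hedged sketch (the path is made up):

```scala
import java.net.URI
import java.nio.file.Paths

val fromString = new URI("file:/tmp/images/x.png")    // one valid URI-ification
val fromPath = Paths.get("/tmp/images/x.png").toUri   // typically "file:///tmp/images/x.png"
// Equal as URIs even though the string forms differ, which is what the
// updated assertion relies on.
assert(fromString == fromPath)
```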
spark git commit: [SPARK-22666][ML][FOLLOW-UP] Improve testcase to tolerate different schema representation
Repository: spark Updated Branches: refs/heads/branch-2.4 9fefb47fe -> 83a75a83c [SPARK-22666][ML][FOLLOW-UP] Improve testcase to tolerate different schema representation ## What changes were proposed in this pull request? Improve testcase "image datasource test: read non image" to tolerate different schema representation. Because file:/path and file:///path are both valid URI-ifications, the testcase would fail in some environments. ## How was this patch tested? Manual. Closes #22449 from WeichenXu123/image_url. Authored-by: WeichenXu Signed-off-by: Xiangrui Meng (cherry picked from commit 6f681d42964884d19bf22deb614550d712223117) Signed-off-by: Xiangrui Meng Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/83a75a83 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/83a75a83 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/83a75a83 Branch: refs/heads/branch-2.4 Commit: 83a75a83cb24d20d4c2df5389bb8db34ad0335d9 Parents: 9fefb47 Author: WeichenXu Authored: Wed Sep 19 15:16:20 2018 -0700 Committer: Xiangrui Meng Committed: Wed Sep 19 15:16:30 2018 -0700 -- .../spark/ml/source/image/ImageFileFormatSuite.scala | 11 +-- 1 file changed, 9 insertions(+), 2 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/83a75a83/mllib/src/test/scala/org/apache/spark/ml/source/image/ImageFileFormatSuite.scala -- diff --git a/mllib/src/test/scala/org/apache/spark/ml/source/image/ImageFileFormatSuite.scala b/mllib/src/test/scala/org/apache/spark/ml/source/image/ImageFileFormatSuite.scala index 1a6a8d6..38e2513 100644 --- a/mllib/src/test/scala/org/apache/spark/ml/source/image/ImageFileFormatSuite.scala +++ b/mllib/src/test/scala/org/apache/spark/ml/source/image/ImageFileFormatSuite.scala @@ -17,6 +17,7 @@ package org.apache.spark.ml.source.image +import java.net.URI import java.nio.file.Paths import org.apache.spark.SparkFunSuite @@ -58,8 +59,14 @@ class ImageFileFormatSuite extends SparkFunSuite with MLlibTestSparkContext { .load(filePath) assert(df2.count() === 1) val result = df2.head() -assert(result === invalidImageRow( - Paths.get(filePath).toAbsolutePath().normalize().toUri().toString)) + +val resultOrigin = result.getStruct(0).getString(0) +// convert `origin` to a `java.net.URI` object and then compare. +// because `file:/path` and `file:///path` are both valid URI-ifications +assert(new URI(resultOrigin) === Paths.get(filePath).toAbsolutePath().normalize().toUri()) + +// Compare other columns in the row to be the same with the `invalidImageRow` +assert(result === invalidImageRow(resultOrigin)) } test("image datasource partition test") {
spark git commit: Revert "[SPARK-23173][SQL] rename spark.sql.fromJsonForceNullableSchema"
Repository: spark Updated Branches: refs/heads/branch-2.4 538ae62e0 -> 9fefb47fe Revert "[SPARK-23173][SQL] rename spark.sql.fromJsonForceNullableSchema" This reverts commit 6c7db7fd1ced1d143b1389d09990a620fc16be46. (cherry picked from commit cb1b55cf771018f1560f6b173cdd7c6ca8061bc7) Signed-off-by: Dongjoon Hyun Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/9fefb47f Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/9fefb47f Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/9fefb47f Branch: refs/heads/branch-2.4 Commit: 9fefb47feab14b865978bdb8e6155a976de72416 Parents: 538ae62 Author: Dongjoon Hyun Authored: Wed Sep 19 14:33:40 2018 -0700 Committer: Dongjoon Hyun Committed: Wed Sep 19 14:38:21 2018 -0700 -- .../sql/catalyst/expressions/jsonExpressions.scala | 4 ++-- .../org/apache/spark/sql/internal/SQLConf.scala | 16 2 files changed, 10 insertions(+), 10 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/9fefb47f/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/jsonExpressions.scala -- diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/jsonExpressions.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/jsonExpressions.scala index ade10ab..bd9090a 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/jsonExpressions.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/jsonExpressions.scala @@ -517,12 +517,12 @@ case class JsonToStructs( timeZoneId: Option[String] = None) extends UnaryExpression with TimeZoneAwareExpression with CodegenFallback with ExpectsInputTypes { - val forceNullableSchema: Boolean = SQLConf.get.getConf(SQLConf.FROM_JSON_FORCE_NULLABLE_SCHEMA) + val forceNullableSchema = SQLConf.get.getConf(SQLConf.FROM_JSON_FORCE_NULLABLE_SCHEMA) // The JSON input data might be missing certain fields. We force the nullability // of the user-provided schema to avoid data corruptions. In particular, the parquet-mr encoder // can generate incorrect files if values are missing in columns declared as non-nullable. - val nullableSchema: DataType = if (forceNullableSchema) schema.asNullable else schema + val nullableSchema = if (forceNullableSchema) schema.asNullable else schema override def nullable: Boolean = true http://git-wip-us.apache.org/repos/asf/spark/blob/9fefb47f/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala -- diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala index 5e2ac02..3e9cde4 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala @@ -588,6 +588,14 @@ object SQLConf { .stringConf .createWithDefault("_corrupt_record") + val FROM_JSON_FORCE_NULLABLE_SCHEMA = buildConf("spark.sql.fromJsonForceNullableSchema") +.internal() +.doc("When true, force the output schema of the from_json() function to be nullable " + + "(including all the fields). 
Otherwise, the schema might not be compatible with" + "actual data, which leads to curruptions.") +.booleanConf +.createWithDefault(true) + val BROADCAST_TIMEOUT = buildConf("spark.sql.broadcastTimeout") .doc("Timeout in seconds for the broadcast wait time in broadcast joins.") .timeConf(TimeUnit.SECONDS) @@ -1334,14 +1342,6 @@ object SQLConf { "When this conf is not set, the value from `spark.redaction.string.regex` is used.") .fallbackConf(org.apache.spark.internal.config.STRING_REDACTION_PATTERN) - val FROM_JSON_FORCE_NULLABLE_SCHEMA = buildConf("spark.sql.function.fromJson.forceNullable") -.internal() -.doc("When true, force the output schema of the from_json() function to be nullable " + - "(including all the fields). Otherwise, the schema might not be compatible with" + - "actual data, which leads to corruptions.") -.booleanConf -.createWithDefault(true) - val CONCAT_BINARY_AS_STRING = buildConf("spark.sql.function.concatBinaryAsString") .doc("When this option is set to false and all inputs are binary, `functions.concat` returns " + "an output as binary. Otherwise, it returns as a string. ")
spark git commit: Revert "[SPARK-23173][SQL] rename spark.sql.fromJsonForceNullableSchema"
Repository: spark Updated Branches: refs/heads/master a71f6a175 -> cb1b55cf7 Revert "[SPARK-23173][SQL] rename spark.sql.fromJsonForceNullableSchema" This reverts commit 6c7db7fd1ced1d143b1389d09990a620fc16be46. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/cb1b55cf Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/cb1b55cf Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/cb1b55cf Branch: refs/heads/master Commit: cb1b55cf771018f1560f6b173cdd7c6ca8061bc7 Parents: a71f6a1 Author: Dongjoon Hyun Authored: Wed Sep 19 14:33:40 2018 -0700 Committer: Dongjoon Hyun Committed: Wed Sep 19 14:33:40 2018 -0700 -- .../sql/catalyst/expressions/jsonExpressions.scala | 4 ++-- .../org/apache/spark/sql/internal/SQLConf.scala | 16 2 files changed, 10 insertions(+), 10 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/cb1b55cf/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/jsonExpressions.scala -- diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/jsonExpressions.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/jsonExpressions.scala index ade10ab..bd9090a 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/jsonExpressions.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/jsonExpressions.scala @@ -517,12 +517,12 @@ case class JsonToStructs( timeZoneId: Option[String] = None) extends UnaryExpression with TimeZoneAwareExpression with CodegenFallback with ExpectsInputTypes { - val forceNullableSchema: Boolean = SQLConf.get.getConf(SQLConf.FROM_JSON_FORCE_NULLABLE_SCHEMA) + val forceNullableSchema = SQLConf.get.getConf(SQLConf.FROM_JSON_FORCE_NULLABLE_SCHEMA) // The JSON input data might be missing certain fields. We force the nullability // of the user-provided schema to avoid data corruptions. In particular, the parquet-mr encoder // can generate incorrect files if values are missing in columns declared as non-nullable. - val nullableSchema: DataType = if (forceNullableSchema) schema.asNullable else schema + val nullableSchema = if (forceNullableSchema) schema.asNullable else schema override def nullable: Boolean = true http://git-wip-us.apache.org/repos/asf/spark/blob/cb1b55cf/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala -- diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala index 4499a35..b1e9b17 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala @@ -608,6 +608,14 @@ object SQLConf { .stringConf .createWithDefault("_corrupt_record") + val FROM_JSON_FORCE_NULLABLE_SCHEMA = buildConf("spark.sql.fromJsonForceNullableSchema") +.internal() +.doc("When true, force the output schema of the from_json() function to be nullable " + + "(including all the fields). 
Otherwise, the schema might not be compatible with " + + "actual data, which leads to corruptions.") +.booleanConf +.createWithDefault(true) + val BROADCAST_TIMEOUT = buildConf("spark.sql.broadcastTimeout") .doc("Timeout in seconds for the broadcast wait time in broadcast joins.") .timeConf(TimeUnit.SECONDS) @@ -1354,14 +1362,6 @@ object SQLConf { "When this conf is not set, the value from `spark.redaction.string.regex` is used.") .fallbackConf(org.apache.spark.internal.config.STRING_REDACTION_PATTERN) - val FROM_JSON_FORCE_NULLABLE_SCHEMA = buildConf("spark.sql.function.fromJson.forceNullable") -.internal() -.doc("When true, force the output schema of the from_json() function to be nullable " + - "(including all the fields). Otherwise, the schema might not be compatible with" + - "actual data, which leads to corruptions.") -.booleanConf -.createWithDefault(true) - val CONCAT_BINARY_AS_STRING = buildConf("spark.sql.function.concatBinaryAsString") .doc("When this option is set to false and all inputs are binary, `functions.concat` returns " + "an output as binary. Otherwise, it returns as a string. ") - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
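The `schema.asNullable` call in `JsonToStructs` is `private[spark]`; to make its semantics concrete, here is an equivalent transform sketched against the public types API. The `forceNullable` helper name is ours, introduced only for illustration.

```scala
import org.apache.spark.sql.types._

// Recursively mark every struct field, array element, and map value nullable,
// mirroring what DataType.asNullable does inside JsonToStructs.
def forceNullable(dt: DataType): DataType = dt match {
  case StructType(fields) =>
    StructType(fields.map(f => f.copy(dataType = forceNullable(f.dataType), nullable = true)))
  case ArrayType(elementType, _) =>
    ArrayType(forceNullable(elementType), containsNull = true)
  case MapType(keyType, valueType, _) =>
    MapType(forceNullable(keyType), forceNullable(valueType), valueContainsNull = true)
  case other => other
}

val declared = new StructType()
  .add("id", LongType, nullable = false)
  .add("tags", ArrayType(StringType, containsNull = false), nullable = false)

// Same shape as the input, but every nullability flag is now true.
println(forceNullable(declared).json)
```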
spark-website git commit: Added the Spark Operator as a third-party project
Repository: spark-website Updated Branches: refs/heads/asf-site 9b21d71d2 -> 806a1bd52 Added the Spark Operator as a third-party project srowen. Author: Yinan Li Closes #148 from liyinan926/asf-site. Project: http://git-wip-us.apache.org/repos/asf/spark-website/repo Commit: http://git-wip-us.apache.org/repos/asf/spark-website/commit/806a1bd5 Tree: http://git-wip-us.apache.org/repos/asf/spark-website/tree/806a1bd5 Diff: http://git-wip-us.apache.org/repos/asf/spark-website/diff/806a1bd5 Branch: refs/heads/asf-site Commit: 806a1bd5253b96ae6eb50ae6f2f3f2d7851fda42 Parents: 9b21d71 Author: Yinan Li Authored: Wed Sep 19 12:24:12 2018 -0700 Committer: Yinan Li Committed: Wed Sep 19 12:24:12 2018 -0700 -- site/third-party-projects.html | 1 + third-party-projects.md| 1 + 2 files changed, 2 insertions(+) -- http://git-wip-us.apache.org/repos/asf/spark-website/blob/806a1bd5/site/third-party-projects.html -- diff --git a/site/third-party-projects.html b/site/third-party-projects.html index c4caf1c..aa5fd84 100644 --- a/site/third-party-projects.html +++ b/site/third-party-projects.html @@ -239,6 +239,7 @@ against Spark, and data scientists to use Javascript in Jupyter notebooks. <a href="https://github.com/SnappyDataInc/snappydata">SnappyData</a> - an open source OLTP + OLAP database integrated with Spark on the same JVMs. <a href="https://github.com/Hydrospheredata/mist">Mist</a> - Serverless proxy for Spark cluster (spark middleware) + <a href="https://github.com/GoogleCloudPlatform/spark-on-k8s-operator">K8S Operator for Apache Spark</a> - Kubernetes operator for specifying and managing the lifecycle of Apache Spark applications on Kubernetes. Applications Using Spark http://git-wip-us.apache.org/repos/asf/spark-website/blob/806a1bd5/third-party-projects.md -- diff --git a/third-party-projects.md b/third-party-projects.md index d965bce..43e7516 100644 --- a/third-party-projects.md +++ b/third-party-projects.md @@ -45,6 +45,7 @@ against Spark, and data scientists to use Javascript in Jupyter notebooks. - <a href="https://github.com/SnappyDataInc/snappydata">SnappyData</a> - an open source OLTP + OLAP database integrated with Spark on the same JVMs. - <a href="https://github.com/Hydrospheredata/mist">Mist</a> - Serverless proxy for Spark cluster (spark middleware) +- <a href="https://github.com/GoogleCloudPlatform/spark-on-k8s-operator">K8S Operator for Apache Spark</a> - Kubernetes operator for specifying and managing the lifecycle of Apache Spark applications on Kubernetes. Applications Using Spark - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
spark git commit: [SPARK-25414][SS][TEST] make it clear that the numRows metrics should be counted for each scan of the source
Repository: spark Updated Branches: refs/heads/master 12b1e91e6 -> a71f6a175 [SPARK-25414][SS][TEST] make it clear that the numRows metrics should be counted for each scan of the source ## What changes were proposed in this pull request? For self-join/self-union, Spark will produce a physical plan which has multiple `DataSourceV2ScanExec` instances referring to the same `ReadSupport` instance. In this case, the streaming source is indeed scanned multiple times, and the `numInputRows` metrics should be counted for each scan. Actually we already have 2 test cases to verify the behavior: 1. `StreamingQuerySuite.input row calculation with same V2 source used twice in self-join` 2. `KafkaMicroBatchSourceSuiteBase.ensure stream-stream self-join generates only one offset in log and correct metrics`. However, in these 2 tests, the expected result is different, which is super confusing. It turns out that, the first test doesn't trigger exchange reuse, so the source is scanned twice. The second test triggers exchange reuse, and the source is scanned only once. This PR proposes to improve these 2 tests, to test with/without exchange reuse. ## How was this patch tested? test only change Closes #22402 from cloud-fan/bug. Authored-by: Wenchen Fan Signed-off-by: Wenchen Fan Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/a71f6a17 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/a71f6a17 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/a71f6a17 Branch: refs/heads/master Commit: a71f6a1750fd0a29ecae6b98673ee15840da1c62 Parents: 12b1e91 Author: Wenchen Fan Authored: Thu Sep 20 00:29:48 2018 +0800 Committer: Wenchen Fan Committed: Thu Sep 20 00:29:48 2018 +0800 -- .../kafka010/KafkaMicroBatchSourceSuite.scala | 39 + .../execution/streaming/ProgressReporter.scala | 6 +-- .../sql/streaming/StreamingQuerySuite.scala | 46 +++- 3 files changed, 68 insertions(+), 23 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/a71f6a17/external/kafka-0-10-sql/src/test/scala/org/apache/spark/sql/kafka010/KafkaMicroBatchSourceSuite.scala -- diff --git a/external/kafka-0-10-sql/src/test/scala/org/apache/spark/sql/kafka010/KafkaMicroBatchSourceSuite.scala b/external/kafka-0-10-sql/src/test/scala/org/apache/spark/sql/kafka010/KafkaMicroBatchSourceSuite.scala index 8e246db..e5f0088 100644 --- a/external/kafka-0-10-sql/src/test/scala/org/apache/spark/sql/kafka010/KafkaMicroBatchSourceSuite.scala +++ b/external/kafka-0-10-sql/src/test/scala/org/apache/spark/sql/kafka010/KafkaMicroBatchSourceSuite.scala @@ -35,9 +35,11 @@ import org.scalatest.time.SpanSugar._ import org.apache.spark.sql.{ForeachWriter, SparkSession} import org.apache.spark.sql.execution.datasources.v2.StreamingDataSourceV2Relation +import org.apache.spark.sql.execution.exchange.ReusedExchangeExec import org.apache.spark.sql.execution.streaming._ import org.apache.spark.sql.execution.streaming.continuous.ContinuousExecution import org.apache.spark.sql.functions.{count, window} +import org.apache.spark.sql.internal.SQLConf import org.apache.spark.sql.kafka010.KafkaSourceProvider._ import org.apache.spark.sql.sources.v2.DataSourceOptions import org.apache.spark.sql.streaming.{ProcessingTime, StreamTest} @@ -598,18 +600,37 @@ abstract class KafkaMicroBatchSourceSuiteBase extends KafkaSourceSuiteBase { val join = values.join(values, "key") -testStream(join)( - makeSureGetOffsetCalled, - AddKafkaData(Set(topic), 1, 2), - CheckAnswer((1, 1, 1), (2, 2, 2)), - 
AddKafkaData(Set(topic), 6, 3), - CheckAnswer((1, 1, 1), (2, 2, 2), (3, 3, 3), (1, 6, 1), (1, 1, 6), (1, 6, 6)), - AssertOnQuery { q => +def checkQuery(check: AssertOnQuery): Unit = { + testStream(join)( +makeSureGetOffsetCalled, +AddKafkaData(Set(topic), 1, 2), +CheckAnswer((1, 1, 1), (2, 2, 2)), +AddKafkaData(Set(topic), 6, 3), +CheckAnswer((1, 1, 1), (2, 2, 2), (3, 3, 3), (1, 6, 1), (1, 1, 6), (1, 6, 6)), +check + ) +} + +withSQLConf(SQLConf.EXCHANGE_REUSE_ENABLED.key -> "false") { + checkQuery(AssertOnQuery { q => assert(q.availableOffsets.iterator.size == 1) +// The kafka source is scanned twice because of self-join +assert(q.recentProgress.map(_.numInputRows).sum == 8) +true + }) +} + +withSQLConf(SQLConf.EXCHANGE_REUSE_ENABLED.key -> "true") { + checkQuery(AssertOnQuery { q => +assert(q.availableOffsets.iterator.size == 1) +assert(q.lastExecution.executedPlan.collect { + case r: ReusedExchangeExec => r +}.length == 1) +// The kafka
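For readers who want to observe the same reuse effect outside the streaming test harness, here is a sketch using a batch aggregate self-join. The toy data and session setup are ours; only `ReusedExchangeExec` and the `spark.sql.exchange.reuse` conf come from the patch, and whether reuse fires can depend on the planner's exact output.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.execution.exchange.ReusedExchangeExec
import org.apache.spark.sql.internal.SQLConf

val spark = SparkSession.builder().master("local[*]").appName("exchangeReuse").getOrCreate()
import spark.implicits._

spark.conf.set(SQLConf.EXCHANGE_REUSE_ENABLED.key, "true")

// Joining an aggregate with itself plans two identical exchanges; with reuse
// enabled the planner keeps one and points the second side at it.
val agg  = Seq((1, "a"), (2, "b"), (2, "c")).toDF("key", "value").groupBy("key").count()
val join = agg.join(agg, "key")

val reused = join.queryExecution.executedPlan.collect { case r: ReusedExchangeExec => r }
println(s"reused exchanges: ${reused.length}") // typically 1 here; 0 with the conf set to false
```

This is exactly why the streaming self-join above reports 4 input rows with reuse enabled and 8 without: with one exchange reused, the source is scanned once instead of twice.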
svn commit: r29514 - in /dev/spark/2.5.0-SNAPSHOT-2018_09_19_08_14-12b1e91-docs: ./ _site/ _site/api/ _site/api/R/ _site/api/java/ _site/api/java/lib/ _site/api/java/org/ _site/api/java/org/apache/ _s
Author: pwendell Date: Wed Sep 19 15:29:02 2018 New Revision: 29514 Log: Apache Spark 2.5.0-SNAPSHOT-2018_09_19_08_14-12b1e91 docs [This commit notification would consist of 1484 parts, which exceeds the limit of 50 ones, so it was shortened to the summary.] - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
svn commit: r29509 - in /dev/spark/2.4.1-SNAPSHOT-2018_09_19_06_03-538ae62-docs: ./ _site/ _site/api/ _site/api/R/ _site/api/java/ _site/api/java/lib/ _site/api/java/org/ _site/api/java/org/apache/ _s
Author: pwendell Date: Wed Sep 19 13:17:20 2018 New Revision: 29509 Log: Apache Spark 2.4.1-SNAPSHOT-2018_09_19_06_03-538ae62 docs [This commit notification would consist of 1475 parts, which exceeds the limit of 50 ones, so it was shortened to the summary.] - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
spark git commit: [SPARK-25358][SQL] MutableProjection supports fallback to an interpreted mode
Repository: spark Updated Branches: refs/heads/master 5534a3a58 -> 12b1e91e6 [SPARK-25358][SQL] MutableProjection supports fallback to an interpreted mode ## What changes were proposed in this pull request? In SPARK-23711, `UnsafeProjection` gained fallback to an interpreted mode. This PR therefore fixes the code to support the same fallback mode in `MutableProjection`, based on `CodeGeneratorWithInterpretedFallback`. ## How was this patch tested? Added tests in `CodeGeneratorWithInterpretedFallbackSuite`. Closes #22355 from maropu/SPARK-25358. Authored-by: Takeshi Yamamuro Signed-off-by: Wenchen Fan Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/12b1e91e Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/12b1e91e Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/12b1e91e Branch: refs/heads/master Commit: 12b1e91e6b5135f6ed3e59a49abfc2e5a855263a Parents: 5534a3a Author: Takeshi Yamamuro Authored: Wed Sep 19 19:54:49 2018 +0800 Committer: Wenchen Fan Committed: Wed Sep 19 19:54:49 2018 +0800 -- .../InterpretedMutableProjection.scala | 89 .../sql/catalyst/expressions/Projection.scala | 75 - .../codegen/GenerateMutableProjection.scala | 4 + .../sql/catalyst/expressions/package.scala | 18 +--- ...eGeneratorWithInterpretedFallbackSuite.scala | 38 - .../CollectionExpressionsSuite.scala| 8 +- .../expressions/ExpressionEvalHelper.scala | 34 +--- .../expressions/MiscExpressionsSuite.scala | 10 +-- .../expressions/ObjectExpressionsSuite.scala| 8 +- .../apache/spark/sql/execution/SparkPlan.scala | 2 +- .../spark/sql/execution/aggregate/udaf.scala| 2 +- 11 files changed, 201 insertions(+), 87 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/12b1e91e/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/InterpretedMutableProjection.scala -- diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/InterpretedMutableProjection.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/InterpretedMutableProjection.scala new file mode 100644 index 000..0654108 --- /dev/null +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/InterpretedMutableProjection.scala @@ -0,0 +1,89 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.catalyst.expressions + +import org.apache.spark.sql.catalyst.InternalRow +import org.apache.spark.sql.catalyst.expressions.aggregate.NoOp + + +/** + * A [[MutableProjection]] that is calculated by calling `eval` on each of the specified + * expressions. + * + * @param expressions a sequence of expressions that determine the value of each column of the + *output row. 
+ */ +class InterpretedMutableProjection(expressions: Seq[Expression]) extends MutableProjection { + def this(expressions: Seq[Expression], inputSchema: Seq[Attribute]) = +this(toBoundExprs(expressions, inputSchema)) + + private[this] val buffer = new Array[Any](expressions.size) + + override def initialize(partitionIndex: Int): Unit = { +expressions.foreach(_.foreach { + case n: Nondeterministic => n.initialize(partitionIndex) + case _ => +}) + } + + private[this] val validExprs = expressions.zipWithIndex.filter { +case (NoOp, _) => false +case _ => true + } + private[this] var mutableRow: InternalRow = new GenericInternalRow(expressions.size) + def currentValue: InternalRow = mutableRow + + override def target(row: InternalRow): MutableProjection = { +mutableRow = row +this + } + + override def apply(input: InternalRow): InternalRow = { +var i = 0 +while (i < validExprs.length) { + val (expr, ordinal) = validExprs(i) + // Store the result into buffer first, to make the projection atomic (needed by aggregation) + buffer(ordinal) =
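A minimal sketch of how the new interpreted path can be driven directly, using the constructor and `target`/`apply` contract shown in the diff above. This is internal Catalyst API; the toy expressions and rows are ours.

```scala
import org.apache.spark.sql.catalyst.InternalRow
import org.apache.spark.sql.catalyst.expressions.{BoundReference, GenericInternalRow, InterpretedMutableProjection, Literal}
import org.apache.spark.sql.types.IntegerType

// Project (input column 0, constant 42) using eval() only -- no codegen.
val exprs = Seq(
  BoundReference(0, IntegerType, nullable = false), // pass column 0 through
  Literal(42)                                       // constant output column
)
val proj = new InterpretedMutableProjection(exprs)

// A MutableProjection writes into a caller-supplied row, which is what lets
// aggregation reuse a single buffer row across many input rows.
val out = new GenericInternalRow(2)
proj.target(out)

proj(InternalRow(7))
println(s"${out.getInt(0)}, ${out.getInt(1)}") // 7, 42; the next input overwrites out in place
```

The intermediate `buffer` in the class above is what makes `apply` atomic: all expressions are evaluated before any result is written into the target row, which aggregation relies on.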
svn commit: r29507 - in /dev/spark/2.5.0-SNAPSHOT-2018_09_19_04_02-5534a3a-docs: ./ _site/ _site/api/ _site/api/R/ _site/api/java/ _site/api/java/lib/ _site/api/java/org/ _site/api/java/org/apache/ _s
Author: pwendell Date: Wed Sep 19 11:17:13 2018 New Revision: 29507 Log: Apache Spark 2.5.0-SNAPSHOT-2018_09_19_04_02-5534a3a docs [This commit notification would consist of 1484 parts, which exceeds the limit of 50 ones, so it was shortened to the summary.] - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
spark git commit: [SPARK-25445][BUILD][FOLLOWUP] Resolve issues in release-build.sh for publishing scala-2.12 build
Repository: spark Updated Branches: refs/heads/branch-2.4 f11f44548 -> 538ae62e0 [SPARK-25445][BUILD][FOLLOWUP] Resolve issues in release-build.sh for publishing scala-2.12 build ## What changes were proposed in this pull request? This is a follow-up to #22441. 1. Remove the "-Pkafka-0-8" flag from the Scala 2.12 build. 2. Clean up the script with simpler logic. 3. Switch the Scala version back to 2.11 before the script exits. ## How was this patch tested? Manual test. Closes #22454 from gengliangwang/revise_release_build. Authored-by: Gengliang Wang Signed-off-by: Wenchen Fan (cherry picked from commit 5534a3a58e4025624fbad527dd129acb8025f25a) Signed-off-by: Wenchen Fan Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/538ae62e Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/538ae62e Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/538ae62e Branch: refs/heads/branch-2.4 Commit: 538ae62e0cafc8180b04e4e5c74b79acee60d2b1 Parents: f11f445 Author: Gengliang Wang Authored: Wed Sep 19 18:30:46 2018 +0800 Committer: Wenchen Fan Committed: Wed Sep 19 18:31:09 2018 +0800 -- dev/create-release/release-build.sh | 38 +--- 1 file changed, 15 insertions(+), 23 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/538ae62e/dev/create-release/release-build.sh -- diff --git a/dev/create-release/release-build.sh b/dev/create-release/release-build.sh index 4c90a77..cce5f8b 100755 --- a/dev/create-release/release-build.sh +++ b/dev/create-release/release-build.sh @@ -111,21 +111,21 @@ fi # different versions of Scala are supported. BASE_PROFILES="-Pmesos -Pyarn" PUBLISH_SCALA_2_10=0 -PUBLISH_SCALA_2_12=0 SCALA_2_10_PROFILES="-Pscala-2.10" SCALA_2_11_PROFILES= -SCALA_2_12_PROFILES="-Pscala-2.12 -Pkafka-0-8" - if [[ $SPARK_VERSION > "2.3" ]]; then BASE_PROFILES="$BASE_PROFILES -Pkubernetes -Pflume" SCALA_2_11_PROFILES="-Pkafka-0-8" - if [[ $SPARK_VERSION > "2.4" ]]; then -PUBLISH_SCALA_2_12=1 - fi else PUBLISH_SCALA_2_10=1 fi +PUBLISH_SCALA_2_12=0 +SCALA_2_12_PROFILES="-Pscala-2.12" +if [[ $SPARK_VERSION > "2.4" ]]; then + PUBLISH_SCALA_2_12=1 +fi + # Hive-specific profiles for some builds HIVE_PROFILES="-Phive -Phive-thriftserver" # Profiles for publishing snapshots and release to Maven Central @@ -190,17 +190,9 @@ if [[ "$1" == "package" ]]; then # Updated for each binary build make_binary_release() { NAME=$1 -SCALA_VERSION=$2 -SCALA_PROFILES= -if [[ SCALA_VERSION == "2.10" ]]; then - SCALA_PROFILES="$SCALA_2_10_PROFILES" -elif [[ SCALA_VERSION == "2.12" ]]; then - SCALA_PROFILES="$SCALA_2_12_PROFILES" -else - SCALA_PROFILES="$SCALA_2_11_PROFILES" -fi -FLAGS="$MVN_EXTRA_OPTS -B $SCALA_PROFILES $BASE_RELEASE_PROFILES $3" -BUILD_PACKAGE=$4 +FLAGS="$MVN_EXTRA_OPTS -B $BASE_RELEASE_PROFILES $2" +BUILD_PACKAGE=$3 +SCALA_VERSION=$4 # We increment the Zinc port each time to avoid OOM's and other craziness if multiple builds # share the same Zinc server. @@ -210,10 +202,8 @@ if [[ "$1" == "package" ]]; then cp -r spark spark-$SPARK_VERSION-bin-$NAME cd spark-$SPARK_VERSION-bin-$NAME -if [[ SCALA_VERSION == "2.10" ]]; then - ./dev/change-scala-version.sh 2.10 -elif [[ SCALA_VERSION == "2.12" ]]; then - ./dev/change-scala-version.sh 2.12 +if [[ "$SCALA_VERSION" != "2.11" ]]; then + ./dev/change-scala-version.sh $SCALA_VERSION fi export ZINC_PORT=$ZINC_PORT @@ -305,7 +295,7 @@ if [[ "$1" == "package" ]]; then for key in ${!BINARY_PKGS_ARGS[@]}; do args=${BINARY_PKGS_ARGS[$key]} extra=${BINARY_PKGS_EXTRA[$key]} -if ! 
make_binary_release "$key" "2.11" "$args" "$extra"; then +if ! make_binary_release "$key" "$SCALA_2_11_PROFILES $args" "$extra" "2.11"; then error "Failed to build $key package. Check logs for details." fi done @@ -314,7 +304,7 @@ if [[ "$1" == "package" ]]; then key="without-hadoop-scala-2.12" args="-Phadoop-provided" extra="" -if ! make_binary_release "$key" "2.12" "$args" "$extra"; then +if ! make_binary_release "$key" "$SCALA_2_12_PROFILES $args" "$extra" "2.12"; then error "Failed to build $key package. Check logs for details." fi fi @@ -446,6 +436,8 @@ if [[ "$1" == "publish-release" ]]; then # Clean-up Zinc nailgun process $LSOF -P |grep $ZINC_PORT | grep LISTEN | awk '{ print $2; }' | xargs kill + ./dev/change-scala-version.sh 2.11 + pushd $tmp_repo/org/apache/spark # Remove any extra files generated during install - To unsubscribe, e-mail:
spark git commit: [SPARK-25445][BUILD][FOLLOWUP] Resolve issues in release-build.sh for publishing scala-2.12 build
Repository: spark Updated Branches: refs/heads/master 4193c7623 -> 5534a3a58 [SPARK-25445][BUILD][FOLLOWUP] Resolve issues in release-build.sh for publishing scala-2.12 build ## What changes were proposed in this pull request? This is a follow-up to #22441. 1. Remove the "-Pkafka-0-8" flag from the Scala 2.12 build. 2. Clean up the script with simpler logic. 3. Switch the Scala version back to 2.11 before the script exits. ## How was this patch tested? Manual test. Closes #22454 from gengliangwang/revise_release_build. Authored-by: Gengliang Wang Signed-off-by: Wenchen Fan Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/5534a3a5 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/5534a3a5 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/5534a3a5 Branch: refs/heads/master Commit: 5534a3a58e4025624fbad527dd129acb8025f25a Parents: 4193c76 Author: Gengliang Wang Authored: Wed Sep 19 18:30:46 2018 +0800 Committer: Wenchen Fan Committed: Wed Sep 19 18:30:46 2018 +0800 -- dev/create-release/release-build.sh | 38 +--- 1 file changed, 15 insertions(+), 23 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/5534a3a5/dev/create-release/release-build.sh -- diff --git a/dev/create-release/release-build.sh b/dev/create-release/release-build.sh index 4c90a77..cce5f8b 100755 --- a/dev/create-release/release-build.sh +++ b/dev/create-release/release-build.sh @@ -111,21 +111,21 @@ fi # different versions of Scala are supported. BASE_PROFILES="-Pmesos -Pyarn" PUBLISH_SCALA_2_10=0 -PUBLISH_SCALA_2_12=0 SCALA_2_10_PROFILES="-Pscala-2.10" SCALA_2_11_PROFILES= -SCALA_2_12_PROFILES="-Pscala-2.12 -Pkafka-0-8" - if [[ $SPARK_VERSION > "2.3" ]]; then BASE_PROFILES="$BASE_PROFILES -Pkubernetes -Pflume" SCALA_2_11_PROFILES="-Pkafka-0-8" - if [[ $SPARK_VERSION > "2.4" ]]; then -PUBLISH_SCALA_2_12=1 - fi else PUBLISH_SCALA_2_10=1 fi +PUBLISH_SCALA_2_12=0 +SCALA_2_12_PROFILES="-Pscala-2.12" +if [[ $SPARK_VERSION > "2.4" ]]; then + PUBLISH_SCALA_2_12=1 +fi + # Hive-specific profiles for some builds HIVE_PROFILES="-Phive -Phive-thriftserver" # Profiles for publishing snapshots and release to Maven Central @@ -190,17 +190,9 @@ if [[ "$1" == "package" ]]; then # Updated for each binary build make_binary_release() { NAME=$1 -SCALA_VERSION=$2 -SCALA_PROFILES= -if [[ SCALA_VERSION == "2.10" ]]; then - SCALA_PROFILES="$SCALA_2_10_PROFILES" -elif [[ SCALA_VERSION == "2.12" ]]; then - SCALA_PROFILES="$SCALA_2_12_PROFILES" -else - SCALA_PROFILES="$SCALA_2_11_PROFILES" -fi -FLAGS="$MVN_EXTRA_OPTS -B $SCALA_PROFILES $BASE_RELEASE_PROFILES $3" -BUILD_PACKAGE=$4 +FLAGS="$MVN_EXTRA_OPTS -B $BASE_RELEASE_PROFILES $2" +BUILD_PACKAGE=$3 +SCALA_VERSION=$4 # We increment the Zinc port each time to avoid OOM's and other craziness if multiple builds # share the same Zinc server. @@ -210,10 +202,8 @@ if [[ "$1" == "package" ]]; then cp -r spark spark-$SPARK_VERSION-bin-$NAME cd spark-$SPARK_VERSION-bin-$NAME -if [[ SCALA_VERSION == "2.10" ]]; then - ./dev/change-scala-version.sh 2.10 -elif [[ SCALA_VERSION == "2.12" ]]; then - ./dev/change-scala-version.sh 2.12 +if [[ "$SCALA_VERSION" != "2.11" ]]; then + ./dev/change-scala-version.sh $SCALA_VERSION fi export ZINC_PORT=$ZINC_PORT @@ -305,7 +295,7 @@ if [[ "$1" == "package" ]]; then for key in ${!BINARY_PKGS_ARGS[@]}; do args=${BINARY_PKGS_ARGS[$key]} extra=${BINARY_PKGS_EXTRA[$key]} -if ! make_binary_release "$key" "2.11" "$args" "$extra"; then +if ! 
make_binary_release "$key" "$SCALA_2_11_PROFILES $args" "$extra" "2.11"; then error "Failed to build $key package. Check logs for details." fi done @@ -314,7 +304,7 @@ if [[ "$1" == "package" ]]; then key="without-hadoop-scala-2.12" args="-Phadoop-provided" extra="" -if ! make_binary_release "$key" "2.12" "$args" "$extra"; then +if ! make_binary_release "$key" "$SCALA_2_12_PROFILES $args" "$extra" "2.12"; then error "Failed to build $key package. Check logs for details." fi fi @@ -446,6 +436,8 @@ if [[ "$1" == "publish-release" ]]; then # Clean-up Zinc nailgun process $LSOF -P |grep $ZINC_PORT | grep LISTEN | awk '{ print $2; }' | xargs kill + ./dev/change-scala-version.sh 2.11 + pushd $tmp_repo/org/apache/spark # Remove any extra files generated during install - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
svn commit: r29506 - in /dev/spark/2.4.1-SNAPSHOT-2018_09_19_02_02-f11f445-docs: ./ _site/ _site/api/ _site/api/R/ _site/api/java/ _site/api/java/lib/ _site/api/java/org/ _site/api/java/org/apache/ _s
Author: pwendell Date: Wed Sep 19 09:17:32 2018 New Revision: 29506 Log: Apache Spark 2.4.1-SNAPSHOT-2018_09_19_02_02-f11f445 docs [This commit notification would consist of 1475 parts, which exceeds the limit of 50 ones, so it was shortened to the summary.] - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
svn commit: r29504 - in /dev/spark/2.5.0-SNAPSHOT-2018_09_19_00_02-4193c76-docs: ./ _site/ _site/api/ _site/api/R/ _site/api/java/ _site/api/java/lib/ _site/api/java/org/ _site/api/java/org/apache/ _s
Author: pwendell Date: Wed Sep 19 07:17:46 2018 New Revision: 29504 Log: Apache Spark 2.5.0-SNAPSHOT-2018_09_19_00_02-4193c76 docs [This commit notification would consist of 1484 parts, which exceeds the limit of 50 ones, so it was shortened to the summary.] - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org