svn commit: r29533 - in /dev/spark/2.4.1-SNAPSHOT-2018_09_19_22_02-dfcff38-docs: ./ _site/ _site/api/ _site/api/R/ _site/api/java/ _site/api/java/lib/ _site/api/java/org/ _site/api/java/org/apache/ _s
Author: pwendell Date: Thu Sep 20 05:19:10 2018 New Revision: 29533 Log: Apache Spark 2.4.1-SNAPSHOT-2018_09_19_22_02-dfcff38 docs [This commit notification would consist of 1475 parts, which exceeds the limit of 50 parts, so it was shortened to the summary.]
svn commit: r29532 - in /dev/spark/2.3.3-SNAPSHOT-2018_09_19_22_02-e319a62-docs: ./ _site/ _site/api/ _site/api/R/ _site/api/java/ _site/api/java/lib/ _site/api/java/org/ _site/api/java/org/apache/ _s
Author: pwendell Date: Thu Sep 20 05:17:54 2018 New Revision: 29532 Log: Apache Spark 2.3.3-SNAPSHOT-2018_09_19_22_02-e319a62 docs [This commit notification would consist of 1443 parts, which exceeds the limit of 50 parts, so it was shortened to the summary.]
spark git commit: [SPARK-25339][TEST] Refactor FilterPushdownBenchmark
Repository: spark Updated Branches: refs/heads/master 95b177c8f -> 0e31a6f25 [SPARK-25339][TEST] Refactor FilterPushdownBenchmark ## What changes were proposed in this pull request? Refactor `FilterPushdownBenchmark` to use a `main` method. This benchmark can now be run in three ways: 1. bin/spark-submit --class org.apache.spark.sql.execution.benchmark.FilterPushdownBenchmark spark-sql_2.11-2.5.0-SNAPSHOT-tests.jar 2. build/sbt "sql/test:runMain org.apache.spark.sql.execution.benchmark.FilterPushdownBenchmark" 3. SPARK_GENERATE_BENCHMARK_FILES=1 build/sbt "sql/test:runMain org.apache.spark.sql.execution.benchmark.FilterPushdownBenchmark" Methods 2 and 3 do not require compiling the `spark-sql_*-tests.jar` package, so they are mainly for developers who want to run benchmarks quickly. ## How was this patch tested? Manual tests. Closes #22443 from wangyum/SPARK-25339. Authored-by: Yuming Wang Signed-off-by: Wenchen Fan Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/0e31a6f2 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/0e31a6f2 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/0e31a6f2 Branch: refs/heads/master Commit: 0e31a6f25e0263b144255a6630e1d381fe2d27a7 Parents: 95b177c Author: Yuming Wang Authored: Thu Sep 20 12:34:39 2018 +0800 Committer: Wenchen Fan Committed: Thu Sep 20 12:34:39 2018 +0800 -- .../org/apache/spark/util/BenchmarkBase.scala | 57 .../benchmark/FilterPushdownBenchmark.scala | 333 +-- 2 files changed, 206 insertions(+), 184 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/0e31a6f2/core/src/main/scala/org/apache/spark/util/BenchmarkBase.scala -- diff --git a/core/src/main/scala/org/apache/spark/util/BenchmarkBase.scala b/core/src/main/scala/org/apache/spark/util/BenchmarkBase.scala new file mode 100644 index 000..c84032b --- /dev/null +++ b/core/src/main/scala/org/apache/spark/util/BenchmarkBase.scala @@ -0,0 +1,57 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.util + +import java.io.{File, FileOutputStream, OutputStream} + +/** + * A base class for generating benchmark results to a file. 
+ */ +abstract class BenchmarkBase { + var output: Option[OutputStream] = None + + def benchmark(): Unit + + final def runBenchmark(benchmarkName: String)(func: => Any): Unit = { +val separator = "=" * 96 +val testHeader = (separator + '\n' + benchmarkName + '\n' + separator + '\n' + '\n').getBytes +output.foreach(_.write(testHeader)) +func +output.foreach(_.write('\n')) + } + + def main(args: Array[String]): Unit = { +val regenerateBenchmarkFiles: Boolean = System.getenv("SPARK_GENERATE_BENCHMARK_FILES") == "1" +if (regenerateBenchmarkFiles) { + val resultFileName = s"${this.getClass.getSimpleName.replace("$", "")}-results.txt" + val file = new File(s"benchmarks/$resultFileName") + if (!file.exists()) { +file.createNewFile() + } + output = Some(new FileOutputStream(file)) +} + +benchmark() + +output.foreach { o => + if (o != null) { +o.close() + } +} + } +} http://git-wip-us.apache.org/repos/asf/spark/blob/0e31a6f2/sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/FilterPushdownBenchmark.scala -- diff --git a/sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/FilterPushdownBenchmark.scala b/sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/FilterPushdownBenchmark.scala index d6dfdec..9ecea99 100644 --- a/sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/FilterPushdownBenchmark.scala +++ b/sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/FilterPushdownBenchmark.scala @@ -17,29 +17,28 @@ package org.apache.spark.sql.execution.benchmark -import java.io.{File, FileOutputStream, OutputStream} +import
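For illustration, a minimal sketch of how a benchmark would plug into the new base class (the object name `MyBenchmark` and its workload are hypothetical; `BenchmarkBase` and the `SPARK_GENERATE_BENCHMARK_FILES` handling come from the diff above):

```scala
import org.apache.spark.util.BenchmarkBase

// Hypothetical benchmark: `main` is inherited from BenchmarkBase, so this object
// can be launched with spark-submit or `build/sbt "sql/test:runMain ..."` exactly
// as listed in the PR description above.
object MyBenchmark extends BenchmarkBase {
  override def benchmark(): Unit = {
    // runBenchmark writes a "=" * 96 separator header to the optional results file
    // (benchmarks/MyBenchmark-results.txt when SPARK_GENERATE_BENCHMARK_FILES=1)
    // and then executes the measurement body.
    runBenchmark("stand-in workload") {
      val start = System.nanoTime()
      (1 to 1000000).count(_ % 7 == 0)
      println(s"took ${(System.nanoTime() - start) / 1e6} ms")
    }
  }
}
```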
spark git commit: [SPARK-23648][R][SQL] Adds more types for hint in SparkR
Repository: spark Updated Branches: refs/heads/master 76399d75e -> 95b177c8f [SPARK-23648][R][SQL] Adds more types for hint in SparkR ## What changes were proposed in this pull request? Addition of numeric and list hints for SparkR. ## How was this patch tested? Added a test in test_sparkSQL.R. Author: Huaxin Gao Closes #21649 from huaxingao/spark-23648. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/95b177c8 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/95b177c8 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/95b177c8 Branch: refs/heads/master Commit: 95b177c8f0862c6965a7c3cd76b3935c975adee9 Parents: 76399d7 Author: Huaxin Gao Authored: Wed Sep 19 21:27:30 2018 -0700 Committer: Felix Cheung Committed: Wed Sep 19 21:27:30 2018 -0700 -- R/pkg/R/DataFrame.R | 12 +++- R/pkg/tests/fulltests/test_sparkSQL.R | 9 + 2 files changed, 20 insertions(+), 1 deletion(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/95b177c8/R/pkg/R/DataFrame.R -- diff --git a/R/pkg/R/DataFrame.R b/R/pkg/R/DataFrame.R index 458deca..a1cb478 100644 --- a/R/pkg/R/DataFrame.R +++ b/R/pkg/R/DataFrame.R @@ -3985,7 +3985,17 @@ setMethod("hint", signature(x = "SparkDataFrame", name = "character"), function(x, name, ...) { parameters <- list(...) -stopifnot(all(sapply(parameters, is.character))) +if (!all(sapply(parameters, function(y) { + if (is.character(y) || is.numeric(y)) { +TRUE + } else if (is.list(y)) { +all(sapply(y, function(z) { is.character(z) || is.numeric(z) })) + } else { +FALSE + } +}))) { + stop("sql hint should be character, numeric, or list with character or numeric.") +} jdf <- callJMethod(x@sdf, "hint", name, parameters) dataFrame(jdf) }) http://git-wip-us.apache.org/repos/asf/spark/blob/95b177c8/R/pkg/tests/fulltests/test_sparkSQL.R -- diff --git a/R/pkg/tests/fulltests/test_sparkSQL.R b/R/pkg/tests/fulltests/test_sparkSQL.R index 0c4bdb3..40d8f80 100644 --- a/R/pkg/tests/fulltests/test_sparkSQL.R +++ b/R/pkg/tests/fulltests/test_sparkSQL.R @@ -2419,6 +2419,15 @@ test_that("join(), crossJoin() and merge() on a DataFrame", { expect_true(any(grepl("BroadcastHashJoin", execution_plan_broadcast))) }) +test_that("test hint", { + df <- sql("SELECT * FROM range(10e10)") + hintList <- list("hint2", "hint3", "hint4") + execution_plan_hint <- capture.output( +explain(hint(df, "hint1", 1.23456, "aa", hintList), TRUE) + ) + expect_true(any(grepl("1.23456, aa", execution_plan_hint))) +}) + test_that("toJSON() on DataFrame", { df <- as.DataFrame(cars) df_json <- toJSON(df)
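On the JVM side, the R `hint()` delegates via `callJMethod` to `Dataset.hint(name, parameters*)`, so the new R behavior mirrors what Scala already allows. A hedged Scala sketch of the same call pattern exercised by the new R test (assumes an active `SparkSession` named `spark`; the hint names are made up, and an unresolved hint is simply dropped by the analyzer with a warning):

```scala
val df = spark.sql("SELECT * FROM range(10)")
// Character, numeric, and list parameters, matching what SparkR now accepts.
val hinted = df.hint("hint1", 1.23456, "aa", Seq("hint2", "hint3", "hint4"))
hinted.explain(true)  // the parameters appear in the parsed logical plan
```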
spark git commit: [SPARK-4502][SQL] Rename to spark.sql.optimizer.nestedSchemaPruning.enabled
Repository: spark Updated Branches: refs/heads/branch-2.4 06efed290 -> dfcff3839 [SPARK-4502][SQL] Rename to spark.sql.optimizer.nestedSchemaPruning.enabled ## What changes were proposed in this pull request? This patch adds an "optimizer" prefix to nested schema pruning. ## How was this patch tested? Should be covered by existing tests. Closes #22475 from rxin/SPARK-4502. Authored-by: Reynold Xin Signed-off-by: gatorsmile (cherry picked from commit 76399d75e23f2c7d6c2a1fb77a4387c5e15c809b) Signed-off-by: gatorsmile Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/dfcff383 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/dfcff383 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/dfcff383 Branch: refs/heads/branch-2.4 Commit: dfcff38394929970fee454c69864d0e10d59f8d4 Parents: 06efed2 Author: Reynold Xin Authored: Wed Sep 19 21:23:35 2018 -0700 Committer: gatorsmile Committed: Wed Sep 19 21:23:49 2018 -0700 -- .../src/main/scala/org/apache/spark/sql/internal/SQLConf.scala | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/dfcff383/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala -- diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala index 8b82fe1..c5901ee 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala @@ -1437,7 +1437,7 @@ object SQLConf { .createWithDefault(true) val NESTED_SCHEMA_PRUNING_ENABLED = -buildConf("spark.sql.nestedSchemaPruning.enabled") +buildConf("spark.sql.optimizer.nestedSchemaPruning.enabled") .internal() .doc("Prune nested fields from a logical relation's output which are unnecessary in " + "satisfying a query. This optimization allows columnar file format readers to avoid " +
spark git commit: [SPARK-4502][SQL] Rename to spark.sql.optimizer.nestedSchemaPruning.enabled
Repository: spark Updated Branches: refs/heads/master 47d6e80a2 -> 76399d75e [SPARK-4502][SQL] Rename to spark.sql.optimizer.nestedSchemaPruning.enabled ## What changes were proposed in this pull request? This patch adds an "optimizer" prefix to nested schema pruning. ## How was this patch tested? Should be covered by existing tests. Closes #22475 from rxin/SPARK-4502. Authored-by: Reynold Xin Signed-off-by: gatorsmile Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/76399d75 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/76399d75 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/76399d75 Branch: refs/heads/master Commit: 76399d75e23f2c7d6c2a1fb77a4387c5e15c809b Parents: 47d6e80 Author: Reynold Xin Authored: Wed Sep 19 21:23:35 2018 -0700 Committer: gatorsmile Committed: Wed Sep 19 21:23:35 2018 -0700 -- .../src/main/scala/org/apache/spark/sql/internal/SQLConf.scala | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/76399d75/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala -- diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala index 907221c..a01e87c 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala @@ -1457,7 +1457,7 @@ object SQLConf { .createWithDefault(true) val NESTED_SCHEMA_PRUNING_ENABLED = -buildConf("spark.sql.nestedSchemaPruning.enabled") +buildConf("spark.sql.optimizer.nestedSchemaPruning.enabled") .internal() .doc("Prune nested fields from a logical relation's output which are unnecessary in " + "satisfying a query. This optimization allows columnar file format readers to avoid " +
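A hedged sketch of the option under its new name (the Parquet path and column are made up; the config is internal, so setting it explicitly is shown only to illustrate the rename):

```scala
spark.conf.set("spark.sql.optimizer.nestedSchemaPruning.enabled", "true")
// With pruning enabled, selecting one leaf of a struct lets columnar readers
// (e.g. Parquet) skip the sibling nested fields, per the doc string above.
spark.read.parquet("/tmp/contacts").select("name.first").explain()
```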
svn commit: r29531 - in /dev/spark/2.5.0-SNAPSHOT-2018_09_19_20_02-47d6e80-docs: ./ _site/ _site/api/ _site/api/R/ _site/api/java/ _site/api/java/lib/ _site/api/java/org/ _site/api/java/org/apache/ _s
Author: pwendell Date: Thu Sep 20 03:16:55 2018 New Revision: 29531 Log: Apache Spark 2.5.0-SNAPSHOT-2018_09_19_20_02-47d6e80 docs [This commit notification would consist of 1484 parts, which exceeds the limit of 50 parts, so it was shortened to the summary.]
spark git commit: [SPARK-25457][SQL] IntegralDivide returns data type of the operands
Repository: spark Updated Branches: refs/heads/master 8aae49afc -> 47d6e80a2 [SPARK-25457][SQL] IntegralDivide returns data type of the operands ## What changes were proposed in this pull request? The PR proposes to return the data type of the operands as a result for the `div` operator. Before the PR, `bigint` was always returned. It also introduces a `spark.sql.legacy.integralDivide.returnBigint` config to let users restore the legacy behavior. ## How was this patch tested? Added UTs. Closes #22465 from mgaido91/SPARK-25457. Authored-by: Marco Gaido Signed-off-by: Wenchen Fan Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/47d6e80a Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/47d6e80a Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/47d6e80a Branch: refs/heads/master Commit: 47d6e80a2e64823fabb596503fb6a6cc6f51f713 Parents: 8aae49a Author: Marco Gaido Authored: Thu Sep 20 10:23:37 2018 +0800 Committer: Wenchen Fan Committed: Thu Sep 20 10:23:37 2018 +0800 -- .../sql/catalyst/expressions/arithmetic.scala | 17 +- .../org/apache/spark/sql/internal/SQLConf.scala | 9 + .../expressions/ArithmeticExpressionSuite.scala | 26 +- .../resources/sql-tests/inputs/operator-div.sql | 14 ++ .../resources/sql-tests/inputs/operators.sql| 6 +- .../sql-tests/results/operator-div.sql.out | 82 ++ .../sql-tests/results/operators.sql.out | 248 --- 7 files changed, 246 insertions(+), 156 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/47d6e80a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/arithmetic.scala -- diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/arithmetic.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/arithmetic.scala index 1b1808f..f59b2a2 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/arithmetic.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/arithmetic.scala @@ -22,6 +22,7 @@ import org.apache.spark.sql.catalyst.analysis.{TypeCheckResult, TypeCoercion} import org.apache.spark.sql.catalyst.expressions.codegen._ import org.apache.spark.sql.catalyst.expressions.codegen.Block._ import org.apache.spark.sql.catalyst.util.TypeUtils +import org.apache.spark.sql.internal.SQLConf import org.apache.spark.sql.types._ import org.apache.spark.unsafe.types.CalendarInterval @@ -327,16 +328,24 @@ case class Divide(left: Expression, right: Expression) extends DivModLike { case class IntegralDivide(left: Expression, right: Expression) extends DivModLike { override def inputType: AbstractDataType = IntegralType - override def dataType: DataType = LongType + override def dataType: DataType = if (SQLConf.get.integralDivideReturnLong) { +LongType + } else { +left.dataType + } override def symbol: String = "/" override def sqlOperator: String = "div" - private lazy val div: (Any, Any) => Long = left.dataType match { + private lazy val div: (Any, Any) => Any = left.dataType match { case i: IntegralType => val divide = i.integral.asInstanceOf[Integral[Any]].quot _ - val toLong = i.integral.asInstanceOf[Integral[Any]].toLong _ - (x, y) => toLong(divide(x, y)) + if (SQLConf.get.integralDivideReturnLong) { +val toLong = i.integral.asInstanceOf[Integral[Any]].toLong _ +(x, y) => toLong(divide(x, y)) + } else { +divide + } } override def evalOperation(left: Any, right: Any): Any = div(left, right) 
http://git-wip-us.apache.org/repos/asf/spark/blob/47d6e80a/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala -- diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala index c3328a6..907221c 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala @@ -1561,6 +1561,13 @@ object SQLConf { "are performed before any UNION, EXCEPT and MINUS operations.") .booleanConf .createWithDefault(false) + + val LEGACY_INTEGRALDIVIDE_RETURN_LONG = buildConf("spark.sql.legacy.integralDivide.returnBigint") +.doc("If it is set to true, the div operator returns always a bigint. This behavior was " + + "inherited from Hive. Otherwise, the return type is the data type of the operands.") +.internal() +.booleanConf +.createWithDefault(false) } /** @@ -1973,6 +1980,8 @@
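A hedged illustration of the new semantics (literal values chosen arbitrarily): with the default setting, the result type of `div` follows the operands, and the new legacy flag restores the Hive-inherited `bigint` result:

```scala
// Default: the result type of `div` now follows the operands (int here).
spark.sql("SELECT CAST(7 AS INT) div CAST(2 AS INT)").printSchema()
// Legacy flag: restore the old behavior where div always returns bigint.
spark.conf.set("spark.sql.legacy.integralDivide.returnBigint", "true")
spark.sql("SELECT CAST(7 AS INT) div CAST(2 AS INT)").printSchema()
```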
spark git commit: [SPARK-24341][FOLLOWUP][DOCS] Add migration note for IN subqueries behavior
Repository: spark Updated Branches: refs/heads/branch-2.4 535bf1cc9 -> 06efed290 [SPARK-24341][FOLLOWUP][DOCS] Add migration note for IN subqueries behavior ## What changes were proposed in this pull request? The PR updates the migration guide to explain the changes introduced in the behavior of the IN operator with subqueries, in particular the improved handling of struct attributes in these situations. ## How was this patch tested? NA Closes #22469 from mgaido91/SPARK-24341_followup. Authored-by: Marco Gaido Signed-off-by: Wenchen Fan (cherry picked from commit 8aae49afc7997aa1da61029409ef6d8ce0ab256a) Signed-off-by: Wenchen Fan Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/06efed29 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/06efed29 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/06efed29 Branch: refs/heads/branch-2.4 Commit: 06efed29047041096910f300cb14e8cfff540efc Parents: 535bf1c Author: Marco Gaido Authored: Thu Sep 20 10:10:20 2018 +0800 Committer: Wenchen Fan Committed: Thu Sep 20 10:10:36 2018 +0800 -- docs/sql-programming-guide.md | 1 + 1 file changed, 1 insertion(+) -- http://git-wip-us.apache.org/repos/asf/spark/blob/06efed29/docs/sql-programming-guide.md -- diff --git a/docs/sql-programming-guide.md b/docs/sql-programming-guide.md index 2fa29a0..c76f2e3 100644 --- a/docs/sql-programming-guide.md +++ b/docs/sql-programming-guide.md @@ -1879,6 +1879,7 @@ working with timestamps in `pandas_udf`s to get the best performance, see ## Upgrading From Spark SQL 2.3 to 2.4 + - Since Spark 2.4, when there is a struct field in front of the IN operator before a subquery, the inner query must contain a struct field as well. In previous versions, instead, the fields of the struct were compared to the output of the inner query. E.g. if `a` is a `struct(a string, b int)`, in Spark 2.4 `a in (select (1 as a, 'a' as b) from range(1))` is a valid query, while `a in (select 1, 'a' from range(1))` is not. In previous versions it was the opposite. - In versions 2.2.1+ and 2.3, if `spark.sql.caseSensitive` is set to true, then the `CURRENT_DATE` and `CURRENT_TIMESTAMP` functions incorrectly became case-sensitive and would resolve to columns (unless typed in lower case). In Spark 2.4 this has been fixed and the functions are no longer case-sensitive. - Since Spark 2.4, Spark will evaluate the set operations referenced in a query by following a precedence rule as per the SQL standard. If the order is not specified by parentheses, set operations are performed from left to right with the exception that all INTERSECT operations are performed before any UNION, EXCEPT or MINUS operations. The old behaviour of giving equal precedence to all the set operations are preserved under a newly added configuration `spark.sql.legacy.setopsPrecedence.enabled` with a default value of `false`. When this property is set to `true`, spark will evaluate the set operators from left to right as they appear in the query given no explicit ordering is enforced by usage of parenthesis. - Since Spark 2.4, Spark will display table description column Last Access value as UNKNOWN when the value was Jan 01 1970.
spark git commit: [SPARK-24341][FOLLOWUP][DOCS] Add migration note for IN subqueries behavior
Repository: spark Updated Branches: refs/heads/master 936c92034 -> 8aae49afc [SPARK-24341][FOLLOWUP][DOCS] Add migration note for IN subqueries behavior ## What changes were proposed in this pull request? The PR updates the migration guide to explain the changes introduced in the behavior of the IN operator with subqueries, in particular the improved handling of struct attributes in these situations. ## How was this patch tested? NA Closes #22469 from mgaido91/SPARK-24341_followup. Authored-by: Marco Gaido Signed-off-by: Wenchen Fan Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/8aae49af Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/8aae49af Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/8aae49af Branch: refs/heads/master Commit: 8aae49afc7997aa1da61029409ef6d8ce0ab256a Parents: 936c920 Author: Marco Gaido Authored: Thu Sep 20 10:10:20 2018 +0800 Committer: Wenchen Fan Committed: Thu Sep 20 10:10:20 2018 +0800 -- docs/sql-programming-guide.md | 1 + 1 file changed, 1 insertion(+) -- http://git-wip-us.apache.org/repos/asf/spark/blob/8aae49af/docs/sql-programming-guide.md -- diff --git a/docs/sql-programming-guide.md b/docs/sql-programming-guide.md index 2fa29a0..c76f2e3 100644 --- a/docs/sql-programming-guide.md +++ b/docs/sql-programming-guide.md @@ -1879,6 +1879,7 @@ working with timestamps in `pandas_udf`s to get the best performance, see ## Upgrading From Spark SQL 2.3 to 2.4 + - Since Spark 2.4, when there is a struct field in front of the IN operator before a subquery, the inner query must contain a struct field as well. In previous versions, instead, the fields of the struct were compared to the output of the inner query. E.g. if `a` is a `struct(a string, b int)`, in Spark 2.4 `a in (select (1 as a, 'a' as b) from range(1))` is a valid query, while `a in (select 1, 'a' from range(1))` is not. In previous versions it was the opposite. - In versions 2.2.1+ and 2.3, if `spark.sql.caseSensitive` is set to true, then the `CURRENT_DATE` and `CURRENT_TIMESTAMP` functions incorrectly became case-sensitive and would resolve to columns (unless typed in lower case). In Spark 2.4 this has been fixed and the functions are no longer case-sensitive. - Since Spark 2.4, Spark will evaluate the set operations referenced in a query by following a precedence rule as per the SQL standard. If the order is not specified by parentheses, set operations are performed from left to right with the exception that all INTERSECT operations are performed before any UNION, EXCEPT or MINUS operations. The old behaviour of giving equal precedence to all the set operations are preserved under a newly added configuration `spark.sql.legacy.setopsPrecedence.enabled` with a default value of `false`. When this property is set to `true`, spark will evaluate the set operators from left to right as they appear in the query given no explicit ordering is enforced by usage of parenthesis. - Since Spark 2.4, Spark will display table description column Last Access value as UNKNOWN when the value was Jan 01 1970.
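The new migration note, condensed into a hedged runnable form (the table `t` and its struct column `a` are made up, with field types chosen to match the subquery literal):

```scala
// Assume a table t with a column `a` of type struct<a: int, b: string>.
// Valid since 2.4: a struct field on both sides of IN.
spark.sql("SELECT a IN (SELECT (1 AS a, 'a' AS b) FROM range(1)) FROM t")
// Only valid before 2.4, where the struct's fields were compared one by one
// to the subquery's output columns:
// spark.sql("SELECT a IN (SELECT 1, 'a' FROM range(1)) FROM t")
```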
spark git commit: [SPARK-24157][SS][FOLLOWUP] Rename to spark.sql.streaming.noDataMicroBatches.enabled
Repository: spark Updated Branches: refs/heads/branch-2.4 99ae693b3 -> 535bf1cc9 [SPARK-24157][SS][FOLLOWUP] Rename to spark.sql.streaming.noDataMicroBatches.enabled ## What changes were proposed in this pull request? This patch changes the config option `spark.sql.streaming.noDataMicroBatchesEnabled` to `spark.sql.streaming.noDataMicroBatches.enabled` to be more consistent with the rest of the configs. Unfortunately there is one streaming config called `spark.sql.streaming.metricsEnabled`. For that one we should just use a fallback config and change it in a separate patch. ## How was this patch tested? Made sure no other references to this config are in the code base: ``` > git grep "noDataMicro" sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala: buildConf("spark.sql.streaming.noDataMicroBatches.enabled") ``` Closes #22476 from rxin/SPARK-24157. Authored-by: Reynold Xin Signed-off-by: Reynold Xin (cherry picked from commit 936c920347e196381b48bc3656ca81a06f2ff46d) Signed-off-by: Reynold Xin Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/535bf1cc Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/535bf1cc Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/535bf1cc Branch: refs/heads/branch-2.4 Commit: 535bf1cc9e6b54df7059ac3109b8cba30057d040 Parents: 99ae693 Author: Reynold Xin Authored: Wed Sep 19 18:51:20 2018 -0700 Committer: Reynold Xin Committed: Wed Sep 19 18:51:31 2018 -0700 -- .../src/main/scala/org/apache/spark/sql/internal/SQLConf.scala | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/535bf1cc/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala -- diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala index 3e9cde4..8b82fe1 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala @@ -1056,7 +1056,7 @@ object SQLConf { .createWithDefault(1L) val STREAMING_NO_DATA_MICRO_BATCHES_ENABLED = -buildConf("spark.sql.streaming.noDataMicroBatchesEnabled") +buildConf("spark.sql.streaming.noDataMicroBatches.enabled") .doc( "Whether streaming micro-batch engine will execute batches without data " + "for eager state management for stateful streaming queries.")
spark git commit: [SPARK-24157][SS][FOLLOWUP] Rename to spark.sql.streaming.noDataMicroBatches.enabled
Repository: spark Updated Branches: refs/heads/master 90e3955f3 -> 936c92034 [SPARK-24157][SS][FOLLOWUP] Rename to spark.sql.streaming.noDataMicroBatches.enabled ## What changes were proposed in this pull request? This patch changes the config option `spark.sql.streaming.noDataMicroBatchesEnabled` to `spark.sql.streaming.noDataMicroBatches.enabled` to be more consistent with the rest of the configs. Unfortunately there is one streaming config called `spark.sql.streaming.metricsEnabled`. For that one we should just use a fallback config and change it in a separate patch. ## How was this patch tested? Made sure no other references to this config are in the code base: ``` > git grep "noDataMicro" sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala: buildConf("spark.sql.streaming.noDataMicroBatches.enabled") ``` Closes #22476 from rxin/SPARK-24157. Authored-by: Reynold Xin Signed-off-by: Reynold Xin Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/936c9203 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/936c9203 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/936c9203 Branch: refs/heads/master Commit: 936c920347e196381b48bc3656ca81a06f2ff46d Parents: 90e3955 Author: Reynold Xin Authored: Wed Sep 19 18:51:20 2018 -0700 Committer: Reynold Xin Committed: Wed Sep 19 18:51:20 2018 -0700 -- .../src/main/scala/org/apache/spark/sql/internal/SQLConf.scala | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/936c9203/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala -- diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala index b1e9b17..c3328a6 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala @@ -1076,7 +1076,7 @@ object SQLConf { .createWithDefault(1L) val STREAMING_NO_DATA_MICRO_BATCHES_ENABLED = -buildConf("spark.sql.streaming.noDataMicroBatchesEnabled") +buildConf("spark.sql.streaming.noDataMicroBatches.enabled") .doc( "Whether streaming micro-batch engine will execute batches without data " + "for eager state management for stateful streaming queries.")
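A hedged sketch of the renamed flag in use (the default value is not shown in the hunk above; toggling it off here is only to illustrate the new key):

```scala
// Under the new, consistent key:
spark.conf.set("spark.sql.streaming.noDataMicroBatches.enabled", "false")
// With the flag off, the micro-batch engine stops scheduling batches without
// data, so stateful operators advance their state only when new input arrives.
```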
spark git commit: [SPARK-25471][PYTHON][TEST] Fix pyspark-sql test error when using Python 3.6 and Pandas 0.23
Repository: spark Updated Branches: refs/heads/branch-2.3 7b5da37c0 -> e319a624e [SPARK-25471][PYTHON][TEST] Fix pyspark-sql test error when using Python 3.6 and Pandas 0.23 ## What changes were proposed in this pull request? Fix test that constructs a Pandas DataFrame by specifying the column order. Previously this test assumed the columns would be sorted alphabetically; however, when using Python 3.6 with Pandas 0.23 or higher, the original column order is maintained. This causes the columns to get mixed up and the test fails. Manually tested with `python/run-tests` using Python 3.6.6 and Pandas 0.23.4 Closes #22477 from BryanCutler/pyspark-tests-py36-pd23-SPARK-25471. Authored-by: Bryan Cutler Signed-off-by: hyukjinkwon (cherry picked from commit 90e3955f384ca07bdf24faa6cdb60ded944cf0d8) Signed-off-by: hyukjinkwon Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/e319a624 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/e319a624 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/e319a624 Branch: refs/heads/branch-2.3 Commit: e319a624e2f366a941bd92a685e1b48504c887b1 Parents: 7b5da37 Author: Bryan Cutler Authored: Thu Sep 20 09:29:29 2018 +0800 Committer: hyukjinkwon Committed: Thu Sep 20 09:30:06 2018 +0800 -- python/pyspark/sql/tests.py | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/e319a624/python/pyspark/sql/tests.py -- diff --git a/python/pyspark/sql/tests.py b/python/pyspark/sql/tests.py index 6bfb329..3c5fc97 100644 --- a/python/pyspark/sql/tests.py +++ b/python/pyspark/sql/tests.py @@ -2885,7 +2885,7 @@ class SQLTests(ReusedSQLTestCase): import pandas as pd from datetime import datetime pdf = pd.DataFrame({"ts": [datetime(2017, 10, 31, 1, 1, 1)], -"d": [pd.Timestamp.now().date()]}) +"d": [pd.Timestamp.now().date()]}, columns=["d", "ts"]) # test types are inferred correctly without specifying schema df = self.spark.createDataFrame(pdf) self.assertTrue(isinstance(df.schema['ts'].dataType, TimestampType))
spark git commit: [SPARK-25471][PYTHON][TEST] Fix pyspark-sql test error when using Python 3.6 and Pandas 0.23
Repository: spark Updated Branches: refs/heads/branch-2.4 a9a8d3a4b -> 99ae693b3 [SPARK-25471][PYTHON][TEST] Fix pyspark-sql test error when using Python 3.6 and Pandas 0.23 ## What changes were proposed in this pull request? Fix test that constructs a Pandas DataFrame by specifying the column order. Previously this test assumed the columns would be sorted alphabetically; however, when using Python 3.6 with Pandas 0.23 or higher, the original column order is maintained. This causes the columns to get mixed up and the test fails. Manually tested with `python/run-tests` using Python 3.6.6 and Pandas 0.23.4 Closes #22477 from BryanCutler/pyspark-tests-py36-pd23-SPARK-25471. Authored-by: Bryan Cutler Signed-off-by: hyukjinkwon (cherry picked from commit 90e3955f384ca07bdf24faa6cdb60ded944cf0d8) Signed-off-by: hyukjinkwon Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/99ae693b Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/99ae693b Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/99ae693b Branch: refs/heads/branch-2.4 Commit: 99ae693b3722db6e01825b8cf2c3f2ef74a65ddb Parents: a9a8d3a Author: Bryan Cutler Authored: Thu Sep 20 09:29:29 2018 +0800 Committer: hyukjinkwon Committed: Thu Sep 20 09:29:49 2018 +0800 -- python/pyspark/sql/tests.py | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/99ae693b/python/pyspark/sql/tests.py -- diff --git a/python/pyspark/sql/tests.py b/python/pyspark/sql/tests.py index 08d7cfa..603f994 100644 --- a/python/pyspark/sql/tests.py +++ b/python/pyspark/sql/tests.py @@ -3266,7 +3266,7 @@ class SQLTests(ReusedSQLTestCase): import pandas as pd from datetime import datetime pdf = pd.DataFrame({"ts": [datetime(2017, 10, 31, 1, 1, 1)], -"d": [pd.Timestamp.now().date()]}) +"d": [pd.Timestamp.now().date()]}, columns=["d", "ts"]) # test types are inferred correctly without specifying schema df = self.spark.createDataFrame(pdf) self.assertTrue(isinstance(df.schema['ts'].dataType, TimestampType))
spark git commit: [SPARK-25471][PYTHON][TEST] Fix pyspark-sql test error when using Python 3.6 and Pandas 0.23
Repository: spark Updated Branches: refs/heads/master 6f681d429 -> 90e3955f3 [SPARK-25471][PYTHON][TEST] Fix pyspark-sql test error when using Python 3.6 and Pandas 0.23 ## What changes were proposed in this pull request? Fix test that constructs a Pandas DataFrame by specifying the column order. Previously this test assumed the columns would be sorted alphabetically; however, when using Python 3.6 with Pandas 0.23 or higher, the original column order is maintained. This causes the columns to get mixed up and the test fails. Manually tested with `python/run-tests` using Python 3.6.6 and Pandas 0.23.4 Closes #22477 from BryanCutler/pyspark-tests-py36-pd23-SPARK-25471. Authored-by: Bryan Cutler Signed-off-by: hyukjinkwon Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/90e3955f Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/90e3955f Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/90e3955f Branch: refs/heads/master Commit: 90e3955f384ca07bdf24faa6cdb60ded944cf0d8 Parents: 6f681d4 Author: Bryan Cutler Authored: Thu Sep 20 09:29:29 2018 +0800 Committer: hyukjinkwon Committed: Thu Sep 20 09:29:29 2018 +0800 -- python/pyspark/sql/tests.py | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/90e3955f/python/pyspark/sql/tests.py -- diff --git a/python/pyspark/sql/tests.py b/python/pyspark/sql/tests.py index 08d7cfa..603f994 100644 --- a/python/pyspark/sql/tests.py +++ b/python/pyspark/sql/tests.py @@ -3266,7 +3266,7 @@ class SQLTests(ReusedSQLTestCase): import pandas as pd from datetime import datetime pdf = pd.DataFrame({"ts": [datetime(2017, 10, 31, 1, 1, 1)], -"d": [pd.Timestamp.now().date()]}) +"d": [pd.Timestamp.now().date()]}, columns=["d", "ts"]) # test types are inferred correctly without specifying schema df = self.spark.createDataFrame(pdf) self.assertTrue(isinstance(df.schema['ts'].dataType, TimestampType))
svn commit: r29529 - in /dev/spark/2.4.1-SNAPSHOT-2018_09_19_18_03-a9a8d3a-docs: ./ _site/ _site/api/ _site/api/R/ _site/api/java/ _site/api/java/lib/ _site/api/java/org/ _site/api/java/org/apache/ _s
Author: pwendell Date: Thu Sep 20 01:17:31 2018 New Revision: 29529 Log: Apache Spark 2.4.1-SNAPSHOT-2018_09_19_18_03-a9a8d3a docs [This commit notification would consist of 1475 parts, which exceeds the limit of 50 parts, so it was shortened to the summary.]
spark git commit: [SPARK-25425][SQL][BACKPORT-2.4] Extra options should override session options in DataSource V2
Repository: spark Updated Branches: refs/heads/branch-2.4 9031c7848 -> a9a8d3a4b [SPARK-25425][SQL][BACKPORT-2.4] Extra options should override session options in DataSource V2 ## What changes were proposed in this pull request? In the PR, I propose overriding session options with extra options in DataSource V2. Extra options are more specific and set via `.option()`, and should overwrite more generic session options. ## How was this patch tested? Added tests for read and write paths. Closes #22474 from MaxGekk/session-options-2.4. Authored-by: Maxim Gekk Signed-off-by: Dongjoon Hyun Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/a9a8d3a4 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/a9a8d3a4 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/a9a8d3a4 Branch: refs/heads/branch-2.4 Commit: a9a8d3a4b92be89defd82d5f2eeb3f9af45c687d Parents: 9031c78 Author: Maxim Gekk Authored: Wed Sep 19 16:53:26 2018 -0700 Committer: Dongjoon Hyun Committed: Wed Sep 19 16:53:26 2018 -0700 -- .../org/apache/spark/sql/DataFrameReader.scala | 2 +- .../org/apache/spark/sql/DataFrameWriter.scala | 8 +++-- .../sql/sources/v2/DataSourceV2Suite.scala | 33 .../sources/v2/SimpleWritableDataSource.scala | 7 - 4 files changed, 45 insertions(+), 5 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/a9a8d3a4/sql/core/src/main/scala/org/apache/spark/sql/DataFrameReader.scala -- diff --git a/sql/core/src/main/scala/org/apache/spark/sql/DataFrameReader.scala b/sql/core/src/main/scala/org/apache/spark/sql/DataFrameReader.scala index 371ec70..27a1af2 100644 --- a/sql/core/src/main/scala/org/apache/spark/sql/DataFrameReader.scala +++ b/sql/core/src/main/scala/org/apache/spark/sql/DataFrameReader.scala @@ -202,7 +202,7 @@ class DataFrameReader private[sql](sparkSession: SparkSession) extends Logging { DataSourceOptions.PATHS_KEY -> objectMapper.writeValueAsString(paths.toArray) } Dataset.ofRows(sparkSession, DataSourceV2Relation.create( - ds, extraOptions.toMap ++ sessionOptions + pathsOption, + ds, sessionOptions ++ extraOptions.toMap + pathsOption, userSpecifiedSchema = userSpecifiedSchema)) } else { loadV1Source(paths: _*) http://git-wip-us.apache.org/repos/asf/spark/blob/a9a8d3a4/sql/core/src/main/scala/org/apache/spark/sql/DataFrameWriter.scala -- diff --git a/sql/core/src/main/scala/org/apache/spark/sql/DataFrameWriter.scala b/sql/core/src/main/scala/org/apache/spark/sql/DataFrameWriter.scala index 4aeddfd..80ade7c 100644 --- a/sql/core/src/main/scala/org/apache/spark/sql/DataFrameWriter.scala +++ b/sql/core/src/main/scala/org/apache/spark/sql/DataFrameWriter.scala @@ -241,10 +241,12 @@ final class DataFrameWriter[T] private[sql](ds: Dataset[T]) { val source = cls.newInstance().asInstanceOf[DataSourceV2] source match { case ws: WriteSupport => - val options = extraOptions ++ - DataSourceV2Utils.extractSessionConfigs(source, df.sparkSession.sessionState.conf) + val sessionOptions = DataSourceV2Utils.extractSessionConfigs( +source, +df.sparkSession.sessionState.conf) + val options = sessionOptions ++ extraOptions + val relation = DataSourceV2Relation.create(source, options) - val relation = DataSourceV2Relation.create(source, options.toMap) if (mode == SaveMode.Append) { runCommand(df.sparkSession, "save") { AppendData.byName(relation, df.logicalPlan) http://git-wip-us.apache.org/repos/asf/spark/blob/a9a8d3a4/sql/core/src/test/scala/org/apache/spark/sql/sources/v2/DataSourceV2Suite.scala -- diff --git 
a/sql/core/src/test/scala/org/apache/spark/sql/sources/v2/DataSourceV2Suite.scala b/sql/core/src/test/scala/org/apache/spark/sql/sources/v2/DataSourceV2Suite.scala index 12beca2..bafde50 100644 --- a/sql/core/src/test/scala/org/apache/spark/sql/sources/v2/DataSourceV2Suite.scala +++ b/sql/core/src/test/scala/org/apache/spark/sql/sources/v2/DataSourceV2Suite.scala @@ -17,6 +17,7 @@ package org.apache.spark.sql.sources.v2 +import java.io.File import java.util.{ArrayList, List => JList} import test.org.apache.spark.sql.sources.v2._ @@ -322,6 +323,38 @@ class DataSourceV2Suite extends QueryTest with SharedSQLContext { checkCanonicalizedOutput(df, 2, 2) checkCanonicalizedOutput(df.select('i), 2, 1) } + + test("SPARK-25425: extra options should override session options during reading") { +val
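A hedged sketch of the precedence being fixed. For a v2 source implementing `SessionConfigSupport`, session-level configs are read from the `spark.datasource.<prefix>.` namespace; the source name `mysource` and key `region` here are made up:

```scala
// Session-level default for the source, picked up via extractSessionConfigs:
spark.conf.set("spark.datasource.mysource.region", "us-east")
val df = spark.read
  .format("mysource")
  .option("region", "eu-west")  // more specific: now wins over the session value
  .load()
```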
svn commit: r29527 - in /dev/spark/2.5.0-SNAPSHOT-2018_09_19_16_02-6f681d4-docs: ./ _site/ _site/api/ _site/api/R/ _site/api/java/ _site/api/java/lib/ _site/api/java/org/ _site/api/java/org/apache/ _s
Author: pwendell Date: Wed Sep 19 23:17:10 2018 New Revision: 29527 Log: Apache Spark 2.5.0-SNAPSHOT-2018_09_19_16_02-6f681d4 docs [This commit notification would consist of 1484 parts, which exceeds the limit of 50 parts, so it was shortened to the summary.]
spark git commit: [SPARK-25021][K8S][BACKPORT] Add spark.executor.pyspark.memory limit for K8S
Repository: spark Updated Branches: refs/heads/branch-2.4 83a75a83c -> 9031c7848 [SPARK-25021][K8S][BACKPORT] Add spark.executor.pyspark.memory limit for K8S ## What changes were proposed in this pull request? Add spark.executor.pyspark.memory limit for K8S [BACKPORT] ## How was this patch tested? Unit and Integration tests Closes #22376 from ifilonenko/SPARK-25021-2.4. Authored-by: Ilan Filonenko Signed-off-by: Holden Karau Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/9031c784 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/9031c784 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/9031c784 Branch: refs/heads/branch-2.4 Commit: 9031c784847353051bc0978f63ef4146ae9095ff Parents: 83a75a8 Author: Ilan Filonenko Authored: Wed Sep 19 15:37:56 2018 -0700 Committer: Holden Karau Committed: Wed Sep 19 15:37:56 2018 -0700 -- dev/make-distribution.sh| 1 + docs/configuration.md | 2 +- examples/src/main/python/py_container_checks.py | 32 - examples/src/main/python/pyfiles.py | 38 .../org/apache/spark/deploy/k8s/Config.scala| 7 +++ .../k8s/features/BasicExecutorFeatureStep.scala | 14 +- .../bindings/JavaDriverFeatureStep.scala| 4 +- .../bindings/PythonDriverFeatureStep.scala | 4 +- .../features/bindings/RDriverFeatureStep.scala | 4 +- .../features/BasicDriverFeatureStepSuite.scala | 1 - .../BasicExecutorFeatureStepSuite.scala | 24 ++ .../bindings/JavaDriverFeatureStepSuite.scala | 1 - .../src/main/dockerfiles/spark/Dockerfile | 1 + .../dockerfiles/spark/bindings/R/Dockerfile | 2 +- .../spark/bindings/python/Dockerfile| 2 +- .../k8s/integrationtest/KubernetesSuite.scala | 33 ++ .../k8s/integrationtest/PythonTestsSuite.scala | 34 +++--- .../tests/py_container_checks.py| 32 + .../integration-tests/tests/pyfiles.py | 38 .../tests/worker_memory_check.py| 47 20 files changed, 235 insertions(+), 86 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/9031c784/dev/make-distribution.sh -- diff --git a/dev/make-distribution.sh b/dev/make-distribution.sh index 126d39d..668682f 100755 --- a/dev/make-distribution.sh +++ b/dev/make-distribution.sh @@ -192,6 +192,7 @@ fi if [ -d "$SPARK_HOME"/resource-managers/kubernetes/core/target/ ]; then mkdir -p "$DISTDIR/kubernetes/" cp -a "$SPARK_HOME"/resource-managers/kubernetes/docker/src/main/dockerfiles "$DISTDIR/kubernetes/" + cp -a "$SPARK_HOME"/resource-managers/kubernetes/integration-tests/tests "$DISTDIR/kubernetes/" fi # Copy examples and dependencies http://git-wip-us.apache.org/repos/asf/spark/blob/9031c784/docs/configuration.md -- diff --git a/docs/configuration.md b/docs/configuration.md index a3e59a0..782ccff 100644 --- a/docs/configuration.md +++ b/docs/configuration.md @@ -188,7 +188,7 @@ of the most common options to set are: unless otherwise specified. If set, PySpark memory for an executor will be limited to this amount. If not set, Spark will not limit Python's memory use and it is up to the application to avoid exceeding the overhead memory space -shared with other non-JVM processes. When PySpark is run in YARN, this memory +shared with other non-JVM processes. When PySpark is run in YARN or Kubernetes, this memory is added to executor resource requests. 
http://git-wip-us.apache.org/repos/asf/spark/blob/9031c784/examples/src/main/python/py_container_checks.py -- diff --git a/examples/src/main/python/py_container_checks.py b/examples/src/main/python/py_container_checks.py deleted file mode 100644 index f6b3be2..000 --- a/examples/src/main/python/py_container_checks.py +++ /dev/null @@ -1,32 +0,0 @@ -# -# Licensed to the Apache Software Foundation (ASF) under one or more -# contributor license agreements. See the NOTICE file distributed with -# this work for additional information regarding copyright ownership. -# The ASF licenses this file to You under the Apache License, Version 2.0 -# (the "License"); you may not use this file except in compliance with -# the License. You may obtain a copy of the License at -# -#http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express
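A hedged configuration sketch (the values are illustrative): on K8S, as on YARN, the Python memory now contributes to the executor's resource request:

```scala
import org.apache.spark.SparkConf

val conf = new SparkConf()
  .set("spark.executor.memory", "2g")
  // After this patch, also counted toward the executor pod's memory request on K8S.
  .set("spark.executor.pyspark.memory", "1g")
```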
spark git commit: [SPARK-22666][ML][FOLLOW-UP] Improve testcase to tolerate different schema representation
Repository: spark Updated Branches: refs/heads/master cb1b55cf7 -> 6f681d429 [SPARK-22666][ML][FOLLOW-UP] Improve testcase to tolerate different schema representation ## What changes were proposed in this pull request? Improve testcase "image datasource test: read non image" to tolerate different schema representation. Because file:/path and file:///path are both valid URI-ifications, the testcase would fail in some environments. ## How was this patch tested? Manual. Closes #22449 from WeichenXu123/image_url. Authored-by: WeichenXu Signed-off-by: Xiangrui Meng Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/6f681d42 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/6f681d42 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/6f681d42 Branch: refs/heads/master Commit: 6f681d42964884d19bf22deb614550d712223117 Parents: cb1b55c Author: WeichenXu Authored: Wed Sep 19 15:16:20 2018 -0700 Committer: Xiangrui Meng Committed: Wed Sep 19 15:16:20 2018 -0700 -- .../spark/ml/source/image/ImageFileFormatSuite.scala | 11 +-- 1 file changed, 9 insertions(+), 2 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/6f681d42/mllib/src/test/scala/org/apache/spark/ml/source/image/ImageFileFormatSuite.scala -- diff --git a/mllib/src/test/scala/org/apache/spark/ml/source/image/ImageFileFormatSuite.scala b/mllib/src/test/scala/org/apache/spark/ml/source/image/ImageFileFormatSuite.scala index 1a6a8d6..38e2513 100644 --- a/mllib/src/test/scala/org/apache/spark/ml/source/image/ImageFileFormatSuite.scala +++ b/mllib/src/test/scala/org/apache/spark/ml/source/image/ImageFileFormatSuite.scala @@ -17,6 +17,7 @@ package org.apache.spark.ml.source.image +import java.net.URI import java.nio.file.Paths import org.apache.spark.SparkFunSuite @@ -58,8 +59,14 @@ class ImageFileFormatSuite extends SparkFunSuite with MLlibTestSparkContext { .load(filePath) assert(df2.count() === 1) val result = df2.head() -assert(result === invalidImageRow( - Paths.get(filePath).toAbsolutePath().normalize().toUri().toString)) + +val resultOrigin = result.getStruct(0).getString(0) +// convert `origin` to a `java.net.URI` object and then compare. +// because `file:/path` and `file:///path` are both valid URI-ifications +assert(new URI(resultOrigin) === Paths.get(filePath).toAbsolutePath().normalize().toUri()) + +// Compare other columns in the row to be the same with the `invalidImageRow` +assert(result === invalidImageRow(resultOrigin)) } test("image datasource partition test") {
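Why comparing `java.net.URI` values tolerates both renderings, as a hedged sketch (the path is made up):

```scala
import java.net.URI
import java.nio.file.Paths

val fromString = new URI("file:/tmp/images/x.png")    // one valid URI-ification
val fromPath = Paths.get("/tmp/images/x.png").toUri   // typically "file:///tmp/images/x.png"
// Equal as URIs even though the string forms differ, which is what the
// updated assertion relies on.
assert(fromString == fromPath)
```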
spark git commit: [SPARK-22666][ML][FOLLOW-UP] Improve testcase to tolerate different schema representation
Repository: spark Updated Branches: refs/heads/branch-2.4 9fefb47fe -> 83a75a83c [SPARK-22666][ML][FOLLOW-UP] Improve testcase to tolerate different schema representation ## What changes were proposed in this pull request? Improve testcase "image datasource test: read non image" to tolerate different schema representation. Because file:/path and file:///path are both valid URI-ifications, the testcase would fail in some environments. ## How was this patch tested? Manual. Closes #22449 from WeichenXu123/image_url. Authored-by: WeichenXu Signed-off-by: Xiangrui Meng (cherry picked from commit 6f681d42964884d19bf22deb614550d712223117) Signed-off-by: Xiangrui Meng Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/83a75a83 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/83a75a83 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/83a75a83 Branch: refs/heads/branch-2.4 Commit: 83a75a83cb24d20d4c2df5389bb8db34ad0335d9 Parents: 9fefb47 Author: WeichenXu Authored: Wed Sep 19 15:16:20 2018 -0700 Committer: Xiangrui Meng Committed: Wed Sep 19 15:16:30 2018 -0700 -- .../spark/ml/source/image/ImageFileFormatSuite.scala | 11 +-- 1 file changed, 9 insertions(+), 2 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/83a75a83/mllib/src/test/scala/org/apache/spark/ml/source/image/ImageFileFormatSuite.scala -- diff --git a/mllib/src/test/scala/org/apache/spark/ml/source/image/ImageFileFormatSuite.scala b/mllib/src/test/scala/org/apache/spark/ml/source/image/ImageFileFormatSuite.scala index 1a6a8d6..38e2513 100644 --- a/mllib/src/test/scala/org/apache/spark/ml/source/image/ImageFileFormatSuite.scala +++ b/mllib/src/test/scala/org/apache/spark/ml/source/image/ImageFileFormatSuite.scala @@ -17,6 +17,7 @@ package org.apache.spark.ml.source.image +import java.net.URI import java.nio.file.Paths import org.apache.spark.SparkFunSuite @@ -58,8 +59,14 @@ class ImageFileFormatSuite extends SparkFunSuite with MLlibTestSparkContext { .load(filePath) assert(df2.count() === 1) val result = df2.head() -assert(result === invalidImageRow( - Paths.get(filePath).toAbsolutePath().normalize().toUri().toString)) + +val resultOrigin = result.getStruct(0).getString(0) +// convert `origin` to a `java.net.URI` object and then compare. +// because `file:/path` and `file:///path` are both valid URI-ifications +assert(new URI(resultOrigin) === Paths.get(filePath).toAbsolutePath().normalize().toUri()) + +// Compare other columns in the row to be the same with the `invalidImageRow` +assert(result === invalidImageRow(resultOrigin)) } test("image datasource partition test") {
spark git commit: Revert "[SPARK-23173][SQL] rename spark.sql.fromJsonForceNullableSchema"
Repository: spark Updated Branches: refs/heads/branch-2.4 538ae62e0 -> 9fefb47fe Revert "[SPARK-23173][SQL] rename spark.sql.fromJsonForceNullableSchema" This reverts commit 6c7db7fd1ced1d143b1389d09990a620fc16be46. (cherry picked from commit cb1b55cf771018f1560f6b173cdd7c6ca8061bc7) Signed-off-by: Dongjoon Hyun Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/9fefb47f Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/9fefb47f Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/9fefb47f Branch: refs/heads/branch-2.4 Commit: 9fefb47feab14b865978bdb8e6155a976de72416 Parents: 538ae62 Author: Dongjoon Hyun Authored: Wed Sep 19 14:33:40 2018 -0700 Committer: Dongjoon Hyun Committed: Wed Sep 19 14:38:21 2018 -0700 -- .../sql/catalyst/expressions/jsonExpressions.scala | 4 ++-- .../org/apache/spark/sql/internal/SQLConf.scala | 16 2 files changed, 10 insertions(+), 10 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/9fefb47f/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/jsonExpressions.scala -- diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/jsonExpressions.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/jsonExpressions.scala index ade10ab..bd9090a 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/jsonExpressions.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/jsonExpressions.scala @@ -517,12 +517,12 @@ case class JsonToStructs( timeZoneId: Option[String] = None) extends UnaryExpression with TimeZoneAwareExpression with CodegenFallback with ExpectsInputTypes { - val forceNullableSchema: Boolean = SQLConf.get.getConf(SQLConf.FROM_JSON_FORCE_NULLABLE_SCHEMA) + val forceNullableSchema = SQLConf.get.getConf(SQLConf.FROM_JSON_FORCE_NULLABLE_SCHEMA) // The JSON input data might be missing certain fields. We force the nullability // of the user-provided schema to avoid data corruptions. In particular, the parquet-mr encoder // can generate incorrect files if values are missing in columns declared as non-nullable. - val nullableSchema: DataType = if (forceNullableSchema) schema.asNullable else schema + val nullableSchema = if (forceNullableSchema) schema.asNullable else schema override def nullable: Boolean = true http://git-wip-us.apache.org/repos/asf/spark/blob/9fefb47f/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala -- diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala index 5e2ac02..3e9cde4 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala @@ -588,6 +588,14 @@ object SQLConf { .stringConf .createWithDefault("_corrupt_record") + val FROM_JSON_FORCE_NULLABLE_SCHEMA = buildConf("spark.sql.fromJsonForceNullableSchema") +.internal() +.doc("When true, force the output schema of the from_json() function to be nullable " + + "(including all the fields). 
Otherwise, the schema might not be compatible with" + "actual data, which leads to curruptions.") +.booleanConf +.createWithDefault(true) + val BROADCAST_TIMEOUT = buildConf("spark.sql.broadcastTimeout") .doc("Timeout in seconds for the broadcast wait time in broadcast joins.") .timeConf(TimeUnit.SECONDS) @@ -1334,14 +1342,6 @@ object SQLConf { "When this conf is not set, the value from `spark.redaction.string.regex` is used.") .fallbackConf(org.apache.spark.internal.config.STRING_REDACTION_PATTERN) - val FROM_JSON_FORCE_NULLABLE_SCHEMA = buildConf("spark.sql.function.fromJson.forceNullable") -.internal() -.doc("When true, force the output schema of the from_json() function to be nullable " + - "(including all the fields). Otherwise, the schema might not be compatible with" + - "actual data, which leads to corruptions.") -.booleanConf -.createWithDefault(true) - val CONCAT_BINARY_AS_STRING = buildConf("spark.sql.function.concatBinaryAsString") .doc("When this option is set to false and all inputs are binary, `functions.concat` returns " + "an output as binary. Otherwise, it returns as a string. ")
spark git commit: Revert "[SPARK-23173][SQL] rename spark.sql.fromJsonForceNullableSchema"
Repository: spark Updated Branches: refs/heads/master a71f6a175 -> cb1b55cf7 Revert "[SPARK-23173][SQL] rename spark.sql.fromJsonForceNullableSchema" This reverts commit 6c7db7fd1ced1d143b1389d09990a620fc16be46. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/cb1b55cf Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/cb1b55cf Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/cb1b55cf Branch: refs/heads/master Commit: cb1b55cf771018f1560f6b173cdd7c6ca8061bc7 Parents: a71f6a1 Author: Dongjoon Hyun Authored: Wed Sep 19 14:33:40 2018 -0700 Committer: Dongjoon Hyun Committed: Wed Sep 19 14:33:40 2018 -0700 -- .../sql/catalyst/expressions/jsonExpressions.scala | 4 ++-- .../org/apache/spark/sql/internal/SQLConf.scala | 16 2 files changed, 10 insertions(+), 10 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/cb1b55cf/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/jsonExpressions.scala -- diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/jsonExpressions.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/jsonExpressions.scala index ade10ab..bd9090a 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/jsonExpressions.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/jsonExpressions.scala @@ -517,12 +517,12 @@ case class JsonToStructs( timeZoneId: Option[String] = None) extends UnaryExpression with TimeZoneAwareExpression with CodegenFallback with ExpectsInputTypes { - val forceNullableSchema: Boolean = SQLConf.get.getConf(SQLConf.FROM_JSON_FORCE_NULLABLE_SCHEMA) + val forceNullableSchema = SQLConf.get.getConf(SQLConf.FROM_JSON_FORCE_NULLABLE_SCHEMA) // The JSON input data might be missing certain fields. We force the nullability // of the user-provided schema to avoid data corruptions. In particular, the parquet-mr encoder // can generate incorrect files if values are missing in columns declared as non-nullable. - val nullableSchema: DataType = if (forceNullableSchema) schema.asNullable else schema + val nullableSchema = if (forceNullableSchema) schema.asNullable else schema override def nullable: Boolean = true http://git-wip-us.apache.org/repos/asf/spark/blob/cb1b55cf/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala -- diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala index 4499a35..b1e9b17 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala @@ -608,6 +608,14 @@ object SQLConf { .stringConf .createWithDefault("_corrupt_record") + val FROM_JSON_FORCE_NULLABLE_SCHEMA = buildConf("spark.sql.fromJsonForceNullableSchema") +.internal() +.doc("When true, force the output schema of the from_json() function to be nullable " + + "(including all the fields). 
Otherwise, the schema might not be compatible with " + + "actual data, which leads to corruptions.") +.booleanConf +.createWithDefault(true) + val BROADCAST_TIMEOUT = buildConf("spark.sql.broadcastTimeout") .doc("Timeout in seconds for the broadcast wait time in broadcast joins.") .timeConf(TimeUnit.SECONDS) @@ -1354,14 +1362,6 @@ object SQLConf { "When this conf is not set, the value from `spark.redaction.string.regex` is used.") .fallbackConf(org.apache.spark.internal.config.STRING_REDACTION_PATTERN) - val FROM_JSON_FORCE_NULLABLE_SCHEMA = buildConf("spark.sql.function.fromJson.forceNullable") -.internal() -.doc("When true, force the output schema of the from_json() function to be nullable " + - "(including all the fields). Otherwise, the schema might not be compatible with" + - "actual data, which leads to corruptions.") -.booleanConf -.createWithDefault(true) - val CONCAT_BINARY_AS_STRING = buildConf("spark.sql.function.concatBinaryAsString") .doc("When this option is set to false and all inputs are binary, `functions.concat` returns " + "an output as binary. Otherwise, it returns as a string. ") - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
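The `schema.asNullable` call in `JsonToStructs` is `private[spark]`; to make its semantics concrete, here is an equivalent transform sketched against the public types API. The `forceNullable` helper name is ours, introduced only for illustration.

```scala
import org.apache.spark.sql.types._

// Recursively mark every struct field, array element, and map value nullable,
// mirroring what DataType.asNullable does inside JsonToStructs.
def forceNullable(dt: DataType): DataType = dt match {
  case StructType(fields) =>
    StructType(fields.map(f => f.copy(dataType = forceNullable(f.dataType), nullable = true)))
  case ArrayType(elementType, _) =>
    ArrayType(forceNullable(elementType), containsNull = true)
  case MapType(keyType, valueType, _) =>
    MapType(forceNullable(keyType), forceNullable(valueType), valueContainsNull = true)
  case other => other
}

val declared = new StructType()
  .add("id", LongType, nullable = false)
  .add("tags", ArrayType(StringType, containsNull = false), nullable = false)

// Same shape as the input, but every nullability flag is now true.
println(forceNullable(declared).json)
```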
spark-website git commit: Added the Spark Operator as a third-party project
Repository: spark-website Updated Branches: refs/heads/asf-site 9b21d71d2 -> 806a1bd52 Added the Spark Operator as a third-party project srowen. Author: Yinan Li Closes #148 from liyinan926/asf-site. Project: http://git-wip-us.apache.org/repos/asf/spark-website/repo Commit: http://git-wip-us.apache.org/repos/asf/spark-website/commit/806a1bd5 Tree: http://git-wip-us.apache.org/repos/asf/spark-website/tree/806a1bd5 Diff: http://git-wip-us.apache.org/repos/asf/spark-website/diff/806a1bd5 Branch: refs/heads/asf-site Commit: 806a1bd5253b96ae6eb50ae6f2f3f2d7851fda42 Parents: 9b21d71 Author: Yinan Li Authored: Wed Sep 19 12:24:12 2018 -0700 Committer: Yinan Li Committed: Wed Sep 19 12:24:12 2018 -0700 -- site/third-party-projects.html | 1 + third-party-projects.md| 1 + 2 files changed, 2 insertions(+) -- http://git-wip-us.apache.org/repos/asf/spark-website/blob/806a1bd5/site/third-party-projects.html -- diff --git a/site/third-party-projects.html b/site/third-party-projects.html index c4caf1c..aa5fd84 100644 --- a/site/third-party-projects.html +++ b/site/third-party-projects.html @@ -239,6 +239,7 @@ against Spark, and data scientists to use Javascript in Jupyter notebooks. <a href="https://github.com/SnappyDataInc/snappydata">SnappyData</a> - an open source OLTP + OLAP database integrated with Spark on the same JVMs. <a href="https://github.com/Hydrospheredata/mist">Mist</a> - Serverless proxy for Spark cluster (spark middleware) + <a href="https://github.com/GoogleCloudPlatform/spark-on-k8s-operator">K8S Operator for Apache Spark</a> - Kubernetes operator for specifying and managing the lifecycle of Apache Spark applications on Kubernetes. Applications Using Spark http://git-wip-us.apache.org/repos/asf/spark-website/blob/806a1bd5/third-party-projects.md -- diff --git a/third-party-projects.md b/third-party-projects.md index d965bce..43e7516 100644 --- a/third-party-projects.md +++ b/third-party-projects.md @@ -45,6 +45,7 @@ against Spark, and data scientists to use Javascript in Jupyter notebooks. - <a href="https://github.com/SnappyDataInc/snappydata">SnappyData</a> - an open source OLTP + OLAP database integrated with Spark on the same JVMs. - <a href="https://github.com/Hydrospheredata/mist">Mist</a> - Serverless proxy for Spark cluster (spark middleware) +- <a href="https://github.com/GoogleCloudPlatform/spark-on-k8s-operator">K8S Operator for Apache Spark</a> - Kubernetes operator for specifying and managing the lifecycle of Apache Spark applications on Kubernetes. Applications Using Spark - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
spark git commit: [SPARK-25414][SS][TEST] make it clear that the numRows metrics should be counted for each scan of the source
Repository: spark Updated Branches: refs/heads/master 12b1e91e6 -> a71f6a175 [SPARK-25414][SS][TEST] make it clear that the numRows metrics should be counted for each scan of the source ## What changes were proposed in this pull request? For self-join/self-union, Spark will produce a physical plan which has multiple `DataSourceV2ScanExec` instances referring to the same `ReadSupport` instance. In this case, the streaming source is indeed scanned multiple times, and the `numInputRows` metrics should be counted for each scan. Actually we already have 2 test cases to verify the behavior: 1. `StreamingQuerySuite.input row calculation with same V2 source used twice in self-join` 2. `KafkaMicroBatchSourceSuiteBase.ensure stream-stream self-join generates only one offset in log and correct metrics`. However, in these 2 tests, the expected result is different, which is super confusing. It turns out that, the first test doesn't trigger exchange reuse, so the source is scanned twice. The second test triggers exchange reuse, and the source is scanned only once. This PR proposes to improve these 2 tests, to test with/without exchange reuse. ## How was this patch tested? test only change Closes #22402 from cloud-fan/bug. Authored-by: Wenchen Fan Signed-off-by: Wenchen Fan Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/a71f6a17 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/a71f6a17 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/a71f6a17 Branch: refs/heads/master Commit: a71f6a1750fd0a29ecae6b98673ee15840da1c62 Parents: 12b1e91 Author: Wenchen Fan Authored: Thu Sep 20 00:29:48 2018 +0800 Committer: Wenchen Fan Committed: Thu Sep 20 00:29:48 2018 +0800 -- .../kafka010/KafkaMicroBatchSourceSuite.scala | 39 + .../execution/streaming/ProgressReporter.scala | 6 +-- .../sql/streaming/StreamingQuerySuite.scala | 46 +++- 3 files changed, 68 insertions(+), 23 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/a71f6a17/external/kafka-0-10-sql/src/test/scala/org/apache/spark/sql/kafka010/KafkaMicroBatchSourceSuite.scala -- diff --git a/external/kafka-0-10-sql/src/test/scala/org/apache/spark/sql/kafka010/KafkaMicroBatchSourceSuite.scala b/external/kafka-0-10-sql/src/test/scala/org/apache/spark/sql/kafka010/KafkaMicroBatchSourceSuite.scala index 8e246db..e5f0088 100644 --- a/external/kafka-0-10-sql/src/test/scala/org/apache/spark/sql/kafka010/KafkaMicroBatchSourceSuite.scala +++ b/external/kafka-0-10-sql/src/test/scala/org/apache/spark/sql/kafka010/KafkaMicroBatchSourceSuite.scala @@ -35,9 +35,11 @@ import org.scalatest.time.SpanSugar._ import org.apache.spark.sql.{ForeachWriter, SparkSession} import org.apache.spark.sql.execution.datasources.v2.StreamingDataSourceV2Relation +import org.apache.spark.sql.execution.exchange.ReusedExchangeExec import org.apache.spark.sql.execution.streaming._ import org.apache.spark.sql.execution.streaming.continuous.ContinuousExecution import org.apache.spark.sql.functions.{count, window} +import org.apache.spark.sql.internal.SQLConf import org.apache.spark.sql.kafka010.KafkaSourceProvider._ import org.apache.spark.sql.sources.v2.DataSourceOptions import org.apache.spark.sql.streaming.{ProcessingTime, StreamTest} @@ -598,18 +600,37 @@ abstract class KafkaMicroBatchSourceSuiteBase extends KafkaSourceSuiteBase { val join = values.join(values, "key") -testStream(join)( - makeSureGetOffsetCalled, - AddKafkaData(Set(topic), 1, 2), - CheckAnswer((1, 1, 1), (2, 2, 2)), - 
AddKafkaData(Set(topic), 6, 3), - CheckAnswer((1, 1, 1), (2, 2, 2), (3, 3, 3), (1, 6, 1), (1, 1, 6), (1, 6, 6)), - AssertOnQuery { q => +def checkQuery(check: AssertOnQuery): Unit = { + testStream(join)( +makeSureGetOffsetCalled, +AddKafkaData(Set(topic), 1, 2), +CheckAnswer((1, 1, 1), (2, 2, 2)), +AddKafkaData(Set(topic), 6, 3), +CheckAnswer((1, 1, 1), (2, 2, 2), (3, 3, 3), (1, 6, 1), (1, 1, 6), (1, 6, 6)), +check + ) +} + +withSQLConf(SQLConf.EXCHANGE_REUSE_ENABLED.key -> "false") { + checkQuery(AssertOnQuery { q => assert(q.availableOffsets.iterator.size == 1) +// The kafka source is scanned twice because of self-join +assert(q.recentProgress.map(_.numInputRows).sum == 8) +true + }) +} + +withSQLConf(SQLConf.EXCHANGE_REUSE_ENABLED.key -> "true") { + checkQuery(AssertOnQuery { q => +assert(q.availableOffsets.iterator.size == 1) +assert(q.lastExecution.executedPlan.collect { + case r: ReusedExchangeExec => r +}.length == 1) +// The kafka
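For readers who want to observe the same reuse effect outside the streaming test harness, here is a sketch using a batch aggregate self-join. The toy data and session setup are ours; only `ReusedExchangeExec` and the `spark.sql.exchange.reuse` conf come from the patch, and whether reuse fires can depend on the planner's exact output.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.execution.exchange.ReusedExchangeExec
import org.apache.spark.sql.internal.SQLConf

val spark = SparkSession.builder().master("local[*]").appName("exchangeReuse").getOrCreate()
import spark.implicits._

spark.conf.set(SQLConf.EXCHANGE_REUSE_ENABLED.key, "true")

// Joining an aggregate with itself plans two identical exchanges; with reuse
// enabled the planner keeps one and points the second side at it.
val agg  = Seq((1, "a"), (2, "b"), (2, "c")).toDF("key", "value").groupBy("key").count()
val join = agg.join(agg, "key")

val reused = join.queryExecution.executedPlan.collect { case r: ReusedExchangeExec => r }
println(s"reused exchanges: ${reused.length}") // typically 1 here; 0 with the conf set to false
```

This is exactly why the streaming self-join above reports 4 input rows with reuse enabled and 8 without: with one exchange reused, the source is scanned once instead of twice.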
svn commit: r29514 - in /dev/spark/2.5.0-SNAPSHOT-2018_09_19_08_14-12b1e91-docs: ./ _site/ _site/api/ _site/api/R/ _site/api/java/ _site/api/java/lib/ _site/api/java/org/ _site/api/java/org/apache/ _s
Author: pwendell Date: Wed Sep 19 15:29:02 2018 New Revision: 29514 Log: Apache Spark 2.5.0-SNAPSHOT-2018_09_19_08_14-12b1e91 docs [This commit notification would consist of 1484 parts, which exceeds the limit of 50 ones, so it was shortened to the summary.] - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
svn commit: r29509 - in /dev/spark/2.4.1-SNAPSHOT-2018_09_19_06_03-538ae62-docs: ./ _site/ _site/api/ _site/api/R/ _site/api/java/ _site/api/java/lib/ _site/api/java/org/ _site/api/java/org/apache/ _s
Author: pwendell Date: Wed Sep 19 13:17:20 2018 New Revision: 29509 Log: Apache Spark 2.4.1-SNAPSHOT-2018_09_19_06_03-538ae62 docs [This commit notification would consist of 1475 parts, which exceeds the limit of 50 ones, so it was shortened to the summary.] - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
spark git commit: [SPARK-25358][SQL] MutableProjection supports fallback to an interpreted mode
Repository: spark Updated Branches: refs/heads/master 5534a3a58 -> 12b1e91e6 [SPARK-25358][SQL] MutableProjection supports fallback to an interpreted mode ## What changes were proposed in this pull request? In SPARK-23711, `UnsafeProjection` gained fallback to an interpreted mode. This PR therefore fixes the code to support the same fallback mode in `MutableProjection`, based on `CodeGeneratorWithInterpretedFallback`. ## How was this patch tested? Added tests in `CodeGeneratorWithInterpretedFallbackSuite`. Closes #22355 from maropu/SPARK-25358. Authored-by: Takeshi Yamamuro Signed-off-by: Wenchen Fan Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/12b1e91e Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/12b1e91e Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/12b1e91e Branch: refs/heads/master Commit: 12b1e91e6b5135f6ed3e59a49abfc2e5a855263a Parents: 5534a3a Author: Takeshi Yamamuro Authored: Wed Sep 19 19:54:49 2018 +0800 Committer: Wenchen Fan Committed: Wed Sep 19 19:54:49 2018 +0800 -- .../InterpretedMutableProjection.scala | 89 .../sql/catalyst/expressions/Projection.scala | 75 - .../codegen/GenerateMutableProjection.scala | 4 + .../sql/catalyst/expressions/package.scala | 18 +--- ...eGeneratorWithInterpretedFallbackSuite.scala | 38 - .../CollectionExpressionsSuite.scala| 8 +- .../expressions/ExpressionEvalHelper.scala | 34 +--- .../expressions/MiscExpressionsSuite.scala | 10 +-- .../expressions/ObjectExpressionsSuite.scala| 8 +- .../apache/spark/sql/execution/SparkPlan.scala | 2 +- .../spark/sql/execution/aggregate/udaf.scala| 2 +- 11 files changed, 201 insertions(+), 87 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/12b1e91e/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/InterpretedMutableProjection.scala -- diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/InterpretedMutableProjection.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/InterpretedMutableProjection.scala new file mode 100644 index 000..0654108 --- /dev/null +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/InterpretedMutableProjection.scala @@ -0,0 +1,89 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.catalyst.expressions + +import org.apache.spark.sql.catalyst.InternalRow +import org.apache.spark.sql.catalyst.expressions.aggregate.NoOp + + +/** + * A [[MutableProjection]] that is calculated by calling `eval` on each of the specified + * expressions. + * + * @param expressions a sequence of expressions that determine the value of each column of the + *output row. 
+ */ +class InterpretedMutableProjection(expressions: Seq[Expression]) extends MutableProjection { + def this(expressions: Seq[Expression], inputSchema: Seq[Attribute]) = +this(toBoundExprs(expressions, inputSchema)) + + private[this] val buffer = new Array[Any](expressions.size) + + override def initialize(partitionIndex: Int): Unit = { +expressions.foreach(_.foreach { + case n: Nondeterministic => n.initialize(partitionIndex) + case _ => +}) + } + + private[this] val validExprs = expressions.zipWithIndex.filter { +case (NoOp, _) => false +case _ => true + } + private[this] var mutableRow: InternalRow = new GenericInternalRow(expressions.size) + def currentValue: InternalRow = mutableRow + + override def target(row: InternalRow): MutableProjection = { +mutableRow = row +this + } + + override def apply(input: InternalRow): InternalRow = { +var i = 0 +while (i < validExprs.length) { + val (expr, ordinal) = validExprs(i) + // Store the result into buffer first, to make the projection atomic (needed by aggregation) + buffer(ordinal) =
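A minimal sketch of how the new interpreted path can be driven directly, using the constructor and `target`/`apply` contract shown in the diff above. This is internal Catalyst API; the toy expressions and rows are ours.

```scala
import org.apache.spark.sql.catalyst.InternalRow
import org.apache.spark.sql.catalyst.expressions.{BoundReference, GenericInternalRow, InterpretedMutableProjection, Literal}
import org.apache.spark.sql.types.IntegerType

// Project (input column 0, constant 42) using eval() only -- no codegen.
val exprs = Seq(
  BoundReference(0, IntegerType, nullable = false), // pass column 0 through
  Literal(42)                                       // constant output column
)
val proj = new InterpretedMutableProjection(exprs)

// A MutableProjection writes into a caller-supplied row, which is what lets
// aggregation reuse a single buffer row across many input rows.
val out = new GenericInternalRow(2)
proj.target(out)

proj(InternalRow(7))
println(s"${out.getInt(0)}, ${out.getInt(1)}") // 7, 42; the next input overwrites out in place
```

The intermediate `buffer` in the class above is what makes `apply` atomic: all expressions are evaluated before any result is written into the target row, which aggregation relies on.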
svn commit: r29507 - in /dev/spark/2.5.0-SNAPSHOT-2018_09_19_04_02-5534a3a-docs: ./ _site/ _site/api/ _site/api/R/ _site/api/java/ _site/api/java/lib/ _site/api/java/org/ _site/api/java/org/apache/ _s
Author: pwendell Date: Wed Sep 19 11:17:13 2018 New Revision: 29507 Log: Apache Spark 2.5.0-SNAPSHOT-2018_09_19_04_02-5534a3a docs [This commit notification would consist of 1484 parts, which exceeds the limit of 50 ones, so it was shortened to the summary.] - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
spark git commit: [SPARK-25445][BUILD][FOLLOWUP] Resolve issues in release-build.sh for publishing scala-2.12 build
Repository: spark Updated Branches: refs/heads/branch-2.4 f11f44548 -> 538ae62e0 [SPARK-25445][BUILD][FOLLOWUP] Resolve issues in release-build.sh for publishing scala-2.12 build ## What changes were proposed in this pull request? This is a follow-up to #22441. 1. Remove the "-Pkafka-0-8" flag from the Scala 2.12 build. 2. Clean up the script with simpler logic. 3. Switch the Scala version back to 2.11 before the script exits. ## How was this patch tested? Manual test. Closes #22454 from gengliangwang/revise_release_build. Authored-by: Gengliang Wang Signed-off-by: Wenchen Fan (cherry picked from commit 5534a3a58e4025624fbad527dd129acb8025f25a) Signed-off-by: Wenchen Fan Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/538ae62e Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/538ae62e Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/538ae62e Branch: refs/heads/branch-2.4 Commit: 538ae62e0cafc8180b04e4e5c74b79acee60d2b1 Parents: f11f445 Author: Gengliang Wang Authored: Wed Sep 19 18:30:46 2018 +0800 Committer: Wenchen Fan Committed: Wed Sep 19 18:31:09 2018 +0800 -- dev/create-release/release-build.sh | 38 +--- 1 file changed, 15 insertions(+), 23 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/538ae62e/dev/create-release/release-build.sh -- diff --git a/dev/create-release/release-build.sh b/dev/create-release/release-build.sh index 4c90a77..cce5f8b 100755 --- a/dev/create-release/release-build.sh +++ b/dev/create-release/release-build.sh @@ -111,21 +111,21 @@ fi # different versions of Scala are supported. BASE_PROFILES="-Pmesos -Pyarn" PUBLISH_SCALA_2_10=0 -PUBLISH_SCALA_2_12=0 SCALA_2_10_PROFILES="-Pscala-2.10" SCALA_2_11_PROFILES= -SCALA_2_12_PROFILES="-Pscala-2.12 -Pkafka-0-8" - if [[ $SPARK_VERSION > "2.3" ]]; then BASE_PROFILES="$BASE_PROFILES -Pkubernetes -Pflume" SCALA_2_11_PROFILES="-Pkafka-0-8" - if [[ $SPARK_VERSION > "2.4" ]]; then -PUBLISH_SCALA_2_12=1 - fi else PUBLISH_SCALA_2_10=1 fi +PUBLISH_SCALA_2_12=0 +SCALA_2_12_PROFILES="-Pscala-2.12" +if [[ $SPARK_VERSION > "2.4" ]]; then + PUBLISH_SCALA_2_12=1 +fi + # Hive-specific profiles for some builds HIVE_PROFILES="-Phive -Phive-thriftserver" # Profiles for publishing snapshots and release to Maven Central @@ -190,17 +190,9 @@ if [[ "$1" == "package" ]]; then # Updated for each binary build make_binary_release() { NAME=$1 -SCALA_VERSION=$2 -SCALA_PROFILES= -if [[ SCALA_VERSION == "2.10" ]]; then - SCALA_PROFILES="$SCALA_2_10_PROFILES" -elif [[ SCALA_VERSION == "2.12" ]]; then - SCALA_PROFILES="$SCALA_2_12_PROFILES" -else - SCALA_PROFILES="$SCALA_2_11_PROFILES" -fi -FLAGS="$MVN_EXTRA_OPTS -B $SCALA_PROFILES $BASE_RELEASE_PROFILES $3" -BUILD_PACKAGE=$4 +FLAGS="$MVN_EXTRA_OPTS -B $BASE_RELEASE_PROFILES $2" +BUILD_PACKAGE=$3 +SCALA_VERSION=$4 # We increment the Zinc port each time to avoid OOM's and other craziness if multiple builds # share the same Zinc server. @@ -210,10 +202,8 @@ if [[ "$1" == "package" ]]; then cp -r spark spark-$SPARK_VERSION-bin-$NAME cd spark-$SPARK_VERSION-bin-$NAME -if [[ SCALA_VERSION == "2.10" ]]; then - ./dev/change-scala-version.sh 2.10 -elif [[ SCALA_VERSION == "2.12" ]]; then - ./dev/change-scala-version.sh 2.12 +if [[ "$SCALA_VERSION" != "2.11" ]]; then + ./dev/change-scala-version.sh $SCALA_VERSION fi export ZINC_PORT=$ZINC_PORT @@ -305,7 +295,7 @@ if [[ "$1" == "package" ]]; then for key in ${!BINARY_PKGS_ARGS[@]}; do args=${BINARY_PKGS_ARGS[$key]} extra=${BINARY_PKGS_EXTRA[$key]} -if ! 
make_binary_release "$key" "2.11" "$args" "$extra"; then +if ! make_binary_release "$key" "$SCALA_2_11_PROFILES $args" "$extra" "2.11"; then error "Failed to build $key package. Check logs for details." fi done @@ -314,7 +304,7 @@ if [[ "$1" == "package" ]]; then key="without-hadoop-scala-2.12" args="-Phadoop-provided" extra="" -if ! make_binary_release "$key" "2.12" "$args" "$extra"; then +if ! make_binary_release "$key" "$SCALA_2_12_PROFILES $args" "$extra" "2.12"; then error "Failed to build $key package. Check logs for details." fi fi @@ -446,6 +436,8 @@ if [[ "$1" == "publish-release" ]]; then # Clean-up Zinc nailgun process $LSOF -P |grep $ZINC_PORT | grep LISTEN | awk '{ print $2; }' | xargs kill + ./dev/change-scala-version.sh 2.11 + pushd $tmp_repo/org/apache/spark # Remove any extra files generated during install - To unsubscribe, e-mail:
spark git commit: [SPARK-25445][BUILD][FOLLOWUP] Resolve issues in release-build.sh for publishing scala-2.12 build
Repository: spark Updated Branches: refs/heads/master 4193c7623 -> 5534a3a58 [SPARK-25445][BUILD][FOLLOWUP] Resolve issues in release-build.sh for publishing scala-2.12 build ## What changes were proposed in this pull request? This is a follow-up to #22441. 1. Remove the "-Pkafka-0-8" flag from the Scala 2.12 build. 2. Clean up the script with simpler logic. 3. Switch the Scala version back to 2.11 before the script exits. ## How was this patch tested? Manual test. Closes #22454 from gengliangwang/revise_release_build. Authored-by: Gengliang Wang Signed-off-by: Wenchen Fan Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/5534a3a5 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/5534a3a5 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/5534a3a5 Branch: refs/heads/master Commit: 5534a3a58e4025624fbad527dd129acb8025f25a Parents: 4193c76 Author: Gengliang Wang Authored: Wed Sep 19 18:30:46 2018 +0800 Committer: Wenchen Fan Committed: Wed Sep 19 18:30:46 2018 +0800 -- dev/create-release/release-build.sh | 38 +--- 1 file changed, 15 insertions(+), 23 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/5534a3a5/dev/create-release/release-build.sh -- diff --git a/dev/create-release/release-build.sh b/dev/create-release/release-build.sh index 4c90a77..cce5f8b 100755 --- a/dev/create-release/release-build.sh +++ b/dev/create-release/release-build.sh @@ -111,21 +111,21 @@ fi # different versions of Scala are supported. BASE_PROFILES="-Pmesos -Pyarn" PUBLISH_SCALA_2_10=0 -PUBLISH_SCALA_2_12=0 SCALA_2_10_PROFILES="-Pscala-2.10" SCALA_2_11_PROFILES= -SCALA_2_12_PROFILES="-Pscala-2.12 -Pkafka-0-8" - if [[ $SPARK_VERSION > "2.3" ]]; then BASE_PROFILES="$BASE_PROFILES -Pkubernetes -Pflume" SCALA_2_11_PROFILES="-Pkafka-0-8" - if [[ $SPARK_VERSION > "2.4" ]]; then -PUBLISH_SCALA_2_12=1 - fi else PUBLISH_SCALA_2_10=1 fi +PUBLISH_SCALA_2_12=0 +SCALA_2_12_PROFILES="-Pscala-2.12" +if [[ $SPARK_VERSION > "2.4" ]]; then + PUBLISH_SCALA_2_12=1 +fi + # Hive-specific profiles for some builds HIVE_PROFILES="-Phive -Phive-thriftserver" # Profiles for publishing snapshots and release to Maven Central @@ -190,17 +190,9 @@ if [[ "$1" == "package" ]]; then # Updated for each binary build make_binary_release() { NAME=$1 -SCALA_VERSION=$2 -SCALA_PROFILES= -if [[ SCALA_VERSION == "2.10" ]]; then - SCALA_PROFILES="$SCALA_2_10_PROFILES" -elif [[ SCALA_VERSION == "2.12" ]]; then - SCALA_PROFILES="$SCALA_2_12_PROFILES" -else - SCALA_PROFILES="$SCALA_2_11_PROFILES" -fi -FLAGS="$MVN_EXTRA_OPTS -B $SCALA_PROFILES $BASE_RELEASE_PROFILES $3" -BUILD_PACKAGE=$4 +FLAGS="$MVN_EXTRA_OPTS -B $BASE_RELEASE_PROFILES $2" +BUILD_PACKAGE=$3 +SCALA_VERSION=$4 # We increment the Zinc port each time to avoid OOM's and other craziness if multiple builds # share the same Zinc server. @@ -210,10 +202,8 @@ if [[ "$1" == "package" ]]; then cp -r spark spark-$SPARK_VERSION-bin-$NAME cd spark-$SPARK_VERSION-bin-$NAME -if [[ SCALA_VERSION == "2.10" ]]; then - ./dev/change-scala-version.sh 2.10 -elif [[ SCALA_VERSION == "2.12" ]]; then - ./dev/change-scala-version.sh 2.12 +if [[ "$SCALA_VERSION" != "2.11" ]]; then + ./dev/change-scala-version.sh $SCALA_VERSION fi export ZINC_PORT=$ZINC_PORT @@ -305,7 +295,7 @@ if [[ "$1" == "package" ]]; then for key in ${!BINARY_PKGS_ARGS[@]}; do args=${BINARY_PKGS_ARGS[$key]} extra=${BINARY_PKGS_EXTRA[$key]} -if ! make_binary_release "$key" "2.11" "$args" "$extra"; then +if ! 
make_binary_release "$key" "$SCALA_2_11_PROFILES $args" "$extra" "2.11"; then error "Failed to build $key package. Check logs for details." fi done @@ -314,7 +304,7 @@ if [[ "$1" == "package" ]]; then key="without-hadoop-scala-2.12" args="-Phadoop-provided" extra="" -if ! make_binary_release "$key" "2.12" "$args" "$extra"; then +if ! make_binary_release "$key" "$SCALA_2_12_PROFILES $args" "$extra" "2.12"; then error "Failed to build $key package. Check logs for details." fi fi @@ -446,6 +436,8 @@ if [[ "$1" == "publish-release" ]]; then # Clean-up Zinc nailgun process $LSOF -P |grep $ZINC_PORT | grep LISTEN | awk '{ print $2; }' | xargs kill + ./dev/change-scala-version.sh 2.11 + pushd $tmp_repo/org/apache/spark # Remove any extra files generated during install - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
svn commit: r29506 - in /dev/spark/2.4.1-SNAPSHOT-2018_09_19_02_02-f11f445-docs: ./ _site/ _site/api/ _site/api/R/ _site/api/java/ _site/api/java/lib/ _site/api/java/org/ _site/api/java/org/apache/ _s
Author: pwendell Date: Wed Sep 19 09:17:32 2018 New Revision: 29506 Log: Apache Spark 2.4.1-SNAPSHOT-2018_09_19_02_02-f11f445 docs [This commit notification would consist of 1475 parts, which exceeds the limit of 50 ones, so it was shortened to the summary.] - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
svn commit: r29504 - in /dev/spark/2.5.0-SNAPSHOT-2018_09_19_00_02-4193c76-docs: ./ _site/ _site/api/ _site/api/R/ _site/api/java/ _site/api/java/lib/ _site/api/java/org/ _site/api/java/org/apache/ _s
Author: pwendell Date: Wed Sep 19 07:17:46 2018 New Revision: 29504 Log: Apache Spark 2.5.0-SNAPSHOT-2018_09_19_00_02-4193c76 docs [This commit notification would consist of 1484 parts, which exceeds the limit of 50 ones, so it was shortened to the summary.] - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org