[spark] branch master updated (aaa8a80 -> 4530760)
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from aaa8a80 [SPARK-35613][CORE][SQL] Cache commonly occurring strings in SQLMetrics, JSONProtocol and AccumulatorV2 classes add 4530760 [SPARK-35774][SQL] Parse any year-month interval types in SQL No new revisions were added by this update. Summary of changes: .../main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4 | 2 +- .../scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala | 9 +++-- .../test/scala/org/apache/spark/sql/types/DataTypeSuite.scala| 2 +- 3 files changed, 9 insertions(+), 4 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated: [SPARK-35613][CORE][SQL] Cache commonly occurring strings in SQLMetrics, JSONProtocol and AccumulatorV2 classes
This is an automated email from the ASF dual-hosted git repository. mridulm80 pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new aaa8a80 [SPARK-35613][CORE][SQL] Cache commonly occurring strings in SQLMetrics, JSONProtocol and AccumulatorV2 classes aaa8a80 is described below commit aaa8a80c9d3426107de5873b4391600701121385 Author: Venkata krishnan Sowrirajan AuthorDate: Tue Jun 15 22:02:19 2021 -0500 [SPARK-35613][CORE][SQL] Cache commonly occurring strings in SQLMetrics, JSONProtocol and AccumulatorV2 classes

### What changes were proposed in this pull request?
Cache commonly occurring duplicate Some objects in SQLMetrics by using a Guava cache, and reuse the existing Guava string interner to avoid duplicate strings in JSONProtocol. With AccumulatorV2, we have also seen a lot of Some(-1L) and Some(0L) occurrences in a heap dump; these are now interned by reusing a single already-constructed Some(-1L) and Some(0L).

To give some context on the impact and the garbage that accumulated, below are the details of the complex Spark job we troubleshot to find the bottlenecks. **tl;dr - In short, the major issue was the accumulation of duplicate objects, mainly from SQLMetrics.** More than 25% of the 40G driver heap was filled with (a very large number of) **duplicate**, immutable objects.

1. Very large number of **duplicate** immutable objects. - The type of a metric is represented by `'scala.Some("sql")'`, which is created for each metric. - Fixing this reduced memory usage from 4GB to a few bytes.
2. `scala.Some(0)` and `scala.Some(-1)` are very common metric values (typically indicating the absence of a metric). - Individually the values are all immutable, but Spark SQL was creating a new instance each time. - Interning these resulted in saving ~4.5GB for a 40G heap.
3. Using string interpolation for metric names.
- Interpolation results in the creation of a new string object. - We end up with a very large number of metric names, though the number of unique strings is minuscule. - ~7.5 GB in the 40 GB heap, which went down to a few KBs when fixed.

### Why are the changes needed?
To reduce the overall driver memory footprint, which in turn reduces full GC pauses.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Since these are memory-related optimizations, unit tests are not added. These changes were added in our internal platform and, along with another set of optimizations, made it possible for a complex Spark job that was continuously failing to succeed.

Closes #32754 from venkata91/SPARK-35613. Authored-by: Venkata krishnan Sowrirajan Signed-off-by: Mridul Muralidharan gmail.com> --- .../scala/org/apache/spark/status/LiveEntity.scala | 16 +--- .../org/apache/spark/util/AccumulatorV2.scala | 19 +- .../scala/org/apache/spark/util/JsonProtocol.scala | 9 --- .../main/scala/org/apache/spark/util/Utils.scala | 8 ++ .../spark/sql/execution/metric/SQLMetrics.scala| 30 +++--- 5 files changed, 53 insertions(+), 29 deletions(-) diff --git a/core/src/main/scala/org/apache/spark/status/LiveEntity.scala b/core/src/main/scala/org/apache/spark/status/LiveEntity.scala index 5af76e9..fc5fc32 100644 --- a/core/src/main/scala/org/apache/spark/status/LiveEntity.scala +++ b/core/src/main/scala/org/apache/spark/status/LiveEntity.scala @@ -24,8 +24,6 @@ import scala.collection.JavaConverters._ import scala.collection.immutable.{HashSet, TreeSet} import scala.collection.mutable.HashMap -import com.google.common.collect.Interners - import org.apache.spark.JobExecutionStatus import org.apache.spark.executor.{ExecutorMetrics, TaskMetrics} import org.apache.spark.resource.{ExecutorResourceRequest, ResourceInformation, ResourceProfile, TaskResourceRequest} @@ -34,6 +32,7 @@ import org.apache.spark.status.api.v1 import org.apache.spark.storage.{RDDInfo, StorageLevel}
import org.apache.spark.ui.SparkUI import org.apache.spark.util.{AccumulatorContext, Utils} +import org.apache.spark.util.Utils.weakIntern import org.apache.spark.util.collection.OpenHashSet /** @@ -511,8 +510,6 @@ private class LiveStage(var info: StageInfo) extends LiveEntity { */ private class LiveRDDPartition(val blockName: String, rddLevel: StorageLevel) { - import LiveEntityHelpers._ - // Pointers used by RDDPartitionSeq. @volatile var prev: LiveRDDPartition = null @volatile var next: LiveRDDPartition = null @@ -543,8 +540,6 @@ private class LiveRDDPartition(val blockName: String, rddLevel: StorageLevel) { private class LiveRDDDistribution(exec: LiveExecutor) { - import LiveEntityHelpers._ - val executorId = exec.executorId
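The interning strategy this commit describes can be illustrated with a small self-contained sketch. The names `InternSketch`, `intern`, and `internSome` are invented for illustration; Spark reuses Guava's `Interners.newWeakInterner`, which is approximated here with a plain `ConcurrentHashMap` (a real weak interner also lets unreferenced entries be garbage-collected):

```scala
import java.util.concurrent.ConcurrentHashMap

object InternSketch {
  // Stand-in for Guava's Interners.newWeakInterner[String]():
  // equal strings collapse to one canonical instance.
  private val pool = new ConcurrentHashMap[String, String]()

  def intern(s: String): String = {
    val prev = pool.putIfAbsent(s, s)
    if (prev == null) s else prev // first writer wins; later callers get the canonical copy
  }

  // Pre-constructed wrappers for the most common accumulator values,
  // so repeated Some(0L)/Some(-1L) allocations collapse to two shared objects.
  private val someZero = Some(0L)
  private val someMinusOne = Some(-1L)

  def internSome(v: Long): Option[Long] = v match {
    case 0L  => someZero
    case -1L => someMinusOne
    case _   => Some(v)
  }
}
```

With this shape, millions of logically identical `Some("sql")`-style values reduce to a handful of shared instances, which is the effect the heap-dump analysis above attributes the ~4.5GB saving to.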
[spark] branch master updated (5c96d64 -> b08cf6e)
This is an automated email from the ASF dual-hosted git repository. yamamuro pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from 5c96d64 [SPARK-35707][ML] optimize sparse GEMM by skipping bound checking add b08cf6e [SPARK-35203][SQL] Improve Repartition statistics estimation No new revisions were added by this update. Summary of changes: .../logical/statsEstimation/BasicStatsPlanVisitor.scala | 4 ++-- .../SizeInBytesOnlyStatsPlanVisitor.scala | 4 ++-- .../statsEstimation/BasicStatsEstimationSuite.scala | 17 - 3 files changed, 16 insertions(+), 9 deletions(-)
[spark] branch master updated: [SPARK-35707][ML] optimize sparse GEMM by skipping bound checking
This is an automated email from the ASF dual-hosted git repository. ruifengz pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 5c96d64 [SPARK-35707][ML] optimize sparse GEMM by skipping bound checking 5c96d64 is described below commit 5c96d643eeb4ca1ad7e4e9cc711971203fcacc6c Author: Ruifeng Zheng AuthorDate: Wed Jun 16 08:57:27 2021 +0800 [SPARK-35707][ML] optimize sparse GEMM by skipping bound checking

### What changes were proposed in this pull request?
Sparse gemm uses the method `DenseMatrix.apply` to access values; this can be optimized by skipping the bound checks and the `isTransposed` branch:
```
override def apply(i: Int, j: Int): Double = values(index(i, j))

private[ml] def index(i: Int, j: Int): Int = {
  require(i >= 0 && i < numRows, s"Expected 0 <= i < $numRows, got i = $i.")
  require(j >= 0 && j < numCols, s"Expected 0 <= j < $numCols, got j = $j.")
  if (!isTransposed) i + numRows * j else j + numCols * i
}
```
### Why are the changes needed?
To improve performance; about 15% faster in the designed case.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Existing test suite and an additional performance test. Closes #32857 from zhengruifeng/gemm_opt_index.
Authored-by: Ruifeng Zheng Signed-off-by: Ruifeng Zheng --- mllib-local/src/main/scala/org/apache/spark/ml/linalg/BLAS.scala | 4 ++-- mllib/src/main/scala/org/apache/spark/mllib/linalg/BLAS.scala| 4 ++-- 2 files changed, 4 insertions(+), 4 deletions(-) diff --git a/mllib-local/src/main/scala/org/apache/spark/ml/linalg/BLAS.scala b/mllib-local/src/main/scala/org/apache/spark/ml/linalg/BLAS.scala index 0bc8b2f..d1255de 100644 --- a/mllib-local/src/main/scala/org/apache/spark/ml/linalg/BLAS.scala +++ b/mllib-local/src/main/scala/org/apache/spark/ml/linalg/BLAS.scala @@ -480,7 +480,7 @@ private[spark] object BLAS extends Serializable { val indEnd = AcolPtrs(rowCounterForA + 1) var sum = 0.0 while (i < indEnd) { - sum += Avals(i) * B(ArowIndices(i), colCounterForB) + sum += Avals(i) * Bvals(colCounterForB + nB * ArowIndices(i)) i += 1 } val Cindex = Cstart + rowCounterForA @@ -522,7 +522,7 @@ private[spark] object BLAS extends Serializable { while (colCounterForA < kA) { var i = AcolPtrs(colCounterForA) val indEnd = AcolPtrs(colCounterForA + 1) -val Bval = B(colCounterForA, colCounterForB) * alpha +val Bval = Bvals(colCounterForB + nB * colCounterForA) * alpha while (i < indEnd) { Cvals(Cstart + ArowIndices(i)) += Avals(i) * Bval i += 1 diff --git a/mllib/src/main/scala/org/apache/spark/mllib/linalg/BLAS.scala b/mllib/src/main/scala/org/apache/spark/mllib/linalg/BLAS.scala index e38cfe4..5cbec53 100644 --- a/mllib/src/main/scala/org/apache/spark/mllib/linalg/BLAS.scala +++ b/mllib/src/main/scala/org/apache/spark/mllib/linalg/BLAS.scala @@ -462,7 +462,7 @@ private[spark] object BLAS extends Serializable with Logging { val indEnd = AcolPtrs(rowCounterForA + 1) var sum = 0.0 while (i < indEnd) { - sum += Avals(i) * B(ArowIndices(i), colCounterForB) + sum += Avals(i) * Bvals(colCounterForB + nB * ArowIndices(i)) i += 1 } val Cindex = Cstart + rowCounterForA @@ -504,7 +504,7 @@ private[spark] object BLAS extends Serializable with Logging { while (colCounterForA < kA) { var i 
= AcolPtrs(colCounterForA) val indEnd = AcolPtrs(colCounterForA + 1) -val Bval = B(colCounterForA, colCounterForB) * alpha +val Bval = Bvals(colCounterForB + nB * colCounterForA) * alpha while (i < indEnd) { Cvals(Cstart + ArowIndices(i)) += Avals(i) * Bval i += 1
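The optimization in this commit boils down to replacing a checked accessor with a direct index into the backing array inside hot loops. A minimal sketch of the idea — `DenseMat` and `unsafeApply` are illustrative names, not the actual Spark `DenseMatrix`, and the layout assumed here is column-major and non-transposed:

```scala
// A column-major dense matrix with a checked accessor (like DenseMatrix.apply)
// and an unchecked one that skips require() and any isTransposed branch.
class DenseMat(val numRows: Int, val numCols: Int, val values: Array[Double]) {
  def apply(i: Int, j: Int): Double = {
    require(i >= 0 && i < numRows, s"Expected 0 <= i < $numRows, got i = $i.")
    require(j >= 0 && j < numCols, s"Expected 0 <= j < $numCols, got j = $j.")
    values(i + numRows * j) // column-major offset
  }

  // Unchecked access: the caller guarantees indices are in range,
  // which is exactly the situation inside a gemm inner loop.
  @inline def unsafeApply(i: Int, j: Int): Double = values(i + numRows * j)
}
```

Inside a tight gemm loop the `require` calls and the transpose branch are pure overhead once the loop bounds already guarantee validity, which is why the diff replaces `B(row, col)` with a direct `Bvals(...)` lookup.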
[spark] branch master updated: [SPARK-35666][ML] gemv skip array shape checking
This is an automated email from the ASF dual-hosted git repository. ruifengz pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 2802ac3 [SPARK-35666][ML] gemv skip array shape checking 2802ac3 is described below commit 2802ac321f7378c8a9113338c9872b8fd332de6b Author: Ruifeng Zheng AuthorDate: Wed Jun 16 08:54:34 2021 +0800 [SPARK-35666][ML] gemv skip array shape checking

### What changes were proposed in this pull request?
In the existing implementations, it is common for the vector/matrix to be sliced or copied just to make shapes match, which makes the logic complex and introduces the extra cost of slicing and copying.

### Why are the changes needed?
1. Avoid slicing and copying due to shape checking;
2. Simplify the usages.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Existing test suites. Closes #32805 from zhengruifeng/new_blas_func_for_agg.
Authored-by: Ruifeng Zheng Signed-off-by: Ruifeng Zheng --- .../scala/org/apache/spark/ml/linalg/BLAS.scala| 60 -- .../ml/optim/aggregator/AFTBlockAggregator.scala | 24 +++-- .../aggregator/BinaryLogisticBlockAggregator.scala | 34 ++-- .../ml/optim/aggregator/HingeBlockAggregator.scala | 34 ++-- .../ml/optim/aggregator/HuberBlockAggregator.scala | 24 +++-- .../aggregator/LeastSquaresBlockAggregator.scala | 18 +++ 6 files changed, 84 insertions(+), 110 deletions(-) diff --git a/mllib-local/src/main/scala/org/apache/spark/ml/linalg/BLAS.scala b/mllib-local/src/main/scala/org/apache/spark/ml/linalg/BLAS.scala index 5a6bee3..0bc8b2f 100644 --- a/mllib-local/src/main/scala/org/apache/spark/ml/linalg/BLAS.scala +++ b/mllib-local/src/main/scala/org/apache/spark/ml/linalg/BLAS.scala @@ -536,6 +536,32 @@ private[spark] object BLAS extends Serializable { } /** + * y[0: A.numRows] := alpha * A * x[0: A.numCols] + beta * y[0: A.numRows] + */ + def gemv( + alpha: Double, + A: Matrix, + x: Array[Double], + beta: Double, + y: Array[Double]): Unit = { +require(A.numCols <= x.length, + s"The columns of A don't match the number of elements of x. A: ${A.numCols}, x: ${x.length}") +require(A.numRows <= y.length, + s"The rows of A don't match the number of elements of y. A: ${A.numRows}, y:${y.length}") +if (alpha == 0.0 && beta == 1.0) { + // gemv: alpha is equal to 0 and beta is equal to 1. Returning y. + return +} else if (alpha == 0.0) { + getBLAS(A.numRows).dscal(A.numRows, beta, y, 1) +} else { + A match { +case smA: SparseMatrix => gemvImpl(alpha, smA, x, beta, y) +case dmA: DenseMatrix => gemvImpl(alpha, dmA, x, beta, y) + } +} + } + + /** * y := alpha * A * x + beta * y * @param alpha a scalar to scale the multiplication A * x. * @param A the matrix A that will be left multiplied to x. Size of m x n. 
@@ -585,11 +611,24 @@ private[spark] object BLAS extends Serializable { x: DenseVector, beta: Double, y: DenseVector): Unit = { +gemvImpl(alpha, A, x.values, beta, y.values) + } + + /** + * y[0: A.numRows] := alpha * A * x[0: A.numCols] + beta * y[0: A.numRows] + * For `DenseMatrix` A. + */ + private def gemvImpl( + alpha: Double, + A: DenseMatrix, + xValues: Array[Double], + beta: Double, + yValues: Array[Double]): Unit = { val tStrA = if (A.isTransposed) "T" else "N" val mA = if (!A.isTransposed) A.numRows else A.numCols val nA = if (!A.isTransposed) A.numCols else A.numRows -nativeBLAS.dgemv(tStrA, mA, nA, alpha, A.values, mA, x.values, 1, beta, - y.values, 1) +nativeBLAS.dgemv(tStrA, mA, nA, alpha, A.values, mA, xValues, 1, beta, + yValues, 1) } /** @@ -715,8 +754,19 @@ private[spark] object BLAS extends Serializable { x: DenseVector, beta: Double, y: DenseVector): Unit = { -val xValues = x.values -val yValues = y.values +gemvImpl(alpha, A, x.values, beta, y.values) + } + + /** + * y[0: A.numRows] := alpha * A * x[0: A.numCols] + beta * y[0: A.numRows] + * For `SparseMatrix` A. + */ + private def gemvImpl( + alpha: Double, + A: SparseMatrix, + xValues: Array[Double], + beta: Double, + yValues: Array[Double]): Unit = { val mA: Int = A.numRows val nA: Int = A.numCols @@ -738,7 +788,7 @@ private[spark] object BLAS extends Serializable { rowCounter += 1 } } else { - if (beta != 1.0) scal(beta, y) + if (beta != 1.0) getBLAS(mA).dscal(mA, beta, yValues, 1) // Perform matrix-vector multiplication and add to y var colCounterForA = 0 while (colCounterForA < nA) { diff --git a/mllib/src/main/scala/org/apache/spark/ml/optim/aggr
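The new array-based `gemv` overload shown in the diff only requires `A.numCols <= x.length` and `A.numRows <= y.length`, so callers can pass larger scratch buffers without slicing them first. A simplified dense, column-major sketch of the same contract — illustrative only, not the Spark implementation (which delegates to native BLAS `dgemv`):

```scala
// y[0 until numRows] := alpha * A * x[0 until numCols] + beta * y[0 until numRows]
// x and y may be LONGER than the matrix shape; trailing elements are ignored,
// which is what lets aggregator code reuse oversized buffers without copying.
def gemv(alpha: Double, numRows: Int, numCols: Int, a: Array[Double],
         x: Array[Double], beta: Double, y: Array[Double]): Unit = {
  require(numCols <= x.length, s"A has $numCols cols but x has only ${x.length} elements")
  require(numRows <= y.length, s"A has $numRows rows but y has only ${y.length} elements")
  var i = 0
  while (i < numRows) { y(i) *= beta; i += 1 } // scale only the prefix of y
  var j = 0
  while (j < numCols) {
    val xj = alpha * x(j)
    i = 0
    while (i < numRows) {
      y(i) += a(i + numRows * j) * xj // column-major A
      i += 1
    }
    j += 1
  }
}
```

The key design point is the relaxed `<=` shape check: the old API demanded exact-length vectors, forcing `slice`/`copy` at every call site that held a larger working array.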
[spark] branch master updated (ac228d4 -> 11e96dc)
This is an automated email from the ASF dual-hosted git repository. yamamuro pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from ac228d4 [SPARK-35691][CORE] addFile/addJar/addDirectory should put CanonicalFile add 11e96dc [SPARK-35669][SQL] Quote the pushed column name only when nested column predicate pushdown is enabled No new revisions were added by this update. Summary of changes: .../org/apache/spark/sql/sources/filters.scala | 5 ++-- .../execution/datasources/DataSourceStrategy.scala | 31 +- .../spark/sql/FileBasedDataSourceSuite.scala | 10 +++ 3 files changed, 31 insertions(+), 15 deletions(-)
[spark] branch master updated (9709ee5 -> ac228d4)
This is an automated email from the ASF dual-hosted git repository. sarutak pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from 9709ee5 [SPARK-35760][SQL] Fix the max rows check for broadcast exchange add ac228d4 [SPARK-35691][CORE] addFile/addJar/addDirectory should put CanonicalFile No new revisions were added by this update. Summary of changes: .../main/scala/org/apache/spark/rpc/RpcEnv.scala | 3 +- .../spark/rpc/netty/NettyStreamManager.scala | 12 .../main/scala/org/apache/spark/util/Utils.scala | 2 +- .../scala/org/apache/spark/SparkContextSuite.scala | 32 ++ .../scala/org/apache/spark/rpc/RpcEnvSuite.scala | 9 ++ 5 files changed, 51 insertions(+), 7 deletions(-)
[spark] branch master updated (864ff67 -> 9709ee5)
This is an automated email from the ASF dual-hosted git repository. yamamuro pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from 864ff67 [SPARK-35429][CORE] Remove commons-httpclient from Hadoop-3.2 profile due to EOL and CVEs add 9709ee5 [SPARK-35760][SQL] Fix the max rows check for broadcast exchange No new revisions were added by this update. Summary of changes: .../execution/exchange/BroadcastExchangeExec.scala | 25 +++--- 1 file changed, 17 insertions(+), 8 deletions(-)
[spark] branch master updated: [SPARK-35429][CORE] Remove commons-httpclient from Hadoop-3.2 profile due to EOL and CVEs
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 864ff67 [SPARK-35429][CORE] Remove commons-httpclient from Hadoop-3.2 profile due to EOL and CVEs 864ff67 is described below commit 864ff677469172ca24fdef69b7d3a3482c688f47 Author: Sumeet Gajjar AuthorDate: Tue Jun 15 14:43:30 2021 -0700 [SPARK-35429][CORE] Remove commons-httpclient from Hadoop-3.2 profile due to EOL and CVEs

### What changes were proposed in this pull request?
Remove commons-httpclient as a direct dependency for the Hadoop-3.2 profile. The Hadoop-2.7 profile distribution still has it: hadoop-client has a compile dependency on commons-httpclient, so we cannot remove it for the Hadoop-2.7 profile.
```
[INFO] +- org.apache.hadoop:hadoop-client:jar:2.7.4:compile
[INFO] | +- org.apache.hadoop:hadoop-common:jar:2.7.4:compile
[INFO] | | +- commons-cli:commons-cli:jar:1.2:compile
[INFO] | | +- xmlenc:xmlenc:jar:0.52:compile
[INFO] | | +- commons-httpclient:commons-httpclient:jar:3.1:compile
```
### Why are the changes needed?
Spark is pulling in commons-httpclient as a direct dependency. commons-httpclient went EOL years ago and most likely has CVEs that are not being reported against it, so we should remove it.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
- Existing unit tests
- Checked the dependency tree before and after introducing the changes

Before:
```
./build/mvn dependency:tree -Phadoop-3.2 | grep -i "commons-httpclient"
Using `mvn` from path: /usr/bin/mvn
[INFO] +- commons-httpclient:commons-httpclient:jar:3.1:compile
[INFO] | +- commons-httpclient:commons-httpclient:jar:3.1:provided
```
After:
```
./build/mvn dependency:tree | grep -i "commons-httpclient"
Using `mvn` from path: /Users/sumeet.gajjar/cloudera/upstream-spark/build/apache-maven-3.6.3/bin/mvn
```
P.S.
Reopening this since [spark upgraded](https://github.com/apache/spark/commit/463daabd5afd9abfb8027ebcb2e608f169ad1e40) its `hive.version` to `2.3.9` which does not have a dependency on `commons-httpclient`. Closes #32912 from sumeetgajjar/SPARK-35429. Authored-by: Sumeet Gajjar Signed-off-by: Dongjoon Hyun --- dev/deps/spark-deps-hadoop-3.2-hive-2.3 | 1 - pom.xml | 11 --- sql/hive/pom.xml| 4 3 files changed, 16 deletions(-) diff --git a/dev/deps/spark-deps-hadoop-3.2-hive-2.3 b/dev/deps/spark-deps-hadoop-3.2-hive-2.3 index 8b79d7e5..3482dd2 100644 --- a/dev/deps/spark-deps-hadoop-3.2-hive-2.3 +++ b/dev/deps/spark-deps-hadoop-3.2-hive-2.3 @@ -35,7 +35,6 @@ commons-compiler/3.1.4//commons-compiler-3.1.4.jar commons-compress/1.20//commons-compress-1.20.jar commons-crypto/1.1.0//commons-crypto-1.1.0.jar commons-dbcp/1.4//commons-dbcp-1.4.jar -commons-httpclient/3.1//commons-httpclient-3.1.jar commons-io/2.8.0//commons-io-2.8.0.jar commons-lang/2.6//commons-lang-2.6.jar commons-lang3/3.12.0//commons-lang3-3.12.0.jar diff --git a/pom.xml b/pom.xml index 82a047f..ca038b2 100644 --- a/pom.xml +++ b/pom.xml @@ -157,8 +157,6 @@ 4.5.13 4.4.14 - -3.1 3.4.1 3.2.2 @@ -593,11 +591,6 @@ ${jsr305.version} -commons-httpclient -commons-httpclient -${httpclient.classic.version} - - org.apache.httpcomponents httpclient ${commons.httpclient.version} @@ -1811,10 +1804,6 @@ commons-codec -commons-httpclient -commons-httpclient - - org.apache.avro avro-mapred diff --git a/sql/hive/pom.xml b/sql/hive/pom.xml index 729d3f4..67a9854 100644 --- a/sql/hive/pom.xml +++ b/sql/hive/pom.xml @@ -134,10 +134,6 @@ avro-mapred - commons-httpclient - commons-httpclient - - org.apache.httpcomponents httpclient
[spark] branch master updated: [SPARK-35680][SQL] Add fields to `YearMonthIntervalType`
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 61ce8f7 [SPARK-35680][SQL] Add fields to `YearMonthIntervalType` 61ce8f7 is described below commit 61ce8f764982306f2c7a8b2b3dfe22963b49f2d5 Author: Max Gekk AuthorDate: Tue Jun 15 23:08:12 2021 +0300 [SPARK-35680][SQL] Add fields to `YearMonthIntervalType`

### What changes were proposed in this pull request?
Extend `YearMonthIntervalType` to support interval fields. Valid interval field values: - 0 (YEAR) - 1 (MONTH)

After the changes, the following year-month interval types are supported:
1. `YearMonthIntervalType(0, 0)` or `YearMonthIntervalType(YEAR, YEAR)`
2. `YearMonthIntervalType(0, 1)` or `YearMonthIntervalType(YEAR, MONTH)`. **It is the default one**.
3. `YearMonthIntervalType(1, 1)` or `YearMonthIntervalType(MONTH, MONTH)`

Closes #32825

### Why are the changes needed?
In the current implementation, Spark supports only `interval year to month`, but the SQL standard allows specifying the start and end fields. The changes allow Spark to follow the ANSI SQL standard more precisely.

### Does this PR introduce _any_ user-facing change?
Yes, but `YearMonthIntervalType` has not been released yet.

### How was this patch tested?
By existing test suites. Closes #32909 from MaxGekk/add-fields-to-YearMonthIntervalType.
Authored-by: Max Gekk Signed-off-by: Max Gekk --- .../spark/sql/catalyst/expressions/UnsafeRow.java | 11 ++--- .../java/org/apache/spark/sql/types/DataTypes.java | 19 +--- .../sql/catalyst/CatalystTypeConverters.scala | 3 +- .../apache/spark/sql/catalyst/InternalRow.scala| 4 +- .../spark/sql/catalyst/JavaTypeInference.scala | 2 +- .../spark/sql/catalyst/ScalaReflection.scala | 10 ++--- .../spark/sql/catalyst/SerializerBuildHelper.scala | 2 +- .../spark/sql/catalyst/analysis/Analyzer.scala | 18 .../apache/spark/sql/catalyst/dsl/package.scala| 7 ++- .../spark/sql/catalyst/encoders/RowEncoder.scala | 6 +-- .../spark/sql/catalyst/expressions/Cast.scala | 35 +-- .../expressions/InterpretedUnsafeProjection.scala | 2 +- .../catalyst/expressions/SpecificInternalRow.scala | 2 +- .../catalyst/expressions/aggregate/Average.scala | 6 +-- .../sql/catalyst/expressions/aggregate/Sum.scala | 2 +- .../sql/catalyst/expressions/arithmetic.scala | 10 ++--- .../expressions/codegen/CodeGenerator.scala| 4 +- .../expressions/collectionOperations.scala | 8 ++-- .../catalyst/expressions/datetimeExpressions.scala | 2 +- .../spark/sql/catalyst/expressions/hash.scala | 2 +- .../catalyst/expressions/intervalExpressions.scala | 10 ++--- .../spark/sql/catalyst/expressions/literals.scala | 16 --- .../catalyst/expressions/windowExpressions.scala | 4 +- .../spark/sql/catalyst/parser/AstBuilder.scala | 6 ++- .../spark/sql/catalyst/util/IntervalUtils.scala| 17 ++- .../apache/spark/sql/catalyst/util/TypeUtils.scala | 4 +- .../spark/sql/errors/QueryCompilationErrors.scala | 11 + .../org/apache/spark/sql/types/DataType.scala | 4 +- .../spark/sql/types/YearMonthIntervalType.scala| 52 ++ .../org/apache/spark/sql/util/ArrowUtils.scala | 4 +- .../org/apache/spark/sql/RandomDataGenerator.scala | 2 +- .../spark/sql/RandomDataGeneratorSuite.scala | 4 +- .../sql/catalyst/CatalystTypeConvertersSuite.scala | 2 +- .../sql/catalyst/encoders/RowEncoderSuite.scala| 18 .../expressions/ArithmeticExpressionSuite.scala| 
10 ++--- .../spark/sql/catalyst/expressions/CastSuite.scala | 30 ++--- .../sql/catalyst/expressions/CastSuiteBase.scala | 8 ++-- .../expressions/DateExpressionsSuite.scala | 19 .../expressions/HashExpressionsSuite.scala | 2 +- .../expressions/IntervalExpressionsSuite.scala | 24 ++ .../expressions/LiteralExpressionSuite.scala | 6 +-- .../catalyst/expressions/LiteralGenerator.scala| 4 +- .../expressions/MutableProjectionSuite.scala | 14 +++--- .../optimizer/PushFoldableIntoBranchesSuite.scala | 16 +++ .../sql/catalyst/parser/DataTypeParserSuite.scala | 2 +- .../sql/catalyst/util/IntervalUtilsSuite.scala | 5 ++- .../org/apache/spark/sql/types/DataTypeSuite.scala | 6 +-- .../apache/spark/sql/types/DataTypeTestUtils.scala | 18 .../apache/spark/sql/util/ArrowUtilsSuite.scala| 2 +- .../apache/spark/sql/execution/HiveResult.scala| 4 +- .../sql/execution/aggregate/HashMapGenerator.scala | 3 +- .../spark/sql/execution/aggregate/udaf.scala | 4 +- .../spark/sql/execution/arrow/ArrowWriter.scala| 2 +- .../sql/execution/columnar/Column
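The field semantics described above (field 0 = YEAR, field 1 = MONTH, with a start and an end field where start <= end) can be sketched as follows. This is a toy model for illustration only, not the actual Spark class; the `YM` object and `typeName` rendering are assumptions based on the PR description:

```scala
// Encodes the two valid year-month interval fields.
object YM {
  val YEAR: Byte = 0
  val MONTH: Byte = 1
}

// An interval type parameterized by its start and end fields.
case class YearMonthIntervalType(startField: Byte, endField: Byte) {
  require(YM.YEAR <= startField && startField <= endField && endField <= YM.MONTH,
    s"Invalid year-month interval fields: $startField to $endField")

  def typeName: String = {
    def name(f: Byte): String = if (f == YM.YEAR) "year" else "month"
    if (startField == endField) s"interval ${name(startField)}"
    else s"interval ${name(startField)} to ${name(endField)}"
  }
}

// Per the description, YEAR to MONTH is the default type.
val default = YearMonthIntervalType(YM.YEAR, YM.MONTH)
```

The start/end encoding is what lets the parser accept `INTERVAL YEAR`, `INTERVAL MONTH`, and `INTERVAL YEAR TO MONTH` as three distinct but related types, as the follow-up SPARK-35774 commit at the top of this digest does.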
[spark] branch branch-3.0 updated (f1711af -> 0ef0f4f)
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a change to branch branch-3.0 in repository https://gitbox.apache.org/repos/asf/spark.git. from f1711af Preparing development version 3.0.4-SNAPSHOT add 0ef0f4f [SPARK-35767][SQL] Avoid executing child plan twice in CoalesceExec No new revisions were added by this update. Summary of changes: .../org/apache/spark/sql/execution/basicPhysicalOperators.scala | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-)
[spark] branch branch-3.1 updated: [SPARK-35767][SQL] Avoid executing child plan twice in CoalesceExec
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch branch-3.1 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.1 by this push: new d8ea6bc [SPARK-35767][SQL] Avoid executing child plan twice in CoalesceExec d8ea6bc is described below commit d8ea6bcfad8d82f6886c7f538481ef2338fc04be Author: Andy Grove AuthorDate: Tue Jun 15 11:59:21 2021 -0700 [SPARK-35767][SQL] Avoid executing child plan twice in CoalesceExec ### What changes were proposed in this pull request? `CoalesceExec` needlessly calls `child.execute` twice when it could just call it once and re-use the results. This only happens when `numPartitions == 1`. ### Why are the changes needed? It is more efficient to execute the child plan once rather than twice. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? There are no functional changes. This is just a performance optimization, so the existing tests should be sufficient to catch any regressions. Closes #32920 from andygrove/coalesce-exec-executes-twice. 
Authored-by: Andy Grove Signed-off-by: Dongjoon Hyun (cherry picked from commit 1012967ace4c7bd4e5a6f59c6ea6eec45871f292) Signed-off-by: Dongjoon Hyun --- .../org/apache/spark/sql/execution/basicPhysicalOperators.scala | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/basicPhysicalOperators.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/basicPhysicalOperators.scala index d651132..4fcd67b 100644 --- a/sql/core/src/main/scala/org/apache/spark/sql/execution/basicPhysicalOperators.scala +++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/basicPhysicalOperators.scala @@ -690,12 +690,13 @@ case class CoalesceExec(numPartitions: Int, child: SparkPlan) extends UnaryExecN } protected override def doExecute(): RDD[InternalRow] = { -if (numPartitions == 1 && child.execute().getNumPartitions < 1) { +val rdd = child.execute() +if (numPartitions == 1 && rdd.getNumPartitions < 1) { // Make sure we don't output an RDD with 0 partitions, when claiming that we have a // `SinglePartition`. new CoalesceExec.EmptyRDDWithPartitions(sparkContext, numPartitions) } else { - child.execute().coalesce(numPartitions, shuffle = false) + rdd.coalesce(numPartitions, shuffle = false) } } }
[spark] branch master updated: [SPARK-35767][SQL] Avoid executing child plan twice in CoalesceExec
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 1012967 [SPARK-35767][SQL] Avoid executing child plan twice in CoalesceExec 1012967 is described below commit 1012967ace4c7bd4e5a6f59c6ea6eec45871f292 Author: Andy Grove AuthorDate: Tue Jun 15 11:59:21 2021 -0700 [SPARK-35767][SQL] Avoid executing child plan twice in CoalesceExec ### What changes were proposed in this pull request? `CoalesceExec` needlessly calls `child.execute` twice when it could just call it once and re-use the results. This only happens when `numPartitions == 1`. ### Why are the changes needed? It is more efficient to execute the child plan once rather than twice. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? There are no functional changes. This is just a performance optimization, so the existing tests should be sufficient to catch any regressions. Closes #32920 from andygrove/coalesce-exec-executes-twice. 
Authored-by: Andy Grove Signed-off-by: Dongjoon Hyun --- .../org/apache/spark/sql/execution/basicPhysicalOperators.scala | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/basicPhysicalOperators.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/basicPhysicalOperators.scala index b537040..8c51cde 100644 --- a/sql/core/src/main/scala/org/apache/spark/sql/execution/basicPhysicalOperators.scala +++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/basicPhysicalOperators.scala @@ -724,12 +724,13 @@ case class CoalesceExec(numPartitions: Int, child: SparkPlan) extends UnaryExecN } protected override def doExecute(): RDD[InternalRow] = { -if (numPartitions == 1 && child.execute().getNumPartitions < 1) { +val rdd = child.execute() +if (numPartitions == 1 && rdd.getNumPartitions < 1) { // Make sure we don't output an RDD with 0 partitions, when claiming that we have a // `SinglePartition`. new CoalesceExec.EmptyRDDWithPartitions(sparkContext, numPartitions) } else { - child.execute().coalesce(numPartitions, shuffle = false) + rdd.coalesce(numPartitions, shuffle = false) } }
[spark] branch master updated (c382d40 -> 8a02f3a)
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from c382d40 [SPARK-35766][SQL][TESTS] Break down CastSuite/AnsiCastSuite into multiple files add 8a02f3a [SPARK-35129][SQL] Construct year-month interval column from integral fields No new revisions were added by this update. Summary of changes: .../sql/catalyst/analysis/FunctionRegistry.scala | 1 + .../catalyst/expressions/intervalExpressions.scala | 53 +++ .../expressions/IntervalExpressionsSuite.scala | 28 ++ .../sql-functions/sql-expression-schema.md | 3 +- .../test/resources/sql-tests/inputs/interval.sql | 9 .../sql-tests/results/ansi/interval.sql.out| 60 +- .../resources/sql-tests/results/interval.sql.out | 60 +- 7 files changed, 211 insertions(+), 3 deletions(-)
[spark] branch master updated (b74260f -> c382d40)
This is an automated email from the ASF dual-hosted git repository. gengliang pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git.

 from b74260f [SPARK-35765][SQL] Distinct aggs are not duplicate sensitive
  add c382d40 [SPARK-35766][SQL][TESTS] Break down CastSuite/AnsiCastSuite into multiple files

No new revisions were added by this update.

Summary of changes:
 .../catalyst/expressions/AnsiCastSuiteBase.scala   |  481 +++
 .../spark/sql/catalyst/expressions/CastSuite.scala | 1357 +---
 .../sql/catalyst/expressions/CastSuiteBase.scala   |  930 ++
 3 files changed, 1412 insertions(+), 1356 deletions(-)
 create mode 100644 sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/AnsiCastSuiteBase.scala
 create mode 100644 sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/CastSuiteBase.scala

- To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated (d54edf0 -> b74260f)
This is an automated email from the ASF dual-hosted git repository. gurwls223 pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git.

 from d54edf0 [SPARK-35758][DOCS] Update the document about building Spark with Hadoop for Hadoop 2.x and 3.x
  add b74260f [SPARK-35765][SQL] Distinct aggs are not duplicate sensitive

No new revisions were added by this update.

Summary of changes:
 .../spark/sql/catalyst/optimizer/RemoveRedundantAggregates.scala | 6 +-
 .../sql/catalyst/optimizer/RemoveRedundantAggregatesSuite.scala  | 6 +++---
 2 files changed, 8 insertions(+), 4 deletions(-)

- To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch branch-3.1 updated: [SPARK-35758][DOCS] Update the document about building Spark with Hadoop for Hadoop 2.x and 3.x
This is an automated email from the ASF dual-hosted git repository. gurwls223 pushed a commit to branch branch-3.1 in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/branch-3.1 by this push:
     new f35df10 [SPARK-35758][DOCS] Update the document about building Spark with Hadoop for Hadoop 2.x and 3.x

f35df10 is described below

commit f35df10e2bab01c492bf627c7b12ce076a5da01e
Author: Kousuke Saruta
AuthorDate: Tue Jun 15 20:19:50 2021 +0900

[SPARK-35758][DOCS] Update the document about building Spark with Hadoop for Hadoop 2.x and 3.x

### What changes were proposed in this pull request?
This PR updates the document about building Spark with Hadoop for Hadoop 2.x and Hadoop 3.x.

### Why are the changes needed?
The document shows the following build command:
```
./build/mvn -Pyarn -Dhadoop.version=2.8.5 -DskipTests clean package
```
But this command fails because the default build settings are for Hadoop 3.x, so the command example needs to be modified.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
I confirmed that both of these commands finish successfully:
```
./build/mvn -Pyarn -Dhadoop.version=3.3.0 -DskipTests package
./build/mvn -Phadoop-2.7 -Pyarn -Dhadoop.version=2.8.5 -DskipTests package
```
I also built the document and confirmed the result.

This is before:
![hadoop-version-before](https://user-images.githubusercontent.com/4736016/122016157-bf020c80-cdfb-11eb-8e74-4840861f8541.png)

And this is after:
![hadoop-version-after](https://user-images.githubusercontent.com/4736016/122016188-c75a4780-cdfb-11eb-8427-2f0765e6ff7a.png)

Closes #32917 from sarutak/fix-build-doc-with-hadoop.
Authored-by: Kousuke Saruta
Signed-off-by: Hyukjin Kwon
(cherry picked from commit d54edf0bde33c0e93cf33cb41d6be13eb32b6848)
Signed-off-by: Hyukjin Kwon
---
 docs/building-spark.md | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/docs/building-spark.md b/docs/building-spark.md
index 5106f2a..286b48e 100644
--- a/docs/building-spark.md
+++ b/docs/building-spark.md
@@ -77,7 +77,11 @@ from `hadoop.version`.
 
 Example:
 
-    ./build/mvn -Pyarn -Dhadoop.version=2.8.5 -DskipTests clean package
+    ./build/mvn -Pyarn -Dhadoop.version=3.3.0 -DskipTests clean package
+
+If you want to build with Hadoop 2.x, enable hadoop-2.7 profile:
+
+    ./build/mvn -Phadoop-2.7 -Pyarn -Dhadoop.version=2.8.5 -DskipTests clean package
 
 ## Building With Hive and JDBC Support

- To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated (b191d72 -> d54edf0)
This is an automated email from the ASF dual-hosted git repository. gurwls223 pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git.

 from b191d72 [SPARK-35056][SQL] Group exception messages in execution/streaming
  add d54edf0 [SPARK-35758][DOCS] Update the document about building Spark with Hadoop for Hadoop 2.x and 3.x

No new revisions were added by this update.

Summary of changes:
 docs/building-spark.md | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

- To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
svn commit: r48352 - /dev/spark/v3.0.3-rc1-bin/
Author: wuyi
Date: Tue Jun 15 09:28:03 2021
New Revision: 48352

Log:
Apache Spark v3.0.3-rc1

Added:
    dev/spark/v3.0.3-rc1-bin/
    dev/spark/v3.0.3-rc1-bin/SparkR_3.0.3.tar.gz   (with props)
    dev/spark/v3.0.3-rc1-bin/SparkR_3.0.3.tar.gz.asc
    dev/spark/v3.0.3-rc1-bin/SparkR_3.0.3.tar.gz.sha512
    dev/spark/v3.0.3-rc1-bin/pyspark-3.0.3.tar.gz   (with props)
    dev/spark/v3.0.3-rc1-bin/pyspark-3.0.3.tar.gz.asc
    dev/spark/v3.0.3-rc1-bin/pyspark-3.0.3.tar.gz.sha512
    dev/spark/v3.0.3-rc1-bin/spark-3.0.3-bin-hadoop2.7.tgz   (with props)
    dev/spark/v3.0.3-rc1-bin/spark-3.0.3-bin-hadoop2.7.tgz.asc
    dev/spark/v3.0.3-rc1-bin/spark-3.0.3-bin-hadoop2.7.tgz.sha512
    dev/spark/v3.0.3-rc1-bin/spark-3.0.3-bin-hadoop3.2.tgz   (with props)
    dev/spark/v3.0.3-rc1-bin/spark-3.0.3-bin-hadoop3.2.tgz.asc
    dev/spark/v3.0.3-rc1-bin/spark-3.0.3-bin-hadoop3.2.tgz.sha512
    dev/spark/v3.0.3-rc1-bin/spark-3.0.3-bin-without-hadoop.tgz   (with props)
    dev/spark/v3.0.3-rc1-bin/spark-3.0.3-bin-without-hadoop.tgz.asc
    dev/spark/v3.0.3-rc1-bin/spark-3.0.3-bin-without-hadoop.tgz.sha512
    dev/spark/v3.0.3-rc1-bin/spark-3.0.3.tgz   (with props)
    dev/spark/v3.0.3-rc1-bin/spark-3.0.3.tgz.asc
    dev/spark/v3.0.3-rc1-bin/spark-3.0.3.tgz.sha512

Added: dev/spark/v3.0.3-rc1-bin/SparkR_3.0.3.tar.gz
==
Binary file - no diff available.
Propchange: dev/spark/v3.0.3-rc1-bin/SparkR_3.0.3.tar.gz -- svn:mime-type = application/octet-stream Added: dev/spark/v3.0.3-rc1-bin/SparkR_3.0.3.tar.gz.asc == --- dev/spark/v3.0.3-rc1-bin/SparkR_3.0.3.tar.gz.asc (added) +++ dev/spark/v3.0.3-rc1-bin/SparkR_3.0.3.tar.gz.asc Tue Jun 15 09:28:03 2021 @@ -0,0 +1,16 @@ +-BEGIN PGP SIGNATURE- + +iQIzBAABCgAdFiEEw3eq753BV3urvlGUePMbbdH/WUIFAmDIafEACgkQePMbbdH/ +WUJHjxAAxOFo3TCEXtcxo7OyYCxCmxZKh/LW/Vg4ThNKFi7aSp9wvsnZC+t5NeSB +qzDGKxGbxe9PbEKQVgYmLfBQGr0cdzHH9DILOX+7fK5mACHvTR8JyLADmn5ap+2v +l2pvWlSc3IaNlZgWzSV/QT/kIJsrEhWvEMguCnR3U5+CGtAydnwkpj+HYi2L827d +QhEh6dB3Y/7JtPyoN9JMLTwpfuv3io0bvpxP3bjfiGaUk6ssV5Q9L/q/cXv+7B6d +BNIW5qXaamjP0Afb/XVC39q79FSqjDTyGMZWtTCov2AOKjqrguqEiiyTmJQwyEQD +xQa5pXRVy85o31JcLTCW1NqmCqL3bvA+6255lPJd8+LfdkRP3GLPTrmpm6cRH0a+ +7DViiCVKBucVp0XEV3K3mFfvfTXjM5927zF2VpAi/yRX1y8ZWO9JlYoc2b0jpAqK +5d99Iny8xEb9PCafWZwwB3YenhD9zM80QltLaHcoOMarf54D/FFHHX6lNUi2ykyj +QZf9k9ta+lyItOFYkwrHV4iZNQaKv8S+y7o8XScOdugGhI1UUHF+x/vbZbWT/pZe +ZQ4k+PIcxVzQ4ZdPNAKknFCccatyBleaspV3bDLtNuUFFspaMbYCdtZcNPT9ZgAF +qbj1hKfYC37IZcRZ9W41TiiZXKLvXLXIteabAJwA+XeC+1X25EU= +=5B9V +-END PGP SIGNATURE- Added: dev/spark/v3.0.3-rc1-bin/SparkR_3.0.3.tar.gz.sha512 == --- dev/spark/v3.0.3-rc1-bin/SparkR_3.0.3.tar.gz.sha512 (added) +++ dev/spark/v3.0.3-rc1-bin/SparkR_3.0.3.tar.gz.sha512 Tue Jun 15 09:28:03 2021 @@ -0,0 +1,3 @@ +SparkR_3.0.3.tar.gz: 09AC8516 68EA8F46 6DC17B9C 7EE1F258 EB30132B 41433F8A + ECF49574 13C45884 1735039B 1E544418 303D766A BFB95749 + 0FC2EFC8 CCA2904A 8B280AB2 29E51F1C Added: dev/spark/v3.0.3-rc1-bin/pyspark-3.0.3.tar.gz == Binary file - no diff available. 
Propchange: dev/spark/v3.0.3-rc1-bin/pyspark-3.0.3.tar.gz -- svn:mime-type = application/octet-stream Added: dev/spark/v3.0.3-rc1-bin/pyspark-3.0.3.tar.gz.asc == --- dev/spark/v3.0.3-rc1-bin/pyspark-3.0.3.tar.gz.asc (added) +++ dev/spark/v3.0.3-rc1-bin/pyspark-3.0.3.tar.gz.asc Tue Jun 15 09:28:03 2021 @@ -0,0 +1,16 @@ +-BEGIN PGP SIGNATURE- + +iQIzBAABCgAdFiEEw3eq753BV3urvlGUePMbbdH/WUIFAmDIafMACgkQePMbbdH/ +WUIE+Q//WnwTGNI42D3M4I0buKwh5IuaiSL+y4UoZZihLN0yzNkLKhziO28CQuQd +Dv1jQ/4DFi/9epqT+WHNpnlueyN6SXM62qRnMic3427BCUnb5Y7NrwFqO48/hSPC +7ctPJY4QKzWeJUJryoPmFbj+gBJyKSo+uGNg/SHyXiddS+Azt6pE7izL7FqLIiow +BB3ciIr7QluqPKeuifrik6NlsBpw80MQp92dQdU1hdt+gxj2H0OhCI1vmf+tGfI+ +l3b18xHJAqaKtzo1xDFjcDawjGrRUKCBQ44F1vj8wScOVrkzJKVvEuHP/k/3dgBx +YOJRuAj13I+tAwbU8ZM6ErQtfRQdO8wemIhyrExFpoQ5HXKGAjFOweFyvS8p43zM +tbVNkBA5N0gMX+CTGofFtV/zO86n/BW1vn5DeARLvTmtkbs2aiiB5gDJmsQ/Zzsg +jxLEiYsF35+oBEbTLVGFWZM5XpOGFu65mv1VyTCHHaJj8NeKREkfLlfqe7lxrxrs +LqcrT3p2i9nB/K4oLlY2Y4u8Z5a1Is/dDnIGNZ65r6QEVInWCnLRklcWmfIkoMI4 +BOqgOtt+YF9tCv8Nmr6bt0NKcHR861oZMU6swh+miayxvIaf+vItcYWkWhS20qle +WAN1/WhIFzCRH+I7heISmdxvu9b9ewNEHv8upJjpFSvTk+Kv3TE= +=F72o +-END PGP SIGNATURE- Added: dev/spark/v3.0.3-rc1-bin/pyspark-3.0.3.tar.gz.sha512 == --- dev/spark/v3.0.3-rc1-bin/pyspark-3.0.3.tar.gz.sha512 (added) +++ dev/spark/v3.0.3-rc1-bin/pyspark-
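Each artifact staged above ships with an `.asc` PGP signature and an `.sha512` checksum companion. A hedged sketch of what checksum verification amounts to, in plain Scala with a hypothetical `ChecksumSketch` helper (the real procedure should also run `gpg --verify` on the `.asc` file against the Spark release KEYS):

```scala
import java.nio.file.{Files, Paths}
import java.security.MessageDigest

// Hypothetical helper names; nothing here is part of the Spark release tooling.
object ChecksumSketch {
  // Hex-encode the SHA-512 digest of a byte array.
  def sha512Hex(bytes: Array[Byte]): String =
    MessageDigest.getInstance("SHA-512").digest(bytes).map(b => f"${b & 0xff}%02x").mkString

  // Compare a file's digest against an expected hex string. The comparison
  // ignores case and whitespace because the staged .sha512 files above use
  // grouped, uppercase hex blocks rather than one flat lowercase string.
  def verify(path: String, expectedHex: String): Boolean = {
    val actual = sha512Hex(Files.readAllBytes(Paths.get(path)))
    actual == expectedHex.toLowerCase.replaceAll("\\s", "")
  }
}
```

The whitespace/case normalization matches the layout visible in the `SparkR_3.0.3.tar.gz.sha512` content above (`09AC8516 68EA8F46 ...`).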
[spark] branch master updated (195090a -> b191d72)
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git.

 from 195090a [SPARK-35764][SQL] Assign pretty names to TimestampWithoutTZType
  add b191d72 [SPARK-35056][SQL] Group exception messages in execution/streaming

No new revisions were added by this update.

Summary of changes:
 .../spark/sql/errors/QueryExecutionErrors.scala    | 95 +-
 .../streaming/CheckpointFileManager.scala          | 16 ++--
 .../sql/execution/streaming/FileStreamSink.scala   | 21 +
 .../sql/execution/streaming/GroupStateImpl.scala   | 15 ++--
 .../sql/execution/streaming/HDFSMetadataLog.scala  |  7 +-
 .../streaming/ManifestFileCommitProtocol.scala     |  4 +-
 .../execution/streaming/MicroBatchExecution.scala  |  4 +-
 .../execution/streaming/StreamingRelation.scala    |  3 +-
 .../execution/streaming/statefulOperators.scala    |  3 +-
 9 files changed, 121 insertions(+), 47 deletions(-)

- To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated: [SPARK-35764][SQL] Assign pretty names to TimestampWithoutTZType
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new 195090a [SPARK-35764][SQL] Assign pretty names to TimestampWithoutTZType

195090a is described below

commit 195090afcc8ed138336b353edc0a4db6f0f5f168
Author: Gengliang Wang
AuthorDate: Tue Jun 15 12:15:13 2021 +0300

[SPARK-35764][SQL] Assign pretty names to TimestampWithoutTZType

### What changes were proposed in this pull request?
In this PR, I propose to override the typeName() method in TimestampWithoutTZType and assign it a name according to the ANSI SQL standard:
![image](https://user-images.githubusercontent.com/1097932/122013859-2cf50680-cdf1-11eb-9fcd-0ec1b59fb5c0.png)

### Why are the changes needed?
To improve the Spark SQL user experience and have readable types in error messages.

### Does this PR introduce _any_ user-facing change?
No, the new timestamp type is not released yet.

### How was this patch tested?
Unit test

Closes #32915 from gengliangwang/typename.
Authored-by: Gengliang Wang
Signed-off-by: Max Gekk
---
 .../org/apache/spark/sql/types/TimestampWithoutTZType.scala |  2 ++
 .../apache/spark/sql/catalyst/expressions/CastSuite.scala   | 13 +
 2 files changed, 15 insertions(+)

diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/types/TimestampWithoutTZType.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/types/TimestampWithoutTZType.scala
index 558f5ee..856d549 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/types/TimestampWithoutTZType.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/types/TimestampWithoutTZType.scala
@@ -48,6 +48,8 @@ class TimestampWithoutTZType private() extends AtomicType {
    */
   override def defaultSize: Int = 8
 
+  override def typeName: String = "timestamp without time zone"
+
   private[spark] override def asNullable: TimestampWithoutTZType = this
 }

diff --git a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/CastSuite.scala b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/CastSuite.scala
index c268d52..910c757 100644
--- a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/CastSuite.scala
+++ b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/CastSuite.scala
@@ -1295,6 +1295,19 @@ abstract class AnsiCastSuiteBase extends CastSuiteBase {
       }
     }
   }
+
+  test("disallow type conversions between Numeric types and Timestamp without time zone type") {
+    import DataTypeTestUtils.numericTypes
+    checkInvalidCastFromNumericType(TimestampWithoutTZType)
+    var errorMsg = "cannot cast bigint to timestamp without time zone"
+    verifyCastFailure(cast(Literal(0L), TimestampWithoutTZType), Some(errorMsg))
+
+    val timestampWithoutTZLiteral = Literal.create(LocalDateTime.now(), TimestampWithoutTZType)
+    errorMsg = "cannot cast timestamp without time zone to"
+    numericTypes.foreach { numericType =>
+      verifyCastFailure(cast(timestampWithoutTZLiteral, numericType), Some(errorMsg))
+    }
+  }
 }

 /**

- To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
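The `typeName` override above replaces a name mechanically derived from the class name with the ANSI SQL spelling. A simplified sketch of that mechanism — `SketchDataType` is a stand-in, not Spark's actual `DataType` hierarchy, though Spark derives its default `typeName` from the class name in a similar way:

```scala
// Sketch only: the base class derives a default name mechanically from the
// class name; the timestamp type overrides it with the ANSI SQL spelling.
abstract class SketchDataType(className: String) {
  def typeName: String = className.stripSuffix("Type").toLowerCase
}

object IntegerType extends SketchDataType("IntegerType")

object TimestampWithoutTZType extends SketchDataType("TimestampWithoutTZType") {
  // Without the override, the derived name would be "timestampwithouttz".
  override def typeName: String = "timestamp without time zone"
}
```

The derived default is fine for `IntegerType` ("integer") but unreadable for the new type, which is exactly why the commit assigns the pretty name used in error messages.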
[spark] branch master updated (b9aeeb4 -> a50bd8f)
This is an automated email from the ASF dual-hosted git repository. wenchen pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git.

 from b9aeeb4 [SPARK-35683][PYTHON] Fix Index.difference to avoid collect 'other' to driver side
  add a50bd8f [SPARK-35742][SQL] Expression.semanticEquals should be symmetrical

No new revisions were added by this update.

Summary of changes:
 .../sql/catalyst/expressions/Expression.scala      |   2 +-
 .../catalyst/expressions/namedExpressions.scala    |   5 -
 .../spark/sql/catalyst/expressions/subquery.scala  |   7 -
 .../catalyst/expressions/CanonicalizeSuite.scala   |   7 +
 .../execution/SubqueryAdaptiveBroadcastExec.scala  |   6 +
 .../execution/aggregate/HashAggregateExec.scala    |   4 +-
 .../org/apache/spark/sql/execution/subquery.scala  |  10 +-
 .../approved-plans-v1_4/q23b/explain.txt           | 323 -
 .../approved-plans-v1_4/q23b/simplified.txt        |  18 +-
 .../approved-plans-v1_4/q44.sf100/explain.txt      | 231 +++
 .../approved-plans-v1_4/q44.sf100/simplified.txt   |  13 +-
 .../approved-plans-v1_4/q44/explain.txt            | 231 +++
 .../approved-plans-v1_4/q44/simplified.txt         |  13 +-
 .../approved-plans-v1_4/q58.sf100/explain.txt      | 394 ---
 .../approved-plans-v1_4/q58.sf100/simplified.txt   |  40 +-
 .../approved-plans-v1_4/q58/explain.txt            | 368 --
 .../approved-plans-v1_4/q58/simplified.txt         |  40 +-
 .../approved-plans-v2_7/q14a.sf100/explain.txt     | 770 ++---
 .../approved-plans-v2_7/q14a.sf100/simplified.txt  | 114 +--
 .../approved-plans-v2_7/q14a/explain.txt           | 770 ++---
 .../approved-plans-v2_7/q14a/simplified.txt        | 114 +--
 21 files changed, 1092 insertions(+), 2388 deletions(-)

- To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
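The invariant behind SPARK-35742 is that `a.semanticEquals(b)` must agree with `b.semanticEquals(a)`. One way to get symmetry by construction is to compare canonicalized forms of both sides rather than letting one side's subclass drive the comparison, which is the general shape of Spark's `Expression.semanticEquals`. The sketch below is heavily simplified and every class name in it is illustrative:

```scala
// Heavily simplified expression tree; Expr/Attr/Add are not Spark classes.
object SemanticEqSketch {
  sealed trait Expr { def canonicalized: Expr = this }

  // Canonicalization normalizes cosmetic differences (here: attribute-name case).
  case class Attr(name: String) extends Expr {
    override def canonicalized: Expr = Attr(name.toLowerCase)
  }
  case class Add(l: Expr, r: Expr) extends Expr {
    override def canonicalized: Expr = Add(l.canonicalized, r.canonicalized)
  }

  // Symmetric by construction: both sides go through the same normalization,
  // so semanticEquals(a, b) always agrees with semanticEquals(b, a).
  def semanticEquals(a: Expr, b: Expr): Boolean =
    a.canonicalized == b.canonicalized
}
```

Asymmetry creeps in when a subclass overrides the comparison for only one argument position; routing both operands through one canonicalization step removes that failure mode.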