[spark] branch master updated (aaa8a80 -> 4530760)

2021-06-15 Thread maxgekk
This is an automated email from the ASF dual-hosted git repository.

maxgekk pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from aaa8a80  [SPARK-35613][CORE][SQL] Cache commonly occurring strings in 
SQLMetrics, JSONProtocol and AccumulatorV2 classes
 add 4530760  [SPARK-35774][SQL] Parse any year-month interval types in SQL

No new revisions were added by this update.

Summary of changes:
 .../main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4  | 2 +-
 .../scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala  | 9 +++--
 .../test/scala/org/apache/spark/sql/types/DataTypeSuite.scala| 2 +-
 3 files changed, 9 insertions(+), 4 deletions(-)

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated: [SPARK-35613][CORE][SQL] Cache commonly occurring strings in SQLMetrics, JSONProtocol and AccumulatorV2 classes

2021-06-15 Thread mridulm80
This is an automated email from the ASF dual-hosted git repository.

mridulm80 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new aaa8a80  [SPARK-35613][CORE][SQL] Cache commonly occurring strings in 
SQLMetrics, JSONProtocol and AccumulatorV2 classes
aaa8a80 is described below

commit aaa8a80c9d3426107de5873b4391600701121385
Author: Venkata krishnan Sowrirajan 
AuthorDate: Tue Jun 15 22:02:19 2021 -0500

[SPARK-35613][CORE][SQL] Cache commonly occurring strings in SQLMetrics, 
JSONProtocol and AccumulatorV2 classes

### What changes were proposed in this pull request?
Cache commonly occurring duplicate Some objects in SQLMetrics by using a 
Guava cache, and reuse the existing Guava String Interner to avoid duplicate 
strings in JSONProtocol. In AccumulatorV2 we have also seen many Some(-1L) and 
Some(0L) occurrences in a heap dump; these are now naively interned by reusing 
a single pre-constructed Some(-1L) and Some(0L).

To give some context on the impact and the garbage that accumulated, below 
are the details of a complex Spark job that we troubleshot to find the 
bottlenecks (a sketch of the interning idea follows the list below). **tl;dr - 
the major issue was the accumulation of duplicate objects, mainly from SQLMetrics.**

More than 25% of the 40 GB driver heap was filled with (a very large number 
of) **duplicate**, immutable objects.

1. Very large number of **duplicate** immutable objects.

- The type of a metric is represented by `'scala.Some("sql")'`, which is created 
anew for each metric.
- Fixing this reduced memory usage from 4 GB to a few bytes.

2. `scala.Some(0)` and `scala.Some(-1)` are very common metric values 
(typically used to indicate the absence of a metric).

- Individually the values are all immutable, but Spark SQL was creating a 
new instance each time.
- Interning these saved ~4.5 GB on a 40 GB heap.

3. Using string interpolation for metric names.

- Interpolation results in the creation of a new string object.
- We end up with a very large number of metric names, though the number of 
unique strings is minuscule.
- ~7.5 GB in the 40 GB heap, which went down to a few KBs when fixed.
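
As a hedged illustration of the approach described above (not the actual Spark code; `MetricValueInterner` and `cachedSome` are hypothetical names), the idea boils down to reusing one instance per common value and weakly interning strings:

```scala
import com.google.common.collect.Interners

// Hypothetical sketch only; MetricValueInterner and cachedSome are
// illustrative names, not the actual Spark internals.
object MetricValueInterner {
  // Reuse a single instance for the most common accumulator values.
  private val some0: Some[Long] = Some(0L)
  private val someMinus1: Some[Long] = Some(-1L)

  def cachedSome(v: Long): Some[Long] = v match {
    case 0L  => some0
    case -1L => someMinus1
    case _   => Some(v)
  }

  // Weakly intern strings (e.g. interpolated metric names) so that
  // duplicates share a single object.
  private val interner = Interners.newWeakInterner[String]()
  def weakIntern(s: String): String = interner.intern(s)
}
```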

### Why are the changes needed?
To reduce the overall driver memory footprint, which in turn reduces full 
GC pauses.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Since these are memory-related optimizations, unit tests are not added. 
These changes were deployed on our internal platform and, together with another 
set of optimizations, allowed a complex Spark job that had been failing 
continuously to succeed.

Closes #32754 from venkata91/SPARK-35613.

Authored-by: Venkata krishnan Sowrirajan 
Signed-off-by: Mridul Muralidharan gmail.com>
---
 .../scala/org/apache/spark/status/LiveEntity.scala | 16 +---
 .../org/apache/spark/util/AccumulatorV2.scala  | 19 +-
 .../scala/org/apache/spark/util/JsonProtocol.scala |  9 ---
 .../main/scala/org/apache/spark/util/Utils.scala   |  8 ++
 .../spark/sql/execution/metric/SQLMetrics.scala| 30 +++---
 5 files changed, 53 insertions(+), 29 deletions(-)

diff --git a/core/src/main/scala/org/apache/spark/status/LiveEntity.scala 
b/core/src/main/scala/org/apache/spark/status/LiveEntity.scala
index 5af76e9..fc5fc32 100644
--- a/core/src/main/scala/org/apache/spark/status/LiveEntity.scala
+++ b/core/src/main/scala/org/apache/spark/status/LiveEntity.scala
@@ -24,8 +24,6 @@ import scala.collection.JavaConverters._
 import scala.collection.immutable.{HashSet, TreeSet}
 import scala.collection.mutable.HashMap
 
-import com.google.common.collect.Interners
-
 import org.apache.spark.JobExecutionStatus
 import org.apache.spark.executor.{ExecutorMetrics, TaskMetrics}
 import org.apache.spark.resource.{ExecutorResourceRequest, 
ResourceInformation, ResourceProfile, TaskResourceRequest}
@@ -34,6 +32,7 @@ import org.apache.spark.status.api.v1
 import org.apache.spark.storage.{RDDInfo, StorageLevel}
 import org.apache.spark.ui.SparkUI
 import org.apache.spark.util.{AccumulatorContext, Utils}
+import org.apache.spark.util.Utils.weakIntern
 import org.apache.spark.util.collection.OpenHashSet
 
 /**
@@ -511,8 +510,6 @@ private class LiveStage(var info: StageInfo) extends 
LiveEntity {
  */
 private class LiveRDDPartition(val blockName: String, rddLevel: StorageLevel) {
 
-  import LiveEntityHelpers._
-
   // Pointers used by RDDPartitionSeq.
   @volatile var prev: LiveRDDPartition = null
   @volatile var next: LiveRDDPartition = null
@@ -543,8 +540,6 @@ private class LiveRDDPartition(val blockName: String, 
rddLevel: StorageLevel) {
 
 private class LiveRDDDistribution(exec: LiveExecutor) {
 
-  import LiveEntityHelpers._
-
   val executorId = exec.executorId

[spark] branch master updated (5c96d64 -> b08cf6e)

2021-06-15 Thread yamamuro
This is an automated email from the ASF dual-hosted git repository.

yamamuro pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 5c96d64  [SPARK-35707][ML] optimize sparse GEMM by skipping bound 
checking
 add b08cf6e  [SPARK-35203][SQL] Improve Repartition statistics estimation

No new revisions were added by this update.

Summary of changes:
 .../logical/statsEstimation/BasicStatsPlanVisitor.scala |  4 ++--
 .../SizeInBytesOnlyStatsPlanVisitor.scala   |  4 ++--
 .../statsEstimation/BasicStatsEstimationSuite.scala | 17 -
 3 files changed, 16 insertions(+), 9 deletions(-)

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated: [SPARK-35707][ML] optimize sparse GEMM by skipping bound checking

2021-06-15 Thread ruifengz
This is an automated email from the ASF dual-hosted git repository.

ruifengz pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 5c96d64  [SPARK-35707][ML] optimize sparse GEMM by skipping bound 
checking
5c96d64 is described below

commit 5c96d643eeb4ca1ad7e4e9cc711971203fcacc6c
Author: Ruifeng Zheng 
AuthorDate: Wed Jun 16 08:57:27 2021 +0800

[SPARK-35707][ML] optimize sparse GEMM by skipping bound checking

### What changes were proposed in this pull request?
Sparse gemm uses the method `DenseMatrix.apply` to access values, which can 
be optimized by skipping the bound checks and the `isTransposed` branch:

```
  override def apply(i: Int, j: Int): Double = values(index(i, j))

  private[ml] def index(i: Int, j: Int): Int = {
    require(i >= 0 && i < numRows, s"Expected 0 <= i < $numRows, got i = $i.")
    require(j >= 0 && j < numCols, s"Expected 0 <= j < $numCols, got j = $j.")
if (!isTransposed) i + numRows * j else j + numCols * i
  }

```
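
For context, here is a minimal sketch of what skipping those checks amounts to, assuming a non-transposed, column-major `DenseMatrix`; `fastGet` is a hypothetical helper, not part of the patch:

```scala
// Illustrative only: index directly into the backing values array,
// skipping the require(...) bound checks and the isTransposed branch
// that DenseMatrix.apply performs on every access.
def fastGet(values: Array[Double], numRows: Int, i: Int, j: Int): Double =
  values(i + numRows * j) // column-major (non-transposed) layout assumed
```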

### Why are the changes needed?
To improve performance; about 15% faster in the designed case.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Existing test suites and an additional performance test.

Closes #32857 from zhengruifeng/gemm_opt_index.

Authored-by: Ruifeng Zheng 
Signed-off-by: Ruifeng Zheng 
---
 mllib-local/src/main/scala/org/apache/spark/ml/linalg/BLAS.scala | 4 ++--
 mllib/src/main/scala/org/apache/spark/mllib/linalg/BLAS.scala| 4 ++--
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/mllib-local/src/main/scala/org/apache/spark/ml/linalg/BLAS.scala 
b/mllib-local/src/main/scala/org/apache/spark/ml/linalg/BLAS.scala
index 0bc8b2f..d1255de 100644
--- a/mllib-local/src/main/scala/org/apache/spark/ml/linalg/BLAS.scala
+++ b/mllib-local/src/main/scala/org/apache/spark/ml/linalg/BLAS.scala
@@ -480,7 +480,7 @@ private[spark] object BLAS extends Serializable {
 val indEnd = AcolPtrs(rowCounterForA + 1)
 var sum = 0.0
 while (i < indEnd) {
-  sum += Avals(i) * B(ArowIndices(i), colCounterForB)
+  sum += Avals(i) * Bvals(colCounterForB + nB * ArowIndices(i))
   i += 1
 }
 val Cindex = Cstart + rowCounterForA
@@ -522,7 +522,7 @@ private[spark] object BLAS extends Serializable {
   while (colCounterForA < kA) {
 var i = AcolPtrs(colCounterForA)
 val indEnd = AcolPtrs(colCounterForA + 1)
-val Bval = B(colCounterForA, colCounterForB) * alpha
+val Bval = Bvals(colCounterForB + nB * colCounterForA) * alpha
 while (i < indEnd) {
   Cvals(Cstart + ArowIndices(i)) += Avals(i) * Bval
   i += 1
diff --git a/mllib/src/main/scala/org/apache/spark/mllib/linalg/BLAS.scala 
b/mllib/src/main/scala/org/apache/spark/mllib/linalg/BLAS.scala
index e38cfe4..5cbec53 100644
--- a/mllib/src/main/scala/org/apache/spark/mllib/linalg/BLAS.scala
+++ b/mllib/src/main/scala/org/apache/spark/mllib/linalg/BLAS.scala
@@ -462,7 +462,7 @@ private[spark] object BLAS extends Serializable with 
Logging {
 val indEnd = AcolPtrs(rowCounterForA + 1)
 var sum = 0.0
 while (i < indEnd) {
-  sum += Avals(i) * B(ArowIndices(i), colCounterForB)
+  sum += Avals(i) * Bvals(colCounterForB + nB * ArowIndices(i))
   i += 1
 }
 val Cindex = Cstart + rowCounterForA
@@ -504,7 +504,7 @@ private[spark] object BLAS extends Serializable with 
Logging {
   while (colCounterForA < kA) {
 var i = AcolPtrs(colCounterForA)
 val indEnd = AcolPtrs(colCounterForA + 1)
-val Bval = B(colCounterForA, colCounterForB) * alpha
+val Bval = Bvals(colCounterForB + nB * colCounterForA) * alpha
 while (i < indEnd) {
   Cvals(Cstart + ArowIndices(i)) += Avals(i) * Bval
   i += 1

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated: [SPARK-35666][ML] gemv skip array shape checking

2021-06-15 Thread ruifengz
This is an automated email from the ASF dual-hosted git repository.

ruifengz pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 2802ac3  [SPARK-35666][ML] gemv skip array shape checking
2802ac3 is described below

commit 2802ac321f7378c8a9113338c9872b8fd332de6b
Author: Ruifeng Zheng 
AuthorDate: Wed Jun 16 08:54:34 2021 +0800

[SPARK-35666][ML] gemv skip array shape checking

### What changes were proposed in this pull request?
In the existing implementations, it is common for the vector/matrix to be 
sliced or copied just to make the shapes match, which complicates the logic and 
introduces the extra cost of slicing and copying.
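
A hedged usage sketch of the array-based `gemv` overload added here (values are illustrative, and since `BLAS` is `private[spark]` this is only callable from Spark-internal code):

```scala
import org.apache.spark.ml.linalg.{BLAS, DenseMatrix}

// 2 x 3 matrix in column-major order: [[1.0, 3.0, 5.0], [2.0, 4.0, 6.0]]
val a = new DenseMatrix(2, 3, Array(1.0, 2.0, 3.0, 4.0, 5.0, 6.0))
// Pre-allocated buffers; only a prefix of each is read/updated, so no
// slicing or copying into DenseVectors is needed.
val x = Array(1.0, 1.0, 1.0, 99.0) // only the first A.numCols entries are read
val y = Array(0.0, 0.0, 99.0)      // only the first A.numRows entries are updated
BLAS.gemv(1.0, a, x, 0.0, y)       // y(0) = 9.0, y(1) = 12.0, y(2) untouched
```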

### Why are the changes needed?
1. Avoid slicing and copying due to shape checking.
2. Simplify the usages.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Existing test suites.

Closes #32805 from zhengruifeng/new_blas_func_for_agg.

Authored-by: Ruifeng Zheng 
Signed-off-by: Ruifeng Zheng 
---
 .../scala/org/apache/spark/ml/linalg/BLAS.scala| 60 --
 .../ml/optim/aggregator/AFTBlockAggregator.scala   | 24 +++--
 .../aggregator/BinaryLogisticBlockAggregator.scala | 34 ++--
 .../ml/optim/aggregator/HingeBlockAggregator.scala | 34 ++--
 .../ml/optim/aggregator/HuberBlockAggregator.scala | 24 +++--
 .../aggregator/LeastSquaresBlockAggregator.scala   | 18 +++
 6 files changed, 84 insertions(+), 110 deletions(-)

diff --git a/mllib-local/src/main/scala/org/apache/spark/ml/linalg/BLAS.scala 
b/mllib-local/src/main/scala/org/apache/spark/ml/linalg/BLAS.scala
index 5a6bee3..0bc8b2f 100644
--- a/mllib-local/src/main/scala/org/apache/spark/ml/linalg/BLAS.scala
+++ b/mllib-local/src/main/scala/org/apache/spark/ml/linalg/BLAS.scala
@@ -536,6 +536,32 @@ private[spark] object BLAS extends Serializable {
   }
 
   /**
+   * y[0: A.numRows] := alpha * A * x[0: A.numCols] + beta * y[0: A.numRows]
+   */
+  def gemv(
+  alpha: Double,
+  A: Matrix,
+  x: Array[Double],
+  beta: Double,
+  y: Array[Double]): Unit = {
+require(A.numCols <= x.length,
+  s"The columns of A don't match the number of elements of x. A: 
${A.numCols}, x: ${x.length}")
+require(A.numRows <= y.length,
+  s"The rows of A don't match the number of elements of y. A: 
${A.numRows}, y:${y.length}")
+if (alpha == 0.0 && beta == 1.0) {
+  // gemv: alpha is equal to 0 and beta is equal to 1. Returning y.
+  return
+} else if (alpha == 0.0) {
+  getBLAS(A.numRows).dscal(A.numRows, beta, y, 1)
+} else {
+  A match {
+case smA: SparseMatrix => gemvImpl(alpha, smA, x, beta, y)
+case dmA: DenseMatrix => gemvImpl(alpha, dmA, x, beta, y)
+  }
+}
+  }
+
+  /**
* y := alpha * A * x + beta * y
* @param alpha a scalar to scale the multiplication A * x.
* @param A the matrix A that will be left multiplied to x. Size of m x n.
@@ -585,11 +611,24 @@ private[spark] object BLAS extends Serializable {
   x: DenseVector,
   beta: Double,
   y: DenseVector): Unit = {
+gemvImpl(alpha, A, x.values, beta, y.values)
+  }
+
+  /**
+   * y[0: A.numRows] := alpha * A * x[0: A.numCols] + beta * y[0: A.numRows]
+   * For `DenseMatrix` A.
+   */
+  private def gemvImpl(
+  alpha: Double,
+  A: DenseMatrix,
+  xValues: Array[Double],
+  beta: Double,
+  yValues: Array[Double]): Unit = {
 val tStrA = if (A.isTransposed) "T" else "N"
 val mA = if (!A.isTransposed) A.numRows else A.numCols
 val nA = if (!A.isTransposed) A.numCols else A.numRows
-nativeBLAS.dgemv(tStrA, mA, nA, alpha, A.values, mA, x.values, 1, beta,
-  y.values, 1)
+nativeBLAS.dgemv(tStrA, mA, nA, alpha, A.values, mA, xValues, 1, beta,
+  yValues, 1)
   }
 
   /**
@@ -715,8 +754,19 @@ private[spark] object BLAS extends Serializable {
   x: DenseVector,
   beta: Double,
   y: DenseVector): Unit = {
-val xValues = x.values
-val yValues = y.values
+gemvImpl(alpha, A, x.values, beta, y.values)
+  }
+
+  /**
+   * y[0: A.numRows] := alpha * A * x[0: A.numCols] + beta * y[0: A.numRows]
+   * For `SparseMatrix` A.
+   */
+  private def gemvImpl(
+  alpha: Double,
+  A: SparseMatrix,
+  xValues: Array[Double],
+  beta: Double,
+  yValues: Array[Double]): Unit = {
 val mA: Int = A.numRows
 val nA: Int = A.numCols
 
@@ -738,7 +788,7 @@ private[spark] object BLAS extends Serializable {
 rowCounter += 1
   }
 } else {
-  if (beta != 1.0) scal(beta, y)
+  if (beta != 1.0) getBLAS(mA).dscal(mA, beta, yValues, 1)
   // Perform matrix-vector multiplication and add to y
   var colCounterForA = 0
   while (colCounterForA < nA) {
diff --git 
a/mllib/src/main/scala/org/apache/spark/ml/optim/aggr

[spark] branch master updated (ac228d4 -> 11e96dc)

2021-06-15 Thread yamamuro
This is an automated email from the ASF dual-hosted git repository.

yamamuro pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from ac228d4  [SPARK-35691][CORE] addFile/addJar/addDirectory should put 
CanonicalFile
 add 11e96dc  [SPARK-35669][SQL] Quote the pushed column name only when 
nested column predicate pushdown is enabled

No new revisions were added by this update.

Summary of changes:
 .../org/apache/spark/sql/sources/filters.scala |  5 ++--
 .../execution/datasources/DataSourceStrategy.scala | 31 +-
 .../spark/sql/FileBasedDataSourceSuite.scala   | 10 +++
 3 files changed, 31 insertions(+), 15 deletions(-)

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated (9709ee5 -> ac228d4)

2021-06-15 Thread sarutak
This is an automated email from the ASF dual-hosted git repository.

sarutak pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 9709ee5  [SPARK-35760][SQL] Fix the max rows check for broadcast 
exchange
 add ac228d4  [SPARK-35691][CORE] addFile/addJar/addDirectory should put 
CanonicalFile

No new revisions were added by this update.

Summary of changes:
 .../main/scala/org/apache/spark/rpc/RpcEnv.scala   |  3 +-
 .../spark/rpc/netty/NettyStreamManager.scala   | 12 
 .../main/scala/org/apache/spark/util/Utils.scala   |  2 +-
 .../scala/org/apache/spark/SparkContextSuite.scala | 32 ++
 .../scala/org/apache/spark/rpc/RpcEnvSuite.scala   |  9 ++
 5 files changed, 51 insertions(+), 7 deletions(-)

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated (864ff67 -> 9709ee5)

2021-06-15 Thread yamamuro
This is an automated email from the ASF dual-hosted git repository.

yamamuro pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 864ff67  [SPARK-35429][CORE] Remove commons-httpclient from Hadoop-3.2 
profile due to EOL and CVEs
 add 9709ee5  [SPARK-35760][SQL] Fix the max rows check for broadcast 
exchange

No new revisions were added by this update.

Summary of changes:
 .../execution/exchange/BroadcastExchangeExec.scala | 25 +++---
 1 file changed, 17 insertions(+), 8 deletions(-)

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated: [SPARK-35429][CORE] Remove commons-httpclient from Hadoop-3.2 profile due to EOL and CVEs

2021-06-15 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 864ff67  [SPARK-35429][CORE] Remove commons-httpclient from Hadoop-3.2 
profile due to EOL and CVEs
864ff67 is described below

commit 864ff677469172ca24fdef69b7d3a3482c688f47
Author: Sumeet Gajjar 
AuthorDate: Tue Jun 15 14:43:30 2021 -0700

[SPARK-35429][CORE] Remove commons-httpclient from Hadoop-3.2 profile due 
to EOL and CVEs

### What changes were proposed in this pull request?

Remove commons-httpclient as a direct dependency for the Hadoop-3.2 profile.
The Hadoop-2.7 profile distribution still has it: hadoop-client has a compile 
dependency on commons-httpclient, so we cannot remove it for the Hadoop-2.7 
profile.
```
[INFO] +- org.apache.hadoop:hadoop-client:jar:2.7.4:compile
[INFO] |  +- org.apache.hadoop:hadoop-common:jar:2.7.4:compile
[INFO] |  |  +- commons-cli:commons-cli:jar:1.2:compile
[INFO] |  |  +- xmlenc:xmlenc:jar:0.52:compile
[INFO] |  |  +- commons-httpclient:commons-httpclient:jar:3.1:compile
```

### Why are the changes needed?

Spark pulls in commons-httpclient as a direct dependency. commons-httpclient 
reached end of life years ago and there are most likely CVEs that are not being 
reported against it, so we should remove it.

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

- Existing unit tests
- Checked the dependency tree before and after introducing the changes

Before:
```
./build/mvn dependency:tree -Phadoop-3.2 | grep -i "commons-httpclient"
Using `mvn` from path: /usr/bin/mvn
[INFO] +- commons-httpclient:commons-httpclient:jar:3.1:compile
[INFO] |  +- commons-httpclient:commons-httpclient:jar:3.1:provided
```

After:
```
./build/mvn dependency:tree | grep -i "commons-httpclient"
Using `mvn` from path: 
/Users/sumeet.gajjar/cloudera/upstream-spark/build/apache-maven-3.6.3/bin/mvn
```

P.S. Reopening this since [spark 
upgraded](https://github.com/apache/spark/commit/463daabd5afd9abfb8027ebcb2e608f169ad1e40)
 its `hive.version` to `2.3.9` which does not have a dependency on 
`commons-httpclient`.

Closes #32912 from sumeetgajjar/SPARK-35429.

Authored-by: Sumeet Gajjar 
Signed-off-by: Dongjoon Hyun 
---
 dev/deps/spark-deps-hadoop-3.2-hive-2.3 |  1 -
 pom.xml | 11 ---
 sql/hive/pom.xml|  4 
 3 files changed, 16 deletions(-)

diff --git a/dev/deps/spark-deps-hadoop-3.2-hive-2.3 
b/dev/deps/spark-deps-hadoop-3.2-hive-2.3
index 8b79d7e5..3482dd2 100644
--- a/dev/deps/spark-deps-hadoop-3.2-hive-2.3
+++ b/dev/deps/spark-deps-hadoop-3.2-hive-2.3
@@ -35,7 +35,6 @@ commons-compiler/3.1.4//commons-compiler-3.1.4.jar
 commons-compress/1.20//commons-compress-1.20.jar
 commons-crypto/1.1.0//commons-crypto-1.1.0.jar
 commons-dbcp/1.4//commons-dbcp-1.4.jar
-commons-httpclient/3.1//commons-httpclient-3.1.jar
 commons-io/2.8.0//commons-io-2.8.0.jar
 commons-lang/2.6//commons-lang-2.6.jar
 commons-lang3/3.12.0//commons-lang3-3.12.0.jar
diff --git a/pom.xml b/pom.xml
index 82a047f..ca038b2 100644
--- a/pom.xml
+++ b/pom.xml
@@ -157,8 +157,6 @@
 
 4.5.13
 4.4.14
-
-3.1
 3.4.1
 
 3.2.2
@@ -593,11 +591,6 @@
 ${jsr305.version}
   
   
-commons-httpclient
-commons-httpclient
-${httpclient.classic.version}
-  
-  
 org.apache.httpcomponents
 httpclient
 ${commons.httpclient.version}
@@ -1811,10 +1804,6 @@
 commons-codec
   
   
-commons-httpclient
-commons-httpclient
-  
-  
 org.apache.avro
 avro-mapred
   
diff --git a/sql/hive/pom.xml b/sql/hive/pom.xml
index 729d3f4..67a9854 100644
--- a/sql/hive/pom.xml
+++ b/sql/hive/pom.xml
@@ -134,10 +134,6 @@
   avro-mapred
 
 
-  commons-httpclient
-  commons-httpclient
-
-
   org.apache.httpcomponents
   httpclient
 

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated: [SPARK-35680][SQL] Add fields to `YearMonthIntervalType`

2021-06-15 Thread maxgekk
This is an automated email from the ASF dual-hosted git repository.

maxgekk pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 61ce8f7  [SPARK-35680][SQL] Add fields to `YearMonthIntervalType`
61ce8f7 is described below

commit 61ce8f764982306f2c7a8b2b3dfe22963b49f2d5
Author: Max Gekk 
AuthorDate: Tue Jun 15 23:08:12 2021 +0300

[SPARK-35680][SQL] Add fields to `YearMonthIntervalType`

### What changes were proposed in this pull request?
Extend `YearMonthIntervalType` to support interval fields. Valid interval 
field values:
- 0 (YEAR)
- 1 (MONTH)

After the changes, the following year-month interval types are supported:
1. `YearMonthIntervalType(0, 0)` or `YearMonthIntervalType(YEAR, YEAR)`
2. `YearMonthIntervalType(0, 1)` or `YearMonthIntervalType(YEAR, MONTH)`. 
**It is the default one**.
3. `YearMonthIntervalType(1, 1)` or `YearMonthIntervalType(MONTH, MONTH)`

Closes #32825
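
A hedged sketch of constructing these types (assuming the `YEAR`/`MONTH` constants referenced above are exposed on the `YearMonthIntervalType` companion object):

```scala
import org.apache.spark.sql.types.YearMonthIntervalType
import org.apache.spark.sql.types.YearMonthIntervalType.{MONTH, YEAR}

val yearOnly    = YearMonthIntervalType(YEAR, YEAR)   // same as YearMonthIntervalType(0, 0)
val yearToMonth = YearMonthIntervalType(YEAR, MONTH)  // the default, YearMonthIntervalType(0, 1)
val monthOnly   = YearMonthIntervalType(MONTH, MONTH) // same as YearMonthIntervalType(1, 1)
```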

### Why are the changes needed?
In the current implementation, Spark supports only `interval year to month`, 
but the SQL standard allows specifying the start and end fields. The changes 
let Spark follow the ANSI SQL standard more precisely.

### Does this PR introduce _any_ user-facing change?
Yes, but `YearMonthIntervalType` has not been released yet.

### How was this patch tested?
By existing test suites.

Closes #32909 from MaxGekk/add-fields-to-YearMonthIntervalType.

Authored-by: Max Gekk 
Signed-off-by: Max Gekk 
---
 .../spark/sql/catalyst/expressions/UnsafeRow.java  | 11 ++---
 .../java/org/apache/spark/sql/types/DataTypes.java | 19 +---
 .../sql/catalyst/CatalystTypeConverters.scala  |  3 +-
 .../apache/spark/sql/catalyst/InternalRow.scala|  4 +-
 .../spark/sql/catalyst/JavaTypeInference.scala |  2 +-
 .../spark/sql/catalyst/ScalaReflection.scala   | 10 ++---
 .../spark/sql/catalyst/SerializerBuildHelper.scala |  2 +-
 .../spark/sql/catalyst/analysis/Analyzer.scala | 18 
 .../apache/spark/sql/catalyst/dsl/package.scala|  7 ++-
 .../spark/sql/catalyst/encoders/RowEncoder.scala   |  6 +--
 .../spark/sql/catalyst/expressions/Cast.scala  | 35 +--
 .../expressions/InterpretedUnsafeProjection.scala  |  2 +-
 .../catalyst/expressions/SpecificInternalRow.scala |  2 +-
 .../catalyst/expressions/aggregate/Average.scala   |  6 +--
 .../sql/catalyst/expressions/aggregate/Sum.scala   |  2 +-
 .../sql/catalyst/expressions/arithmetic.scala  | 10 ++---
 .../expressions/codegen/CodeGenerator.scala|  4 +-
 .../expressions/collectionOperations.scala |  8 ++--
 .../catalyst/expressions/datetimeExpressions.scala |  2 +-
 .../spark/sql/catalyst/expressions/hash.scala  |  2 +-
 .../catalyst/expressions/intervalExpressions.scala | 10 ++---
 .../spark/sql/catalyst/expressions/literals.scala  | 16 ---
 .../catalyst/expressions/windowExpressions.scala   |  4 +-
 .../spark/sql/catalyst/parser/AstBuilder.scala |  6 ++-
 .../spark/sql/catalyst/util/IntervalUtils.scala| 17 ++-
 .../apache/spark/sql/catalyst/util/TypeUtils.scala |  4 +-
 .../spark/sql/errors/QueryCompilationErrors.scala  | 11 +
 .../org/apache/spark/sql/types/DataType.scala  |  4 +-
 .../spark/sql/types/YearMonthIntervalType.scala| 52 ++
 .../org/apache/spark/sql/util/ArrowUtils.scala |  4 +-
 .../org/apache/spark/sql/RandomDataGenerator.scala |  2 +-
 .../spark/sql/RandomDataGeneratorSuite.scala   |  4 +-
 .../sql/catalyst/CatalystTypeConvertersSuite.scala |  2 +-
 .../sql/catalyst/encoders/RowEncoderSuite.scala| 18 
 .../expressions/ArithmeticExpressionSuite.scala| 10 ++---
 .../spark/sql/catalyst/expressions/CastSuite.scala | 30 ++---
 .../sql/catalyst/expressions/CastSuiteBase.scala   |  8 ++--
 .../expressions/DateExpressionsSuite.scala | 19 
 .../expressions/HashExpressionsSuite.scala |  2 +-
 .../expressions/IntervalExpressionsSuite.scala | 24 ++
 .../expressions/LiteralExpressionSuite.scala   |  6 +--
 .../catalyst/expressions/LiteralGenerator.scala|  4 +-
 .../expressions/MutableProjectionSuite.scala   | 14 +++---
 .../optimizer/PushFoldableIntoBranchesSuite.scala  | 16 +++
 .../sql/catalyst/parser/DataTypeParserSuite.scala  |  2 +-
 .../sql/catalyst/util/IntervalUtilsSuite.scala |  5 ++-
 .../org/apache/spark/sql/types/DataTypeSuite.scala |  6 +--
 .../apache/spark/sql/types/DataTypeTestUtils.scala | 18 
 .../apache/spark/sql/util/ArrowUtilsSuite.scala|  2 +-
 .../apache/spark/sql/execution/HiveResult.scala|  4 +-
 .../sql/execution/aggregate/HashMapGenerator.scala |  3 +-
 .../spark/sql/execution/aggregate/udaf.scala   |  4 +-
 .../spark/sql/execution/arrow/ArrowWriter.scala|  2 +-
 .../sql/execution/columnar/Column

[spark] branch branch-3.0 updated (f1711af -> 0ef0f4f)

2021-06-15 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a change to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git.


from f1711af  Preparing development version 3.0.4-SNAPSHOT
 add 0ef0f4f  [SPARK-35767][SQL] Avoid executing child plan twice in 
CoalesceExec

No new revisions were added by this update.

Summary of changes:
 .../org/apache/spark/sql/execution/basicPhysicalOperators.scala  | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch branch-3.1 updated: [SPARK-35767][SQL] Avoid executing child plan twice in CoalesceExec

2021-06-15 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch branch-3.1
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.1 by this push:
 new d8ea6bc  [SPARK-35767][SQL] Avoid executing child plan twice in 
CoalesceExec
d8ea6bc is described below

commit d8ea6bcfad8d82f6886c7f538481ef2338fc04be
Author: Andy Grove 
AuthorDate: Tue Jun 15 11:59:21 2021 -0700

[SPARK-35767][SQL] Avoid executing child plan twice in CoalesceExec

### What changes were proposed in this pull request?

`CoalesceExec` needlessly calls `child.execute` twice when it could just 
call it once and re-use the results. This only happens when `numPartitions == 
1`.

### Why are the changes needed?

It is more efficient to execute the child plan once rather than twice.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

There are no functional changes. This is just a performance optimization, 
so the existing tests should be sufficient to catch any regressions.

Closes #32920 from andygrove/coalesce-exec-executes-twice.

Authored-by: Andy Grove 
Signed-off-by: Dongjoon Hyun 
(cherry picked from commit 1012967ace4c7bd4e5a6f59c6ea6eec45871f292)
Signed-off-by: Dongjoon Hyun 
---
 .../org/apache/spark/sql/execution/basicPhysicalOperators.scala  | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git 
a/sql/core/src/main/scala/org/apache/spark/sql/execution/basicPhysicalOperators.scala
 
b/sql/core/src/main/scala/org/apache/spark/sql/execution/basicPhysicalOperators.scala
index d651132..4fcd67b 100644
--- 
a/sql/core/src/main/scala/org/apache/spark/sql/execution/basicPhysicalOperators.scala
+++ 
b/sql/core/src/main/scala/org/apache/spark/sql/execution/basicPhysicalOperators.scala
@@ -690,12 +690,13 @@ case class CoalesceExec(numPartitions: Int, child: 
SparkPlan) extends UnaryExecN
   }
 
   protected override def doExecute(): RDD[InternalRow] = {
-if (numPartitions == 1 && child.execute().getNumPartitions < 1) {
+val rdd = child.execute()
+if (numPartitions == 1 && rdd.getNumPartitions < 1) {
   // Make sure we don't output an RDD with 0 partitions, when claiming 
that we have a
   // `SinglePartition`.
   new CoalesceExec.EmptyRDDWithPartitions(sparkContext, numPartitions)
 } else {
-  child.execute().coalesce(numPartitions, shuffle = false)
+  rdd.coalesce(numPartitions, shuffle = false)
 }
   }
 }

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated: [SPARK-35767][SQL] Avoid executing child plan twice in CoalesceExec

2021-06-15 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 1012967  [SPARK-35767][SQL] Avoid executing child plan twice in 
CoalesceExec
1012967 is described below

commit 1012967ace4c7bd4e5a6f59c6ea6eec45871f292
Author: Andy Grove 
AuthorDate: Tue Jun 15 11:59:21 2021 -0700

[SPARK-35767][SQL] Avoid executing child plan twice in CoalesceExec

### What changes were proposed in this pull request?

`CoalesceExec` needlessly calls `child.execute` twice when it could just 
call it once and re-use the results. This only happens when `numPartitions == 
1`.

### Why are the changes needed?

It is more efficient to execute the child plan once rather than twice.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

There are no functional changes. This is just a performance optimization, 
so the existing tests should be sufficient to catch any regressions.

Closes #32920 from andygrove/coalesce-exec-executes-twice.

Authored-by: Andy Grove 
Signed-off-by: Dongjoon Hyun 
---
 .../org/apache/spark/sql/execution/basicPhysicalOperators.scala  | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git 
a/sql/core/src/main/scala/org/apache/spark/sql/execution/basicPhysicalOperators.scala
 
b/sql/core/src/main/scala/org/apache/spark/sql/execution/basicPhysicalOperators.scala
index b537040..8c51cde 100644
--- 
a/sql/core/src/main/scala/org/apache/spark/sql/execution/basicPhysicalOperators.scala
+++ 
b/sql/core/src/main/scala/org/apache/spark/sql/execution/basicPhysicalOperators.scala
@@ -724,12 +724,13 @@ case class CoalesceExec(numPartitions: Int, child: 
SparkPlan) extends UnaryExecN
   }
 
   protected override def doExecute(): RDD[InternalRow] = {
-if (numPartitions == 1 && child.execute().getNumPartitions < 1) {
+val rdd = child.execute()
+if (numPartitions == 1 && rdd.getNumPartitions < 1) {
   // Make sure we don't output an RDD with 0 partitions, when claiming 
that we have a
   // `SinglePartition`.
   new CoalesceExec.EmptyRDDWithPartitions(sparkContext, numPartitions)
 } else {
-  child.execute().coalesce(numPartitions, shuffle = false)
+  rdd.coalesce(numPartitions, shuffle = false)
 }
   }
 

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated (c382d40 -> 8a02f3a)

2021-06-15 Thread maxgekk
This is an automated email from the ASF dual-hosted git repository.

maxgekk pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from c382d40  [SPARK-35766][SQL][TESTS] Break down CastSuite/AnsiCastSuite 
into multiple files
 add 8a02f3a  [SPARK-35129][SQL] Construct year-month interval column from 
integral fields

No new revisions were added by this update.

Summary of changes:
 .../sql/catalyst/analysis/FunctionRegistry.scala   |  1 +
 .../catalyst/expressions/intervalExpressions.scala | 53 +++
 .../expressions/IntervalExpressionsSuite.scala | 28 ++
 .../sql-functions/sql-expression-schema.md |  3 +-
 .../test/resources/sql-tests/inputs/interval.sql   |  9 
 .../sql-tests/results/ansi/interval.sql.out| 60 +-
 .../resources/sql-tests/results/interval.sql.out   | 60 +-
 7 files changed, 211 insertions(+), 3 deletions(-)

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated (b74260f -> c382d40)

2021-06-15 Thread gengliang
This is an automated email from the ASF dual-hosted git repository.

gengliang pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from b74260f  [SPARK-35765][SQL] Distinct aggs are not duplicate sensitive
 add c382d40  [SPARK-35766][SQL][TESTS] Break down CastSuite/AnsiCastSuite 
into multiple files

No new revisions were added by this update.

Summary of changes:
 .../catalyst/expressions/AnsiCastSuiteBase.scala   |  481 +++
 .../spark/sql/catalyst/expressions/CastSuite.scala | 1357 +---
 .../sql/catalyst/expressions/CastSuiteBase.scala   |  930 ++
 3 files changed, 1412 insertions(+), 1356 deletions(-)
 create mode 100644 
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/AnsiCastSuiteBase.scala
 create mode 100644 
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/CastSuiteBase.scala

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated (d54edf0 -> b74260f)

2021-06-15 Thread gurwls223
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from d54edf0  [SPARK-35758][DOCS] Update the document about building Spark 
with Hadoop for Hadoop 2.x and 3.x
 add b74260f  [SPARK-35765][SQL] Distinct aggs are not duplicate sensitive

No new revisions were added by this update.

Summary of changes:
 .../spark/sql/catalyst/optimizer/RemoveRedundantAggregates.scala| 6 +-
 .../sql/catalyst/optimizer/RemoveRedundantAggregatesSuite.scala | 6 +++---
 2 files changed, 8 insertions(+), 4 deletions(-)

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch branch-3.1 updated: [SPARK-35758][DOCS] Update the document about building Spark with Hadoop for Hadoop 2.x and 3.x

2021-06-15 Thread gurwls223
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch branch-3.1
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.1 by this push:
 new f35df10  [SPARK-35758][DOCS] Update the document about building Spark 
with Hadoop for Hadoop 2.x and 3.x
f35df10 is described below

commit f35df10e2bab01c492bf627c7b12ce076a5da01e
Author: Kousuke Saruta 
AuthorDate: Tue Jun 15 20:19:50 2021 +0900

[SPARK-35758][DOCS] Update the document about building Spark with Hadoop 
for Hadoop 2.x and 3.x

### What changes were proposed in this pull request?

This PR updates the document about building Spark with Hadoop for Hadoop 
3.x and Hadoop 2.x.

### Why are the changes needed?

The document currently describes the build command as follows:
```
./build/mvn -Pyarn -Dhadoop.version=2.8.5 -DskipTests clean package
```

But this command fails because the default build settings are for Hadoop 
3.x, so the command example needs to be updated.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

I confirmed both of these commands successfully finished.
```
./build/mvn -Pyarn -Dhadoop.version=3.3.0 -DskipTests package
./build/mvn -Phadoop-2.7 -Pyarn -Dhadoop.version=2.8.5 -DskipTests package
```

I also built the document and confirmed the result.
This is before:

![hadoop-version-before](https://user-images.githubusercontent.com/4736016/122016157-bf020c80-cdfb-11eb-8e74-4840861f8541.png)

And this is after:

![hadoop-version-after](https://user-images.githubusercontent.com/4736016/122016188-c75a4780-cdfb-11eb-8427-2f0765e6ff7a.png)

Closes #32917 from sarutak/fix-build-doc-with-hadoop.

Authored-by: Kousuke Saruta 
Signed-off-by: Hyukjin Kwon 
(cherry picked from commit d54edf0bde33c0e93cf33cb41d6be13eb32b6848)
Signed-off-by: Hyukjin Kwon 
---
 docs/building-spark.md | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/docs/building-spark.md b/docs/building-spark.md
index 5106f2a..286b48e 100644
--- a/docs/building-spark.md
+++ b/docs/building-spark.md
@@ -77,7 +77,11 @@ from `hadoop.version`.
 
 Example:
 
-./build/mvn -Pyarn -Dhadoop.version=2.8.5 -DskipTests clean package
+./build/mvn -Pyarn -Dhadoop.version=3.3.0 -DskipTests clean package
+
+If you want to build with Hadoop 2.x, enable hadoop-2.7 profile:
+
+./build/mvn -Phadoop-2.7 -Pyarn -Dhadoop.version=2.8.5 -DskipTests clean 
package
 
 ## Building With Hive and JDBC Support
 

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated (b191d72 -> d54edf0)

2021-06-15 Thread gurwls223
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from b191d72  [SPARK-35056][SQL] Group exception messages in 
execution/streaming
 add d54edf0  [SPARK-35758][DOCS] Update the document about building Spark 
with Hadoop for Hadoop 2.x and 3.x

No new revisions were added by this update.

Summary of changes:
 docs/building-spark.md | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



svn commit: r48352 - /dev/spark/v3.0.3-rc1-bin/

2021-06-15 Thread wuyi
Author: wuyi
Date: Tue Jun 15 09:28:03 2021
New Revision: 48352

Log:
Apache Spark v3.0.3-rc1

Added:
dev/spark/v3.0.3-rc1-bin/
dev/spark/v3.0.3-rc1-bin/SparkR_3.0.3.tar.gz   (with props)
dev/spark/v3.0.3-rc1-bin/SparkR_3.0.3.tar.gz.asc
dev/spark/v3.0.3-rc1-bin/SparkR_3.0.3.tar.gz.sha512
dev/spark/v3.0.3-rc1-bin/pyspark-3.0.3.tar.gz   (with props)
dev/spark/v3.0.3-rc1-bin/pyspark-3.0.3.tar.gz.asc
dev/spark/v3.0.3-rc1-bin/pyspark-3.0.3.tar.gz.sha512
dev/spark/v3.0.3-rc1-bin/spark-3.0.3-bin-hadoop2.7.tgz   (with props)
dev/spark/v3.0.3-rc1-bin/spark-3.0.3-bin-hadoop2.7.tgz.asc
dev/spark/v3.0.3-rc1-bin/spark-3.0.3-bin-hadoop2.7.tgz.sha512
dev/spark/v3.0.3-rc1-bin/spark-3.0.3-bin-hadoop3.2.tgz   (with props)
dev/spark/v3.0.3-rc1-bin/spark-3.0.3-bin-hadoop3.2.tgz.asc
dev/spark/v3.0.3-rc1-bin/spark-3.0.3-bin-hadoop3.2.tgz.sha512
dev/spark/v3.0.3-rc1-bin/spark-3.0.3-bin-without-hadoop.tgz   (with props)
dev/spark/v3.0.3-rc1-bin/spark-3.0.3-bin-without-hadoop.tgz.asc
dev/spark/v3.0.3-rc1-bin/spark-3.0.3-bin-without-hadoop.tgz.sha512
dev/spark/v3.0.3-rc1-bin/spark-3.0.3.tgz   (with props)
dev/spark/v3.0.3-rc1-bin/spark-3.0.3.tgz.asc
dev/spark/v3.0.3-rc1-bin/spark-3.0.3.tgz.sha512

Added: dev/spark/v3.0.3-rc1-bin/SparkR_3.0.3.tar.gz
==
Binary file - no diff available.

Propchange: dev/spark/v3.0.3-rc1-bin/SparkR_3.0.3.tar.gz
--
svn:mime-type = application/octet-stream

Added: dev/spark/v3.0.3-rc1-bin/SparkR_3.0.3.tar.gz.asc
==
--- dev/spark/v3.0.3-rc1-bin/SparkR_3.0.3.tar.gz.asc (added)
+++ dev/spark/v3.0.3-rc1-bin/SparkR_3.0.3.tar.gz.asc Tue Jun 15 09:28:03 2021
@@ -0,0 +1,16 @@
+-BEGIN PGP SIGNATURE-
+
+iQIzBAABCgAdFiEEw3eq753BV3urvlGUePMbbdH/WUIFAmDIafEACgkQePMbbdH/
+WUJHjxAAxOFo3TCEXtcxo7OyYCxCmxZKh/LW/Vg4ThNKFi7aSp9wvsnZC+t5NeSB
+qzDGKxGbxe9PbEKQVgYmLfBQGr0cdzHH9DILOX+7fK5mACHvTR8JyLADmn5ap+2v
+l2pvWlSc3IaNlZgWzSV/QT/kIJsrEhWvEMguCnR3U5+CGtAydnwkpj+HYi2L827d
+QhEh6dB3Y/7JtPyoN9JMLTwpfuv3io0bvpxP3bjfiGaUk6ssV5Q9L/q/cXv+7B6d
+BNIW5qXaamjP0Afb/XVC39q79FSqjDTyGMZWtTCov2AOKjqrguqEiiyTmJQwyEQD
+xQa5pXRVy85o31JcLTCW1NqmCqL3bvA+6255lPJd8+LfdkRP3GLPTrmpm6cRH0a+
+7DViiCVKBucVp0XEV3K3mFfvfTXjM5927zF2VpAi/yRX1y8ZWO9JlYoc2b0jpAqK
+5d99Iny8xEb9PCafWZwwB3YenhD9zM80QltLaHcoOMarf54D/FFHHX6lNUi2ykyj
+QZf9k9ta+lyItOFYkwrHV4iZNQaKv8S+y7o8XScOdugGhI1UUHF+x/vbZbWT/pZe
+ZQ4k+PIcxVzQ4ZdPNAKknFCccatyBleaspV3bDLtNuUFFspaMbYCdtZcNPT9ZgAF
+qbj1hKfYC37IZcRZ9W41TiiZXKLvXLXIteabAJwA+XeC+1X25EU=
+=5B9V
+-END PGP SIGNATURE-

Added: dev/spark/v3.0.3-rc1-bin/SparkR_3.0.3.tar.gz.sha512
==
--- dev/spark/v3.0.3-rc1-bin/SparkR_3.0.3.tar.gz.sha512 (added)
+++ dev/spark/v3.0.3-rc1-bin/SparkR_3.0.3.tar.gz.sha512 Tue Jun 15 09:28:03 2021
@@ -0,0 +1,3 @@
+SparkR_3.0.3.tar.gz: 09AC8516 68EA8F46 6DC17B9C 7EE1F258 EB30132B 41433F8A
+ ECF49574 13C45884 1735039B 1E544418 303D766A BFB95749
+ 0FC2EFC8 CCA2904A 8B280AB2 29E51F1C

Added: dev/spark/v3.0.3-rc1-bin/pyspark-3.0.3.tar.gz
==
Binary file - no diff available.

Propchange: dev/spark/v3.0.3-rc1-bin/pyspark-3.0.3.tar.gz
--
svn:mime-type = application/octet-stream

Added: dev/spark/v3.0.3-rc1-bin/pyspark-3.0.3.tar.gz.asc
==
--- dev/spark/v3.0.3-rc1-bin/pyspark-3.0.3.tar.gz.asc (added)
+++ dev/spark/v3.0.3-rc1-bin/pyspark-3.0.3.tar.gz.asc Tue Jun 15 09:28:03 2021
@@ -0,0 +1,16 @@
+-BEGIN PGP SIGNATURE-
+
+iQIzBAABCgAdFiEEw3eq753BV3urvlGUePMbbdH/WUIFAmDIafMACgkQePMbbdH/
+WUIE+Q//WnwTGNI42D3M4I0buKwh5IuaiSL+y4UoZZihLN0yzNkLKhziO28CQuQd
+Dv1jQ/4DFi/9epqT+WHNpnlueyN6SXM62qRnMic3427BCUnb5Y7NrwFqO48/hSPC
+7ctPJY4QKzWeJUJryoPmFbj+gBJyKSo+uGNg/SHyXiddS+Azt6pE7izL7FqLIiow
+BB3ciIr7QluqPKeuifrik6NlsBpw80MQp92dQdU1hdt+gxj2H0OhCI1vmf+tGfI+
+l3b18xHJAqaKtzo1xDFjcDawjGrRUKCBQ44F1vj8wScOVrkzJKVvEuHP/k/3dgBx
+YOJRuAj13I+tAwbU8ZM6ErQtfRQdO8wemIhyrExFpoQ5HXKGAjFOweFyvS8p43zM
+tbVNkBA5N0gMX+CTGofFtV/zO86n/BW1vn5DeARLvTmtkbs2aiiB5gDJmsQ/Zzsg
+jxLEiYsF35+oBEbTLVGFWZM5XpOGFu65mv1VyTCHHaJj8NeKREkfLlfqe7lxrxrs
+LqcrT3p2i9nB/K4oLlY2Y4u8Z5a1Is/dDnIGNZ65r6QEVInWCnLRklcWmfIkoMI4
+BOqgOtt+YF9tCv8Nmr6bt0NKcHR861oZMU6swh+miayxvIaf+vItcYWkWhS20qle
+WAN1/WhIFzCRH+I7heISmdxvu9b9ewNEHv8upJjpFSvTk+Kv3TE=
+=F72o
+-END PGP SIGNATURE-

Added: dev/spark/v3.0.3-rc1-bin/pyspark-3.0.3.tar.gz.sha512
==
--- dev/spark/v3.0.3-rc1-bin/pyspark-3.0.3.tar.gz.sha512 (added)
+++ dev/spark/v3.0.3-rc1-bin/pyspark-

[spark] branch master updated (195090a -> b191d72)

2021-06-15 Thread maxgekk
This is an automated email from the ASF dual-hosted git repository.

maxgekk pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 195090a  [SPARK-35764][SQL] Assign pretty names to 
TimestampWithoutTZType
 add b191d72  [SPARK-35056][SQL] Group exception messages in 
execution/streaming

No new revisions were added by this update.

Summary of changes:
 .../spark/sql/errors/QueryExecutionErrors.scala| 95 +-
 .../streaming/CheckpointFileManager.scala  | 16 ++--
 .../sql/execution/streaming/FileStreamSink.scala   | 21 +
 .../sql/execution/streaming/GroupStateImpl.scala   | 15 ++--
 .../sql/execution/streaming/HDFSMetadataLog.scala  |  7 +-
 .../streaming/ManifestFileCommitProtocol.scala |  4 +-
 .../execution/streaming/MicroBatchExecution.scala  |  4 +-
 .../execution/streaming/StreamingRelation.scala|  3 +-
 .../execution/streaming/statefulOperators.scala|  3 +-
 9 files changed, 121 insertions(+), 47 deletions(-)

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated: [SPARK-35764][SQL] Assign pretty names to TimestampWithoutTZType

2021-06-15 Thread maxgekk
This is an automated email from the ASF dual-hosted git repository.

maxgekk pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 195090a  [SPARK-35764][SQL] Assign pretty names to 
TimestampWithoutTZType
195090a is described below

commit 195090afcc8ed138336b353edc0a4db6f0f5f168
Author: Gengliang Wang 
AuthorDate: Tue Jun 15 12:15:13 2021 +0300

[SPARK-35764][SQL] Assign pretty names to TimestampWithoutTZType

### What changes were proposed in this pull request?

In this PR, I propose to override the typeName() method in 
TimestampWithoutTZType and assign it a name according to the ANSI SQL standard:

![image](https://user-images.githubusercontent.com/1097932/122013859-2cf50680-cdf1-11eb-9fcd-0ec1b59fb5c0.png)
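
A minimal, hedged illustration of the new name (assuming the `TimestampWithoutTZType` companion case object is importable as shown):

```scala
import org.apache.spark.sql.types.TimestampWithoutTZType

// After this change, error messages show the ANSI name rather than the class name.
assert(TimestampWithoutTZType.typeName == "timestamp without time zone")
```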

### Why are the changes needed?

To improve the Spark SQL user experience and have readable type names in error 
messages.

### Does this PR introduce _any_ user-facing change?

No, the new timestamp type is not released yet.

### How was this patch tested?

Unit test

Closes #32915 from gengliangwang/typename.

Authored-by: Gengliang Wang 
Signed-off-by: Max Gekk 
---
 .../org/apache/spark/sql/types/TimestampWithoutTZType.scala |  2 ++
 .../apache/spark/sql/catalyst/expressions/CastSuite.scala   | 13 +
 2 files changed, 15 insertions(+)

diff --git 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/types/TimestampWithoutTZType.scala
 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/types/TimestampWithoutTZType.scala
index 558f5ee..856d549 100644
--- 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/types/TimestampWithoutTZType.scala
+++ 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/types/TimestampWithoutTZType.scala
@@ -48,6 +48,8 @@ class TimestampWithoutTZType private() extends AtomicType {
*/
   override def defaultSize: Int = 8
 
+  override def typeName: String = "timestamp without time zone"
+
   private[spark] override def asNullable: TimestampWithoutTZType = this
 }
 
diff --git 
a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/CastSuite.scala
 
b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/CastSuite.scala
index c268d52..910c757 100644
--- 
a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/CastSuite.scala
+++ 
b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/CastSuite.scala
@@ -1295,6 +1295,19 @@ abstract class AnsiCastSuiteBase extends CastSuiteBase {
   }
 }
   }
+
+  test("disallow type conversions between Numeric types and Timestamp without 
time zone type") {
+import DataTypeTestUtils.numericTypes
+checkInvalidCastFromNumericType(TimestampWithoutTZType)
+var errorMsg = "cannot cast bigint to timestamp without time zone"
+verifyCastFailure(cast(Literal(0L), TimestampWithoutTZType), 
Some(errorMsg))
+
+val timestampWithoutTZLiteral = Literal.create(LocalDateTime.now(), 
TimestampWithoutTZType)
+errorMsg = "cannot cast timestamp without time zone to"
+numericTypes.foreach { numericType =>
+  verifyCastFailure(cast(timestampWithoutTZLiteral, numericType), 
Some(errorMsg))
+}
+  }
 }
 
 /**

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated (b9aeeb4 -> a50bd8f)

2021-06-15 Thread wenchen
This is an automated email from the ASF dual-hosted git repository.

wenchen pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from b9aeeb4  [SPARK-35683][PYTHON] Fix Index.difference to avoid collect 
'other' to driver side
 add a50bd8f  [SPARK-35742][SQL] Expression.semanticEquals should be 
symmetrical

No new revisions were added by this update.

Summary of changes:
 .../sql/catalyst/expressions/Expression.scala  |   2 +-
 .../catalyst/expressions/namedExpressions.scala|   5 -
 .../spark/sql/catalyst/expressions/subquery.scala  |   7 -
 .../catalyst/expressions/CanonicalizeSuite.scala   |   7 +
 .../execution/SubqueryAdaptiveBroadcastExec.scala  |   6 +
 .../execution/aggregate/HashAggregateExec.scala|   4 +-
 .../org/apache/spark/sql/execution/subquery.scala  |  10 +-
 .../approved-plans-v1_4/q23b/explain.txt   | 323 -
 .../approved-plans-v1_4/q23b/simplified.txt|  18 +-
 .../approved-plans-v1_4/q44.sf100/explain.txt  | 231 +++
 .../approved-plans-v1_4/q44.sf100/simplified.txt   |  13 +-
 .../approved-plans-v1_4/q44/explain.txt| 231 +++
 .../approved-plans-v1_4/q44/simplified.txt |  13 +-
 .../approved-plans-v1_4/q58.sf100/explain.txt  | 394 ---
 .../approved-plans-v1_4/q58.sf100/simplified.txt   |  40 +-
 .../approved-plans-v1_4/q58/explain.txt| 368 --
 .../approved-plans-v1_4/q58/simplified.txt |  40 +-
 .../approved-plans-v2_7/q14a.sf100/explain.txt | 770 ++---
 .../approved-plans-v2_7/q14a.sf100/simplified.txt  | 114 +--
 .../approved-plans-v2_7/q14a/explain.txt   | 770 ++---
 .../approved-plans-v2_7/q14a/simplified.txt| 114 +--
 21 files changed, 1092 insertions(+), 2388 deletions(-)

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org