[spark] branch branch-3.1 updated (9c95d3f -> aece7e7)
This is an automated email from the ASF dual-hosted git repository.

viirya pushed a change to branch branch-3.1
in repository https://gitbox.apache.org/repos/asf/spark.git.

    from 9c95d3f  [SPARK-36441][INFRA] Fix GA failure related to downloading lintr dependencies
     add aece7e7  [SPARK-36393][BUILD][3.1] Try to raise memory for GHA

No new revisions were added by this update.

Summary of changes:
 .github/workflows/build_and_test.yml | 2 +-
 build/sbt-launch-lib.bash            | 6 --
 dev/run-tests.py                     | 7 +--
 pom.xml                              | 6 +++---
 project/SparkBuild.scala             | 4 ++--
 5 files changed, 11 insertions(+), 14 deletions(-)

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated (6e72951 -> 4624e59)
This is an automated email from the ASF dual-hosted git repository.

yumwang pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

    from 6e72951  [SPARK-36423][SHUFFLE] Randomize order of blocks in a push request to improve block merge ratio for push-based shuffle
     add 4624e59  [SPARK-36359][SQL] Coalesce drop all expressions after the first non nullable expression

No new revisions were added by this update.

Summary of changes:
 .../sql/catalyst/expressions/nullExpressions.scala |  4 +++-
 .../spark/sql/catalyst/optimizer/expressions.scala | 11 ++++++++---
 .../spark/sql/catalyst/trees/TreePatterns.scala    |  1 +
 .../BinaryComparisonSimplificationSuite.scala      | 21 +++++++++++++++++++++
 .../scala/org/apache/spark/sql/ExplainSuite.scala  |  4 ++--
 5 files changed, 35 insertions(+), 6 deletions(-)
[spark] branch branch-3.2 updated: [SPARK-36423][SHUFFLE] Randomize order of blocks in a push request to improve block merge ratio for push-based shuffle
This is an automated email from the ASF dual-hosted git repository.

mridulm80 pushed a commit to branch branch-3.2
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/branch-3.2 by this push:
     new 552a332  [SPARK-36423][SHUFFLE] Randomize order of blocks in a push request to improve block merge ratio for push-based shuffle

552a332 is described below

commit 552a332dd464a47531574c4aba97060410d889cb
Author: Min Shen
AuthorDate: Fri Aug 6 09:47:42 2021 -0500

    [SPARK-36423][SHUFFLE] Randomize order of blocks in a push request to improve block merge ratio for push-based shuffle

    ### What changes were proposed in this pull request?
    On the client side, we currently randomize the order of push requests before processing each request. In addition, we can further randomize the order of blocks within each push request before pushing them. In our benchmark, this resulted in a 60%-70% reduction of blocks that fail to be merged due to block collision (the existing block merge ratio is already pretty good in general, and this further improves it).

    ### Why are the changes needed?
    Improve block merge ratio for push-based shuffle.

    ### Does this PR introduce _any_ user-facing change?
    No

    ### How was this patch tested?
    Straightforward small change, no additional test needed.

    Closes #33649 from Victsm/SPARK-36423.

    Lead-authored-by: Min Shen
    Co-authored-by: Min Shen
    Signed-off-by: Mridul Muralidharan
    (cherry picked from commit 6e729515fd2bb228afed964b50f0d02329684934)
    Signed-off-by: Mridul Muralidharan
---
 .../scala/org/apache/spark/shuffle/ShuffleBlockPusher.scala | 12 +++++++++---
 1 file changed, 9 insertions(+), 3 deletions(-)

diff --git a/core/src/main/scala/org/apache/spark/shuffle/ShuffleBlockPusher.scala b/core/src/main/scala/org/apache/spark/shuffle/ShuffleBlockPusher.scala
index 56f915b..ecaa4f0 100644
--- a/core/src/main/scala/org/apache/spark/shuffle/ShuffleBlockPusher.scala
+++ b/core/src/main/scala/org/apache/spark/shuffle/ShuffleBlockPusher.scala
@@ -242,10 +242,16 @@ private[spark] class ShuffleBlockPusher(conf: SparkConf) extends Logging {
           handleResult(PushResult(blockId, exception))
         }
       }
+    // In addition to randomizing the order of the push requests, further randomize the order
+    // of blocks within the push request to further reduce the likelihood of shuffle server side
+    // collision of pushed blocks. This does not increase the cost of reading unmerged shuffle
+    // files on the executor side, because we are still reading MB-size chunks and only randomize
+    // the in-memory sliced buffers post reading.
+    val (blockPushIds, blockPushBuffers) = Utils.randomize(blockIds.zip(
+      sliceReqBufferIntoBlockBuffers(request.reqBuffer, request.blocks.map(_._2)))).unzip
     SparkEnv.get.blockManager.blockStoreClient.pushBlocks(
-      address.host, address.port, blockIds.toArray,
-      sliceReqBufferIntoBlockBuffers(request.reqBuffer, request.blocks.map(_._2)),
-      blockPushListener)
+      address.host, address.port, blockPushIds.toArray,
+      blockPushBuffers.toArray, blockPushListener)
   }

   /**
[spark] branch master updated: [SPARK-36423][SHUFFLE] Randomize order of blocks in a push request to improve block merge ratio for push-based shuffle
This is an automated email from the ASF dual-hosted git repository.

mridulm80 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new 6e72951  [SPARK-36423][SHUFFLE] Randomize order of blocks in a push request to improve block merge ratio for push-based shuffle

6e72951 is described below

commit 6e729515fd2bb228afed964b50f0d02329684934
Author: Min Shen
AuthorDate: Fri Aug 6 09:47:42 2021 -0500

    [SPARK-36423][SHUFFLE] Randomize order of blocks in a push request to improve block merge ratio for push-based shuffle

    ### What changes were proposed in this pull request?
    On the client side, we currently randomize the order of push requests before processing each request. In addition, we can further randomize the order of blocks within each push request before pushing them. In our benchmark, this resulted in a 60%-70% reduction of blocks that fail to be merged due to block collision (the existing block merge ratio is already pretty good in general, and this further improves it).

    ### Why are the changes needed?
    Improve block merge ratio for push-based shuffle.

    ### Does this PR introduce _any_ user-facing change?
    No

    ### How was this patch tested?
    Straightforward small change, no additional test needed.

    Closes #33649 from Victsm/SPARK-36423.

    Lead-authored-by: Min Shen
    Co-authored-by: Min Shen
    Signed-off-by: Mridul Muralidharan
---
 .../scala/org/apache/spark/shuffle/ShuffleBlockPusher.scala | 12 +++++++++---
 1 file changed, 9 insertions(+), 3 deletions(-)

diff --git a/core/src/main/scala/org/apache/spark/shuffle/ShuffleBlockPusher.scala b/core/src/main/scala/org/apache/spark/shuffle/ShuffleBlockPusher.scala
index 56f915b..ecaa4f0 100644
--- a/core/src/main/scala/org/apache/spark/shuffle/ShuffleBlockPusher.scala
+++ b/core/src/main/scala/org/apache/spark/shuffle/ShuffleBlockPusher.scala
@@ -242,10 +242,16 @@ private[spark] class ShuffleBlockPusher(conf: SparkConf) extends Logging {
           handleResult(PushResult(blockId, exception))
         }
       }
+    // In addition to randomizing the order of the push requests, further randomize the order
+    // of blocks within the push request to further reduce the likelihood of shuffle server side
+    // collision of pushed blocks. This does not increase the cost of reading unmerged shuffle
+    // files on the executor side, because we are still reading MB-size chunks and only randomize
+    // the in-memory sliced buffers post reading.
+    val (blockPushIds, blockPushBuffers) = Utils.randomize(blockIds.zip(
+      sliceReqBufferIntoBlockBuffers(request.reqBuffer, request.blocks.map(_._2)))).unzip
     SparkEnv.get.blockManager.blockStoreClient.pushBlocks(
-      address.host, address.port, blockIds.toArray,
-      sliceReqBufferIntoBlockBuffers(request.reqBuffer, request.blocks.map(_._2)),
-      blockPushListener)
+      address.host, address.port, blockPushIds.toArray,
+      blockPushBuffers.toArray, blockPushListener)
   }

   /**
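The core of the patch is a zip, shuffle, unzip over two parallel sequences: block IDs and their sliced buffers are paired, the pairs are randomized, and the pair is split back apart so `pushBlocks` still receives matching arrays. A minimal Python analogue of that pattern (function and variable names here are illustrative, not Spark's API):

```python
import random

def randomize_push_order(block_ids, block_buffers, seed=None):
    """Shuffle parallel lists of IDs and buffers together, mirroring
    Utils.randomize(blockIds.zip(buffers)).unzip in the Scala patch."""
    rng = random.Random(seed)
    pairs = list(zip(block_ids, block_buffers))  # keep each ID with its buffer
    rng.shuffle(pairs)                           # randomize the push order
    ids, buffers = zip(*pairs)                   # "unzip" back into two lists
    return list(ids), list(buffers)

ids, bufs = randomize_push_order(["b0", "b1", "b2"], [b"x", b"y", b"z"], seed=7)
# Whatever the resulting order, the ID/buffer pairing is preserved.
```

The point of shuffling the zipped pairs, rather than the two lists independently, is exactly what the commit comment stresses: the order changes but each ID stays attached to its buffer.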
[spark] branch branch-3.2 updated: [SPARK-595][DOCS] Add local-cluster mode option in Documentation
This is an automated email from the ASF dual-hosted git repository.

tgraves pushed a commit to branch branch-3.2
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/branch-3.2 by this push:
     new a5d0eaf  [SPARK-595][DOCS] Add local-cluster mode option in Documentation

a5d0eaf is described below

commit a5d0eafa324279e4516bd4c6b544b0cc7dbbd4e3
Author: Yuto Akutsu
AuthorDate: Fri Aug 6 09:26:13 2021 -0500

    [SPARK-595][DOCS] Add local-cluster mode option in Documentation

    ### What changes were proposed in this pull request?
    Add the local-cluster mode option to submitting-applications.md.

    ### Why are the changes needed?
    Help users find/use this option for unit tests.

    ### Does this PR introduce _any_ user-facing change?
    Yes, docs changed.

    ### How was this patch tested?
    `SKIP_API=1 bundle exec jekyll build`
    https://user-images.githubusercontent.com/87687356/127125380-6beb4601-7cf4-4876-b2c6-459454ce2a02.png

    Closes #33537 from yutoacts/SPARK-595.

    Lead-authored-by: Yuto Akutsu
    Co-authored-by: Yuto Akutsu
    Co-authored-by: Yuto Akutsu <87687356+yutoa...@users.noreply.github.com>
    Signed-off-by: Thomas Graves
    (cherry picked from commit 41b011e416286374e2e8e8dea36ba79f4c403040)
    Signed-off-by: Thomas Graves
---
 docs/submitting-applications.md | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/docs/submitting-applications.md b/docs/submitting-applications.md
index 0319859..402dd06 100644
--- a/docs/submitting-applications.md
+++ b/docs/submitting-applications.md
@@ -162,9 +162,10 @@ The master URL passed to Spark can be in one of the following formats:
 Master URL  Meaning
 local  Run Spark locally with one worker thread (i.e. no parallelism at all).
 local[K]  Run Spark locally with K worker threads (ideally, set this to the number of cores on your machine).
-local[K,F]  Run Spark locally with K worker threads and F maxFailures (see spark.task.maxFailures for an explanation of this variable)
+local[K,F]  Run Spark locally with K worker threads and F maxFailures (see spark.task.maxFailures for an explanation of this variable).
 local[*]  Run Spark locally with as many worker threads as logical cores on your machine.
 local[*,F]  Run Spark locally with as many worker threads as logical cores on your machine and F maxFailures.
+local-cluster[N,C,M]  Local-cluster mode is only for unit tests. It emulates a distributed cluster in a single JVM with N number of workers, C cores per worker and M MiB of memory per worker.
 spark://HOST:PORT  Connect to the given Spark standalone cluster master. The port must be whichever one your master is configured to use, which is 7077 by default.
[spark] branch master updated: [SPARK-595][DOCS] Add local-cluster mode option in Documentation
This is an automated email from the ASF dual-hosted git repository.

tgraves pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new 41b011e  [SPARK-595][DOCS] Add local-cluster mode option in Documentation

41b011e is described below

commit 41b011e416286374e2e8e8dea36ba79f4c403040
Author: Yuto Akutsu
AuthorDate: Fri Aug 6 09:26:13 2021 -0500

    [SPARK-595][DOCS] Add local-cluster mode option in Documentation

    ### What changes were proposed in this pull request?
    Add the local-cluster mode option to submitting-applications.md.

    ### Why are the changes needed?
    Help users find/use this option for unit tests.

    ### Does this PR introduce _any_ user-facing change?
    Yes, docs changed.

    ### How was this patch tested?
    `SKIP_API=1 bundle exec jekyll build`
    https://user-images.githubusercontent.com/87687356/127125380-6beb4601-7cf4-4876-b2c6-459454ce2a02.png

    Closes #33537 from yutoacts/SPARK-595.

    Lead-authored-by: Yuto Akutsu
    Co-authored-by: Yuto Akutsu
    Co-authored-by: Yuto Akutsu <87687356+yutoa...@users.noreply.github.com>
    Signed-off-by: Thomas Graves
---
 docs/submitting-applications.md | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/docs/submitting-applications.md b/docs/submitting-applications.md
index 0319859..402dd06 100644
--- a/docs/submitting-applications.md
+++ b/docs/submitting-applications.md
@@ -162,9 +162,10 @@ The master URL passed to Spark can be in one of the following formats:
 Master URL  Meaning
 local  Run Spark locally with one worker thread (i.e. no parallelism at all).
 local[K]  Run Spark locally with K worker threads (ideally, set this to the number of cores on your machine).
-local[K,F]  Run Spark locally with K worker threads and F maxFailures (see spark.task.maxFailures for an explanation of this variable)
+local[K,F]  Run Spark locally with K worker threads and F maxFailures (see spark.task.maxFailures for an explanation of this variable).
 local[*]  Run Spark locally with as many worker threads as logical cores on your machine.
 local[*,F]  Run Spark locally with as many worker threads as logical cores on your machine and F maxFailures.
+local-cluster[N,C,M]  Local-cluster mode is only for unit tests. It emulates a distributed cluster in a single JVM with N number of workers, C cores per worker and M MiB of memory per worker.
 spark://HOST:PORT  Connect to the given Spark standalone cluster master. The port must be whichever one your master is configured to use, which is 7077 by default.
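The bracketed master-URL forms in the table above follow a small, regular grammar. A hypothetical parser for the new `local-cluster[N,C,M]` form (not part of Spark; purely illustrative) makes the three parameters concrete:

```python
import re

def parse_local_cluster(master):
    """Parse a local-cluster[N,C,M] master URL into its parameters:
    N workers, C cores per worker, M MiB of memory per worker."""
    m = re.fullmatch(r"local-cluster\[(\d+),(\d+),(\d+)\]", master)
    if m is None:
        raise ValueError(f"not a local-cluster master URL: {master}")
    workers, cores, mem_mib = map(int, m.groups())
    return workers, cores, mem_mib

# Two workers, one core each, 1024 MiB of memory per worker:
print(parse_local_cluster("local-cluster[2,1,1024]"))  # (2, 1, 1024)
```

For example, `spark-submit --master local-cluster[2,1,1024]` would ask for the configuration shown above; since the mode runs everything in a single JVM, it is suitable only for tests, as the docs change states.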
[spark] branch branch-3.2 updated: Revert "[SPARK-36429][SQL] JacksonParser should throw exception when data type unsupported"
This is an automated email from the ASF dual-hosted git repository.

sarutak pushed a commit to branch branch-3.2
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/branch-3.2 by this push:
     new 586eb5d  Revert "[SPARK-36429][SQL] JacksonParser should throw exception when data type unsupported"

586eb5d is described below

commit 586eb5d4c6b01b008cb0ace076f94f49580201de
Author: Kousuke Saruta
AuthorDate: Fri Aug 6 20:56:24 2021 +0900

    Revert "[SPARK-36429][SQL] JacksonParser should throw exception when data type unsupported"

    ### What changes were proposed in this pull request?
    This PR reverts the change in SPARK-36429 (#33654).
    See [conversation](https://github.com/apache/spark/pull/33654#issuecomment-894160037).

    ### Why are the changes needed?
    To recover CIs.

    ### Does this PR introduce _any_ user-facing change?
    No.

    ### How was this patch tested?
    N/A

    Closes #33670 from sarutak/revert-SPARK-36429.

    Authored-by: Kousuke Saruta
    Signed-off-by: Kousuke Saruta
    (cherry picked from commit e17612d0bfa1b1dc719f6f2c202e2a4ea7870ff1)
    Signed-off-by: Kousuke Saruta
---
 .../scala/org/apache/spark/sql/catalyst/json/JacksonParser.scala  | 8 ++++++--
 .../sql-tests/results/timestampNTZ/timestamp-ansi.sql.out         | 5 ++---
 .../resources/sql-tests/results/timestampNTZ/timestamp.sql.out    | 5 ++---
 3 files changed, 10 insertions(+), 8 deletions(-)

diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JacksonParser.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JacksonParser.scala
index 2761c52..04a0f1a 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JacksonParser.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JacksonParser.scala
@@ -330,8 +330,12 @@ class JacksonParser(
     case udt: UserDefinedType[_] =>
       makeConverter(udt.sqlType)

-    // We don't actually hit this exception though, we keep it for understandability
-    case _ => throw QueryExecutionErrors.unsupportedTypeError(dataType)
+    case _ =>
+      (parser: JsonParser) =>
+        // Here, we pass empty `PartialFunction` so that this case can be
+        // handled as a failed conversion. It will throw an exception as
+        // long as the value is not null.
+        parseJsonToken[AnyRef](parser, dataType)(PartialFunction.empty[JsonToken, AnyRef])
   }

 /**
diff --git a/sql/core/src/test/resources/sql-tests/results/timestampNTZ/timestamp-ansi.sql.out b/sql/core/src/test/resources/sql-tests/results/timestampNTZ/timestamp-ansi.sql.out
index fae7721..fe83675 100644
--- a/sql/core/src/test/resources/sql-tests/results/timestampNTZ/timestamp-ansi.sql.out
+++ b/sql/core/src/test/resources/sql-tests/results/timestampNTZ/timestamp-ansi.sql.out
@@ -661,10 +661,9 @@ You may get a different result due to the upgrading of Spark 3.0: Fail to recogn
 -- !query
 select from_json('{"t":"26/October/2015"}', 't Timestamp', map('timestampFormat', 'dd/M/'))
 -- !query schema
-struct<>
+struct>
 -- !query output
-java.lang.Exception
-Unsupported type: timestamp_ntz
+{"t":null}


 -- !query
diff --git a/sql/core/src/test/resources/sql-tests/results/timestampNTZ/timestamp.sql.out b/sql/core/src/test/resources/sql-tests/results/timestampNTZ/timestamp.sql.out
index c6de535..b8a6800 100644
--- a/sql/core/src/test/resources/sql-tests/results/timestampNTZ/timestamp.sql.out
+++ b/sql/core/src/test/resources/sql-tests/results/timestampNTZ/timestamp.sql.out
@@ -642,10 +642,9 @@ You may get a different result due to the upgrading of Spark 3.0: Fail to recogn
 -- !query
 select from_json('{"t":"26/October/2015"}', 't Timestamp', map('timestampFormat', 'dd/M/'))
 -- !query schema
-struct<>
+struct>
 -- !query output
-java.lang.Exception
-Unsupported type: timestamp_ntz
+{"t":null}


 -- !query
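The reverted-to code builds a converter even for unsupported types: by applying an empty `PartialFunction`, a null value still converts to null and only non-null values fail, at parse time. A rough Python analogue of the two behaviors being swapped here (eager rejection vs. deferred per-value failure; all names are illustrative, not Spark's API):

```python
def make_converter_eager(data_type, supported):
    # SPARK-36429 behavior: reject an unsupported type when the
    # converter is built, before any value is ever seen.
    if data_type not in supported:
        raise TypeError(f"Unsupported type: {data_type}")
    return supported[data_type]

def make_converter_deferred(data_type, supported):
    # Reverted behavior: always build a converter. Like the empty
    # PartialFunction, it handles no token, so null stays null and
    # only a non-null value triggers a conversion failure.
    def convert(token):
        if token is None:
            return None
        conv = supported.get(data_type)
        if conv is None:
            raise ValueError(f"cannot convert {token!r} to {data_type}")
        return conv(token)
    return convert

supported = {"string": str, "int": int}
conv = make_converter_deferred("timestamp_ntz", supported)
print(conv(None))  # None: nulls still parse even for an unsupported type
```

This mirrors why the golden files flip back to `{"t":null}`: with the deferred converter, the unparseable timestamp becomes a null field instead of a query-level `Unsupported type` exception.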
[spark] branch master updated (888f8f0 -> e17612d)
This is an automated email from the ASF dual-hosted git repository.

sarutak pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

    from 888f8f0  [SPARK-36339][SQL] References to grouping that not part of aggregation should be replaced
     add e17612d  Revert "[SPARK-36429][SQL] JacksonParser should throw exception when data type unsupported"

No new revisions were added by this update.

Summary of changes:
 .../scala/org/apache/spark/sql/catalyst/json/JacksonParser.scala  | 8 ++++++--
 .../sql-tests/results/timestampNTZ/timestamp-ansi.sql.out         | 5 ++---
 .../resources/sql-tests/results/timestampNTZ/timestamp.sql.out    | 5 ++---
 3 files changed, 10 insertions(+), 8 deletions(-)
[spark] branch branch-3.2 updated: [SPARK-36339][SQL] References to grouping that not part of aggregation should be replaced
This is an automated email from the ASF dual-hosted git repository.

wenchen pushed a commit to branch branch-3.2
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/branch-3.2 by this push:
     new 33e4ce5  [SPARK-36339][SQL] References to grouping that not part of aggregation should be replaced

33e4ce5 is described below

commit 33e4ce562a59d191b06d477ecc6d9230e43b96b8
Author: gaoyajun02
AuthorDate: Fri Aug 6 16:34:37 2021 +0800

    [SPARK-36339][SQL] References to grouping that not part of aggregation should be replaced

    ### What changes were proposed in this pull request?
    Currently, references to grouping sets are reported as errors after aggregated expressions, e.g.
    ```
    SELECT count(name) c, name
    FROM VALUES ('Alice'), ('Bob') people(name)
    GROUP BY name GROUPING SETS(name);
    ```
    Error in query: expression 'people.`name`' is neither present in the group by, nor is it an aggregate function. Add to group by or wrap in first() (or first_value) if you don't care which value you get.;;

    ### Why are the changes needed?
    Fix the anonymous function passed to `map` in `constructAggregateExprs`: it should use a named parameter instead of the underscore placeholder, so that `replaceGroupingFunc` is applied to each aggregation expression.

    ### Does this PR introduce _any_ user-facing change?
    No

    ### How was this patch tested?
    Unit tests.

    Closes #33574 from gaoyajun02/SPARK-36339.

    Lead-authored-by: gaoyajun02
    Co-authored-by: gaoyajun02
    Signed-off-by: Wenchen Fan
    (cherry picked from commit 888f8f03c89ea7ee8997171eadf64c87e17c4efe)
    Signed-off-by: Wenchen Fan
---
 .../org/apache/spark/sql/catalyst/analysis/Analyzer.scala   |  4 ++--
 .../src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala | 13 +++++++++++++
 2 files changed, 15 insertions(+), 2 deletions(-)

diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
index 75fad11a..963b42b 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
@@ -580,7 +580,7 @@ class Analyzer(override val catalogManager: CatalogManager)
         aggregations: Seq[NamedExpression],
         groupByAliases: Seq[Alias],
         groupingAttrs: Seq[Expression],
-        gid: Attribute): Seq[NamedExpression] = aggregations.map {
+        gid: Attribute): Seq[NamedExpression] = aggregations.map { agg =>
       // collect all the found AggregateExpression, so we can check an expression is part of
       // any AggregateExpression or not.
       val aggsBuffer = ArrayBuffer[Expression]()
@@ -588,7 +588,7 @@ class Analyzer(override val catalogManager: CatalogManager)
       def isPartOfAggregation(e: Expression): Boolean = {
         aggsBuffer.exists(a => a.find(_ eq e).isDefined)
       }
-      replaceGroupingFunc(_, groupByExprs, gid).transformDown {
+      replaceGroupingFunc(agg, groupByExprs, gid).transformDown {
         // AggregateExpression should be computed on the unmodified value of its argument
         // expressions, so we should not replace any references to grouping expression
         // inside it.
diff --git a/sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala b/sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala
index ed3b479..032ddbb 100644
--- a/sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala
+++ b/sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala
@@ -3405,6 +3405,19 @@ class SQLQuerySuite extends QueryTest with SharedSparkSession with AdaptiveSpark
     }
   }

+  test("SPARK-36339: References to grouping attributes should be replaced") {
+    withTempView("t") {
+      Seq("a", "a", "b").toDF("x").createOrReplaceTempView("t")
+      checkAnswer(
+        sql(
+          """
+            |select count(x) c, x from t
+            |group by x grouping sets(x)
+          """.stripMargin),
+        Seq(Row(2, "a"), Row(1, "b")))
+    }
+  }
+
   test("SPARK-31166: UNION map and other maps should not fail") {
     checkAnswer(
       sql("(SELECT map()) UNION ALL (SELECT map(1, 2))"),
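The expected answer in the new regression test, Row(2, "a") and Row(1, "b") for `count(x)` grouped by `x` over values a, a, b, can be sanity-checked with a plain group-by count outside Spark (a Python stand-in, not the Catalyst code path):

```python
from collections import Counter

def count_by_group(rows):
    """GROUP BY x with count(x): each distinct value with its count,
    mirroring the query in the SPARK-36339 regression test."""
    counts = Counter(rows)
    return sorted((n, x) for x, n in counts.items())

print(count_by_group(["a", "a", "b"]))  # [(1, 'b'), (2, 'a')]
```

Before the fix, it was this ordinary query shape (an aggregate plus a bare reference to the grouping column) that the analyzer wrongly rejected; the fix makes it resolve and return the counts above.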
[spark] branch master updated (7bb53b8 -> 888f8f0)
This is an automated email from the ASF dual-hosted git repository.

wenchen pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

    from 7bb53b8  [SPARK-36098][CORE] Grouping exception in core/storage
     add 888f8f0  [SPARK-36339][SQL] References to grouping that not part of aggregation should be replaced

No new revisions were added by this update.

Summary of changes:
 .../org/apache/spark/sql/catalyst/analysis/Analyzer.scala   |  4 ++--
 .../src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala | 13 +++++++++++++
 2 files changed, 15 insertions(+), 2 deletions(-)
[spark] branch master updated (c97fb68 -> 7bb53b8)
This is an automated email from the ASF dual-hosted git repository.

wenchen pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

    from c97fb68  [SPARK-35221][SQL] Add the check of supported join hints
     add 7bb53b8  [SPARK-36098][CORE] Grouping exception in core/storage

No new revisions were added by this update.

Summary of changes:
 .../org/apache/spark/errors/SparkCoreErrors.scala  | 97 +-
 .../scala/org/apache/spark/storage/BlockId.scala   |  4 +-
 .../apache/spark/storage/BlockInfoManager.scala    |  8 +-
 .../org/apache/spark/storage/BlockManager.scala    | 26 +++---
 .../spark/storage/BlockManagerDecommissioner.scala |  3 +-
 .../apache/spark/storage/BlockManagerMaster.scala  |  7 +-
 ...avedOnDecommissionedBlockManagerException.scala |  2 +-
 .../apache/spark/storage/DiskBlockManager.scala    |  5 +-
 .../spark/storage/DiskBlockObjectWriter.scala      |  3 +-
 .../storage/ShuffleBlockFetcherIterator.scala      | 16 ++--
 10 files changed, 133 insertions(+), 38 deletions(-)
[spark] branch master updated (63c7d18 -> c97fb68)
This is an automated email from the ASF dual-hosted git repository.

wenchen pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

    from 63c7d18  [SPARK-36429][SQL][FOLLOWUP] Update a golden file to comply with the change in SPARK-36429
     add c97fb68  [SPARK-35221][SQL] Add the check of supported join hints

No new revisions were added by this update.

Summary of changes:
 .../sql/catalyst/analysis/HintErrorLogger.scala    |  4 +
 .../spark/sql/catalyst/optimizer/joins.scala       |  4 +
 .../spark/sql/catalyst/plans/logical/hints.scala   |  6 ++
 .../spark/sql/execution/SparkStrategies.scala      | 48 ++-
 .../scala/org/apache/spark/sql/JoinHintSuite.scala | 95 ++
 5 files changed, 154 insertions(+), 3 deletions(-)
[spark] branch branch-3.2 updated: [SPARK-36429][SQL][FOLLOWUP] Update a golden file to comply with the change in SPARK-36429
This is an automated email from the ASF dual-hosted git repository.

wenchen pushed a commit to branch branch-3.2
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/branch-3.2 by this push:
     new f3761bd  [SPARK-36429][SQL][FOLLOWUP] Update a golden file to comply with the change in SPARK-36429

f3761bd is described below

commit f3761bdb76559ff666effde31bf14773a75c452b
Author: Kousuke Saruta
AuthorDate: Fri Aug 6 15:20:54 2021 +0800

    [SPARK-36429][SQL][FOLLOWUP] Update a golden file to comply with the change in SPARK-36429

    ### What changes were proposed in this pull request?
    This PR updates a golden file to comply with the change in SPARK-36429 (#33654).

    ### Why are the changes needed?
    To recover GA failure.

    ### Does this PR introduce _any_ user-facing change?
    No.

    ### How was this patch tested?
    GA itself.

    Closes #33663 from sarutak/followup-SPARK-36429.

    Authored-by: Kousuke Saruta
    Signed-off-by: Wenchen Fan
    (cherry picked from commit 63c7d1847d97dca5ceb9a46c77a623cb78565f5b)
    Signed-off-by: Wenchen Fan
---
 .../resources/sql-tests/results/timestampNTZ/timestamp-ansi.sql.out | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/sql/core/src/test/resources/sql-tests/results/timestampNTZ/timestamp-ansi.sql.out b/sql/core/src/test/resources/sql-tests/results/timestampNTZ/timestamp-ansi.sql.out
index fe83675..fae7721 100644
--- a/sql/core/src/test/resources/sql-tests/results/timestampNTZ/timestamp-ansi.sql.out
+++ b/sql/core/src/test/resources/sql-tests/results/timestampNTZ/timestamp-ansi.sql.out
@@ -661,9 +661,10 @@ You may get a different result due to the upgrading of Spark 3.0: Fail to recogn
 -- !query
 select from_json('{"t":"26/October/2015"}', 't Timestamp', map('timestampFormat', 'dd/M/'))
 -- !query schema
-struct>
+struct<>
 -- !query output
-{"t":null}
+java.lang.Exception
+Unsupported type: timestamp_ntz


 -- !query
[spark] branch master updated (eb12727 -> 63c7d18)
This is an automated email from the ASF dual-hosted git repository.

wenchen pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

    from eb12727  [SPARK-36429][SQL] JacksonParser should throw exception when data type unsupported
     add 63c7d18  [SPARK-36429][SQL][FOLLOWUP] Update a golden file to comply with the change in SPARK-36429

No new revisions were added by this update.

Summary of changes:
 .../resources/sql-tests/results/timestampNTZ/timestamp-ansi.sql.out | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)
[spark] branch branch-3.2 updated: [SPARK-36429][SQL] JacksonParser should throw exception when data type unsupported
This is an automated email from the ASF dual-hosted git repository.

wenchen pushed a commit to branch branch-3.2
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/branch-3.2 by this push:
     new be19270  [SPARK-36429][SQL] JacksonParser should throw exception when data type unsupported

be19270 is described below

commit be192708809de363a04895e62bc1ca1216658395
Author: gengjiaan
AuthorDate: Fri Aug 6 12:53:04 2021 +0800

    [SPARK-36429][SQL] JacksonParser should throw exception when data type unsupported

    ### What changes were proposed in this pull request?
    Currently, when we `set spark.sql.timestampType=TIMESTAMP_NTZ`, the behavior differs between `from_json` and `from_csv`:
    ```
    -- !query
    select from_json('{"t":"26/October/2015"}', 't Timestamp', map('timestampFormat', 'dd/M/'))
    -- !query schema
    struct>
    -- !query output
    {"t":null}
    ```
    ```
    -- !query
    select from_csv('26/October/2015', 't Timestamp', map('timestampFormat', 'dd/M/'))
    -- !query schema
    struct<>
    -- !query output
    java.lang.Exception
    Unsupported type: timestamp_ntz
    ```
    We should make `from_json` throw an exception too. This PR fixes the issue discussed at https://github.com/apache/spark/pull/33640#discussion_r682862523.

    ### Why are the changes needed?
    Make the behavior of `from_json` more reasonable.

    ### Does this PR introduce _any_ user-facing change?
    Yes. `from_json` throws an exception when we set spark.sql.timestampType=TIMESTAMP_NTZ.

    ### How was this patch tested?
    Tests updated.

    Closes #33654 from beliefer/SPARK-36429.

    Authored-by: gengjiaan
    Signed-off-by: Wenchen Fan
---
 .../scala/org/apache/spark/sql/catalyst/json/JacksonParser.scala  | 8 ++------
 .../resources/sql-tests/results/timestampNTZ/timestamp.sql.out    | 5 +++--
 2 files changed, 5 insertions(+), 8 deletions(-)

diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JacksonParser.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JacksonParser.scala
index 04a0f1a..2761c52 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JacksonParser.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JacksonParser.scala
@@ -330,12 +330,8 @@ class JacksonParser(
     case udt: UserDefinedType[_] =>
       makeConverter(udt.sqlType)

-    case _ =>
-      (parser: JsonParser) =>
-        // Here, we pass empty `PartialFunction` so that this case can be
-        // handled as a failed conversion. It will throw an exception as
-        // long as the value is not null.
-        parseJsonToken[AnyRef](parser, dataType)(PartialFunction.empty[JsonToken, AnyRef])
+    // We don't actually hit this exception though, we keep it for understandability
+    case _ => throw QueryExecutionErrors.unsupportedTypeError(dataType)
   }

 /**
diff --git a/sql/core/src/test/resources/sql-tests/results/timestampNTZ/timestamp.sql.out b/sql/core/src/test/resources/sql-tests/results/timestampNTZ/timestamp.sql.out
index b8a6800..c6de535 100644
--- a/sql/core/src/test/resources/sql-tests/results/timestampNTZ/timestamp.sql.out
+++ b/sql/core/src/test/resources/sql-tests/results/timestampNTZ/timestamp.sql.out
@@ -642,9 +642,10 @@ You may get a different result due to the upgrading of Spark 3.0: Fail to recogn
 -- !query
 select from_json('{"t":"26/October/2015"}', 't Timestamp', map('timestampFormat', 'dd/M/'))
 -- !query schema
-struct>
+struct<>
 -- !query output
-{"t":null}
+java.lang.Exception
+Unsupported type: timestamp_ntz


 -- !query