[spark] branch master updated (6f68ccf -> d691d85)

2020-11-25 Thread gengliang
This is an automated email from the ASF dual-hosted git repository.

gengliang pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 6f68ccf  [SPARK-31257][SPARK-33561][SQL] Unify create table syntax
 add d691d85  [SPARK-33496][SQL] Improve error message of ANSI explicit cast

No new revisions were added by this update.

Summary of changes:
 .../spark/sql/catalyst/expressions/Cast.scala  | 51 +-
 .../spark/sql/catalyst/expressions/CastSuite.scala | 38 +---
 2 files changed, 82 insertions(+), 7 deletions(-)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated (8594958 -> 29e415d)

2020-12-03 Thread gengliang
This is an automated email from the ASF dual-hosted git repository.

gengliang pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 8594958  [SPARK-33650][SQL] Fix the error from ALTER TABLE .. ADD/DROP 
PARTITION for non-supported partition management table
 add 29e415d  [SPARK-33649][SQL][DOC] Improve the doc of 
spark.sql.ansi.enabled

No new revisions were added by this update.

Summary of changes:
 docs/sql-ref-ansi-compliance.md   |  3 ++-
 .../main/scala/org/apache/spark/sql/internal/SQLConf.scala| 11 ++-
 2 files changed, 8 insertions(+), 6 deletions(-)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated (c24f2b2 -> 5d0045e)

2020-12-01 Thread gengliang
This is an automated email from the ASF dual-hosted git repository.

gengliang pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from c24f2b2  [SPARK-33612][SQL] Add dataSourceRewriteRules batch to 
Optimizer
 add 5d0045e  [SPARK-33611][UI] Avoid encoding twice on the query parameter 
of rewritten proxy URL

No new revisions were added by this update.

Summary of changes:
 core/src/main/scala/org/apache/spark/ui/JettyUtils.scala | 16 ++--
 core/src/test/scala/org/apache/spark/ui/UISuite.scala|  9 +
 2 files changed, 15 insertions(+), 10 deletions(-)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch branch-3.0 updated: [SPARK-33611][UI] Avoid encoding twice on the query parameter of rewritten proxy URL

2020-12-01 Thread gengliang
This is an automated email from the ASF dual-hosted git repository.

gengliang pushed a commit to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.0 by this push:
 new 6abfeb6  [SPARK-33611][UI] Avoid encoding twice on the query parameter 
of rewritten proxy URL
6abfeb6 is described below

commit 6abfeb6884a3cdfe4c6e621219e6cf5a35d6467e
Author: Gengliang Wang 
AuthorDate: Wed Dec 2 01:36:41 2020 +0800

[SPARK-33611][UI] Avoid encoding twice on the query parameter of rewritten 
proxy URL

### What changes were proposed in this pull request?

When running Spark behind a reverse proxy (e.g., Nginx, Apache HTTP Server), 
the request URL can be encoded twice if we pass the query string directly to 
the constructor of `java.net.URI`:
```
> val uri = "http://localhost:8081/test"
> val query = "order%5B0%5D%5Bcolumn%5D=0"  // query string of URL from the reverse proxy
> val rewrittenURI = URI.create(uri.toString())

> new URI(rewrittenURI.getScheme(),
  rewrittenURI.getAuthority(),
  rewrittenURI.getPath(),
  query,
  rewrittenURI.getFragment()).toString
result: http://localhost:8081/test?order%255B0%255D%255Bcolumn%255D=0
```

In Spark's stage page, the URL of "/taskTable" contains the query parameter 
order[0][dir]. After being encoded twice, the query parameter becomes 
`order%255B0%255D%255Bdir%255D`, which is then decoded as 
`order%5B0%5D%5Bdir%5D` instead of `order[0][dir]`. As a result, there will be a 
NullPointerException from 
https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/status/api/v1/StagesResource.scala#L176
Beyond that, other parameters may not work as expected after being encoded 
twice.

This PR fixes the bug by calling the method `URI.create(String)` directly. 
This convenience method avoids encoding the query parameter twice.
```
> val uri = "http://localhost:8081/test"
> val query = "order%5B0%5D%5Bcolumn%5D=0"
> URI.create(s"$uri?$query").toString
result: http://localhost:8081/test?order%5B0%5D%5Bcolumn%5D=0

> URI.create(s"$uri?$query").getQuery
result: order[0][column]=0
```
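For reference, the two constructions can be compared in a self-contained snippet (plain Scala, no Spark required; values taken from the examples above):
```
import java.net.URI

object DoubleEncodingDemo extends App {
  val uri = "http://localhost:8081/test"
  val query = "order%5B0%5D%5Bcolumn%5D=0" // already percent-encoded by the proxy

  // The multi-argument constructor treats `query` as raw text and quotes it
  // again, turning every '%' into "%25" -- the double-encoding bug.
  println(new URI("http", "localhost:8081", "/test", query, null))
  // http://localhost:8081/test?order%255B0%255D%255Bcolumn%255D=0

  // URI.create parses the string as-is; the query stays encoded exactly once.
  println(URI.create(s"$uri?$query"))
  // http://localhost:8081/test?order%5B0%5D%5Bcolumn%5D=0
}
```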

### Why are the changes needed?

Fix a potential bug when Spark's reverse proxy is enabled.
The bug itself is similar to https://github.com/apache/spark/pull/29271.

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

Add a new unit test.
Also performed manual UI testing for the master, worker, and app UIs with an Nginx proxy.

Spark config:
```
spark.ui.port 8080
spark.ui.reverseProxy=true
spark.ui.reverseProxyUrl=/path/to/spark/
```
nginx config:
```
server {
listen 9000;
set $SPARK_MASTER http://127.0.0.1:8080;
# split spark UI path into prefix and local path within master UI
location ~ ^(/path/to/spark/) {
# strip prefix when forwarding request
rewrite /path/to/spark(/.*) $1  break;
#rewrite /path/to/spark/ "/" ;
# forward to spark master UI
proxy_pass $SPARK_MASTER;
proxy_intercept_errors on;
error_page 301 302 307 = handle_redirects;
}
location handle_redirects {
set $saved_redirect_location '$upstream_http_location';
proxy_pass $saved_redirect_location;
}
}
```

Closes #30552 from gengliangwang/decodeProxyRedirect.

Authored-by: Gengliang Wang 
Signed-off-by: Gengliang Wang 
(cherry picked from commit 5d0045eedf4b138c031accac2b1fa1e8d6f3f7c6)
Signed-off-by: Gengliang Wang 
---
 core/src/main/scala/org/apache/spark/ui/JettyUtils.scala | 16 ++--
 core/src/test/scala/org/apache/spark/ui/UISuite.scala|  9 +
 2 files changed, 15 insertions(+), 10 deletions(-)

diff --git a/core/src/main/scala/org/apache/spark/ui/JettyUtils.scala 
b/core/src/main/scala/org/apache/spark/ui/JettyUtils.scala
index a4ba565..3820a88 100644
--- a/core/src/main/scala/org/apache/spark/ui/JettyUtils.scala
+++ b/core/src/main/scala/org/apache/spark/ui/JettyUtils.scala
@@ -400,17 +400,13 @@ private[spark] object JettyUtils extends Logging {
   uri.append(rest)
 }
 
-val rewrittenURI = URI.create(uri.toString())
-if (query != null) {
-  return new URI(
-  rewrittenURI.getScheme(),
-  rewrittenURI.getAuthority(),
-  rewrittenURI.getPath(),
-  query,
-  rewrittenURI.getFragment()
-).normalize()
+val queryString = if (query == null) {
+  ""
+} else {
+  s"?$query"
 }
-rewrittenURI.normalize()
+// SPARK-33611: use method `URI.create` to avoid encoding twice on the query parameter
+URI.create(uri.toString() + queryString).normalize()

[spark] branch master updated (cdd8e51 -> f80fe21)

2020-11-13 Thread gengliang
This is an automated email from the ASF dual-hosted git repository.

gengliang pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from cdd8e51  [SPARK-33419][SQL] Unexpected behavior when using SET 
commands before a query in SparkSession.sql
 add f80fe21  [SPARK-33166][DOC] Provide Search Function in Spark docs site

No new revisions were added by this update.

Summary of changes:
 docs/_layouts/global.html | 23 +++
 docs/css/docsearch.css| 36 
 2 files changed, 59 insertions(+)
 create mode 100644 docs/css/docsearch.css


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated (74bd046 -> a180e02)

2020-11-18 Thread gengliang
This is an automated email from the ASF dual-hosted git repository.

gengliang pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 74bd046  [SPARK-33475][BUILD] Bump ANTLR runtime version to 4.8-1
 add a180e02  [SPARK-32852][SQL][DOC][FOLLOWUP] Revise the documentation of 
spark.sql.hive.metastore.jars

No new revisions were added by this update.

Summary of changes:
 .../org/apache/spark/sql/hive/HiveUtils.scala  | 23 +++---
 1 file changed, 12 insertions(+), 11 deletions(-)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated (b8a440f -> 2b6dfa5)

2020-11-01 Thread gengliang
This is an automated email from the ASF dual-hosted git repository.

gengliang pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from b8a440f  [SPARK-33277][PYSPARK][SQL] Use ContextAwareIterator to stop 
consuming after the task ends
 add 2b6dfa5  [SPARK-20044][UI] Support Spark UI behind front-end reverse 
proxy using a path prefix Revert proxy url

No new revisions were added by this update.

Summary of changes:
 .../main/scala/org/apache/spark/SparkContext.scala |   4 +-
 .../org/apache/spark/deploy/master/Master.scala|   8 +-
 .../spark/deploy/worker/ExecutorRunner.scala   |   3 +-
 .../org/apache/spark/deploy/worker/Worker.scala|   9 +-
 .../main/scala/org/apache/spark/ui/UIUtils.scala   |   3 +-
 .../apache/spark/deploy/master/MasterSuite.scala   | 101 +++--
 docs/configuration.md  |  25 -
 7 files changed, 140 insertions(+), 13 deletions(-)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated (7e8eb04 -> 551b504)

2020-11-04 Thread gengliang
This is an automated email from the ASF dual-hosted git repository.

gengliang pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 7e8eb04  [SPARK-33314][SQL] Avoid dropping rows in Avro reader
 add 551b504  [SPARK-33316][SQL] Support user provided nullable Avro schema 
for non-nullable catalyst schema in Avro writing

No new revisions were added by this update.

Summary of changes:
 .../org/apache/spark/sql/avro/AvroSerializer.scala | 54 
 .../apache/spark/sql/avro/SchemaConverters.scala   |  2 +
 .../apache/spark/sql/avro/AvroFunctionsSuite.scala | 37 ++
 .../org/apache/spark/sql/avro/AvroSuite.scala  | 57 ++
 4 files changed, 140 insertions(+), 10 deletions(-)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated (d163110 -> f6c00079)

2020-11-05 Thread gengliang
This is an automated email from the ASF dual-hosted git repository.

gengliang pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from d163110  [SPARK-32934][SQL][FOLLOW-UP] Refine class naming and code 
comments
 add f6c00079 [SPARK-33342][WEBUI] fix the wrong url and display name of 
blocking thread in threadDump page

No new revisions were added by this update.

Summary of changes:
 .../scala/org/apache/spark/ui/exec/ExecutorThreadDumpPage.scala | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch branch-3.1 updated: [SPARK-34005][CORE][3.1] Update peak memory metrics for each Executor on task end

2021-01-20 Thread gengliang
This is an automated email from the ASF dual-hosted git repository.

gengliang pushed a commit to branch branch-3.1
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.1 by this push:
 new 7b870e3  [SPARK-34005][CORE][3.1] Update peak memory metrics for each 
Executor on task end
7b870e3 is described below

commit 7b870e38d7c6ff46e16785e31a471120fe5b8428
Author: Kousuke Saruta 
AuthorDate: Wed Jan 20 19:50:05 2021 +0800

[SPARK-34005][CORE][3.1] Update peak memory metrics for each Executor on 
task end

### What changes were proposed in this pull request?

This PR backports SPARK-34005 (#31029).
This PR makes `AppStatusListener` update the peak memory metrics for each 
Executor on task end, as is already done for the other peak memory metrics 
(e.g., those for stages and for executors within a stage).

### Why are the changes needed?

When `AppStatusListener#onExecutorMetricsUpdate` is called, the peak memory 
metrics for Executors, stages, and executors within a stage are all updated. 
On task end, however, the Executor-level metrics are currently the only ones 
that are not updated.
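For intuition, here is a minimal sketch of the peak-tracking pattern involved (plain Scala, names invented; in Spark the real logic is `ExecutorMetrics.compareAndUpdatePeakValues`, which the diff below now also calls on task end):
```
// Sketch only: keep the element-wise maximum of all metric samples seen so
// far. Assumes every sample has exactly `numMetrics` entries.
final class PeakTracker(numMetrics: Int) {
  private val peaks = Array.fill(numMetrics)(Long.MinValue)

  /** Folds a sample into the peaks; returns true if any peak was raised. */
  def compareAndUpdate(sample: Array[Long]): Boolean = {
    var updated = false
    for (i <- 0 until numMetrics if sample(i) > peaks(i)) {
      peaks(i) = sample(i)
      updated = true
    }
    updated
  }

  def snapshot: Seq[Long] = peaks.toSeq
}
```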

### Does this PR introduce _any_ user-facing change?

Yes. Executor peak memory metrics are updated more accurately.

### How was this patch tested?

After running a job with `local-cluster[1,1,1024]` and visiting 
`/api/v1//executors`, I confirmed that the `peakExecutorMemory` metrics are 
shown for an Executor even though the lifetime of each job is very short.
I also modified the JSON files for `HistoryServerSuite`.

Closes #31261 from sarutak/SPARK-34005-branch-3.1.

Authored-by: Kousuke Saruta 
Signed-off-by: Gengliang Wang 
---
 .../apache/spark/status/AppStatusListener.scala|  1 +
 .../executor_list_json_expectation.json| 22 ++
 .../executor_memory_usage_expectation.json | 88 ++
 ...executor_node_excludeOnFailure_expectation.json | 88 ++
 ...e_excludeOnFailure_unexcluding_expectation.json | 88 ++
 5 files changed, 287 insertions(+)

diff --git 
a/core/src/main/scala/org/apache/spark/status/AppStatusListener.scala 
b/core/src/main/scala/org/apache/spark/status/AppStatusListener.scala
index 6cb013b..52d41cd 100644
--- a/core/src/main/scala/org/apache/spark/status/AppStatusListener.scala
+++ b/core/src/main/scala/org/apache/spark/status/AppStatusListener.scala
@@ -759,6 +759,7 @@ private[spark] class AppStatusListener(
   exec.completedTasks += completedDelta
   exec.failedTasks += failedDelta
   exec.totalDuration += event.taskInfo.duration
+  
exec.peakExecutorMetrics.compareAndUpdatePeakValues(event.taskExecutorMetrics)
 
   // Note: For resubmitted tasks, we continue to use the metrics that 
belong to the
   // first attempt of this task. This may not be 100% accurate because the 
first attempt
diff --git 
a/core/src/test/resources/HistoryServerExpectations/executor_list_json_expectation.json
 
b/core/src/test/resources/HistoryServerExpectations/executor_list_json_expectation.json
index c18a2e3..be12507 100644
--- 
a/core/src/test/resources/HistoryServerExpectations/executor_list_json_expectation.json
+++ 
b/core/src/test/resources/HistoryServerExpectations/executor_list_json_expectation.json
@@ -21,6 +21,28 @@
   "addTime" : "2015-02-03T16:43:00.906GMT",
   "executorLogs" : { },
   "blacklistedInStages" : [ ],
+  "peakMemoryMetrics" : {
+"JVMHeapMemory" : 0,
+"JVMOffHeapMemory" : 0,
+"OnHeapExecutionMemory" : 0,
+"OffHeapExecutionMemory" : 0,
+"OnHeapStorageMemory" : 0,
+"OffHeapStorageMemory" : 0,
+"OnHeapUnifiedMemory" : 0,
+"OffHeapUnifiedMemory" : 0,
+"DirectPoolMemory" : 0,
+"MappedPoolMemory" : 0,
+"ProcessTreeJVMVMemory" : 0,
+"ProcessTreeJVMRSSMemory" : 0,
+"ProcessTreePythonVMemory" : 0,
+"ProcessTreePythonRSSMemory" : 0,
+"ProcessTreeOtherVMemory" : 0,
+"ProcessTreeOtherRSSMemory" : 0,
+"MinorGCCount" : 0,
+"MinorGCTime" : 0,
+"MajorGCCount" : 0,
+"MajorGCTime" : 0
+  },
   "attributes" : { },
   "resources" : { },
   "resourceProfileId" : 0,
diff --git 
a/core/src/test/resources/HistoryServerExpectations/executor_memory_usage_expectation.json
 
b/core/src/test/resources/HistoryServerExpectations/executor_memory_usage_expectation.json
index 5144934..0a3eb81 100644
--- 
a/core/src/test/resources/HistoryServerExpectations/executor_memory_usage_expectation.json
+++ 
b/core/src/test/resources/HistoryServerExpectations/executor_memory_usage_expectation.json
@@ -64,6 +64,28 @@
 "totalOffHeapStorageMemory" : 524288000
   },
   "blacklistedI

[spark] branch master updated (f2b22d1 -> bd9eeeb)

2021-01-29 Thread gengliang
This is an automated email from the ASF dual-hosted git repository.

gengliang pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from f2b22d1  [SPARK-34289][SQL] Parquet vectorized reader support column 
index
 add bd9eeeb  [SPARK-34288][WEBUI] Add a tip info for the `resources` 
column in the executors page

No new revisions were added by this update.

Summary of changes:
 .../resources/org/apache/spark/ui/static/executorspage-template.html| 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated (e79dd89 -> 1b1a8e4)

2021-06-08 Thread gengliang
This is an automated email from the ASF dual-hosted git repository.

gengliang pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from e79dd89  [SPARK-35512][PYTHON] Fix OverflowError(cannot convert float 
infinity to integer) in partitionBy function
 add 1b1a8e4  [SPARK-30993][FOLLOWUP][SQL] Refactor LocalDateTimeUDT as 
YearUDT in UserDefinedTypeSuite

No new revisions were added by this update.

Summary of changes:
 .../apache/spark/sql/UserDefinedTypeSuite.scala| 34 ++
 1 file changed, 16 insertions(+), 18 deletions(-)

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated (825b620 -> 84c5ca3)

2021-06-09 Thread gengliang
This is an automated email from the ASF dual-hosted git repository.

gengliang pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 825b620  [SPARK-35687][SQL][TEST] PythonUDFSuite move assume into its 
methods
 add 84c5ca3  [SPARK-35664][SQL] Support java.time.LocalDateTime as an 
external type of TimestampWithoutTZ type

No new revisions were added by this update.

Summary of changes:
 .../expressions/SpecializedGettersReader.java  |  3 ++
 .../main/scala/org/apache/spark/sql/Encoders.scala |  8 +
 .../sql/catalyst/CatalystTypeConverters.scala  | 21 +++-
 .../sql/catalyst/DeserializerBuildHelper.scala |  9 ++
 .../apache/spark/sql/catalyst/InternalRow.scala|  4 +--
 .../spark/sql/catalyst/JavaTypeInference.scala |  7 
 .../spark/sql/catalyst/ScalaReflection.scala   | 10 ++
 .../spark/sql/catalyst/SerializerBuildHelper.scala |  9 ++
 .../apache/spark/sql/catalyst/dsl/package.scala|  4 +++
 .../spark/sql/catalyst/encoders/RowEncoder.scala   |  9 ++
 .../expressions/InterpretedUnsafeProjection.scala  |  2 +-
 .../catalyst/expressions/SpecificInternalRow.scala |  4 +--
 .../expressions/codegen/CodeGenerator.scala|  5 +--
 .../spark/sql/catalyst/expressions/literals.scala  | 10 --
 .../spark/sql/catalyst/util/DateTimeUtils.scala|  8 +
 .../org/apache/spark/sql/types/DataType.scala  |  2 +-
 .../sql/catalyst/CatalystTypeConvertersSuite.scala | 31 +-
 .../sql/catalyst/encoders/RowEncoderSuite.scala| 10 ++
 .../expressions/LiteralExpressionSuite.scala   | 11 +++
 .../sql/catalyst/util/DateTimeUtilsSuite.scala | 37 --
 .../scala/org/apache/spark/sql/SQLImplicits.scala  |  3 ++
 .../org/apache/spark/sql/JavaDatasetSuite.java | 13 +---
 .../scala/org/apache/spark/sql/DatasetSuite.scala  |  5 +++
 23 files changed, 206 insertions(+), 19 deletions(-)

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated (ebb4858 -> 43f6b4a)

2021-06-09 Thread gengliang
This is an automated email from the ASF dual-hosted git repository.

gengliang pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from ebb4858  [SPARK-35058][SQL] Group exception messages in hive/client
 add 43f6b4a  [SPARK-35674][SQL][TESTS] Test timestamp without time zone in 
UDF

No new revisions were added by this update.

Summary of changes:
 .../test/scala/org/apache/spark/sql/UDFSuite.scala | 28 ++
 1 file changed, 28 insertions(+)

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated (43f6b4a -> 0b5683a)

2021-06-09 Thread gengliang
This is an automated email from the ASF dual-hosted git repository.

gengliang pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 43f6b4a  [SPARK-35674][SQL][TESTS] Test timestamp without time zone in 
UDF
 add 0b5683a  [SPARK-35694][INFRA] Increase the default JVM stack size of 
SBT/Maven

No new revisions were added by this update.

Summary of changes:
 .github/workflows/build_and_test.yml | 2 +-
 build/sbt| 2 +-
 pom.xml  | 2 +-
 3 files changed, 3 insertions(+), 3 deletions(-)

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated (88f1d82 -> 4180692)

2021-06-10 Thread gengliang
This is an automated email from the ASF dual-hosted git repository.

gengliang pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 88f1d82  [SPARK-34524][SQL][FOLLOWUP] Remove unused 
checkAlterTablePartition in CheckAnalysis.scala
 add 4180692  [SPARK-35711][SQL] Support casting of timestamp without time 
zone to timestamp type

No new revisions were added by this update.

Summary of changes:
 .../spark/sql/catalyst/expressions/Cast.scala  |  5 
 .../spark/sql/catalyst/expressions/CastSuite.scala | 32 ++
 2 files changed, 26 insertions(+), 11 deletions(-)

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated (79362c4 -> 05e2b76)

2021-06-17 Thread gengliang
This is an automated email from the ASF dual-hosted git repository.

gengliang pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 79362c4  [SPARK-34898][CORE] We should log SparkListenerExecutorMetricsUpdateEvent of `driver` appropriately when `spark.eventLog.logStageExecutorMetrics` is true
 add 05e2b76  [SPARK-35720][SQL] Support casting of String to timestamp 
without time zone type

No new revisions were added by this update.

Summary of changes:
 .../spark/sql/catalyst/expressions/Cast.scala  |  28 +
 .../spark/sql/catalyst/util/DateTimeUtils.scala| 123 ++---
 .../catalyst/expressions/AnsiCastSuiteBase.scala   |  13 +++
 .../spark/sql/catalyst/expressions/CastSuite.scala |  11 ++
 .../sql/catalyst/expressions/CastSuiteBase.scala   |  20 
 5 files changed, 180 insertions(+), 15 deletions(-)

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated (2f537a8 -> 2c4598d)

2021-06-18 Thread gengliang
This is an automated email from the ASF dual-hosted git repository.

gengliang pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 2f537a8  [SPARK-35469][PYTHON] Fix disallow_untyped_defs mypy checks
 add 2c4598d  [SPARK-35608][SQL] Support AQE optimizer side 
transformUpWithPruning

No new revisions were added by this update.

Summary of changes:
 .../org/apache/spark/sql/catalyst/trees/TreePatterns.scala |  1 +
 .../sql/execution/adaptive/AQEPropagateEmptyRelation.scala | 10 --
 .../spark/sql/execution/adaptive/LogicalQueryStage.scala   |  2 ++
 3 files changed, 11 insertions(+), 2 deletions(-)

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated (2c91672 -> a100a01)

2021-06-21 Thread gengliang
This is an automated email from the ASF dual-hosted git repository.

gengliang pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 2c91672  [SPARK-35775][SQL][TESTS] Check all year-month interval types 
in aggregate expressions
 add a100a01  [SPARK-35842][INFRA] Ignore all .idea folders

No new revisions were added by this update.

Summary of changes:
 .gitignore | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated (a87ee5d -> 960a7e5)

2021-06-22 Thread gengliang
This is an automated email from the ASF dual-hosted git repository.

gengliang pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from a87ee5d  [SPARK-35695][SQL][FOLLOWUP] Use AQE helper to simplify the 
code in CollectMetricsExec
 add 960a7e5  [SPARK-35856][SQL][TESTS] Move new interval type test cases 
from CastSuite to CastBaseSuite

No new revisions were added by this update.

Summary of changes:
 .../catalyst/expressions/AnsiCastSuiteBase.scala   |   8 ++
 .../spark/sql/catalyst/expressions/CastSuite.scala | 124 +--
 .../sql/catalyst/expressions/CastSuiteBase.scala   | 133 -
 .../sql/catalyst/expressions/TryCastSuite.scala|   2 +-
 4 files changed, 142 insertions(+), 125 deletions(-)

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated: [SPARK-35845][SQL] OuterReference resolution should reject ambiguous column names

2021-06-23 Thread gengliang
This is an automated email from the ASF dual-hosted git repository.

gengliang pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 20edfdd  [SPARK-35845][SQL] OuterReference resolution should reject 
ambiguous column names
20edfdd is described below

commit 20edfdd39a83c52813f91e4028f816d06a6be99e
Author: Wenchen Fan 
AuthorDate: Wed Jun 23 14:32:34 2021 +0800

[SPARK-35845][SQL] OuterReference resolution should reject ambiguous column 
names

### What changes were proposed in this pull request?

The current OuterReference resolution is a bit weird: when the outer plan 
has more than one child, it resolves OuterReference from the output of each 
child, one by one, left to right.

This is incorrect in the case of join, as the column name can be ambiguous 
if both left and right sides output this column.

This PR fixes this bug by resolving OuterReference with 
`outerPlan.resolveChildren`, instead of something like 
`outerPlan.children.foreach(_.resolve(...))`.
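As a hypothetical illustration (all table and column names invented here), consider a correlated subquery in a join condition where both join inputs output a column named `val`:
```
// In a spark-shell session. Inside the subquery, `val` does not resolve
// against t3, so it becomes an outer reference -- ambiguous between the
// two join children, since both t1 and t2 output a column named `val`.
spark.sql("""
  SELECT * FROM t1 JOIN t2
  ON t1.id = t2.id
     AND EXISTS (SELECT 1 FROM t3 WHERE t3.ref = val)
""")
```
Child-by-child resolution would silently bind `val` to the left side; `outerPlan.resolveChildren` detects the ambiguity instead.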

### Why are the changes needed?

bug fix

### Does this PR introduce _any_ user-facing change?

The problem only occurs in joins, and join conditions don't support 
correlated subqueries yet, so this PR only improves the error message. Before 
this PR, people saw
```
java.lang.UnsupportedOperationException
Cannot generate code for expression: outer(t1a#291)
```

### How was this patch tested?

a new test

Closes #33004 from cloud-fan/outer-ref.

Authored-by: Wenchen Fan 
Signed-off-by: Gengliang Wang 
---
 .../spark/sql/catalyst/analysis/Analyzer.scala | 35 +++---
 .../catalyst/optimizer/DecorrelateInnerQuery.scala | 10 ++-
 .../spark/sql/catalyst/optimizer/subquery.scala| 26 
 .../optimizer/DecorrelateInnerQuerySuite.scala |  6 ++--
 .../negative-cases/invalid-correlation.sql |  9 ++
 .../negative-cases/invalid-correlation.sql.out | 24 ++-
 6 files changed, 68 insertions(+), 42 deletions(-)

diff --git 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
index 555be01..ba680ba 100644
--- 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
+++ 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
@@ -2285,8 +2285,8 @@ class Analyzer(override val catalogManager: 
CatalogManager)
 }
 
 /**
- * Resolve the correlated expressions in a subquery by using the an outer 
plans' references. All
- * resolved outer references are wrapped in an [[OuterReference]]
+ * Resolve the correlated expressions in a subquery, as if the expressions 
live in the outer
+ * plan. All resolved outer references are wrapped in an [[OuterReference]]
  */
 private def resolveOuterReferences(plan: LogicalPlan, outer: LogicalPlan): 
LogicalPlan = {
   
plan.resolveOperatorsDownWithPruning(_.containsPattern(UNRESOLVED_ATTRIBUTE)) {
@@ -2295,7 +2295,7 @@ class Analyzer(override val catalogManager: 
CatalogManager)
 case u @ UnresolvedAttribute(nameParts) =>
   withPosition(u) {
 try {
-  outer.resolve(nameParts, resolver) match {
+  outer.resolveChildren(nameParts, resolver) match {
 case Some(outerAttr) => wrapOuterReference(outerAttr)
 case None => u
   }
@@ -2317,7 +2317,7 @@ class Analyzer(override val catalogManager: 
CatalogManager)
  */
 private def resolveSubQuery(
 e: SubqueryExpression,
-plans: Seq[LogicalPlan])(
+outer: LogicalPlan)(
 f: (LogicalPlan, Seq[Expression]) => SubqueryExpression): 
SubqueryExpression = {
   // Step 1: Resolve the outer expressions.
   var previous: LogicalPlan = null
@@ -2328,10 +2328,8 @@ class Analyzer(override val catalogManager: 
CatalogManager)
 current = executeSameContext(current)
 
 // Use the outer references to resolve the subquery plan if it isn't 
resolved yet.
-val i = plans.iterator
-val afterResolve = current
-while (!current.resolved && current.fastEquals(afterResolve) && 
i.hasNext) {
-  current = resolveOuterReferences(current, i.next())
+if (!current.resolved) {
+  current = resolveOuterReferences(current, outer)
 }
   } while (!current.resolved && !current.fastEquals(previous))
 
@@ -2354,20 +2352,20 @@ class Analyzer(override val catalogManager: 
CatalogManager)
  * (2) Any aggregate expression(s) that reference outer attributes are 
pushed down to
  * outer plan to get evaluated.
  */
-private def

[spark] branch master updated (758b423 -> 6f51e37)

2021-06-23 Thread gengliang
This is an automated email from the ASF dual-hosted git repository.

gengliang pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 758b423  [SPARK-35860][SQL] Support UpCast between different field of 
YearMonthIntervalType/DayTimeIntervalType
 add 6f51e37  [SPARK-35857][SQL] The ANSI flag of Cast should be kept after 
being copied

No new revisions were added by this update.

Summary of changes:
 .../org/apache/spark/sql/catalyst/analysis/Analyzer.scala |  2 +-
 .../spark/sql/catalyst/analysis/StreamingJoinHelper.scala |  2 +-
 .../org/apache/spark/sql/catalyst/expressions/Cast.scala  | 11 ---
 .../sql/catalyst/optimizer/UnwrapCastInBinaryComparison.scala |  8 
 .../org/apache/spark/sql/catalyst/optimizer/expressions.scala | 10 +-
 .../sql/catalyst/plans/logical/QueryPlanConstraints.scala |  4 ++--
 sql/core/src/main/scala/org/apache/spark/sql/Column.scala |  2 +-
 .../apache/spark/sql/execution/SubqueryBroadcastExec.scala|  2 +-
 .../sql/execution/analysis/DetectAmbiguousSelfJoin.scala  |  2 +-
 .../scala/org/apache/spark/sql/hive/client/HiveShim.scala |  2 +-
 10 files changed, 25 insertions(+), 20 deletions(-)

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated: [SPARK-35817][SQL] Restore performance of queries against wide Avro tables

2021-06-23 Thread gengliang
This is an automated email from the ASF dual-hosted git repository.

gengliang pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 66d5a00  [SPARK-35817][SQL] Restore performance of queries against 
wide Avro tables
66d5a00 is described below

commit 66d5a0049a638cec7c70566ea880897651aa95f1
Author: Bruce Robbins 
AuthorDate: Wed Jun 23 22:36:56 2021 +0800

[SPARK-35817][SQL] Restore performance of queries against wide Avro tables

### What changes were proposed in this pull request?

When creating a record writer in an AvroDeserializer, or creating a struct 
converter in an AvroSerializer, look up Avro fields using a map rather than 
scanning the entire list of Avro fields.
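A rough sketch of the change (helper and variable names invented; the real helper is exercised by the new `AvroSchemaHelperSuite` in the file list below):
```
import org.apache.avro.Schema
import scala.collection.JavaConverters._

// Build the name -> field index once per Avro schema...
def buildFieldMap(avroSchema: Schema): Map[String, Schema.Field] =
  avroSchema.getFields.asScala.map(f => f.name -> f).toMap

// ...then, for each catalyst field, replace the per-field linear scan
//   avroSchema.getFields.asScala.find(_.name == catalystName)
// with a constant-time map lookup:
//   fieldMap.get(catalystName)
```
With thousands of columns, this turns the matching step from O(n^2) into O(n), which is where the restored performance comes from.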

### Why are the changes needed?

A query against an Avro table can be quite slow when all of the following are true:

* There are many columns in the Avro file
* The query contains a wide projection
* There are many splits in the input
* Some of the splits are read serially (e.g., fewer executors than there are tasks)

A write to an Avro table can be quite slow when all of the following are true:

* There are many columns in the new rows
* The operation is creating many files

For example, a single-threaded query against a 6000 column Avro data set 
with 50K rows and 20 files takes less than a minute with Spark 3.0.1 but over 7 
minutes with Spark 3.2.0-SNAPSHOT.

This PR restores the faster time.

For the 1000 column read benchmark:
Before patch: 108447 ms
After patch: 35925 ms
percent improvement: 66%

For the 1000 column write benchmark:
Before patch: 123307 ms
After patch: 42313 ms
percent improvement: 65%

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

* Ran existing unit tests
* Added new unit tests
* Added new benchmarks

Closes #32969 from bersprockets/SPARK-35817.

Authored-by: Bruce Robbins 
Signed-off-by: Gengliang Wang 
---
 .../avro/benchmarks/AvroReadBenchmark-results.txt  | 115 +++--
 .../avro/benchmarks/AvroWriteBenchmark-results.txt |  20 ++--
 .../apache/spark/sql/avro/AvroDeserializer.scala   |   3 +-
 .../org/apache/spark/sql/avro/AvroSerializer.scala |   4 +-
 .../org/apache/spark/sql/avro/AvroUtils.scala  |  47 +
 .../spark/sql/avro/AvroSchemaHelperSuite.scala |  67 
 .../execution/benchmark/AvroReadBenchmark.scala|  31 ++
 .../execution/benchmark/AvroWriteBenchmark.scala   |  32 ++
 8 files changed, 239 insertions(+), 80 deletions(-)

diff --git a/external/avro/benchmarks/AvroReadBenchmark-results.txt 
b/external/avro/benchmarks/AvroReadBenchmark-results.txt
index f77db2d..5483cf6 100644
--- a/external/avro/benchmarks/AvroReadBenchmark-results.txt
+++ b/external/avro/benchmarks/AvroReadBenchmark-results.txt
@@ -2,129 +2,140 @@
 SQL Single Numeric Column Scan
 ================================================================================

-OpenJDK 64-Bit Server VM 1.8.0_282-b08 on Linux 5.4.0-1043-azure
-Intel(R) Xeon(R) CPU E5-2673 v3 @ 2.40GHz
+OpenJDK 64-Bit Server VM 1.8.0_292-b10 on Linux 4.18.0-193.6.3.el8_2.x86_64
+Intel(R) Xeon(R) Platinum 8175M CPU @ 2.50GHz
 SQL Single TINYINT Column Scan:    Best Time(ms)   Avg Time(ms)   Stdev(ms)   Rate(M/s)   Per Row(ns)   Relative
 ----------------------------------------------------------------------------------------------------------------
-Sum                                         2802           2826          34         5.6         178.1       1.0X
+Sum                                         2648           2658          15         5.9         168.3       1.0X

-OpenJDK 64-Bit Server VM 1.8.0_282-b08 on Linux 5.4.0-1043-azure
-Intel(R) Xeon(R) CPU E5-2673 v3 @ 2.40GHz
+OpenJDK 64-Bit Server VM 1.8.0_292-b10 on Linux 4.18.0-193.6.3.el8_2.x86_64
+Intel(R) Xeon(R) Platinum 8175M CPU @ 2.50GHz
 SQL Single SMALLINT Column Scan:   Best Time(ms)   Avg Time(ms)   Stdev(ms)   Rate(M/s)   Per Row(ns)   Relative
 ----------------------------------------------------------------------------------------------------------------
-Sum                                         2786           2810          35         5.6         177.1       1.0X
+Sum                                         2584           2624          56         6.1         164.3       1.0X

-OpenJDK 64-Bit Server VM 1.8.0_282-b08 on Linux 5.4.0-1043-azure
-Intel(R) Xeon(R) CPU E5-2673 v3 @ 2.40GHz
+OpenJDK 64-Bit Server VM 1.8.0_292-b10 on Linux 4.18.0-193.6.3.el8_2.x86_64
+Intel(R) Xeon(R) Platinum 8175M CPU @ 2.50GHz
 SQL Single INT Column Scan:        Best Time(ms)   Avg Time(ms)   Stdev(ms)   Rate(M/s)   Per Row(ns)   Relative

[spark] branch master updated: [SPARK-35831][YARN][TEST-MAVEN] Handle PathOperationException in copyFileToRemote on the same src and dest

2021-06-21 Thread gengliang
This is an automated email from the ASF dual-hosted git repository.

gengliang pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 2b9902d  [SPARK-35831][YARN][TEST-MAVEN] Handle PathOperationException 
in copyFileToRemote on the same src and dest
2b9902d is described below

commit 2b9902d26a5b7e3aeecfed3aa21744d1d2016d26
Author: Dongjoon Hyun 
AuthorDate: Mon Jun 21 23:28:27 2021 +0800

[SPARK-35831][YARN][TEST-MAVEN] Handle PathOperationException in 
copyFileToRemote on the same src and dest

### What changes were proposed in this pull request?

This PR aims to make Spark more robust against underlying Hadoop library 
changes. Apache Spark's `copyFileToRemote` has an option, `force`, to always 
perform the copy, and this can hit 
`org.apache.hadoop.fs.PathOperationException` in some Hadoop versions.

From Apache Hadoop 3.3.1, we reverted 
[HADOOP-16878](https://issues.apache.org/jira/browse/HADOOP-16878) as the last 
revert commit on `branch-3.3.1`. However, it's still in Apache Hadoop 3.4.0.
- 
https://github.com/apache/hadoop/commit/a3b9c37a397ad4188041dd80621bdeefc46885f2

### Why are the changes needed?

Currently, Apache Spark Jenkins hits a flakiness issue.
- 
https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-master-test-maven-hadoop-3.2/lastCompletedBuild/testReport/org.apache.spark.deploy.yarn/ClientSuite/distribute_jars_archive/history/
- 
https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-master-test-maven-hadoop-3.2-jdk-11/2459/testReport/junit/org.apache.spark.deploy.yarn/ClientSuite/distribute_jars_archive/

```
org.apache.hadoop.fs.PathOperationException:
`Source 
(file:/home/jenkins/workspace/spark-master-test-maven-hadoop-3.2/resource-managers/yarn/target/tmp/spark-703b8e99-63cc-4ba6-a9bc-25c7cae8f5f9/testJar9120517778809167117.jar)
 and destination 
(/home/jenkins/workspace/spark-master-test-maven-hadoop-3.2/resource-managers/yarn/target/tmp/spark-703b8e99-63cc-4ba6-a9bc-25c7cae8f5f9/testJar9120517778809167117.jar)
are equal in the copy command.': Operation not supported
at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:403)
```

Apache Spark has three cases:
- `!compareFs(srcFs, destFs)`: This is safe because we will not have this exception.
- `"file".equals(srcFs.getScheme)`: This is safe because this cannot be a `false` alarm.
- `force=true`:
  - For the `good` alarm part, Spark works in the same way.
  - For the `false` alarm part, Spark is safe because we use `force = true` only for copying `localConfArchive` instead of a general copy between two random clusters.

```scala
val localConfArchive = new Path(createConfArchive(confsToOverride).toURI())
copyFileToRemote(destDir, localConfArchive, replication, symlinkCache, 
force = true,
destName = Some(LOCALIZED_CONF_ARCHIVE))
```

### Does this PR introduce _any_ user-facing change?

No. This preserves the previous Apache Spark behavior.

### How was this patch tested?

Pass the Jenkins with Maven.

Closes #32983 from dongjoon-hyun/SPARK-35831.

Authored-by: Dongjoon Hyun 
Signed-off-by: Gengliang Wang 
---
 .../yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala | 8 +++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git 
a/resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala
 
b/resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala
index 427202f..364bc3b 100644
--- 
a/resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala
+++ 
b/resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala
@@ -401,7 +401,13 @@ private[spark] class Client(
 if (force || !compareFs(srcFs, destFs) || "file".equals(srcFs.getScheme)) {
   destPath = new Path(destDir, destName.getOrElse(srcPath.getName()))
   logInfo(s"Uploading resource $srcPath -> $destPath")
-  FileUtil.copy(srcFs, srcPath, destFs, destPath, false, hadoopConf)
+  try {
+FileUtil.copy(srcFs, srcPath, destFs, destPath, false, hadoopConf)
+  } catch {
+// HADOOP-16878 changes the behavior to throw exceptions when src 
equals to dest
+case e: PathOperationException
+if 
srcFs.makeQualified(srcPath).equals(destFs.makeQualified(destPath)) =>
+  }
   destFs.setReplication(destPath, replication)
   destFs.setPermission(destPath, new FsPermission(APP_FILE_PERMISSION))
 } else {

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated (6ca56b0 -> 2bdd9fe)

2021-06-21 Thread gengliang
This is an automated email from the ASF dual-hosted git repository.

gengliang pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 6ca56b0  [SPARK-35614][PYTHON] Make the conversion to pandas 
data-type-based for ExtensionDtypes
 add 2bdd9fe  [SPARK-35839][SQL] New SQL function: to_timestamp_ntz

No new revisions were added by this update.

Summary of changes:
 .../sql/catalyst/analysis/FunctionRegistry.scala   |   1 +
 .../catalyst/expressions/datetimeExpressions.scala |  99 ++-
 .../catalyst/util/DateTimeFormatterHelper.scala|   2 +-
 .../sql/catalyst/util/TimestampFormatter.scala |  55 +++-
 .../expressions/DateExpressionsSuite.scala |  29 +-
 .../apache/spark/sql/execution/HiveResult.scala|   3 +-
 .../sql-functions/sql-expression-schema.md |   5 +-
 .../test/resources/sql-tests/inputs/datetime.sql   |  46 +++
 .../sql-tests/results/ansi/datetime.sql.out| 325 -
 .../sql-tests/results/datetime-legacy.sql.out  | 317 +++-
 .../resources/sql-tests/results/datetime.sql.out   | 317 +++-
 .../SparkExecuteStatementOperation.scala   |   3 +-
 12 files changed, 1187 insertions(+), 15 deletions(-)
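
For context, a hedged usage sketch of the new function (the input strings and the format-pattern overload shown here are illustrative assumptions, not taken from this notification):

```scala
// to_timestamp_ntz parses a string into a timestamp without time zone.
// Minimal sketch, assuming a running SparkSession `spark` with this patch applied.
spark.sql("SELECT to_timestamp_ntz('2021-06-21 00:00:00')").show()
spark.sql("SELECT to_timestamp_ntz('21/06/2021', 'dd/MM/yyyy')").show()
```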

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated (bc61b62 -> ce53b71)

2021-06-22 Thread gengliang
This is an automated email from the ASF dual-hosted git repository.

gengliang pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from bc61b62  [SPARK-35727][SQL] Return INTERVAL DAY from dates subtraction
 add ce53b71  [SPARK-35854][SQL] Improve the error message of 
to_timestamp_ntz with invalid format pattern

No new revisions were added by this update.

Summary of changes:
 .../catalyst/util/DateTimeFormatterHelper.scala|  7 +-
 .../sql/catalyst/util/TimestampFormatter.scala | 29 --
 .../spark/sql/errors/QueryExecutionErrors.scala| 10 ++--
 .../sql-tests/results/ansi/datetime.sql.out| 12 -
 .../sql-tests/results/datetime-legacy.sql.out  | 12 -
 .../resources/sql-tests/results/datetime.sql.out   | 12 -
 6 files changed, 53 insertions(+), 29 deletions(-)

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch branch-3.1 updated (8bcc6a4 -> 4ad6001)

2021-06-24 Thread gengliang
This is an automated email from the ASF dual-hosted git repository.

gengliang pushed a change to branch branch-3.1
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 8bcc6a4  [SPARK-35885][K8S][R] Use keyserver.ubuntu.com as a keyserver 
for CRAN
 add 4ad6001  [SPARK-35817][SQL][3.1] Restore performance of queries 
against wide Avro tables

No new revisions were added by this update.

Summary of changes:
 .../avro/benchmarks/AvroReadBenchmark-results.txt  | 115 +++--
 .../avro/benchmarks/AvroWriteBenchmark-results.txt |  20 ++--
 .../apache/spark/sql/avro/AvroDeserializer.scala   |   3 +-
 .../org/apache/spark/sql/avro/AvroSerializer.scala |   3 +-
 .../org/apache/spark/sql/avro/AvroUtils.scala  |  50 +
 .../spark/sql/avro/AvroSchemaHelperSuite.scala |  67 
 .../execution/benchmark/AvroReadBenchmark.scala|  31 ++
 .../execution/benchmark/AvroWriteBenchmark.scala   |  32 ++
 8 files changed, 241 insertions(+), 80 deletions(-)
 create mode 100644 
external/avro/src/test/scala/org/apache/spark/sql/avro/AvroSchemaHelperSuite.scala

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated: [SPARK-35889][SQL] Support adding TimestampWithoutTZ with Interval types

2021-06-25 Thread gengliang
This is an automated email from the ASF dual-hosted git repository.

gengliang pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 9814cf8  [SPARK-35889][SQL] Support adding TimestampWithoutTZ with 
Interval types
9814cf8 is described below

commit 9814cf88533c049036cee5f6d62346f237dcec19
Author: Gengliang Wang 
AuthorDate: Fri Jun 25 19:58:42 2021 +0800

[SPARK-35889][SQL] Support adding TimestampWithoutTZ with Interval types

### What changes were proposed in this pull request?

Support the following operations (a usage sketch follows the list):

- TimestampWithoutTZ + Calendar interval
- TimestampWithoutTZ + Year-Month interval
- TimestampWithoutTZ + Daytime interval
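
A hedged SQL-level sketch of the new behavior, reusing the `to_timestamp_ntz` function from elsewhere in this thread (the literal values are illustrative):

```scala
// Adding year-month and day-time intervals to a timestamp without time zone.
// Minimal sketch, assuming a running SparkSession `spark`.
spark.sql("SELECT to_timestamp_ntz('2021-06-25 10:00:00') + INTERVAL '1-2' YEAR TO MONTH").show()
spark.sql("SELECT to_timestamp_ntz('2021-06-25 10:00:00') + INTERVAL '3 04:05:06' DAY TO SECOND").show()
```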

### Why are the changes needed?

Support the basic '+' operator for the timestamp without time zone type.

### Does this PR introduce _any_ user-facing change?

No, the timestamp without time zone type is not released yet.

### How was this patch tested?

Unit tests

Closes #33076 from gengliangwang/addForNewTS.

Authored-by: Gengliang Wang 
Signed-off-by: Gengliang Wang 
---
 .../spark/sql/catalyst/analysis/Analyzer.scala |   6 +-
 .../catalyst/expressions/datetimeExpressions.scala |  28 ++-
 .../apache/spark/sql/types/AbstractDataType.scala  |   8 +
 .../expressions/DateExpressionsSuite.scala | 245 +++--
 .../test/resources/sql-tests/inputs/datetime.sql   |  11 +
 .../sql-tests/results/ansi/datetime.sql.out|  76 ++-
 .../sql-tests/results/datetime-legacy.sql.out  |  76 ++-
 .../resources/sql-tests/results/datetime.sql.out   |  76 ++-
 .../typeCoercion/native/dateTimeOperations.sql.out |  54 ++---
 9 files changed, 424 insertions(+), 156 deletions(-)

diff --git 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
index 0a3bd09..6737ed5 100644
--- 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
+++ 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
@@ -357,8 +357,10 @@ class Analyzer(override val catalogManager: CatalogManager)
   case (_: DayTimeIntervalType, DateType) => TimeAdd(Cast(r, 
TimestampType), l)
   case (DateType, _: YearMonthIntervalType) => DateAddYMInterval(l, r)
   case (_: YearMonthIntervalType, DateType) => DateAddYMInterval(r, l)
-  case (TimestampType, _: YearMonthIntervalType) => 
TimestampAddYMInterval(l, r)
-  case (_: YearMonthIntervalType, TimestampType) => 
TimestampAddYMInterval(r, l)
+  case (TimestampType | TimestampWithoutTZType, _: 
YearMonthIntervalType) =>
+TimestampAddYMInterval(l, r)
+  case (_: YearMonthIntervalType, TimestampType | 
TimestampWithoutTZType) =>
+TimestampAddYMInterval(r, l)
   case (CalendarIntervalType, CalendarIntervalType) |
(_: DayTimeIntervalType, _: DayTimeIntervalType) => a
   case (DateType, CalendarIntervalType) => DateAddInterval(l, r, 
ansiEnabled = f)
diff --git 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/datetimeExpressions.scala
 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/datetimeExpressions.scala
index 63f6c03..d84b6eb 100644
--- 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/datetimeExpressions.scala
+++ 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/datetimeExpressions.scala
@@ -59,6 +59,11 @@ trait TimeZoneAwareExpression extends Expression {
   def withTimeZone(timeZoneId: String): TimeZoneAwareExpression
 
   @transient lazy val zoneId: ZoneId = DateTimeUtils.getZoneId(timeZoneId.get)
+
+  def zoneIdForType(dataType: DataType): ZoneId = dataType match {
+case _: TimestampWithoutTZType => java.time.ZoneOffset.UTC
+case _ => zoneId
+  }
 }
 
 trait TimestampFormatterHelper extends TimeZoneAwareExpression {
@@ -1446,23 +1451,25 @@ case class TimeAdd(start: Expression, interval: 
Expression, timeZoneId: Option[S
   override def toString: String = s"$left + $right"
   override def sql: String = s"${left.sql} + ${right.sql}"
   override def inputTypes: Seq[AbstractDataType] =
-Seq(TimestampType, TypeCollection(CalendarIntervalType, 
DayTimeIntervalType))
+Seq(TypeCollection.AllTimestampTypes, TypeCollection(CalendarIntervalType, 
DayTimeIntervalType))
 
-  override def dataType: DataType = TimestampType
+  override def dataType: DataType = start.dataType
 
   override def withTimeZone(timeZoneId: String): TimeZoneAwareExpression =
 copy(timeZoneId = Option(timeZoneId))
 
+  @transient private lazy val zoneIdInEval: ZoneId = 
zoneIdForType(left.dataType)
+
   over

[spark] branch master updated (f49bf1a -> 74b3df8)

2021-06-09 Thread gengliang
This is an automated email from the ASF dual-hosted git repository.

gengliang pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from f49bf1a  [SPARK-34382][SQL] Support LATERAL subqueries
 add 74b3df8  [SPARK-35698][SQL] Support casting of timestamp without time 
zone to strings

No new revisions were added by this update.

Summary of changes:
 .../apache/spark/sql/catalyst/expressions/Cast.scala   | 11 ++-
 .../spark/sql/catalyst/expressions/CastSuite.scala | 18 +-
 2 files changed, 23 insertions(+), 6 deletions(-)
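
A hedged illustration of the newly supported cast; `to_timestamp_ntz` is borrowed from another commit in this thread purely to produce a timestamp-without-time-zone value:

```scala
// Casting a timestamp without time zone to a string.
// Minimal sketch, assuming a running SparkSession `spark`.
spark.sql("SELECT CAST(to_timestamp_ntz('2021-06-09 10:00:00') AS STRING)").show()
```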

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated (b74260f -> c382d40)

2021-06-15 Thread gengliang
This is an automated email from the ASF dual-hosted git repository.

gengliang pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from b74260f  [SPARK-35765][SQL] Distinct aggs are not duplicate sensitive
 add c382d40  [SPARK-35766][SQL][TESTS] Break down CastSuite/AnsiCastSuite 
into multiple files

No new revisions were added by this update.

Summary of changes:
 .../catalyst/expressions/AnsiCastSuiteBase.scala   |  481 +++
 .../spark/sql/catalyst/expressions/CastSuite.scala | 1357 +---
 .../sql/catalyst/expressions/CastSuiteBase.scala   |  930 ++
 3 files changed, 1412 insertions(+), 1356 deletions(-)
 create mode 100644 
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/AnsiCastSuiteBase.scala
 create mode 100644 
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/CastSuiteBase.scala

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated (6c5fcac -> 02c99f1)

2021-05-13 Thread gengliang
This is an automated email from the ASF dual-hosted git repository.

gengliang pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 6c5fcac  [SPARK-35373][BUILD] Check Maven artifact checksum in 
build/mvn
 add 02c99f1  [SPARK-35162][SQL] New SQL functions: TRY_ADD/TRY_DIVIDE

No new revisions were added by this update.

Summary of changes:
 .../sql/catalyst/analysis/FunctionRegistry.scala   |   2 +
 .../spark/sql/catalyst/expressions/TryEval.scala   | 110 +
 ...deterministicSuite.scala => TryEvalSuite.scala} |  32 --
 .../sql-functions/sql-expression-schema.md |   4 +-
 .../resources/sql-tests/inputs/try_arithmetic.sql  |  11 +++
 .../sql-tests/results/try_arithmetic.sql.out   |  66 +
 6 files changed, 215 insertions(+), 10 deletions(-)
 create mode 100644 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/TryEval.scala
 copy 
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/{NondeterministicSuite.scala
 => TryEvalSuite.scala} (56%)
 create mode 100644 
sql/core/src/test/resources/sql-tests/inputs/try_arithmetic.sql
 create mode 100644 
sql/core/src/test/resources/sql-tests/results/try_arithmetic.sql.out
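
A hedged usage sketch; the NULL-on-error semantics are inferred from the `try_` naming and are an assumption, since this summary-only notification shows no examples:

```scala
// try_divide is assumed to return NULL instead of raising an error on division
// by zero, and try_add to return NULL instead of failing on integer overflow.
// Minimal sketch, assuming a running SparkSession `spark`.
spark.sql("SELECT try_divide(6, 2), try_divide(1, 0)").show()
spark.sql("SELECT try_add(1, 2), try_add(2147483647, 1)").show()
```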

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated (c4ca232 -> 7c9a9ec)

2021-05-11 Thread gengliang
This is an automated email from the ASF dual-hosted git repository.

gengliang pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from c4ca232  [SPARK-35363][SQL] Refactor sort merge join code-gen be 
agnostic to join type
 add 7c9a9ec  [SPARK-35146][SQL] Migrate to transformWithPruning or 
resolveWithPruning for rules in finishAnalysis.scala

No new revisions were added by this update.

Summary of changes:
 .../apache/spark/sql/catalyst/expressions/Expression.scala|  3 +++
 .../spark/sql/catalyst/expressions/aggregate/CountIf.scala|  3 +++
 .../sql/catalyst/expressions/aggregate/UnevaluableAggs.scala  |  3 +++
 .../spark/sql/catalyst/expressions/datetimeExpressions.scala  |  5 +
 .../org/apache/spark/sql/catalyst/expressions/misc.scala  |  3 +++
 .../apache/spark/sql/catalyst/optimizer/finishAnalysis.scala  | 11 +++
 .../org/apache/spark/sql/catalyst/trees/TreePatterns.scala|  4 
 7 files changed, 28 insertions(+), 4 deletions(-)

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated: [SPARK-35298][SQL] Migrate to transformWithPruning for rules in Optimizer.scala

2021-05-12 Thread gengliang
This is an automated email from the ASF dual-hosted git repository.

gengliang pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new d92018e  [SPARK-35298][SQL] Migrate to transformWithPruning for rules 
in Optimizer.scala
d92018e is described below

commit d92018ee358b0009dac626e2c5568db8363f53ee
Author: Yingyi Bu 
AuthorDate: Wed May 12 20:42:47 2021 +0800

[SPARK-35298][SQL] Migrate to transformWithPruning for rules in 
Optimizer.scala

### What changes were proposed in this pull request?

Added the following TreePattern enums:
- ALIAS
- AND_OR
- AVERAGE
- GENERATE
- INTERSECT
- SORT
- SUM
- DISTINCT_LIKE
- PROJECT
- REPARTITION_OPERATION
- UNION

Added tree traversal pruning to the following rules in Optimizer.scala (a sketch of the common migration pattern follows the list):
- EliminateAggregateFilter
- RemoveRedundantAggregates
- RemoveNoopOperators
- RemoveNoopUnion
- LimitPushDown
- ColumnPruning
- CollapseRepartition
- OptimizeRepartition
- OptimizeWindowFunctions
- CollapseWindow
- TransposeWindow
- InferFiltersFromGenerate
- InferFiltersFromConstraints
- CombineUnions
- CombineFilters
- EliminateSorts
- PruneFilters
- EliminateLimits
- DecimalAggregates
- ConvertToLocalRelation
- ReplaceDistinctWithAggregate
- ReplaceIntersectWithSemiJoin
- ReplaceExceptWithAntiJoin
- RewriteExceptAll
- RewriteIntersectAll
- RemoveLiteralFromGroupExpressions
- RemoveRepetitionFromGroupExpressions
- OptimizeLimitZero
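
A minimal sketch of that pattern, modeled on the `transformWithPruning` calls quoted elsewhere in this thread; the rule body and the PROJECT pattern are illustrative, not the exact code of any rule above:

```scala
import org.apache.spark.sql.catalyst.plans.logical.{LogicalPlan, Project}
import org.apache.spark.sql.catalyst.rules.Rule
import org.apache.spark.sql.catalyst.trees.TreePattern.PROJECT

// Before: def apply(plan: LogicalPlan): LogicalPlan = plan.transform { ... }
// After: the traversal is skipped entirely when the tree contains no PROJECT node.
object ExampleNoopProjectRule extends Rule[LogicalPlan] {
  def apply(plan: LogicalPlan): LogicalPlan = plan.transformWithPruning(
    _.containsPattern(PROJECT), ruleId) {
    case p @ Project(_, child) if p.output == child.output => child
  }
}
```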

### Why are the changes needed?

Reduce the number of tree traversals and hence improve the query 
compilation latency.

perf diff:
Rule name | Total Time (baseline) | Total Time (experiment) | experiment/baseline
RemoveRedundantAggregates | 51290766 | 67070477 | 1.31
RemoveNoopOperators | 192371141 | 196631275 | 1.02
RemoveNoopUnion | 49222561 | 43266681 | 0.88
LimitPushDown | 40885185 | 21672646 | 0.53
ColumnPruning | 2003406120 | 1285562149 | 0.64
CollapseRepartition | 40648048 | 72646515 | 1.79
OptimizeRepartition | 37813850 | 20600803 | 0.54
OptimizeWindowFunctions | 174426904 | 46741409 | 0.27
CollapseWindow | 38959957 | 24542426 | 0.63
TransposeWindow | 33533191 | 20414930 | 0.61
InferFiltersFromGenerate | 21758688 | 15597344 | 0.72
InferFiltersFromConstraints | 518009794 | 493282321 | 0.95
CombineUnions | 67694022 | 70550382 | 1.04
CombineFilters | 35265060 | 29005424 | 0.82
EliminateSorts | 57025509 | 19795776 | 0.35
PruneFilters | 433964815 | 465579200 | 1.07
EliminateLimits | 44275393 | 24476859 | 0.55
DecimalAggregates | 83143172 | 28816090 | 0.35
ReplaceDistinctWithAggregate | 21783760 | 18287489 | 0.84
ReplaceIntersectWithSemiJoin | 22311271 | 16566393 | 0.74
ReplaceExceptWithAntiJoin | 23838520 | 16588808 | 0.70
RewriteExceptAll | 32750296 | 29421957 | 0.90
RewriteIntersectAll | 29760454 | 21243599 | 0.71
RemoveLiteralFromGroupExpressions | 28151861 | 25270947 | 0.90
RemoveRepetitionFromGroupExpressions | 29587030 | 23447041 | 0.79
OptimizeLimitZero | 18081943 | 15597344 | 0.86
**Accumulated | 4129959311 | 3112676285 | 0.75**

### How was this patch tested?

Existing tests.

Closes #32439 from sigmod/optimizer.

Authored-by: Yingyi Bu 
Signed-off-by: Gengliang Wang 
---
 .../catalyst/expressions/aggregate/Average.scala   |   3 +
 .../sql/catalyst/expressions/aggregate/Sum.scala   |   3 +
 .../catalyst/expressions/namedExpressions.scala|   2 +
 .../spark/sql/catalyst/optimizer/Optimizer.scala   | 113 ++---
 .../plans/logical/basicLogicalOperators.scala  |  10 ++
 .../sql/catalyst/rules/RuleIdCollection.scala  |  24 +
 .../spark/sql/catalyst/trees/TreePatterns.scala|  11 +-
 7 files changed, 128 insertions(+), 38 deletions(-)

diff --git 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/Average.scala
 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/Average.scala
index 8ae24e5..82ad2df 100644
--- 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/Average.scala
+++ 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/Average.scala
@@ -20,6 +20,7 @@ package org.apache.spark.sql.catalyst.expressions.aggregate
 import org.apache.spark.sql.catalyst.analysis.{DecimalPrecision, 
FunctionRegistry, TypeCheckResult}
 import org.apache.spark.sql.catalyst.dsl.expressions._
 import org.apache.spark.sql.catalyst.expressions._
+import org.apache.spark.sql.catalyst.trees.TreePattern.{AVERAGE, TreePattern}
 import org.apache.spark.sql.catalyst.trees.UnaryLike
 import org.apache.spark.sql.catalyst.util.TypeUtils
 import

[spark] branch master updated: [SPARK-35144][SQL] Migrate to transformWithPruning for object rules

2021-05-07 Thread gengliang
This is an automated email from the ASF dual-hosted git repository.

gengliang pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 72d3266  [SPARK-35144][SQL] Migrate to transformWithPruning for object 
rules
72d3266 is described below

commit 72d32662d470e286a639783fed8dcf6c3948
Author: Yingyi Bu 
AuthorDate: Fri May 7 18:36:28 2021 +0800

[SPARK-35144][SQL] Migrate to transformWithPruning for object rules

### What changes were proposed in this pull request?

Added the following TreePattern enums:
- APPEND_COLUMNS
- DESERIALIZE_TO_OBJECT
- LAMBDA_VARIABLE
- MAP_OBJECTS
- SERIALIZE_FROM_OBJECT
- PROJECT
- TYPED_FILTER

Added tree traversal pruning to the following rules dealing with objects:
- EliminateSerialization
- CombineTypedFilters
- EliminateMapObjects
- ObjectSerializerPruning

### Why are the changes needed?

Reduce the number of tree traversals and hence improve the query 
compilation latency.

### How was this patch tested?

Existing tests.

Closes #32451 from sigmod/object.

Authored-by: Yingyi Bu 
Signed-off-by: Gengliang Wang 
---
 .../spark/sql/catalyst/expressions/objects/objects.scala  |  6 +-
 .../org/apache/spark/sql/catalyst/optimizer/objects.scala | 15 ++-
 .../catalyst/plans/logical/basicLogicalOperators.scala|  2 ++
 .../apache/spark/sql/catalyst/plans/logical/object.scala  |  8 
 .../spark/sql/catalyst/rules/RuleIdCollection.scala   |  5 +
 .../apache/spark/sql/catalyst/trees/TreePatterns.scala|  7 +++
 6 files changed, 37 insertions(+), 6 deletions(-)

diff --git 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/objects/objects.scala
 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/objects/objects.scala
index 469c895..40378a3 100644
--- 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/objects/objects.scala
+++ 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/objects/objects.scala
@@ -33,7 +33,7 @@ import org.apache.spark.sql.catalyst.expressions._
 import org.apache.spark.sql.catalyst.expressions.codegen._
 import org.apache.spark.sql.catalyst.expressions.codegen.Block._
 import org.apache.spark.sql.catalyst.trees.TernaryLike
-import org.apache.spark.sql.catalyst.trees.TreePattern.{NULL_CHECK, 
TreePattern}
+import org.apache.spark.sql.catalyst.trees.TreePattern._
 import org.apache.spark.sql.catalyst.util.{ArrayBasedMapData, ArrayData, 
GenericArrayData, MapData}
 import org.apache.spark.sql.errors.QueryExecutionErrors
 import org.apache.spark.sql.types._
@@ -669,6 +669,8 @@ case class LambdaVariable(
 
   private val accessor: (InternalRow, Int) => Any = 
InternalRow.getAccessor(dataType, nullable)
 
+  final override val nodePatterns: Seq[TreePattern] = Seq(LAMBDA_VARIABLE)
+
   // Interpreted execution of `LambdaVariable` always get the 0-index element 
from input row.
   override def eval(input: InternalRow): Any = {
 assert(input.numFields == 1,
@@ -781,6 +783,8 @@ case class MapObjects private(
   override def second: Expression = lambdaFunction
   override def third: Expression = inputData
 
+  final override val nodePatterns: Seq[TreePattern] = Seq(MAP_OBJECTS)
+
   // The data with UserDefinedType are actually stored with the data type of 
its sqlType.
   // When we want to apply MapObjects on it, we have to use it.
   lazy private val inputDataType = inputData.dataType match {
diff --git 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/objects.scala
 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/objects.scala
index 97712a0..52544ff 100644
--- 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/objects.scala
+++ 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/objects.scala
@@ -24,6 +24,7 @@ import org.apache.spark.sql.catalyst.expressions._
 import org.apache.spark.sql.catalyst.expressions.objects._
 import org.apache.spark.sql.catalyst.plans.logical._
 import org.apache.spark.sql.catalyst.rules._
+import org.apache.spark.sql.catalyst.trees.TreePattern._
 import org.apache.spark.sql.types.{ArrayType, DataType, MapType, StructType, 
UserDefinedType}
 
 /*
@@ -35,7 +36,8 @@ import org.apache.spark.sql.types.{ArrayType, DataType, 
MapType, StructType, Use
  * representation of data item.  For example back to back map operations.
  */
 object EliminateSerialization extends Rule[LogicalPlan] {
-  def apply(plan: LogicalPlan): LogicalPlan = plan transform {
+  def apply(plan: LogicalPlan): LogicalPlan = plan.transformWithPruning(
+_.containsAnyPattern(DESERIALIZE_TO_OBJECT, APPEND_COLUMNS, TYPED_FILTER), 
ruleId) {
 case d @ DeserializeToObject(_, _, s: SerializeFromObj

[spark] branch master updated (7182f8c -> d2a535f)

2021-05-10 Thread gengliang
This is an automated email from the ASF dual-hosted git repository.

gengliang pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 7182f8c  [SPARK-35360][SQL] RepairTableCommand respects 
`spark.sql.addPartitionInBatch.size` too
 add d2a535f  [SPARK-34246][FOLLOWUP] Change the definition of 
`findTightestCommonType` for backward compatibility

No new revisions were added by this update.

Summary of changes:
 .../sql/catalyst/analysis/AnsiTypeCoercion.scala   | 53 ++
 .../spark/sql/catalyst/analysis/TypeCoercion.scala |  6 +--
 2 files changed, 27 insertions(+), 32 deletions(-)

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated (4fe4b65 -> 7970318)

2021-05-05 Thread gengliang
This is an automated email from the ASF dual-hosted git repository.

gengliang pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 4fe4b65  [SPARK-35315][TESTS] Keep benchmark result consistent between 
spark-submit and SBT
 add 7970318  [SPARK-35155][SQL] Add rule id pruning to Analyzer rules

No new revisions were added by this update.

Summary of changes:
 .../spark/sql/catalyst/analysis/Analyzer.scala | 105 +
 .../catalyst/analysis/DeduplicateRelations.scala   |   4 +-
 .../spark/sql/catalyst/analysis/ResolveHints.scala |   8 +-
 .../catalyst/analysis/ResolveInlineTables.scala|   4 +-
 .../spark/sql/catalyst/analysis/ResolveUnion.scala |   4 +-
 .../analysis/SubstituteUnresolvedOrdinals.scala|   4 +-
 .../catalyst/analysis/higherOrderFunctions.scala   |   3 +-
 .../sql/catalyst/analysis/timeZoneAnalysis.scala   |   3 +-
 .../sql/catalyst/optimizer/UpdateFields.scala  |   4 +-
 .../sql/catalyst/rules/RuleIdCollection.scala  |  41 
 10 files changed, 133 insertions(+), 47 deletions(-)

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated (7c9a9ec -> 2b6640a)

2021-05-11 Thread gengliang
This is an automated email from the ASF dual-hosted git repository.

gengliang pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 7c9a9ec  [SPARK-35146][SQL] Migrate to transformWithPruning or 
resolveWithPruning for rules in finishAnalysis.scala
 add 2b6640a  [SPARK-35229][WEBUI] Limit the maximum number of items on the 
timeline view

No new revisions were added by this update.

Summary of changes:
 .../org/apache/spark/internal/config/UI.scala  | 15 +
 .../org/apache/spark/ui/jobs/AllJobsPage.scala | 39 --
 .../scala/org/apache/spark/ui/jobs/JobPage.scala   | 39 --
 .../scala/org/apache/spark/ui/jobs/JobsTab.scala   |  1 +
 .../scala/org/apache/spark/ui/jobs/StagePage.scala |  3 +-
 docs/configuration.md  | 32 ++
 6 files changed, 121 insertions(+), 8 deletions(-)

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated (2e9936d -> e1296ea)

2021-05-20 Thread gengliang
This is an automated email from the ASF dual-hosted git repository.

gengliang pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 2e9936d  [SPARK-35456][CORE] Print the invalid value in config 
validation error message
 add e1296ea  [SPARK-35445][SQL] Reduce the execution time of 
DeduplicateRelations

No new revisions were added by this update.

Summary of changes:
 .../catalyst/analysis/DeduplicateRelations.scala   | 88 ++
 1 file changed, 56 insertions(+), 32 deletions(-)

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch branch-3.1 updated: [SPARK-35514][INFRA] Automatically update version index of DocSearch via release-tag.sh

2021-05-25 Thread gengliang
This is an automated email from the ASF dual-hosted git repository.

gengliang pushed a commit to branch branch-3.1
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.1 by this push:
 new ac4d95e  [SPARK-35514][INFRA] Automatically update version index of 
DocSearch via release-tag.sh
ac4d95e is described below

commit ac4d95e465c28cc42c0c3f9adba42457ce763f51
Author: Gengliang Wang 
AuthorDate: Wed May 26 00:30:44 2021 +0800

[SPARK-35514][INFRA] Automatically update version index of DocSearch via 
release-tag.sh

### What changes were proposed in this pull request?

Automatically update the version index of DocSearch via release-tag.sh when releasing a new documentation site, instead of the current manual update.

### Why are the changes needed?

Simplify the release process.

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

Manually run the following command and check the diff
```
R_NEXT_VERSION=3.2.0
sed -i".tmp8" "s/'facetFilters':.*$/'facetFilters': 
[\"version:$R_NEXT_VERSION\"]/g" docs/_config.yml
```

Closes #32662 from gengliangwang/updateDocsearchInRelease.
    
Authored-by: Gengliang Wang 
Signed-off-by: Gengliang Wang 
(cherry picked from commit 321c6545b38976b8b051ac1e80666f96922d5950)
Signed-off-by: Gengliang Wang 
---
 dev/create-release/release-tag.sh | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/dev/create-release/release-tag.sh 
b/dev/create-release/release-tag.sh
index a9a518f..4be1f9a 100755
--- a/dev/create-release/release-tag.sh
+++ b/dev/create-release/release-tag.sh
@@ -106,6 +106,8 @@ sed -i".tmp5" 's/__version__ = .*$/__version__ = 
"'"$R_NEXT_VERSION.dev0"'"/' py
 sed -i".tmp6" 's/SPARK_VERSION:.*$/SPARK_VERSION: '"$NEXT_VERSION"'/g' 
docs/_config.yml
 # Use R version for short version
 sed -i".tmp7" 's/SPARK_VERSION_SHORT:.*$/SPARK_VERSION_SHORT: 
'"$R_NEXT_VERSION"'/g' docs/_config.yml
+# Update the version index of DocSearch as the short version
+sed -i".tmp8" "s/'facetFilters':.*$/'facetFilters': 
[\"version:$R_NEXT_VERSION\"]/g" docs/_config.yml
 
 git commit -a -m "Preparing development version $NEXT_VERSION"
 

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated: [SPARK-35514][INFRA] Automatically update version index of DocSearch via release-tag.sh

2021-05-25 Thread gengliang
This is an automated email from the ASF dual-hosted git repository.

gengliang pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 321c654  [SPARK-35514][INFRA] Automatically update version index of 
DocSearch via release-tag.sh
321c654 is described below

commit 321c6545b38976b8b051ac1e80666f96922d5950
Author: Gengliang Wang 
AuthorDate: Wed May 26 00:30:44 2021 +0800

[SPARK-35514][INFRA] Automatically update version index of DocSearch via 
release-tag.sh

### What changes were proposed in this pull request?

Automatically update the version index of DocSearch via release-tag.sh when releasing a new documentation site, instead of the current manual update.

### Why are the changes needed?

Simplify the release process.

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

Manually run the following command and check the diff
```
R_NEXT_VERSION=3.2.0
sed -i".tmp8" "s/'facetFilters':.*$/'facetFilters': 
[\"version:$R_NEXT_VERSION\"]/g" docs/_config.yml
```

Closes #32662 from gengliangwang/updateDocsearchInRelease.
    
Authored-by: Gengliang Wang 
Signed-off-by: Gengliang Wang 
---
 dev/create-release/release-tag.sh | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/dev/create-release/release-tag.sh 
b/dev/create-release/release-tag.sh
index a9a518f..4be1f9a 100755
--- a/dev/create-release/release-tag.sh
+++ b/dev/create-release/release-tag.sh
@@ -106,6 +106,8 @@ sed -i".tmp5" 's/__version__ = .*$/__version__ = 
"'"$R_NEXT_VERSION.dev0"'"/' py
 sed -i".tmp6" 's/SPARK_VERSION:.*$/SPARK_VERSION: '"$NEXT_VERSION"'/g' 
docs/_config.yml
 # Use R version for short version
 sed -i".tmp7" 's/SPARK_VERSION_SHORT:.*$/SPARK_VERSION_SHORT: 
'"$R_NEXT_VERSION"'/g' docs/_config.yml
+# Update the version index of DocSearch as the short version
+sed -i".tmp8" "s/'facetFilters':.*$/'facetFilters': 
[\"version:$R_NEXT_VERSION\"]/g" docs/_config.yml
 
 git commit -a -m "Preparing development version $NEXT_VERSION"
 

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated (0ad5ae5 -> 9d0d4ed)

2021-06-01 Thread gengliang
This is an automated email from the ASF dual-hosted git repository.

gengliang pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 0ad5ae5  [SPARK-35539][PYTHON] Restore to_koalas to keep the backward 
compatibility
 add 9d0d4ed  [SPARK-35595][TESTS] Support multiple loggers in testing 
method withLogAppender

No new revisions were added by this update.

Summary of changes:
 .../scala/org/apache/spark/SparkFunSuite.scala | 24 ++
 .../catalyst/expressions/CodeGenerationSuite.scala |  2 +-
 .../adaptive/AdaptiveQueryExecSuite.scala  |  6 --
 3 files changed, 21 insertions(+), 11 deletions(-)
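
A hedged sketch of what the change enables; the `loggerNames` parameter name and its `Seq[String]` type are assumptions based on the title, since this summary-only notification does not show the new signature:

```scala
// Capture log events from more than one logger within a single test body.
// Minimal sketch, assuming SparkFunSuite's LogAppender and withLogAppender helpers.
val appender = new LogAppender("multi-logger capture")
withLogAppender(appender,
  loggerNames = Seq("org.apache.spark.sql.catalyst", "org.apache.spark.sql.execution")) {
  // run the code under test here
}
```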

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated (c2de0a6 -> 3f6322f)

2021-06-01 Thread gengliang
This is an automated email from the ASF dual-hosted git repository.

gengliang pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from c2de0a6  [SPARK-35100][ML] Refactor AFT - support virtual centering
 add 3f6322f  [SPARK-35077][SQL] Migrate to transformWithPruning for 
leftover optimizer rules

No new revisions were added by this update.

Summary of changes:
 .../apache/spark/sql/catalyst/optimizer/ComplexTypes.scala   |  6 --
 .../sql/catalyst/optimizer/NormalizeFloatingNumbers.scala|  3 ++-
 .../org/apache/spark/sql/catalyst/optimizer/Optimizer.scala  |  9 +++--
 .../org/apache/spark/sql/catalyst/optimizer/joins.scala  |  5 +++--
 .../apache/spark/sql/catalyst/rules/RuleIdCollection.scala   |  1 +
 .../dynamicpruning/CleanupDynamicPruningFilters.scala|  8 ++--
 .../spark/sql/execution/python/ExtractPythonUDFs.scala   | 12 +---
 7 files changed, 32 insertions(+), 12 deletions(-)

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated: [SPARK-35604][SQL] Fix condition check for FULL OUTER sort merge join

2021-06-02 Thread gengliang
This is an automated email from the ASF dual-hosted git repository.

gengliang pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 54e  [SPARK-35604][SQL] Fix condition check for FULL OUTER sort 
merge join
54e is described below

commit 54ed39823c4fc236f328fe55e46607515cd0
Author: Cheng Su 
AuthorDate: Wed Jun 2 14:01:34 2021 +0800

[SPARK-35604][SQL] Fix condition check for FULL OUTER sort merge join

### What changes were proposed in this pull request?

The condition check for FULL OUTER sort merge join
(https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/joins/SortMergeJoinExec.scala#L1368)
makes an unnecessary trip when `leftIndex == leftMatches.size` or `rightIndex == rightMatches.size`. Though this does not affect correctness (`scanNextInBuffered()` returns false anyway), we can avoid the extra check in the first place.

### Why are the changes needed?

Better readability for developers, and it avoids unnecessary execution.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Existing unit tests, such as `OuterJoinSuite.scala`.

Closes #32736 from c21/join-bug.

Authored-by: Cheng Su 
Signed-off-by: Gengliang Wang 
---
 .../scala/org/apache/spark/sql/execution/joins/SortMergeJoinExec.scala  | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git 
a/sql/core/src/main/scala/org/apache/spark/sql/execution/joins/SortMergeJoinExec.scala
 
b/sql/core/src/main/scala/org/apache/spark/sql/execution/joins/SortMergeJoinExec.scala
index c565f91..5873754 100644
--- 
a/sql/core/src/main/scala/org/apache/spark/sql/execution/joins/SortMergeJoinExec.scala
+++ 
b/sql/core/src/main/scala/org/apache/spark/sql/execution/joins/SortMergeJoinExec.scala
@@ -1365,7 +1365,7 @@ private class SortMergeFullOuterJoinScanner(
 
   def advanceNext(): Boolean = {
 // If we already buffered some matching rows, use them directly
-if (leftIndex <= leftMatches.size || rightIndex <= rightMatches.size) {
+if (leftIndex < leftMatches.size || rightIndex < rightMatches.size) {
   if (scanNextInBuffered()) {
 return true
   }

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch branch-3.1 updated (264ce7b -> 92fb23e)

2021-06-01 Thread gengliang
This is an automated email from the ASF dual-hosted git repository.

gengliang pushed a change to branch branch-3.1
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 264ce7b  [SPARK-35573][R][TESTSt] Make SparkR tests pass with R 4.1+
 add 92fb23e  [SPARK-35576][SQL][3.1] Redact the sensitive info in the 
result of Set command

No new revisions were added by this update.

Summary of changes:
 .../main/scala/org/apache/spark/sql/internal/SQLConf.scala  |  9 -
 .../org/apache/spark/sql/execution/command/SetCommand.scala |  6 --
 .../src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala | 13 +
 3 files changed, 25 insertions(+), 3 deletions(-)

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated (73d4f67 -> 1dd0ca2)

2021-05-31 Thread gengliang
This is an automated email from the ASF dual-hosted git repository.

gengliang pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 73d4f67  [SPARK-35433][DOCS] Move CSV data source options from Python 
and Scala into a single page
 add 1dd0ca2  [SPARK-35544][SQL] Add tree pattern pruning to Analyzer rules

No new revisions were added by this update.

Summary of changes:
 .../spark/sql/catalyst/analysis/Analyzer.scala | 99 --
 .../sql/catalyst/analysis/CTESubstitution.scala|  9 +-
 .../catalyst/analysis/DeduplicateRelations.scala   |  4 +-
 .../analysis/ResolveCommandsWithIfExists.scala |  4 +-
 .../spark/sql/catalyst/analysis/ResolveHints.scala | 10 ++-
 .../catalyst/analysis/ResolvePartitionSpec.scala   |  4 +-
 .../spark/sql/catalyst/analysis/ResolveUnion.scala |  4 +-
 .../analysis/SubstituteUnresolvedOrdinals.scala|  4 +-
 .../analysis/UpdateAttributeNullability.scala  |  4 +-
 .../catalyst/analysis/higherOrderFunctions.scala   |  8 +-
 .../sql/catalyst/analysis/timeZoneAnalysis.scala   |  6 +-
 .../spark/sql/catalyst/analysis/unresolved.scala   | 12 +++
 .../sql/catalyst/analysis/v2ResolutionPlans.scala  |  2 +
 .../spark/sql/catalyst/expressions/Cast.scala  |  6 +-
 .../spark/sql/catalyst/expressions/PythonUDF.scala |  3 +
 .../spark/sql/catalyst/expressions/ScalaUDF.scala  |  3 +
 .../sql/catalyst/expressions/TimeWindow.scala  |  2 +
 .../expressions/aggregate/interfaces.scala |  3 +
 .../catalyst/expressions/datetimeExpressions.scala | 12 ++-
 .../sql/catalyst/expressions/generators.scala  |  3 +
 .../spark/sql/catalyst/expressions/grouping.scala  |  4 +
 .../expressions/higherOrderFunctions.scala |  5 ++
 .../sql/catalyst/expressions/jsonExpressions.scala |  2 +-
 .../sql/catalyst/expressions/objects/objects.scala |  2 +
 .../spark/sql/catalyst/plans/logical/Command.scala |  2 +
 .../plans/logical/EventTimeWatermark.scala |  3 +
 .../plans/logical/basicLogicalOperators.scala  |  4 +
 .../spark/sql/catalyst/plans/logical/hints.scala   |  2 +
 .../sql/catalyst/rules/RuleIdCollection.scala  |  2 +
 .../spark/sql/catalyst/trees/TreePatterns.scala| 32 ++-
 30 files changed, 188 insertions(+), 72 deletions(-)

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated (7bc364b -> 510bde4)

2021-06-05 Thread gengliang
This is an automated email from the ASF dual-hosted git repository.

gengliang pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 7bc364b  [SPARK-35621][SQL] Add rule id pruning to the TypeCoercion 
rule
 add 510bde4  [SPARK-35655][BUILD] Upgrade HtmlUnit and its related 
artifacts to 2.50

No new revisions were added by this update.

Summary of changes:
 pom.xml | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated (b5678be -> 7bc364b)

2021-06-05 Thread gengliang
This is an automated email from the ASF dual-hosted git repository.

gengliang pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from b5678be  [SPARK-35446] Override getJDBCType in MySQLDialect to map 
FloatType to FLOAT
 add 7bc364b  [SPARK-35621][SQL] Add rule id pruning to the TypeCoercion 
rule

No new revisions were added by this update.

Summary of changes:
 .../spark/sql/catalyst/analysis/TypeCoercion.scala | 25 +++---
 .../sql/catalyst/rules/RuleIdCollection.scala  | 29 -
 .../apache/spark/sql/catalyst/trees/TreeNode.scala | 38 ++
 .../catalyst/analysis/AnsiTypeCoercionSuite.scala  |  7 
 .../sql/catalyst/analysis/TypeCoercionSuite.scala  |  7 
 5 files changed, 91 insertions(+), 15 deletions(-)

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated (53a758b -> c7fb0e1)

2021-06-04 Thread gengliang
This is an automated email from the ASF dual-hosted git repository.

gengliang pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 53a758b  [SPARK-35636][SQL] Lambda keys should not be referenced 
outside of the lambda function
 add c7fb0e1  [SPARK-35629][SQL] Use better exception type if database 
doesn't exist on `drop database`

No new revisions were added by this update.

Summary of changes:
 .../apache/spark/sql/catalyst/catalog/SessionCatalog.scala  |  3 +++
 .../spark/sql/catalyst/catalog/SessionCatalogSuite.scala| 13 ++---
 .../org/apache/spark/sql/execution/command/DDLSuite.scala   |  7 +--
 3 files changed, 6 insertions(+), 17 deletions(-)

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated: [SPARK-35552][SQL] Make query stage materialized more readable

2021-05-28 Thread gengliang
This is an automated email from the ASF dual-hosted git repository.

gengliang pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 3b94aad  [SPARK-35552][SQL] Make query stage materialized more readable
3b94aad is described below

commit 3b94aad5e72a6b96e4a8f517ac60e0a2fed2590b
Author: ulysses-you 
AuthorDate: Fri May 28 20:42:11 2021 +0800

[SPARK-35552][SQL] Make query stage materialized more readable

### What changes were proposed in this pull request?

Add a new method `isMaterialized` in `QueryStageExec`.

### Why are the changes needed?

Currently, we use `resultOption.get().isDefined` to check whether a query stage has been materialized. The code is not readable at a glance; it's better to introduce a dedicated method like `isMaterialized` to express it.
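
A minimal sketch of the helper (its definition sits in the truncated `QueryStageExec.scala` hunk of this diff; `resultOption` is assumed to be the same field dereferenced in the hunks below):

```scala
// In QueryStageExec: true once the stage has finished materializing its result.
def isMaterialized: Boolean = resultOption.get().isDefined
```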

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Pass CI.

Closes #32689 from ulysses-you/SPARK-35552.

Authored-by: ulysses-you 
Signed-off-by: Gengliang Wang 
---
 .../spark/sql/execution/adaptive/AQEPropagateEmptyRelation.scala   | 5 ++---
 .../spark/sql/execution/adaptive/AdaptiveSparkPlanExec.scala   | 6 +++---
 .../apache/spark/sql/execution/adaptive/DynamicJoinSelection.scala | 2 +-
 .../org/apache/spark/sql/execution/adaptive/QueryStageExec.scala   | 7 +--
 4 files changed, 11 insertions(+), 9 deletions(-)

diff --git 
a/sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/AQEPropagateEmptyRelation.scala
 
b/sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/AQEPropagateEmptyRelation.scala
index 614fc78..648d2e7 100644
--- 
a/sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/AQEPropagateEmptyRelation.scala
+++ 
b/sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/AQEPropagateEmptyRelation.scala
@@ -37,14 +37,13 @@ object AQEPropagateEmptyRelation extends 
PropagateEmptyRelationBase {
 super.nonEmpty(plan) || getRowCount(plan).exists(_ > 0)
 
   private def getRowCount(plan: LogicalPlan): Option[BigInt] = plan match {
-case LogicalQueryStage(_, stage: QueryStageExec) if 
stage.resultOption.get().isDefined =>
+case LogicalQueryStage(_, stage: QueryStageExec) if stage.isMaterialized =>
   stage.getRuntimeStatistics.rowCount
 case _ => None
   }
 
   private def isRelationWithAllNullKeys(plan: LogicalPlan): Boolean = plan 
match {
-case LogicalQueryStage(_, stage: BroadcastQueryStageExec)
-  if stage.resultOption.get().isDefined =>
+case LogicalQueryStage(_, stage: BroadcastQueryStageExec) if 
stage.isMaterialized =>
   stage.broadcast.relationFuture.get().value == 
HashedRelationWithAllNullKeys
 case _ => false
   }
diff --git 
a/sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/AdaptiveSparkPlanExec.scala
 
b/sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/AdaptiveSparkPlanExec.scala
index 556c036..ebff790 100644
--- 
a/sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/AdaptiveSparkPlanExec.scala
+++ 
b/sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/AdaptiveSparkPlanExec.scala
@@ -420,7 +420,7 @@ case class AdaptiveSparkPlanExec(
   context.stageCache.get(e.canonicalized) match {
 case Some(existingStage) if conf.exchangeReuseEnabled =>
   val stage = reuseQueryStage(existingStage, e)
-  val isMaterialized = stage.resultOption.get().isDefined
+  val isMaterialized = stage.isMaterialized
   CreateStageResult(
 newPlan = stage,
 allChildStagesMaterialized = isMaterialized,
@@ -442,7 +442,7 @@ case class AdaptiveSparkPlanExec(
 newStage = reuseQueryStage(queryStage, e)
   }
 }
-val isMaterialized = newStage.resultOption.get().isDefined
+val isMaterialized = newStage.isMaterialized
 CreateStageResult(
   newPlan = newStage,
   allChildStagesMaterialized = isMaterialized,
@@ -455,7 +455,7 @@ case class AdaptiveSparkPlanExec(
 
 case q: QueryStageExec =>
   CreateStageResult(newPlan = q,
-allChildStagesMaterialized = q.resultOption.get().isDefined, newStages 
= Seq.empty)
+allChildStagesMaterialized = q.isMaterialized, newStages = Seq.empty)
 
 case _ =>
   if (plan.children.isEmpty) {
diff --git 
a/sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/DynamicJoinSelection.scala
 
b/sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/DynamicJoinSelection.scala
index 61124f0..a8c74b5 100644
--- 
a/sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/DynamicJoinSelection.scala
+++ 
b/sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/DynamicJoinSelection.scala
@@ -53,7 +53,7 @@ object Dyn

[spark] branch branch-3.2 updated: [SPARK-36025][SQL][TESTS] Reduce the run time of DateExpressionsSuite

2021-07-06 Thread gengliang
This is an automated email from the ASF dual-hosted git repository.

gengliang pushed a commit to branch branch-3.2
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.2 by this push:
 new 8f26722  [SPARK-36025][SQL][TESTS] Reduce the run time of 
DateExpressionsSuite
8f26722 is described below

commit 8f267226e45f18c8fe6b6a252a50e204a1a0731c
Author: Gengliang Wang 
AuthorDate: Tue Jul 6 20:17:02 2021 +0800

[SPARK-36025][SQL][TESTS] Reduce the run time of DateExpressionsSuite

### What changes were proposed in this pull request?

Some of the test cases in `DateExpressionsSuite` are quite slow:

- `Hour`: 24s
- `Minute`: 26s
- `Day / DayOfMonth`: 8s
- `Year`: 4s

Each test case has a large loop. We should improve them.

### Why are the changes needed?

Save test running time

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

Verified the run times locally:
- `Hour`: 2s
- `Minute`: 3.2s
- `Day / DayOfMonth`: 0.5s
- `Year`: 2s

Total reduced time: 54.3s

Closes #33229 from gengliangwang/improveTest.

Authored-by: Gengliang Wang 
Signed-off-by: Gengliang Wang 
(cherry picked from commit d5d12226861f67243dd575c9240238bcd08e1a91)
Signed-off-by: Gengliang Wang 
---
 .../expressions/DateExpressionsSuite.scala | 49 ++
 1 file changed, 23 insertions(+), 26 deletions(-)

diff --git 
a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/DateExpressionsSuite.scala
 
b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/DateExpressionsSuite.scala
index d33fb7d..afcc729 100644
--- 
a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/DateExpressionsSuite.scala
+++ 
b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/DateExpressionsSuite.scala
@@ -25,7 +25,9 @@ import java.time.temporal.ChronoUnit
 import java.util.{Calendar, Locale, TimeZone}
 import java.util.concurrent.TimeUnit._
 
+import scala.language.postfixOps
 import scala.reflect.ClassTag
+import scala.util.Random
 
 import org.apache.spark.{SparkFunSuite, SparkUpgradeException}
 import org.apache.spark.sql.catalyst.InternalRow
@@ -122,8 +124,8 @@ class DateExpressionsSuite extends SparkFunSuite with 
ExpressionEvalHelper {
 (2000 to 2002).foreach { y =>
   (0 to 11 by 11).foreach { m =>
 c.set(y, m, 28)
-(0 to 5 * 24).foreach { i =>
-  c.add(Calendar.HOUR_OF_DAY, 1)
+(0 to 12).foreach { i =>
+  c.add(Calendar.HOUR_OF_DAY, 10)
   checkEvaluation(Year(Literal(new Date(c.getTimeInMillis))),
 c.get(Calendar.YEAR))
 }
@@ -195,8 +197,9 @@ class DateExpressionsSuite extends SparkFunSuite with 
ExpressionEvalHelper {
 val c = Calendar.getInstance()
 (1999 to 2000).foreach { y =>
   c.set(y, 0, 1, 0, 0, 0)
-  (0 to 365).foreach { d =>
-c.add(Calendar.DATE, 1)
+  val random = new Random(System.nanoTime)
+  random.shuffle(0 to 365 toList).take(10).foreach { d =>
+c.set(Calendar.DAY_OF_YEAR, d)
 checkEvaluation(DayOfMonth(Literal(new Date(c.getTimeInMillis))),
   c.get(Calendar.DAY_OF_MONTH))
   }
@@ -332,19 +335,15 @@ class DateExpressionsSuite extends SparkFunSuite with 
ExpressionEvalHelper {
   val timeZoneId = Option(zid.getId)
   c.setTimeZone(TimeZone.getTimeZone(zid))
   (0 to 24 by 5).foreach { h =>
-(0 to 60 by 29).foreach { m =>
-  (0 to 60 by 29).foreach { s =>
-// validate timestamp with local time zone
-c.set(2015, 18, 3, h, m, s)
-checkEvaluation(
-  Hour(Literal(new Timestamp(c.getTimeInMillis)), timeZoneId),
-  c.get(Calendar.HOUR_OF_DAY))
+// validate timestamp with local time zone
+c.set(2015, 18, 3, h, 29, 59)
+checkEvaluation(
+  Hour(Literal(new Timestamp(c.getTimeInMillis)), timeZoneId),
+  c.get(Calendar.HOUR_OF_DAY))
 
-// validate timestamp without time zone
-val localDateTime = LocalDateTime.of(2015, 1, 3, h, m, s)
-checkEvaluation(Hour(Literal(localDateTime), timeZoneId), h)
-  }
-}
+// validate timestamp without time zone
+val localDateTime = LocalDateTime.of(2015, 1, 3, h, 29, 59)
+checkEvaluation(Hour(Literal(localDateTime), timeZoneId), h)
   }
   Seq(TimestampType, TimestampNTZType).foreach { dt =>
 checkConsistencyBetweenInterpretedAndCodegen(
@@ -367,17 +366,15 @@ class DateExpressionsSuite extends SparkFunSuite with 
ExpressionEvalHelper {
   val timeZoneId = Option(zid.getId)
   c.setTimeZone(TimeZone.getTimeZone(zid))
   (0 to 59 by 5).foreach { m =>
-(0 to 59 by 15).f

[spark] branch master updated: [SPARK-36025][SQL][TESTS] Reduce the run time of DateExpressionsSuite

2021-07-06 Thread gengliang
This is an automated email from the ASF dual-hosted git repository.

gengliang pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new d5d1222  [SPARK-36025][SQL][TESTS] Reduce the run time of 
DateExpressionsSuite
d5d1222 is described below

commit d5d12226861f67243dd575c9240238bcd08e1a91
Author: Gengliang Wang 
AuthorDate: Tue Jul 6 20:17:02 2021 +0800

[SPARK-36025][SQL][TESTS] Reduce the run time of DateExpressionsSuite

### What changes were proposed in this pull request?

Some of the test cases in `DateExpressionsSuite` are quite slow:

- `Hour`: 24s
- `Minute`: 26s
- `Day / DayOfMonth`: 8s
- `Year`: 4s

Each test case has a large loop. We should improve them.

### Why are the changes needed?

Save test running time

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

Verified the run times locally:
- `Hour`: 2s
- `Minute`: 3.2s
- `Day / DayOfMonth`: 0.5s
- `Year`: 2s

Total reduced time: 54.3s

Closes #33229 from gengliangwang/improveTest.

Authored-by: Gengliang Wang 
Signed-off-by: Gengliang Wang 
---
 .../expressions/DateExpressionsSuite.scala | 49 ++
 1 file changed, 23 insertions(+), 26 deletions(-)

diff --git 
a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/DateExpressionsSuite.scala
 
b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/DateExpressionsSuite.scala
index d33fb7d..afcc729 100644
--- 
a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/DateExpressionsSuite.scala
+++ 
b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/DateExpressionsSuite.scala
@@ -25,7 +25,9 @@ import java.time.temporal.ChronoUnit
 import java.util.{Calendar, Locale, TimeZone}
 import java.util.concurrent.TimeUnit._
 
+import scala.language.postfixOps
 import scala.reflect.ClassTag
+import scala.util.Random
 
 import org.apache.spark.{SparkFunSuite, SparkUpgradeException}
 import org.apache.spark.sql.catalyst.InternalRow
@@ -122,8 +124,8 @@ class DateExpressionsSuite extends SparkFunSuite with 
ExpressionEvalHelper {
 (2000 to 2002).foreach { y =>
   (0 to 11 by 11).foreach { m =>
 c.set(y, m, 28)
-(0 to 5 * 24).foreach { i =>
-  c.add(Calendar.HOUR_OF_DAY, 1)
+(0 to 12).foreach { i =>
+  c.add(Calendar.HOUR_OF_DAY, 10)
   checkEvaluation(Year(Literal(new Date(c.getTimeInMillis))),
 c.get(Calendar.YEAR))
 }
@@ -195,8 +197,9 @@ class DateExpressionsSuite extends SparkFunSuite with 
ExpressionEvalHelper {
 val c = Calendar.getInstance()
 (1999 to 2000).foreach { y =>
   c.set(y, 0, 1, 0, 0, 0)
-  (0 to 365).foreach { d =>
-c.add(Calendar.DATE, 1)
+  val random = new Random(System.nanoTime)
+  random.shuffle(0 to 365 toList).take(10).foreach { d =>
+c.set(Calendar.DAY_OF_YEAR, d)
 checkEvaluation(DayOfMonth(Literal(new Date(c.getTimeInMillis))),
   c.get(Calendar.DAY_OF_MONTH))
   }
@@ -332,19 +335,15 @@ class DateExpressionsSuite extends SparkFunSuite with 
ExpressionEvalHelper {
   val timeZoneId = Option(zid.getId)
   c.setTimeZone(TimeZone.getTimeZone(zid))
   (0 to 24 by 5).foreach { h =>
-(0 to 60 by 29).foreach { m =>
-  (0 to 60 by 29).foreach { s =>
-// validate timestamp with local time zone
-c.set(2015, 18, 3, h, m, s)
-checkEvaluation(
-  Hour(Literal(new Timestamp(c.getTimeInMillis)), timeZoneId),
-  c.get(Calendar.HOUR_OF_DAY))
+// validate timestamp with local time zone
+c.set(2015, 18, 3, h, 29, 59)
+checkEvaluation(
+  Hour(Literal(new Timestamp(c.getTimeInMillis)), timeZoneId),
+  c.get(Calendar.HOUR_OF_DAY))
 
-// validate timestamp without time zone
-val localDateTime = LocalDateTime.of(2015, 1, 3, h, m, s)
-checkEvaluation(Hour(Literal(localDateTime), timeZoneId), h)
-  }
-}
+// validate timestamp without time zone
+val localDateTime = LocalDateTime.of(2015, 1, 3, h, 29, 59)
+checkEvaluation(Hour(Literal(localDateTime), timeZoneId), h)
   }
   Seq(TimestampType, TimestampNTZType).foreach { dt =>
 checkConsistencyBetweenInterpretedAndCodegen(
@@ -367,17 +366,15 @@ class DateExpressionsSuite extends SparkFunSuite with 
ExpressionEvalHelper {
   val timeZoneId = Option(zid.getId)
   c.setTimeZone(TimeZone.getTimeZone(zid))
   (0 to 59 by 5).foreach { m =>
-(0 to 59 by 15).foreach { s =>
-  // validate timestamp with local time zone
-  c.set(2015, 18, 3, 3, m, s)
-   

[spark] branch branch-3.2 updated: [SPARK-36043][SQL][TESTS] Add end-to-end tests with default timestamp type as TIMESTAMP_NTZ

2021-07-08 Thread gengliang
This is an automated email from the ASF dual-hosted git repository.

gengliang pushed a commit to branch branch-3.2
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.2 by this push:
 new cafb829  [SPARK-36043][SQL][TESTS] Add end-to-end tests with default 
timestamp type as TIMESTAMP_NTZ
cafb829 is described below

commit cafb829c42fc60722bae621da47cac9602e40f4d
Author: Gengliang Wang 
AuthorDate: Thu Jul 8 19:38:52 2021 +0800

[SPARK-36043][SQL][TESTS] Add end-to-end tests with default timestamp type 
as TIMESTAMP_NTZ

### What changes were proposed in this pull request?

Run end-to-end tests with default timestamp type as TIMESTAMP_NTZ to 
increase test coverage.

### Why are the changes needed?

Increase test coverage.
Also, more and more expressions will have different behaviors when the default timestamp type is TIMESTAMP_NTZ, for example `to_timestamp`, `from_json`, `from_csv`, and so on. Having this new test suite helps future development.
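
A hedged sketch of the idea; the config key `spark.sql.timestampType` is an assumption from the TIMESTAMP_NTZ work and does not appear in this patch:

```scala
// Re-run datetime queries with the session default timestamp type switched to
// TIMESTAMP_NTZ, so that string-to-timestamp results become timestamp_ntz.
// Minimal sketch, assuming a running SparkSession `spark`.
spark.conf.set("spark.sql.timestampType", "TIMESTAMP_NTZ")  // assumed config key
spark.sql("SELECT typeof(to_timestamp('2021-07-08 19:38:52'))").show()
```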

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

CI tests.

Closes #33259 from gengliangwang/ntzTest.

Authored-by: Gengliang Wang 
Signed-off-by: Gengliang Wang 
(cherry picked from commit 57342dfc1dd7deaf60209127d93d416c096645ea)
Signed-off-by: Gengliang Wang 
---
 .../sql-tests/inputs/timestampNTZ/datetime.sql |1 +
 .../results/timestampNTZ/datetime.sql.out  | 1595 
 .../org/apache/spark/sql/SQLQueryTestSuite.scala   |   15 +
 .../thriftserver/ThriftServerQueryTestSuite.scala  |6 +
 4 files changed, 1617 insertions(+)

diff --git 
a/sql/core/src/test/resources/sql-tests/inputs/timestampNTZ/datetime.sql 
b/sql/core/src/test/resources/sql-tests/inputs/timestampNTZ/datetime.sql
new file mode 100644
index 0000000..58ecf80
--- /dev/null
+++ b/sql/core/src/test/resources/sql-tests/inputs/timestampNTZ/datetime.sql
@@ -0,0 +1 @@
+--IMPORT datetime.sql
diff --git 
a/sql/core/src/test/resources/sql-tests/results/timestampNTZ/datetime.sql.out 
b/sql/core/src/test/resources/sql-tests/results/timestampNTZ/datetime.sql.out
new file mode 100644
index 0000000..131ad01
--- /dev/null
+++ 
b/sql/core/src/test/resources/sql-tests/results/timestampNTZ/datetime.sql.out
@@ -0,0 +1,1595 @@
+-- Automatically generated by SQLQueryTestSuite
+-- Number of queries: 193
+
+
+-- !query
+select 
TIMESTAMP_SECONDS(1230219000),TIMESTAMP_SECONDS(-1230219000),TIMESTAMP_SECONDS(null)
+-- !query schema
+struct
+-- !query output
+2008-12-25 07:30:00  1931-01-07 00:30:00  NULL
+
+
+-- !query
+select TIMESTAMP_SECONDS(1.23), TIMESTAMP_SECONDS(1.23d), 
TIMESTAMP_SECONDS(FLOAT(1.23))
+-- !query schema
+struct
+-- !query output
+1969-12-31 16:00:01.23 1969-12-31 16:00:01.23  1969-12-31 16:00:01.23
+
+
+-- !query
+select 
TIMESTAMP_MILLIS(1230219000123),TIMESTAMP_MILLIS(-1230219000123),TIMESTAMP_MILLIS(null)
+-- !query schema
+struct
+-- !query output
+2008-12-25 07:30:00.123  1931-01-07 00:29:59.877  NULL
+
+
+-- !query
+select 
TIMESTAMP_MICROS(1230219000123123),TIMESTAMP_MICROS(-1230219000123123),TIMESTAMP_MICROS(null)
+-- !query schema
+struct
+-- !query output
+2008-12-25 07:30:00.123123 1931-01-07 00:29:59.876877  NULL
+
+
+-- !query
+select TIMESTAMP_SECONDS(1230219000123123)
+-- !query schema
+struct<>
+-- !query output
+java.lang.ArithmeticException
+long overflow
+
+
+-- !query
+select TIMESTAMP_SECONDS(-1230219000123123)
+-- !query schema
+struct<>
+-- !query output
+java.lang.ArithmeticException
+long overflow
+
+
+-- !query
+select TIMESTAMP_MILLIS(92233720368547758)
+-- !query schema
+struct<>
+-- !query output
+java.lang.ArithmeticException
+long overflow
+
+
+-- !query
+select TIMESTAMP_MILLIS(-92233720368547758)
+-- !query schema
+struct<>
+-- !query output
+java.lang.ArithmeticException
+long overflow
+
+
+-- !query
+select TIMESTAMP_SECONDS(0.1234567)
+-- !query schema
+struct<>
+-- !query output
+java.lang.ArithmeticException
+Rounding necessary
+
+
+-- !query
+select TIMESTAMP_SECONDS(0.1234567d), TIMESTAMP_SECONDS(FLOAT(0.1234567))
+-- !query schema
+struct
+-- !query output
+1969-12-31 16:00:00.123456 1969-12-31 16:00:00.123456
+
+
+-- !query
+select UNIX_SECONDS(TIMESTAMP('2020-12-01 14:30:08Z')), 
UNIX_SECONDS(TIMESTAMP('2020-12-01 14:30:08.99Z')), UNIX_SECONDS(null)
+-- !query schema
+struct
+-- !query output
+1606833008 1606833008  NULL
+
+
+-- !query
+select UNIX_MILLIS(TIMESTAMP('2020-12-01 14:30:08Z')), 
UNIX_MILLIS(TIMESTAMP('2020-12-01 14:30:08.99Z')), UNIX_MILLIS(null)
+-- !query schema
+struct
+-- !query output
+1606833008000  1606833008999   NULL
+
+
+-- !query
+select UNIX_MICROS(TIMESTAMP('2020-12-01 14:30:08Z')), 
UNIX_MICROS(TIMESTAMP('2020-12-01 14:30:08.99Z')), UNIX_MICROS(null)
+-- !query schema
+struct
+-- !

[spark] branch master updated: [SPARK-36043][SQL][TESTS] Add end-to-end tests with default timestamp type as TIMESTAMP_NTZ

2021-07-08 Thread gengliang
This is an automated email from the ASF dual-hosted git repository.

gengliang pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 57342df  [SPARK-36043][SQL][TESTS] Add end-to-end tests with default 
timestamp type as TIMESTAMP_NTZ
57342df is described below

commit 57342dfc1dd7deaf60209127d93d416c096645ea
Author: Gengliang Wang 
AuthorDate: Thu Jul 8 19:38:52 2021 +0800

[SPARK-36043][SQL][TESTS] Add end-to-end tests with default timestamp type 
as TIMESTAMP_NTZ

### What changes were proposed in this pull request?

Run end-to-end tests with default timestamp type as TIMESTAMP_NTZ to 
increase test coverage.

### Why are the changes needed?

Increase test coverage.
Also, more and more expressions will have different behaviors when the 
default timestamp type is TIMESTAMP_NTZ, for example `to_timestamp`, 
`from_json`, `from_csv`, and so on. Having this new test suite helps future 
development.

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

CI tests.

Closes #33259 from gengliangwang/ntzTest.

Authored-by: Gengliang Wang 
Signed-off-by: Gengliang Wang 
---
 .../sql-tests/inputs/timestampNTZ/datetime.sql |1 +
 .../results/timestampNTZ/datetime.sql.out  | 1595 
 .../org/apache/spark/sql/SQLQueryTestSuite.scala   |   15 +
 .../thriftserver/ThriftServerQueryTestSuite.scala  |6 +
 4 files changed, 1617 insertions(+)

diff --git 
a/sql/core/src/test/resources/sql-tests/inputs/timestampNTZ/datetime.sql 
b/sql/core/src/test/resources/sql-tests/inputs/timestampNTZ/datetime.sql
new file mode 100644
index 0000000..58ecf80
--- /dev/null
+++ b/sql/core/src/test/resources/sql-tests/inputs/timestampNTZ/datetime.sql
@@ -0,0 +1 @@
+--IMPORT datetime.sql
diff --git 
a/sql/core/src/test/resources/sql-tests/results/timestampNTZ/datetime.sql.out 
b/sql/core/src/test/resources/sql-tests/results/timestampNTZ/datetime.sql.out
new file mode 100644
index 0000000..131ad01
--- /dev/null
+++ 
b/sql/core/src/test/resources/sql-tests/results/timestampNTZ/datetime.sql.out
@@ -0,0 +1,1595 @@
+-- Automatically generated by SQLQueryTestSuite
+-- Number of queries: 193
+
+
+-- !query
+select 
TIMESTAMP_SECONDS(1230219000),TIMESTAMP_SECONDS(-1230219000),TIMESTAMP_SECONDS(null)
+-- !query schema
+struct
+-- !query output
+2008-12-25 07:30:00  1931-01-07 00:30:00  NULL
+
+
+-- !query
+select TIMESTAMP_SECONDS(1.23), TIMESTAMP_SECONDS(1.23d), 
TIMESTAMP_SECONDS(FLOAT(1.23))
+-- !query schema
+struct
+-- !query output
+1969-12-31 16:00:01.23 1969-12-31 16:00:01.23  1969-12-31 16:00:01.23
+
+
+-- !query
+select 
TIMESTAMP_MILLIS(1230219000123),TIMESTAMP_MILLIS(-1230219000123),TIMESTAMP_MILLIS(null)
+-- !query schema
+struct
+-- !query output
+2008-12-25 07:30:00.123  1931-01-07 00:29:59.877  NULL
+
+
+-- !query
+select 
TIMESTAMP_MICROS(1230219000123123),TIMESTAMP_MICROS(-1230219000123123),TIMESTAMP_MICROS(null)
+-- !query schema
+struct
+-- !query output
+2008-12-25 07:30:00.123123 1931-01-07 00:29:59.876877  NULL
+
+
+-- !query
+select TIMESTAMP_SECONDS(1230219000123123)
+-- !query schema
+struct<>
+-- !query output
+java.lang.ArithmeticException
+long overflow
+
+
+-- !query
+select TIMESTAMP_SECONDS(-1230219000123123)
+-- !query schema
+struct<>
+-- !query output
+java.lang.ArithmeticException
+long overflow
+
+
+-- !query
+select TIMESTAMP_MILLIS(92233720368547758)
+-- !query schema
+struct<>
+-- !query output
+java.lang.ArithmeticException
+long overflow
+
+
+-- !query
+select TIMESTAMP_MILLIS(-92233720368547758)
+-- !query schema
+struct<>
+-- !query output
+java.lang.ArithmeticException
+long overflow
+
+
+-- !query
+select TIMESTAMP_SECONDS(0.1234567)
+-- !query schema
+struct<>
+-- !query output
+java.lang.ArithmeticException
+Rounding necessary
+
+
+-- !query
+select TIMESTAMP_SECONDS(0.1234567d), TIMESTAMP_SECONDS(FLOAT(0.1234567))
+-- !query schema
+struct
+-- !query output
+1969-12-31 16:00:00.123456 1969-12-31 16:00:00.123456
+
+
+-- !query
+select UNIX_SECONDS(TIMESTAMP('2020-12-01 14:30:08Z')), 
UNIX_SECONDS(TIMESTAMP('2020-12-01 14:30:08.99Z')), UNIX_SECONDS(null)
+-- !query schema
+struct
+-- !query output
+1606833008 1606833008  NULL
+
+
+-- !query
+select UNIX_MILLIS(TIMESTAMP('2020-12-01 14:30:08Z')), 
UNIX_MILLIS(TIMESTAMP('2020-12-01 14:30:08.99Z')), UNIX_MILLIS(null)
+-- !query schema
+struct
+-- !query output
+1606833008000  1606833008999   NULL
+
+
+-- !query
+select UNIX_MICROS(TIMESTAMP('2020-12-01 14:30:08Z')), 
UNIX_MICROS(TIMESTAMP('2020-12-01 14:30:08.99Z')), UNIX_MICROS(null)
+-- !query schema
+struct
+-- !query output
+1606833008000000  1606833008990000  NULL
+
+
+-- !query
+select DATE_FROM_UNIX_DATE(0), DATE_FROM_UNIX_DATE(10

[spark] branch branch-3.2 created (now 79a6e00)

2021-07-01 Thread gengliang
This is an automated email from the ASF dual-hosted git repository.

gengliang pushed a change to branch branch-3.2
in repository https://gitbox.apache.org/repos/asf/spark.git.


  at 79a6e00  [SPARK-35825][INFRA][FOLLOWUP] Increase it in build/mvn script

No new revisions were added by this update.

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated (95d9494 -> 47485a3)

2021-07-01 Thread gengliang
This is an automated email from the ASF dual-hosted git repository.

gengliang pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 95d9494  [SPARK-35339][PYTHON] Improve unit tests for data-type-based 
basic operations
 add 47485a3  [SPARK-35897][SS] Support user defined initial state with 
flatMapGroupsWithState in Structured Streaming

No new revisions were added by this update.

Summary of changes:
 .../analysis/UnsupportedOperationChecker.scala |  12 +
 .../spark/sql/catalyst/plans/logical/object.scala  |  65 -
 .../analysis/UnsupportedOperationsSuite.scala  | 116 ++---
 .../apache/spark/sql/KeyValueGroupedDataset.scala  | 164 
 .../spark/sql/execution/SparkStrategies.scala  |  10 +-
 .../streaming/FlatMapGroupsWithStateExec.scala | 266 ++-
 .../execution/streaming/IncrementalExecution.scala |   6 +-
 .../execution/streaming/statefulOperators.scala|   4 +-
 .../apache/spark/sql/streaming/GroupState.scala|   5 +
 .../org/apache/spark/sql/JavaDatasetSuite.java |  66 +
 .../streaming/FlatMapGroupsWithStateSuite.scala| 283 -
 11 files changed, 875 insertions(+), 122 deletions(-)

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated (47485a3 -> 1fda011)

2021-07-01 Thread gengliang
This is an automated email from the ASF dual-hosted git repository.

gengliang pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 47485a3  [SPARK-35897][SS] Support user defined initial state with 
flatMapGroupsWithState in Structured Streaming
 add 1fda011  [SPARK-35955][SQL] Check for overflow in Average in ANSI mode

No new revisions were added by this update.

Summary of changes:
 .../sql/catalyst/expressions/aggregate/Average.scala |  7 +--
 .../scala/org/apache/spark/sql/DataFrameSuite.scala  | 20 ++--
 2 files changed, 19 insertions(+), 8 deletions(-)

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated: [SPARK-35950][WEBUI] Failed to toggle Exec Loss Reason in the executors page

2021-06-30 Thread gengliang
This is an automated email from the ASF dual-hosted git repository.

gengliang pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new dc85b0b  [SPARK-35950][WEBUI] Failed to toggle Exec Loss Reason in the 
executors page
dc85b0b is described below

commit dc85b0b51a02b9d6c52ffb1600f26ccdd7d7829a
Author: Kevin Su 
AuthorDate: Thu Jul 1 12:32:54 2021 +0800

[SPARK-35950][WEBUI] Failed to toggle Exec Loss Reason in the executors page

### What changes were proposed in this pull request?

Update the executors page so that it can successfully hide the "Exec Loss 
Reason" column.

### Why are the changes needed?

When the "Exec Loss Reason" checkbox is unselected on the executors page,
the "Active tasks" column disappears instead of the "Exec Loss Reason" 
column.

Before:
![Screenshot from 2021-06-30 
15-55-05](https://user-images.githubusercontent.com/37936015/123930908-bd6f4180-d9c2-11eb-9aba-bbfe0a237776.png)
After:
![Screenshot from 2021-06-30 
22-21-38](https://user-images.githubusercontent.com/37936015/123977632-bf042e00-d9f1-11eb-910e-93d615d2db47.png)

### Does this PR introduce _any_ user-facing change?

Yes, The Web UI is updated.

### How was this patch tested?

Pass the CIs.

Closes #33155 from pingsutw/SPARK-35950.

Lead-authored-by: Kevin Su 
Co-authored-by: Kevin Su 
Signed-off-by: Gengliang Wang 
---
 .../src/main/resources/org/apache/spark/ui/static/executorspage.js | 7 ---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git 
a/core/src/main/resources/org/apache/spark/ui/static/executorspage.js 
b/core/src/main/resources/org/apache/spark/ui/static/executorspage.js
index ab412a8..b7fbe04 100644
--- a/core/src/main/resources/org/apache/spark/ui/static/executorspage.js
+++ b/core/src/main/resources/org/apache/spark/ui/static/executorspage.js
@@ -140,7 +140,7 @@ function totalDurationColor(totalGCTime, totalDuration) {
 }
 
 var sumOptionalColumns = [3, 4];
-var execOptionalColumns = [5, 6, 7, 8, 9, 10, 13, 14, 15];
+var execOptionalColumns = [5, 6, 7, 8, 9, 10, 13, 14, 25];
 var execDataTable;
 var sumDataTable;
 
@@ -566,7 +566,8 @@ $(document).ready(function () {
 {"visible": false, "targets": 9},
 {"visible": false, "targets": 10},
 {"visible": false, "targets": 13},
-{"visible": false, "targets": 14}
+{"visible": false, "targets": 14},
+{"visible": false, "targets": 25}
   ],
   "deferRender": true
 };
@@ -721,7 +722,7 @@ $(document).ready(function () {
   " Peak Pool Memory 
Direct / Mapped" +
   " 
Resources" +
   " Resource Profile Id" +
-  " Exec Loss Reason" +
+  " Exec Loss Reason" +
   "");
 
 reselectCheckboxesBasedOnTaskTableState();

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated (8d28839 -> ad4b679)

2021-06-29 Thread gengliang
This is an automated email from the ASF dual-hosted git repository.

gengliang pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 8d28839  [SPARK-35946][PYTHON] Respect Py4J server in 
InheritableThread API
 add ad4b679  [SPARK-35937][SQL] Extracting date field from timestamp 
should work in ANSI mode

No new revisions were added by this update.

Summary of changes:
 .../spark/sql/catalyst/analysis/AnsiTypeCoercion.scala | 18 +-
 .../spark/sql/catalyst/rules/RuleIdCollection.scala|  1 +
 .../sql/catalyst/analysis/AnsiTypeCoercionSuite.scala  | 10 ++
 .../sql-tests/results/postgreSQL/timestamp.sql.out |  9 ++---
 .../scala/org/apache/spark/sql/SQLQuerySuite.scala |  8 
 5 files changed, 42 insertions(+), 4 deletions(-)

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated (6bbfb45 -> 4dd41b9)

2021-06-30 Thread gengliang
This is an automated email from the ASF dual-hosted git repository.

gengliang pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 6bbfb45  [SPARK-33298][CORE][FOLLOWUP] Add Unstable annotation to 
`FileCommitProtocol`
 add 4dd41b9  [SPARK-34365][AVRO] Add support for positional 
Catalyst-to-Avro schema matching

No new revisions were added by this update.

Summary of changes:
 docs/sql-data-sources-avro.md  |   6 +
 .../apache/spark/sql/avro/AvroDeserializer.scala   |  15 +-
 .../org/apache/spark/sql/avro/AvroFileFormat.scala |   1 +
 .../org/apache/spark/sql/avro/AvroOptions.scala|   8 +
 .../apache/spark/sql/avro/AvroOutputWriter.scala   |   5 +-
 .../spark/sql/avro/AvroOutputWriterFactory.scala   |   8 +-
 .../org/apache/spark/sql/avro/AvroSerializer.scala |  22 +--
 .../org/apache/spark/sql/avro/AvroUtils.scala  |  42 +-
 .../sql/v2/avro/AvroPartitionReaderFactory.scala   |   1 +
 .../sql/avro/AvroCatalystDataConversionSuite.scala |   1 +
 .../apache/spark/sql/avro/AvroRowReaderSuite.scala |   1 +
 .../spark/sql/avro/AvroSchemaHelperSuite.scala |  24 ++-
 .../org/apache/spark/sql/avro/AvroSerdeSuite.scala | 164 ++---
 .../org/apache/spark/sql/avro/AvroSuite.scala  |  41 +-
 14 files changed, 258 insertions(+), 81 deletions(-)

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated: [SPARK-35951][DOCS] Add since versions for Avro options in Documentation

2021-06-30 Thread gengliang
This is an automated email from the ASF dual-hosted git repository.

gengliang pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new c6afd6e  [SPARK-35951][DOCS] Add since versions for Avro options in 
Documentation
c6afd6e is described below

commit c6afd6ed5296980e81160e441a4e9bea98c74196
Author: Gengliang Wang 
AuthorDate: Wed Jun 30 17:24:48 2021 +0800

[SPARK-35951][DOCS] Add since versions for Avro options in Documentation

### What changes were proposed in this pull request?

There are two new Avro options, `datetimeRebaseMode` and 
`positionalFieldMatching`, as of Spark 3.2.
We should document the since version so that users know whether an option 
works in their Spark version.
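
(Usage illustration, not from the patch: passing the two documented options on
read in spark-shell; the input path is hypothetical, and "CORRECTED" is one
assumed valid value for the rebase mode.)
```
val df = spark.read
  .format("avro")
  .option("datetimeRebaseMode", "CORRECTED")   // available since 3.2.0
  .option("positionalFieldMatching", "true")   // available since 3.2.0
  .load("/tmp/events.avro")                    // hypothetical path
```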

### Why are the changes needed?

Better documentation.

### Does this PR introduce _any_ user-facing change?

No
### How was this patch tested?

Manual preview on local setup.
https://user-images.githubusercontent.com/1097932/123934000-ba833b00-d947-11eb-9ca5-ce8ff8add74b.png;>

https://user-images.githubusercontent.com/1097932/123934126-d4bd1900-d947-11eb-8d80-69df8f3d9900.png;>

Closes #33153 from gengliangwang/version.

Authored-by: Gengliang Wang 
Signed-off-by: Gengliang Wang 
---
 docs/sql-data-sources-avro.md | 10 +-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/docs/sql-data-sources-avro.md b/docs/sql-data-sources-avro.md
index 7fb0ef5..94dd7e1 100644
--- a/docs/sql-data-sources-avro.md
+++ b/docs/sql-data-sources-avro.md
@@ -224,7 +224,7 @@ Data source options of Avro can be set via:
  * the `options` parameter in function `from_avro`.
 
 
-  Property 
NameDefaultMeaningScope
+  Property 
NameDefaultMeaningScopeSince
 Version
   
 avroSchema
 None
@@ -244,24 +244,28 @@ Data source options of Avro can be set via:
   
 
  read, write and function from_avro
+2.4.0
   
   
 recordName
 topLevelRecord
 Top level record name in write result, which is required in Avro 
spec.
 write
+2.4.0
   
   
 recordNamespace
 ""
 Record namespace in write result.
 write
+2.4.0
   
   
 ignoreExtension
 true
 The option controls ignoring of files without .avro 
extensions in read. If the option is enabled, all files (with and without 
.avro extension) are loaded. The option has been deprecated, 
and it will be removed in the future releases. Please use the general data 
source option pathGlobFilter
 for filtering file names.
 read
+2.4.0
   
   
 compression
@@ -269,6 +273,7 @@ Data source options of Avro can be set via:
 The compression option allows to specify a compression 
codec used in write.
   Currently supported codecs are uncompressed, 
snappy, deflate, bzip2 and 
xz. If the option is not set, the configuration 
spark.sql.avro.compression.codec config is taken into account.
 write
+2.4.0
   
   
 mode
@@ -282,6 +287,7 @@ Data source options of Avro can be set via:
   
 
 function from_avro
+2.4.0
   
   
 datetimeRebaseMode
@@ -295,12 +301,14 @@ Data source options of Avro can be set via:
   
 
 read and function from_avro
+3.2.0
   
   
 positionalFieldMatching
 false
 This can be used in tandem with the `avroSchema` option to adjust the 
behavior for matching the fields in the provided Avro schema with those in the 
SQL schema. By default, the matching will be performed using field names, 
ignoring their positions. If this option is set to "true", the matching will be 
based on the position of the fields.
 read and write
+3.2.0
   
 
 

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated: [SPARK-35971][SQL] Rename the type name of TimestampNTZType as "timestamp_ntz"

2021-07-01 Thread gengliang
This is an automated email from the ASF dual-hosted git repository.

gengliang pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 3acc4b9  [SPARK-35971][SQL] Rename the type name of TimestampNTZType 
as "timestamp_ntz"
3acc4b9 is described below

commit 3acc4b973b57f88fbe681c7db89cd55699750178
Author: Gengliang Wang 
AuthorDate: Thu Jul 1 20:50:19 2021 +0800

[SPARK-35971][SQL] Rename the type name of TimestampNTZType as 
"timestamp_ntz"

### What changes were proposed in this pull request?

Rename the type name string of TimestampNTZType from "timestamp without 
time zone" to "timestamp_ntz".

### Why are the changes needed?

This is to make the column header shorter and simpler.
Snowflake and Flink use a similar approach:
https://docs.snowflake.com/en/sql-reference/data-types-datetime.html

https://ci.apache.org/projects/flink/flink-docs-master/docs/dev/table/concepts/timezone/
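
(For illustration only: the renamed string is observable from the type API; a
minimal spark-shell check, assuming the `TimestampNTZType` companion object.)
```
import org.apache.spark.sql.types.TimestampNTZType

// The short form introduced by this commit, used in column headers:
assert(TimestampNTZType.typeName == "timestamp_ntz")
```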

### Does this PR introduce _any_ user-facing change?

No, the new timestamp type is not released yet.

### How was this patch tested?

Unit tests

Closes #33173 from gengliangwang/reviseTypeName.

Authored-by: Gengliang Wang 
Signed-off-by: Gengliang Wang 
---
 .../apache/spark/sql/types/TimestampNTZType.scala  |   2 +-
 .../sql/catalyst/expressions/CastSuiteBase.scala   |   4 +-
 .../sql-functions/sql-expression-schema.md |   4 +-
 .../sql-tests/results/ansi/datetime.sql.out|  92 +-
 .../sql-tests/results/datetime-legacy.sql.out  | 108 ++---
 .../resources/sql-tests/results/datetime.sql.out   | 108 ++---
 6 files changed, 159 insertions(+), 159 deletions(-)

diff --git 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/types/TimestampNTZType.scala 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/types/TimestampNTZType.scala
index 347fd4a..f7d20a0 100644
--- 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/types/TimestampNTZType.scala
+++ 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/types/TimestampNTZType.scala
@@ -48,7 +48,7 @@ class TimestampNTZType private() extends AtomicType {
*/
   override def defaultSize: Int = 8
 
-  override def typeName: String = "timestamp without time zone"
+  override def typeName: String = "timestamp_ntz"
 
   private[spark] override def asNullable: TimestampNTZType = this
 }
diff --git 
a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/CastSuiteBase.scala
 
b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/CastSuiteBase.scala
index f6a628a..66f5b50 100644
--- 
a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/CastSuiteBase.scala
+++ 
b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/CastSuiteBase.scala
@@ -939,11 +939,11 @@ abstract class CastSuiteBase extends SparkFunSuite with 
ExpressionEvalHelper {
   test("disallow type conversions between Numeric types and Timestamp without 
time zone type") {
 import DataTypeTestUtils.numericTypes
 checkInvalidCastFromNumericType(TimestampNTZType)
-var errorMsg = "cannot cast bigint to timestamp without time zone"
+var errorMsg = "cannot cast bigint to timestamp_ntz"
 verifyCastFailure(cast(Literal(0L), TimestampNTZType), Some(errorMsg))
 
 val timestampNTZLiteral = Literal.create(LocalDateTime.now(), 
TimestampNTZType)
-errorMsg = "cannot cast timestamp without time zone to"
+errorMsg = "cannot cast timestamp_ntz to"
 numericTypes.foreach { numericType =>
   verifyCastFailure(cast(timestampNTZLiteral, numericType), Some(errorMsg))
 }
diff --git a/sql/core/src/test/resources/sql-functions/sql-expression-schema.md 
b/sql/core/src/test/resources/sql-functions/sql-expression-schema.md
index 5fa37c4..00fb172 100644
--- a/sql/core/src/test/resources/sql-functions/sql-expression-schema.md
+++ b/sql/core/src/test/resources/sql-functions/sql-expression-schema.md
@@ -206,7 +206,7 @@
 | org.apache.spark.sql.catalyst.expressions.Overlay | overlay | SELECT 
overlay('Spark SQL' PLACING '_' FROM 6) | struct |
 | org.apache.spark.sql.catalyst.expressions.ParseToDate | to_date | SELECT 
to_date('2009-07-30 04:17:52') | struct |
 | org.apache.spark.sql.catalyst.expressions.ParseToTimestamp | to_timestamp | 
SELECT to_timestamp('2016-12-31 00:12:00') | struct |
-| org.apache.spark.sql.catalyst.expressions.ParseToTimestampNTZ | 
to_timestamp_ntz | SELECT to_timestamp_ntz('2016-12-31 00:12:00') | 
struct |
+| org.apache.spark.sql.catalyst.expressions.ParseToTimestampNTZ | 
to_timestamp_ntz | SELECT to_timestamp_ntz('2016-12-31 00:12:00') | 
struct |
 | org.apache.spark.sql.catalyst.expressions.ParseUrl | parse_url | SELECT 
pars

[spark] branch master updated (c6afd6e -> e88aa49)

2021-06-30 Thread gengliang
This is an automated email from the ASF dual-hosted git repository.

gengliang pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from c6afd6e  [SPARK-35951][DOCS] Add since versions for Avro options in 
Documentation
 add e88aa49  [SPARK-35932][SQL] Support extracting hour/minute/second from 
timestamp without time zone

No new revisions were added by this update.

Summary of changes:
 .../sql/catalyst/analysis/AnsiTypeCoercion.scala   |   1 +
 .../spark/sql/catalyst/analysis/TypeCoercion.scala |   6 +-
 .../catalyst/expressions/datetimeExpressions.scala |   8 +-
 .../apache/spark/sql/types/AbstractDataType.scala  |   2 +-
 .../expressions/DateExpressionsSuite.scala |  65 +---
 .../test/resources/sql-tests/inputs/extract.sql|  66 
 .../resources/sql-tests/results/extract.sql.out| 182 ++---
 7 files changed, 181 insertions(+), 149 deletions(-)

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated (0a7a6f7 -> 7635114)

2021-06-29 Thread gengliang
This is an automated email from the ASF dual-hosted git repository.

gengliang pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 0a7a6f7  [SPARK-35483][FOLLOWUP][TESTS] Update run-tests.py doctest
 add 7635114  [SPARK-35916][SQL] Support subtraction among 
Date/Timestamp/TimestampWithoutTZ

No new revisions were added by this update.

Summary of changes:
 .../spark/sql/catalyst/analysis/Analyzer.scala |  4 +-
 .../spark/sql/catalyst/analysis/TypeCoercion.scala | 13 +++--
 .../catalyst/expressions/datetimeExpressions.scala |  8 ++-
 .../apache/spark/sql/types/AbstractDataType.scala  | 11 
 .../expressions/DateExpressionsSuite.scala | 65 +
 .../test/resources/sql-tests/inputs/datetime.sql   | 10 
 .../sql-tests/results/ansi/datetime.sql.out| 66 +-
 .../sql-tests/results/datetime-legacy.sql.out  | 66 +-
 .../resources/sql-tests/results/datetime.sql.out   | 66 +-
 .../typeCoercion/native/decimalPrecision.sql.out   | 16 +++---
 .../typeCoercion/native/promoteStrings.sql.out |  4 +-
 11 files changed, 307 insertions(+), 22 deletions(-)

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated (5db51ef -> 78e6263)

2021-06-29 Thread gengliang
This is an automated email from the ASF dual-hosted git repository.

gengliang pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 5db51ef  [SPARK-35721][PYTHON] Path level discover for python unittests
 add 78e6263  [SPARK-35927][SQL] Remove type collection AllTimestampTypes

No new revisions were added by this update.

Summary of changes:
 .../spark/sql/catalyst/expressions/datetimeExpressions.scala  | 5 ++---
 .../main/scala/org/apache/spark/sql/types/AbstractDataType.scala  | 8 
 2 files changed, 2 insertions(+), 11 deletions(-)

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated: [SPARK-33603][SQL] Grouping exception messages in execution/command

2021-07-12 Thread gengliang
This is an automated email from the ASF dual-hosted git repository.

gengliang pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new d03f716  [SPARK-33603][SQL] Grouping exception messages in 
execution/command
d03f716 is described below

commit d03f71657ed745247d026ca1e5de2a2d7c9a6a30
Author: dgd-contributor 
AuthorDate: Tue Jul 13 01:28:43 2021 +0800

[SPARK-33603][SQL] Grouping exception messages in execution/command

### What changes were proposed in this pull request?
This PR groups exception messages in 
sql/core/src/main/scala/org/apache/spark/sql/execution/command

### Why are the changes needed?
It will largely help with the standardization of error messages and their 
maintenance.
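
(Sketch of the pattern, not code from the patch: each user-facing message gets
a named factory method in a central error object, which command code then
throws. The object name here is illustrative; the method name and message are
taken from the diff below.)
```
import org.apache.spark.sql.AnalysisException

object CommandErrorsSketch {
  // One named factory method per user-facing error message:
  def analyzeTableNotSupportedOnViewsError(): Throwable =
    new AnalysisException("ANALYZE TABLE is not supported on views.")
}

// A command implementation throws the named error instead of an inline one:
// throw CommandErrorsSketch.analyzeTableNotSupportedOnViewsError()
```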

### Does this PR introduce any user-facing change?
No. Error messages remain unchanged.

### How was this patch tested?
No new tests - pass all original tests to make sure it doesn't break any 
existing behavior.

Closes #32951 from dgd-contributor/SPARK-33603_grouping_execution/command.

Authored-by: dgd-contributor 
Signed-off-by: Gengliang Wang 
---
 .../spark/sql/errors/QueryCompilationErrors.scala  | 368 -
 .../spark/sql/errors/QueryExecutionErrors.scala|  18 +
 .../execution/command/AnalyzeColumnCommand.scala   |  16 +-
 .../command/AnalyzePartitionCommand.scala  |  17 +-
 .../spark/sql/execution/command/CommandUtils.scala |   9 +-
 .../sql/execution/command/DataWritingCommand.scala |   9 +-
 .../command/InsertIntoDataSourceDirCommand.scala   |   6 +-
 .../execution/command/createDataSourceTables.scala |   6 +-
 .../apache/spark/sql/execution/command/ddl.scala   |  57 ++--
 .../spark/sql/execution/command/functions.scala|  22 +-
 .../spark/sql/execution/command/tables.scala   | 117 +++
 .../apache/spark/sql/execution/command/views.scala |  44 ++-
 12 files changed, 505 insertions(+), 184 deletions(-)

diff --git 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryCompilationErrors.scala
 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryCompilationErrors.scala
index 4f82e25..d1dcbbc 100644
--- 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryCompilationErrors.scala
+++ 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryCompilationErrors.scala
@@ -17,12 +17,15 @@
 
 package org.apache.spark.sql.errors
 
+import scala.collection.mutable
+
 import org.apache.hadoop.fs.Path
 
 import org.apache.spark.sql.AnalysisException
 import org.apache.spark.sql.catalyst.{FunctionIdentifier, QualifiedTableName, 
TableIdentifier}
-import 
org.apache.spark.sql.catalyst.analysis.{CannotReplaceMissingTableException, 
NamespaceAlreadyExistsException, NoSuchNamespaceException, 
NoSuchTableException, ResolvedNamespace, ResolvedTable, ResolvedView, 
TableAlreadyExistsException}
+import 
org.apache.spark.sql.catalyst.analysis.{CannotReplaceMissingTableException, 
NamespaceAlreadyExistsException, NoSuchFunctionException, 
NoSuchNamespaceException, NoSuchPartitionException, NoSuchTableException, 
ResolvedNamespace, ResolvedTable, ResolvedView, TableAlreadyExistsException}
 import org.apache.spark.sql.catalyst.catalog.{BucketSpec, CatalogTable, 
InvalidUDFClassException}
+import org.apache.spark.sql.catalyst.catalog.CatalogTypes.TablePartitionSpec
 import org.apache.spark.sql.catalyst.expressions.{Alias, Attribute, 
AttributeReference, AttributeSet, CreateMap, Expression, GroupingID, 
NamedExpression, SpecifiedWindowFrame, WindowFrame, WindowFunction, 
WindowSpecDefinition}
 import org.apache.spark.sql.catalyst.plans.JoinType
 import org.apache.spark.sql.catalyst.plans.logical.{InsertIntoStatement, Join, 
LogicalPlan, SerdeInfo, Window}
@@ -1696,6 +1699,369 @@ private[spark] object QueryCompilationErrors {
   s"Found duplicate column(s) $colType: ${duplicateCol.sorted.mkString(", 
")}")
   }
 
+  def noSuchTableError(db: String, table: String): Throwable = {
+new NoSuchTableException(db = db, table = table)
+  }
+
+  def tempViewNotCachedForAnalyzingColumnsError(tableIdent: TableIdentifier): 
Throwable = {
+new AnalysisException(s"Temporary view $tableIdent is not cached for 
analyzing columns.")
+  }
+
+  def columnTypeNotSupportStatisticsCollectionError(
+  name: String,
+  tableIdent: TableIdentifier,
+  dataType: DataType): Throwable = {
+new AnalysisException(s"Column $name in table $tableIdent is of type 
$dataType, " +
+  "and Spark does not support statistics collection on this column type.")
+  }
+
+  def analyzeTableNotSupportedOnViewsError(): Throwable = {
+new AnalysisException("ANALYZE TABLE is not supported on views.")
+  }
+
+  def unexpectedPartitionColumnPrefixError(
+  table: String,
+  database: String,
+  schemaColumns:

[spark] branch branch-3.2 updated: [SPARK-33603][SQL] Grouping exception messages in execution/command

2021-07-12 Thread gengliang
This is an automated email from the ASF dual-hosted git repository.

gengliang pushed a commit to branch branch-3.2
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.2 by this push:
 new 12aecb4  [SPARK-33603][SQL] Grouping exception messages in 
execution/command
12aecb4 is described below

commit 12aecb43302fcb9ddbdd3ab0633291ccf3e91f6b
Author: dgd-contributor 
AuthorDate: Tue Jul 13 01:28:43 2021 +0800

[SPARK-33603][SQL] Grouping exception messages in execution/command

### What changes were proposed in this pull request?
This PR groups exception messages in 
sql/core/src/main/scala/org/apache/spark/sql/execution/command

### Why are the changes needed?
It will largely help with the standardization of error messages and their 
maintenance.

### Does this PR introduce any user-facing change?
No. Error messages remain unchanged.

### How was this patch tested?
No new tests - pass all original tests to make sure it doesn't break any 
existing behavior.

Closes #32951 from dgd-contributor/SPARK-33603_grouping_execution/command.

Authored-by: dgd-contributor 
Signed-off-by: Gengliang Wang 
(cherry picked from commit d03f71657ed745247d026ca1e5de2a2d7c9a6a30)
Signed-off-by: Gengliang Wang 
---
 .../spark/sql/errors/QueryCompilationErrors.scala  | 368 -
 .../spark/sql/errors/QueryExecutionErrors.scala|  18 +
 .../execution/command/AnalyzeColumnCommand.scala   |  16 +-
 .../command/AnalyzePartitionCommand.scala  |  17 +-
 .../spark/sql/execution/command/CommandUtils.scala |   9 +-
 .../sql/execution/command/DataWritingCommand.scala |   9 +-
 .../command/InsertIntoDataSourceDirCommand.scala   |   6 +-
 .../execution/command/createDataSourceTables.scala |   6 +-
 .../apache/spark/sql/execution/command/ddl.scala   |  57 ++--
 .../spark/sql/execution/command/functions.scala|  22 +-
 .../spark/sql/execution/command/tables.scala   | 117 +++
 .../apache/spark/sql/execution/command/views.scala |  44 ++-
 12 files changed, 505 insertions(+), 184 deletions(-)

diff --git 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryCompilationErrors.scala
 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryCompilationErrors.scala
index 4f82e25..d1dcbbc 100644
--- 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryCompilationErrors.scala
+++ 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryCompilationErrors.scala
@@ -17,12 +17,15 @@
 
 package org.apache.spark.sql.errors
 
+import scala.collection.mutable
+
 import org.apache.hadoop.fs.Path
 
 import org.apache.spark.sql.AnalysisException
 import org.apache.spark.sql.catalyst.{FunctionIdentifier, QualifiedTableName, 
TableIdentifier}
-import 
org.apache.spark.sql.catalyst.analysis.{CannotReplaceMissingTableException, 
NamespaceAlreadyExistsException, NoSuchNamespaceException, 
NoSuchTableException, ResolvedNamespace, ResolvedTable, ResolvedView, 
TableAlreadyExistsException}
+import 
org.apache.spark.sql.catalyst.analysis.{CannotReplaceMissingTableException, 
NamespaceAlreadyExistsException, NoSuchFunctionException, 
NoSuchNamespaceException, NoSuchPartitionException, NoSuchTableException, 
ResolvedNamespace, ResolvedTable, ResolvedView, TableAlreadyExistsException}
 import org.apache.spark.sql.catalyst.catalog.{BucketSpec, CatalogTable, 
InvalidUDFClassException}
+import org.apache.spark.sql.catalyst.catalog.CatalogTypes.TablePartitionSpec
 import org.apache.spark.sql.catalyst.expressions.{Alias, Attribute, 
AttributeReference, AttributeSet, CreateMap, Expression, GroupingID, 
NamedExpression, SpecifiedWindowFrame, WindowFrame, WindowFunction, 
WindowSpecDefinition}
 import org.apache.spark.sql.catalyst.plans.JoinType
 import org.apache.spark.sql.catalyst.plans.logical.{InsertIntoStatement, Join, 
LogicalPlan, SerdeInfo, Window}
@@ -1696,6 +1699,369 @@ private[spark] object QueryCompilationErrors {
   s"Found duplicate column(s) $colType: ${duplicateCol.sorted.mkString(", 
")}")
   }
 
+  def noSuchTableError(db: String, table: String): Throwable = {
+new NoSuchTableException(db = db, table = table)
+  }
+
+  def tempViewNotCachedForAnalyzingColumnsError(tableIdent: TableIdentifier): 
Throwable = {
+new AnalysisException(s"Temporary view $tableIdent is not cached for 
analyzing columns.")
+  }
+
+  def columnTypeNotSupportStatisticsCollectionError(
+  name: String,
+  tableIdent: TableIdentifier,
+  dataType: DataType): Throwable = {
+new AnalysisException(s"Column $name in table $tableIdent is of type 
$dataType, " +
+  "and Spark does not support statistics collection on this column type.")
+  }
+
+  def analyzeTableNotSupportedOnViewsError(): Throwable = {
+new AnalysisException("ANALYZE TABLE is not supported on views.")
+  }
+
+  de

[spark] branch master updated: [SPARK-35664][SQL][FOLLOWUP] Fix incorrect comment for TimestampNTZType

2021-07-05 Thread gengliang
This is an automated email from the ASF dual-hosted git repository.

gengliang pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new c605ba2  [SPARK-35664][SQL][FOLLOWUP] Fix incorrect comment for 
TimestampNTZType
c605ba2 is described below

commit c605ba2d46742ca13db794ca1be136a4b10b652e
Author: gengjiaan 
AuthorDate: Mon Jul 5 18:48:00 2021 +0800

[SPARK-35664][SQL][FOLLOWUP] Fix incorrect comment for TimestampNTZType

### What changes were proposed in this pull request?
This PR fixes the incorrect comment for `TimestampNTZType`.

### Why are the changes needed?
Fix the incorrect comment.
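
(Context, not from the patch: the scaladoc in question documents the encoder
below; a minimal usage line, assuming the Spark 3.2 `Encoders` API.)
```
import org.apache.spark.sql.Encoders

// Serializes java.time.LocalDateTime as Catalyst's TimestampNTZType,
// the type named in the corrected comment:
val enc = Encoders.LOCALDATETIME
```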

### Does this PR introduce _any_ user-facing change?
'No'.

### How was this patch tested?
No need.

Closes #33218 from beliefer/SPARK-35664-followup.

Authored-by: gengjiaan 
Signed-off-by: Gengliang Wang 
---
 sql/catalyst/src/main/scala/org/apache/spark/sql/Encoders.scala | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/Encoders.scala 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/Encoders.scala
index 15a93a7..f23f3c6 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/Encoders.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/Encoders.scala
@@ -116,7 +116,7 @@ object Encoders {
 
   /**
* Creates an encoder that serializes instances of the 
`java.time.LocalDateTime` class
-   * to the internal representation of nullable Catalyst's DateType.
+   * to the internal representation of nullable Catalyst's TimestampNTZType.
*
* @since 3.2.0
*/

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch branch-3.2 updated: [SPARK-35664][SQL][FOLLOWUP] Fix incorrect comment for TimestampNTZType

2021-07-05 Thread gengliang
This is an automated email from the ASF dual-hosted git repository.

gengliang pushed a commit to branch branch-3.2
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.2 by this push:
 new d3e8c9c  [SPARK-35664][SQL][FOLLOWUP] Fix incorrect comment for 
TimestampNTZType
d3e8c9c is described below

commit d3e8c9c78b364580523e3f915ee51369ca7df0bf
Author: gengjiaan 
AuthorDate: Mon Jul 5 18:48:00 2021 +0800

[SPARK-35664][SQL][FOLLOWUP] Fix incorrect comment for TimestampNTZType

### What changes were proposed in this pull request?
This PR fixes the incorrect comment for `TimestampNTZType`.

### Why are the changes needed?
Fix the incorrect comment.

### Does this PR introduce _any_ user-facing change?
'No'.

### How was this patch tested?
No need.

Closes #33218 from beliefer/SPARK-35664-followup.

Authored-by: gengjiaan 
Signed-off-by: Gengliang Wang 
(cherry picked from commit c605ba2d46742ca13db794ca1be136a4b10b652e)
Signed-off-by: Gengliang Wang 
---
 sql/catalyst/src/main/scala/org/apache/spark/sql/Encoders.scala | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/Encoders.scala 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/Encoders.scala
index 15a93a7..f23f3c6 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/Encoders.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/Encoders.scala
@@ -116,7 +116,7 @@ object Encoders {
 
   /**
* Creates an encoder that serializes instances of the 
`java.time.LocalDateTime` class
-   * to the internal representation of nullable Catalyst's DateType.
+   * to the internal representation of nullable Catalyst's TimestampNTZType.
*
* @since 3.2.0
*/

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch branch-3.2 updated: [SPARK-35979][SQL] Return different timestamp literals based on the default timestamp type

2021-07-05 Thread gengliang
This is an automated email from the ASF dual-hosted git repository.

gengliang pushed a commit to branch branch-3.2
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.2 by this push:
 new a9947cb  [SPARK-35979][SQL] Return different timestamp literals based 
on the default timestamp type
a9947cb is described below

commit a9947cbd716b83e2f65dfec035c7abf29ea40922
Author: Gengliang Wang 
AuthorDate: Tue Jul 6 00:54:58 2021 +0800

[SPARK-35979][SQL] Return different timestamp literals based on the default 
timestamp type

### What changes were proposed in this pull request?

A timestamp literal should have the following behavior:
1. When `spark.sql.timestampType` is TIMESTAMP_NTZ: if there is no time 
zone part, return timestamp without time zone literal; otherwise, return 
timestamp with local time zone literal

2. When `spark.sql.timestampType` is TIMESTAMP_LTZ: return timestamp with 
local time zone literal

### Why are the changes needed?

When the default timestamp type is TIMESTAMP_NTZ, a timestamp literal should 
be resolved as TIMESTAMP_NTZ when there is no time zone part in the string.

From section 5.3 "literal" of the ANSI SQL standard 2011:
```
27) The declared type of a <timestamp literal> that does not specify <time 
zone interval> is TIMESTAMP(P) WITHOUT TIME ZONE, where P is the number of 
digits in <seconds fraction>, if specified, and 0 (zero) otherwise. The 
declared type of a <timestamp literal> that specifies <time zone interval> is 
TIMESTAMP(P) WITH TIME ZONE, where P is the number of digits in <seconds 
fraction>, if specified, and 0 (zero) otherwise.
```
Since we don't have "timestamp with time zone", we use timestamp with local 
time zone instead.
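
(Illustrative spark-shell check of the rule above; assumes the runtime
configuration `spark.sql.timestampType` and the `typeof` SQL function.)
```
spark.conf.set("spark.sql.timestampType", "TIMESTAMP_NTZ")
// No time zone part -> the literal resolves to timestamp_ntz:
spark.sql("SELECT typeof(TIMESTAMP '2021-07-05 00:00:00')").show()
// Time zone part present -> timestamp with local time zone:
spark.sql("SELECT typeof(TIMESTAMP '2021-07-05 00:00:00Z')").show()
```
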
### Does this PR introduce _any_ user-facing change?

No, the new timestamp type and the default timestamp configuration are not 
released yet.

### How was this patch tested?

Unit test

Closes #33215 from gengliangwang/tsLiteral.
    
Authored-by: Gengliang Wang 
Signed-off-by: Gengliang Wang 
(cherry picked from commit 2fffec7de8d31bd01c8acd8bca72acacaf189c97)
Signed-off-by: Gengliang Wang 
---
 .../spark/sql/catalyst/parser/AstBuilder.scala | 32 ++
 .../spark/sql/catalyst/util/DateTimeUtils.scala| 31 +
 .../org/apache/spark/sql/internal/SQLConf.scala|  6 ++--
 .../catalyst/parser/ExpressionParserSuite.scala| 12 
 .../sql/catalyst/util/DateTimeUtilsSuite.scala | 15 ++
 5 files changed, 83 insertions(+), 13 deletions(-)

diff --git 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala
 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala
index 361ecc1..5b9107f 100644
--- 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala
+++ 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala
@@ -38,8 +38,8 @@ import org.apache.spark.sql.catalyst.parser.SqlBaseParser._
 import org.apache.spark.sql.catalyst.plans._
 import org.apache.spark.sql.catalyst.plans.logical._
 import org.apache.spark.sql.catalyst.trees.CurrentOrigin
-import org.apache.spark.sql.catalyst.util.{CharVarcharUtils, IntervalUtils}
-import org.apache.spark.sql.catalyst.util.DateTimeUtils.{convertSpecialDate, 
convertSpecialTimestamp, getZoneId, stringToDate, stringToTimestamp}
+import org.apache.spark.sql.catalyst.util.{CharVarcharUtils, DateTimeUtils, 
IntervalUtils}
+import org.apache.spark.sql.catalyst.util.DateTimeUtils.{convertSpecialDate, 
convertSpecialTimestamp, convertSpecialTimestampNTZ, getZoneId, stringToDate, 
stringToTimestamp, stringToTimestampWithoutTimeZone}
 import org.apache.spark.sql.catalyst.util.IntervalUtils.IntervalUnit
 import org.apache.spark.sql.connector.catalog.{SupportsNamespaces, 
TableCatalog}
 import org.apache.spark.sql.connector.catalog.TableChange.ColumnPosition
@@ -2126,9 +2126,31 @@ class AstBuilder extends SqlBaseBaseVisitor[AnyRef] with 
SQLConfHelper with Logg
   val specialDate = convertSpecialDate(value, zoneId).map(Literal(_, 
DateType))
   specialDate.getOrElse(toLiteral(stringToDate, DateType))
 case "TIMESTAMP" =>
-  val zoneId = getZoneId(conf.sessionLocalTimeZone)
-  val specialTs = convertSpecialTimestamp(value, 
zoneId).map(Literal(_, TimestampType))
-  specialTs.getOrElse(toLiteral(stringToTimestamp(_, zoneId), 
TimestampType))
+  def constructTimestampLTZLiteral(value: String): Literal = {
+val zoneId = getZoneId(conf.sessionLocalTimeZone)
+val specialTs = convertSpecialTimestamp(value, 
zoneId).map(Literal(_, TimestampType))
+specialTs.getOrElse(toLiteral(stringToTimestamp(_, zoneId), 
TimestampType))
+  }
+
+  SQLConf.get.timestampType match {
+case TimestampNTZType =>
+  val sp

[spark] branch master updated (c605ba2 -> 2fffec7)

2021-07-05 Thread gengliang
This is an automated email from the ASF dual-hosted git repository.

gengliang pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from c605ba2  [SPARK-35664][SQL][FOLLOWUP] Fix incorrect comment for 
TimestampNTZType
 add 2fffec7  [SPARK-35979][SQL] Return different timestamp literals based 
on the default timestamp type

No new revisions were added by this update.

Summary of changes:
 .../spark/sql/catalyst/parser/AstBuilder.scala | 32 ++
 .../spark/sql/catalyst/util/DateTimeUtils.scala| 31 +
 .../org/apache/spark/sql/internal/SQLConf.scala|  6 ++--
 .../catalyst/parser/ExpressionParserSuite.scala| 12 
 .../sql/catalyst/util/DateTimeUtilsSuite.scala | 15 ++
 5 files changed, 83 insertions(+), 13 deletions(-)

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch branch-3.2 updated: [SPARK-35978][SQL] Support non-reserved keyword TIMESTAMP_LTZ

2021-07-06 Thread gengliang
This is an automated email from the ASF dual-hosted git repository.

gengliang pushed a commit to branch branch-3.2
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.2 by this push:
 new e09feda  [SPARK-35978][SQL] Support non-reserved keyword TIMESTAMP_LTZ
e09feda is described below

commit e09feda1d23a89a6f15a900f8001405f47b7e058
Author: Gengliang Wang 
AuthorDate: Tue Jul 6 14:33:22 2021 +0800

[SPARK-35978][SQL] Support non-reserved keyword TIMESTAMP_LTZ

### What changes were proposed in this pull request?

Support new keyword `TIMESTAMP_LTZ`, which can be used for:

- timestamp with local time zone data type in DDL
- timestamp with local time zone data type in Cast clauses
- timestamp with local time zone data type literal

### Why are the changes needed?

Users can use `TIMESTAMP_LTZ` in DDL/Cast/Literals for the timestamp with 
local time zone type directly. The new keyword is independent of the SQL 
configuration `spark.sql.timestampType`.
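
(Illustrative spark-shell examples of the three uses; the table name and
timestamp values are hypothetical.)
```
spark.sql("CREATE TABLE t (ts TIMESTAMP_LTZ) USING parquet")      // DDL
spark.sql("SELECT CAST('2021-07-06 00:00:00' AS TIMESTAMP_LTZ)")  // cast
spark.sql("SELECT TIMESTAMP_LTZ '2021-07-06 00:00:00'")           // literal
```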

### Does this PR introduce _any_ user-facing change?

No, the new timestamp type is not released yet.

### How was this patch tested?

Unit test

Closes #33224 from gengliangwang/TIMESTAMP_LTZ.

Authored-by: Gengliang Wang 
Signed-off-by: Gengliang Wang 
(cherry picked from commit b0b9643cd76da48ed90e958e40717a664bc7494b)
Signed-off-by: Gengliang Wang 
---
 .../apache/spark/sql/catalyst/parser/AstBuilder.scala | 16 ++--
 .../sql/catalyst/parser/DataTypeParserSuite.scala |  1 +
 .../sql/catalyst/parser/ExpressionParserSuite.scala   | 19 +++
 3 files changed, 26 insertions(+), 10 deletions(-)

diff --git 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala
 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala
index 680d781..d6363b5 100644
--- 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala
+++ 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala
@@ -2119,6 +2119,13 @@ class AstBuilder extends SqlBaseBaseVisitor[AnyRef] with 
SQLConfHelper with Logg
 throw QueryParsingErrors.cannotParseValueTypeError(valueType, value, 
ctx)
   }
 }
+
+def constructTimestampLTZLiteral(value: String): Literal = {
+  val zoneId = getZoneId(conf.sessionLocalTimeZone)
+  val specialTs = convertSpecialTimestamp(value, zoneId).map(Literal(_, 
TimestampType))
+  specialTs.getOrElse(toLiteral(stringToTimestamp(_, zoneId), 
TimestampType))
+}
+
 try {
   valueType match {
 case "DATE" =>
@@ -2128,13 +2135,9 @@ class AstBuilder extends SqlBaseBaseVisitor[AnyRef] with 
SQLConfHelper with Logg
 case "TIMESTAMP_NTZ" =>
   val specialTs = convertSpecialTimestampNTZ(value).map(Literal(_, 
TimestampNTZType))
   specialTs.getOrElse(toLiteral(stringToTimestampWithoutTimeZone, 
TimestampNTZType))
+case "TIMESTAMP_LTZ" =>
+  constructTimestampLTZLiteral(value)
 case "TIMESTAMP" =>
-  def constructTimestampLTZLiteral(value: String): Literal = {
-val zoneId = getZoneId(conf.sessionLocalTimeZone)
-val specialTs = convertSpecialTimestamp(value, 
zoneId).map(Literal(_, TimestampType))
-specialTs.getOrElse(toLiteral(stringToTimestamp(_, zoneId), 
TimestampType))
-  }
-
   SQLConf.get.timestampType match {
 case TimestampNTZType =>
   val specialTs = convertSpecialTimestampNTZ(value).map(Literal(_, 
TimestampNTZType))
@@ -2529,6 +2532,7 @@ class AstBuilder extends SqlBaseBaseVisitor[AnyRef] with 
SQLConfHelper with Logg
   case ("date", Nil) => DateType
   case ("timestamp", Nil) => SQLConf.get.timestampType
   case ("timestamp_ntz", Nil) => TimestampNTZType
+  case ("timestamp_ltz", Nil) => TimestampType
   case ("string", Nil) => StringType
   case ("character" | "char", length :: Nil) => 
CharType(length.getText.toInt)
   case ("varchar", length :: Nil) => VarcharType(length.getText.toInt)
diff --git 
a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/parser/DataTypeParserSuite.scala
 
b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/parser/DataTypeParserSuite.scala
index d34..97dd0db 100644
--- 
a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/parser/DataTypeParserSuite.scala
+++ 
b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/parser/DataTypeParserSuite.scala
@@ -59,6 +59,7 @@ class DataTypeParserSuite extends SparkFunSuite with 
SQLHelper {
   checkDataType("DATE", DateType)
   checkDataType("timestamp", TimestampType)
  

[spark] branch master updated (9544277 -> b0b9643)

2021-07-06 Thread gengliang
This is an automated email from the ASF dual-hosted git repository.

gengliang pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 9544277  [SPARK-35788][SS] Metrics support for RocksDB instance
 add b0b9643  [SPARK-35978][SQL] Support non-reserved keyword TIMESTAMP_LTZ

No new revisions were added by this update.

Summary of changes:
 .../apache/spark/sql/catalyst/parser/AstBuilder.scala | 16 ++--
 .../sql/catalyst/parser/DataTypeParserSuite.scala |  1 +
 .../sql/catalyst/parser/ExpressionParserSuite.scala   | 19 +++
 3 files changed, 26 insertions(+), 10 deletions(-)

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated (2febd5c -> 733e85f1)

2021-06-30 Thread gengliang
This is an automated email from the ASF dual-hosted git repository.

gengliang pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 2febd5c  [SPARK-35735][SQL] Take into account day-time interval fields 
in cast
 add 733e85f1 [SPARK-35953][SQL] Support extracting date fields from 
timestamp without time zone

No new revisions were added by this update.

Summary of changes:
 .../sql/catalyst/analysis/AnsiTypeCoercion.scala   |   2 +-
 .../spark/sql/catalyst/analysis/TypeCoercion.scala |   4 +-
 .../test/resources/sql-tests/inputs/extract.sql|  92 +++
 .../resources/sql-tests/results/extract.sql.out| 276 ++---
 .../scala/org/apache/spark/sql/SQLQuerySuite.scala |   8 +-
 5 files changed, 192 insertions(+), 190 deletions(-)

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated (9b387a1 -> 7fd3f8f)

2021-05-04 Thread gengliang
This is an automated email from the ASF dual-hosted git repository.

gengliang pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 9b387a1  [SPARK-35308][TESTS] Fix bug in SPARK-35266 that creates 
benchmark files in invalid path with wrong name
 add 7fd3f8f  [SPARK-35294][SQL] Add tree traversal pruning in rules with 
dedicated files under optimizer

No new revisions were added by this update.

Summary of changes:
 .../spark/sql/catalyst/expressions/complexTypeCreator.scala  |  3 +++
 .../sql/catalyst/expressions/complexTypeExtractors.scala |  5 -
 .../spark/sql/catalyst/expressions/jsonExpressions.scala |  3 +++
 .../spark/sql/catalyst/expressions/namedExpressions.scala|  3 ++-
 .../spark/sql/catalyst/optimizer/DecorrelateInnerQuery.scala |  3 ++-
 .../sql/catalyst/optimizer/LimitPushDownThroughWindow.scala  |  4 +++-
 .../spark/sql/catalyst/optimizer/OptimizeCsvJsonExprs.scala  | 12 +---
 .../sql/catalyst/optimizer/PropagateEmptyRelation.scala  |  4 +++-
 .../sql/catalyst/optimizer/PullOutGroupingExpressions.scala  |  3 ++-
 .../sql/catalyst/optimizer/PushDownLeftSemiAntiJoin.scala|  3 ++-
 .../sql/catalyst/optimizer/ReplaceExceptWithFilter.scala |  3 ++-
 .../sql/catalyst/optimizer/RewriteDistinctAggregates.scala   |  4 +++-
 .../catalyst/optimizer/SimplifyConditionalsInPredicate.scala |  4 +++-
 .../catalyst/optimizer/UnwrapCastInBinaryComparison.scala|  6 --
 .../spark/sql/catalyst/plans/logical/LocalRelation.scala |  3 +++
 .../sql/catalyst/plans/logical/basicLogicalOperators.scala   | 10 ++
 .../apache/spark/sql/catalyst/rules/RuleIdCollection.scala   |  8 +++-
 .../org/apache/spark/sql/catalyst/trees/TreePatterns.scala   |  9 +
 18 files changed, 74 insertions(+), 16 deletions(-)

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated (712a62c -> 2298ceb)

2021-03-24 Thread gengliang
This is an automated email from the ASF dual-hosted git repository.

gengliang pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 712a62c  [SPARK-34832][SQL][TEST] Set EXECUTOR_ALLOW_SPARK_CONTEXT to 
true to ensure ExternalAppendOnlyUnsafeRowArrayBenchmark run successfully
 add 2298ceb  [SPARK-34477][CORE] Register KryoSerializers for Avro 
GenericData classes

No new revisions were added by this update.

Summary of changes:
 .../spark/serializer/GenericAvroSerializer.scala   | 29 
 .../apache/spark/serializer/KryoSerializer.scala   | 16 -
 .../serializer/GenericAvroSerializerSuite.scala| 78 +++---
 3 files changed, 81 insertions(+), 42 deletions(-)

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated: [SPARK-34856][SQL] ANSI mode: Allow casting complex types as string type

2021-03-25 Thread gengliang
This is an automated email from the ASF dual-hosted git repository.

gengliang pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 0515f49  [SPARK-34856][SQL] ANSI mode: Allow casting complex types as 
string type
0515f49 is described below

commit 0515f490189466c5f13aa4f647e81aeb6c24d0bf
Author: Gengliang Wang 
AuthorDate: Fri Mar 26 00:17:43 2021 +0800

[SPARK-34856][SQL] ANSI mode: Allow casting complex types as string type

### What changes were proposed in this pull request?

Allow casting complex types as string type in ANSI mode.

### Why are the changes needed?

Currently, complex types are not allowed to be cast as string type. This 
breaks the DataFrame.show() API. E.g.:
```
scala> sql("select array(1, 2, 2)").show(false)
org.apache.spark.sql.AnalysisException: cannot resolve 'CAST(`array(1, 2, 2)` AS STRING)' due to data type mismatch:
 cannot cast array to string with ANSI mode on.
```
We should allow the conversion as an extension of the ANSI SQL standard, 
so that DataFrame.show() still works in ANSI mode (see the sketch below).
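
For illustration only (not part of the original commit message; the implicit
`spark` session and the exact rendered value are assumptions), this is the
kind of call that works after the change:
```
// A minimal sketch, assuming ANSI mode is enabled on a build containing this patch.
spark.conf.set("spark.sql.ansi.enabled", "true")
spark.sql("SELECT CAST(array(1, 2, 2) AS STRING)").show(false)
// expected to print a single row rendered like: [1, 2, 2]
```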
### Does this PR introduce _any_ user-facing change?

Yes, casting complex types as string type is now allowed in ANSI mode.

### How was this patch tested?

Unit tests.

Closes #31954 from gengliangwang/fixExplicitCast.

Authored-by: Gengliang Wang 
Signed-off-by: Gengliang Wang 
---
 docs/sql-ref-ansi-compliance.md|   9 +-
 .../spark/sql/catalyst/expressions/Cast.scala  |   9 +-
 .../spark/sql/catalyst/expressions/CastSuite.scala | 228 ++---
 3 files changed, 119 insertions(+), 127 deletions(-)

diff --git a/docs/sql-ref-ansi-compliance.md b/docs/sql-ref-ansi-compliance.md
index 557f27b..f4fd712 100644
--- a/docs/sql-ref-ansi-compliance.md
+++ b/docs/sql-ref-ansi-compliance.md
@@ -76,6 +76,9 @@ The type conversion of Spark ANSI mode follows the syntax 
rules of section 6.13
  straightforward type conversions which are disallowed as per the ANSI 
standard:
 * NumericType <=> BooleanType
 * StringType <=> BinaryType
+* ArrayType => String
+* MapType => String
+* StructType => String
 
  The valid combinations of target data type and source data type in a `CAST` 
expression are given by the following table.
 “Y” indicates that the combination is syntactically valid without restriction 
and “N” indicates that the combination is not valid.
@@ -89,9 +92,9 @@ The type conversion of Spark ANSI mode follows the syntax 
rules of section 6.13
 | Interval  | N   | Y  | N| N | Y| N   | N 
 | N | N   | N  |
 | Boolean   | Y   | Y  | N| N | N| Y   | N 
 | N | N   | N  |
 | Binary| N   | Y  | N| N | N| N   | Y 
 | N | N   | N  |
-| Array | N   | N  | N| N | N| N   | N 
 | **Y** | N   | N  |
-| Map   | N   | N  | N| N | N| N   | N 
 | N | **Y** | N  |
-| Struct| N   | N  | N| N | N| N   | N 
 | N | N   | **Y** |
+| Array | N   | Y  | N| N | N| N   | N 
 | **Y** | N   | N  |
+| Map   | N   | Y  | N| N | N| N   | N 
 | N | **Y** | N  |
+| Struct| N   | Y  | N| N | N| N   | N 
 | N | N   | **Y** |
 
 In the table above, all the `CAST`s that can cause runtime exceptions are 
marked as red **Y**:
 * CAST(Numeric AS Numeric): raise an overflow exception if the value is out of 
the target data type's range.
diff --git 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala
 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala
index 9135e6c..7599947 100644
--- 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala
+++ 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala
@@ -1873,6 +1873,8 @@ object AnsiCast {
 
 case (NullType, _) => true
 
+case (_, StringType) => true
+
 case (StringType, _: BinaryType) => true
 
 case (StringType, BooleanType) => true
@@ -1890,13 +1892,6 @@ object AnsiCast {
 case (StringType, _: NumericType) => true
 case (BooleanType, _: NumericType) => true
 
-case (_: NumericType, StringType) => true
-case (_: DateType, StringType) => true
-case (_: TimestampType, StringType) => true
-case (_: CalendarIntervalType, StringType) => true
-case (BooleanType, StringType) => true
-case (BinaryType, StringType) => true
-
 case (ArrayType(fromType, fn), ArrayType(toType, tn)) =>
   canCast(fromType

[spark] branch master updated (1c3bdab -> 48ef9bd)

2021-03-31 Thread gengliang
This is an automated email from the ASF dual-hosted git repository.

gengliang pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 1c3bdab  [SPARK-34911][SQL] Fix code not close issue in monitoring.md
 add 48ef9bd  [SPARK-34915][INFRA] Cache Maven, SBT and Scala in all jobs 
that use them

No new revisions were added by this update.

Summary of changes:
 .github/workflows/build_and_test.yml | 44 
 1 file changed, 44 insertions(+)

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated: [SPARK-34881][SQL] New SQL Function: TRY_CAST

2021-03-31 Thread gengliang
This is an automated email from the ASF dual-hosted git repository.

gengliang pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 3951e33  [SPARK-34881][SQL] New SQL Function: TRY_CAST
3951e33 is described below

commit 3951e3371a83578a81474ed99fb50d59f27aac62
Author: Gengliang Wang 
AuthorDate: Wed Mar 31 20:47:04 2021 +0800

[SPARK-34881][SQL] New SQL Function: TRY_CAST

### What changes were proposed in this pull request?

Add a new SQL function `try_cast`.
`try_cast` is identical to `AnsiCast` (or `Cast` when 
`spark.sql.ansi.enabled` is true), except that it returns NULL instead of 
raising an error.
This expression has one major difference from `cast` with 
`spark.sql.ansi.enabled` set to true: when the source value can't be stored in 
the target integral (Byte/Short/Int/Long) type, `try_cast` returns null instead 
of returning the low-order bytes of the source value.
Note that the result of `try_cast` is not affected by the configuration 
`spark.sql.ansi.enabled`.

This design follows Google BigQuery and Snowflake:
https://docs.snowflake.com/en/sql-reference/functions/try_cast.html

https://cloud.google.com/bigquery/docs/reference/standard-sql/functions-and-operators#safe_casting

### Why are the changes needed?

This is useful for the following scenarios (see the sketch after this list):
1. When ANSI mode is on, users can choose `try_cast` as an alternative way to 
run SQL without errors for certain operations.
2. When ANSI mode is off, users can use `try_cast` to get a more reasonable 
result when casting a value to an integral type: when an overflow error happens, 
`try_cast` returns null while `cast` returns the low-order bytes of the source 
value.
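
A minimal sketch of both scenarios (illustrative only; the literal values and
rendered results are assumptions based on the description above):
```
// Scenario 1: under ANSI mode, try_cast returns NULL where cast would raise an error.
spark.conf.set("spark.sql.ansi.enabled", "true")
spark.sql("SELECT try_cast('abc' AS INT)").show()       // NULL, no exception

// Scenario 2: with ANSI mode off, try_cast avoids the low-order-bytes result.
spark.conf.set("spark.sql.ansi.enabled", "false")
spark.sql("SELECT cast(2147483648 AS INT)").show()      // wraps to a negative value
spark.sql("SELECT try_cast(2147483648 AS INT)").show()  // NULL
```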

### Does this PR introduce _any_ user-facing change?

Yes, adding a new function `try_cast`

### How was this patch tested?

Unit tests.

Closes #31982 from gengliangwang/tryCast.

Authored-by: Gengliang Wang 
Signed-off-by: Gengliang Wang 
---
 docs/sql-ref-ansi-compliance.md|   1 +
 .../apache/spark/sql/catalyst/parser/SqlBase.g4|   5 +-
 .../spark/sql/catalyst/expressions/Cast.scala  |  27 +--
 .../spark/sql/catalyst/expressions/TryCast.scala   |  85 
 .../spark/sql/catalyst/parser/AstBuilder.scala |   8 +-
 .../spark/sql/catalyst/expressions/CastSuite.scala |  52 +++--
 .../sql/catalyst/expressions/TryCastSuite.scala|  51 +
 .../test/resources/sql-tests/inputs/try_cast.sql   |  54 +
 .../resources/sql-tests/results/try_cast.sql.out   | 234 +
 9 files changed, 486 insertions(+), 31 deletions(-)

diff --git a/docs/sql-ref-ansi-compliance.md b/docs/sql-ref-ansi-compliance.md
index f4fd712..70a1fa3 100644
--- a/docs/sql-ref-ansi-compliance.md
+++ b/docs/sql-ref-ansi-compliance.md
@@ -434,6 +434,7 @@ Below is a list of all the keywords in Spark SQL.
 |TRIM|non-reserved|non-reserved|non-reserved|
 |TRUE|non-reserved|non-reserved|reserved|
 |TRUNCATE|non-reserved|non-reserved|reserved|
+|TRY_CAST|non-reserved|non-reserved|non-reserved|
 |TYPE|non-reserved|non-reserved|non-reserved|
 |UNARCHIVE|non-reserved|non-reserved|non-reserved|
 |UNBOUNDED|non-reserved|non-reserved|non-reserved|
diff --git 
a/sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4 
b/sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4
index e694eda..55ba375 100644
--- 
a/sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4
+++ 
b/sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4
@@ -805,7 +805,7 @@ primaryExpression
 : name=(CURRENT_DATE | CURRENT_TIMESTAMP)  
#currentDatetime
 | CASE whenClause+ (ELSE elseExpression=expression)? END   
#searchedCase
 | CASE value=expression whenClause+ (ELSE elseExpression=expression)? END  
#simpleCase
-| CAST '(' expression AS dataType ')'  
#cast
+| name=(CAST | TRY_CAST) '(' expression AS dataType ')'
#cast
 | STRUCT '(' (argument+=namedExpression (',' argument+=namedExpression)*)? 
')' #struct
 | FIRST '(' expression (IGNORE NULLS)? ')' 
#first
 | LAST '(' expression (IGNORE NULLS)? ')'  
#last
@@ -1199,6 +1199,7 @@ ansiNonReserved
 | TRIM
 | TRUE
 | TRUNCATE
+| TRY_CAST
 | TYPE
 | UNARCHIVE
 | UNBOUNDED
@@ -1461,6 +1462,7 @@ nonReserved
 | TRIM
 | TRUE
 | TRUNCATE
+| TRY_CAST
 | TYPE
 | UNARCHIVE
 | UNBOUNDED
@@ -1720,6 +1722,7 @@ TRANSFORM: 'TRANSFORM';
 TRIM: 'TRIM';
 TRUE: 'TRUE';
 TRUNCATE

[spark] branch master updated (3c7d6c3 -> f208d80)

2021-04-07 Thread gengliang
This is an automated email from the ASF dual-hosted git repository.

gengliang pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 3c7d6c3  [SPARK-27658][SQL] Add FunctionCatalog API
 add f208d80  [SPARK-34970][SQL][SECURITY] Redact map-type options in the 
output of explain()

No new revisions were added by this update.

Summary of changes:
 .../apache/spark/sql/catalyst/trees/TreeNode.scala | 17 ++-
 .../resources/sql-tests/results/describe.sql.out   |  2 +-
 .../scala/org/apache/spark/sql/ExplainSuite.scala  | 53 ++
 3 files changed, 69 insertions(+), 3 deletions(-)
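
Purely illustrative (the option names, and that "password" matches the default
spark.redaction.regex, are assumptions):
```
// Sketch: a map-type option whose key matches spark.redaction.regex should be
// masked in the explain() output after this change.
val df = spark.read.format("jdbc")
  .option("url", "jdbc:postgresql://localhost/db")
  .option("dbtable", "t")
  .option("password", "top-secret")
  .load()
df.explain()  // the password value is expected to print as *********(redacted)
```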

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated: [SPARK-34881][SQL][FOLLOW-UP] Use multiline string for TryCast's expression description

2021-04-01 Thread gengliang
This is an automated email from the ASF dual-hosted git repository.

gengliang pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 8a2138d  [SPARK-34881][SQL][FOLLOW-UP] Use multiline string for 
TryCast's expression description
8a2138d is described below

commit 8a2138d09f489512e229c6a9e9860d7bf9ac6445
Author: Hyukjin Kwon 
AuthorDate: Thu Apr 1 14:50:05 2021 +0800

[SPARK-34881][SQL][FOLLOW-UP] Use multiline string for TryCast's expression 
description

### What changes were proposed in this pull request?

This PR fixes a JDK 11 compilation failure:

```

/home/runner/work/spark/spark/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/TryCast.scala:35:
 error: annotation argument needs to be a constant; found: "_FUNC_(expr AS 
type) - Casts the value `expr` to the target data type `type`. ".+("This 
expression is identical to CAST with configuration `spark.sql.ansi.enabled` as 
").+("true, except it returns NULL instead of raising an error. Note that the 
behavior of this ").+("expression doesn\'t depend on configuration  [...]
"true, except it returns NULL instead of raising an error. Note that 
the behavior of this " +
```

For whatever reason, the compiler doesn't recognize that the string is actually 
a constant. This PR simply switches it to a multi-line string (which is actually 
more correct).

Reference:


https://github.com/apache/spark/blob/bd0990e3e813d17065c593fc74f383b494fe8146/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/ApproximatePercentile.scala#L53-L57

### Why are the changes needed?

To recover the build.

### Does this PR introduce _any_ user-facing change?

No, dev-only.

### How was this patch tested?

 CI in this PR

Closes #32019 from HyukjinKwon/SPARK-34881.

Lead-authored-by: Hyukjin Kwon 
Co-authored-by: HyukjinKwon 
Signed-off-by: Gengliang Wang 
---
 .../org/apache/spark/sql/catalyst/expressions/TryCast.scala| 10 ++
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/TryCast.scala
 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/TryCast.scala
index aba76db..cae25a2 100644
--- 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/TryCast.scala
+++ 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/TryCast.scala
@@ -30,10 +30,12 @@ import org.apache.spark.sql.types.DataType
  * session local timezone by an analyzer [[ResolveTimeZone]].
  */
 @ExpressionDescription(
-  usage = "_FUNC_(expr AS type) - Casts the value `expr` to the target data 
type `type`. " +
-"This expression is identical to CAST with configuration 
`spark.sql.ansi.enabled` as " +
-"true, except it returns NULL instead of raising an error. Note that the 
behavior of this " +
-"expression doesn't depend on configuration `spark.sql.ansi.enabled`.",
+  usage = """
+_FUNC_(expr AS type) - Casts the value `expr` to the target data type 
`type`.
+  This expression is identical to CAST with configuration 
`spark.sql.ansi.enabled` as
+  true, except it returns NULL instead of raising an error. Note that the 
behavior of this
+  expression doesn't depend on configuration `spark.sql.ansi.enabled`.
+  """,
   examples = """
 Examples:
   > SELECT _FUNC_('10' as int);

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated (53e4dba -> 2b1c170)

2021-03-04 Thread gengliang
This is an automated email from the ASF dual-hosted git repository.

gengliang pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 53e4dba  [SPARK-34599][SQL] Fix the issue that INSERT INTO OVERWRITE 
doesn't support partition columns containing dot for DSv2
 add 2b1c170  [SPARK-34614][SQL] ANSI mode: Casting String to Boolean 
should throw exception on parse error

No new revisions were added by this update.

Summary of changes:
 docs/sql-ref-ansi-compliance.md|   1 +
 .../spark/sql/catalyst/expressions/Cast.scala  |  14 +-
 .../spark/sql/catalyst/expressions/CastSuite.scala | 244 +
 .../sql-tests/results/postgreSQL/boolean.sql.out   |  85 +++
 4 files changed, 264 insertions(+), 80 deletions(-)
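
As a hedged illustration of the described behavior (which string literals
parse as booleans is an assumption):
```
// Sketch: under ANSI mode, a malformed boolean string now raises an error
// on CAST instead of silently returning NULL.
spark.conf.set("spark.sql.ansi.enabled", "true")
spark.sql("SELECT CAST('true' AS BOOLEAN)").show()  // true
spark.sql("SELECT CAST('abc' AS BOOLEAN)").show()   // expected: parse-time cast error
```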


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org


