[spark] branch master updated (6f68ccf -> d691d85)
This is an automated email from the ASF dual-hosted git repository.

gengliang pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

    from 6f68ccf  [SPARK-31257][SPARK-33561][SQL] Unify create table syntax
     add d691d85  [SPARK-33496][SQL] Improve error message of ANSI explicit cast

No new revisions were added by this update.

Summary of changes:
 .../spark/sql/catalyst/expressions/Cast.scala      | 51 +-
 .../spark/sql/catalyst/expressions/CastSuite.scala | 38 +---
 2 files changed, 82 insertions(+), 7 deletions(-)

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated (8594958 -> 29e415d)
This is an automated email from the ASF dual-hosted git repository.

gengliang pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

    from 8594958  [SPARK-33650][SQL] Fix the error from ALTER TABLE .. ADD/DROP PARTITION for non-supported partition management table
     add 29e415d  [SPARK-33649][SQL][DOC] Improve the doc of spark.sql.ansi.enabled

No new revisions were added by this update.

Summary of changes:
 docs/sql-ref-ansi-compliance.md                            |  3 ++-
 .../main/scala/org/apache/spark/sql/internal/SQLConf.scala | 11 ++-
 2 files changed, 8 insertions(+), 6 deletions(-)
[spark] branch master updated (c24f2b2 -> 5d0045e)
This is an automated email from the ASF dual-hosted git repository.

gengliang pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

    from c24f2b2  [SPARK-33612][SQL] Add dataSourceRewriteRules batch to Optimizer
     add 5d0045e  [SPARK-33611][UI] Avoid encoding twice on the query parameter of rewritten proxy URL

No new revisions were added by this update.

Summary of changes:
 core/src/main/scala/org/apache/spark/ui/JettyUtils.scala | 16 ++--
 core/src/test/scala/org/apache/spark/ui/UISuite.scala    |  9 +
 2 files changed, 15 insertions(+), 10 deletions(-)
[spark] branch branch-3.0 updated: [SPARK-33611][UI] Avoid encoding twice on the query parameter of rewritten proxy URL
This is an automated email from the ASF dual-hosted git repository.

gengliang pushed a commit to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/branch-3.0 by this push:
     new 6abfeb6  [SPARK-33611][UI] Avoid encoding twice on the query parameter of rewritten proxy URL

6abfeb6 is described below

commit 6abfeb6884a3cdfe4c6e621219e6cf5a35d6467e
Author: Gengliang Wang
AuthorDate: Wed Dec 2 01:36:41 2020 +0800

    [SPARK-33611][UI] Avoid encoding twice on the query parameter of rewritten proxy URL

    ### What changes were proposed in this pull request?

    When running Spark behind a reverse proxy (e.g. Nginx, Apache HTTP server), the request URL can be encoded twice if we pass the query string directly to the constructor of `java.net.URI`:
    ```
    > val uri = "http://localhost:8081/test"
    > val query = "order%5B0%5D%5Bcolumn%5D=0" // query string of URL from the reverse proxy
    > val rewrittenURI = URI.create(uri.toString())
    > new URI(rewrittenURI.getScheme(), rewrittenURI.getAuthority(), rewrittenURI.getPath(), query, rewrittenURI.getFragment()).toString

    result: http://localhost:8081/test?order%255B0%255D%255Bcolumn%255D=0
    ```

    In Spark's stage page, the URL of "/taskTable" contains the query parameter order[0][dir]. After being encoded twice, the query parameter becomes `order%255B0%255D%255Bdir%255D` and is decoded as `order%5B0%5D%5Bdir%5D` instead of `order[0][dir]`. As a result, there will be a NullPointerException from https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/status/api/v1/StagesResource.scala#L176
    Other parameters may likewise not work as expected after being encoded twice.

    This PR fixes the bug by calling the method `URI.create(String URL)` directly. This convenience method avoids encoding the query parameter twice.
    ```
    > val uri = "http://localhost:8081/test"
    > val query = "order%5B0%5D%5Bcolumn%5D=0"
    > URI.create(s"$uri?$query").toString

    result: http://localhost:8081/test?order%5B0%5D%5Bcolumn%5D=0

    > URI.create(s"$uri?$query").getQuery

    result: order[0][column]=0
    ```

    ### Why are the changes needed?

    Fix a potential bug when Spark's reverse proxy is enabled. The bug itself is similar to https://github.com/apache/spark/pull/29271.

    ### Does this PR introduce _any_ user-facing change?

    No

    ### How was this patch tested?

    Add a new unit test. Also, manual UI testing for master, worker and app UI with an nginx proxy.

    Spark config:
    ```
    spark.ui.port 8080
    spark.ui.reverseProxy=true
    spark.ui.reverseProxyUrl=/path/to/spark/
    ```
    nginx config:
    ```
    server {
        listen 9000;
        set $SPARK_MASTER http://127.0.0.1:8080;
        # split spark UI path into prefix and local path within master UI
        location ~ ^(/path/to/spark/) {
            # strip prefix when forwarding request
            rewrite /path/to/spark(/.*) $1 break;
            #rewrite /path/to/spark/ "/" ;
            # forward to spark master UI
            proxy_pass $SPARK_MASTER;
            proxy_intercept_errors on;
            error_page 301 302 307 = handle_redirects;
        }
        location handle_redirects {
            set $saved_redirect_location '$upstream_http_location';
            proxy_pass $saved_redirect_location;
        }
    }
    ```

    Closes #30552 from gengliangwang/decodeProxyRedirect.
Authored-by: Gengliang Wang
Signed-off-by: Gengliang Wang
(cherry picked from commit 5d0045eedf4b138c031accac2b1fa1e8d6f3f7c6)
Signed-off-by: Gengliang Wang
---
 core/src/main/scala/org/apache/spark/ui/JettyUtils.scala | 16 ++--
 core/src/test/scala/org/apache/spark/ui/UISuite.scala    |  9 +
 2 files changed, 15 insertions(+), 10 deletions(-)

diff --git a/core/src/main/scala/org/apache/spark/ui/JettyUtils.scala b/core/src/main/scala/org/apache/spark/ui/JettyUtils.scala
index a4ba565..3820a88 100644
--- a/core/src/main/scala/org/apache/spark/ui/JettyUtils.scala
+++ b/core/src/main/scala/org/apache/spark/ui/JettyUtils.scala
@@ -400,17 +400,13 @@ private[spark] object JettyUtils extends Logging {
       uri.append(rest)
     }
-    val rewrittenURI = URI.create(uri.toString())
-    if (query != null) {
-      return new URI(
-        rewrittenURI.getScheme(),
-        rewrittenURI.getAuthority(),
-        rewrittenURI.getPath(),
-        query,
-        rewrittenURI.getFragment()
-      ).normalize()
+    val queryString = if (query == null) {
+      ""
+    } else {
+      s"?$query"
     }
-    rewrittenURI.normalize()
+    // SPARK-33611: use method `URI.cre
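The double-encoding behavior described in the commit message can be reproduced outside Spark with plain `java.net.URI`: the multi-argument `URI` constructors always quote `%`, so an already-encoded query string is encoded a second time, while `URI.create` parses the string as-is. A minimal sketch (class name and values are illustrative only):

```java
import java.net.URI;

// Standalone reproduction of the SPARK-33611 double-encoding issue.
public class DoubleEncodingDemo {
    public static void main(String[] args) throws Exception {
        String base = "http://localhost:8081/test";
        // Query string as it arrives from the reverse proxy: already percent-encoded.
        String query = "order%5B0%5D%5Bcolumn%5D=0";

        // The multi-argument URI constructor always quotes '%', so the
        // already-encoded query gets encoded a second time ("%5B" -> "%255B").
        URI parsed = URI.create(base);
        URI twice = new URI(parsed.getScheme(), parsed.getAuthority(),
                parsed.getPath(), query, parsed.getFragment());
        System.out.println(twice); // http://localhost:8081/test?order%255B0%255D%255Bcolumn%255D=0

        // URI.create parses the whole string and leaves valid escapes intact.
        URI once = URI.create(base + "?" + query);
        System.out.println(once);            // http://localhost:8081/test?order%5B0%5D%5Bcolumn%5D=0
        System.out.println(once.getQuery()); // order[0][column]=0 (decoded)
    }
}
```

This is why the fix builds the full URL string (including `?$query`) before a single `URI.create` call instead of passing the query to a constructor.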
[spark] branch master updated (cdd8e51 -> f80fe21)
This is an automated email from the ASF dual-hosted git repository.

gengliang pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

    from cdd8e51  [SPARK-33419][SQL] Unexpected behavior when using SET commands before a query in SparkSession.sql
     add f80fe21  [SPARK-33166][DOC] Provide Search Function in Spark docs site

No new revisions were added by this update.

Summary of changes:
 docs/_layouts/global.html | 23 +++
 docs/css/docsearch.css    | 36 
 2 files changed, 59 insertions(+)
 create mode 100644 docs/css/docsearch.css
[spark] branch master updated (74bd046 -> a180e02)
This is an automated email from the ASF dual-hosted git repository.

gengliang pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

    from 74bd046  [SPARK-33475][BUILD] Bump ANTLR runtime version to 4.8-1
     add a180e02  [SPARK-32852][SQL][DOC][FOLLOWUP] Revise the documentation of spark.sql.hive.metastore.jars

No new revisions were added by this update.

Summary of changes:
 .../org/apache/spark/sql/hive/HiveUtils.scala | 23 +++---
 1 file changed, 12 insertions(+), 11 deletions(-)
[spark] branch master updated (b8a440f -> 2b6dfa5)
This is an automated email from the ASF dual-hosted git repository.

gengliang pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

    from b8a440f  [SPARK-33277][PYSPARK][SQL] Use ContextAwareIterator to stop consuming after the task ends
     add 2b6dfa5  [SPARK-20044][UI] Support Spark UI behind front-end reverse proxy using a path prefix Revert proxy url

No new revisions were added by this update.

Summary of changes:
 .../main/scala/org/apache/spark/SparkContext.scala |   4 +-
 .../org/apache/spark/deploy/master/Master.scala    |   8 +-
 .../spark/deploy/worker/ExecutorRunner.scala       |   3 +-
 .../org/apache/spark/deploy/worker/Worker.scala    |   9 +-
 .../main/scala/org/apache/spark/ui/UIUtils.scala   |   3 +-
 .../apache/spark/deploy/master/MasterSuite.scala   | 101 +++-
 docs/configuration.md                              |  25 -
 7 files changed, 140 insertions(+), 13 deletions(-)
[spark] branch master updated (7e8eb04 -> 551b504)
This is an automated email from the ASF dual-hosted git repository.

gengliang pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

    from 7e8eb04  [SPARK-33314][SQL] Avoid dropping rows in Avro reader
     add 551b504  [SPARK-33316][SQL] Support user provided nullable Avro schema for non-nullable catalyst schema in Avro writing

No new revisions were added by this update.

Summary of changes:
 .../org/apache/spark/sql/avro/AvroSerializer.scala | 54 
 .../apache/spark/sql/avro/SchemaConverters.scala   |  2 +
 .../apache/spark/sql/avro/AvroFunctionsSuite.scala | 37 ++
 .../org/apache/spark/sql/avro/AvroSuite.scala      | 57 ++
 4 files changed, 140 insertions(+), 10 deletions(-)
[spark] branch master updated (d163110 -> f6c00079)
This is an automated email from the ASF dual-hosted git repository.

gengliang pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

    from d163110  [SPARK-32934][SQL][FOLLOW-UP] Refine class naming and code comments
     add f6c00079 [SPARK-33342][WEBUI] fix the wrong url and display name of blocking thread in threadDump page

No new revisions were added by this update.

Summary of changes:
 .../scala/org/apache/spark/ui/exec/ExecutorThreadDumpPage.scala | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)
[spark] branch branch-3.1 updated: [SPARK-34005][CORE][3.1] Update peak memory metrics for each Executor on task end
This is an automated email from the ASF dual-hosted git repository.

gengliang pushed a commit to branch branch-3.1
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/branch-3.1 by this push:
     new 7b870e3  [SPARK-34005][CORE][3.1] Update peak memory metrics for each Executor on task end

7b870e3 is described below

commit 7b870e38d7c6ff46e16785e31a471120fe5b8428
Author: Kousuke Saruta
AuthorDate: Wed Jan 20 19:50:05 2021 +0800

    [SPARK-34005][CORE][3.1] Update peak memory metrics for each Executor on task end

    ### What changes were proposed in this pull request?

    This PR backports SPARK-34005 (#31029).
    It makes `AppStatusListener` update the peak memory metrics for each Executor on task end, like the other peak memory metrics (e.g. per stage, and per executor within a stage).

    ### Why are the changes needed?

    When `AppStatusListener#onExecutorMetricsUpdate` is called, peak memory metrics for Executors, stages and executors in a stage are updated, but currently the Executor-level metrics are not updated on task end.

    ### Does this PR introduce _any_ user-facing change?

    Yes. Executor peak memory metrics are updated more accurately.

    ### How was this patch tested?

    After running a job with `local-cluster[1,1,1024]` and visiting `/api/v1//executors`, I confirmed the `peakExecutorMemory` metrics are shown for an Executor even though the lifetime of each job is very short.
    I also modified the json files for `HistoryServerSuite`.

    Closes #31261 from sarutak/SPARK-34005-branch-3.1.
Authored-by: Kousuke Saruta
Signed-off-by: Gengliang Wang
---
 .../apache/spark/status/AppStatusListener.scala    |  1 +
 .../executor_list_json_expectation.json            | 22 ++
 .../executor_memory_usage_expectation.json         | 88 ++
 ...executor_node_excludeOnFailure_expectation.json | 88 ++
 ...e_excludeOnFailure_unexcluding_expectation.json | 88 ++
 5 files changed, 287 insertions(+)

diff --git a/core/src/main/scala/org/apache/spark/status/AppStatusListener.scala b/core/src/main/scala/org/apache/spark/status/AppStatusListener.scala
index 6cb013b..52d41cd 100644
--- a/core/src/main/scala/org/apache/spark/status/AppStatusListener.scala
+++ b/core/src/main/scala/org/apache/spark/status/AppStatusListener.scala
@@ -759,6 +759,7 @@ private[spark] class AppStatusListener(
     exec.completedTasks += completedDelta
     exec.failedTasks += failedDelta
     exec.totalDuration += event.taskInfo.duration
+    exec.peakExecutorMetrics.compareAndUpdatePeakValues(event.taskExecutorMetrics)

     // Note: For resubmitted tasks, we continue to use the metrics that belong to the
     // first attempt of this task. This may not be 100% accurate because the first attempt

diff --git a/core/src/test/resources/HistoryServerExpectations/executor_list_json_expectation.json b/core/src/test/resources/HistoryServerExpectations/executor_list_json_expectation.json
index c18a2e3..be12507 100644
--- a/core/src/test/resources/HistoryServerExpectations/executor_list_json_expectation.json
+++ b/core/src/test/resources/HistoryServerExpectations/executor_list_json_expectation.json
@@ -21,6 +21,28 @@
   "addTime" : "2015-02-03T16:43:00.906GMT",
   "executorLogs" : { },
   "blacklistedInStages" : [ ],
+  "peakMemoryMetrics" : {
+    "JVMHeapMemory" : 0,
+    "JVMOffHeapMemory" : 0,
+    "OnHeapExecutionMemory" : 0,
+    "OffHeapExecutionMemory" : 0,
+    "OnHeapStorageMemory" : 0,
+    "OffHeapStorageMemory" : 0,
+    "OnHeapUnifiedMemory" : 0,
+    "OffHeapUnifiedMemory" : 0,
+    "DirectPoolMemory" : 0,
+    "MappedPoolMemory" : 0,
+    "ProcessTreeJVMVMemory" : 0,
+    "ProcessTreeJVMRSSMemory" : 0,
+    "ProcessTreePythonVMemory" : 0,
+    "ProcessTreePythonRSSMemory" : 0,
+    "ProcessTreeOtherVMemory" : 0,
+    "ProcessTreeOtherRSSMemory" : 0,
+    "MinorGCCount" : 0,
+    "MinorGCTime" : 0,
+    "MajorGCCount" : 0,
+    "MajorGCTime" : 0
+  },
   "attributes" : { },
   "resources" : { },
   "resourceProfileId" : 0,

diff --git a/core/src/test/resources/HistoryServerExpectations/executor_memory_usage_expectation.json b/core/src/test/resources/HistoryServerExpectations/executor_memory_usage_expectation.json
index 5144934..0a3eb81 100644
--- a/core/src/test/resources/HistoryServerExpectations/executor_memory_usage_expectation.json
+++ b/core/src/test/resources/HistoryServerExpectations/executor_memory_usage_expectation.json
@@ -64,6 +64,28 @@
   "totalOffHeapStorageMemory" : 524288000
 },
 "blacklistedI
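The one-line fix above folds each task's end-of-task metrics into the executor's running peaks via `compareAndUpdatePeakValues`. The underlying compare-and-update pattern can be sketched as follows (a hypothetical simplification; Spark's real `ExecutorMetrics` tracks a fixed set of named memory metrics, not an arbitrary array):

```java
// Sketch of a "keep the maximum value seen so far per metric" accumulator,
// mirroring the role of ExecutorMetrics.compareAndUpdatePeakValues.
public class PeakMetrics {
    private final long[] peaks; // one running peak per metric

    public PeakMetrics(int numMetrics) {
        this.peaks = new long[numMetrics]; // all peaks start at 0
    }

    /** Raise each peak to the new sample where larger; return true if any peak changed. */
    public boolean compareAndUpdatePeakValues(long[] sample) {
        boolean updated = false;
        for (int i = 0; i < peaks.length; i++) {
            if (sample[i] > peaks[i]) {
                peaks[i] = sample[i];
                updated = true;
            }
        }
        return updated;
    }

    public long get(int i) {
        return peaks[i];
    }
}
```

Calling this on every task end (in addition to the periodic `onExecutorMetricsUpdate` heartbeats) is what keeps the peaks accurate for executors whose tasks finish between heartbeats.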
[spark] branch master updated (f2b22d1 -> bd9eeeb)
This is an automated email from the ASF dual-hosted git repository.

gengliang pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

    from f2b22d1  [SPARK-34289][SQL] Parquet vectorized reader support column index
     add bd9eeeb  [SPARK-34288][WEBUI] Add a tip info for the `resources` column in the executors page

No new revisions were added by this update.

Summary of changes:
 .../resources/org/apache/spark/ui/static/executorspage-template.html | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
[spark] branch master updated (e79dd89 -> 1b1a8e4)
This is an automated email from the ASF dual-hosted git repository.

gengliang pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

    from e79dd89  [SPARK-35512][PYTHON] Fix OverflowError(cannot convert float infinity to integer) in partitionBy function
     add 1b1a8e4  [SPARK-30993][FOLLOWUP][SQL] Refactor LocalDateTimeUDT as YearUDT in UserDefinedTypeSuite

No new revisions were added by this update.

Summary of changes:
 .../apache/spark/sql/UserDefinedTypeSuite.scala | 34 ++
 1 file changed, 16 insertions(+), 18 deletions(-)
[spark] branch master updated (825b620 -> 84c5ca3)
This is an automated email from the ASF dual-hosted git repository. gengliang pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from 825b620 [SPARK-35687][SQL][TEST] PythonUDFSuite move assume into its methods add 84c5ca3 [SPARK-35664][SQL] Support java.time.LocalDateTime as an external type of TimestampWithoutTZ type No new revisions were added by this update. Summary of changes: .../expressions/SpecializedGettersReader.java | 3 ++ .../main/scala/org/apache/spark/sql/Encoders.scala | 8 + .../sql/catalyst/CatalystTypeConverters.scala | 21 +++- .../sql/catalyst/DeserializerBuildHelper.scala | 9 ++ .../apache/spark/sql/catalyst/InternalRow.scala| 4 +-- .../spark/sql/catalyst/JavaTypeInference.scala | 7 .../spark/sql/catalyst/ScalaReflection.scala | 10 ++ .../spark/sql/catalyst/SerializerBuildHelper.scala | 9 ++ .../apache/spark/sql/catalyst/dsl/package.scala| 4 +++ .../spark/sql/catalyst/encoders/RowEncoder.scala | 9 ++ .../expressions/InterpretedUnsafeProjection.scala | 2 +- .../catalyst/expressions/SpecificInternalRow.scala | 4 +-- .../expressions/codegen/CodeGenerator.scala| 5 +-- .../spark/sql/catalyst/expressions/literals.scala | 10 -- .../spark/sql/catalyst/util/DateTimeUtils.scala| 8 + .../org/apache/spark/sql/types/DataType.scala | 2 +- .../sql/catalyst/CatalystTypeConvertersSuite.scala | 31 +- .../sql/catalyst/encoders/RowEncoderSuite.scala| 10 ++ .../expressions/LiteralExpressionSuite.scala | 11 +++ .../sql/catalyst/util/DateTimeUtilsSuite.scala | 37 -- .../scala/org/apache/spark/sql/SQLImplicits.scala | 3 ++ .../org/apache/spark/sql/JavaDatasetSuite.java | 13 +--- .../scala/org/apache/spark/sql/DatasetSuite.scala | 5 +++ 23 files changed, 206 insertions(+), 19 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated (ebb4858 -> 43f6b4a)
This is an automated email from the ASF dual-hosted git repository. gengliang pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from ebb4858 [SPARK-35058][SQL] Group exception messages in hive/client add 43f6b4a [SPARK-35674][SQL][TESTS] Test timestamp without time zone in UDF No new revisions were added by this update. Summary of changes: .../test/scala/org/apache/spark/sql/UDFSuite.scala | 28 ++ 1 file changed, 28 insertions(+) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated (43f6b4a -> 0b5683a)
This is an automated email from the ASF dual-hosted git repository. gengliang pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from 43f6b4a [SPARK-35674][SQL][TESTS] Test timestamp without time zone in UDF add 0b5683a [SPARK-35694][INFRA] Increase the default JVM stack size of SBT/Maven No new revisions were added by this update. Summary of changes: .github/workflows/build_and_test.yml | 2 +- build/sbt| 2 +- pom.xml | 2 +- 3 files changed, 3 insertions(+), 3 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated (88f1d82 -> 4180692)
This is an automated email from the ASF dual-hosted git repository. gengliang pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from 88f1d82 [SPARK-34524][SQL][FOLLOWUP] Remove unused checkAlterTablePartition in CheckAnalysis.scala add 4180692 [SPARK-35711][SQL] Support casting of timestamp without time zone to timestamp type No new revisions were added by this update. Summary of changes: .../spark/sql/catalyst/expressions/Cast.scala | 5 .../spark/sql/catalyst/expressions/CastSuite.scala | 32 ++ 2 files changed, 26 insertions(+), 11 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated (79362c4 -> 05e2b76)
This is an automated email from the ASF dual-hosted git repository. gengliang pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from 79362c4 [SPARK-34898][CORE] We should log SparkListenerExecutorMetricsUpdateEvent of `driver` appropriately when `spark.eventLog.logStageExecutorMetrics` is true add 05e2b76 [SPARK-35720][SQL] Support casting of String to timestamp without time zone type No new revisions were added by this update. Summary of changes: .../spark/sql/catalyst/expressions/Cast.scala | 28 + .../spark/sql/catalyst/util/DateTimeUtils.scala| 123 ++--- .../catalyst/expressions/AnsiCastSuiteBase.scala | 13 +++ .../spark/sql/catalyst/expressions/CastSuite.scala | 11 ++ .../sql/catalyst/expressions/CastSuiteBase.scala | 20 5 files changed, 180 insertions(+), 15 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated (2f537a8 -> 2c4598d)
This is an automated email from the ASF dual-hosted git repository. gengliang pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from 2f537a8 [SPARK-35469][PYTHON] Fix disallow_untyped_defs mypy checks add 2c4598d [SPARK-35608][SQL] Support AQE optimizer side transformUpWithPruning No new revisions were added by this update. Summary of changes: .../org/apache/spark/sql/catalyst/trees/TreePatterns.scala | 1 + .../sql/execution/adaptive/AQEPropagateEmptyRelation.scala | 10 -- .../spark/sql/execution/adaptive/LogicalQueryStage.scala | 2 ++ 3 files changed, 11 insertions(+), 2 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated (2c91672 -> a100a01)
This is an automated email from the ASF dual-hosted git repository. gengliang pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from 2c91672 [SPARK-35775][SQL][TESTS] Check all year-month interval types in aggregate expressions add a100a01 [SPARK-35842][INFRA] Ignore all .idea folders No new revisions were added by this update. Summary of changes: .gitignore | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated (a87ee5d -> 960a7e5)
This is an automated email from the ASF dual-hosted git repository. gengliang pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from a87ee5d [SPARK-35695][SQL][FOLLOWUP] Use AQE helper to simplify the code in CollectMetricsExec add 960a7e5 [SPARK-35856][SQL][TESTS] Move new interval type test cases from CastSuite to CastBaseSuite No new revisions were added by this update. Summary of changes: .../catalyst/expressions/AnsiCastSuiteBase.scala | 8 ++ .../spark/sql/catalyst/expressions/CastSuite.scala | 124 +-- .../sql/catalyst/expressions/CastSuiteBase.scala | 133 - .../sql/catalyst/expressions/TryCastSuite.scala| 2 +- 4 files changed, 142 insertions(+), 125 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated: [SPARK-35845][SQL] OuterReference resolution should reject ambiguous column names
This is an automated email from the ASF dual-hosted git repository. gengliang pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 20edfdd [SPARK-35845][SQL] OuterReference resolution should reject ambiguous column names 20edfdd is described below commit 20edfdd39a83c52813f91e4028f816d06a6be99e Author: Wenchen Fan AuthorDate: Wed Jun 23 14:32:34 2021 +0800 [SPARK-35845][SQL] OuterReference resolution should reject ambiguous column names ### What changes were proposed in this pull request? The current OuterReference resolution is a bit weird: when the outer plan has more than one child, it resolves OuterReference from the output of each child, one by one, left to right. This is incorrect in the case of join, as the column name can be ambiguous if both left and right sides output this column. This PR fixes this bug by resolving OuterReference with `outerPlan.resolveChildren`, instead of something like `outerPlan.children.foreach(_.resolve(...))` ### Why are the changes needed? bug fix ### Does this PR introduce _any_ user-facing change? The problem only occurs in join, and join condition doesn't support correlated subquery yet. So this PR only improves the error message. Before this PR, people see ``` java.lang.UnsupportedOperationException Cannot generate code for expression: outer(t1a#291) ``` ### How was this patch tested? a new test Closes #33004 from cloud-fan/outer-ref. 
Authored-by: Wenchen Fan Signed-off-by: Gengliang Wang --- .../spark/sql/catalyst/analysis/Analyzer.scala | 35 +++--- .../catalyst/optimizer/DecorrelateInnerQuery.scala | 10 ++- .../spark/sql/catalyst/optimizer/subquery.scala| 26 .../optimizer/DecorrelateInnerQuerySuite.scala | 6 ++-- .../negative-cases/invalid-correlation.sql | 9 ++ .../negative-cases/invalid-correlation.sql.out | 24 ++- 6 files changed, 68 insertions(+), 42 deletions(-) diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala index 555be01..ba680ba 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala @@ -2285,8 +2285,8 @@ class Analyzer(override val catalogManager: CatalogManager) } /** - * Resolve the correlated expressions in a subquery by using the an outer plans' references. All - * resolved outer references are wrapped in an [[OuterReference]] + * Resolve the correlated expressions in a subquery, as if the expressions live in the outer + * plan. 
All resolved outer references are wrapped in an [[OuterReference]] */ private def resolveOuterReferences(plan: LogicalPlan, outer: LogicalPlan): LogicalPlan = { plan.resolveOperatorsDownWithPruning(_.containsPattern(UNRESOLVED_ATTRIBUTE)) { @@ -2295,7 +2295,7 @@ class Analyzer(override val catalogManager: CatalogManager) case u @ UnresolvedAttribute(nameParts) => withPosition(u) { try { - outer.resolve(nameParts, resolver) match { + outer.resolveChildren(nameParts, resolver) match { case Some(outerAttr) => wrapOuterReference(outerAttr) case None => u } @@ -2317,7 +2317,7 @@ class Analyzer(override val catalogManager: CatalogManager) */ private def resolveSubQuery( e: SubqueryExpression, -plans: Seq[LogicalPlan])( +outer: LogicalPlan)( f: (LogicalPlan, Seq[Expression]) => SubqueryExpression): SubqueryExpression = { // Step 1: Resolve the outer expressions. var previous: LogicalPlan = null @@ -2328,10 +2328,8 @@ class Analyzer(override val catalogManager: CatalogManager) current = executeSameContext(current) // Use the outer references to resolve the subquery plan if it isn't resolved yet. -val i = plans.iterator -val afterResolve = current -while (!current.resolved && current.fastEquals(afterResolve) && i.hasNext) { - current = resolveOuterReferences(current, i.next()) +if (!current.resolved) { + current = resolveOuterReferences(current, outer) } } while (!current.resolved && !current.fastEquals(previous)) @@ -2354,20 +2352,20 @@ class Analyzer(override val catalogManager: CatalogManager) * (2) Any aggregate expression(s) that reference outer attributes are pushed down to * outer plan to get evaluated. */ -private def
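The fix above swaps per-child resolution for `resolveChildren`, which can see both sides of a join at once and reject ambiguous names. A rough Python sketch of the difference — the column maps and names here are illustrative, not Spark's actual data structures:

```python
# Two children of a join, both exposing a column named "t1a".
left_cols = {"t1a": "left.t1a", "t1b": "left.t1b"}
right_cols = {"t1a": "right.t1a", "t2c": "right.t2c"}

def resolve_per_child(name):
    # Old behavior: try each child left to right and silently take
    # the first match, even when the name exists on both sides.
    for cols in (left_cols, right_cols):
        if name in cols:
            return cols[name]
    return None

def resolve_children(name):
    # New behavior: collect matches across all children and reject
    # names that match more than one side.
    hits = [cols[name] for cols in (left_cols, right_cols) if name in cols]
    if len(hits) > 1:
        raise ValueError(f"Reference '{name}' is ambiguous")
    return hits[0] if hits else None
```

With this shape, `resolve_per_child("t1a")` quietly returns the left-side column, while `resolve_children("t1a")` raises, which is the error-message improvement the PR describes.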
[spark] branch master updated (758b423 -> 6f51e37)
This is an automated email from the ASF dual-hosted git repository. gengliang pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from 758b423 [SPARK-35860][SQL] Support UpCast between different field of YearMonthIntervalType/DayTimeIntervalType add 6f51e37 [SPARK-35857][SQL] The ANSI flag of Cast should be kept after being copied No new revisions were added by this update. Summary of changes: .../org/apache/spark/sql/catalyst/analysis/Analyzer.scala | 2 +- .../spark/sql/catalyst/analysis/StreamingJoinHelper.scala | 2 +- .../org/apache/spark/sql/catalyst/expressions/Cast.scala | 11 --- .../sql/catalyst/optimizer/UnwrapCastInBinaryComparison.scala | 8 .../org/apache/spark/sql/catalyst/optimizer/expressions.scala | 10 +- .../sql/catalyst/plans/logical/QueryPlanConstraints.scala | 4 ++-- sql/core/src/main/scala/org/apache/spark/sql/Column.scala | 2 +- .../apache/spark/sql/execution/SubqueryBroadcastExec.scala| 2 +- .../sql/execution/analysis/DetectAmbiguousSelfJoin.scala | 2 +- .../scala/org/apache/spark/sql/hive/client/HiveShim.scala | 2 +- 10 files changed, 25 insertions(+), 20 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated: [SPARK-35817][SQL] Restore performance of queries against wide Avro tables
This is an automated email from the ASF dual-hosted git repository. gengliang pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 66d5a00 [SPARK-35817][SQL] Restore performance of queries against wide Avro tables 66d5a00 is described below commit 66d5a0049a638cec7c70566ea880897651aa95f1 Author: Bruce Robbins AuthorDate: Wed Jun 23 22:36:56 2021 +0800 [SPARK-35817][SQL] Restore performance of queries against wide Avro tables ### What changes were proposed in this pull request? When creating a record writer in an AvroDeserializer, or creating a struct converter in an AvroSerializer, look up Avro fields using a map rather than scanning the entire list of Avro fields. ### Why are the changes needed? A query against an Avro table can be quite slow when all are true: * There are many columns in the Avro file * The query contains a wide projection * There are many splits in the input * Some of the splits are read serially (e.g., fewer executors than there are tasks) A write to an Avro table can be quite slow when all are true: * There are many columns in the new rows * The operation is creating many files For example, a single-threaded query against a 6000 column Avro data set with 50K rows and 20 files takes less than a minute with Spark 3.0.1 but over 7 minutes with Spark 3.2.0-SNAPSHOT. This PR restores the faster time. For the 1000 column read benchmark: Before patch: 108447 ms After patch: 35925 ms percent improvement: 66% For the 1000 column write benchmark: Before patch: 123307 ms After patch: 42313 ms percent improvement: 65% ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? * Ran existing unit tests * Added new unit tests * Added new benchmarks Closes #32969 from bersprockets/SPARK-35817. 
Authored-by: Bruce Robbins Signed-off-by: Gengliang Wang --- .../avro/benchmarks/AvroReadBenchmark-results.txt | 115 +++-- .../avro/benchmarks/AvroWriteBenchmark-results.txt | 20 ++-- .../apache/spark/sql/avro/AvroDeserializer.scala | 3 +- .../org/apache/spark/sql/avro/AvroSerializer.scala | 4 +- .../org/apache/spark/sql/avro/AvroUtils.scala | 47 + .../spark/sql/avro/AvroSchemaHelperSuite.scala | 67 .../execution/benchmark/AvroReadBenchmark.scala| 31 ++ .../execution/benchmark/AvroWriteBenchmark.scala | 32 ++ 8 files changed, 239 insertions(+), 80 deletions(-) diff --git a/external/avro/benchmarks/AvroReadBenchmark-results.txt b/external/avro/benchmarks/AvroReadBenchmark-results.txt index f77db2d..5483cf6 100644 --- a/external/avro/benchmarks/AvroReadBenchmark-results.txt +++ b/external/avro/benchmarks/AvroReadBenchmark-results.txt @@ -2,129 +2,140 @@ SQL Single Numeric Column Scan -OpenJDK 64-Bit Server VM 1.8.0_282-b08 on Linux 5.4.0-1043-azure -Intel(R) Xeon(R) CPU E5-2673 v3 @ 2.40GHz +OpenJDK 64-Bit Server VM 1.8.0_292-b10 on Linux 4.18.0-193.6.3.el8_2.x86_64 +Intel(R) Xeon(R) Platinum 8175M CPU @ 2.50GHz SQL Single TINYINT Column Scan: Best Time(ms) Avg Time(ms) Stdev(ms)Rate(M/s) Per Row(ns) Relative -Sum2802 2826 34 5.6 178.1 1.0X +Sum2648 2658 15 5.9 168.3 1.0X -OpenJDK 64-Bit Server VM 1.8.0_282-b08 on Linux 5.4.0-1043-azure -Intel(R) Xeon(R) CPU E5-2673 v3 @ 2.40GHz +OpenJDK 64-Bit Server VM 1.8.0_292-b10 on Linux 4.18.0-193.6.3.el8_2.x86_64 +Intel(R) Xeon(R) Platinum 8175M CPU @ 2.50GHz SQL Single SMALLINT Column Scan: Best Time(ms) Avg Time(ms) Stdev(ms)Rate(M/s) Per Row(ns) Relative -Sum2786 2810 35 5.6 177.1 1.0X +Sum2584 2624 56 6.1 164.3 1.0X -OpenJDK 64-Bit Server VM 1.8.0_282-b08 on Linux 5.4.0-1043-azure -Intel(R) Xeon(R) CPU E5-2673 v3 @ 2.40GHz +OpenJDK 64-Bit Server VM 1.8.0_292-b10 on Linux 4.18.0-193.6.3.el8_2.x86_64 +Intel(R) Xeon(R) Platinum 8175M CPU @ 2.50GHz SQL Single INT Column Scan: Best Time(ms) Avg Time(ms) Stdev(ms)Rate(M/s) Per 
Row(ns) Relative
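The map-based field lookup described in SPARK-35817 is the standard fix for quadratic name resolution: scanning a field list once per field is O(n²) overall, while building a name-to-field map once makes each lookup O(1). A minimal sketch (field names and counts are illustrative, not the actual Avro schema API):

```python
# Simulated schema: 1000 (name, position) field pairs.
fields = [(f"col{i}", i) for i in range(1000)]

def lookup_by_scan(name):
    # Pre-patch shape: linear scan of the whole field list per lookup.
    for field_name, pos in fields:
        if field_name == name:
            return pos
    return None

# Post-patch shape: build the index once, then look up in O(1).
field_map = {field_name: pos for field_name, pos in fields}

def lookup_by_map(name):
    return field_map.get(name)
```

Resolving all 1000 fields costs ~1,000,000 comparisons with the scan but only ~1000 dictionary probes with the map, which matches the roughly 3x speedups in the benchmark numbers above.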
[spark] branch master updated: [SPARK-35831][YARN][TEST-MAVEN] Handle PathOperationException in copyFileToRemote on the same src and dest
This is an automated email from the ASF dual-hosted git repository. gengliang pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 2b9902d [SPARK-35831][YARN][TEST-MAVEN] Handle PathOperationException in copyFileToRemote on the same src and dest 2b9902d is described below commit 2b9902d26a5b7e3aeecfed3aa21744d1d2016d26 Author: Dongjoon Hyun AuthorDate: Mon Jun 21 23:28:27 2021 +0800 [SPARK-35831][YARN][TEST-MAVEN] Handle PathOperationException in copyFileToRemote on the same src and dest ### What changes were proposed in this pull request? This PR aims to be more robust against underlying Hadoop library changes. Apache Spark's `copyFileToRemote` has an option, `force`, to always invoke copying, and it can hit `org.apache.hadoop.fs.PathOperationException` in some Hadoop versions. From Apache Hadoop 3.3.1, we reverted [HADOOP-16878](https://issues.apache.org/jira/browse/HADOOP-16878) as the last revert commit on `branch-3.3.1`. However, it's still in Apache Hadoop 3.4.0. - https://github.com/apache/hadoop/commit/a3b9c37a397ad4188041dd80621bdeefc46885f2 ### Why are the changes needed? Currently, Apache Spark Jenkins hits a flakiness issue. 
- https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-master-test-maven-hadoop-3.2/lastCompletedBuild/testReport/org.apache.spark.deploy.yarn/ClientSuite/distribute_jars_archive/history/ - https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-master-test-maven-hadoop-3.2-jdk-11/2459/testReport/junit/org.apache.spark.deploy.yarn/ClientSuite/distribute_jars_archive/ ``` org.apache.hadoop.fs.PathOperationException: `Source (file:/home/jenkins/workspace/spark-master-test-maven-hadoop-3.2/resource-managers/yarn/target/tmp/spark-703b8e99-63cc-4ba6-a9bc-25c7cae8f5f9/testJar9120517778809167117.jar) and destination (/home/jenkins/workspace/spark-master-test-maven-hadoop-3.2/resource-managers/yarn/target/tmp/spark-703b8e99-63cc-4ba6-a9bc-25c7cae8f5f9/testJar9120517778809167117.jar) are equal in the copy command.': Operation not supported at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:403) ``` Apache Spark has three cases. - `!compareFs(srcFs, destFs)`: This is safe because we will not have this exception. - `"file".equals(srcFs.getScheme)`: This is safe because this cannot be a `false` alarm. - `force=true`: - For the `good` alarm part, Spark works in the same way. - For the `false` alarm part, Spark is safe because we use `force = true` only for copying `localConfArchive` instead of a general copy between two random clusters. ```scala val localConfArchive = new Path(createConfArchive(confsToOverride).toURI()) copyFileToRemote(destDir, localConfArchive, replication, symlinkCache, force = true, destName = Some(LOCALIZED_CONF_ARCHIVE)) ``` ### Does this PR introduce _any_ user-facing change? No. This preserves the previous Apache Spark behavior. ### How was this patch tested? Pass the Jenkins with Maven. Closes #32983 from dongjoon-hyun/SPARK-35831. 
Authored-by: Dongjoon Hyun Signed-off-by: Gengliang Wang --- .../yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala | 8 +++- 1 file changed, 7 insertions(+), 1 deletion(-) diff --git a/resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala b/resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala index 427202f..364bc3b 100644 --- a/resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala +++ b/resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala @@ -401,7 +401,13 @@ private[spark] class Client( if (force || !compareFs(srcFs, destFs) || "file".equals(srcFs.getScheme)) { destPath = new Path(destDir, destName.getOrElse(srcPath.getName())) logInfo(s"Uploading resource $srcPath -> $destPath") - FileUtil.copy(srcFs, srcPath, destFs, destPath, false, hadoopConf) + try { +FileUtil.copy(srcFs, srcPath, destFs, destPath, false, hadoopConf) + } catch { +// HADOOP-16878 changes the behavior to throw exceptions when src equals to dest +case e: PathOperationException +if srcFs.makeQualified(srcPath).equals(destFs.makeQualified(destPath)) => + } destFs.setReplication(destPath, replication) destFs.setPermission(destPath, new FsPermission(APP_FILE_PERMISSION)) } else { - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
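The guard the patch adds — treat a copy whose source and destination are the same file as a no-op rather than an error — can be sketched outside of Hadoop. This is an illustrative Python analogue, not Spark's Scala code; `copy_file_to_remote` and its path handling are simplifications:

```python
import os
import shutil

def copy_file_to_remote(src, dst, force=False):
    # HADOOP-16878-style copy implementations raise when source and
    # destination resolve to the same path; Spark catches that specific
    # PathOperationException. Here we check up front instead, which has
    # the same effect: the file is already in place, so do nothing.
    if os.path.abspath(src) == os.path.abspath(dst):
        return dst
    shutil.copyfile(src, dst)
    return dst
```

As in the patch, the swallow applies only to the equal-path case; a copy between genuinely different locations still runs (and still fails loudly if it cannot).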
[spark] branch master updated (6ca56b0 -> 2bdd9fe)
This is an automated email from the ASF dual-hosted git repository. gengliang pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from 6ca56b0 [SPARK-35614][PYTHON] Make the conversion to pandas data-type-based for ExtensionDtypes add 2bdd9fe [SPARK-35839][SQL] New SQL function: to_timestamp_ntz No new revisions were added by this update. Summary of changes: .../sql/catalyst/analysis/FunctionRegistry.scala | 1 + .../catalyst/expressions/datetimeExpressions.scala | 99 ++- .../catalyst/util/DateTimeFormatterHelper.scala| 2 +- .../sql/catalyst/util/TimestampFormatter.scala | 55 +++- .../expressions/DateExpressionsSuite.scala | 29 +- .../apache/spark/sql/execution/HiveResult.scala| 3 +- .../sql-functions/sql-expression-schema.md | 5 +- .../test/resources/sql-tests/inputs/datetime.sql | 46 +++ .../sql-tests/results/ansi/datetime.sql.out| 325 - .../sql-tests/results/datetime-legacy.sql.out | 317 +++- .../resources/sql-tests/results/datetime.sql.out | 317 +++- .../SparkExecuteStatementOperation.scala | 3 +- 12 files changed, 1187 insertions(+), 15 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
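Semantically, `to_timestamp_ntz` parses a string into a timestamp with no time zone attached — in Python terms, a naive `datetime`. A rough sketch of the contract (the fixed format string is a simplification; Spark supports its full datetime pattern syntax):

```python
from datetime import datetime

def to_timestamp_ntz(s, fmt="%Y-%m-%d %H:%M:%S"):
    # No zone conversion happens: the wall-clock fields in the string
    # are taken as-is, and the result carries no time zone.
    return datetime.strptime(s, fmt)
```

The key difference from a zoned `to_timestamp` is that the result never shifts with the session time zone.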
[spark] branch master updated (bc61b62 -> ce53b71)
This is an automated email from the ASF dual-hosted git repository. gengliang pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from bc61b62 [SPARK-35727][SQL] Return INTERVAL DAY from dates subtraction add ce53b71 [SPARK-35854][SQL] Improve the error message of to_timestamp_ntz with invalid format pattern No new revisions were added by this update. Summary of changes: .../catalyst/util/DateTimeFormatterHelper.scala| 7 +- .../sql/catalyst/util/TimestampFormatter.scala | 29 -- .../spark/sql/errors/QueryExecutionErrors.scala| 10 ++-- .../sql-tests/results/ansi/datetime.sql.out| 12 - .../sql-tests/results/datetime-legacy.sql.out | 12 - .../resources/sql-tests/results/datetime.sql.out | 12 - 6 files changed, 53 insertions(+), 29 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch branch-3.1 updated (8bcc6a4 -> 4ad6001)
This is an automated email from the ASF dual-hosted git repository. gengliang pushed a change to branch branch-3.1 in repository https://gitbox.apache.org/repos/asf/spark.git. from 8bcc6a4 [SPARK-35885][K8S][R] Use keyserver.ubuntu.com as a keyserver for CRAN add 4ad6001 [SPARK-35817][SQL][3.1] Restore performance of queries against wide Avro tables No new revisions were added by this update. Summary of changes: .../avro/benchmarks/AvroReadBenchmark-results.txt | 115 +++-- .../avro/benchmarks/AvroWriteBenchmark-results.txt | 20 ++-- .../apache/spark/sql/avro/AvroDeserializer.scala | 3 +- .../org/apache/spark/sql/avro/AvroSerializer.scala | 3 +- .../org/apache/spark/sql/avro/AvroUtils.scala | 50 + .../spark/sql/avro/AvroSchemaHelperSuite.scala | 67 .../execution/benchmark/AvroReadBenchmark.scala| 31 ++ .../execution/benchmark/AvroWriteBenchmark.scala | 32 ++ 8 files changed, 241 insertions(+), 80 deletions(-) create mode 100644 external/avro/src/test/scala/org/apache/spark/sql/avro/AvroSchemaHelperSuite.scala - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated: [SPARK-35889][SQL] Support adding TimestampWithoutTZ with Interval types
This is an automated email from the ASF dual-hosted git repository. gengliang pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 9814cf8 [SPARK-35889][SQL] Support adding TimestampWithoutTZ with Interval types 9814cf8 is described below commit 9814cf88533c049036cee5f6d62346f237dcec19 Author: Gengliang Wang AuthorDate: Fri Jun 25 19:58:42 2021 +0800 [SPARK-35889][SQL] Support adding TimestampWithoutTZ with Interval types ### What changes were proposed in this pull request? Support the following operations: - TimestampWithoutTZ + Calendar interval - TimestampWithoutTZ + Year-Month interval - TimestampWithoutTZ + Daytime interval ### Why are the changes needed? Support basic '+' operator for timestamp without time zone type. ### Does this PR introduce _any_ user-facing change? No, the timestamp without time zone type is not released yet. ### How was this patch tested? Unit tests Closes #33076 from gengliangwang/addForNewTS. 
Authored-by: Gengliang Wang Signed-off-by: Gengliang Wang --- .../spark/sql/catalyst/analysis/Analyzer.scala | 6 +- .../catalyst/expressions/datetimeExpressions.scala | 28 ++- .../apache/spark/sql/types/AbstractDataType.scala | 8 + .../expressions/DateExpressionsSuite.scala | 245 +++-- .../test/resources/sql-tests/inputs/datetime.sql | 11 + .../sql-tests/results/ansi/datetime.sql.out| 76 ++- .../sql-tests/results/datetime-legacy.sql.out | 76 ++- .../resources/sql-tests/results/datetime.sql.out | 76 ++- .../typeCoercion/native/dateTimeOperations.sql.out | 54 ++--- 9 files changed, 424 insertions(+), 156 deletions(-) diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala index 0a3bd09..6737ed5 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala @@ -357,8 +357,10 @@ class Analyzer(override val catalogManager: CatalogManager) case (_: DayTimeIntervalType, DateType) => TimeAdd(Cast(r, TimestampType), l) case (DateType, _: YearMonthIntervalType) => DateAddYMInterval(l, r) case (_: YearMonthIntervalType, DateType) => DateAddYMInterval(r, l) - case (TimestampType, _: YearMonthIntervalType) => TimestampAddYMInterval(l, r) - case (_: YearMonthIntervalType, TimestampType) => TimestampAddYMInterval(r, l) + case (TimestampType | TimestampWithoutTZType, _: YearMonthIntervalType) => +TimestampAddYMInterval(l, r) + case (_: YearMonthIntervalType, TimestampType | TimestampWithoutTZType) => +TimestampAddYMInterval(r, l) case (CalendarIntervalType, CalendarIntervalType) | (_: DayTimeIntervalType, _: DayTimeIntervalType) => a case (DateType, CalendarIntervalType) => DateAddInterval(l, r, ansiEnabled = f) diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/datetimeExpressions.scala 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/datetimeExpressions.scala index 63f6c03..d84b6eb 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/datetimeExpressions.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/datetimeExpressions.scala @@ -59,6 +59,11 @@ trait TimeZoneAwareExpression extends Expression { def withTimeZone(timeZoneId: String): TimeZoneAwareExpression @transient lazy val zoneId: ZoneId = DateTimeUtils.getZoneId(timeZoneId.get) + + def zoneIdForType(dataType: DataType): ZoneId = dataType match { +case _: TimestampWithoutTZType => java.time.ZoneOffset.UTC +case _ => zoneId + } } trait TimestampFormatterHelper extends TimeZoneAwareExpression { @@ -1446,23 +1451,25 @@ case class TimeAdd(start: Expression, interval: Expression, timeZoneId: Option[S override def toString: String = s"$left + $right" override def sql: String = s"${left.sql} + ${right.sql}" override def inputTypes: Seq[AbstractDataType] = -Seq(TimestampType, TypeCollection(CalendarIntervalType, DayTimeIntervalType)) +Seq(TypeCollection.AllTimestampTypes, TypeCollection(CalendarIntervalType, DayTimeIntervalType)) - override def dataType: DataType = TimestampType + override def dataType: DataType = start.dataType override def withTimeZone(timeZoneId: String): TimeZoneAwareExpression = copy(timeZoneId = Option(timeZoneId)) + @transient private lazy val zoneIdInEval: ZoneId = zoneIdForType(left.dataType) + over
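The `zoneIdForType` change in the diff pins timestamp-without-time-zone arithmetic to UTC, so adding an interval is pure wall-clock arithmetic with no DST adjustment. Sketched with Python's naive datetimes (the helper name is illustrative):

```python
from datetime import datetime, timedelta

def add_daytime_interval(ts, days=0, hours=0, minutes=0):
    # A timestamp without time zone behaves like a naive datetime:
    # adding a day-time interval shifts the wall-clock fields directly,
    # with no zone or daylight-saving correction in between.
    return ts + timedelta(days=days, hours=hours, minutes=minutes)
```

A zoned timestamp, by contrast, would first convert to an instant in the session zone, where an addition spanning a DST transition can change the wall-clock hour.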
[spark] branch master updated (f49bf1a -> 74b3df8)
This is an automated email from the ASF dual-hosted git repository. gengliang pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from f49bf1a [SPARK-34382][SQL] Support LATERAL subqueries add 74b3df8 [SPARK-35698][SQL] Support casting of timestamp without time zone to strings No new revisions were added by this update. Summary of changes: .../apache/spark/sql/catalyst/expressions/Cast.scala | 11 ++- .../spark/sql/catalyst/expressions/CastSuite.scala | 18 +- 2 files changed, 23 insertions(+), 6 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated (b74260f -> c382d40)
This is an automated email from the ASF dual-hosted git repository. gengliang pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from b74260f [SPARK-35765][SQL] Distinct aggs are not duplicate sensitive add c382d40 [SPARK-35766][SQL][TESTS] Break down CastSuite/AnsiCastSuite into multiple files No new revisions were added by this update. Summary of changes: .../catalyst/expressions/AnsiCastSuiteBase.scala | 481 +++ .../spark/sql/catalyst/expressions/CastSuite.scala | 1357 +--- .../sql/catalyst/expressions/CastSuiteBase.scala | 930 ++ 3 files changed, 1412 insertions(+), 1356 deletions(-) create mode 100644 sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/AnsiCastSuiteBase.scala create mode 100644 sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/CastSuiteBase.scala - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated (6c5fcac -> 02c99f1)
This is an automated email from the ASF dual-hosted git repository. gengliang pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from 6c5fcac [SPARK-35373][BUILD] Check Maven artifact checksum in build/mvn add 02c99f1 [SPARK-35162][SQL] New SQL functions: TRY_ADD/TRY_DIVIDE No new revisions were added by this update. Summary of changes: .../sql/catalyst/analysis/FunctionRegistry.scala | 2 + .../spark/sql/catalyst/expressions/TryEval.scala | 110 + ...deterministicSuite.scala => TryEvalSuite.scala} | 32 -- .../sql-functions/sql-expression-schema.md | 4 +- .../resources/sql-tests/inputs/try_arithmetic.sql | 11 +++ .../sql-tests/results/try_arithmetic.sql.out | 66 + 6 files changed, 215 insertions(+), 10 deletions(-) create mode 100644 sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/TryEval.scala copy sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/{NondeterministicSuite.scala => TryEvalSuite.scala} (56%) create mode 100644 sql/core/src/test/resources/sql-tests/inputs/try_arithmetic.sql create mode 100644 sql/core/src/test/resources/sql-tests/results/try_arithmetic.sql.out - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
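The contract of the new `TRY_ADD`/`TRY_DIVIDE` functions is NULL-on-error instead of raising under ANSI mode. A minimal Python sketch of that semantics (Python ints don't overflow, so only the division error is demonstrable here; the function names mirror the SQL ones for illustration):

```python
def try_add(a, b):
    # Return None (SQL NULL) instead of raising on invalid input.
    try:
        return a + b
    except TypeError:
        return None

def try_divide(a, b):
    # Division by zero yields None rather than an error.
    try:
        return a / b
    except ZeroDivisionError:
        return None
```

This lets queries keep running over dirty data while ANSI mode stays enabled for the rest of the expression tree.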
[spark] branch master updated (c4ca232 -> 7c9a9ec)
This is an automated email from the ASF dual-hosted git repository. gengliang pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from c4ca232 [SPARK-35363][SQL] Refactor sort merge join code-gen be agnostic to join type add 7c9a9ec [SPARK-35146][SQL] Migrate to transformWithPruning or resolveWithPruning for rules in finishAnalysis.scala No new revisions were added by this update. Summary of changes: .../apache/spark/sql/catalyst/expressions/Expression.scala| 3 +++ .../spark/sql/catalyst/expressions/aggregate/CountIf.scala| 3 +++ .../sql/catalyst/expressions/aggregate/UnevaluableAggs.scala | 3 +++ .../spark/sql/catalyst/expressions/datetimeExpressions.scala | 5 + .../org/apache/spark/sql/catalyst/expressions/misc.scala | 3 +++ .../apache/spark/sql/catalyst/optimizer/finishAnalysis.scala | 11 +++ .../org/apache/spark/sql/catalyst/trees/TreePatterns.scala| 4 7 files changed, 28 insertions(+), 4 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated: [SPARK-35298][SQL] Migrate to transformWithPruning for rules in Optimizer.scala
This is an automated email from the ASF dual-hosted git repository. gengliang pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:

new d92018e [SPARK-35298][SQL] Migrate to transformWithPruning for rules in Optimizer.scala

d92018e is described below

commit d92018ee358b0009dac626e2c5568db8363f53ee
Author: Yingyi Bu
AuthorDate: Wed May 12 20:42:47 2021 +0800

[SPARK-35298][SQL] Migrate to transformWithPruning for rules in Optimizer.scala

### What changes were proposed in this pull request?

Added the following TreePattern enums:
- ALIAS
- AND_OR
- AVERAGE
- GENERATE
- INTERSECT
- SORT
- SUM
- DISTINCT_LIKE
- PROJECT
- REPARTITION_OPERATION
- UNION

Added tree traversal pruning to the following rules in Optimizer.scala:
- EliminateAggregateFilter
- RemoveRedundantAggregates
- RemoveNoopOperators
- RemoveNoopUnion
- LimitPushDown
- ColumnPruning
- CollapseRepartition
- OptimizeRepartition
- OptimizeWindowFunctions
- CollapseWindow
- TransposeWindow
- InferFiltersFromGenerate
- InferFiltersFromConstraints
- CombineUnions
- CombineFilters
- EliminateSorts
- PruneFilters
- EliminateLimits
- DecimalAggregates
- ConvertToLocalRelation
- ReplaceDistinctWithAggregate
- ReplaceIntersectWithSemiJoin
- ReplaceExceptWithAntiJoin
- RewriteExceptAll
- RewriteIntersectAll
- RemoveLiteralFromGroupExpressions
- RemoveRepetitionFromGroupExpressions
- OptimizeLimitZero

### Why are the changes needed?

Reduce the number of tree traversals and hence improve the query compilation latency.
perf diff:

Rule name | Total Time (baseline) | Total Time (experiment) | experiment/baseline
--- | --- | --- | ---
RemoveRedundantAggregates | 51290766 | 67070477 | 1.31
RemoveNoopOperators | 192371141 | 196631275 | 1.02
RemoveNoopUnion | 49222561 | 43266681 | 0.88
LimitPushDown | 40885185 | 21672646 | 0.53
ColumnPruning | 2003406120 | 1285562149 | 0.64
CollapseRepartition | 40648048 | 72646515 | 1.79
OptimizeRepartition | 37813850 | 20600803 | 0.54
OptimizeWindowFunctions | 174426904 | 46741409 | 0.27
CollapseWindow | 38959957 | 24542426 | 0.63
TransposeWindow | 33533191 | 20414930 | 0.61
InferFiltersFromGenerate | 21758688 | 15597344 | 0.72
InferFiltersFromConstraints | 518009794 | 493282321 | 0.95
CombineUnions | 67694022 | 70550382 | 1.04
CombineFilters | 35265060 | 29005424 | 0.82
EliminateSorts | 57025509 | 19795776 | 0.35
PruneFilters | 433964815 | 465579200 | 1.07
EliminateLimits | 44275393 | 24476859 | 0.55
DecimalAggregates | 83143172 | 28816090 | 0.35
ReplaceDistinctWithAggregate | 21783760 | 18287489 | 0.84
ReplaceIntersectWithSemiJoin | 22311271 | 16566393 | 0.74
ReplaceExceptWithAntiJoin | 23838520 | 16588808 | 0.70
RewriteExceptAll | 32750296 | 29421957 | 0.90
RewriteIntersectAll | 29760454 | 21243599 | 0.71
RemoveLiteralFromGroupExpressions | 28151861 | 25270947 | 0.90
RemoveRepetitionFromGroupExpressions | 29587030 | 23447041 | 0.79
OptimizeLimitZero | 18081943 | 15597344 | 0.86
**Accumulated** | **4129959311** | **3112676285** | **0.75**

### How was this patch tested?

Existing tests.

Closes #32439 from sigmod/optimizer.
Authored-by: Yingyi Bu Signed-off-by: Gengliang Wang --- .../catalyst/expressions/aggregate/Average.scala | 3 + .../sql/catalyst/expressions/aggregate/Sum.scala | 3 + .../catalyst/expressions/namedExpressions.scala| 2 + .../spark/sql/catalyst/optimizer/Optimizer.scala | 113 ++--- .../plans/logical/basicLogicalOperators.scala | 10 ++ .../sql/catalyst/rules/RuleIdCollection.scala | 24 + .../spark/sql/catalyst/trees/TreePatterns.scala| 11 +- 7 files changed, 128 insertions(+), 38 deletions(-) diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/Average.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/Average.scala index 8ae24e5..82ad2df 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/Average.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/Average.scala @@ -20,6 +20,7 @@ package org.apache.spark.sql.catalyst.expressions.aggregate import org.apache.spark.sql.catalyst.analysis.{DecimalPrecision, FunctionRegistry, TypeCheckResult} import org.apache.spark.sql.catalyst.dsl.expressions._ import org.apache.spark.sql.catalyst.expressions._ +import org.apache.spark.sql.catalyst.trees.TreePattern.{AVERAGE, TreePattern} import org.apache.spark.sql.catalyst.trees.UnaryLike import org.apache.spark.sql.catalyst.util.TypeUtils import
[spark] branch master updated: [SPARK-35144][SQL] Migrate to transformWithPruning for object rules
This is an automated email from the ASF dual-hosted git repository. gengliang pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:

new 72d3266 [SPARK-35144][SQL] Migrate to transformWithPruning for object rules

72d3266 is described below

commit 72d32662d470e286a639783fed8dcf6c3948
Author: Yingyi Bu
AuthorDate: Fri May 7 18:36:28 2021 +0800

[SPARK-35144][SQL] Migrate to transformWithPruning for object rules

### What changes were proposed in this pull request?

Added the following TreePattern enums:
- APPEND_COLUMNS
- DESERIALIZE_TO_OBJECT
- LAMBDA_VARIABLE
- MAP_OBJECTS
- SERIALIZE_FROM_OBJECT
- PROJECT
- TYPED_FILTER

Added tree traversal pruning to the following rules dealing with objects:
- EliminateSerialization
- CombineTypedFilters
- EliminateMapObjects
- ObjectSerializerPruning

### Why are the changes needed?

Reduce the number of tree traversals and hence improve the query compilation latency.

### How was this patch tested?

Existing tests.

Closes #32451 from sigmod/object.
Authored-by: Yingyi Bu Signed-off-by: Gengliang Wang --- .../spark/sql/catalyst/expressions/objects/objects.scala | 6 +- .../org/apache/spark/sql/catalyst/optimizer/objects.scala | 15 ++- .../catalyst/plans/logical/basicLogicalOperators.scala| 2 ++ .../apache/spark/sql/catalyst/plans/logical/object.scala | 8 .../spark/sql/catalyst/rules/RuleIdCollection.scala | 5 + .../apache/spark/sql/catalyst/trees/TreePatterns.scala| 7 +++ 6 files changed, 37 insertions(+), 6 deletions(-) diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/objects/objects.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/objects/objects.scala index 469c895..40378a3 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/objects/objects.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/objects/objects.scala @@ -33,7 +33,7 @@ import org.apache.spark.sql.catalyst.expressions._ import org.apache.spark.sql.catalyst.expressions.codegen._ import org.apache.spark.sql.catalyst.expressions.codegen.Block._ import org.apache.spark.sql.catalyst.trees.TernaryLike -import org.apache.spark.sql.catalyst.trees.TreePattern.{NULL_CHECK, TreePattern} +import org.apache.spark.sql.catalyst.trees.TreePattern._ import org.apache.spark.sql.catalyst.util.{ArrayBasedMapData, ArrayData, GenericArrayData, MapData} import org.apache.spark.sql.errors.QueryExecutionErrors import org.apache.spark.sql.types._ @@ -669,6 +669,8 @@ case class LambdaVariable( private val accessor: (InternalRow, Int) => Any = InternalRow.getAccessor(dataType, nullable) + final override val nodePatterns: Seq[TreePattern] = Seq(LAMBDA_VARIABLE) + // Interpreted execution of `LambdaVariable` always get the 0-index element from input row. 
override def eval(input: InternalRow): Any = { assert(input.numFields == 1, @@ -781,6 +783,8 @@ case class MapObjects private( override def second: Expression = lambdaFunction override def third: Expression = inputData + final override val nodePatterns: Seq[TreePattern] = Seq(MAP_OBJECTS) + // The data with UserDefinedType are actually stored with the data type of its sqlType. // When we want to apply MapObjects on it, we have to use it. lazy private val inputDataType = inputData.dataType match { diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/objects.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/objects.scala index 97712a0..52544ff 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/objects.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/objects.scala @@ -24,6 +24,7 @@ import org.apache.spark.sql.catalyst.expressions._ import org.apache.spark.sql.catalyst.expressions.objects._ import org.apache.spark.sql.catalyst.plans.logical._ import org.apache.spark.sql.catalyst.rules._ +import org.apache.spark.sql.catalyst.trees.TreePattern._ import org.apache.spark.sql.types.{ArrayType, DataType, MapType, StructType, UserDefinedType} /* @@ -35,7 +36,8 @@ import org.apache.spark.sql.types.{ArrayType, DataType, MapType, StructType, Use * representation of data item. For example back to back map operations. */ object EliminateSerialization extends Rule[LogicalPlan] { - def apply(plan: LogicalPlan): LogicalPlan = plan transform { + def apply(plan: LogicalPlan): LogicalPlan = plan.transformWithPruning( +_.containsAnyPattern(DESERIALIZE_TO_OBJECT, APPEND_COLUMNS, TYPED_FILTER), ruleId) { case d @ DeserializeToObject(_, _, s: SerializeFromObj
[spark] branch master updated (7182f8c -> d2a535f)
This is an automated email from the ASF dual-hosted git repository. gengliang pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from 7182f8c [SPARK-35360][SQL] RepairTableCommand respects `spark.sql.addPartitionInBatch.size` too add d2a535f [SPARK-34246][FOLLOWUP] Change the definition of `findTightestCommonType` for backward compatibility No new revisions were added by this update. Summary of changes: .../sql/catalyst/analysis/AnsiTypeCoercion.scala | 53 ++ .../spark/sql/catalyst/analysis/TypeCoercion.scala | 6 +-- 2 files changed, 27 insertions(+), 32 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated (4fe4b65 -> 7970318)
This is an automated email from the ASF dual-hosted git repository. gengliang pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from 4fe4b65 [SPARK-35315][TESTS] Keep benchmark result consistent between spark-submit and SBT add 7970318 [SPARK-35155][SQL] Add rule id pruning to Analyzer rules No new revisions were added by this update. Summary of changes: .../spark/sql/catalyst/analysis/Analyzer.scala | 105 + .../catalyst/analysis/DeduplicateRelations.scala | 4 +- .../spark/sql/catalyst/analysis/ResolveHints.scala | 8 +- .../catalyst/analysis/ResolveInlineTables.scala| 4 +- .../spark/sql/catalyst/analysis/ResolveUnion.scala | 4 +- .../analysis/SubstituteUnresolvedOrdinals.scala| 4 +- .../catalyst/analysis/higherOrderFunctions.scala | 3 +- .../sql/catalyst/analysis/timeZoneAnalysis.scala | 3 +- .../sql/catalyst/optimizer/UpdateFields.scala | 4 +- .../sql/catalyst/rules/RuleIdCollection.scala | 41 10 files changed, 133 insertions(+), 47 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated (7c9a9ec -> 2b6640a)
This is an automated email from the ASF dual-hosted git repository. gengliang pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from 7c9a9ec [SPARK-35146][SQL] Migrate to transformWithPruning or resolveWithPruning for rules in finishAnalysis.scala add 2b6640a [SPARK-35229][WEBUI] Limit the maximum number of items on the timeline view No new revisions were added by this update. Summary of changes: .../org/apache/spark/internal/config/UI.scala | 15 + .../org/apache/spark/ui/jobs/AllJobsPage.scala | 39 -- .../scala/org/apache/spark/ui/jobs/JobPage.scala | 39 -- .../scala/org/apache/spark/ui/jobs/JobsTab.scala | 1 + .../scala/org/apache/spark/ui/jobs/StagePage.scala | 3 +- docs/configuration.md | 32 ++ 6 files changed, 121 insertions(+), 8 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated (2e9936d -> e1296ea)
This is an automated email from the ASF dual-hosted git repository. gengliang pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from 2e9936d [SPARK-35456][CORE] Print the invalid value in config validation error message add e1296ea [SPARK-35445][SQL] Reduce the execution time of DeduplicateRelations No new revisions were added by this update. Summary of changes: .../catalyst/analysis/DeduplicateRelations.scala | 88 ++ 1 file changed, 56 insertions(+), 32 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch branch-3.1 updated: [SPARK-35514][INFRA] Automatically update version index of DocSearch via release-tag.sh
This is an automated email from the ASF dual-hosted git repository. gengliang pushed a commit to branch branch-3.1 in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/branch-3.1 by this push:

new ac4d95e [SPARK-35514][INFRA] Automatically update version index of DocSearch via release-tag.sh

ac4d95e is described below

commit ac4d95e465c28cc42c0c3f9adba42457ce763f51
Author: Gengliang Wang
AuthorDate: Wed May 26 00:30:44 2021 +0800

[SPARK-35514][INFRA] Automatically update version index of DocSearch via release-tag.sh

### What changes were proposed in this pull request?

Automatically update the version index of DocSearch via release-tag.sh when releasing a new documentation site, instead of the current manual update.

### Why are the changes needed?

Simplify the release process.

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

Manually run the following command and check the diff:

```
R_NEXT_VERSION=3.2.0
sed -i".tmp8" "s/'facetFilters':.*$/'facetFilters': [\"version:$R_NEXT_VERSION\"]/g" docs/_config.yml
```

Closes #32662 from gengliangwang/updateDocsearchInRelease.
Authored-by: Gengliang Wang Signed-off-by: Gengliang Wang (cherry picked from commit 321c6545b38976b8b051ac1e80666f96922d5950) Signed-off-by: Gengliang Wang --- dev/create-release/release-tag.sh | 2 ++ 1 file changed, 2 insertions(+) diff --git a/dev/create-release/release-tag.sh b/dev/create-release/release-tag.sh index a9a518f..4be1f9a 100755 --- a/dev/create-release/release-tag.sh +++ b/dev/create-release/release-tag.sh @@ -106,6 +106,8 @@ sed -i".tmp5" 's/__version__ = .*$/__version__ = "'"$R_NEXT_VERSION.dev0"'"/' py sed -i".tmp6" 's/SPARK_VERSION:.*$/SPARK_VERSION: '"$NEXT_VERSION"'/g' docs/_config.yml # Use R version for short version sed -i".tmp7" 's/SPARK_VERSION_SHORT:.*$/SPARK_VERSION_SHORT: '"$R_NEXT_VERSION"'/g' docs/_config.yml +# Update the version index of DocSearch as the short version +sed -i".tmp8" "s/'facetFilters':.*$/'facetFilters': [\"version:$R_NEXT_VERSION\"]/g" docs/_config.yml git commit -a -m "Preparing development version $NEXT_VERSION" - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated: [SPARK-35514][INFRA] Automatically update version index of DocSearch via release-tag.sh
This is an automated email from the ASF dual-hosted git repository. gengliang pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:

new 321c654 [SPARK-35514][INFRA] Automatically update version index of DocSearch via release-tag.sh

321c654 is described below

commit 321c6545b38976b8b051ac1e80666f96922d5950
Author: Gengliang Wang
AuthorDate: Wed May 26 00:30:44 2021 +0800

[SPARK-35514][INFRA] Automatically update version index of DocSearch via release-tag.sh

### What changes were proposed in this pull request?

Automatically update the version index of DocSearch via release-tag.sh when releasing a new documentation site, instead of the current manual update.

### Why are the changes needed?

Simplify the release process.

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

Manually run the following command and check the diff:

```
R_NEXT_VERSION=3.2.0
sed -i".tmp8" "s/'facetFilters':.*$/'facetFilters': [\"version:$R_NEXT_VERSION\"]/g" docs/_config.yml
```

Closes #32662 from gengliangwang/updateDocsearchInRelease.
Authored-by: Gengliang Wang Signed-off-by: Gengliang Wang --- dev/create-release/release-tag.sh | 2 ++ 1 file changed, 2 insertions(+) diff --git a/dev/create-release/release-tag.sh b/dev/create-release/release-tag.sh index a9a518f..4be1f9a 100755 --- a/dev/create-release/release-tag.sh +++ b/dev/create-release/release-tag.sh @@ -106,6 +106,8 @@ sed -i".tmp5" 's/__version__ = .*$/__version__ = "'"$R_NEXT_VERSION.dev0"'"/' py sed -i".tmp6" 's/SPARK_VERSION:.*$/SPARK_VERSION: '"$NEXT_VERSION"'/g' docs/_config.yml # Use R version for short version sed -i".tmp7" 's/SPARK_VERSION_SHORT:.*$/SPARK_VERSION_SHORT: '"$R_NEXT_VERSION"'/g' docs/_config.yml +# Update the version index of DocSearch as the short version +sed -i".tmp8" "s/'facetFilters':.*$/'facetFilters': [\"version:$R_NEXT_VERSION\"]/g" docs/_config.yml git commit -a -m "Preparing development version $NEXT_VERSION" - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated (0ad5ae5 -> 9d0d4ed)
This is an automated email from the ASF dual-hosted git repository. gengliang pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from 0ad5ae5 [SPARK-35539][PYTHON] Restore to_koalas to keep the backward compatibility add 9d0d4ed [SPARK-35595][TESTS] Support multiple loggers in testing method withLogAppender No new revisions were added by this update. Summary of changes: .../scala/org/apache/spark/SparkFunSuite.scala | 24 ++ .../catalyst/expressions/CodeGenerationSuite.scala | 2 +- .../adaptive/AdaptiveQueryExecSuite.scala | 6 -- 3 files changed, 21 insertions(+), 11 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated (c2de0a6 -> 3f6322f)
This is an automated email from the ASF dual-hosted git repository. gengliang pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from c2de0a6 [SPARK-35100][ML] Refactor AFT - support virtual centering add 3f6322f [SPARK-35077][SQL] Migrate to transformWithPruning for leftover optimizer rules No new revisions were added by this update. Summary of changes: .../apache/spark/sql/catalyst/optimizer/ComplexTypes.scala | 6 -- .../sql/catalyst/optimizer/NormalizeFloatingNumbers.scala| 3 ++- .../org/apache/spark/sql/catalyst/optimizer/Optimizer.scala | 9 +++-- .../org/apache/spark/sql/catalyst/optimizer/joins.scala | 5 +++-- .../apache/spark/sql/catalyst/rules/RuleIdCollection.scala | 1 + .../dynamicpruning/CleanupDynamicPruningFilters.scala| 8 ++-- .../spark/sql/execution/python/ExtractPythonUDFs.scala | 12 +--- 7 files changed, 32 insertions(+), 12 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated: [SPARK-35604][SQL] Fix condition check for FULL OUTER sort merge join
This is an automated email from the ASF dual-hosted git repository. gengliang pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:

new 54e [SPARK-35604][SQL] Fix condition check for FULL OUTER sort merge join

54e is described below

commit 54ed39823c4fc236f328fe55e46607515cd0
Author: Cheng Su
AuthorDate: Wed Jun 2 14:01:34 2021 +0800

[SPARK-35604][SQL] Fix condition check for FULL OUTER sort merge join

### What changes were proposed in this pull request?

The condition check for FULL OUTER sort merge join (https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/joins/SortMergeJoinExec.scala#L1368) makes an unnecessary trip when `leftIndex == leftMatches.size` or `rightIndex == rightMatches.size`. This does not affect correctness (`scanNextInBuffered()` returns false anyway), but we can avoid it in the first place.

### Why are the changes needed?

Better readability for developers, and it avoids unnecessary execution.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Existing unit tests, such as `OuterJoinSuite.scala`.

Closes #32736 from c21/join-bug.
Authored-by: Cheng Su Signed-off-by: Gengliang Wang --- .../scala/org/apache/spark/sql/execution/joins/SortMergeJoinExec.scala | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/joins/SortMergeJoinExec.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/joins/SortMergeJoinExec.scala index c565f91..5873754 100644 --- a/sql/core/src/main/scala/org/apache/spark/sql/execution/joins/SortMergeJoinExec.scala +++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/joins/SortMergeJoinExec.scala @@ -1365,7 +1365,7 @@ private class SortMergeFullOuterJoinScanner( def advanceNext(): Boolean = { // If we already buffered some matching rows, use them directly -if (leftIndex <= leftMatches.size || rightIndex <= rightMatches.size) { +if (leftIndex < leftMatches.size || rightIndex < rightMatches.size) { if (scanNextInBuffered()) { return true } - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch branch-3.1 updated (264ce7b -> 92fb23e)
This is an automated email from the ASF dual-hosted git repository. gengliang pushed a change to branch branch-3.1 in repository https://gitbox.apache.org/repos/asf/spark.git. from 264ce7b [SPARK-35573][R][TESTSt] Make SparkR tests pass with R 4.1+ add 92fb23e [SPARK-35576][SQL][3.1] Redact the sensitive info in the result of Set command No new revisions were added by this update. Summary of changes: .../main/scala/org/apache/spark/sql/internal/SQLConf.scala | 9 - .../org/apache/spark/sql/execution/command/SetCommand.scala | 6 -- .../src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala | 13 + 3 files changed, 25 insertions(+), 3 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated (73d4f67 -> 1dd0ca2)
This is an automated email from the ASF dual-hosted git repository. gengliang pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from 73d4f67 [SPARK-35433][DOCS] Move CSV data source options from Python and Scala into a single page add 1dd0ca2 [SPARK-35544][SQL] Add tree pattern pruning to Analyzer rules No new revisions were added by this update. Summary of changes: .../spark/sql/catalyst/analysis/Analyzer.scala | 99 -- .../sql/catalyst/analysis/CTESubstitution.scala| 9 +- .../catalyst/analysis/DeduplicateRelations.scala | 4 +- .../analysis/ResolveCommandsWithIfExists.scala | 4 +- .../spark/sql/catalyst/analysis/ResolveHints.scala | 10 ++- .../catalyst/analysis/ResolvePartitionSpec.scala | 4 +- .../spark/sql/catalyst/analysis/ResolveUnion.scala | 4 +- .../analysis/SubstituteUnresolvedOrdinals.scala| 4 +- .../analysis/UpdateAttributeNullability.scala | 4 +- .../catalyst/analysis/higherOrderFunctions.scala | 8 +- .../sql/catalyst/analysis/timeZoneAnalysis.scala | 6 +- .../spark/sql/catalyst/analysis/unresolved.scala | 12 +++ .../sql/catalyst/analysis/v2ResolutionPlans.scala | 2 + .../spark/sql/catalyst/expressions/Cast.scala | 6 +- .../spark/sql/catalyst/expressions/PythonUDF.scala | 3 + .../spark/sql/catalyst/expressions/ScalaUDF.scala | 3 + .../sql/catalyst/expressions/TimeWindow.scala | 2 + .../expressions/aggregate/interfaces.scala | 3 + .../catalyst/expressions/datetimeExpressions.scala | 12 ++- .../sql/catalyst/expressions/generators.scala | 3 + .../spark/sql/catalyst/expressions/grouping.scala | 4 + .../expressions/higherOrderFunctions.scala | 5 ++ .../sql/catalyst/expressions/jsonExpressions.scala | 2 +- .../sql/catalyst/expressions/objects/objects.scala | 2 + .../spark/sql/catalyst/plans/logical/Command.scala | 2 + .../plans/logical/EventTimeWatermark.scala | 3 + .../plans/logical/basicLogicalOperators.scala | 4 + .../spark/sql/catalyst/plans/logical/hints.scala | 2 + .../sql/catalyst/rules/RuleIdCollection.scala | 
2 + .../spark/sql/catalyst/trees/TreePatterns.scala| 32 ++- 30 files changed, 188 insertions(+), 72 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated (7bc364b -> 510bde4)
This is an automated email from the ASF dual-hosted git repository. gengliang pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from 7bc364b [SPARK-35621][SQL] Add rule id pruning to the TypeCoercion rule add 510bde4 [SPARK-35655][BUILD] Upgrade HtmlUnit and its related artifacts to 2.50 No new revisions were added by this update. Summary of changes: pom.xml | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated (b5678be -> 7bc364b)
This is an automated email from the ASF dual-hosted git repository. gengliang pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from b5678be [SPARK-35446] Override getJDBCType in MySQLDialect to map FloatType to FLOAT add 7bc364b [SPARK-35621][SQL] Add rule id pruning to the TypeCoercion rule No new revisions were added by this update. Summary of changes: .../spark/sql/catalyst/analysis/TypeCoercion.scala | 25 +++--- .../sql/catalyst/rules/RuleIdCollection.scala | 29 - .../apache/spark/sql/catalyst/trees/TreeNode.scala | 38 ++ .../catalyst/analysis/AnsiTypeCoercionSuite.scala | 7 .../sql/catalyst/analysis/TypeCoercionSuite.scala | 7 5 files changed, 91 insertions(+), 15 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated (53a758b -> c7fb0e1)
This is an automated email from the ASF dual-hosted git repository. gengliang pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from 53a758b [SPARK-35636][SQL] Lambda keys should not be referenced outside of the lambda function add c7fb0e1 [SPARK-35629][SQL] Use better exception type if database doesn't exist on `drop database` No new revisions were added by this update. Summary of changes: .../apache/spark/sql/catalyst/catalog/SessionCatalog.scala | 3 +++ .../spark/sql/catalyst/catalog/SessionCatalogSuite.scala| 13 ++--- .../org/apache/spark/sql/execution/command/DDLSuite.scala | 7 +-- 3 files changed, 6 insertions(+), 17 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated: [SPARK-35552][SQL] Make query stage materialized more readable
This is an automated email from the ASF dual-hosted git repository. gengliang pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 3b94aad [SPARK-35552][SQL] Make query stage materialized more readable 3b94aad is described below commit 3b94aad5e72a6b96e4a8f517ac60e0a2fed2590b Author: ulysses-you AuthorDate: Fri May 28 20:42:11 2021 +0800 [SPARK-35552][SQL] Make query stage materialized more readable ### What changes were proposed in this pull request? Add a new method `isMaterialized` in `QueryStageExec`. ### Why are the changes needed? Currently, we use `resultOption().get.isDefined` to check if a query stage has materialized. The code is not readable at a glance. It's better to use a new method like `isMaterialized` to define it. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Pass CI. Closes #32689 from ulysses-you/SPARK-35552. Authored-by: ulysses-you Signed-off-by: Gengliang Wang --- .../spark/sql/execution/adaptive/AQEPropagateEmptyRelation.scala | 5 ++--- .../spark/sql/execution/adaptive/AdaptiveSparkPlanExec.scala | 6 +++--- .../apache/spark/sql/execution/adaptive/DynamicJoinSelection.scala | 2 +- .../org/apache/spark/sql/execution/adaptive/QueryStageExec.scala | 7 +-- 4 files changed, 11 insertions(+), 9 deletions(-) diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/AQEPropagateEmptyRelation.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/AQEPropagateEmptyRelation.scala index 614fc78..648d2e7 100644 --- a/sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/AQEPropagateEmptyRelation.scala +++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/AQEPropagateEmptyRelation.scala @@ -37,14 +37,13 @@ object AQEPropagateEmptyRelation extends PropagateEmptyRelationBase { super.nonEmpty(plan) || getRowCount(plan).exists(_ > 0) private def 
getRowCount(plan: LogicalPlan): Option[BigInt] = plan match { -case LogicalQueryStage(_, stage: QueryStageExec) if stage.resultOption.get().isDefined => +case LogicalQueryStage(_, stage: QueryStageExec) if stage.isMaterialized => stage.getRuntimeStatistics.rowCount case _ => None } private def isRelationWithAllNullKeys(plan: LogicalPlan): Boolean = plan match { -case LogicalQueryStage(_, stage: BroadcastQueryStageExec) - if stage.resultOption.get().isDefined => +case LogicalQueryStage(_, stage: BroadcastQueryStageExec) if stage.isMaterialized => stage.broadcast.relationFuture.get().value == HashedRelationWithAllNullKeys case _ => false } diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/AdaptiveSparkPlanExec.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/AdaptiveSparkPlanExec.scala index 556c036..ebff790 100644 --- a/sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/AdaptiveSparkPlanExec.scala +++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/AdaptiveSparkPlanExec.scala @@ -420,7 +420,7 @@ case class AdaptiveSparkPlanExec( context.stageCache.get(e.canonicalized) match { case Some(existingStage) if conf.exchangeReuseEnabled => val stage = reuseQueryStage(existingStage, e) - val isMaterialized = stage.resultOption.get().isDefined + val isMaterialized = stage.isMaterialized CreateStageResult( newPlan = stage, allChildStagesMaterialized = isMaterialized, @@ -442,7 +442,7 @@ case class AdaptiveSparkPlanExec( newStage = reuseQueryStage(queryStage, e) } } -val isMaterialized = newStage.resultOption.get().isDefined +val isMaterialized = newStage.isMaterialized CreateStageResult( newPlan = newStage, allChildStagesMaterialized = isMaterialized, @@ -455,7 +455,7 @@ case class AdaptiveSparkPlanExec( case q: QueryStageExec => CreateStageResult(newPlan = q, -allChildStagesMaterialized = q.resultOption.get().isDefined, newStages = Seq.empty) +allChildStagesMaterialized = q.isMaterialized, 
newStages = Seq.empty) case _ => if (plan.children.isEmpty) { diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/DynamicJoinSelection.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/DynamicJoinSelection.scala index 61124f0..a8c74b5 100644 --- a/sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/DynamicJoinSelection.scala +++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/DynamicJoinSelection.scala @@ -53,7 +53,7 @@ object Dyn
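The refactor above is small but worth pausing on: a named predicate documents intent better than chained `Option` plumbing like `resultOption.get().isDefined`. A minimal Python sketch of the same idea (illustrative only, not Spark's actual Scala code):

```python
from typing import Any, Optional

class QueryStage:
    """Toy stand-in for Spark's QueryStageExec (names are illustrative)."""

    def __init__(self) -> None:
        self._result: Optional[Any] = None  # populated once the stage finishes

    @property
    def is_materialized(self) -> bool:
        # The readable equivalent of Scala's `resultOption.get().isDefined`
        return self._result is not None

    def materialize(self, result: Any) -> None:
        self._result = result

stage = QueryStage()
assert not stage.is_materialized        # nothing has run yet
stage.materialize({"row_count": 42})
assert stage.is_materialized
```

Callers then read `stage.is_materialized` at a glance, which is exactly the readability win the PR description claims.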
[spark] branch branch-3.2 updated: [SPARK-36025][SQL][TESTS] Reduce the run time of DateExpressionsSuite
This is an automated email from the ASF dual-hosted git repository. gengliang pushed a commit to branch branch-3.2 in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/branch-3.2 by this push:
     new 8f26722  [SPARK-36025][SQL][TESTS] Reduce the run time of DateExpressionsSuite
8f26722 is described below

commit 8f267226e45f18c8fe6b6a252a50e204a1a0731c
Author: Gengliang Wang
AuthorDate: Tue Jul 6 20:17:02 2021 +0800

[SPARK-36025][SQL][TESTS] Reduce the run time of DateExpressionsSuite

### What changes were proposed in this pull request?
Some of the test cases in `DateExpressionsSuite` are quite slow:
- `Hour`: 24s
- `Minute`: 26s
- `Day / DayOfMonth`: 8s
- `Year`: 4s

Each test case has a large loop. We should improve them.

### Why are the changes needed?
Save test running time.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Verified the run times locally:
- `Hour`: 2s
- `Minute`: 3.2s
- `Day / DayOfMonth`: 0.5s
- `Year`: 2s

Total reduced time: 54.3s

Closes #33229 from gengliangwang/improveTest.
Authored-by: Gengliang Wang Signed-off-by: Gengliang Wang (cherry picked from commit d5d12226861f67243dd575c9240238bcd08e1a91) Signed-off-by: Gengliang Wang --- .../expressions/DateExpressionsSuite.scala | 49 ++ 1 file changed, 23 insertions(+), 26 deletions(-) diff --git a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/DateExpressionsSuite.scala b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/DateExpressionsSuite.scala index d33fb7d..afcc729 100644 --- a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/DateExpressionsSuite.scala +++ b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/DateExpressionsSuite.scala @@ -25,7 +25,9 @@ import java.time.temporal.ChronoUnit import java.util.{Calendar, Locale, TimeZone} import java.util.concurrent.TimeUnit._ +import scala.language.postfixOps import scala.reflect.ClassTag +import scala.util.Random import org.apache.spark.{SparkFunSuite, SparkUpgradeException} import org.apache.spark.sql.catalyst.InternalRow @@ -122,8 +124,8 @@ class DateExpressionsSuite extends SparkFunSuite with ExpressionEvalHelper { (2000 to 2002).foreach { y => (0 to 11 by 11).foreach { m => c.set(y, m, 28) -(0 to 5 * 24).foreach { i => - c.add(Calendar.HOUR_OF_DAY, 1) +(0 to 12).foreach { i => + c.add(Calendar.HOUR_OF_DAY, 10) checkEvaluation(Year(Literal(new Date(c.getTimeInMillis))), c.get(Calendar.YEAR)) } @@ -195,8 +197,9 @@ class DateExpressionsSuite extends SparkFunSuite with ExpressionEvalHelper { val c = Calendar.getInstance() (1999 to 2000).foreach { y => c.set(y, 0, 1, 0, 0, 0) - (0 to 365).foreach { d => -c.add(Calendar.DATE, 1) + val random = new Random(System.nanoTime) + random.shuffle(0 to 365 toList).take(10).foreach { d => +c.set(Calendar.DAY_OF_YEAR, d) checkEvaluation(DayOfMonth(Literal(new Date(c.getTimeInMillis))), c.get(Calendar.DAY_OF_MONTH)) } @@ -332,19 +335,15 @@ class DateExpressionsSuite extends SparkFunSuite with ExpressionEvalHelper { val 
timeZoneId = Option(zid.getId) c.setTimeZone(TimeZone.getTimeZone(zid)) (0 to 24 by 5).foreach { h => -(0 to 60 by 29).foreach { m => - (0 to 60 by 29).foreach { s => -// validate timestamp with local time zone -c.set(2015, 18, 3, h, m, s) -checkEvaluation( - Hour(Literal(new Timestamp(c.getTimeInMillis)), timeZoneId), - c.get(Calendar.HOUR_OF_DAY)) +// validate timestamp with local time zone +c.set(2015, 18, 3, h, 29, 59) +checkEvaluation( + Hour(Literal(new Timestamp(c.getTimeInMillis)), timeZoneId), + c.get(Calendar.HOUR_OF_DAY)) -// validate timestamp without time zone -val localDateTime = LocalDateTime.of(2015, 1, 3, h, m, s) -checkEvaluation(Hour(Literal(localDateTime), timeZoneId), h) - } -} +// validate timestamp without time zone +val localDateTime = LocalDateTime.of(2015, 1, 3, h, 29, 59) +checkEvaluation(Hour(Literal(localDateTime), timeZoneId), h) } Seq(TimestampType, TimestampNTZType).foreach { dt => checkConsistencyBetweenInterpretedAndCodegen( @@ -367,17 +366,15 @@ class DateExpressionsSuite extends SparkFunSuite with ExpressionEvalHelper { val timeZoneId = Option(zid.getId) c.setTimeZone(TimeZone.getTimeZone(zid)) (0 to 59 by 5).foreach { m => -(0 to 59 by 15).f
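The core trick in the diff above is swapping an exhaustive 366-iteration loop for a seeded random sample of days (the Scala original uses `random.shuffle(0 to 365 toList).take(10)`). A Python sketch of that pattern, with an extra clamp so non-leap years stay in range (that clamp is my addition, not in the Scala code):

```python
import random
from datetime import date, timedelta

def sample_days_of_year(year: int, k: int = 10, seed: int = 0) -> list:
    """Pick k distinct day-of-year offsets instead of testing all 366 days."""
    rng = random.Random(seed)            # seeded for reproducible test runs
    offsets = rng.sample(range(366), k)  # distinct offsets, like shuffle().take(k)
    start = date(year, 1, 1)
    last = date(year, 12, 31)
    # Clamp so offset 365 doesn't spill into the next year on non-leap years.
    return [min(start + timedelta(days=d), last) for d in offsets]

days = sample_days_of_year(1999)
assert len(days) == 10
assert all(d.year == 1999 for d in days)
```

Ten sampled days give broad coverage of month boundaries at a fraction of the cost of the original per-day loop.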
[spark] branch master updated: [SPARK-36025][SQL][TESTS] Reduce the run time of DateExpressionsSuite
This is an automated email from the ASF dual-hosted git repository. gengliang pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new d5d1222  [SPARK-36025][SQL][TESTS] Reduce the run time of DateExpressionsSuite
d5d1222 is described below

commit d5d12226861f67243dd575c9240238bcd08e1a91
Author: Gengliang Wang
AuthorDate: Tue Jul 6 20:17:02 2021 +0800

[SPARK-36025][SQL][TESTS] Reduce the run time of DateExpressionsSuite

### What changes were proposed in this pull request?
Some of the test cases in `DateExpressionsSuite` are quite slow:
- `Hour`: 24s
- `Minute`: 26s
- `Day / DayOfMonth`: 8s
- `Year`: 4s

Each test case has a large loop. We should improve them.

### Why are the changes needed?
Save test running time.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Verified the run times locally:
- `Hour`: 2s
- `Minute`: 3.2s
- `Day / DayOfMonth`: 0.5s
- `Year`: 2s

Total reduced time: 54.3s

Closes #33229 from gengliangwang/improveTest.
Authored-by: Gengliang Wang Signed-off-by: Gengliang Wang --- .../expressions/DateExpressionsSuite.scala | 49 ++ 1 file changed, 23 insertions(+), 26 deletions(-) diff --git a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/DateExpressionsSuite.scala b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/DateExpressionsSuite.scala index d33fb7d..afcc729 100644 --- a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/DateExpressionsSuite.scala +++ b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/DateExpressionsSuite.scala @@ -25,7 +25,9 @@ import java.time.temporal.ChronoUnit import java.util.{Calendar, Locale, TimeZone} import java.util.concurrent.TimeUnit._ +import scala.language.postfixOps import scala.reflect.ClassTag +import scala.util.Random import org.apache.spark.{SparkFunSuite, SparkUpgradeException} import org.apache.spark.sql.catalyst.InternalRow @@ -122,8 +124,8 @@ class DateExpressionsSuite extends SparkFunSuite with ExpressionEvalHelper { (2000 to 2002).foreach { y => (0 to 11 by 11).foreach { m => c.set(y, m, 28) -(0 to 5 * 24).foreach { i => - c.add(Calendar.HOUR_OF_DAY, 1) +(0 to 12).foreach { i => + c.add(Calendar.HOUR_OF_DAY, 10) checkEvaluation(Year(Literal(new Date(c.getTimeInMillis))), c.get(Calendar.YEAR)) } @@ -195,8 +197,9 @@ class DateExpressionsSuite extends SparkFunSuite with ExpressionEvalHelper { val c = Calendar.getInstance() (1999 to 2000).foreach { y => c.set(y, 0, 1, 0, 0, 0) - (0 to 365).foreach { d => -c.add(Calendar.DATE, 1) + val random = new Random(System.nanoTime) + random.shuffle(0 to 365 toList).take(10).foreach { d => +c.set(Calendar.DAY_OF_YEAR, d) checkEvaluation(DayOfMonth(Literal(new Date(c.getTimeInMillis))), c.get(Calendar.DAY_OF_MONTH)) } @@ -332,19 +335,15 @@ class DateExpressionsSuite extends SparkFunSuite with ExpressionEvalHelper { val timeZoneId = Option(zid.getId) c.setTimeZone(TimeZone.getTimeZone(zid)) (0 to 24 by 5).foreach { h 
=> -(0 to 60 by 29).foreach { m => - (0 to 60 by 29).foreach { s => -// validate timestamp with local time zone -c.set(2015, 18, 3, h, m, s) -checkEvaluation( - Hour(Literal(new Timestamp(c.getTimeInMillis)), timeZoneId), - c.get(Calendar.HOUR_OF_DAY)) +// validate timestamp with local time zone +c.set(2015, 18, 3, h, 29, 59) +checkEvaluation( + Hour(Literal(new Timestamp(c.getTimeInMillis)), timeZoneId), + c.get(Calendar.HOUR_OF_DAY)) -// validate timestamp without time zone -val localDateTime = LocalDateTime.of(2015, 1, 3, h, m, s) -checkEvaluation(Hour(Literal(localDateTime), timeZoneId), h) - } -} +// validate timestamp without time zone +val localDateTime = LocalDateTime.of(2015, 1, 3, h, 29, 59) +checkEvaluation(Hour(Literal(localDateTime), timeZoneId), h) } Seq(TimestampType, TimestampNTZType).foreach { dt => checkConsistencyBetweenInterpretedAndCodegen( @@ -367,17 +366,15 @@ class DateExpressionsSuite extends SparkFunSuite with ExpressionEvalHelper { val timeZoneId = Option(zid.getId) c.setTimeZone(TimeZone.getTimeZone(zid)) (0 to 59 by 5).foreach { m => -(0 to 59 by 15).foreach { s => - // validate timestamp with local time zone - c.set(2015, 18, 3, 3, m, s) -
[spark] branch branch-3.2 updated: [SPARK-36043][SQL][TESTS] Add end-to-end tests with default timestamp type as TIMESTAMP_NTZ
This is an automated email from the ASF dual-hosted git repository. gengliang pushed a commit to branch branch-3.2 in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/branch-3.2 by this push:
     new cafb829  [SPARK-36043][SQL][TESTS] Add end-to-end tests with default timestamp type as TIMESTAMP_NTZ
cafb829 is described below

commit cafb829c42fc60722bae621da47cac9602e40f4d
Author: Gengliang Wang
AuthorDate: Thu Jul 8 19:38:52 2021 +0800

[SPARK-36043][SQL][TESTS] Add end-to-end tests with default timestamp type as TIMESTAMP_NTZ

### What changes were proposed in this pull request?
Run end-to-end tests with the default timestamp type as TIMESTAMP_NTZ to increase test coverage.

### Why are the changes needed?
Increase test coverage. Also, there will be more and more expressions that have different behaviors when the default timestamp type is TIMESTAMP_NTZ, for example, `to_timestamp`, `from_json`, `from_csv`, and so on. Having this new test suite helps future development.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
CI tests.

Closes #33259 from gengliangwang/ntzTest.
Authored-by: Gengliang Wang Signed-off-by: Gengliang Wang (cherry picked from commit 57342dfc1dd7deaf60209127d93d416c096645ea) Signed-off-by: Gengliang Wang --- .../sql-tests/inputs/timestampNTZ/datetime.sql |1 + .../results/timestampNTZ/datetime.sql.out | 1595 .../org/apache/spark/sql/SQLQueryTestSuite.scala | 15 + .../thriftserver/ThriftServerQueryTestSuite.scala |6 + 4 files changed, 1617 insertions(+) diff --git a/sql/core/src/test/resources/sql-tests/inputs/timestampNTZ/datetime.sql b/sql/core/src/test/resources/sql-tests/inputs/timestampNTZ/datetime.sql new file mode 100644 index 000..58ecf80 --- /dev/null +++ b/sql/core/src/test/resources/sql-tests/inputs/timestampNTZ/datetime.sql @@ -0,0 +1 @@ +--IMPORT datetime.sql diff --git a/sql/core/src/test/resources/sql-tests/results/timestampNTZ/datetime.sql.out b/sql/core/src/test/resources/sql-tests/results/timestampNTZ/datetime.sql.out new file mode 100644 index 000..131ad01 --- /dev/null +++ b/sql/core/src/test/resources/sql-tests/results/timestampNTZ/datetime.sql.out @@ -0,0 +1,1595 @@ +-- Automatically generated by SQLQueryTestSuite +-- Number of queries: 193 + + +-- !query +select TIMESTAMP_SECONDS(1230219000),TIMESTAMP_SECONDS(-1230219000),TIMESTAMP_SECONDS(null) +-- !query schema +struct +-- !query output +2008-12-25 07:30:001931-01-07 00:30:00 NULL + + +-- !query +select TIMESTAMP_SECONDS(1.23), TIMESTAMP_SECONDS(1.23d), TIMESTAMP_SECONDS(FLOAT(1.23)) +-- !query schema +struct +-- !query output +1969-12-31 16:00:01.23 1969-12-31 16:00:01.23 1969-12-31 16:00:01.23 + + +-- !query +select TIMESTAMP_MILLIS(1230219000123),TIMESTAMP_MILLIS(-1230219000123),TIMESTAMP_MILLIS(null) +-- !query schema +struct +-- !query output +2008-12-25 07:30:00.1231931-01-07 00:29:59.877 NULL + + +-- !query +select TIMESTAMP_MICROS(1230219000123123),TIMESTAMP_MICROS(-1230219000123123),TIMESTAMP_MICROS(null) +-- !query schema +struct +-- !query output +2008-12-25 07:30:00.123123 1931-01-07 00:29:59.876877 NULL + + +-- !query +select 
TIMESTAMP_SECONDS(1230219000123123) +-- !query schema +struct<> +-- !query output +java.lang.ArithmeticException +long overflow + + +-- !query +select TIMESTAMP_SECONDS(-1230219000123123) +-- !query schema +struct<> +-- !query output +java.lang.ArithmeticException +long overflow + + +-- !query +select TIMESTAMP_MILLIS(92233720368547758) +-- !query schema +struct<> +-- !query output +java.lang.ArithmeticException +long overflow + + +-- !query +select TIMESTAMP_MILLIS(-92233720368547758) +-- !query schema +struct<> +-- !query output +java.lang.ArithmeticException +long overflow + + +-- !query +select TIMESTAMP_SECONDS(0.1234567) +-- !query schema +struct<> +-- !query output +java.lang.ArithmeticException +Rounding necessary + + +-- !query +select TIMESTAMP_SECONDS(0.1234567d), TIMESTAMP_SECONDS(FLOAT(0.1234567)) +-- !query schema +struct +-- !query output +1969-12-31 16:00:00.123456 1969-12-31 16:00:00.123456 + + +-- !query +select UNIX_SECONDS(TIMESTAMP('2020-12-01 14:30:08Z')), UNIX_SECONDS(TIMESTAMP('2020-12-01 14:30:08.99Z')), UNIX_SECONDS(null) +-- !query schema +struct +-- !query output +1606833008 1606833008 NULL + + +-- !query +select UNIX_MILLIS(TIMESTAMP('2020-12-01 14:30:08Z')), UNIX_MILLIS(TIMESTAMP('2020-12-01 14:30:08.99Z')), UNIX_MILLIS(null) +-- !query schema +struct +-- !query output +1606833008000 1606833008999 NULL + + +-- !query +select UNIX_MICROS(TIMESTAMP('2020-12-01 14:30:08Z')), UNIX_MICROS(TIMESTAMP('2020-12-01 14:30:08.99Z')), UNIX_MICROS(null) +-- !query schema +struct +-- !
[spark] branch master updated: [SPARK-36043][SQL][TESTS] Add end-to-end tests with default timestamp type as TIMESTAMP_NTZ
This is an automated email from the ASF dual-hosted git repository. gengliang pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new 57342df  [SPARK-36043][SQL][TESTS] Add end-to-end tests with default timestamp type as TIMESTAMP_NTZ
57342df is described below

commit 57342dfc1dd7deaf60209127d93d416c096645ea
Author: Gengliang Wang
AuthorDate: Thu Jul 8 19:38:52 2021 +0800

[SPARK-36043][SQL][TESTS] Add end-to-end tests with default timestamp type as TIMESTAMP_NTZ

### What changes were proposed in this pull request?
Run end-to-end tests with the default timestamp type as TIMESTAMP_NTZ to increase test coverage.

### Why are the changes needed?
Increase test coverage. Also, there will be more and more expressions that have different behaviors when the default timestamp type is TIMESTAMP_NTZ, for example, `to_timestamp`, `from_json`, `from_csv`, and so on. Having this new test suite helps future development.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
CI tests.

Closes #33259 from gengliangwang/ntzTest.
Authored-by: Gengliang Wang Signed-off-by: Gengliang Wang --- .../sql-tests/inputs/timestampNTZ/datetime.sql |1 + .../results/timestampNTZ/datetime.sql.out | 1595 .../org/apache/spark/sql/SQLQueryTestSuite.scala | 15 + .../thriftserver/ThriftServerQueryTestSuite.scala |6 + 4 files changed, 1617 insertions(+) diff --git a/sql/core/src/test/resources/sql-tests/inputs/timestampNTZ/datetime.sql b/sql/core/src/test/resources/sql-tests/inputs/timestampNTZ/datetime.sql new file mode 100644 index 000..58ecf80 --- /dev/null +++ b/sql/core/src/test/resources/sql-tests/inputs/timestampNTZ/datetime.sql @@ -0,0 +1 @@ +--IMPORT datetime.sql diff --git a/sql/core/src/test/resources/sql-tests/results/timestampNTZ/datetime.sql.out b/sql/core/src/test/resources/sql-tests/results/timestampNTZ/datetime.sql.out new file mode 100644 index 000..131ad01 --- /dev/null +++ b/sql/core/src/test/resources/sql-tests/results/timestampNTZ/datetime.sql.out @@ -0,0 +1,1595 @@ +-- Automatically generated by SQLQueryTestSuite +-- Number of queries: 193 + + +-- !query +select TIMESTAMP_SECONDS(1230219000),TIMESTAMP_SECONDS(-1230219000),TIMESTAMP_SECONDS(null) +-- !query schema +struct +-- !query output +2008-12-25 07:30:001931-01-07 00:30:00 NULL + + +-- !query +select TIMESTAMP_SECONDS(1.23), TIMESTAMP_SECONDS(1.23d), TIMESTAMP_SECONDS(FLOAT(1.23)) +-- !query schema +struct +-- !query output +1969-12-31 16:00:01.23 1969-12-31 16:00:01.23 1969-12-31 16:00:01.23 + + +-- !query +select TIMESTAMP_MILLIS(1230219000123),TIMESTAMP_MILLIS(-1230219000123),TIMESTAMP_MILLIS(null) +-- !query schema +struct +-- !query output +2008-12-25 07:30:00.1231931-01-07 00:29:59.877 NULL + + +-- !query +select TIMESTAMP_MICROS(1230219000123123),TIMESTAMP_MICROS(-1230219000123123),TIMESTAMP_MICROS(null) +-- !query schema +struct +-- !query output +2008-12-25 07:30:00.123123 1931-01-07 00:29:59.876877 NULL + + +-- !query +select TIMESTAMP_SECONDS(1230219000123123) +-- !query schema +struct<> +-- !query output 
+java.lang.ArithmeticException +long overflow + + +-- !query +select TIMESTAMP_SECONDS(-1230219000123123) +-- !query schema +struct<> +-- !query output +java.lang.ArithmeticException +long overflow + + +-- !query +select TIMESTAMP_MILLIS(92233720368547758) +-- !query schema +struct<> +-- !query output +java.lang.ArithmeticException +long overflow + + +-- !query +select TIMESTAMP_MILLIS(-92233720368547758) +-- !query schema +struct<> +-- !query output +java.lang.ArithmeticException +long overflow + + +-- !query +select TIMESTAMP_SECONDS(0.1234567) +-- !query schema +struct<> +-- !query output +java.lang.ArithmeticException +Rounding necessary + + +-- !query +select TIMESTAMP_SECONDS(0.1234567d), TIMESTAMP_SECONDS(FLOAT(0.1234567)) +-- !query schema +struct +-- !query output +1969-12-31 16:00:00.123456 1969-12-31 16:00:00.123456 + + +-- !query +select UNIX_SECONDS(TIMESTAMP('2020-12-01 14:30:08Z')), UNIX_SECONDS(TIMESTAMP('2020-12-01 14:30:08.99Z')), UNIX_SECONDS(null) +-- !query schema +struct +-- !query output +1606833008 1606833008 NULL + + +-- !query +select UNIX_MILLIS(TIMESTAMP('2020-12-01 14:30:08Z')), UNIX_MILLIS(TIMESTAMP('2020-12-01 14:30:08.99Z')), UNIX_MILLIS(null) +-- !query schema +struct +-- !query output +1606833008000 1606833008999 NULL + + +-- !query +select UNIX_MICROS(TIMESTAMP('2020-12-01 14:30:08Z')), UNIX_MICROS(TIMESTAMP('2020-12-01 14:30:08.99Z')), UNIX_MICROS(null) +-- !query schema +struct +-- !query output +160683300800 160683300899NULL + + +-- !query +select DATE_FROM_UNIX_DATE(0), DATE_FROM_UNIX_DATE(10
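The `long overflow` results in the generated output above come from Spark storing timestamps as microseconds in a signed 64-bit long, so `TIMESTAMP_SECONDS(s)` must multiply by one million without leaving that range. A hedged Python model of the check (Python ints never overflow, so the bound is enforced explicitly; this mirrors, but is not, Spark's implementation):

```python
INT64_MIN, INT64_MAX = -2**63, 2**63 - 1

def timestamp_seconds_to_micros(seconds: int) -> int:
    """Model TIMESTAMP_SECONDS: whole seconds -> microseconds, or raise."""
    micros = seconds * 1_000_000
    if not INT64_MIN <= micros <= INT64_MAX:
        # Spark surfaces this as java.lang.ArithmeticException: long overflow
        raise OverflowError("long overflow")
    return micros

assert timestamp_seconds_to_micros(1230219000) == 1_230_219_000_000_000
for bad in (1230219000123123, -1230219000123123):  # the failing cases above
    try:
        timestamp_seconds_to_micros(bad)
        raise AssertionError("expected overflow")
    except OverflowError:
        pass
```

The same reasoning explains the `TIMESTAMP_MILLIS(92233720368547758)` failure: 92233720368547758 × 1000 also exceeds 2⁶³ − 1.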
[spark] branch branch-3.2 created (now 79a6e00)
This is an automated email from the ASF dual-hosted git repository. gengliang pushed a change to branch branch-3.2 in repository https://gitbox.apache.org/repos/asf/spark.git. at 79a6e00 [SPARK-35825][INFRA][FOLLOWUP] Increase it in build/mvn script No new revisions were added by this update. - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated (95d9494 -> 47485a3)
This is an automated email from the ASF dual-hosted git repository. gengliang pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git.

from 95d9494  [SPARK-35339][PYTHON] Improve unit tests for data-type-based basic operations
 add 47485a3  [SPARK-35897][SS] Support user defined initial state with flatMapGroupsWithState in Structured Streaming

No new revisions were added by this update.

Summary of changes:
 .../analysis/UnsupportedOperationChecker.scala     | 12 +
 .../spark/sql/catalyst/plans/logical/object.scala  | 65 -
 .../analysis/UnsupportedOperationsSuite.scala      | 116 ++---
 .../apache/spark/sql/KeyValueGroupedDataset.scala  | 164
 .../spark/sql/execution/SparkStrategies.scala      | 10 +-
 .../streaming/FlatMapGroupsWithStateExec.scala     | 266 ++-
 .../execution/streaming/IncrementalExecution.scala | 6 +-
 .../execution/streaming/statefulOperators.scala    | 4 +-
 .../apache/spark/sql/streaming/GroupState.scala    | 5 +
 .../org/apache/spark/sql/JavaDatasetSuite.java     | 66 +
 .../streaming/FlatMapGroupsWithStateSuite.scala    | 283 -
 11 files changed, 875 insertions(+), 122 deletions(-)
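Conceptually, `flatMapGroupsWithState` folds each group's events through a user-supplied state-update function, and SPARK-35897 adds the ability to seed that fold with a user-defined initial state. A Python sketch of the idea (illustrative only; the real Spark API takes a `KeyValueGroupedDataset` of initial state):

```python
def run_with_state(events, update, initial_state=None):
    """Fold (key, value) events through a per-key state-update function,
    starting from a caller-supplied initial state."""
    state = dict(initial_state or {})
    out = []
    for key, value in events:
        state[key], emitted = update(key, value, state.get(key))
        out.extend(emitted)
    return state, out

def count_update(key, value, prev):
    # Per-key running count; `prev` is None for keys with no prior state.
    count = (prev or 0) + 1
    return count, [(key, count)]

# Seeding group "a" with 10 prior events changes its running counts.
state, out = run_with_state(
    [("a", 1), ("a", 2), ("b", 9)], count_update, initial_state={"a": 10})
assert state == {"a": 12, "b": 1}
assert out == [("a", 11), ("a", 12), ("b", 1)]
```

Without the initial-state parameter, every key would start from empty state on the first micro-batch, which is what previously made migrating an existing stateful job awkward.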
[spark] branch master updated (47485a3 -> 1fda011)
This is an automated email from the ASF dual-hosted git repository. gengliang pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git.

from 47485a3  [SPARK-35897][SS] Support user defined initial state with flatMapGroupsWithState in Structured Streaming
 add 1fda011  [SPARK-35955][SQL] Check for overflow in Average in ANSI mode

No new revisions were added by this update.

Summary of changes:
 .../sql/catalyst/expressions/aggregate/Average.scala | 7 +--
 .../scala/org/apache/spark/sql/DataFrameSuite.scala  | 20 ++--
 2 files changed, 19 insertions(+), 8 deletions(-)
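The ANSI-mode change above means `Average` must detect when its running sum leaves the 64-bit range instead of silently wrapping. A small Python model of that behavior (illustrative; Spark implements this inside the `Average` aggregate's sum expression):

```python
INT64_MIN, INT64_MAX = -2**63, 2**63 - 1

def ansi_average(values):
    """Average with an ANSI-style overflow check on the running sum."""
    total = 0
    for v in values:
        total += v
        if not INT64_MIN <= total <= INT64_MAX:
            # ANSI mode: fail loudly rather than return a wrapped, wrong mean
            raise OverflowError("long overflow")
    return total / len(values)

assert ansi_average([1, 2, 3]) == 2.0
try:
    ansi_average([INT64_MAX, INT64_MAX])  # would silently wrap in non-ANSI mode
    raise AssertionError("expected overflow")
except OverflowError:
    pass
```

The non-ANSI behavior (wraparound producing a plausible-looking but wrong average) is exactly the silent-corruption case ANSI mode exists to prevent.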
[spark] branch master updated: [SPARK-35950][WEBUI] Failed to toggle Exec Loss Reason in the executors page
This is an automated email from the ASF dual-hosted git repository. gengliang pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new dc85b0b  [SPARK-35950][WEBUI] Failed to toggle Exec Loss Reason in the executors page
dc85b0b is described below

commit dc85b0b51a02b9d6c52ffb1600f26ccdd7d7829a
Author: Kevin Su
AuthorDate: Thu Jul 1 12:32:54 2021 +0800

[SPARK-35950][WEBUI] Failed to toggle Exec Loss Reason in the executors page

### What changes were proposed in this pull request?
Update the executors page so it can successfully hide the "Exec Loss Reason" column.

### Why are the changes needed?
When the "Exec Loss Reason" checkbox is unselected on the executors page, the "Active tasks" column disappears instead of the "Exec Loss Reason" column.

Before:
![Screenshot from 2021-06-30 15-55-05](https://user-images.githubusercontent.com/37936015/123930908-bd6f4180-d9c2-11eb-9aba-bbfe0a237776.png)

After:
![Screenshot from 2021-06-30 22-21-38](https://user-images.githubusercontent.com/37936015/123977632-bf042e00-d9f1-11eb-910e-93d615d2db47.png)

### Does this PR introduce _any_ user-facing change?
Yes, the Web UI is updated.

### How was this patch tested?
Pass the CIs.

Closes #33155 from pingsutw/SPARK-35950.
Lead-authored-by: Kevin Su Co-authored-by: Kevin Su Signed-off-by: Gengliang Wang --- .../src/main/resources/org/apache/spark/ui/static/executorspage.js | 7 --- 1 file changed, 4 insertions(+), 3 deletions(-) diff --git a/core/src/main/resources/org/apache/spark/ui/static/executorspage.js b/core/src/main/resources/org/apache/spark/ui/static/executorspage.js index ab412a8..b7fbe04 100644 --- a/core/src/main/resources/org/apache/spark/ui/static/executorspage.js +++ b/core/src/main/resources/org/apache/spark/ui/static/executorspage.js @@ -140,7 +140,7 @@ function totalDurationColor(totalGCTime, totalDuration) { } var sumOptionalColumns = [3, 4]; -var execOptionalColumns = [5, 6, 7, 8, 9, 10, 13, 14, 15]; +var execOptionalColumns = [5, 6, 7, 8, 9, 10, 13, 14, 25]; var execDataTable; var sumDataTable; @@ -566,7 +566,8 @@ $(document).ready(function () { {"visible": false, "targets": 9}, {"visible": false, "targets": 10}, {"visible": false, "targets": 13}, -{"visible": false, "targets": 14} +{"visible": false, "targets": 14}, +{"visible": false, "targets": 25} ], "deferRender": true }; @@ -721,7 +722,7 @@ $(document).ready(function () { " Peak Pool Memory Direct / Mapped" + " Resources" + " Resource Profile Id" + - " Exec Loss Reason" + + " Exec Loss Reason" + ""); reselectCheckboxesBasedOnTaskTableState(); - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
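The one-character diff (15 → 25) is easy to miss, so it helps to model why it matters: each entry in the optional-column list must carry the index of the column as actually rendered, or the checkbox hides a different column. A hypothetical Python mini-model of the visibility toggle (the real page uses DataTables in JavaScript):

```python
# 26 rendered columns; "Exec Loss Reason" is the last one, at index 25.
visible = {i: True for i in range(26)}

def toggle(col_index: int) -> None:
    """Flip visibility of one rendered column, as the checkbox handler does."""
    visible[col_index] = not visible[col_index]

exec_optional_columns_buggy = [5, 6, 7, 8, 9, 10, 13, 14, 15]  # before the fix
exec_optional_columns_fixed = [5, 6, 7, 8, 9, 10, 13, 14, 25]  # after the fix

# With the stale index 15, unchecking "Exec Loss Reason" toggled an
# unrelated column ("Active tasks", per the PR description). With 25 it
# toggles the intended one.
toggle(exec_optional_columns_fixed[-1])
assert visible[25] is False
assert visible[15] is True
```

The fix also has to update both the `columnDefs` visibility defaults and the checkbox's `data-exec-col-idx`, which is why the diff touches three places for one logical change.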
[spark] branch master updated (8d28839 -> ad4b679)
This is an automated email from the ASF dual-hosted git repository. gengliang pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git.

from 8d28839  [SPARK-35946][PYTHON] Respect Py4J server in InheritableThread API
 add ad4b679  [SPARK-35937][SQL] Extracting date field from timestamp should work in ANSI mode

No new revisions were added by this update.

Summary of changes:
 .../spark/sql/catalyst/analysis/AnsiTypeCoercion.scala | 18 +-
 .../spark/sql/catalyst/rules/RuleIdCollection.scala    | 1 +
 .../sql/catalyst/analysis/AnsiTypeCoercionSuite.scala  | 10 ++
 .../sql-tests/results/postgreSQL/timestamp.sql.out     | 9 ++---
 .../scala/org/apache/spark/sql/SQLQuerySuite.scala     | 8
 5 files changed, 42 insertions(+), 4 deletions(-)
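The type-coercion change lets ANSI mode implicitly narrow a timestamp to a date when a date field (year, month, day, and so on) is extracted, instead of rejecting the query. In Python terms the implicit conversion step looks like:

```python
from datetime import datetime

# EXTRACT(YEAR FROM ts) on a timestamp: narrow to a date, then read the field.
ts = datetime(2021, 7, 1, 12, 30, 45)
d = ts.date()  # the implicit timestamp -> date step ANSI coercion now allows
assert (d.year, d.month, d.day) == (2021, 7, 1)
```

Only the date components survive the narrowing; the time-of-day fields are simply not consulted for date-field extraction.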
[spark] branch master updated (6bbfb45 -> 4dd41b9)
This is an automated email from the ASF dual-hosted git repository. gengliang pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git.

from 6bbfb45  [SPARK-33298][CORE][FOLLOWUP] Add Unstable annotation to `FileCommitProtocol`
 add 4dd41b9  [SPARK-34365][AVRO] Add support for positional Catalyst-to-Avro schema matching

No new revisions were added by this update.

Summary of changes:
 docs/sql-data-sources-avro.md                      | 6 +
 .../apache/spark/sql/avro/AvroDeserializer.scala   | 15 +-
 .../org/apache/spark/sql/avro/AvroFileFormat.scala | 1 +
 .../org/apache/spark/sql/avro/AvroOptions.scala    | 8 +
 .../apache/spark/sql/avro/AvroOutputWriter.scala   | 5 +-
 .../spark/sql/avro/AvroOutputWriterFactory.scala   | 8 +-
 .../org/apache/spark/sql/avro/AvroSerializer.scala | 22 +--
 .../org/apache/spark/sql/avro/AvroUtils.scala      | 42 +-
 .../sql/v2/avro/AvroPartitionReaderFactory.scala   | 1 +
 .../sql/avro/AvroCatalystDataConversionSuite.scala | 1 +
 .../apache/spark/sql/avro/AvroRowReaderSuite.scala | 1 +
 .../spark/sql/avro/AvroSchemaHelperSuite.scala     | 24 ++-
 .../org/apache/spark/sql/avro/AvroSerdeSuite.scala | 164 ++---
 .../org/apache/spark/sql/avro/AvroSuite.scala      | 41 +-
 14 files changed, 258 insertions(+), 81 deletions(-)
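Positional matching pairs the i-th Avro field with the i-th SQL column, ignoring names; name-based matching (the default) pairs by field name regardless of schema order. A toy Python illustration of the two strategies (function and parameter names here are made up, not the Avro reader's internals):

```python
def match_fields(sql_cols, avro_fields, positional_field_matching=False):
    """Return (sql_col, avro_field) pairs under either matching strategy."""
    if positional_field_matching:
        return list(zip(sql_cols, avro_fields))            # match by position
    return [(c, c) for c in sql_cols if c in avro_fields]  # match by name

# By name: order in the Avro schema does not matter.
assert match_fields(["a", "b"], ["b", "a"]) == [("a", "a"), ("b", "b")]

# By position: names do not matter, only order does.
assert match_fields(["a", "b"], ["x", "y"],
                    positional_field_matching=True) == [("a", "x"), ("b", "y")]
```

In Spark this is toggled by the new `positionalFieldMatching` data source option used together with the `avroSchema` option.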
[spark] branch master updated: [SPARK-35951][DOCS] Add since versions for Avro options in Documentation
This is an automated email from the ASF dual-hosted git repository. gengliang pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new c6afd6e [SPARK-35951][DOCS] Add since versions for Avro options in Documentation c6afd6e is described below commit c6afd6ed5296980e81160e441a4e9bea98c74196 Author: Gengliang Wang AuthorDate: Wed Jun 30 17:24:48 2021 +0800 [SPARK-35951][DOCS] Add since versions for Avro options in Documentation ### What changes were proposed in this pull request? There are two new Avro options `datetimeRebaseMode` and `positionalFieldMatching` after Spark 3.2. We should document the since version so that users can know whether the option works in their Spark version. ### Why are the changes needed? Better documentation. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? Manual preview on local setup. https://user-images.githubusercontent.com/1097932/123934000-ba833b00-d947-11eb-9ca5-ce8ff8add74b.png;> https://user-images.githubusercontent.com/1097932/123934126-d4bd1900-d947-11eb-8d80-69df8f3d9900.png;> Closes #33153 from gengliangwang/version. Authored-by: Gengliang Wang Signed-off-by: Gengliang Wang --- docs/sql-data-sources-avro.md | 10 +- 1 file changed, 9 insertions(+), 1 deletion(-) diff --git a/docs/sql-data-sources-avro.md b/docs/sql-data-sources-avro.md index 7fb0ef5..94dd7e1 100644 --- a/docs/sql-data-sources-avro.md +++ b/docs/sql-data-sources-avro.md @@ -224,7 +224,7 @@ Data source options of Avro can be set via: * the `options` parameter in function `from_avro`. - Property NameDefaultMeaningScope + Property NameDefaultMeaningScopeSince Version avroSchema None @@ -244,24 +244,28 @@ Data source options of Avro can be set via: read, write and function from_avro +2.4.0 recordName topLevelRecord Top level record name in write result, which is required in Avro spec. 
write +2.4.0 recordNamespace "" Record namespace in write result. write +2.4.0 ignoreExtension true The option controls ignoring of files without .avro extensions in read. If the option is enabled, all files (with and without .avro extension) are loaded. The option has been deprecated, and it will be removed in the future releases. Please use the general data source option pathGlobFilter for filtering file names. read +2.4.0 compression @@ -269,6 +273,7 @@ Data source options of Avro can be set via: The compression option allows to specify a compression codec used in write. Currently supported codecs are uncompressed, snappy, deflate, bzip2 and xz. If the option is not set, the configuration spark.sql.avro.compression.codec config is taken into account. write +2.4.0 mode @@ -282,6 +287,7 @@ Data source options of Avro can be set via: function from_avro +2.4.0 datetimeRebaseMode @@ -295,12 +301,14 @@ Data source options of Avro can be set via: read and function from_avro +3.2.0 positionalFieldMatching false This can be used in tandem with the `avroSchema` option to adjust the behavior for matching the fields in the provided Avro schema with those in the SQL schema. By default, the matching will be performed using field names, ignoring their positions. If this option is set to "true", the matching will be based on the position of the fields. read and write +3.2.0 - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated: [SPARK-35971][SQL] Rename the type name of TimestampNTZType as "timestamp_ntz"
This is an automated email from the ASF dual-hosted git repository. gengliang pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new 3acc4b9  [SPARK-35971][SQL] Rename the type name of TimestampNTZType as "timestamp_ntz"
3acc4b9 is described below

commit 3acc4b973b57f88fbe681c7db89cd55699750178
Author: Gengliang Wang
AuthorDate: Thu Jul 1 20:50:19 2021 +0800

[SPARK-35971][SQL] Rename the type name of TimestampNTZType as "timestamp_ntz"

### What changes were proposed in this pull request?
Rename the type name string of TimestampNTZType from "timestamp without time zone" to "timestamp_ntz".

### Why are the changes needed?
This is to make the column header shorter and simpler. Snowflake and Flink use a similar approach:
https://docs.snowflake.com/en/sql-reference/data-types-datetime.html
https://ci.apache.org/projects/flink/flink-docs-master/docs/dev/table/concepts/timezone/

### Does this PR introduce _any_ user-facing change?
No, the new timestamp type is not released yet.

### How was this patch tested?
Unit tests

Closes #33173 from gengliangwang/reviseTypeName.
Authored-by: Gengliang Wang Signed-off-by: Gengliang Wang --- .../apache/spark/sql/types/TimestampNTZType.scala | 2 +- .../sql/catalyst/expressions/CastSuiteBase.scala | 4 +- .../sql-functions/sql-expression-schema.md | 4 +- .../sql-tests/results/ansi/datetime.sql.out| 92 +- .../sql-tests/results/datetime-legacy.sql.out | 108 ++--- .../resources/sql-tests/results/datetime.sql.out | 108 ++--- 6 files changed, 159 insertions(+), 159 deletions(-) diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/types/TimestampNTZType.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/types/TimestampNTZType.scala index 347fd4a..f7d20a0 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/types/TimestampNTZType.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/types/TimestampNTZType.scala @@ -48,7 +48,7 @@ class TimestampNTZType private() extends AtomicType { */ override def defaultSize: Int = 8 - override def typeName: String = "timestamp without time zone" + override def typeName: String = "timestamp_ntz" private[spark] override def asNullable: TimestampNTZType = this } diff --git a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/CastSuiteBase.scala b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/CastSuiteBase.scala index f6a628a..66f5b50 100644 --- a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/CastSuiteBase.scala +++ b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/CastSuiteBase.scala @@ -939,11 +939,11 @@ abstract class CastSuiteBase extends SparkFunSuite with ExpressionEvalHelper { test("disallow type conversions between Numeric types and Timestamp without time zone type") { import DataTypeTestUtils.numericTypes checkInvalidCastFromNumericType(TimestampNTZType) -var errorMsg = "cannot cast bigint to timestamp without time zone" +var errorMsg = "cannot cast bigint to timestamp_ntz" verifyCastFailure(cast(Literal(0L), TimestampNTZType), Some(errorMsg)) 
val timestampNTZLiteral = Literal.create(LocalDateTime.now(), TimestampNTZType) -errorMsg = "cannot cast timestamp without time zone to" +errorMsg = "cannot cast timestamp_ntz to" numericTypes.foreach { numericType => verifyCastFailure(cast(timestampNTZLiteral, numericType), Some(errorMsg)) } diff --git a/sql/core/src/test/resources/sql-functions/sql-expression-schema.md b/sql/core/src/test/resources/sql-functions/sql-expression-schema.md index 5fa37c4..00fb172 100644 --- a/sql/core/src/test/resources/sql-functions/sql-expression-schema.md +++ b/sql/core/src/test/resources/sql-functions/sql-expression-schema.md @@ -206,7 +206,7 @@ | org.apache.spark.sql.catalyst.expressions.Overlay | overlay | SELECT overlay('Spark SQL' PLACING '_' FROM 6) | struct | | org.apache.spark.sql.catalyst.expressions.ParseToDate | to_date | SELECT to_date('2009-07-30 04:17:52') | struct | | org.apache.spark.sql.catalyst.expressions.ParseToTimestamp | to_timestamp | SELECT to_timestamp('2016-12-31 00:12:00') | struct | -| org.apache.spark.sql.catalyst.expressions.ParseToTimestampNTZ | to_timestamp_ntz | SELECT to_timestamp_ntz('2016-12-31 00:12:00') | struct | +| org.apache.spark.sql.catalyst.expressions.ParseToTimestampNTZ | to_timestamp_ntz | SELECT to_timestamp_ntz('2016-12-31 00:12:00') | struct | | org.apache.spark.sql.catalyst.expressions.ParseUrl | parse_url | SELECT pars
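The commit above works because cast-failure text embeds each type's `typeName`, so renaming `TimestampNTZType`'s typeName from "timestamp without time zone" to "timestamp_ntz" changes every message that mentions it. A minimal Python sketch of that relationship (a hypothetical simplification, not Spark's actual message builder):

```python
# Hypothetical sketch: the cast-error message is assembled from the two
# typeNames, so renaming a typeName changes every message embedding it.
def cast_error_message(source_type_name, target_type_name):
    return f"cannot cast {source_type_name} to {target_type_name}"

# After the rename, the shorter "timestamp_ntz" form appears in errors.
assert cast_error_message("bigint", "timestamp_ntz") == \
    "cannot cast bigint to timestamp_ntz"
```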
[spark] branch master updated (c6afd6e -> e88aa49)
This is an automated email from the ASF dual-hosted git repository. gengliang pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from c6afd6e [SPARK-35951][DOCS] Add since versions for Avro options in Documentation add e88aa49 [SPARK-35932][SQL] Support extracting hour/minute/second from timestamp without time zone No new revisions were added by this update. Summary of changes: .../sql/catalyst/analysis/AnsiTypeCoercion.scala | 1 + .../spark/sql/catalyst/analysis/TypeCoercion.scala | 6 +- .../catalyst/expressions/datetimeExpressions.scala | 8 +- .../apache/spark/sql/types/AbstractDataType.scala | 2 +- .../expressions/DateExpressionsSuite.scala | 65 +--- .../test/resources/sql-tests/inputs/extract.sql| 66 .../resources/sql-tests/results/extract.sql.out| 182 ++--- 7 files changed, 181 insertions(+), 149 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
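SPARK-35932 lets `EXTRACT` pull the hour, minute, and second fields out of a timestamp-without-time-zone value. Since TIMESTAMP_NTZ is a plain wall-clock value, its extraction semantics can be illustrated with Python's zone-naive datetime (an analogy, not Spark code):

```python
from datetime import datetime

# A timestamp without time zone behaves like a naive (zone-less) datetime.
ts = datetime(2016, 12, 31, 23, 59, 58)

# EXTRACT(HOUR FROM ts), EXTRACT(MINUTE FROM ts), EXTRACT(SECOND FROM ts)
assert (ts.hour, ts.minute, ts.second) == (23, 59, 58)
```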
[spark] branch master updated (0a7a6f7 -> 7635114)
This is an automated email from the ASF dual-hosted git repository. gengliang pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from 0a7a6f7 [SPARK-35483][FOLLOWUP][TESTS] Update run-tests.py doctest add 7635114 [SPARK-35916][SQL] Support subtraction among Date/Timestamp/TimestampWithoutTZ No new revisions were added by this update. Summary of changes: .../spark/sql/catalyst/analysis/Analyzer.scala | 4 +- .../spark/sql/catalyst/analysis/TypeCoercion.scala | 13 +++-- .../catalyst/expressions/datetimeExpressions.scala | 8 ++- .../apache/spark/sql/types/AbstractDataType.scala | 11 .../expressions/DateExpressionsSuite.scala | 65 + .../test/resources/sql-tests/inputs/datetime.sql | 10 .../sql-tests/results/ansi/datetime.sql.out| 66 +- .../sql-tests/results/datetime-legacy.sql.out | 66 +- .../resources/sql-tests/results/datetime.sql.out | 66 +- .../typeCoercion/native/decimalPrecision.sql.out | 16 +++--- .../typeCoercion/native/promoteStrings.sql.out | 4 +- 11 files changed, 307 insertions(+), 22 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
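SPARK-35916 allows subtraction among Date, Timestamp, and TimestampWithoutTZ operands, producing an interval. For the zone-naive case the arithmetic matches naive datetime subtraction in Python (an analogy, not Spark code):

```python
from datetime import datetime, timedelta

a = datetime(2021, 7, 11, 12, 0, 0)   # timestamp without time zone
b = datetime(2021, 7, 10, 10, 30, 0)

# ts - ts yields a day-time interval
assert a - b == timedelta(days=1, hours=1, minutes=30)
```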
[spark] branch master updated (5db51ef -> 78e6263)
This is an automated email from the ASF dual-hosted git repository. gengliang pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from 5db51ef [SPARK-35721][PYTHON] Path level discover for python unittests add 78e6263 [SPARK-35927][SQL] Remove type collection AllTimestampTypes No new revisions were added by this update. Summary of changes: .../spark/sql/catalyst/expressions/datetimeExpressions.scala | 5 ++--- .../main/scala/org/apache/spark/sql/types/AbstractDataType.scala | 8 2 files changed, 2 insertions(+), 11 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated: [SPARK-33603][SQL] Grouping exception messages in execution/command
This is an automated email from the ASF dual-hosted git repository. gengliang pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new d03f716 [SPARK-33603][SQL] Grouping exception messages in execution/command d03f716 is described below commit d03f71657ed745247d026ca1e5de2a2d7c9a6a30 Author: dgd-contributor AuthorDate: Tue Jul 13 01:28:43 2021 +0800 [SPARK-33603][SQL] Grouping exception messages in execution/command ### What changes were proposed in this pull request? This PR groups exception messages in sql/core/src/main/scala/org/apache/spark/sql/execution/command ### Why are the changes needed? It will largely help with standardization of error messages and their maintenance. ### Does this PR introduce any user-facing change? No. Error messages remain unchanged. ### How was this patch tested? No new tests - pass all original tests to make sure it doesn't break any existing behavior. Closes #32951 from dgd-contributor/SPARK-33603_grouping_execution/command. 
Authored-by: dgd-contributor Signed-off-by: Gengliang Wang --- .../spark/sql/errors/QueryCompilationErrors.scala | 368 - .../spark/sql/errors/QueryExecutionErrors.scala| 18 + .../execution/command/AnalyzeColumnCommand.scala | 16 +- .../command/AnalyzePartitionCommand.scala | 17 +- .../spark/sql/execution/command/CommandUtils.scala | 9 +- .../sql/execution/command/DataWritingCommand.scala | 9 +- .../command/InsertIntoDataSourceDirCommand.scala | 6 +- .../execution/command/createDataSourceTables.scala | 6 +- .../apache/spark/sql/execution/command/ddl.scala | 57 ++-- .../spark/sql/execution/command/functions.scala| 22 +- .../spark/sql/execution/command/tables.scala | 117 +++ .../apache/spark/sql/execution/command/views.scala | 44 ++- 12 files changed, 505 insertions(+), 184 deletions(-) diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryCompilationErrors.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryCompilationErrors.scala index 4f82e25..d1dcbbc 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryCompilationErrors.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryCompilationErrors.scala @@ -17,12 +17,15 @@ package org.apache.spark.sql.errors +import scala.collection.mutable + import org.apache.hadoop.fs.Path import org.apache.spark.sql.AnalysisException import org.apache.spark.sql.catalyst.{FunctionIdentifier, QualifiedTableName, TableIdentifier} -import org.apache.spark.sql.catalyst.analysis.{CannotReplaceMissingTableException, NamespaceAlreadyExistsException, NoSuchNamespaceException, NoSuchTableException, ResolvedNamespace, ResolvedTable, ResolvedView, TableAlreadyExistsException} +import org.apache.spark.sql.catalyst.analysis.{CannotReplaceMissingTableException, NamespaceAlreadyExistsException, NoSuchFunctionException, NoSuchNamespaceException, NoSuchPartitionException, NoSuchTableException, ResolvedNamespace, ResolvedTable, ResolvedView, TableAlreadyExistsException} import 
org.apache.spark.sql.catalyst.catalog.{BucketSpec, CatalogTable, InvalidUDFClassException} +import org.apache.spark.sql.catalyst.catalog.CatalogTypes.TablePartitionSpec import org.apache.spark.sql.catalyst.expressions.{Alias, Attribute, AttributeReference, AttributeSet, CreateMap, Expression, GroupingID, NamedExpression, SpecifiedWindowFrame, WindowFrame, WindowFunction, WindowSpecDefinition} import org.apache.spark.sql.catalyst.plans.JoinType import org.apache.spark.sql.catalyst.plans.logical.{InsertIntoStatement, Join, LogicalPlan, SerdeInfo, Window} @@ -1696,6 +1699,369 @@ private[spark] object QueryCompilationErrors { s"Found duplicate column(s) $colType: ${duplicateCol.sorted.mkString(", ")}") } + def noSuchTableError(db: String, table: String): Throwable = { +new NoSuchTableException(db = db, table = table) + } + + def tempViewNotCachedForAnalyzingColumnsError(tableIdent: TableIdentifier): Throwable = { +new AnalysisException(s"Temporary view $tableIdent is not cached for analyzing columns.") + } + + def columnTypeNotSupportStatisticsCollectionError( + name: String, + tableIdent: TableIdentifier, + dataType: DataType): Throwable = { +new AnalysisException(s"Column $name in table $tableIdent is of type $dataType, " + + "and Spark does not support statistics collection on this column type.") + } + + def analyzeTableNotSupportedOnViewsError(): Throwable = { +new AnalysisException("ANALYZE TABLE is not supported on views.") + } + + def unexpectedPartitionColumnPrefixError( + table: String, + database: String, + schemaColumns:
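The pattern SPARK-33603 applies — error constructors become named factory methods in a central object (`QueryCompilationErrors` / `QueryExecutionErrors`) rather than being built inline at each call site — can be sketched in Python terms like this (the message string is taken verbatim from the diff above; the structure is a simplification of the Scala objects):

```python
# Central "errors" module: each error gets a named factory function, so the
# wording can be standardized and maintained in one place.
class AnalysisException(Exception):
    pass

def analyze_table_not_supported_on_views_error():
    # mirrors QueryCompilationErrors.analyzeTableNotSupportedOnViewsError
    return AnalysisException("ANALYZE TABLE is not supported on views.")

# A command implementation raises via the factory instead of constructing
# the exception (and its wording) inline.
try:
    raise analyze_table_not_supported_on_views_error()
except AnalysisException as e:
    assert str(e) == "ANALYZE TABLE is not supported on views."
```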
[spark] branch branch-3.2 updated: [SPARK-33603][SQL] Grouping exception messages in execution/command
This is an automated email from the ASF dual-hosted git repository. gengliang pushed a commit to branch branch-3.2 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.2 by this push: new 12aecb4 [SPARK-33603][SQL] Grouping exception messages in execution/command 12aecb4 is described below commit 12aecb43302fcb9ddbdd3ab0633291ccf3e91f6b Author: dgd-contributor AuthorDate: Tue Jul 13 01:28:43 2021 +0800 [SPARK-33603][SQL] Grouping exception messages in execution/command ### What changes were proposed in this pull request? This PR groups exception messages in sql/core/src/main/scala/org/apache/spark/sql/execution/command ### Why are the changes needed? It will largely help with standardization of error messages and their maintenance. ### Does this PR introduce any user-facing change? No. Error messages remain unchanged. ### How was this patch tested? No new tests - pass all original tests to make sure it doesn't break any existing behavior. Closes #32951 from dgd-contributor/SPARK-33603_grouping_execution/command. 
Authored-by: dgd-contributor Signed-off-by: Gengliang Wang (cherry picked from commit d03f71657ed745247d026ca1e5de2a2d7c9a6a30) Signed-off-by: Gengliang Wang --- .../spark/sql/errors/QueryCompilationErrors.scala | 368 - .../spark/sql/errors/QueryExecutionErrors.scala| 18 + .../execution/command/AnalyzeColumnCommand.scala | 16 +- .../command/AnalyzePartitionCommand.scala | 17 +- .../spark/sql/execution/command/CommandUtils.scala | 9 +- .../sql/execution/command/DataWritingCommand.scala | 9 +- .../command/InsertIntoDataSourceDirCommand.scala | 6 +- .../execution/command/createDataSourceTables.scala | 6 +- .../apache/spark/sql/execution/command/ddl.scala | 57 ++-- .../spark/sql/execution/command/functions.scala| 22 +- .../spark/sql/execution/command/tables.scala | 117 +++ .../apache/spark/sql/execution/command/views.scala | 44 ++- 12 files changed, 505 insertions(+), 184 deletions(-) diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryCompilationErrors.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryCompilationErrors.scala index 4f82e25..d1dcbbc 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryCompilationErrors.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryCompilationErrors.scala @@ -17,12 +17,15 @@ package org.apache.spark.sql.errors +import scala.collection.mutable + import org.apache.hadoop.fs.Path import org.apache.spark.sql.AnalysisException import org.apache.spark.sql.catalyst.{FunctionIdentifier, QualifiedTableName, TableIdentifier} -import org.apache.spark.sql.catalyst.analysis.{CannotReplaceMissingTableException, NamespaceAlreadyExistsException, NoSuchNamespaceException, NoSuchTableException, ResolvedNamespace, ResolvedTable, ResolvedView, TableAlreadyExistsException} +import org.apache.spark.sql.catalyst.analysis.{CannotReplaceMissingTableException, NamespaceAlreadyExistsException, NoSuchFunctionException, NoSuchNamespaceException, NoSuchPartitionException, 
NoSuchTableException, ResolvedNamespace, ResolvedTable, ResolvedView, TableAlreadyExistsException} import org.apache.spark.sql.catalyst.catalog.{BucketSpec, CatalogTable, InvalidUDFClassException} +import org.apache.spark.sql.catalyst.catalog.CatalogTypes.TablePartitionSpec import org.apache.spark.sql.catalyst.expressions.{Alias, Attribute, AttributeReference, AttributeSet, CreateMap, Expression, GroupingID, NamedExpression, SpecifiedWindowFrame, WindowFrame, WindowFunction, WindowSpecDefinition} import org.apache.spark.sql.catalyst.plans.JoinType import org.apache.spark.sql.catalyst.plans.logical.{InsertIntoStatement, Join, LogicalPlan, SerdeInfo, Window} @@ -1696,6 +1699,369 @@ private[spark] object QueryCompilationErrors { s"Found duplicate column(s) $colType: ${duplicateCol.sorted.mkString(", ")}") } + def noSuchTableError(db: String, table: String): Throwable = { +new NoSuchTableException(db = db, table = table) + } + + def tempViewNotCachedForAnalyzingColumnsError(tableIdent: TableIdentifier): Throwable = { +new AnalysisException(s"Temporary view $tableIdent is not cached for analyzing columns.") + } + + def columnTypeNotSupportStatisticsCollectionError( + name: String, + tableIdent: TableIdentifier, + dataType: DataType): Throwable = { +new AnalysisException(s"Column $name in table $tableIdent is of type $dataType, " + + "and Spark does not support statistics collection on this column type.") + } + + def analyzeTableNotSupportedOnViewsError(): Throwable = { +new AnalysisException("ANALYZE TABLE is not supported on views.") + } + + de
[spark] branch master updated: [SPARK-35664][SQL][FOLLOWUP] Fix incorrect comment for TimestampNTZType
This is an automated email from the ASF dual-hosted git repository. gengliang pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new c605ba2 [SPARK-35664][SQL][FOLLOWUP] Fix incorrect comment for TimestampNTZType c605ba2 is described below commit c605ba2d46742ca13db794ca1be136a4b10b652e Author: gengjiaan AuthorDate: Mon Jul 5 18:48:00 2021 +0800 [SPARK-35664][SQL][FOLLOWUP] Fix incorrect comment for TimestampNTZType ### What changes were proposed in this pull request? This PR fixes the incorrect comment for `TimestampNTZType`. ### Why are the changes needed? Fix the incorrect comment ### Does this PR introduce _any_ user-facing change? 'No'. ### How was this patch tested? No need. Closes #33218 from beliefer/SPARK-35664-followup. Authored-by: gengjiaan Signed-off-by: Gengliang Wang --- sql/catalyst/src/main/scala/org/apache/spark/sql/Encoders.scala | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/Encoders.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/Encoders.scala index 15a93a7..f23f3c6 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/Encoders.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/Encoders.scala @@ -116,7 +116,7 @@ object Encoders { /** * Creates an encoder that serializes instances of the `java.time.LocalDateTime` class - * to the internal representation of nullable Catalyst's DateType. + * to the internal representation of nullable Catalyst's TimestampNTZType. * * @since 3.2.0 */ - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch branch-3.2 updated: [SPARK-35664][SQL][FOLLOWUP] Fix incorrect comment for TimestampNTZType
This is an automated email from the ASF dual-hosted git repository. gengliang pushed a commit to branch branch-3.2 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.2 by this push: new d3e8c9c [SPARK-35664][SQL][FOLLOWUP] Fix incorrect comment for TimestampNTZType d3e8c9c is described below commit d3e8c9c78b364580523e3f915ee51369ca7df0bf Author: gengjiaan AuthorDate: Mon Jul 5 18:48:00 2021 +0800 [SPARK-35664][SQL][FOLLOWUP] Fix incorrect comment for TimestampNTZType ### What changes were proposed in this pull request? This PR fixes the incorrect comment for `TimestampNTZType`. ### Why are the changes needed? Fix the incorrect comment ### Does this PR introduce _any_ user-facing change? 'No'. ### How was this patch tested? No need. Closes #33218 from beliefer/SPARK-35664-followup. Authored-by: gengjiaan Signed-off-by: Gengliang Wang (cherry picked from commit c605ba2d46742ca13db794ca1be136a4b10b652e) Signed-off-by: Gengliang Wang --- sql/catalyst/src/main/scala/org/apache/spark/sql/Encoders.scala | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/Encoders.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/Encoders.scala index 15a93a7..f23f3c6 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/Encoders.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/Encoders.scala @@ -116,7 +116,7 @@ object Encoders { /** * Creates an encoder that serializes instances of the `java.time.LocalDateTime` class - * to the internal representation of nullable Catalyst's DateType. + * to the internal representation of nullable Catalyst's TimestampNTZType. + * * @since 3.2.0 */ - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch branch-3.2 updated: [SPARK-35979][SQL] Return different timestamp literals based on the default timestamp type
This is an automated email from the ASF dual-hosted git repository. gengliang pushed a commit to branch branch-3.2 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.2 by this push: new a9947cb [SPARK-35979][SQL] Return different timestamp literals based on the default timestamp type a9947cb is described below commit a9947cbd716b83e2f65dfec035c7abf29ea40922 Author: Gengliang Wang AuthorDate: Tue Jul 6 00:54:58 2021 +0800 [SPARK-35979][SQL] Return different timestamp literals based on the default timestamp type ### What changes were proposed in this pull request? For the timestamp literal, it should have the following behavior. 1. When `spark.sql.timestampType` is TIMESTAMP_NTZ: if there is no time zone part, return a timestamp without time zone literal; otherwise, return a timestamp with local time zone literal 2. When `spark.sql.timestampType` is TIMESTAMP_LTZ: return a timestamp with local time zone literal ### Why are the changes needed? When the default timestamp type is TIMESTAMP_NTZ, a timestamp literal should be resolved as TIMESTAMP_NTZ when there is no time zone part in the string. From section 5.3 "<literal>" of the ANSI SQL standard 2011: ``` 27) The declared type of a <timestamp literal> that does not specify <time zone interval> is TIMESTAMP(P) WITHOUT TIME ZONE, where P is the number of digits in <seconds fraction>, if specified, and 0 (zero) otherwise. The declared type of a <timestamp literal> that specifies <time zone interval> is TIMESTAMP(P) WITH TIME ZONE, where P is the number of digits in <seconds fraction>, if specified, and 0 (zero) otherwise. ``` Since we don't have "timestamp with time zone", we use timestamp with local time zone instead. ### Does this PR introduce _any_ user-facing change? No, the new timestamp type and the default timestamp configuration are not released yet. ### How was this patch tested? Unit test Closes #33215 from gengliangwang/tsLiteral. 
Authored-by: Gengliang Wang Signed-off-by: Gengliang Wang (cherry picked from commit 2fffec7de8d31bd01c8acd8bca72acacaf189c97) Signed-off-by: Gengliang Wang --- .../spark/sql/catalyst/parser/AstBuilder.scala | 32 ++ .../spark/sql/catalyst/util/DateTimeUtils.scala| 31 + .../org/apache/spark/sql/internal/SQLConf.scala| 6 ++-- .../catalyst/parser/ExpressionParserSuite.scala| 12 .../sql/catalyst/util/DateTimeUtilsSuite.scala | 15 ++ 5 files changed, 83 insertions(+), 13 deletions(-) diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala index 361ecc1..5b9107f 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala @@ -38,8 +38,8 @@ import org.apache.spark.sql.catalyst.parser.SqlBaseParser._ import org.apache.spark.sql.catalyst.plans._ import org.apache.spark.sql.catalyst.plans.logical._ import org.apache.spark.sql.catalyst.trees.CurrentOrigin -import org.apache.spark.sql.catalyst.util.{CharVarcharUtils, IntervalUtils} -import org.apache.spark.sql.catalyst.util.DateTimeUtils.{convertSpecialDate, convertSpecialTimestamp, getZoneId, stringToDate, stringToTimestamp} +import org.apache.spark.sql.catalyst.util.{CharVarcharUtils, DateTimeUtils, IntervalUtils} +import org.apache.spark.sql.catalyst.util.DateTimeUtils.{convertSpecialDate, convertSpecialTimestamp, convertSpecialTimestampNTZ, getZoneId, stringToDate, stringToTimestamp, stringToTimestampWithoutTimeZone} import org.apache.spark.sql.catalyst.util.IntervalUtils.IntervalUnit import org.apache.spark.sql.connector.catalog.{SupportsNamespaces, TableCatalog} import org.apache.spark.sql.connector.catalog.TableChange.ColumnPosition @@ -2126,9 +2126,31 @@ class AstBuilder extends SqlBaseBaseVisitor[AnyRef] with SQLConfHelper with Logg val specialDate = convertSpecialDate(value, 
zoneId).map(Literal(_, DateType)) specialDate.getOrElse(toLiteral(stringToDate, DateType)) case "TIMESTAMP" => - val zoneId = getZoneId(conf.sessionLocalTimeZone) - val specialTs = convertSpecialTimestamp(value, zoneId).map(Literal(_, TimestampType)) - specialTs.getOrElse(toLiteral(stringToTimestamp(_, zoneId), TimestampType)) + def constructTimestampLTZLiteral(value: String): Literal = { +val zoneId = getZoneId(conf.sessionLocalTimeZone) +val specialTs = convertSpecialTimestamp(value, zoneId).map(Literal(_, TimestampType)) +specialTs.getOrElse(toLiteral(stringToTimestamp(_, zoneId), TimestampType)) + } + + SQLConf.get.timestampType match { +case TimestampNTZType => + val sp
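The rule described in the commit message above reduces to a small decision function. A Python model of the behavior (not the Scala implementation):

```python
def timestamp_literal_type(default_type, has_time_zone_part):
    """Resolve the type of a TIMESTAMP '...' literal.

    default_type: the value of spark.sql.timestampType
    has_time_zone_part: whether the literal string carries a zone offset
    """
    # Under TIMESTAMP_NTZ, a zone-less literal is NTZ; anything else is LTZ.
    if default_type == "TIMESTAMP_NTZ" and not has_time_zone_part:
        return "TIMESTAMP_NTZ"
    return "TIMESTAMP_LTZ"

assert timestamp_literal_type("TIMESTAMP_NTZ", False) == "TIMESTAMP_NTZ"
assert timestamp_literal_type("TIMESTAMP_NTZ", True) == "TIMESTAMP_LTZ"
assert timestamp_literal_type("TIMESTAMP_LTZ", False) == "TIMESTAMP_LTZ"
assert timestamp_literal_type("TIMESTAMP_LTZ", True) == "TIMESTAMP_LTZ"
```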
[spark] branch master updated (c605ba2 -> 2fffec7)
This is an automated email from the ASF dual-hosted git repository. gengliang pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from c605ba2 [SPARK-35664][SQL][FOLLOWUP] Fix incorrect comment for TimestampNTZType add 2fffec7 [SPARK-35979][SQL] Return different timestamp literals based on the default timestamp type No new revisions were added by this update. Summary of changes: .../spark/sql/catalyst/parser/AstBuilder.scala | 32 ++ .../spark/sql/catalyst/util/DateTimeUtils.scala| 31 + .../org/apache/spark/sql/internal/SQLConf.scala| 6 ++-- .../catalyst/parser/ExpressionParserSuite.scala| 12 .../sql/catalyst/util/DateTimeUtilsSuite.scala | 15 ++ 5 files changed, 83 insertions(+), 13 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch branch-3.2 updated: [SPARK-35978][SQL] Support non-reserved keyword TIMESTAMP_LTZ
This is an automated email from the ASF dual-hosted git repository. gengliang pushed a commit to branch branch-3.2 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.2 by this push: new e09feda [SPARK-35978][SQL] Support non-reserved keyword TIMESTAMP_LTZ e09feda is described below commit e09feda1d23a89a6f15a900f8001405f47b7e058 Author: Gengliang Wang AuthorDate: Tue Jul 6 14:33:22 2021 +0800 [SPARK-35978][SQL] Support non-reserved keyword TIMESTAMP_LTZ ### What changes were proposed in this pull request? Support new keyword `TIMESTAMP_LTZ`, which can be used for: - timestamp with local time zone data type in DDL - timestamp with local time zone data type in Cast clause. - timestamp with local time zone data type literal ### Why are the changes needed? Users can use `TIMESTAMP_LTZ` in DDL/Cast/Literals for the timestamp with local time zone type directly. The new keyword is independent of the SQL configuration `spark.sql.timestampType`. ### Does this PR introduce _any_ user-facing change? No, the new timestamp type is not released yet. ### How was this patch tested? Unit test Closes #33224 from gengliangwang/TIMESTAMP_LTZ. 
Authored-by: Gengliang Wang Signed-off-by: Gengliang Wang (cherry picked from commit b0b9643cd76da48ed90e958e40717a664bc7494b) Signed-off-by: Gengliang Wang --- .../apache/spark/sql/catalyst/parser/AstBuilder.scala | 16 ++-- .../sql/catalyst/parser/DataTypeParserSuite.scala | 1 + .../sql/catalyst/parser/ExpressionParserSuite.scala | 19 +++ 3 files changed, 26 insertions(+), 10 deletions(-) diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala index 680d781..d6363b5 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala @@ -2119,6 +2119,13 @@ class AstBuilder extends SqlBaseBaseVisitor[AnyRef] with SQLConfHelper with Logg throw QueryParsingErrors.cannotParseValueTypeError(valueType, value, ctx) } } + +def constructTimestampLTZLiteral(value: String): Literal = { + val zoneId = getZoneId(conf.sessionLocalTimeZone) + val specialTs = convertSpecialTimestamp(value, zoneId).map(Literal(_, TimestampType)) + specialTs.getOrElse(toLiteral(stringToTimestamp(_, zoneId), TimestampType)) +} + try { valueType match { case "DATE" => @@ -2128,13 +2135,9 @@ class AstBuilder extends SqlBaseBaseVisitor[AnyRef] with SQLConfHelper with Logg case "TIMESTAMP_NTZ" => val specialTs = convertSpecialTimestampNTZ(value).map(Literal(_, TimestampNTZType)) specialTs.getOrElse(toLiteral(stringToTimestampWithoutTimeZone, TimestampNTZType)) +case "TIMESTAMP_LTZ" => + constructTimestampLTZLiteral(value) case "TIMESTAMP" => - def constructTimestampLTZLiteral(value: String): Literal = { -val zoneId = getZoneId(conf.sessionLocalTimeZone) -val specialTs = convertSpecialTimestamp(value, zoneId).map(Literal(_, TimestampType)) -specialTs.getOrElse(toLiteral(stringToTimestamp(_, zoneId), TimestampType)) - } - SQLConf.get.timestampType match { case 
TimestampNTZType => val specialTs = convertSpecialTimestampNTZ(value).map(Literal(_, TimestampNTZType)) @@ -2529,6 +2532,7 @@ class AstBuilder extends SqlBaseBaseVisitor[AnyRef] with SQLConfHelper with Logg case ("date", Nil) => DateType case ("timestamp", Nil) => SQLConf.get.timestampType case ("timestamp_ntz", Nil) => TimestampNTZType + case ("timestamp_ltz", Nil) => TimestampType case ("string", Nil) => StringType case ("character" | "char", length :: Nil) => CharType(length.getText.toInt) case ("varchar", length :: Nil) => VarcharType(length.getText.toInt) diff --git a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/parser/DataTypeParserSuite.scala b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/parser/DataTypeParserSuite.scala index d34..97dd0db 100644 --- a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/parser/DataTypeParserSuite.scala +++ b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/parser/DataTypeParserSuite.scala @@ -59,6 +59,7 @@ class DataTypeParserSuite extends SparkFunSuite with SQLHelper { checkDataType("DATE", DateType) checkDataType("timestamp", TimestampType)
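After this change the parser resolves the explicit keywords `timestamp_ntz` and `timestamp_ltz` to fixed types, while bare `timestamp` still follows the session default. A Python model of that dispatch (mirroring the match cases visible in the diff; function and parameter names are illustrative):

```python
def resolve_primitive_type(keyword, session_default="TIMESTAMP_LTZ"):
    # Bare `timestamp` tracks spark.sql.timestampType; the explicit
    # keywords are independent of that configuration.
    if keyword == "timestamp":
        return session_default
    fixed = {"timestamp_ntz": "TIMESTAMP_NTZ", "timestamp_ltz": "TIMESTAMP_LTZ"}
    return fixed[keyword]

assert resolve_primitive_type("timestamp_ltz", session_default="TIMESTAMP_NTZ") == "TIMESTAMP_LTZ"
assert resolve_primitive_type("timestamp", session_default="TIMESTAMP_NTZ") == "TIMESTAMP_NTZ"
assert resolve_primitive_type("timestamp_ntz") == "TIMESTAMP_NTZ"
```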
[spark] branch master updated (9544277 -> b0b9643)
This is an automated email from the ASF dual-hosted git repository. gengliang pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from 9544277 [SPARK-35788][SS] Metrics support for RocksDB instance add b0b9643 [SPARK-35978][SQL] Support non-reserved keyword TIMESTAMP_LTZ No new revisions were added by this update. Summary of changes: .../apache/spark/sql/catalyst/parser/AstBuilder.scala | 16 ++-- .../sql/catalyst/parser/DataTypeParserSuite.scala | 1 + .../sql/catalyst/parser/ExpressionParserSuite.scala | 19 +++ 3 files changed, 26 insertions(+), 10 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated (2febd5c -> 733e85f1)
This is an automated email from the ASF dual-hosted git repository. gengliang pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from 2febd5c [SPARK-35735][SQL] Take into account day-time interval fields in cast add 733e85f1 [SPARK-35953][SQL] Support extracting date fields from timestamp without time zone No new revisions were added by this update. Summary of changes: .../sql/catalyst/analysis/AnsiTypeCoercion.scala | 2 +- .../spark/sql/catalyst/analysis/TypeCoercion.scala | 4 +- .../test/resources/sql-tests/inputs/extract.sql| 92 +++ .../resources/sql-tests/results/extract.sql.out| 276 ++--- .../scala/org/apache/spark/sql/SQLQuerySuite.scala | 8 +- 5 files changed, 192 insertions(+), 190 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated (9b387a1 -> 7fd3f8f)
This is an automated email from the ASF dual-hosted git repository. gengliang pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from 9b387a1 [SPARK-35308][TESTS] Fix bug in SPARK-35266 that creates benchmark files in invalid path with wrong name add 7fd3f8f [SPARK-35294][SQL] Add tree traversal pruning in rules with dedicated files under optimizer No new revisions were added by this update. Summary of changes: .../spark/sql/catalyst/expressions/complexTypeCreator.scala | 3 +++ .../sql/catalyst/expressions/complexTypeExtractors.scala | 5 - .../spark/sql/catalyst/expressions/jsonExpressions.scala | 3 +++ .../spark/sql/catalyst/expressions/namedExpressions.scala| 3 ++- .../spark/sql/catalyst/optimizer/DecorrelateInnerQuery.scala | 3 ++- .../sql/catalyst/optimizer/LimitPushDownThroughWindow.scala | 4 +++- .../spark/sql/catalyst/optimizer/OptimizeCsvJsonExprs.scala | 12 +--- .../sql/catalyst/optimizer/PropagateEmptyRelation.scala | 4 +++- .../sql/catalyst/optimizer/PullOutGroupingExpressions.scala | 3 ++- .../sql/catalyst/optimizer/PushDownLeftSemiAntiJoin.scala| 3 ++- .../sql/catalyst/optimizer/ReplaceExceptWithFilter.scala | 3 ++- .../sql/catalyst/optimizer/RewriteDistinctAggregates.scala | 4 +++- .../catalyst/optimizer/SimplifyConditionalsInPredicate.scala | 4 +++- .../catalyst/optimizer/UnwrapCastInBinaryComparison.scala| 6 -- .../spark/sql/catalyst/plans/logical/LocalRelation.scala | 3 +++ .../sql/catalyst/plans/logical/basicLogicalOperators.scala | 10 ++ .../apache/spark/sql/catalyst/rules/RuleIdCollection.scala | 8 +++- .../org/apache/spark/sql/catalyst/trees/TreePatterns.scala | 9 + 18 files changed, 74 insertions(+), 16 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
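SPARK-35294 continues the tree-traversal-pruning effort: plan and expression nodes advertise the tree patterns their subtree contains, and a rule is skipped outright when the plan cannot contain the pattern the rule rewrites. A Python sketch of the idea (a deliberate simplification; the names are illustrative, not Spark's API):

```python
# Each node caches the set of patterns present in its subtree; a rule
# declares the single pattern it targets and is skipped when absent.
def apply_with_pruning(plan, rule):
    if rule["pattern"] not in plan["patterns"]:
        return plan  # pruned: the rule never traverses the tree
    return rule["transform"](plan)

plan = {"patterns": {"JSON_TO_STRUCT", "ALIAS"}, "name": "q"}
json_rule = {"pattern": "JSON_TO_STRUCT",
             "transform": lambda p: {**p, "name": "q_optimized"}}
window_rule = {"pattern": "WINDOW",
               "transform": lambda p: {**p, "name": "unreachable"}}

assert apply_with_pruning(plan, json_rule)["name"] == "q_optimized"
assert apply_with_pruning(plan, window_rule)["name"] == "q"  # rule skipped
```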
[spark] branch master updated (712a62c -> 2298ceb)
This is an automated email from the ASF dual-hosted git repository. gengliang pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from 712a62c [SPARK-34832][SQL][TEST] Set EXECUTOR_ALLOW_SPARK_CONTEXT to true to ensure ExternalAppendOnlyUnsafeRowArrayBenchmark run successfully add 2298ceb [SPARK-34477][CORE] Register KryoSerializers for Avro GenericData classes No new revisions were added by this update. Summary of changes: .../spark/serializer/GenericAvroSerializer.scala | 29 .../apache/spark/serializer/KryoSerializer.scala | 16 - .../serializer/GenericAvroSerializerSuite.scala| 78 +++--- 3 files changed, 81 insertions(+), 42 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated: [SPARK-34856][SQL] ANSI mode: Allow casting complex types as string type
This is an automated email from the ASF dual-hosted git repository.

gengliang pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new 0515f49  [SPARK-34856][SQL] ANSI mode: Allow casting complex types as string type
0515f49 is described below

commit 0515f490189466c5f13aa4f647e81aeb6c24d0bf
Author: Gengliang Wang
AuthorDate: Fri Mar 26 00:17:43 2021 +0800

    [SPARK-34856][SQL] ANSI mode: Allow casting complex types as string type

    ### What changes were proposed in this pull request?

    Allow casting complex types as string type in ANSI mode.

    ### Why are the changes needed?

    Currently, complex types are not allowed to cast as string type. This breaks the DataFrame.show() API. E.g.

    ```
    scala> sql("select array(1, 2, 2)").show(false)
    org.apache.spark.sql.AnalysisException: cannot resolve 'CAST(`array(1, 2, 2)` AS STRING)' due to data type mismatch: cannot cast array to string with ANSI mode on.
    ```

    We should allow the conversion as an extension of the ANSI SQL standard, so that DataFrame.show() still works in ANSI mode.

    ### Does this PR introduce _any_ user-facing change?

    Yes, casting complex types as string type is now allowed in ANSI mode.

    ### How was this patch tested?

    Unit tests.

    Closes #31954 from gengliangwang/fixExplicitCast.
Authored-by: Gengliang Wang
Signed-off-by: Gengliang Wang
---
 docs/sql-ref-ansi-compliance.md                    |   9 +-
 .../spark/sql/catalyst/expressions/Cast.scala      |   9 +-
 .../spark/sql/catalyst/expressions/CastSuite.scala | 228 ++---
 3 files changed, 119 insertions(+), 127 deletions(-)

diff --git a/docs/sql-ref-ansi-compliance.md b/docs/sql-ref-ansi-compliance.md
index 557f27b..f4fd712 100644
--- a/docs/sql-ref-ansi-compliance.md
+++ b/docs/sql-ref-ansi-compliance.md
@@ -76,6 +76,9 @@ The type conversion of Spark ANSI mode follows the syntax rules of section 6.13
 straightforward type conversions which are disallowed as per the ANSI standard:
 * NumericType <=> BooleanType
 * StringType <=> BinaryType
+* ArrayType => String
+* MapType => String
+* StructType => String
 
 The valid combinations of target data type and source data type in a `CAST` expression are given by the following table. "Y" indicates that the combination is syntactically valid without restriction and "N" indicates that the combination is not valid.
@@ -89,9 +92,9 @@ The type conversion of Spark ANSI mode follows the syntax rules of section 6.13
 | Interval | N | Y | N | N | Y | N | N | N | N | N |
 | Boolean  | Y | Y | N | N | N | Y | N | N | N | N |
 | Binary   | N | Y | N | N | N | N | Y | N | N | N |
-| Array    | N | N | N | N | N | N | N | **Y** | N | N |
-| Map      | N | N | N | N | N | N | N | N | **Y** | N |
-| Struct   | N | N | N | N | N | N | N | N | N | **Y** |
+| Array    | N | Y | N | N | N | N | N | **Y** | N | N |
+| Map      | N | Y | N | N | N | N | N | N | **Y** | N |
+| Struct   | N | Y | N | N | N | N | N | N | N | **Y** |
 
 In the table above, all the `CAST`s that can cause runtime exceptions are marked as red **Y**:
 * CAST(Numeric AS Numeric): raise an overflow exception if the value is out of the target data type's range.
diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala
index 9135e6c..7599947 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala
@@ -1873,6 +1873,8 @@ object AnsiCast {
     case (NullType, _) => true
 
+    case (_, StringType) => true
+
     case (StringType, _: BinaryType) => true
 
     case (StringType, BooleanType) => true
@@ -1890,13 +1892,6 @@ object AnsiCast {
     case (StringType, _: NumericType) => true
     case (BooleanType, _: NumericType) => true
 
-    case (_: NumericType, StringType) => true
-    case (_: DateType, StringType) => true
-    case (_: TimestampType, StringType) => true
-    case (_: CalendarIntervalType, StringType) => true
-    case (BooleanType, StringType) => true
-    case (BinaryType, StringType) => true
-
     case (ArrayType(fromType, fn), ArrayType(toType, tn)) =>
       canCast(fromType
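The commit above makes every source type castable to string under ANSI mode. The sketch below illustrates the kind of recursive rendering such a cast performs for arrays, maps, and structs. It is a simplified, hypothetical renderer — the bracket and separator choices are assumptions for illustration, not a quote of Spark's actual `Cast` formatting code.

```scala
// Illustrative sketch only: render nested complex values as SQL-style
// strings, the conversion direction this commit newly allows in ANSI mode.
object ComplexToString {
  def render(v: Any): String = v match {
    case null         => "null"
    // Arrays render element-by-element inside square brackets.
    case xs: Seq[_]   => xs.map(render).mkString("[", ", ", "]")
    // Maps render each entry as "key -> value" inside curly braces.
    case m: Map[_, _] => m.map { case (k, x) => s"${render(k)} -> ${render(x)}" }
                           .mkString("{", ", ", "}")
    // Structs (modeled here as tuples/case classes) render field-by-field.
    case p: Product   => p.productIterator.map(render).mkString("{", ", ", "}")
    case other        => other.toString
  }
}
```

Because `render` recurses, nested structures such as an array of arrays come out fully expanded, which is what makes `DataFrame.show()` usable again for complex columns.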
[spark] branch master updated (1c3bdab -> 48ef9bd)
This is an automated email from the ASF dual-hosted git repository.

gengliang pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

    from 1c3bdab  [SPARK-34911][SQL] Fix code not close issue in monitoring.md
     add 48ef9bd  [SPARK-34915][INFRA] Cache Maven, SBT and Scala in all jobs that use them

No new revisions were added by this update.

Summary of changes:
 .github/workflows/build_and_test.yml | 44 
 1 file changed, 44 insertions(+)
[spark] branch master updated: [SPARK-34881][SQL] New SQL Function: TRY_CAST
This is an automated email from the ASF dual-hosted git repository.

gengliang pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new 3951e33  [SPARK-34881][SQL] New SQL Function: TRY_CAST
3951e33 is described below

commit 3951e3371a83578a81474ed99fb50d59f27aac62
Author: Gengliang Wang
AuthorDate: Wed Mar 31 20:47:04 2021 +0800

    [SPARK-34881][SQL] New SQL Function: TRY_CAST

    ### What changes were proposed in this pull request?

    Add a new SQL function `try_cast`. `try_cast` is identical to `AnsiCast` (or `Cast` when `spark.sql.ansi.enabled` is true), except it returns NULL instead of raising an error.

    This expression has one major difference from `cast` with `spark.sql.ansi.enabled` as true: when the source value can't be stored in the target integral (Byte/Short/Int/Long) type, `try_cast` returns null instead of returning the low order bytes of the source value. Note that the result of `try_cast` is not affected by the configuration `spark.sql.ansi.enabled`.

    This is learned from Google BigQuery and Snowflake:
    https://docs.snowflake.com/en/sql-reference/functions/try_cast.html
    https://cloud.google.com/bigquery/docs/reference/standard-sql/functions-and-operators#safe_casting

    ### Why are the changes needed?

    This is useful in the following scenarios:
    1. When ANSI mode is on, users can choose `try_cast` as an alternative way to run SQL without errors for certain operations.
    2. When ANSI mode is off, users can use `try_cast` to get a more reasonable result for casting a value to an integral type: when an overflow error happens, `try_cast` returns null while `cast` returns the low order bytes of the source value.

    ### Does this PR introduce _any_ user-facing change?

    Yes, adding a new function `try_cast`.

    ### How was this patch tested?

    Unit tests.

    Closes #31982 from gengliangwang/tryCast.
Authored-by: Gengliang Wang
Signed-off-by: Gengliang Wang
---
 docs/sql-ref-ansi-compliance.md                    |   1 +
 .../apache/spark/sql/catalyst/parser/SqlBase.g4    |   5 +-
 .../spark/sql/catalyst/expressions/Cast.scala      |  27 +--
 .../spark/sql/catalyst/expressions/TryCast.scala   |  85 
 .../spark/sql/catalyst/parser/AstBuilder.scala     |   8 +-
 .../spark/sql/catalyst/expressions/CastSuite.scala |  52 +++--
 .../sql/catalyst/expressions/TryCastSuite.scala    |  51 +
 .../test/resources/sql-tests/inputs/try_cast.sql   |  54 +
 .../resources/sql-tests/results/try_cast.sql.out   | 234 +
 9 files changed, 486 insertions(+), 31 deletions(-)

diff --git a/docs/sql-ref-ansi-compliance.md b/docs/sql-ref-ansi-compliance.md
index f4fd712..70a1fa3 100644
--- a/docs/sql-ref-ansi-compliance.md
+++ b/docs/sql-ref-ansi-compliance.md
@@ -434,6 +434,7 @@ Below is a list of all the keywords in Spark SQL.
 |TRIM|non-reserved|non-reserved|non-reserved|
 |TRUE|non-reserved|non-reserved|reserved|
 |TRUNCATE|non-reserved|non-reserved|reserved|
+|TRY_CAST|non-reserved|non-reserved|non-reserved|
 |TYPE|non-reserved|non-reserved|non-reserved|
 |UNARCHIVE|non-reserved|non-reserved|non-reserved|
 |UNBOUNDED|non-reserved|non-reserved|non-reserved|
diff --git a/sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4 b/sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4
index e694eda..55ba375 100644
--- a/sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4
+++ b/sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4
@@ -805,7 +805,7 @@ primaryExpression
     : name=(CURRENT_DATE | CURRENT_TIMESTAMP)  #currentDatetime
     | CASE whenClause+ (ELSE elseExpression=expression)? END  #searchedCase
     | CASE value=expression whenClause+ (ELSE elseExpression=expression)? END  #simpleCase
-    | CAST '(' expression AS dataType ')'  #cast
+    | name=(CAST | TRY_CAST) '(' expression AS dataType ')'  #cast
     | STRUCT '(' (argument+=namedExpression (',' argument+=namedExpression)*)? ')'  #struct
     | FIRST '(' expression (IGNORE NULLS)? ')'  #first
     | LAST '(' expression (IGNORE NULLS)? ')'  #last
@@ -1199,6 +1199,7 @@ ansiNonReserved
     | TRIM
     | TRUE
     | TRUNCATE
+    | TRY_CAST
     | TYPE
     | UNARCHIVE
     | UNBOUNDED
@@ -1461,6 +1462,7 @@ nonReserved
     | TRIM
     | TRUE
     | TRUNCATE
+    | TRY_CAST
     | TYPE
     | UNARCHIVE
     | UNBOUNDED
@@ -1720,6 +1722,7 @@ TRANSFORM: 'TRANSFORM';
 TRIM: 'TRIM';
 TRUE: 'TRUE';
 TRUNCATE
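The `try_cast` contract described in the commit message — return NULL on parse failure, and return NULL on integral overflow rather than the low-order bytes — can be sketched in plain Scala. This is an illustrative model of the semantics, not Spark's implementation, and the helper names here are made up for the example.

```scala
import scala.util.Try

// Sketch of try_cast semantics: Option[_] stands in for SQL NULL.
object TryCastSketch {
  // Unparseable input yields None instead of throwing.
  def tryCastToInt(s: String): Option[Int] =
    Try(s.trim.toInt).toOption

  // Out-of-range values yield None instead of wrapping around
  // to the low-order byte (the non-ANSI `cast` behavior).
  def tryCastToByte(n: Long): Option[Byte] =
    if (n >= Byte.MinValue && n <= Byte.MaxValue) Some(n.toByte) else None
}
```

The second helper is the "major difference" the commit calls out: a plain narrowing cast keeps the low-order bytes (in Scala, `300.toByte` is `44`), whereas `try_cast` reports the failure as NULL.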
[spark] branch master updated (3c7d6c3 -> f208d80)
This is an automated email from the ASF dual-hosted git repository.

gengliang pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

    from 3c7d6c3  [SPARK-27658][SQL] Add FunctionCatalog API
     add f208d80  [SPARK-34970][SQL][SERCURITY] Redact map-type options in the output of explain()

No new revisions were added by this update.

Summary of changes:
 .../apache/spark/sql/catalyst/trees/TreeNode.scala | 17 ++-
 .../resources/sql-tests/results/describe.sql.out   |  2 +-
 .../scala/org/apache/spark/sql/ExplainSuite.scala  | 53 ++
 3 files changed, 69 insertions(+), 3 deletions(-)
[spark] branch master updated: [SPARK-34881][SQL][FOLLOW-UP] Use multiline string for TryCast's expression description
This is an automated email from the ASF dual-hosted git repository.

gengliang pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new 8a2138d  [SPARK-34881][SQL][FOLLOW-UP] Use multiline string for TryCast's expression description
8a2138d is described below

commit 8a2138d09f489512e229c6a9e9860d7bf9ac6445
Author: Hyukjin Kwon
AuthorDate: Thu Apr 1 14:50:05 2021 +0800

    [SPARK-34881][SQL][FOLLOW-UP] Use multiline string for TryCast's expression description

    ### What changes were proposed in this pull request?

    This PR fixes a JDK 11 compilation failure:

    ```
    /home/runner/work/spark/spark/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/TryCast.scala:35: error: annotation argument needs to be a constant; found: "_FUNC_(expr AS type) - Casts the value `expr` to the target data type `type`. ".+("This expression is identical to CAST with configuration `spark.sql.ansi.enabled` as ").+("true, except it returns NULL instead of raising an error. Note that the behavior of this ").+("expression doesn\'t depend on configuration [...]
    "true, except it returns NULL instead of raising an error. Note that the behavior of this " +
    ```

    For whatever reason, the compiler doesn't treat the concatenated string as a constant. This PR simply switches it to multi-line style (which is actually more correct).

    Reference: https://github.com/apache/spark/blob/bd0990e3e813d17065c593fc74f383b494fe8146/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/ApproximatePercentile.scala#L53-L57

    ### Why are the changes needed?

    To recover the build.

    ### Does this PR introduce _any_ user-facing change?

    No, dev-only.

    ### How was this patch tested?

    CI in this PR

    Closes #32019 from HyukjinKwon/SPARK-34881.
Lead-authored-by: Hyukjin Kwon
Co-authored-by: HyukjinKwon
Signed-off-by: Gengliang Wang
---
 .../org/apache/spark/sql/catalyst/expressions/TryCast.scala | 10 ++
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/TryCast.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/TryCast.scala
index aba76db..cae25a2 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/TryCast.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/TryCast.scala
@@ -30,10 +30,12 @@ import org.apache.spark.sql.types.DataType
  * session local timezone by an analyzer [[ResolveTimeZone]].
  */
 @ExpressionDescription(
-  usage = "_FUNC_(expr AS type) - Casts the value `expr` to the target data type `type`. " +
-    "This expression is identical to CAST with configuration `spark.sql.ansi.enabled` as " +
-    "true, except it returns NULL instead of raising an error. Note that the behavior of this " +
-    "expression doesn't depend on configuration `spark.sql.ansi.enabled`.",
+  usage = """
+    _FUNC_(expr AS type) - Casts the value `expr` to the target data type `type`.
+      This expression is identical to CAST with configuration `spark.sql.ansi.enabled` as
+      true, except it returns NULL instead of raising an error. Note that the behavior of this
+      expression doesn't depend on configuration `spark.sql.ansi.enabled`.
+  """,
   examples = """
     Examples:
       > SELECT _FUNC_('10' as int);
[spark] branch master updated (53e4dba -> 2b1c170)
This is an automated email from the ASF dual-hosted git repository.

gengliang pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

    from 53e4dba  [SPARK-34599][SQL] Fix the issue that INSERT INTO OVERWRITE doesn't support partition columns containing dot for DSv2
     add 2b1c170  [SPARK-34614][SQL] ANSI mode: Casting String to Boolean should throw exception on parse error

No new revisions were added by this update.

Summary of changes:
 docs/sql-ref-ansi-compliance.md                    |   1 +
 .../spark/sql/catalyst/expressions/Cast.scala      |  14 +-
 .../spark/sql/catalyst/expressions/CastSuite.scala | 244 +
 .../sql-tests/results/postgreSQL/boolean.sql.out   |  85 +++
 4 files changed, 264 insertions(+), 80 deletions(-)
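SPARK-34614's behavior change — string-to-boolean casts that fail to parse now raise an error under ANSI mode instead of producing NULL — can be sketched as follows. The accepted token set below is an assumption modeled on PostgreSQL-style boolean literals (the change updates `postgreSQL/boolean.sql.out`), not a quote of Spark's source.

```scala
// Illustrative sketch of ANSI-mode strict string-to-boolean casting.
object StrictBooleanCast {
  // Assumed PostgreSQL-style literal sets; hypothetical for this sketch.
  private val trueTokens  = Set("t", "true", "y", "yes", "1")
  private val falseTokens = Set("f", "false", "n", "no", "0")

  def castToBoolean(s: String): Boolean = {
    val v = s.trim.toLowerCase
    if (trueTokens(v)) true
    else if (falseTokens(v)) false
    // ANSI mode: fail loudly on unparseable input instead of returning NULL.
    else throw new IllegalArgumentException(
      s"invalid input syntax for type boolean: '$s'")
  }
}
```

With ANSI mode off, the same unparseable input would instead map to NULL; the exception here is what the commit makes the ANSI-mode behavior.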