[spark] branch 3.0.1 created (now 3fdfce3)
This is an automated email from the ASF dual-hosted git repository.

yamamuro pushed a change to branch 3.0.1
in repository https://gitbox.apache.org/repos/asf/spark.git.

 at 3fdfce3  Preparing Spark release v3.0.0-rc3

No new revisions were added by this update.

- To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated (37ef7bb -> f80be41)
yamamuro pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

 from 37ef7bb  [SPARK-35840][SQL] Add `apply()` for a single field to `YearMonthIntervalType` and `DayTimeIntervalType`
  add f80be41  [SPARK-34565][SQL] Collapse Window nodes with Project between them

No new revisions were added by this update.

Summary of changes:
 .../spark/sql/catalyst/optimizer/Optimizer.scala  | 25 ---
 .../catalyst/optimizer/CollapseWindowSuite.scala  | 50 +-
 2 files changed, 68 insertions(+), 7 deletions(-)
[spark] branch master updated (5c96d64 -> b08cf6e)
yamamuro pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

 from 5c96d64  [SPARK-35707][ML] optimize sparse GEMM by skipping bound checking
  add b08cf6e  [SPARK-35203][SQL] Improve Repartition statistics estimation

No new revisions were added by this update.

Summary of changes:
 .../logical/statsEstimation/BasicStatsPlanVisitor.scala | 4 ++--
 .../SizeInBytesOnlyStatsPlanVisitor.scala               | 4 ++--
 .../statsEstimation/BasicStatsEstimationSuite.scala     | 17 -
 3 files changed, 16 insertions(+), 9 deletions(-)
[spark] branch master updated (ac228d4 -> 11e96dc)
yamamuro pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

 from ac228d4  [SPARK-35691][CORE] addFile/addJar/addDirectory should put CanonicalFile
  add 11e96dc  [SPARK-35669][SQL] Quote the pushed column name only when nested column predicate pushdown is enabled

No new revisions were added by this update.

Summary of changes:
 .../org/apache/spark/sql/sources/filters.scala     | 5 ++--
 .../execution/datasources/DataSourceStrategy.scala | 31 +-
 .../spark/sql/FileBasedDataSourceSuite.scala       | 10 +++
 3 files changed, 31 insertions(+), 15 deletions(-)
[spark] branch master updated (864ff67 -> 9709ee5)
yamamuro pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

 from 864ff67  [SPARK-35429][CORE] Remove commons-httpclient from Hadoop-3.2 profile due to EOL and CVEs
  add 9709ee5  [SPARK-35760][SQL] Fix the max rows check for broadcast exchange

No new revisions were added by this update.

Summary of changes:
 .../execution/exchange/BroadcastExchangeExec.scala | 25 +++---
 1 file changed, 17 insertions(+), 8 deletions(-)
[spark] branch master updated (e9af457 -> c463472)
yamamuro pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

 from e9af457  [SPARK-35718][SQL] Support casting of Date to timestamp without time zone type
  add c463472  [SPARK-35439][SQL][FOLLOWUP] ExpressionContainmentOrdering should not sort unrelated expressions

No new revisions were added by this update.

Summary of changes:
 .../expressions/EquivalentExpressions.scala | 45 --
 .../SubexpressionEliminationSuite.scala     | 21 ++
 2 files changed, 45 insertions(+), 21 deletions(-)
[spark] branch master updated (cf07036 -> 912d60b)
yamamuro pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

 from cf07036  [SPARK-35593][K8S][CORE] Support shuffle data recovery on the reused PVCs
  add 912d60b  [SPARK-35709][DOCS] Remove the reference to third party Nomad integration project

No new revisions were added by this update.

Summary of changes:
 docs/cluster-overview.md | 3 ---
 1 file changed, 3 deletions(-)
[spark] branch master updated (a59063d -> 08e6f63)
yamamuro pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

 from a59063d  [SPARK-35581][SQL] Support special datetime values in typed literals only
  add 08e6f63  [SPARK-35577][TESTS] Allow to log container output for docker integration tests

No new revisions were added by this update.

Summary of changes:
 .../apache/spark/sql/jdbc/DockerJDBCIntegrationSuite.scala | 14 +-
 1 file changed, 13 insertions(+), 1 deletion(-)
[spark] branch master updated (fdd7ca5 -> 548e37b)
yamamuro pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

 from fdd7ca5  [SPARK-35498][PYTHON] Add thread target wrapper API for pyspark pin thread mode
  add 548e37b  [SPARK-33122][SQL][FOLLOWUP] Extend RemoveRedundantAggregates optimizer rule to apply to more cases

No new revisions were added by this update.

Summary of changes:
 .../spark/sql/catalyst/optimizer/Optimizer.scala   | 43 +
 .../optimizer/RemoveRedundantAggregates.scala      | 70 ++
 .../optimizer/RemoveRedundantAggregatesSuite.scala | 16 -
 3 files changed, 86 insertions(+), 43 deletions(-)
 create mode 100644 sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/RemoveRedundantAggregates.scala
[spark] branch master updated (bdd8e1d -> e170e63)
yamamuro pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

 from bdd8e1d  [SPARK-28551][SQL] CTAS with LOCATION should not allow to a non-empty directory
  add e170e63  [SPARK-35457][BUILD] Bump ANTLR runtime version to 4.8

No new revisions were added by this update.

Summary of changes:
 dev/deps/spark-deps-hadoop-2.7-hive-2.3 | 2 +-
 dev/deps/spark-deps-hadoop-3.2-hive-2.3 | 2 +-
 pom.xml                                 | 2 +-
 3 files changed, 3 insertions(+), 3 deletions(-)
[spark] branch master updated (d1b24d8 -> 586caae)
yamamuro pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

 from d1b24d8  [SPARK-35338][PYTHON] Separate arithmetic operations into data type based structures
  add 586caae  [SPARK-35438][SQL][DOCS] Minor documentation fix for window physical operator

No new revisions were added by this update.

Summary of changes:
 .../main/scala/org/apache/spark/sql/execution/window/WindowExec.scala | 2 +-
 .../scala/org/apache/spark/sql/execution/window/WindowExecBase.scala  | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)
[spark] branch master updated (9283beb -> 1214213)
yamamuro pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

 from 9283beb  [SPARK-35418][SQL] Add sentences function to functions.{scala,py}
  add 1214213  [SPARK-35362][SQL] Update null count in the column stats for UNION operator stats estimation

No new revisions were added by this update.

Summary of changes:
 .../logical/statsEstimation/FilterEstimation.scala | 2 +-
 .../logical/statsEstimation/UnionEstimation.scala  | 97 ++
 .../BasicStatsEstimationSuite.scala                | 2 +-
 .../statsEstimation/UnionEstimationSuite.scala     | 65 +-
 4 files changed, 122 insertions(+), 44 deletions(-)
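The idea behind the UNION null-count update can be sketched outside Spark: a UNION concatenates the rows of its children, so if every child reports a null count for the corresponding column, the combined column's null count is their sum. This is a minimal illustration with invented names, not Spark's actual `UnionEstimation` code.

```python
def union_null_count(child_null_counts):
    """Combine per-child null counts for one output column of a UNION.

    Returns None (unknown) if any child lacks the statistic; otherwise the
    counts simply add up, since the union concatenates the child rows.
    """
    if any(c is None for c in child_null_counts):
        return None
    return sum(child_null_counts)

# Three children reporting 3, 0, and 7 nulls for a column -> 10 in the union.
print(union_null_count([3, 0, 7]))
# One child has no null-count statistic, so the union's is unknown.
print(union_null_count([3, None, 7]))
```

The "unknown if any child is unknown" rule is the conservative choice: summing only the known counts would understate nulls and could mislead downstream filter estimation.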
[spark] branch master updated (a72d05c -> 46f7d78)
yamamuro pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

 from a72d05c  [SPARK-35106][CORE][SQL] Avoid failing rename caused by destination directory not exist
  add 46f7d78  [SPARK-35368][SQL] Update histogram statistics for RANGE operator for stats estimation

No new revisions were added by this update.

Summary of changes:
 .../plans/logical/basicLogicalOperators.scala | 43 +++-
 .../BasicStatsEstimationSuite.scala           | 81 ++
 2 files changed, 108 insertions(+), 16 deletions(-)
[spark] branch master updated (186477c -> b1493d8)
yamamuro pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

 from 186477c  [SPARK-35263][TEST] Refactor ShuffleBlockFetcherIteratorSuite to reduce duplicated code
  add b1493d8  [SPARK-35398][SQL] Simplify the way to get classes from ClassBodyEvaluator in `CodeGenerator.updateAndGetCompilationStats` method

No new revisions were added by this update.

Summary of changes:
 .../sql/catalyst/expressions/codegen/CodeGenerator.scala | 14 ++
 1 file changed, 2 insertions(+), 12 deletions(-)
[spark] branch master updated (7b942d5 -> cce0048)
yamamuro pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

 from 7b942d5  [SPARK-35425][BUILD] Pin jinja2 in `spark-rm/Dockerfile` and add as a required dependency in the release README.md
  add cce0048  [SPARK-35351][SQL] Add code-gen for left anti sort merge join

No new revisions were added by this update.

Summary of changes:
 .../sql/execution/joins/SortMergeJoinExec.scala    | 97 ++
 .../approved-plans-v1_4/q16.sf100/explain.txt      | 4 +-
 .../approved-plans-v1_4/q16.sf100/simplified.txt   | 5 +-
 .../approved-plans-v1_4/q16/explain.txt            | 4 +-
 .../approved-plans-v1_4/q16/simplified.txt         | 5 +-
 .../approved-plans-v1_4/q69.sf100/explain.txt      | 36 +++
 .../approved-plans-v1_4/q69.sf100/simplified.txt   | 110 +++--
 .../approved-plans-v1_4/q87.sf100/explain.txt      | 8 +-
 .../approved-plans-v1_4/q87.sf100/simplified.txt   | 10 +-
 .../approved-plans-v1_4/q94.sf100/explain.txt      | 4 +-
 .../approved-plans-v1_4/q94.sf100/simplified.txt   | 5 +-
 .../approved-plans-v1_4/q94/explain.txt            | 4 +-
 .../approved-plans-v1_4/q94/simplified.txt         | 5 +-
 .../sql/execution/WholeStageCodegenSuite.scala     | 22 +
 .../sql/execution/metric/SQLMetricsSuite.scala     | 4 +-
 15 files changed, 208 insertions(+), 115 deletions(-)
[spark] branch branch-3.0 updated: [SPARK-35413][INFRA] Use the SHA of the latest commit when checking out databricks/tpcds-kit
yamamuro pushed a commit to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/branch-3.0 by this push:
 new 8ebc1d3  [SPARK-35413][INFRA] Use the SHA of the latest commit when checking out databricks/tpcds-kit

8ebc1d3 is described below

commit 8ebc1d317f978e524d55449ecc88daa806dde009
Author: Takeshi Yamamuro
AuthorDate: Mon May 17 09:26:04 2021 +0900

    [SPARK-35413][INFRA] Use the SHA of the latest commit when checking out databricks/tpcds-kit

    ### What changes were proposed in this pull request?

    This PR proposes to use the SHA of the latest commit ([2a5078a782192ddb6efbcead8de9973d6ab4f069](https://github.com/databricks/tpcds-kit/commit/2a5078a782192ddb6efbcead8de9973d6ab4f069)) when checking out `databricks/tpcds-kit`. This can prevent the test workflow from breaking accidentally if the repository changes drastically.

    ### Why are the changes needed?

    For better test workflow.

    ### Does this PR introduce _any_ user-facing change?

    No, dev-only.

    ### How was this patch tested?

    GA passed.

    Closes #32561 from maropu/UseRefInCheckout.

    Authored-by: Takeshi Yamamuro
    Signed-off-by: Takeshi Yamamuro
    (cherry picked from commit 2390b9dbcbc0b0377d694d2c3c2c0fa78179cbd6)
    Signed-off-by: Takeshi Yamamuro
---
 .github/workflows/build_and_test.yml | 1 +
 1 file changed, 1 insertion(+)

diff --git a/.github/workflows/build_and_test.yml b/.github/workflows/build_and_test.yml
index 936a256..77a2c79 100644
--- a/.github/workflows/build_and_test.yml
+++ b/.github/workflows/build_and_test.yml
@@ -428,6 +428,7 @@ jobs:
         uses: actions/checkout@v2
         with:
           repository: databricks/tpcds-kit
+          ref: 2a5078a782192ddb6efbcead8de9973d6ab4f069
           path: ./tpcds-kit
       - name: Build tpcds-kit
         if: steps.cache-tpcds-sf-1.outputs.cache-hit != 'true'
[spark] branch branch-3.1 updated: [SPARK-35413][INFRA] Use the SHA of the latest commit when checking out databricks/tpcds-kit
yamamuro pushed a commit to branch branch-3.1
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/branch-3.1 by this push:
 new f9a396c  [SPARK-35413][INFRA] Use the SHA of the latest commit when checking out databricks/tpcds-kit

f9a396c is described below

commit f9a396c37cb340671666379cb8d8a85435c7ad87
Author: Takeshi Yamamuro
AuthorDate: Mon May 17 09:26:04 2021 +0900

    [SPARK-35413][INFRA] Use the SHA of the latest commit when checking out databricks/tpcds-kit

    ### What changes were proposed in this pull request?

    This PR proposes to use the SHA of the latest commit ([2a5078a782192ddb6efbcead8de9973d6ab4f069](https://github.com/databricks/tpcds-kit/commit/2a5078a782192ddb6efbcead8de9973d6ab4f069)) when checking out `databricks/tpcds-kit`. This can prevent the test workflow from breaking accidentally if the repository changes drastically.

    ### Why are the changes needed?

    For better test workflow.

    ### Does this PR introduce _any_ user-facing change?

    No, dev-only.

    ### How was this patch tested?

    GA passed.

    Closes #32561 from maropu/UseRefInCheckout.

    Authored-by: Takeshi Yamamuro
    Signed-off-by: Takeshi Yamamuro
    (cherry picked from commit 2390b9dbcbc0b0377d694d2c3c2c0fa78179cbd6)
    Signed-off-by: Takeshi Yamamuro
---
 .github/workflows/build_and_test.yml | 1 +
 1 file changed, 1 insertion(+)

diff --git a/.github/workflows/build_and_test.yml b/.github/workflows/build_and_test.yml
index 173cc0e..c8b4c77 100644
--- a/.github/workflows/build_and_test.yml
+++ b/.github/workflows/build_and_test.yml
@@ -481,6 +481,7 @@ jobs:
         uses: actions/checkout@v2
         with:
           repository: databricks/tpcds-kit
+          ref: 2a5078a782192ddb6efbcead8de9973d6ab4f069
           path: ./tpcds-kit
       - name: Build tpcds-kit
         if: steps.cache-tpcds-sf-1.outputs.cache-hit != 'true'
[spark] branch master updated (2eef2f9 -> 2390b9d)
yamamuro pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

 from 2eef2f9  [SPARK-35412][SQL] Fix a bug in groupBy of year-month/day-time intervals
  add 2390b9d  [SPARK-35413][INFRA] Use the SHA of the latest commit when checking out databricks/tpcds-kit

No new revisions were added by this update.

Summary of changes:
 .github/workflows/build_and_test.yml | 1 +
 1 file changed, 1 insertion(+)
[spark] branch master updated (ae0579a -> 3241aeb)
yamamuro pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

 from ae0579a  [SPARK-35369][DOC] Document ExecutorAllocationManager metrics
  add 3241aeb  [SPARK-35385][SQL][TESTS] Skip duplicate queries in the TPCDS-related tests

No new revisions were added by this update.

Summary of changes:
 sql/core/src/test/scala/org/apache/spark/sql/TPCDSBase.scala  | 10 +-
 .../test/scala/org/apache/spark/sql/TPCDSQueryTestSuite.scala | 6 --
 2 files changed, 9 insertions(+), 7 deletions(-)
[spark] branch master updated (44bd0a8 -> c4ca232)
yamamuro pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

 from 44bd0a8  [SPARK-35088][SQL][FOLLOWUP] Improve the error message for Sequence expression
  add c4ca232  [SPARK-35363][SQL] Refactor sort merge join code-gen be agnostic to join type

No new revisions were added by this update.

Summary of changes:
 .../spark/sql/execution/joins/ShuffledJoin.scala | 2 +-
 .../sql/execution/joins/SortMergeJoinExec.scala  | 163 +++--
 2 files changed, 84 insertions(+), 81 deletions(-)
[spark] branch master updated (620f072 -> 38eb5a6)
yamamuro pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

 from 620f072  [SPARK-35231][SQL] logical.Range override maxRowsPerPartition
  add 38eb5a6  [SPARK-35354][SQL] Replace BaseJoinExec with ShuffledJoin in CoalesceBucketsInJoin

No new revisions were added by this update.

Summary of changes:
 .../sql/execution/bucketing/CoalesceBucketsInJoin.scala | 12 ++--
 1 file changed, 6 insertions(+), 6 deletions(-)
[spark] branch master updated (5b65d8a -> 620f072)
yamamuro pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

 from 5b65d8a  [SPARK-35347][SQL] Use MethodUtils for looking up methods in Invoke and StaticInvoke
  add 620f072  [SPARK-35231][SQL] logical.Range override maxRowsPerPartition

No new revisions were added by this update.

Summary of changes:
 .../sql/catalyst/plans/logical/basicLogicalOperators.scala | 12 
 .../apache/spark/sql/catalyst/plans/LogicalPlanSuite.scala | 11 ++-
 2 files changed, 22 insertions(+), 1 deletion(-)
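The `maxRowsPerPartition` change rests on a simple observation: a range operator with known `start`, `end`, `step`, and slice count knows its exact row count up front, so an upper bound on rows per partition follows from a ceiling division. The sketch below uses invented names and assumes a positive `step`; it is an illustration of the idea, not Spark's `logical.Range` implementation.

```python
import math

def range_max_rows_per_partition(start, end, step, num_slices):
    # Number of values produced by a half-open range [start, end) with step > 0.
    num_rows = max(0, math.ceil((end - start) / step))
    # No partition of an even split can hold more than ceil(num_rows / num_slices).
    return math.ceil(num_rows / num_slices)

# 100 rows split across 8 slices: no partition can exceed 13 rows.
print(range_max_rows_per_partition(0, 100, 1, 8))
```

Such a bound lets the optimizer prove, for example, that a per-partition limit is already satisfied without adding extra operators.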
[spark] branch master updated (b025780 -> 06c4009)
yamamuro pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

 from b025780  [SPARK-35331][SQL] Support resolving missing attrs for distribute/cluster by/repartition hint
  add 06c4009  [SPARK-35327][SQL][TESTS] Filters out the TPC-DS queries that can cause flaky test results

No new revisions were added by this update.

Summary of changes:
 .../resources/tpcds-query-results/v1_4/q6.sql.out  | 51 --
 .../resources/tpcds-query-results/v1_4/q75.sql.out | 105 -
 .../scala/org/apache/spark/sql/TPCDSBase.scala     | 2 +-
 .../org/apache/spark/sql/TPCDSQueryTestSuite.scala | 6 ++
 4 files changed, 7 insertions(+), 157 deletions(-)
 delete mode 100644 sql/core/src/test/resources/tpcds-query-results/v1_4/q6.sql.out
 delete mode 100644 sql/core/src/test/resources/tpcds-query-results/v1_4/q75.sql.out
[spark] branch master updated (2634dba -> 6f0ef93)
yamamuro pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

 from 2634dba  [SPARK-35175][BUILD] Add linter for JavaScript source files
  add 6f0ef93  [SPARK-35297][CORE][DOC][MINOR] Modify the comment about the executor

No new revisions were added by this update.

Summary of changes:
 core/src/main/scala/org/apache/spark/executor/Executor.scala | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
[spark] branch master updated (19661f6 -> 5c67d0c)
yamamuro pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

 from 19661f6  [SPARK-35325][SQL][TESTS] Add nested column ORC encryption test case
  add 5c67d0c  [SPARK-35293][SQL][TESTS] Use the newer dsdgen for TPCDSQueryTestSuite

No new revisions were added by this update.

Summary of changes:
 .github/workflows/build_and_test.yml               | 6 +-
 .../resources/tpcds-query-results/v1_4/q1.sql.out  | 184 +-
 .../resources/tpcds-query-results/v1_4/q10.sql.out | 11 +-
 .../resources/tpcds-query-results/v1_4/q11.sql.out | 6 +
 .../resources/tpcds-query-results/v1_4/q12.sql.out | 200 +-
 .../resources/tpcds-query-results/v1_4/q13.sql.out | 2 +-
 .../tpcds-query-results/v1_4/q14a.sql.out          | 200 +-
 .../tpcds-query-results/v1_4/q14b.sql.out          | 200 +-
 .../resources/tpcds-query-results/v1_4/q15.sql.out | 200 +-
 .../resources/tpcds-query-results/v1_4/q16.sql.out | 2 +-
 .../resources/tpcds-query-results/v1_4/q17.sql.out | 2 +-
 .../resources/tpcds-query-results/v1_4/q18.sql.out | 200 +-
 .../resources/tpcds-query-results/v1_4/q19.sql.out | 200 +-
 .../resources/tpcds-query-results/v1_4/q2.sql.out  | 5026 +--
 .../resources/tpcds-query-results/v1_4/q20.sql.out | 200 +-
 .../resources/tpcds-query-results/v1_4/q21.sql.out | 200 +-
 .../resources/tpcds-query-results/v1_4/q22.sql.out | 200 +-
 .../tpcds-query-results/v1_4/q23a.sql.out          | 2 +-
 .../tpcds-query-results/v1_4/q23b.sql.out          | 5 +-
 .../tpcds-query-results/v1_4/q24a.sql.out          | 8 +-
 .../tpcds-query-results/v1_4/q24b.sql.out          | 2 +-
 .../resources/tpcds-query-results/v1_4/q25.sql.out | 2 +-
 .../resources/tpcds-query-results/v1_4/q26.sql.out | 200 +-
 .../resources/tpcds-query-results/v1_4/q27.sql.out | 200 +-
 .../resources/tpcds-query-results/v1_4/q28.sql.out | 2 +-
 .../resources/tpcds-query-results/v1_4/q29.sql.out | 3 +-
 .../resources/tpcds-query-results/v1_4/q3.sql.out  | 172 +-
 .../resources/tpcds-query-results/v1_4/q30.sql.out | 200 +-
 .../resources/tpcds-query-results/v1_4/q31.sql.out | 112 +-
 .../resources/tpcds-query-results/v1_4/q32.sql.out | 2 -
 .../resources/tpcds-query-results/v1_4/q33.sql.out | 200 +-
 .../resources/tpcds-query-results/v1_4/q34.sql.out | 434 +-
 .../resources/tpcds-query-results/v1_4/q35.sql.out | 188 +-
 .../resources/tpcds-query-results/v1_4/q36.sql.out | 200 +-
 .../resources/tpcds-query-results/v1_4/q37.sql.out | 3 +-
 .../resources/tpcds-query-results/v1_4/q38.sql.out | 2 +-
 .../tpcds-query-results/v1_4/q39a.sql.out          | 449 +-
 .../tpcds-query-results/v1_4/q39b.sql.out          | 24 +-
 .../resources/tpcds-query-results/v1_4/q4.sql.out  | 10 +-
 .../resources/tpcds-query-results/v1_4/q40.sql.out | 200 +-
 .../resources/tpcds-query-results/v1_4/q41.sql.out | 9 +-
 .../resources/tpcds-query-results/v1_4/q42.sql.out | 21 +-
 .../resources/tpcds-query-results/v1_4/q43.sql.out | 12 +-
 .../resources/tpcds-query-results/v1_4/q44.sql.out | 20 +-
 .../resources/tpcds-query-results/v1_4/q45.sql.out | 39 +-
 .../resources/tpcds-query-results/v1_4/q46.sql.out | 200 +-
 .../resources/tpcds-query-results/v1_4/q47.sql.out | 200 +-
 .../resources/tpcds-query-results/v1_4/q48.sql.out | 2 +-
 .../resources/tpcds-query-results/v1_4/q49.sql.out | 64 +-
 .../resources/tpcds-query-results/v1_4/q5.sql.out  | 200 +-
 .../resources/tpcds-query-results/v1_4/q50.sql.out | 12 +-
 .../resources/tpcds-query-results/v1_4/q51.sql.out | 200 +-
 .../resources/tpcds-query-results/v1_4/q52.sql.out | 200 +-
 .../resources/tpcds-query-results/v1_4/q53.sql.out | 200 +-
 .../resources/tpcds-query-results/v1_4/q54.sql.out | 2 +-
 .../resources/tpcds-query-results/v1_4/q55.sql.out | 200 +-
 .../resources/tpcds-query-results/v1_4/q56.sql.out | 200 +-
 .../resources/tpcds-query-results/v1_4/q57.sql.out | 200 +-
 .../resources/tpcds-query-results/v1_4/q58.sql.out | 4 +-
 .../resources/tpcds-query-results/v1_4/q59.sql.out | 200 +-
 .../resources/tpcds-query-results/v1_4/q6.sql.out  | 91 +-
 .../resources/tpcds-query-results/v1_4/q60.sql.out | 200 +-
 .../resources/tpcds-query-results/v1_4/q61.sql.out | 2 +-
 .../resources/tpcds-query-results/v1_4/q62.sql.out | 200 +-
 .../resources/tpcds-query-results/v1_4/q63.sql.out | 200 +-
 .../resources/tpcds-query-results/v1_4/q64.sql.out | 19 +-
 .../resources/tpcds-query-results/v1_4/q65.sql.out | 200 +-
 .../resources/tpcds-query-results/v1_4/q66.sql.out | 10 +-
 .../resources/tpcds-query-results/v1_4/q67.sql.out | 200 +-
 .../resources/tpcds-query-results/v1_4/q68.sql.out | 200 +-
 .../resources/tpcds-query-results/v1_4/q69.sql.out | 182 +-
 .../resources/tpcds-query-results/v1_4/q7.sql.out  | 200 +-
 .../resources/tpcds-query-results/v1_4/q70.sql.out | 6 +-
 .../resources/tpcds-query-results/v1_4
[spark] 09/09: Fix
yamamuro pushed a commit to branch pull/31899
in repository https://gitbox.apache.org/repos/asf/spark.git

commit 0c4e71e00129bcffe933a8cceffab7cf51cf33ce
Author: Takeshi Yamamuro
AuthorDate: Thu May 6 10:14:25 2021 +0900

    Fix
---
 docs/sql-ref-syntax-hive-format.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/sql-ref-syntax-hive-format.md b/docs/sql-ref-syntax-hive-format.md
index 8092e58..01b8d3f 100644
--- a/docs/sql-ref-syntax-hive-format.md
+++ b/docs/sql-ref-syntax-hive-format.md
@@ -30,7 +30,7 @@ There are two ways to define a row format in `row_format` of `CREATE TABLE` and

 ```sql
 row_format:
-    SERDE serde_class [ WITH SERDEPROPERTIES (k1=v1, k2=v2, ... ) ]
+    SERDE serde_class [ WITH SERDEPROPERTIES (k1 [=] v1, k2 [=] v2, ... ) ]
     | DELIMITED [ FIELDS TERMINATED BY fields_terminated_char [ ESCAPED BY escaped_char ] ]
         [ COLLECTION ITEMS TERMINATED BY collection_items_terminated_char ]
         [ MAP KEYS TERMINATED BY map_key_terminated_char ]
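The `[=]` notation this doc change introduces means the `=` between a property key and its value is optional. A toy parser makes the equivalence concrete; this is purely an illustration (Spark's real grammar lives in its ANTLR parser, and `parse_props` is an invented helper).

```python
import re

def parse_props(text):
    """Parse a 'k1 [=] v1, k2 [=] v2, ...' property list where '=' is optional."""
    props = {}
    for pair in text.split(","):
        # Key and value are word-like tokens, separated by whitespace and/or '='.
        m = re.match(r"\s*([\w.]+)\s*(?:=\s*)?([\w.]+)\s*$", pair)
        if not m:
            raise ValueError(f"bad property pair: {pair!r}")
        props[m.group(1)] = m.group(2)
    return props

# Both spellings yield the same mapping.
print(parse_props("serialization.format = 1, field.delim 1"))
```

Documenting the bracketed `=` spares users from guessing which of the two spellings a given statement accepts.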
[spark] 08/09: remove space
yamamuro pushed a commit to branch pull/31899
in repository https://gitbox.apache.org/repos/asf/spark.git

commit 2ebb2aac7a0c87c929e72bc7c8c080096c55a8f1
Author: Niklas Riekenbrauck
AuthorDate: Tue Mar 30 13:20:32 2021 +0200

    remove space
---
 docs/sql-ref-syntax-ddl-alter-table.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/sql-ref-syntax-ddl-alter-table.md b/docs/sql-ref-syntax-ddl-alter-table.md
index 915ccf8..ae40fe4 100644
--- a/docs/sql-ref-syntax-ddl-alter-table.md
+++ b/docs/sql-ref-syntax-ddl-alter-table.md
@@ -221,7 +221,7 @@ ALTER TABLE table_identifier [ partition_spec ] SET LOCATION 'new_location'

     **Syntax:** `PARTITION ( partition_col_name = partition_col_val [ , ... ] )`

-* **SERDEPROPERTIES ( key1 [=] val1, key2 [=] val2, ... ) **
+* **SERDEPROPERTIES ( key1 [=] val1, key2 [=] val2, ... )**

     Specifies the SERDE properties to be set.
[spark] branch pull/31899 created (now 0c4e71e)
yamamuro pushed a change to branch pull/31899
in repository https://gitbox.apache.org/repos/asf/spark.git.

 at 0c4e71e  Fix

This branch includes the following new commits:

 new c685abe  Update docs to reflect alternative key value notation
 new 2ff9703  Update docs other create table docs
 new 8245a55  Fix alternatives with subrule grammar
 new 1d157ed  Update to eaasier KV syntax
 new 83ec2ee  Commit missing doc updates
 new fff449b  Some more fixes
 new 42cd52e  Remove unnecessary change
 new 2ebb2aa  remove space
 new 0c4e71e  Fix

The 9 revisions listed above as "new" are entirely new to this repository and will be described in separate emails. The revisions listed as "add" were already present in the repository and have only been added to this reference.
[spark] 06/09: Some more fixes
yamamuro pushed a commit to branch pull/31899
in repository https://gitbox.apache.org/repos/asf/spark.git

commit fff449bd54f2204d7cfc7a5fcf5c8877aa37a992
Author: Niklas Riekenbrauck
AuthorDate: Sat Mar 27 15:22:11 2021 +0100

    Some more fixes
---
 docs/sql-ref-syntax-ddl-alter-table.md             | 4 ++--
 docs/sql-ref-syntax-ddl-create-table-hiveformat.md | 2 +-
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/docs/sql-ref-syntax-ddl-alter-table.md b/docs/sql-ref-syntax-ddl-alter-table.md
index 866b596..915ccf8 100644
--- a/docs/sql-ref-syntax-ddl-alter-table.md
+++ b/docs/sql-ref-syntax-ddl-alter-table.md
@@ -169,7 +169,7 @@ this overrides the old value with the new one.

 ```sql
 -- Set Table Properties
-ALTER TABLE table_identifier SET TBLPROPERTIES ( ( key1 [=] val1, key2 [=] val2, ... ) )
+ALTER TABLE table_identifier SET TBLPROPERTIES ( key1 [=] val1, key2 [=] val2, ... )

 -- Unset Table Properties
 ALTER TABLE table_identifier UNSET TBLPROPERTIES [ IF EXISTS ] ( key1, key2, ... )
@@ -219,7 +219,7 @@ ALTER TABLE table_identifier [ partition_spec ] SET LOCATION 'new_location'
     Specifies the partition on which the property has to be set. Note that one can use a typed literal (e.g., date'2019-01-02') in the partition spec.

-    **Syntax:** `PARTITION ( partition_col_name = partition_col_val [ , ... ] )`
+    **Syntax:** `PARTITION ( partition_col_name = partition_col_val [ , ... ] )`

 * **SERDEPROPERTIES ( key1 [=] val1, key2 [=] val2, ... ) **

diff --git a/docs/sql-ref-syntax-ddl-create-table-hiveformat.md b/docs/sql-ref-syntax-ddl-create-table-hiveformat.md
index 48d089d..3231b66 100644
--- a/docs/sql-ref-syntax-ddl-create-table-hiveformat.md
+++ b/docs/sql-ref-syntax-ddl-create-table-hiveformat.md
@@ -37,7 +37,7 @@ CREATE [ EXTERNAL ] TABLE [ IF NOT EXISTS ] table_identifier
     [ ROW FORMAT row_format ]
     [ STORED AS file_format ]
     [ LOCATION path ]
-    [ TBLPROPERTIES ( ( key1 [=] val1, key2 [=] val2, ... ) ]
+    [ TBLPROPERTIES ( key1 [=] val1, key2 [=] val2, ... ) ]
     [ AS select_statement ]
 ```
[spark] 07/09: Remove unnecessary change
yamamuro pushed a commit to branch pull/31899
in repository https://gitbox.apache.org/repos/asf/spark.git

commit 42cd52e297b141a8b837a8315ca4c84a5ffc3def
Author: Niklas Riekenbrauck
AuthorDate: Sat Mar 27 15:27:52 2021 +0100

    Remove unnecessary change
---
 docs/sql-ref-syntax-ddl-alter-database.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/sql-ref-syntax-ddl-alter-database.md b/docs/sql-ref-syntax-ddl-alter-database.md
index 2de9675..6ac6863 100644
--- a/docs/sql-ref-syntax-ddl-alter-database.md
+++ b/docs/sql-ref-syntax-ddl-alter-database.md
@@ -31,7 +31,7 @@ for a database and may be used for auditing purposes.

 ```sql
 ALTER { DATABASE | SCHEMA } database_name
-    SET DBPROPERTIES ( ( property_name [=] property_value [ , ... ] | ( property_name property_value [ , ... ] )
+    SET DBPROPERTIES ( property_name [=] property_value [ , ... ] )
 ```

 ### Parameters
[spark] 04/09: Update to eaasier KV syntax
This is an automated email from the ASF dual-hosted git repository. yamamuro pushed a commit to branch pull/31899 in repository https://gitbox.apache.org/repos/asf/spark.git commit 1d157ed7209b355294b8e07e672ba8b5916e93f5 Author: Niklas Riekenbrauck AuthorDate: Sat Mar 27 15:17:26 2021 +0100 Update to eaasier KV syntax --- docs/sql-ref-syntax-ddl-alter-database.md | 2 +- docs/sql-ref-syntax-ddl-alter-table.md | 8 docs/sql-ref-syntax-ddl-alter-view.md | 2 +- docs/sql-ref-syntax-ddl-create-database.md | 4 ++-- docs/sql-ref-syntax-ddl-create-table-datasource.md | 4 ++-- docs/sql-ref-syntax-ddl-create-table-hiveformat.md | 2 +- docs/sql-ref-syntax-ddl-create-table-like.md | 2 +- 7 files changed, 12 insertions(+), 12 deletions(-) diff --git a/docs/sql-ref-syntax-ddl-alter-database.md b/docs/sql-ref-syntax-ddl-alter-database.md index fbc454e..2de9675 100644 --- a/docs/sql-ref-syntax-ddl-alter-database.md +++ b/docs/sql-ref-syntax-ddl-alter-database.md @@ -31,7 +31,7 @@ for a database and may be used for auditing purposes. ```sql ALTER { DATABASE | SCHEMA } database_name -SET DBPROPERTIES ( property_name = property_value [ , ... ] ) +SET DBPROPERTIES ( ( property_name [=] property_value [ , ... ] | ( property_name property_value [ , ... ] ) ``` ### Parameters diff --git a/docs/sql-ref-syntax-ddl-alter-table.md b/docs/sql-ref-syntax-ddl-alter-table.md index 2d42eb4..912de0f 100644 --- a/docs/sql-ref-syntax-ddl-alter-table.md +++ b/docs/sql-ref-syntax-ddl-alter-table.md @@ -169,7 +169,7 @@ this overrides the old value with the new one. ```sql -- Set Table Properties -ALTER TABLE table_identifier SET TBLPROPERTIES ( key1 = val1, key2 = val2, ... ) +ALTER TABLE table_identifier SET TBLPROPERTIES ( ( key1 [=] val1, key2 [=] val2, ... ) ) -- Unset Table Properties ALTER TABLE table_identifier UNSET TBLPROPERTIES [ IF EXISTS ] ( key1, key2, ... ) @@ -184,10 +184,10 @@ ALTER TABLE table_identifier UNSET TBLPROPERTIES [ IF EXISTS ] ( key1, key2, ... ```sql -- Set SERDE Properties ALTER TABLE table_identifier [ partition_spec ] -SET SERDEPROPERTIES ( key1 = val1, key2 = val2, ... ) +SET SERDEPROPERTIES ( ( key1 = val1, key2 = val2, ... ) | ( key1 val1, key2 val2, ... ) ) ALTER TABLE table_identifier [ partition_spec ] SET SERDE serde_class_name -[ WITH SERDEPROPERTIES ( key1 = val1, key2 = val2, ... ) ] +[ WITH SERDEPROPERTIES ( ( key1 = val1, key2 = val2, ... ) | ( key1 val1, key2 val2, ... ) ) ] ``` SET LOCATION And SET FILE FORMAT @@ -221,7 +221,7 @@ ALTER TABLE table_identifier [ partition_spec ] SET LOCATION 'new_location' **Syntax:** `PARTITION ( partition_col_name = partition_col_val [ , ... ] )` -* **SERDEPROPERTIES ( key1 = val1, key2 = val2, ... )** +* **SERDEPROPERTIES ( ( key1 = val1, key2 = val2, ... ) | ( key1 val1, key2 val2, ... ) ) ** Specifies the SERDE properties to be set. diff --git a/docs/sql-ref-syntax-ddl-alter-view.md b/docs/sql-ref-syntax-ddl-alter-view.md index d69f246..25280c4 100644 --- a/docs/sql-ref-syntax-ddl-alter-view.md +++ b/docs/sql-ref-syntax-ddl-alter-view.md @@ -49,7 +49,7 @@ the properties. Syntax ```sql -ALTER VIEW view_identifier SET TBLPROPERTIES ( property_key = property_val [ , ... ] ) +ALTER VIEW view_identifier SET TBLPROPERTIES ( property_key [=] property_val [ , ... ] ) ``` Parameters diff --git a/docs/sql-ref-syntax-ddl-create-database.md b/docs/sql-ref-syntax-ddl-create-database.md index 9d8bf47..7db410e 100644 --- a/docs/sql-ref-syntax-ddl-create-database.md +++ b/docs/sql-ref-syntax-ddl-create-database.md @@ -29,7 +29,7 @@ Creates a database with the specified name. If database with the same name alrea CREATE { DATABASE | SCHEMA } [ IF NOT EXISTS ] database_name [ COMMENT database_comment ] [ LOCATION database_directory ] -[ WITH DBPROPERTIES ( property_name = property_value [ , ... ] ) ] +[ WITH DBPROPERTIES ( property_name [=] property_value [ , ... ] ) ] ``` ### Parameters @@ -50,7 +50,7 @@ CREATE { DATABASE | SCHEMA } [ IF NOT EXISTS ] database_name Specifies the description for the database. -* **WITH DBPROPERTIES ( property_name=property_value [ , ... ] )** +* **WITH DBPROPERTIES ( property_name [=] property_value [ , ... ] )** Specifies the properties for the database in key-value pairs. diff --git a/docs/sql-ref-syntax-ddl-create-table-datasource.md b/docs/sql-ref-syntax-ddl-create-table-datasource.md index 9926bc6..7d8e692 100644 --- a/docs/sql-ref-syntax-ddl-create-table-datasource.md +++ b/docs/sql-ref-syntax-ddl-create-table-datasource.md @@ -29,14 +29,14 @@ The `CREATE TABLE` statement defines a new table using a Data Source. CREATE TABLE [ IF NOT EXISTS ] table_identifier [ ( col_name1 col_type1 [ COMMENT col_comment1 ], ... ) ] USING data_source -[ OPTIONS ( ( key1=val1, key2=val2, ... ) | ( key1 val1,
[spark] 01/09: Update docs to reflect alternative key value notation
This is an automated email from the ASF dual-hosted git repository. yamamuro pushed a commit to branch pull/31899 in repository https://gitbox.apache.org/repos/asf/spark.git commit c685abe33681fcbf0bfa6aa86ba229f19e4d451f Author: Niklas Riekenbrauck AuthorDate: Fri Mar 19 14:53:53 2021 +0100 Update docs to reflect alternative key value notation --- docs/sql-ref-syntax-ddl-create-table-datasource.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/docs/sql-ref-syntax-ddl-create-table-datasource.md b/docs/sql-ref-syntax-ddl-create-table-datasource.md index ba0516a..82d3a09 100644 --- a/docs/sql-ref-syntax-ddl-create-table-datasource.md +++ b/docs/sql-ref-syntax-ddl-create-table-datasource.md @@ -29,14 +29,14 @@ The `CREATE TABLE` statement defines a new table using a Data Source. CREATE TABLE [ IF NOT EXISTS ] table_identifier [ ( col_name1 col_type1 [ COMMENT col_comment1 ], ... ) ] USING data_source -[ OPTIONS ( key1=val1, key2=val2, ... ) ] +[ OPTIONS [ ( key1=val1, key2=val2, ... ) | ( key1 val1, key2 val2, ... ) ] ] [ PARTITIONED BY ( col_name1, col_name2, ... ) ] [ CLUSTERED BY ( col_name3, col_name4, ... ) [ SORTED BY ( col_name [ ASC | DESC ], ... ) ] INTO num_buckets BUCKETS ] [ LOCATION path ] [ COMMENT table_comment ] -[ TBLPROPERTIES ( key1=val1, key2=val2, ... ) ] +[ TBLPROPERTIES [ ( key1=val1, key2=val2, ... ) | ( key1 val1, key2 val2, ... ) ] ] [ AS select_statement ] ``` - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
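The alternative key-value notation documented in the commit above can be sketched as follows (table, column, and option names are invented for illustration, not taken from the commit):

```sql
-- Hypothetical example: OPTIONS pairs written without `=`, and
-- TBLPROPERTIES pairs written with `=` -- both notations are covered
-- by the documented grammar.
CREATE TABLE IF NOT EXISTS student (id INT, name STRING)
USING CSV
OPTIONS (header 'true', delimiter ',')
TBLPROPERTIES ('created.by.user' = 'jane');
```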
[spark] 03/09: Fix alternatives with subrule grammar
This is an automated email from the ASF dual-hosted git repository. yamamuro pushed a commit to branch pull/31899 in repository https://gitbox.apache.org/repos/asf/spark.git commit 8245a55dd1092fe9ef3fbcacb5cf07d1888ac23a Author: Niklas Riekenbrauck AuthorDate: Sat Mar 20 13:00:08 2021 +0100 Fix alternatives with subrule grammar --- docs/sql-ref-syntax-ddl-create-table-datasource.md | 4 ++-- docs/sql-ref-syntax-ddl-create-table-hiveformat.md | 2 +- docs/sql-ref-syntax-ddl-create-table-like.md | 2 +- 3 files changed, 4 insertions(+), 4 deletions(-) diff --git a/docs/sql-ref-syntax-ddl-create-table-datasource.md b/docs/sql-ref-syntax-ddl-create-table-datasource.md index 82d3a09..9926bc6 100644 --- a/docs/sql-ref-syntax-ddl-create-table-datasource.md +++ b/docs/sql-ref-syntax-ddl-create-table-datasource.md @@ -29,14 +29,14 @@ The `CREATE TABLE` statement defines a new table using a Data Source. CREATE TABLE [ IF NOT EXISTS ] table_identifier [ ( col_name1 col_type1 [ COMMENT col_comment1 ], ... ) ] USING data_source -[ OPTIONS [ ( key1=val1, key2=val2, ... ) | ( key1 val1, key2 val2, ... ) ] ] +[ OPTIONS ( ( key1=val1, key2=val2, ... ) | ( key1 val1, key2 val2, ... ) ) ] [ PARTITIONED BY ( col_name1, col_name2, ... ) ] [ CLUSTERED BY ( col_name3, col_name4, ... ) [ SORTED BY ( col_name [ ASC | DESC ], ... ) ] INTO num_buckets BUCKETS ] [ LOCATION path ] [ COMMENT table_comment ] -[ TBLPROPERTIES [ ( key1=val1, key2=val2, ... ) | ( key1 val1, key2 val2, ... ) ] ] +[ TBLPROPERTIES ( ( key1=val1, key2=val2, ... ) | ( key1 val1, key2 val2, ... ) ) ] [ AS select_statement ] ``` diff --git a/docs/sql-ref-syntax-ddl-create-table-hiveformat.md b/docs/sql-ref-syntax-ddl-create-table-hiveformat.md index 63880d5..2e05e64 100644 --- a/docs/sql-ref-syntax-ddl-create-table-hiveformat.md +++ b/docs/sql-ref-syntax-ddl-create-table-hiveformat.md @@ -37,7 +37,7 @@ CREATE [ EXTERNAL ] TABLE [ IF NOT EXISTS ] table_identifier [ ROW FORMAT row_format ] [ STORED AS file_format ] [ LOCATION path ] -[ TBLPROPERTIES [ ( key1=val1, key2=val2, ... ) | ( key1 val1, key2 val2, ... ) ] ] +[ TBLPROPERTIES ( ( key1=val1, key2=val2, ... ) | ( key1 val1, key2 val2, ... ) ) ] [ AS select_statement ] ``` diff --git a/docs/sql-ref-syntax-ddl-create-table-like.md b/docs/sql-ref-syntax-ddl-create-table-like.md index a374296a..772b299 100644 --- a/docs/sql-ref-syntax-ddl-create-table-like.md +++ b/docs/sql-ref-syntax-ddl-create-table-like.md @@ -30,7 +30,7 @@ CREATE TABLE [IF NOT EXISTS] table_identifier LIKE source_table_identifier USING data_source [ ROW FORMAT row_format ] [ STORED AS file_format ] -[ TBLPROPERTIES [ ( key1=val1, key2=val2, ... ) | ( key1 val1, key2 val2, ... ) ] ] +[ TBLPROPERTIES ( ( key1=val1, key2=val2, ... ) | ( key1 val1, key2 val2, ... ) ) ] [ LOCATION path ] ``` - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] 02/09: Update docs other create table docs
This is an automated email from the ASF dual-hosted git repository. yamamuro pushed a commit to branch pull/31899 in repository https://gitbox.apache.org/repos/asf/spark.git commit 2ff970350427835a7b7f7f9d0ec7bc8f1049f7fd Author: Niklas Riekenbrauck AuthorDate: Fri Mar 19 15:17:48 2021 +0100 Update docs other create table docs --- docs/sql-ref-syntax-ddl-create-table-hiveformat.md | 2 +- docs/sql-ref-syntax-ddl-create-table-like.md | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/docs/sql-ref-syntax-ddl-create-table-hiveformat.md b/docs/sql-ref-syntax-ddl-create-table-hiveformat.md index b2f5957..63880d5 100644 --- a/docs/sql-ref-syntax-ddl-create-table-hiveformat.md +++ b/docs/sql-ref-syntax-ddl-create-table-hiveformat.md @@ -37,7 +37,7 @@ CREATE [ EXTERNAL ] TABLE [ IF NOT EXISTS ] table_identifier [ ROW FORMAT row_format ] [ STORED AS file_format ] [ LOCATION path ] -[ TBLPROPERTIES ( key1=val1, key2=val2, ... ) ] +[ TBLPROPERTIES [ ( key1=val1, key2=val2, ... ) | ( key1 val1, key2 val2, ... ) ] ] [ AS select_statement ] ``` diff --git a/docs/sql-ref-syntax-ddl-create-table-like.md b/docs/sql-ref-syntax-ddl-create-table-like.md index cfb959c..a374296a 100644 --- a/docs/sql-ref-syntax-ddl-create-table-like.md +++ b/docs/sql-ref-syntax-ddl-create-table-like.md @@ -30,7 +30,7 @@ CREATE TABLE [IF NOT EXISTS] table_identifier LIKE source_table_identifier USING data_source [ ROW FORMAT row_format ] [ STORED AS file_format ] -[ TBLPROPERTIES ( key1=val1, key2=val2, ... ) ] +[ TBLPROPERTIES [ ( key1=val1, key2=val2, ... ) | ( key1 val1, key2 val2, ... ) ] ] [ LOCATION path ] ``` - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] 05/09: Commit missing doc updates
This is an automated email from the ASF dual-hosted git repository. yamamuro pushed a commit to branch pull/31899 in repository https://gitbox.apache.org/repos/asf/spark.git commit 83ec2ee71751142220464ea54ffc6e47ccc35ad4 Author: Niklas Riekenbrauck AuthorDate: Sat Mar 27 15:19:27 2021 +0100 Commit missing doc updates --- docs/sql-ref-syntax-ddl-alter-table.md | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/docs/sql-ref-syntax-ddl-alter-table.md b/docs/sql-ref-syntax-ddl-alter-table.md index 912de0f..866b596 100644 --- a/docs/sql-ref-syntax-ddl-alter-table.md +++ b/docs/sql-ref-syntax-ddl-alter-table.md @@ -184,10 +184,10 @@ ALTER TABLE table_identifier UNSET TBLPROPERTIES [ IF EXISTS ] ( key1, key2, ... ```sql -- Set SERDE Properties ALTER TABLE table_identifier [ partition_spec ] -SET SERDEPROPERTIES ( ( key1 = val1, key2 = val2, ... ) | ( key1 val1, key2 val2, ... ) ) +SET SERDEPROPERTIES ( key1 [=] val1, key2 [=] val2, ... ) ALTER TABLE table_identifier [ partition_spec ] SET SERDE serde_class_name -[ WITH SERDEPROPERTIES ( ( key1 = val1, key2 = val2, ... ) | ( key1 val1, key2 val2, ... ) ) ] +[ WITH SERDEPROPERTIES ( key1 [=] val1, key2 [=] val2, ... ) ] ``` SET LOCATION And SET FILE FORMAT @@ -221,7 +221,7 @@ ALTER TABLE table_identifier [ partition_spec ] SET LOCATION 'new_location' **Syntax:** `PARTITION ( partition_col_name = partition_col_val [ , ... ] )` -* **SERDEPROPERTIES ( ( key1 = val1, key2 = val2, ... ) | ( key1 val1, key2 val2, ... ) ) ** +* **SERDEPROPERTIES ( key1 [=] val1, key2 [=] val2, ... ) ** Specifies the SERDE properties to be set. - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
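The `[=]` notation restored above covers statements like the following sketch (the table name is invented; `OpenCSVSerde` is the stock Hive serde class, used here only for illustration):

```sql
-- Hypothetical example: SERDEPROPERTIES pairs may mix the
-- `key = value` and `key value` spellings.
ALTER TABLE test_tab
SET SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
WITH SERDEPROPERTIES ('separatorChar' = ',', 'quoteChar' '"');
```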
[spark] 09/09: Fix
This is an automated email from the ASF dual-hosted git repository. yamamuro pushed a commit to branch pr31899 in repository https://gitbox.apache.org/repos/asf/spark.git commit 0c4e71e00129bcffe933a8cceffab7cf51cf33ce Author: Takeshi Yamamuro AuthorDate: Thu May 6 10:14:25 2021 +0900 Fix --- docs/sql-ref-syntax-hive-format.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/sql-ref-syntax-hive-format.md b/docs/sql-ref-syntax-hive-format.md index 8092e58..01b8d3f 100644 --- a/docs/sql-ref-syntax-hive-format.md +++ b/docs/sql-ref-syntax-hive-format.md @@ -30,7 +30,7 @@ There are two ways to define a row format in `row_format` of `CREATE TABLE` and ```sql row_format: -SERDE serde_class [ WITH SERDEPROPERTIES (k1=v1, k2=v2, ... ) ] +SERDE serde_class [ WITH SERDEPROPERTIES (k1 [=] v1, k2 [=] v2, ... ) ] | DELIMITED [ FIELDS TERMINATED BY fields_terminated_char [ ESCAPED BY escaped_char ] ] [ COLLECTION ITEMS TERMINATED BY collection_items_terminated_char ] [ MAP KEYS TERMINATED BY map_key_terminated_char ] - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
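As a sketch of the `row_format` rule fixed above (table and property names are invented; `LazySimpleSerDe` is the stock Hive serde class, used only for illustration), the `=` in each `SERDEPROPERTIES` pair may be written or omitted:

```sql
-- Hypothetical example: one pair uses `=`, the other omits it;
-- both are accepted under the documented rule.
CREATE TABLE person (name STRING, age INT)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
WITH SERDEPROPERTIES ('field.delim' = ',', 'serialization.format' ',');
```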
[spark] 08/09: remove space
This is an automated email from the ASF dual-hosted git repository. yamamuro pushed a commit to branch pr31899 in repository https://gitbox.apache.org/repos/asf/spark.git commit 2ebb2aac7a0c87c929e72bc7c8c080096c55a8f1 Author: Niklas Riekenbrauck AuthorDate: Tue Mar 30 13:20:32 2021 +0200 remove space --- docs/sql-ref-syntax-ddl-alter-table.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/sql-ref-syntax-ddl-alter-table.md b/docs/sql-ref-syntax-ddl-alter-table.md index 915ccf8..ae40fe4 100644 --- a/docs/sql-ref-syntax-ddl-alter-table.md +++ b/docs/sql-ref-syntax-ddl-alter-table.md @@ -221,7 +221,7 @@ ALTER TABLE table_identifier [ partition_spec ] SET LOCATION 'new_location' **Syntax:** `PARTITION ( partition_col_name = partition_col_val [ , ... ] )` -* **SERDEPROPERTIES ( key1 [=] val1, key2 [=] val2, ... ) ** +* **SERDEPROPERTIES ( key1 [=] val1, key2 [=] val2, ... )** Specifies the SERDE properties to be set. - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] 05/09: Commit missing doc updates
This is an automated email from the ASF dual-hosted git repository. yamamuro pushed a commit to branch pr31899 in repository https://gitbox.apache.org/repos/asf/spark.git commit 83ec2ee71751142220464ea54ffc6e47ccc35ad4 Author: Niklas Riekenbrauck AuthorDate: Sat Mar 27 15:19:27 2021 +0100 Commit missing doc updates --- docs/sql-ref-syntax-ddl-alter-table.md | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/docs/sql-ref-syntax-ddl-alter-table.md b/docs/sql-ref-syntax-ddl-alter-table.md index 912de0f..866b596 100644 --- a/docs/sql-ref-syntax-ddl-alter-table.md +++ b/docs/sql-ref-syntax-ddl-alter-table.md @@ -184,10 +184,10 @@ ALTER TABLE table_identifier UNSET TBLPROPERTIES [ IF EXISTS ] ( key1, key2, ... ```sql -- Set SERDE Properties ALTER TABLE table_identifier [ partition_spec ] -SET SERDEPROPERTIES ( ( key1 = val1, key2 = val2, ... ) | ( key1 val1, key2 val2, ... ) ) +SET SERDEPROPERTIES ( key1 [=] val1, key2 [=] val2, ... ) ALTER TABLE table_identifier [ partition_spec ] SET SERDE serde_class_name -[ WITH SERDEPROPERTIES ( ( key1 = val1, key2 = val2, ... ) | ( key1 val1, key2 val2, ... ) ) ] +[ WITH SERDEPROPERTIES ( key1 [=] val1, key2 [=] val2, ... ) ] ``` SET LOCATION And SET FILE FORMAT @@ -221,7 +221,7 @@ ALTER TABLE table_identifier [ partition_spec ] SET LOCATION 'new_location' **Syntax:** `PARTITION ( partition_col_name = partition_col_val [ , ... ] )` -* **SERDEPROPERTIES ( ( key1 = val1, key2 = val2, ... ) | ( key1 val1, key2 val2, ... ) ) ** +* **SERDEPROPERTIES ( key1 [=] val1, key2 [=] val2, ... ) ** Specifies the SERDE properties to be set. - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] 06/09: Some more fixes
This is an automated email from the ASF dual-hosted git repository. yamamuro pushed a commit to branch pr31899 in repository https://gitbox.apache.org/repos/asf/spark.git commit fff449bd54f2204d7cfc7a5fcf5c8877aa37a992 Author: Niklas Riekenbrauck AuthorDate: Sat Mar 27 15:22:11 2021 +0100 Some more fixes --- docs/sql-ref-syntax-ddl-alter-table.md | 4 ++-- docs/sql-ref-syntax-ddl-create-table-hiveformat.md | 2 +- 2 files changed, 3 insertions(+), 3 deletions(-) diff --git a/docs/sql-ref-syntax-ddl-alter-table.md b/docs/sql-ref-syntax-ddl-alter-table.md index 866b596..915ccf8 100644 --- a/docs/sql-ref-syntax-ddl-alter-table.md +++ b/docs/sql-ref-syntax-ddl-alter-table.md @@ -169,7 +169,7 @@ this overrides the old value with the new one. ```sql -- Set Table Properties -ALTER TABLE table_identifier SET TBLPROPERTIES ( ( key1 [=] val1, key2 [=] val2, ... ) ) +ALTER TABLE table_identifier SET TBLPROPERTIES ( key1 [=] val1, key2 [=] val2, ... ) -- Unset Table Properties ALTER TABLE table_identifier UNSET TBLPROPERTIES [ IF EXISTS ] ( key1, key2, ... ) @@ -219,7 +219,7 @@ ALTER TABLE table_identifier [ partition_spec ] SET LOCATION 'new_location' Specifies the partition on which the property has to be set. Note that one can use a typed literal (e.g., date'2019-01-02') in the partition spec. -**Syntax:** `PARTITION ( partition_col_name = partition_col_val [ , ... ] )` +**Syntax:** `PARTITION ( partition_col_name = partition_col_val [ , ... ] )` * **SERDEPROPERTIES ( key1 [=] val1, key2 [=] val2, ... ) ** diff --git a/docs/sql-ref-syntax-ddl-create-table-hiveformat.md b/docs/sql-ref-syntax-ddl-create-table-hiveformat.md index 48d089d..3231b66 100644 --- a/docs/sql-ref-syntax-ddl-create-table-hiveformat.md +++ b/docs/sql-ref-syntax-ddl-create-table-hiveformat.md @@ -37,7 +37,7 @@ CREATE [ EXTERNAL ] TABLE [ IF NOT EXISTS ] table_identifier [ ROW FORMAT row_format ] [ STORED AS file_format ] [ LOCATION path ] -[ TBLPROPERTIES ( ( key1 [=] val1, key2 [=] val2, ... ) ] +[ TBLPROPERTIES ( key1 [=] val1, key2 [=] val2, ... ) ] [ AS select_statement ] ``` - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] 03/09: Fix alternatives with subrule grammar
This is an automated email from the ASF dual-hosted git repository. yamamuro pushed a commit to branch pr31899 in repository https://gitbox.apache.org/repos/asf/spark.git commit 8245a55dd1092fe9ef3fbcacb5cf07d1888ac23a Author: Niklas Riekenbrauck AuthorDate: Sat Mar 20 13:00:08 2021 +0100 Fix alternatives with subrule grammar --- docs/sql-ref-syntax-ddl-create-table-datasource.md | 4 ++-- docs/sql-ref-syntax-ddl-create-table-hiveformat.md | 2 +- docs/sql-ref-syntax-ddl-create-table-like.md | 2 +- 3 files changed, 4 insertions(+), 4 deletions(-) diff --git a/docs/sql-ref-syntax-ddl-create-table-datasource.md b/docs/sql-ref-syntax-ddl-create-table-datasource.md index 82d3a09..9926bc6 100644 --- a/docs/sql-ref-syntax-ddl-create-table-datasource.md +++ b/docs/sql-ref-syntax-ddl-create-table-datasource.md @@ -29,14 +29,14 @@ The `CREATE TABLE` statement defines a new table using a Data Source. CREATE TABLE [ IF NOT EXISTS ] table_identifier [ ( col_name1 col_type1 [ COMMENT col_comment1 ], ... ) ] USING data_source -[ OPTIONS [ ( key1=val1, key2=val2, ... ) | ( key1 val1, key2 val2, ... ) ] ] +[ OPTIONS ( ( key1=val1, key2=val2, ... ) | ( key1 val1, key2 val2, ... ) ) ] [ PARTITIONED BY ( col_name1, col_name2, ... ) ] [ CLUSTERED BY ( col_name3, col_name4, ... ) [ SORTED BY ( col_name [ ASC | DESC ], ... ) ] INTO num_buckets BUCKETS ] [ LOCATION path ] [ COMMENT table_comment ] -[ TBLPROPERTIES [ ( key1=val1, key2=val2, ... ) | ( key1 val1, key2 val2, ... ) ] ] +[ TBLPROPERTIES ( ( key1=val1, key2=val2, ... ) ) ] [ AS select_statement ] ``` diff --git a/docs/sql-ref-syntax-ddl-create-table-hiveformat.md b/docs/sql-ref-syntax-ddl-create-table-hiveformat.md index 63880d5..2e05e64 100644 --- a/docs/sql-ref-syntax-ddl-create-table-hiveformat.md +++ b/docs/sql-ref-syntax-ddl-create-table-hiveformat.md @@ -37,7 +37,7 @@ CREATE [ EXTERNAL ] TABLE [ IF NOT EXISTS ] table_identifier [ ROW FORMAT row_format ] [ STORED AS file_format ] [ LOCATION path ] -[ TBLPROPERTIES [ ( key1=val1, key2=val2, ... ) | ( key1 val1, key2 val2, ... ) ] ] +[ TBLPROPERTIES ( ( key1=val1, key2=val2, ... ) | ( key1 val1, key2 val2, ... ) ) ] [ AS select_statement ] ``` diff --git a/docs/sql-ref-syntax-ddl-create-table-like.md b/docs/sql-ref-syntax-ddl-create-table-like.md index a374296a..772b299 100644 --- a/docs/sql-ref-syntax-ddl-create-table-like.md +++ b/docs/sql-ref-syntax-ddl-create-table-like.md @@ -30,7 +30,7 @@ CREATE TABLE [IF NOT EXISTS] table_identifier LIKE source_table_identifier USING data_source [ ROW FORMAT row_format ] [ STORED AS file_format ] -[ TBLPROPERTIES [ ( key1=val1, key2=val2, ... ) | ( key1 val1, key2 val2, ... ) ] ] +[ TBLPROPERTIES ( ( key1=val1, key2=val2, ... ) | ( key1 val1, key2 val2, ... ) ) ] [ LOCATION path ] ``` - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] 02/09: Update docs other create table docs
This is an automated email from the ASF dual-hosted git repository. yamamuro pushed a commit to branch pr31899 in repository https://gitbox.apache.org/repos/asf/spark.git commit 2ff970350427835a7b7f7f9d0ec7bc8f1049f7fd Author: Niklas Riekenbrauck AuthorDate: Fri Mar 19 15:17:48 2021 +0100 Update docs other create table docs --- docs/sql-ref-syntax-ddl-create-table-hiveformat.md | 2 +- docs/sql-ref-syntax-ddl-create-table-like.md | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/docs/sql-ref-syntax-ddl-create-table-hiveformat.md b/docs/sql-ref-syntax-ddl-create-table-hiveformat.md index b2f5957..63880d5 100644 --- a/docs/sql-ref-syntax-ddl-create-table-hiveformat.md +++ b/docs/sql-ref-syntax-ddl-create-table-hiveformat.md @@ -37,7 +37,7 @@ CREATE [ EXTERNAL ] TABLE [ IF NOT EXISTS ] table_identifier [ ROW FORMAT row_format ] [ STORED AS file_format ] [ LOCATION path ] -[ TBLPROPERTIES ( key1=val1, key2=val2, ... ) ] +[ TBLPROPERTIES [ ( key1=val1, key2=val2, ... ) | ( key1 val1, key2 val2, ... ) ] ] [ AS select_statement ] ``` diff --git a/docs/sql-ref-syntax-ddl-create-table-like.md b/docs/sql-ref-syntax-ddl-create-table-like.md index cfb959c..a374296a 100644 --- a/docs/sql-ref-syntax-ddl-create-table-like.md +++ b/docs/sql-ref-syntax-ddl-create-table-like.md @@ -30,7 +30,7 @@ CREATE TABLE [IF NOT EXISTS] table_identifier LIKE source_table_identifier USING data_source [ ROW FORMAT row_format ] [ STORED AS file_format ] -[ TBLPROPERTIES ( key1=val1, key2=val2, ... ) ] +[ TBLPROPERTIES [ ( key1=val1, key2=val2, ... ) | ( key1 val1, key2 val2, ... ) ] ] [ LOCATION path ] ``` - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch pr31899 created (now 0c4e71e)
This is an automated email from the ASF dual-hosted git repository. yamamuro pushed a change to branch pr31899 in repository https://gitbox.apache.org/repos/asf/spark.git. at 0c4e71e Fix This branch includes the following new commits: new c685abe Update docs to reflect alternative key value notation new 2ff9703 Update docs other create table docs new 8245a55 Fix alternatives with subrule grammar new 1d157ed Update to eaasier KV syntax new 83ec2ee Commit missing doc updates new fff449b Some more fixes new 42cd52e Remove unnecessary change new 2ebb2aa remove space new 0c4e71e Fix The 9 revisions listed above as "new" are entirely new to this repository and will be described in separate emails. The revisions listed as "add" were already present in the repository and have only been added to this reference. - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] 07/09: Remove unnecessary change
This is an automated email from the ASF dual-hosted git repository. yamamuro pushed a commit to branch pr31899 in repository https://gitbox.apache.org/repos/asf/spark.git commit 42cd52e297b141a8b837a8315ca4c84a5ffc3def Author: Niklas Riekenbrauck AuthorDate: Sat Mar 27 15:27:52 2021 +0100 Remove unnecessary change --- docs/sql-ref-syntax-ddl-alter-database.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/sql-ref-syntax-ddl-alter-database.md b/docs/sql-ref-syntax-ddl-alter-database.md index 2de9675..6ac6863 100644 --- a/docs/sql-ref-syntax-ddl-alter-database.md +++ b/docs/sql-ref-syntax-ddl-alter-database.md @@ -31,7 +31,7 @@ for a database and may be used for auditing purposes. ```sql ALTER { DATABASE | SCHEMA } database_name -SET DBPROPERTIES ( ( property_name [=] property_value [ , ... ] | ( property_name property_value [ , ... ] ) +SET DBPROPERTIES ( property_name [=] property_value [ , ... ] ) ``` ### Parameters - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] 01/09: Update docs to reflect alternative key value notation
This is an automated email from the ASF dual-hosted git repository. yamamuro pushed a commit to branch pr31899 in repository https://gitbox.apache.org/repos/asf/spark.git commit c685abe33681fcbf0bfa6aa86ba229f19e4d451f Author: Niklas Riekenbrauck AuthorDate: Fri Mar 19 14:53:53 2021 +0100 Update docs to reflect alternative key value notation --- docs/sql-ref-syntax-ddl-create-table-datasource.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/docs/sql-ref-syntax-ddl-create-table-datasource.md b/docs/sql-ref-syntax-ddl-create-table-datasource.md index ba0516a..82d3a09 100644 --- a/docs/sql-ref-syntax-ddl-create-table-datasource.md +++ b/docs/sql-ref-syntax-ddl-create-table-datasource.md @@ -29,14 +29,14 @@ The `CREATE TABLE` statement defines a new table using a Data Source. CREATE TABLE [ IF NOT EXISTS ] table_identifier [ ( col_name1 col_type1 [ COMMENT col_comment1 ], ... ) ] USING data_source -[ OPTIONS ( key1=val1, key2=val2, ... ) ] +[ OPTIONS [ ( key1=val1, key2=val2, ... ) | ( key1 val1, key2 val2, ... ) ] ] [ PARTITIONED BY ( col_name1, col_name2, ... ) ] [ CLUSTERED BY ( col_name3, col_name4, ... ) [ SORTED BY ( col_name [ ASC | DESC ], ... ) ] INTO num_buckets BUCKETS ] [ LOCATION path ] [ COMMENT table_comment ] -[ TBLPROPERTIES ( key1=val1, key2=val2, ... ) ] +[ TBLPROPERTIES [ ( key1=val1, key2=val2, ... ) | ( key1 val1, key2 val2, ... ) ] ] [ AS select_statement ] ``` - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] 04/09: Update to eaasier KV syntax
This is an automated email from the ASF dual-hosted git repository. yamamuro pushed a commit to branch pr31899 in repository https://gitbox.apache.org/repos/asf/spark.git commit 1d157ed7209b355294b8e07e672ba8b5916e93f5 Author: Niklas Riekenbrauck AuthorDate: Sat Mar 27 15:17:26 2021 +0100 Update to eaasier KV syntax --- docs/sql-ref-syntax-ddl-alter-database.md | 2 +- docs/sql-ref-syntax-ddl-alter-table.md | 8 docs/sql-ref-syntax-ddl-alter-view.md | 2 +- docs/sql-ref-syntax-ddl-create-database.md | 4 ++-- docs/sql-ref-syntax-ddl-create-table-datasource.md | 4 ++-- docs/sql-ref-syntax-ddl-create-table-hiveformat.md | 2 +- docs/sql-ref-syntax-ddl-create-table-like.md | 2 +- 7 files changed, 12 insertions(+), 12 deletions(-) diff --git a/docs/sql-ref-syntax-ddl-alter-database.md b/docs/sql-ref-syntax-ddl-alter-database.md index fbc454e..2de9675 100644 --- a/docs/sql-ref-syntax-ddl-alter-database.md +++ b/docs/sql-ref-syntax-ddl-alter-database.md @@ -31,7 +31,7 @@ for a database and may be used for auditing purposes. ```sql ALTER { DATABASE | SCHEMA } database_name -SET DBPROPERTIES ( property_name = property_value [ , ... ] ) +SET DBPROPERTIES ( ( property_name [=] property_value [ , ... ] | ( property_name property_value [ , ... ] ) ``` ### Parameters diff --git a/docs/sql-ref-syntax-ddl-alter-table.md b/docs/sql-ref-syntax-ddl-alter-table.md index 2d42eb4..912de0f 100644 --- a/docs/sql-ref-syntax-ddl-alter-table.md +++ b/docs/sql-ref-syntax-ddl-alter-table.md @@ -169,7 +169,7 @@ this overrides the old value with the new one. ```sql -- Set Table Properties -ALTER TABLE table_identifier SET TBLPROPERTIES ( key1 = val1, key2 = val2, ... ) +ALTER TABLE table_identifier SET TBLPROPERTIES ( ( key1 [=] val1, key2 [=] val2, ... ) ) -- Unset Table Properties ALTER TABLE table_identifier UNSET TBLPROPERTIES [ IF EXISTS ] ( key1, key2, ... ) @@ -184,10 +184,10 @@ ALTER TABLE table_identifier UNSET TBLPROPERTIES [ IF EXISTS ] ( key1, key2, ... ```sql -- Set SERDE Properties ALTER TABLE table_identifier [ partition_spec ] -SET SERDEPROPERTIES ( key1 = val1, key2 = val2, ... ) +SET SERDEPROPERTIES ( ( key1 = val1, key2 = val2, ... ) | ( key1 val1, key2 val2, ... ) ) ALTER TABLE table_identifier [ partition_spec ] SET SERDE serde_class_name -[ WITH SERDEPROPERTIES ( key1 = val1, key2 = val2, ... ) ] +[ WITH SERDEPROPERTIES ( ( key1 = val1, key2 = val2, ... ) | ( key1 val1, key2 val2, ... ) ) ] ``` SET LOCATION And SET FILE FORMAT @@ -221,7 +221,7 @@ ALTER TABLE table_identifier [ partition_spec ] SET LOCATION 'new_location' **Syntax:** `PARTITION ( partition_col_name = partition_col_val [ , ... ] )` -* **SERDEPROPERTIES ( key1 = val1, key2 = val2, ... )** +* **SERDEPROPERTIES ( ( key1 = val1, key2 = val2, ... ) | ( key1 val1, key2 val2, ... ) ) ** Specifies the SERDE properties to be set. diff --git a/docs/sql-ref-syntax-ddl-alter-view.md b/docs/sql-ref-syntax-ddl-alter-view.md index d69f246..25280c4 100644 --- a/docs/sql-ref-syntax-ddl-alter-view.md +++ b/docs/sql-ref-syntax-ddl-alter-view.md @@ -49,7 +49,7 @@ the properties. Syntax ```sql -ALTER VIEW view_identifier SET TBLPROPERTIES ( property_key = property_val [ , ... ] ) +ALTER VIEW view_identifier SET TBLPROPERTIES ( property_key [=] property_val [ , ... ] ) ``` Parameters diff --git a/docs/sql-ref-syntax-ddl-create-database.md b/docs/sql-ref-syntax-ddl-create-database.md index 9d8bf47..7db410e 100644 --- a/docs/sql-ref-syntax-ddl-create-database.md +++ b/docs/sql-ref-syntax-ddl-create-database.md @@ -29,7 +29,7 @@ Creates a database with the specified name. If database with the same name alrea CREATE { DATABASE | SCHEMA } [ IF NOT EXISTS ] database_name [ COMMENT database_comment ] [ LOCATION database_directory ] -[ WITH DBPROPERTIES ( property_name = property_value [ , ... ] ) ] +[ WITH DBPROPERTIES ( property_name [=] property_value [ , ... ] ) ] ``` ### Parameters @@ -50,7 +50,7 @@ CREATE { DATABASE | SCHEMA } [ IF NOT EXISTS ] database_name Specifies the description for the database. -* **WITH DBPROPERTIES ( property_name=property_value [ , ... ] )** +* **WITH DBPROPERTIES ( property_name [=] property_value [ , ... ] )** Specifies the properties for the database in key-value pairs. diff --git a/docs/sql-ref-syntax-ddl-create-table-datasource.md b/docs/sql-ref-syntax-ddl-create-table-datasource.md index 9926bc6..7d8e692 100644 --- a/docs/sql-ref-syntax-ddl-create-table-datasource.md +++ b/docs/sql-ref-syntax-ddl-create-table-datasource.md @@ -29,14 +29,14 @@ The `CREATE TABLE` statement defines a new table using a Data Source. CREATE TABLE [ IF NOT EXISTS ] table_identifier [ ( col_name1 col_type1 [ COMMENT col_comment1 ], ... ) ] USING data_source -[ OPTIONS ( ( key1=val1, key2=val2, ... ) | ( key1 val1,
[spark] branch branch-3.0 updated: [SPARK-34794][SQL] Fix lambda variable name issues in nested DataFrame functions
This is an automated email from the ASF dual-hosted git repository.

yamamuro pushed a commit to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/branch-3.0 by this push:
     new 8ef4023  [SPARK-34794][SQL] Fix lambda variable name issues in nested DataFrame functions

8ef4023 is described below

commit 8ef4023683dee537a40d376d93c329a802a929bd
Author: dsolow
AuthorDate: Wed May 5 12:46:13 2021 +0900

    [SPARK-34794][SQL] Fix lambda variable name issues in nested DataFrame functions

    ### What changes were proposed in this pull request?

    To fix lambda variable name issues in nested DataFrame functions, this PR modifies code to use a global counter for `LambdaVariables` names created by higher order functions.

    This is the rework of #31887. Closes #31887.

    ### Why are the changes needed?

    This moves away from the current hard-coded variable names, which break on nested function calls. There is currently a bug where nested transforms in particular fail (the inner variable shadows the outer variable).

    For this query:
    ```
    val df = Seq(
      (Seq(1, 2, 3), Seq("a", "b", "c"))
    ).toDF("numbers", "letters")

    df.select(
      f.flatten(
        f.transform(
          $"numbers",
          (number: Column) =>
            f.transform(
              $"letters",
              (letter: Column) =>
                f.struct(
                  number.as("number"),
                  letter.as("letter")
                )
            )
        )
      ).as("zipped")
    ).show(10, false)
    ```
    This is the current (incorrect) output:
    ```
    +------------------------------------------------------------------------+
    |zipped                                                                  |
    +------------------------------------------------------------------------+
    |[{a, a}, {b, b}, {c, c}, {a, a}, {b, b}, {c, c}, {a, a}, {b, b}, {c, c}]|
    +------------------------------------------------------------------------+
    ```
    And this is the correct output after fix:
    ```
    +------------------------------------------------------------------------+
    |zipped                                                                  |
    +------------------------------------------------------------------------+
    |[{1, a}, {1, b}, {1, c}, {2, a}, {2, b}, {2, c}, {3, a}, {3, b}, {3, c}]|
    +------------------------------------------------------------------------+
    ```

    ### Does this PR introduce _any_ user-facing change?

    No

    ### How was this patch tested?

    Added the new test in `DataFrameFunctionsSuite`.

    Closes #32424 from maropu/pr31887.

    Lead-authored-by: dsolow
    Co-authored-by: Takeshi Yamamuro
    Co-authored-by: dmsolow
    Signed-off-by: Takeshi Yamamuro
    (cherry picked from commit f550e03b96638de93381734c4eada2ace02d9a4f)
    Signed-off-by: Takeshi Yamamuro
---
 .../expressions/higherOrderFunctions.scala         | 12 ++-
 .../scala/org/apache/spark/sql/functions.scala     | 12 +--
 .../apache/spark/sql/DataFrameFunctionsSuite.scala | 23 ++
 3 files changed, 40 insertions(+), 7 deletions(-)

diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/higherOrderFunctions.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/higherOrderFunctions.scala
index e5cf8c0..a530ce5 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/higherOrderFunctions.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/higherOrderFunctions.scala
@@ -18,7 +18,7 @@ package org.apache.spark.sql.catalyst.expressions

 import java.util.Comparator
-import java.util.concurrent.atomic.AtomicReference
+import java.util.concurrent.atomic.{AtomicInteger, AtomicReference}

 import scala.collection.mutable

@@ -52,6 +52,16 @@ case class UnresolvedNamedLambdaVariable(nameParts: Seq[String])
   override def sql: String = name
 }

+object UnresolvedNamedLambdaVariable {
+
+  // Counter to ensure lambda variable names are unique
+  private val nextVarNameId = new AtomicInteger(0)
+
+  def freshVarName(name: String): String = {
+    s"${name}_${nextVarNameId.getAndIncrement()}"
+  }
+}
+
 /**
  * A named lambda variable.
  */

diff --git a/sql/core/src/main/scala/org/apache/spark/sql/functions.scala b/sql/core/src/main/scala/org/apache/spark/sql/functions.scala
index bb77c7e..f6d6200 100644
--- a/sql/core/src/main/scala/org/apache/spark/sql/functions.scala
+++ b/sql/core/src/main/scala/org/apache/spark/sql/functions.scala
@@ -3489,22 +3489,22 @@ object functions {
 }

 private def createLambda(f: Column
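The core of the patch above is replacing hard-coded lambda variable names with names drawn from a process-wide atomic counter, so nested higher-order functions can no longer shadow each other. A minimal Java stand-in for that idea (the class and method names here are illustrative, not Spark's actual API):

```java
import java.util.concurrent.atomic.AtomicInteger;

// Sketch of the fix's core idea: a global atomic counter suffixes every
// lambda variable name, so an inner transform()'s variable can no longer
// shadow an outer transform()'s variable. Illustrative names only.
final class FreshNames {
    private static final AtomicInteger NEXT_VAR_NAME_ID = new AtomicInteger(0);

    static String freshVarName(String name) {
        return name + "_" + NEXT_VAR_NAME_ID.getAndIncrement();
    }

    public static void main(String[] args) {
        // Nested higher-order functions each get a distinct variable name:
        String outer = freshVarName("x");
        String inner = freshVarName("x");
        System.out.println(outer + " / " + inner); // distinct, e.g. x_0 / x_1
    }
}
```

Because the counter is atomic, concurrent query compilations also produce distinct names without any locking.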
[spark] branch branch-3.1 updated: [SPARK-34794][SQL] Fix lambda variable name issues in nested DataFrame functions
This is an automated email from the ASF dual-hosted git repository.

yamamuro pushed a commit to branch branch-3.1
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/branch-3.1 by this push:
     new 6df4ec0  [SPARK-34794][SQL] Fix lambda variable name issues in nested DataFrame functions

6df4ec0 is described below

commit 6df4ec09a17077c2a0b114a7bf5736711ba268e4
Author: dsolow
AuthorDate: Wed May 5 12:46:13 2021 +0900

    [SPARK-34794][SQL] Fix lambda variable name issues in nested DataFrame functions

    ### What changes were proposed in this pull request?

    To fix lambda variable name issues in nested DataFrame functions, this PR modifies code to use a global counter for `LambdaVariables` names created by higher order functions.

    This is the rework of #31887. Closes #31887.

    ### Why are the changes needed?

    This moves away from the current hard-coded variable names, which break on nested function calls. There is currently a bug where nested transforms in particular fail (the inner variable shadows the outer variable).

    For this query:
    ```
    val df = Seq(
      (Seq(1, 2, 3), Seq("a", "b", "c"))
    ).toDF("numbers", "letters")

    df.select(
      f.flatten(
        f.transform(
          $"numbers",
          (number: Column) =>
            f.transform(
              $"letters",
              (letter: Column) =>
                f.struct(
                  number.as("number"),
                  letter.as("letter")
                )
            )
        )
      ).as("zipped")
    ).show(10, false)
    ```
    This is the current (incorrect) output:
    ```
    +------------------------------------------------------------------------+
    |zipped                                                                  |
    +------------------------------------------------------------------------+
    |[{a, a}, {b, b}, {c, c}, {a, a}, {b, b}, {c, c}, {a, a}, {b, b}, {c, c}]|
    +------------------------------------------------------------------------+
    ```
    And this is the correct output after fix:
    ```
    +------------------------------------------------------------------------+
    |zipped                                                                  |
    +------------------------------------------------------------------------+
    |[{1, a}, {1, b}, {1, c}, {2, a}, {2, b}, {2, c}, {3, a}, {3, b}, {3, c}]|
    +------------------------------------------------------------------------+
    ```

    ### Does this PR introduce _any_ user-facing change?

    No

    ### How was this patch tested?

    Added the new test in `DataFrameFunctionsSuite`.

    Closes #32424 from maropu/pr31887.

    Lead-authored-by: dsolow
    Co-authored-by: Takeshi Yamamuro
    Co-authored-by: dmsolow
    Signed-off-by: Takeshi Yamamuro
    (cherry picked from commit f550e03b96638de93381734c4eada2ace02d9a4f)
    Signed-off-by: Takeshi Yamamuro
---
 .../expressions/higherOrderFunctions.scala         | 12 ++-
 .../scala/org/apache/spark/sql/functions.scala     | 12 +--
 .../apache/spark/sql/DataFrameFunctionsSuite.scala | 23 ++
 3 files changed, 40 insertions(+), 7 deletions(-)

diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/higherOrderFunctions.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/higherOrderFunctions.scala
index ba447ea..a4e069d 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/higherOrderFunctions.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/higherOrderFunctions.scala
@@ -18,7 +18,7 @@ package org.apache.spark.sql.catalyst.expressions

 import java.util.Comparator
-import java.util.concurrent.atomic.AtomicReference
+import java.util.concurrent.atomic.{AtomicInteger, AtomicReference}

 import scala.collection.mutable

@@ -52,6 +52,16 @@ case class UnresolvedNamedLambdaVariable(nameParts: Seq[String])
   override def sql: String = name
 }

+object UnresolvedNamedLambdaVariable {
+
+  // Counter to ensure lambda variable names are unique
+  private val nextVarNameId = new AtomicInteger(0)
+
+  def freshVarName(name: String): String = {
+    s"${name}_${nextVarNameId.getAndIncrement()}"
+  }
+}
+
 /**
  * A named lambda variable.
  */

diff --git a/sql/core/src/main/scala/org/apache/spark/sql/functions.scala b/sql/core/src/main/scala/org/apache/spark/sql/functions.scala
index e6b41cd..6bc49b6 100644
--- a/sql/core/src/main/scala/org/apache/spark/sql/functions.scala
+++ b/sql/core/src/main/scala/org/apache/spark/sql/functions.scala
@@ -3644,22 +3644,22 @@ object functions {
 }

 private def createLambda(f: Column
[spark] branch master updated (7fd3f8f -> f550e03)
This is an automated email from the ASF dual-hosted git repository.

yamamuro pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

from 7fd3f8f  [SPARK-35294][SQL] Add tree traversal pruning in rules with dedicated files under optimizer
 add f550e03  [SPARK-34794][SQL] Fix lambda variable name issues in nested DataFrame functions

No new revisions were added by this update.

Summary of changes:
 .../expressions/higherOrderFunctions.scala         | 12 ++-
 .../scala/org/apache/spark/sql/functions.scala     | 12 +--
 .../apache/spark/sql/DataFrameFunctionsSuite.scala | 23 ++
 3 files changed, 40 insertions(+), 7 deletions(-)

- To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated (caa46ce -> cd689c9)
This is an automated email from the ASF dual-hosted git repository.

yamamuro pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

from caa46ce  [SPARK-35112][SQL] Support Cast string to day-second interval
 add cd689c9  [SPARK-35192][SQL][TESTS] Port minimal TPC-DS datagen code from databricks/spark-sql-perf

No new revisions were added by this update.

Summary of changes:
 .github/workflows/build_and_test.yml               |  31 +-
 .../scala/org/apache/spark/sql/GenTPCDSData.scala  | 445 +
 .../scala/org/apache/spark/sql/TPCDSBase.scala     | 537 +
 .../sql/{TPCDSBase.scala => TPCDSSchema.scala}     |  92 +---
 4 files changed, 466 insertions(+), 639 deletions(-)
 create mode 100644 sql/core/src/test/scala/org/apache/spark/sql/GenTPCDSData.scala
 copy sql/core/src/test/scala/org/apache/spark/sql/{TPCDSBase.scala => TPCDSSchema.scala} (83%)
[spark] branch master updated (86d3bb5 -> 403e479)
This is an automated email from the ASF dual-hosted git repository.

yamamuro pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

from 86d3bb5  [SPARK-34981][SQL] Implement V2 function resolution and evaluation
 add 403e479  [SPARK-35244][SQL][FOLLOWUP] Add null check for the exception cause

No new revisions were added by this update.

Summary of changes:
 .../org/apache/spark/sql/catalyst/expressions/objects/objects.scala | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)
[spark] branch branch-3.0 updated (a556bc8 -> c6659e6)
This is an automated email from the ASF dual-hosted git repository.

yamamuro pushed a change to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git.

from a556bc8  [SPARK-33976][SQL][DOCS][3.0] Add a SQL doc page for a TRANSFORM clause
 add c6659e6  [SPARK-35159][SQL][DOCS][3.0] Extract hive format doc

No new revisions were added by this update.

Summary of changes:
 docs/sql-ref-syntax-ddl-create-table-hiveformat.md | 52 +--
 docs/sql-ref-syntax-hive-format.md                 | 73 ++
 docs/sql-ref-syntax-qry-select-transform.md        | 48 +-
 3 files changed, 77 insertions(+), 96 deletions(-)
 create mode 100644 docs/sql-ref-syntax-hive-format.md
[spark] branch branch-3.1 updated (361e684 -> db8204e)
This is an automated email from the ASF dual-hosted git repository.

yamamuro pushed a change to branch branch-3.1
in repository https://gitbox.apache.org/repos/asf/spark.git.

from 361e684  [SPARK-33976][SQL][DOCS][3.1] Add a SQL doc page for a TRANSFORM clause
 add db8204e  [SPARK-35159][SQL][DOCS][3.1] Extract hive format doc

No new revisions were added by this update.

Summary of changes:
 docs/sql-ref-syntax-ddl-create-table-hiveformat.md | 52 +--
 docs/sql-ref-syntax-hive-format.md                 | 73 ++
 docs/sql-ref-syntax-qry-select-transform.md        | 48 +-
 3 files changed, 77 insertions(+), 96 deletions(-)
 create mode 100644 docs/sql-ref-syntax-hive-format.md
[spark] branch master updated (26a5e33 -> 8b62c29)
This is an automated email from the ASF dual-hosted git repository.

yamamuro pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

from 26a5e33  [SPARK-33976][SQL][DOCS][FOLLOWUP] Fix syntax error in select doc page
 add 8b62c29  [SPARK-35214][SQL] OptimizeSkewedJoin support ShuffledHashJoinExec

No new revisions were added by this update.

Summary of changes:
 .../org/apache/spark/sql/internal/SQLConf.scala    |   4 +-
 .../execution/adaptive/OptimizeSkewedJoin.scala    | 189 -
 .../execution/exchange/EnsureRequirements.scala    |   9 +-
 .../sql/execution/joins/ShuffledHashJoinExec.scala |   3 +-
 .../spark/sql/execution/joins/ShuffledJoin.scala   |  18 +-
 .../sql/execution/joins/SortMergeJoinExec.scala    |  17 --
 .../adaptive/AdaptiveQueryExecSuite.scala          | 130 +++---
 7 files changed, 204 insertions(+), 166 deletions(-)
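SPARK-35214 extends skewed-join handling beyond sort-merge join. The underlying idea — splitting an oversized shuffle partition into roughly target-sized slices that downstream join tasks can process in parallel — can be sketched as below. This is a heavily simplified illustration, not Spark's actual `OptimizeSkewedJoin` logic, which works from map-output statistics on both join sides:

```java
// Hedged sketch of the partition-splitting idea behind skewed-join
// optimization: a partition far larger than the target size is divided
// into near-equal slices. Illustrative only.
final class SkewSplit {
    /** Slice sizes (in bytes) for one skewed partition. */
    static long[] splitSizes(long partitionBytes, long targetBytes) {
        // Number of slices, rounding to the nearest multiple of the target.
        int n = (int) Math.max(1, Math.round((double) partitionBytes / targetBytes));
        long[] slices = new long[n];
        long base = partitionBytes / n;
        long rem = partitionBytes % n;
        for (int i = 0; i < n; i++) {
            slices[i] = base + (i < rem ? 1 : 0); // spread the remainder evenly
        }
        return slices;
    }
}
```

For example, a 1000-byte partition with a 300-byte target splits into three slices whose sizes sum back to 1000.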
[spark] branch branch-3.0 updated (6e83789b -> a556bc8)
This is an automated email from the ASF dual-hosted git repository.

yamamuro pushed a change to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git.

from 6e83789b  [SPARK-35244][SQL] Invoke should throw the original exception
 add a556bc8   [SPARK-33976][SQL][DOCS][3.0] Add a SQL doc page for a TRANSFORM clause

No new revisions were added by this update.

Summary of changes:
 docs/_data/menu-sql.yaml                    |   2 +
 docs/sql-ref-syntax-qry-select-transform.md | 235
 docs/sql-ref-syntax-qry-select.md           |   7 +-
 docs/sql-ref-syntax-qry.md                  |   1 +
 docs/sql-ref-syntax.md                      |   1 +
 5 files changed, 245 insertions(+), 1 deletion(-)
 create mode 100644 docs/sql-ref-syntax-qry-select-transform.md
[spark] branch branch-3.1 updated (e58055b -> 361e684)
This is an automated email from the ASF dual-hosted git repository.

yamamuro pushed a change to branch branch-3.1
in repository https://gitbox.apache.org/repos/asf/spark.git.

from e58055b  [SPARK-35244][SQL] Invoke should throw the original exception
 add 361e684  [SPARK-33976][SQL][DOCS][3.1] Add a SQL doc page for a TRANSFORM clause

No new revisions were added by this update.

Summary of changes:
 docs/_data/menu-sql.yaml                    |   2 +
 docs/sql-ref-syntax-qry-select-transform.md | 235
 docs/sql-ref-syntax-qry-select.md           |   7 +-
 docs/sql-ref-syntax-qry.md                  |   1 +
 docs/sql-ref-syntax.md                      |   1 +
 5 files changed, 245 insertions(+), 1 deletion(-)
 create mode 100644 docs/sql-ref-syntax-qry-select-transform.md
[spark] branch master updated: [SPARK-33976][SQL][DOCS][FOLLOWUP] Fix syntax error in select doc page
This is an automated email from the ASF dual-hosted git repository.

yamamuro pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new 26a5e33  [SPARK-33976][SQL][DOCS][FOLLOWUP] Fix syntax error in select doc page

26a5e33 is described below

commit 26a5e339a61ab06fb2949166db705f1b575addd3
Author: Angerszh
AuthorDate: Wed Apr 28 16:47:02 2021 +0900

    [SPARK-33976][SQL][DOCS][FOLLOWUP] Fix syntax error in select doc page

    ### What changes were proposed in this pull request?

    Add doc about `TRANSFORM` and related function.

    ### Why are the changes needed?

    ### Does this PR introduce _any_ user-facing change?

    No

    ### How was this patch tested?

    Not needed

    Closes #32257 from AngersZh/SPARK-33976-followup.

    Authored-by: Angerszh
    Signed-off-by: Takeshi Yamamuro
---
 docs/sql-ref-syntax-qry-select.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/sql-ref-syntax-qry-select.md b/docs/sql-ref-syntax-qry-select.md
index 62a7f5f..500eda1 100644
--- a/docs/sql-ref-syntax-qry-select.md
+++ b/docs/sql-ref-syntax-qry-select.md
@@ -41,7 +41,7 @@ select_statement [ { UNION | INTERSECT | EXCEPT } [ ALL | DISTINCT ] select_stat

 While `select_statement` is defined as
 ```sql
-SELECT [ hints , ... ] [ ALL | DISTINCT ] { [[ named_expression | regex_column_names ] [ , ... ] | TRANSFORM (...)) ] }
+SELECT [ hints , ... ] [ ALL | DISTINCT ] { [ [ named_expression | regex_column_names ] [ , ... ] | TRANSFORM (...) ] }
 FROM { from_item [ , ... ] }
 [ PIVOT clause ]
 [ LATERAL VIEW clause ] [ ... ]
[spark] branch master updated (9af338c -> e503b9c)
This is an automated email from the ASF dual-hosted git repository.

yamamuro pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

from 9af338c  [SPARK-35078][SQL] Add tree traversal pruning in expression rules
 add e503b9c  [SPARK-35201][SQL] Format empty grouping set exception in CUBE/ROLLUP

No new revisions were added by this update.

Summary of changes:
 .../scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala     | 6 ++
 .../main/scala/org/apache/spark/sql/errors/QueryParsingErrors.scala | 3 +++
 2 files changed, 5 insertions(+), 4 deletions(-)
[spark] branch branch-3.1 updated (034ba76 -> 5f48abe)
This is an automated email from the ASF dual-hosted git repository.

yamamuro pushed a change to branch branch-3.1
in repository https://gitbox.apache.org/repos/asf/spark.git.

from 034ba76  [SPARK-35080][SQL] Only allow a subset of correlated equality predicates when a subquery is aggregated
 add 5f48abe  [SPARK-34639][SQL][3.1] RelationalGroupedDataset.alias should not create UnresolvedAlias

No new revisions were added by this update.

Summary of changes:
 .../main/scala/org/apache/spark/sql/RelationalGroupedDataset.scala | 6 +-
 sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala   | 3 +++
 2 files changed, 4 insertions(+), 5 deletions(-)
[spark] branch master updated (978cd0b -> fd08c93)
This is an automated email from the ASF dual-hosted git repository.

yamamuro pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

from 978cd0b  [SPARK-35092][UI] the auto-generated rdd's name in the storage tab should be truncated if it is too long
 add fd08c93  [SPARK-35109][SQL] Fix minor exception messages of HashedRelation and HashJoin

No new revisions were added by this update.

Summary of changes:
 .../apache/spark/sql/errors/QueryExecutionErrors.scala | 18 ++
 .../spark/sql/execution/joins/HashedRelation.scala     |  6 ++
 2 files changed, 8 insertions(+), 16 deletions(-)
[spark] branch master updated (12abfe7 -> 074f770)
This is an automated email from the ASF dual-hosted git repository.

yamamuro pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

from 12abfe7  [SPARK-34716][SQL] Support ANSI SQL intervals by the aggregate function `sum`
 add 074f770  [SPARK-35115][SQL][TESTS] Check ANSI intervals in `MutableProjectionSuite`

No new revisions were added by this update.

Summary of changes:
 .../spark/sql/catalyst/expressions/MutableProjectionSuite.scala | 8 +---
 1 file changed, 5 insertions(+), 3 deletions(-)
[spark] branch master updated (26f312e -> caf33be)
This is an automated email from the ASF dual-hosted git repository.

yamamuro pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

from 26f312e  [SPARK-35037][SQL] Recognize sign before the interval string in literals
 add caf33be  [SPARK-33411][SQL] Cardinality estimation of union, sort and range operator

No new revisions were added by this update.

Summary of changes:
 .../plans/logical/LogicalPlanVisitor.scala         |   3 +
 .../plans/logical/basicLogicalOperators.scala      |  22 ++-
 .../statsEstimation/BasicStatsPlanVisitor.scala    |  12 +-
 .../SizeInBytesOnlyStatsPlanVisitor.scala          |   2 +
 .../logical/statsEstimation/UnionEstimation.scala  | 120 +
 .../BasicStatsEstimationSuite.scala                | 136 +--
 .../statsEstimation/UnionEstimationSuite.scala     | 194 +
 .../spark/sql/StatisticsCollectionSuite.scala      |   4 +-
 8 files changed, 473 insertions(+), 20 deletions(-)
 create mode 100644 sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/statsEstimation/UnionEstimation.scala
 create mode 100644 sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/statsEstimation/UnionEstimationSuite.scala
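The heart of union cardinality estimation is straightforward: a `UNION ALL` outputs the sum of its children's row counts, and the estimate is only usable when every child has a known estimate. A reduced sketch of that rule (in Java as a stand-in for the Scala in `UnionEstimation`; the real rule also merges size and column statistics):

```java
import java.math.BigInteger;
import java.util.List;
import java.util.Optional;

// Sketch of union row-count estimation: sum the children's row counts,
// but propagate "unknown" if any child lacks an estimate.
// Illustrative only - not Spark's UnionEstimation code.
final class UnionStats {
    static Optional<BigInteger> unionRowCount(List<Optional<BigInteger>> childRowCounts) {
        BigInteger total = BigInteger.ZERO;
        for (Optional<BigInteger> c : childRowCounts) {
            if (c.isEmpty()) {
                return Optional.empty(); // one unknown child -> no estimate
            }
            total = total.add(c.get());
        }
        return Optional.of(total);
    }
}
```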
[spark] branch master updated (9c1f807 -> 278203d)
This is an automated email from the ASF dual-hosted git repository.

yamamuro pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

from 9c1f807  [SPARK-35031][PYTHON] Port Koalas operations on different frames tests into PySpark
 add 278203d  [SPARK-28227][SQL] Support projection, aggregate/window functions, and lateral view in the TRANSFORM clause

No new revisions were added by this update.

Summary of changes:
 .../apache/spark/sql/catalyst/parser/SqlBase.g4    |   6 +-
 .../spark/sql/catalyst/analysis/Analyzer.scala     |   9 +-
 .../spark/sql/catalyst/parser/AstBuilder.scala     |  79 --
 .../sql/catalyst/parser/PlanParserSuite.scala      |  18 +-
 .../test/resources/sql-tests/inputs/transform.sql  | 132 +
 .../resources/sql-tests/results/transform.sql.out  | 316 -
 .../spark/sql/execution/SparkSqlParserSuite.scala  | 164 +--
 .../sql/execution/command/DDLParserSuite.scala     |  14 +-
 8 files changed, 662 insertions(+), 76 deletions(-)
[spark] branch branch-3.0 updated: [SPARK-34922][SQL][3.0] Use a relative cost comparison function in the CBO
This is an automated email from the ASF dual-hosted git repository.

yamamuro pushed a commit to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/branch-3.0 by this push:
     new b9ee41f  [SPARK-34922][SQL][3.0] Use a relative cost comparison function in the CBO

b9ee41f is described below

commit b9ee41fa9957631ca0f859ee928358c108fbd9a9
Author: Tanel Kiis
AuthorDate: Thu Apr 8 11:03:59 2021 +0900

    [SPARK-34922][SQL][3.0] Use a relative cost comparison function in the CBO

    ### What changes were proposed in this pull request?

    Changed the cost comparison function of the CBO to use the ratios of row counts and sizes in bytes.

    ### Why are the changes needed?

    In #30965 we changed the CBO cost comparison function so it would be "symmetric": `A.betterThan(B)` now implies that `!B.betterThan(A)`. With that we caused a performance regression in some queries - TPCDS q19, for example.

    The original cost comparison function used the ratios `relativeRows = A.rowCount / B.rowCount` and `relativeSize = A.size / B.size`. The changed function compared "absolute" cost values `costA = w*A.rowCount + (1-w)*A.size` and `costB = w*B.rowCount + (1-w)*B.size`.

    Given the input from wzhfy we decided to go back to the relative values, because otherwise one (size) may overwhelm the other (rowCount). But this time we avoid adding up the ratios.

    Originally `A.betterThan(B) => w*relativeRows + (1-w)*relativeSize < 1` was used. Besides being "non-symmetric", this can also let one ratio overwhelm the other. For `w=0.5`, if `A`'s size (bytes) is at least 2x larger than `B`'s, then no matter how many times more rows the `B` plan has, `B` will always be considered to be better - `0.5*2 + 0.5*0.01 > 1`.

    When working with ratios, it is better to multiply them. The proposed cost comparison function is: `A.betterThan(B) => relativeRows^w * relativeSize^(1-w) < 1`.

    ### Does this PR introduce _any_ user-facing change?

    Comparison of the changed TPCDS v1.4 query execution times at sf=10:

    |      | absolute | multiplicative |         | additive |         |
    | ---- | -------- | -------------- | ------- | -------- | ------- |
    | q12  | 145      | 137            | -5.52%  | 141      | -2.76%  |
    | q13  | 264      | 271            | 2.65%   | 271      | 2.65%   |
    | q17  | 4521     | 4243           | -6.15%  | 4348     | -3.83%  |
    | q18  | 758      | 466            | -38.52% | 480      | -36.68% |
    | q19  | 38503    | 2167           | -94.37% | 2176     | -94.35% |
    | q20  | 119      | 120            | 0.84%   | 126      | 5.88%   |
    | q24a | 16429    | 16838          | 2.49%   | 17103    | 4.10%   |
    | q24b | 16592    | 16999          | 2.45%   | 17268    | 4.07%   |
    | q25  | 3558     | 3556           | -0.06%  | 3675     | 3.29%   |
    | q33  | 362      | 361            | -0.28%  | 380      | 4.97%   |
    | q52  | 1020     | 1032           | 1.18%   | 1052     | 3.14%   |
    | q55  | 927      | 938            | 1.19%   | 961      | 3.67%   |
    | q72  | 24169    | 13377          | -44.65% | 24306    | 0.57%   |
    | q81  | 1285     | 1185           | -7.78%  | 1168     | -9.11%  |
    | q91  | 324      | 336            | 3.70%   | 337      | 4.01%   |
    | q98  | 126      | 129            | 2.38%   | 131      | 3.97%   |

    All times are in ms, the change is compared to the situation in the master branch (absolute).

    The proposed cost function (multiplicative) significantly improves the performance on q18, q19 and q72. The original cost function (additive) has similar improvements at q18 and q19. All other changes are within the error bars and I would ignore them - perhaps q81 has also improved.

    ### How was this patch tested?

    PlanStabilitySuite

    Closes #32076 from tanelk/SPARK-34922_cbo_better_cost_function_3.0.

    Lead-authored-by: Tanel Kiis
    Co-authored-by: tanel.k...@gmail.com
    Signed-off-by: Takeshi Yamamuro
---
 .../catalyst/optimizer/CostBasedJoinReorder.scala  | 28 ++
 .../org/apache/spark/sql/internal/SQLConf.scala    |  6 +++--
 .../sql/catalyst/optimizer/JoinReorderSuite.scala  |  3 ---
 .../optimizer/StarJoinCostBasedReorderSuite.scala  |  9 +++
 4 files changed, 32 insertions(+), 14 deletions(-)

diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/CostBasedJoinReorder.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/CostBasedJoinReorder.scala
index 93c608dc..ed7d92e 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/CostBasedJoinReorder.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/CostBasedJoinReorder.scala
@@ -343,12 +343,30 @@ object JoinReorderDP extends PredicateHelper with Logging {
     }
   }

+/**
+ * To identify the plan with smaller computational cost,
+ * we use the weighted geometric mean of ratio of rows and the ratio of sizes in bytes.
+ *
+ * There are other ways to combine these values as a cost comparison function.
+ * Some of these, that we have experimented with, but have gotten worse result,
+ * than with the current one:
+ * 1) Weighted ar
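The commit message's formula can be made concrete in a few lines. The sketch below (class and field names are illustrative, not Spark's actual classes) shows the multiplicative comparison and why it is both symmetric and resistant to one ratio overwhelming the other:

```java
// Sketch of the relative ("multiplicative") cost comparison:
//   A.betterThan(B)  <=>  (rowsA/rowsB)^w * (sizeA/sizeB)^(1-w) < 1
// Multiplying the ratios keeps either one from single-handedly dominating,
// and the relation is symmetric: betterThan(a, b) implies !betterThan(b, a).
// Names are illustrative, not Spark's actual classes.
final class RelativeCost {
    final double rowCount;
    final double sizeBytes;

    RelativeCost(double rowCount, double sizeBytes) {
        this.rowCount = rowCount;
        this.sizeBytes = sizeBytes;
    }

    static boolean betterThan(RelativeCost a, RelativeCost b, double w) {
        double relativeRows = a.rowCount / b.rowCount;
        double relativeSize = a.sizeBytes / b.sizeBytes;
        return Math.pow(relativeRows, w) * Math.pow(relativeSize, 1.0 - w) < 1.0;
    }
}
```

With w = 0.5 this is a comparison of geometric means of the two ratios; the commit's problematic additive case (A twice the size of B, but B with 100x the rows) now correctly prefers A, since sqrt(0.01 * 2) < 1.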
[spark] branch branch-3.1 updated (f6b5c6f -> 84d96e8)
This is an automated email from the ASF dual-hosted git repository.

yamamuro pushed a change to branch branch-3.1
in repository https://gitbox.apache.org/repos/asf/spark.git.

from f6b5c6f  [SPARK-34970][SQL][SERCURITY][3.1] Redact map-type options in the output of explain()
 add 84d96e8  [SPARK-34922][SQL][3.1] Use a relative cost comparison function in the CBO

No new revisions were added by this update.

Summary of changes:
 .../catalyst/optimizer/CostBasedJoinReorder.scala  |  28 +-
 .../org/apache/spark/sql/internal/SQLConf.scala    |   6 +-
 .../optimizer/joinReorder/JoinReorderSuite.scala   |   3 -
 .../StarJoinCostBasedReorderSuite.scala            |   9 +-
 .../approved-plans-modified/q73.sf100/explain.txt  |   8 +-
 .../approved-plans-v1_4/q12.sf100/explain.txt      | 174 ++---
 .../approved-plans-v1_4/q12.sf100/simplified.txt   |  52 +-
 .../approved-plans-v1_4/q13.sf100/explain.txt      | 138 ++--
 .../approved-plans-v1_4/q13.sf100/simplified.txt   |  34 +-
 .../approved-plans-v1_4/q18.sf100/explain.txt      | 303
 .../approved-plans-v1_4/q18.sf100/simplified.txt   |  50 +-
 .../approved-plans-v1_4/q19.sf100/explain.txt      | 368 -
 .../approved-plans-v1_4/q19.sf100/simplified.txt   | 116 +--
 .../approved-plans-v1_4/q20.sf100/explain.txt      | 174 ++---
 .../approved-plans-v1_4/q20.sf100/simplified.txt   |  52 +-
 .../approved-plans-v1_4/q24a.sf100/explain.txt     | 832 +++--
 .../approved-plans-v1_4/q24a.sf100/simplified.txt  |  34 +-
 .../approved-plans-v1_4/q24b.sf100/explain.txt     | 832 +++--
 .../approved-plans-v1_4/q24b.sf100/simplified.txt  |  34 +-
 .../approved-plans-v1_4/q25.sf100/explain.txt      | 186 ++---
 .../approved-plans-v1_4/q25.sf100/simplified.txt   | 130 ++--
 .../approved-plans-v1_4/q33.sf100/explain.txt      | 395 +-
 .../approved-plans-v1_4/q33.sf100/simplified.txt   |  58 +-
 .../approved-plans-v1_4/q52.sf100/explain.txt      | 138 ++--
 .../approved-plans-v1_4/q52.sf100/simplified.txt   |  26 +-
 .../approved-plans-v1_4/q55.sf100/explain.txt      | 134 ++--
 .../approved-plans-v1_4/q55.sf100/simplified.txt   |  26 +-
 .../approved-plans-v1_4/q72.sf100/explain.txt      | 260 +++
 .../approved-plans-v1_4/q72.sf100/simplified.txt   | 150 ++--
 .../approved-plans-v1_4/q81.sf100/explain.txt      | 570 +++---
 .../approved-plans-v1_4/q81.sf100/simplified.txt   | 142 ++--
 .../approved-plans-v1_4/q91.sf100/explain.txt      | 304
 .../approved-plans-v1_4/q91.sf100/simplified.txt   |  62 +-
 .../approved-plans-v1_4/q98.sf100/explain.txt      | 182 ++---
 .../approved-plans-v1_4/q98.sf100/simplified.txt   |  52 +-
 .../approved-plans-v2_7/q12.sf100/explain.txt      | 174 ++---
 .../approved-plans-v2_7/q12.sf100/simplified.txt   |  52 +-
 .../approved-plans-v2_7/q18a.sf100/explain.txt     | 737 +-
 .../approved-plans-v2_7/q18a.sf100/simplified.txt  |  54 +-
 .../approved-plans-v2_7/q20.sf100/explain.txt      | 174 ++---
 .../approved-plans-v2_7/q20.sf100/simplified.txt   |  52 +-
 .../approved-plans-v2_7/q72.sf100/explain.txt      | 260 +++
 .../approved-plans-v2_7/q72.sf100/simplified.txt   | 150 ++--
 .../approved-plans-v2_7/q98.sf100/explain.txt      | 178 ++---
 .../approved-plans-v2_7/q98.sf100/simplified.txt   |  52 +-
 45 files changed, 4024 insertions(+), 3921 deletions(-)
[spark] branch master updated (390d5bd -> 7c8dc5e)
This is an automated email from the ASF dual-hosted git repository.

yamamuro pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

from 390d5bd  [SPARK-34968][TEST][PYTHON] Add the `-fr` argument to xargs rm
 add 7c8dc5e  [SPARK-34922][SQL] Use a relative cost comparison function in the CBO

No new revisions were added by this update.

Summary of changes:
 .../catalyst/optimizer/CostBasedJoinReorder.scala  |  28 +-
 .../org/apache/spark/sql/internal/SQLConf.scala    |   6 +-
 .../optimizer/joinReorder/JoinReorderSuite.scala   |   3 -
 .../StarJoinCostBasedReorderSuite.scala            |   9 +-
 .../approved-plans-modified/q73.sf100/explain.txt  |  86 +--
 .../q73.sf100/simplified.txt                       |  20 +-
 .../approved-plans-v1_4/q12.sf100/explain.txt      | 178 +++
 .../approved-plans-v1_4/q12.sf100/simplified.txt   |  52 +-
 .../approved-plans-v1_4/q13.sf100/explain.txt      | 134 ++---
 .../approved-plans-v1_4/q13.sf100/simplified.txt   |  38 +-
 .../approved-plans-v1_4/q18.sf100/explain.txt      | 152 +++---
 .../approved-plans-v1_4/q18.sf100/simplified.txt   |  50 +-
 .../approved-plans-v1_4/q19.sf100/explain.txt      | 376 ++---
 .../approved-plans-v1_4/q19.sf100/simplified.txt   | 118 ++---
 .../approved-plans-v1_4/q20.sf100/explain.txt      | 178 +++
 .../approved-plans-v1_4/q20.sf100/simplified.txt   |  52 +-
 .../approved-plans-v1_4/q24a.sf100/explain.txt     | 116 ++--
 .../approved-plans-v1_4/q24a.sf100/simplified.txt  |  34 +-
 .../approved-plans-v1_4/q24b.sf100/explain.txt     | 116 ++--
 .../approved-plans-v1_4/q24b.sf100/simplified.txt  |  34 +-
 .../approved-plans-v1_4/q25.sf100/explain.txt      | 192 +++
 .../approved-plans-v1_4/q25.sf100/simplified.txt   | 138 ++---
 .../approved-plans-v1_4/q33.sf100/explain.txt      | 264 +-
 .../approved-plans-v1_4/q33.sf100/simplified.txt   |  58 +-
 .../approved-plans-v1_4/q52.sf100/explain.txt      | 146 +++---
 .../approved-plans-v1_4/q52.sf100/simplified.txt   |  30 +-
 .../approved-plans-v1_4/q55.sf100/explain.txt      | 142 ++---
 .../approved-plans-v1_4/q55.sf100/simplified.txt   |  30 +-
 .../approved-plans-v1_4/q72.sf100/explain.txt      | 326 ++--
 .../approved-plans-v1_4/q72.sf100/simplified.txt   | 154 +++---
 .../approved-plans-v1_4/q81.sf100/explain.txt      | 582 ++---
 .../approved-plans-v1_4/q81.sf100/simplified.txt   | 146 +++---
 .../approved-plans-v1_4/q91.sf100/explain.txt      | 312 +--
 .../approved-plans-v1_4/q91.sf100/simplified.txt   |  66 +--
 .../approved-plans-v1_4/q98.sf100/explain.txt      | 186 +++
 .../approved-plans-v1_4/q98.sf100/simplified.txt   |  52 +-
 .../approved-plans-v2_7/q12.sf100/explain.txt      | 178 +++
 .../approved-plans-v2_7/q12.sf100/simplified.txt   |  52 +-
 .../approved-plans-v2_7/q18a.sf100/explain.txt     | 172 +++---
 .../approved-plans-v2_7/q18a.sf100/simplified.txt  |  54 +-
 .../approved-plans-v2_7/q20.sf100/explain.txt      | 178 +++
 .../approved-plans-v2_7/q20.sf100/simplified.txt   |  52 +-
 .../approved-plans-v2_7/q72.sf100/explain.txt      | 326 ++--
 .../approved-plans-v2_7/q72.sf100/simplified.txt   | 154 +++---
 .../approved-plans-v2_7/q98.sf100/explain.txt      | 182 +++
 .../approved-plans-v2_7/q98.sf100/simplified.txt   |  52 +-
 46 files changed, 3011 insertions(+), 2993 deletions(-)
[spark] branch master updated (39d5677 -> 7cfface)
This is an automated email from the ASF dual-hosted git repository. yamamuro pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from 39d5677 [SPARK-34932][SQL] deprecate GROUP BY ... GROUPING SETS (...) and promote GROUP BY GROUPING SETS (...) add 7cfface [SPARK-34935][SQL] CREATE TABLE LIKE should respect the reserved table properties No new revisions were added by this update. Summary of changes: docs/sql-migration-guide.md | 2 ++ .../scala/org/apache/spark/sql/execution/SparkSqlParser.scala | 3 ++- .../org/apache/spark/sql/execution/SparkSqlParserSuite.scala | 8 3 files changed, 12 insertions(+), 1 deletion(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated (a9ca197 -> 39d5677)
This is an automated email from the ASF dual-hosted git repository. yamamuro pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from a9ca197 [SPARK-34949][CORE] Prevent BlockManager reregister when Executor is shutting down add 39d5677 [SPARK-34932][SQL] deprecate GROUP BY ... GROUPING SETS (...) and promote GROUP BY GROUPING SETS (...) No new revisions were added by this update. Summary of changes: docs/sql-ref-syntax-qry-select-groupby.md | 34 +++ .../spark/sql/catalyst/analysis/Analyzer.scala | 36 +++ .../spark/sql/catalyst/expressions/grouping.scala | 46 --- .../spark/sql/catalyst/parser/AstBuilder.scala | 13 +++--- .../analysis/ResolveGroupingAnalyticsSuite.scala | 51 +- .../sql/catalyst/parser/PlanParserSuite.scala | 2 +- 6 files changed, 72 insertions(+), 110 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated (3951e33 -> 90f2d4d)
This is an automated email from the ASF dual-hosted git repository. yamamuro pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from 3951e33 [SPARK-34881][SQL] New SQL Function: TRY_CAST add 90f2d4d [SPARK-34882][SQL] Replace if with filter clause in RewriteDistinctAggregates No new revisions were added by this update. Summary of changes: .../optimizer/RewriteDistinctAggregates.scala | 47 ++ .../org/apache/spark/sql/DataFrameSuite.scala | 29 - 2 files changed, 49 insertions(+), 27 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated (b2bfe98 -> fcef237)
This is an automated email from the ASF dual-hosted git repository. yamamuro pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from b2bfe98 [SPARK-34845][CORE] ProcfsMetricsGetter shouldn't return partial procfs metrics add fcef237 [SPARK-34622][SQL] Push down limit through Project with Join No new revisions were added by this update. Summary of changes: .../spark/sql/catalyst/optimizer/Optimizer.scala | 33 +- .../catalyst/optimizer/LimitPushdownSuite.scala| 9 ++ 2 files changed, 29 insertions(+), 13 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch branch-3.1 updated: [SPARK-34833][SQL][FOLLOWUP] Handle outer references in all the places
This is an automated email from the ASF dual-hosted git repository. yamamuro pushed a commit to branch branch-3.1 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.1 by this push: new f3c1298 [SPARK-34833][SQL][FOLLOWUP] Handle outer references in all the places f3c1298 is described below commit f3c129827986ba06c8a9ab00bd687e8d025103d1 Author: Wenchen Fan AuthorDate: Fri Mar 26 09:10:03 2021 +0900 [SPARK-34833][SQL][FOLLOWUP] Handle outer references in all the places ### What changes were proposed in this pull request? This is a follow-up of https://github.com/apache/spark/pull/31940 . This PR generalizes the matching of attributes and outer references, so that outer references are handled everywhere. Note that correlated subqueries currently have many limitations in Spark, so the newly covered cases cannot yet occur in practice; this PR is therefore a code refactor. ### Why are the changes needed? Code cleanup. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Existing tests. Closes #31959 from cloud-fan/follow. 
Authored-by: Wenchen Fan Signed-off-by: Takeshi Yamamuro (cherry picked from commit 658e95c345d5aa2a98b8d2a854e003a5c77ed581) Signed-off-by: Takeshi Yamamuro --- .../spark/sql/catalyst/analysis/Analyzer.scala | 67 +- 1 file changed, 41 insertions(+), 26 deletions(-) diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala index d490845..600a5af 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala @@ -3919,6 +3919,14 @@ object UpdateOuterReferences extends Rule[LogicalPlan] { */ object ApplyCharTypePadding extends Rule[LogicalPlan] { + object AttrOrOuterRef { +def unapply(e: Expression): Option[Attribute] = e match { + case a: Attribute => Some(a) + case OuterReference(a: Attribute) => Some(a) + case _ => None +} + } + override def apply(plan: LogicalPlan): LogicalPlan = { plan.resolveOperatorsUp { case operator => operator.transformExpressionsUp { @@ -3926,27 +3934,17 @@ object ApplyCharTypePadding extends Rule[LogicalPlan] { // String literal is treated as char type when it's compared to a char type column. // We should pad the shorter one to the longer length. 
-case b @ BinaryComparison(attr: Attribute, lit) if lit.foldable => - padAttrLitCmp(attr, attr.metadata, lit).map { newChildren => -b.withNewChildren(newChildren) - }.getOrElse(b) - -case b @ BinaryComparison(lit, attr: Attribute) if lit.foldable => - padAttrLitCmp(attr, attr.metadata, lit).map { newChildren => -b.withNewChildren(newChildren.reverse) - }.getOrElse(b) - -case b @ BinaryComparison(or @ OuterReference(attr: Attribute), lit) if lit.foldable => - padAttrLitCmp(or, attr.metadata, lit).map { newChildren => +case b @ BinaryComparison(e @ AttrOrOuterRef(attr), lit) if lit.foldable => + padAttrLitCmp(e, attr.metadata, lit).map { newChildren => b.withNewChildren(newChildren) }.getOrElse(b) -case b @ BinaryComparison(lit, or @ OuterReference(attr: Attribute)) if lit.foldable => - padAttrLitCmp(or, attr.metadata, lit).map { newChildren => +case b @ BinaryComparison(lit, e @ AttrOrOuterRef(attr)) if lit.foldable => + padAttrLitCmp(e, attr.metadata, lit).map { newChildren => b.withNewChildren(newChildren.reverse) }.getOrElse(b) -case i @ In(attr: Attribute, list) +case i @ In(e @ AttrOrOuterRef(attr), list) if attr.dataType == StringType && list.forall(_.foldable) => CharVarcharUtils.getRawType(attr.metadata).flatMap { case CharType(length) => @@ -3955,7 +3953,7 @@ object ApplyCharTypePadding extends Rule[LogicalPlan] { val literalCharLengths = literalChars.map(_.numChars()) val targetLen = (length +: literalCharLengths).max Some(i.copy( -value = addPadding(attr, length, targetLen), +value = addPadding(e, length, targetLen), list = list.zip(literalCharLengths).map { case (lit, charLength) => addPadding(lit, charLength, targetLen) } ++ nulls.map(Literal.create(_, StringType @@ -3963,19 +3961,36 @@ object ApplyCharTypePadding extends Rule[LogicalPlan] { }.getOrElse(i) // For char type colum
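The key idea in this follow-up is the `AttrOrOuterRef` extractor, which lets a single pattern-match clause cover both a bare attribute and an attribute wrapped in an `OuterReference` — which is why the four near-duplicate `BinaryComparison` cases in the old code collapse into two. A minimal, self-contained sketch of the pattern, using toy stand-ins for Spark's Catalyst expression classes (not the real API):

```scala
// Toy stand-ins for Spark's expression hierarchy; not the real Catalyst API.
sealed trait Expression
case class Attribute(name: String) extends Expression
case class OuterReference(attr: Attribute) extends Expression
case class Literal(value: Any) extends Expression

// Same shape as the extractor in the diff: unwraps an OuterReference if
// present, so one case clause handles both forms.
object AttrOrOuterRef {
  def unapply(e: Expression): Option[Attribute] = e match {
    case a: Attribute      => Some(a)
    case OuterReference(a) => Some(a)
    case _                 => None
  }
}

def describe(e: Expression): String = e match {
  case AttrOrOuterRef(a) => s"attribute ${a.name}"
  case _                 => "something else"
}

// Both forms resolve to the same attribute:
// describe(Attribute("c"))                 => "attribute c"
// describe(OuterReference(Attribute("c"))) => "attribute c"
```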
[spark] branch master updated (6d88212 -> 658e95c)
This is an automated email from the ASF dual-hosted git repository. yamamuro pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from 6d88212 [SPARK-34840][SHUFFLE] Fixes cases of corruption in merged shuffle … add 658e95c [SPARK-34833][SQL][FOLLOWUP] Handle outer references in all the places No new revisions were added by this update. Summary of changes: .../spark/sql/catalyst/analysis/Analyzer.scala | 67 +- 1 file changed, 41 insertions(+), 26 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch branch-3.1 updated: [SPARK-34833][SQL] Apply right-padding correctly for correlated subqueries
This is an automated email from the ASF dual-hosted git repository. yamamuro pushed a commit to branch branch-3.1 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.1 by this push: new 5ecf306 [SPARK-34833][SQL] Apply right-padding correctly for correlated subqueries 5ecf306 is described below commit 5ecf306245d17053e25b68c844828878a66b593a Author: Takeshi Yamamuro AuthorDate: Thu Mar 25 08:31:57 2021 +0900 [SPARK-34833][SQL] Apply right-padding correctly for correlated subqueries ### What changes were proposed in this pull request? This PR intends to fix a bug where right-padding is not applied for char types inside correlated subqueries. For example, the query below returns nothing in master, but the correct result is `c`. ``` scala> sql(s"CREATE TABLE t1(v VARCHAR(3), c CHAR(5)) USING parquet") scala> sql(s"CREATE TABLE t2(v VARCHAR(5), c CHAR(7)) USING parquet") scala> sql("INSERT INTO t1 VALUES ('c', 'b')") scala> sql("INSERT INTO t2 VALUES ('a', 'b')") scala> val df = sql(""" |SELECT v FROM t1 |WHERE 'a' IN (SELECT v FROM t2 WHERE t2.c = t1.c )""".stripMargin) scala> df.show() +---+ | v| +---+ +---+ ``` This is because `ApplyCharTypePadding` does not handle the case above to apply right-padding into `'abc'`. This PR modifies the code in `ApplyCharTypePadding` to handle it correctly. 
``` // Before this PR: scala> df.explain(true) == Analyzed Logical Plan == v: string Project [v#13] +- Filter a IN (list#12 [c#14]) : +- Project [v#15] : +- Filter (c#16 = outer(c#14)) :+- SubqueryAlias spark_catalog.default.t2 : +- Relation default.t2[v#15,c#16] parquet +- SubqueryAlias spark_catalog.default.t1 +- Relation default.t1[v#13,c#14] parquet scala> df.show() +---+ | v| +---+ +---+ // After this PR: scala> df.explain(true) == Analyzed Logical Plan == v: string Project [v#43] +- Filter a IN (list#42 [c#44]) : +- Project [v#45] : +- Filter (c#46 = rpad(outer(c#44), 7, )) :+- SubqueryAlias spark_catalog.default.t2 : +- Relation default.t2[v#45,c#46] parquet +- SubqueryAlias spark_catalog.default.t1 +- Relation default.t1[v#43,c#44] parquet scala> df.show() +---+ | v| +---+ | c| +---+ ``` This fix is related to TPCDS q17; the query returns nothing because of this bug: https://github.com/apache/spark/pull/31886/files#r599333799 ### Why are the changes needed? Bugfix. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Unit tests added. Closes #31940 from maropu/FixCharPadding. 
Authored-by: Takeshi Yamamuro Signed-off-by: Takeshi Yamamuro (cherry picked from commit 150769bcedb6e4a97596e0f04d686482cd09e92a) Signed-off-by: Takeshi Yamamuro --- .../spark/sql/catalyst/analysis/Analyzer.scala | 45 ++--- .../apache/spark/sql/CharVarcharTestSuite.scala| 57 -- 2 files changed, 79 insertions(+), 23 deletions(-) diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala index f4cdeab..d490845 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala @@ -3921,16 +3921,28 @@ object ApplyCharTypePadding extends Rule[LogicalPlan] { override def apply(plan: LogicalPlan): LogicalPlan = { plan.resolveOperatorsUp { - case operator if operator.resolved => operator.transformExpressionsUp { + case operator => operator.transformExpressionsUp { +case e if !e.childrenResolved => e + // String literal is treated as char type when it's compared to a char type column. // We should pad the shorter one to the longer length. case b @ BinaryComparison(attr: Attribute, lit) if lit.foldable => - padAttrLitCmp(attr, lit).map { newChildren => + padAttrLitCmp(attr, attr.metadata, lit).map { newChildren => b.withNewChildren(newChildren) }.getOrElse(b) case b @ BinaryComparison(lit, attr: Attribute) if lit.foldable => - padAttrLitCmp(attr, lit).map { newChildren => + padAttrLitCmp(attr, attr.metadata, lit).map { newChildren => +b.withNewChildren(newChildren.reverse)
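The mechanics of the fix can be sketched outside Spark: CHAR values are stored padded to their own declared length, so comparing columns with different declared lengths requires an explicit `rpad` on the shorter side. The helper below is illustrative only, not Spark's implementation:

```scala
// Illustrative right-padding helper; not Spark's implementation.
def rpad(s: String, len: Int, pad: Char = ' '): String =
  if (s.length >= len) s else s + pad.toString * (len - s.length)

// t1.c is CHAR(5) and t2.c is CHAR(7): each stored value is padded to its
// own declared length, so raw equality never holds.
val t1c = rpad("b", 5) // "b    "
val t2c = rpad("b", 7) // "b      "
assert(t1c != t2c)

// Padding the outer reference to the longer declared length -- this is the
// rpad(outer(c#44), 7, ...) that now appears in the analyzed plan -- fixes it.
assert(rpad(t1c, 7) == t2c)
```

In the plan output above, the pad character is a space, which is why the third argument of `rpad` prints as blank.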
[spark] branch master updated (88cf86f -> 150769b)
This is an automated email from the ASF dual-hosted git repository. yamamuro pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from 88cf86f [SPARK-34797][ML] Refactor Logistic Aggregator - support virtual centering add 150769b [SPARK-34833][SQL] Apply right-padding correctly for correlated subqueries No new revisions were added by this update. Summary of changes: .../spark/sql/catalyst/analysis/Analyzer.scala | 45 ++--- .../apache/spark/sql/CharVarcharTestSuite.scala| 57 -- 2 files changed, 79 insertions(+), 23 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated: [SPARK-34853][SQL] Remove duplicated definition of output partitioning/ordering for limit operator
This is an automated email from the ASF dual-hosted git repository. yamamuro pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 35c70e4 [SPARK-34853][SQL] Remove duplicated definition of output partitioning/ordering for limit operator 35c70e4 is described below commit 35c70e417d8c6e3958e0da8a4bec731f9e394a28 Author: Cheng Su AuthorDate: Wed Mar 24 23:06:35 2021 +0900 [SPARK-34853][SQL] Remove duplicated definition of output partitioning/ordering for limit operator ### What changes were proposed in this pull request? Both local limit and global limit define the output partitioning and output ordering in the same way, which is duplicated (https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/limit.scala#L159-L175 ). We can move the output partitioning and ordering into their parent trait - `BaseLimitExec`. This is doable as `BaseLimitExec` has no other child classes. This is a minor code refactoring. ### Why are the changes needed? Clean up the code a little bit. Better readability. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Pure refactoring. Rely on existing unit tests. Closes #31950 from c21/limit-cleanup. 
Authored-by: Cheng Su Signed-off-by: Takeshi Yamamuro --- .../main/scala/org/apache/spark/sql/execution/limit.scala | 15 +-- 1 file changed, 5 insertions(+), 10 deletions(-) diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/limit.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/limit.scala index d8f67fb..e5a2995 100644 --- a/sql/core/src/main/scala/org/apache/spark/sql/execution/limit.scala +++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/limit.scala @@ -113,6 +113,10 @@ object BaseLimitExec { trait BaseLimitExec extends LimitExec with CodegenSupport { override def output: Seq[Attribute] = child.output + override def outputPartitioning: Partitioning = child.outputPartitioning + + override def outputOrdering: Seq[SortOrder] = child.outputOrdering + protected override def doExecute(): RDD[InternalRow] = child.execute().mapPartitions { iter => iter.take(limit) } @@ -156,12 +160,7 @@ trait BaseLimitExec extends LimitExec with CodegenSupport { /** * Take the first `limit` elements of each child partition, but do not collect or shuffle them. */ -case class LocalLimitExec(limit: Int, child: SparkPlan) extends BaseLimitExec { - - override def outputOrdering: Seq[SortOrder] = child.outputOrdering - - override def outputPartitioning: Partitioning = child.outputPartitioning -} +case class LocalLimitExec(limit: Int, child: SparkPlan) extends BaseLimitExec /** * Take the first `limit` elements of the child's single output partition. 
@@ -169,10 +168,6 @@ case class LocalLimitExec(limit: Int, child: SparkPlan) extends BaseLimitExec { case class GlobalLimitExec(limit: Int, child: SparkPlan) extends BaseLimitExec { override def requiredChildDistribution: List[Distribution] = AllTuples :: Nil - - override def outputPartitioning: Partitioning = child.outputPartitioning - - override def outputOrdering: Seq[SortOrder] = child.outputOrdering } /** - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
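The refactoring pattern here is plain Scala: when every concrete subclass overrides a member identically, the override can be lifted into the shared parent trait, and the subclasses shrink to bare case classes. A toy sketch with simplified types (not Spark's physical-plan classes):

```scala
// Simplified stand-in for a child physical plan; not Spark's SparkPlan.
case class Plan(partitioning: String, ordering: Seq[String])

trait BaseLimit {
  def child: Plan
  // Previously duplicated in both concrete classes below; defined once here.
  def outputPartitioning: String = child.partitioning
  def outputOrdering: Seq[String] = child.ordering
}

// As in the diff, the concrete classes no longer need bodies of their own.
case class LocalLimit(limit: Int, child: Plan) extends BaseLimit
case class GlobalLimit(limit: Int, child: Plan) extends BaseLimit
```

`GlobalLimitExec` still overrides `requiredChildDistribution` separately, since that requirement genuinely differs between the two operators.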
[spark] branch branch-3.1 updated (da013d0 -> 250c820)
This is an automated email from the ASF dual-hosted git repository. yamamuro pushed a change to branch branch-3.1 in repository https://gitbox.apache.org/repos/asf/spark.git. from da013d0 [MINOR][DOCS][ML] Doc 'mode' as a supported Imputer strategy in Pyspark add 250c820 [SPARK-34796][SQL][3.1] Initialize counter variable for LIMIT code-gen in doProduce() No new revisions were added by this update. Summary of changes: .../scala/org/apache/spark/sql/execution/limit.scala | 12 .../scala/org/apache/spark/sql/SQLQuerySuite.scala| 19 +++ 2 files changed, 27 insertions(+), 4 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch branch-2.4 updated: [SPARK-34776][SQL][3.0][2.4] Window class should override producedAttributes
This is an automated email from the ASF dual-hosted git repository. yamamuro pushed a commit to branch branch-2.4 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-2.4 by this push: new 59e4ae4 [SPARK-34776][SQL][3.0][2.4] Window class should override producedAttributes 59e4ae4 is described below commit 59e4ae4149ff93bd64c8b3210c27dc2fbebe2a96 Author: Liang-Chi Hsieh AuthorDate: Sat Mar 20 11:26:01 2021 +0900 [SPARK-34776][SQL][3.0][2.4] Window class should override producedAttributes ### What changes were proposed in this pull request? This patch proposes to override `producedAttributes` of the `Window` class. ### Why are the changes needed? This is a backport of #31897 to branch-3.0/2.4. Unlike the original PR, nested column pruning does not allow pushing through `Window` in branch-3.0/2.4 yet. But `Window` doesn't override `producedAttributes`; this is wrong and could cause potential issues, so the `Window`-related change is backported. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? Existing tests. Closes #31904 from viirya/SPARK-34776-3.0. 
Authored-by: Liang-Chi Hsieh Signed-off-by: Takeshi Yamamuro (cherry picked from commit 828cf76bced1b70769b0453f3e9ba95faaa84e39) Signed-off-by: Takeshi Yamamuro --- .../apache/spark/sql/catalyst/plans/logical/basicLogicalOperators.scala | 2 ++ 1 file changed, 2 insertions(+) diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicLogicalOperators.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicLogicalOperators.scala index a0086c1..2fe9cd4 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicLogicalOperators.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicLogicalOperators.scala @@ -621,6 +621,8 @@ case class Window( override def output: Seq[Attribute] = child.output ++ windowExpressions.map(_.toAttribute) + override def producedAttributes: AttributeSet = windowOutputSet + def windowOutputSet: AttributeSet = AttributeSet(windowExpressions.map(_.toAttribute)) } - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
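What `producedAttributes` means can be sketched with plain sets — a simplification of Spark's `AttributeSet`, where the real method also feeds `missingInput` checks:

```scala
// Plain-Set sketch of the fix; Spark uses AttributeSet, not Set[String].
case class ToyWindow(childOutput: Set[String], windowExprs: Set[String]) {
  // Output = the child's columns plus the window columns this node computes.
  def output: Set[String] = childOutput ++ windowExprs
  // The fix: window columns are *produced* by this node itself. Without the
  // override this defaults to empty, as if they had to come from the child.
  def producedAttributes: Set[String] = windowExprs
  // Attributes that must actually be supplied by the child:
  def references: Set[String] = output -- producedAttributes
}

// ToyWindow(Set("a", "b"), Set("rank")).references == Set("a", "b")
```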
[spark] branch branch-3.0 updated (25d7219 -> 828cf76)
This is an automated email from the ASF dual-hosted git repository. yamamuro pushed a change to branch branch-3.0 in repository https://gitbox.apache.org/repos/asf/spark.git. from 25d7219 [SPARK-34719][SQL][3.0] Correctly resolve the view query with duplicated column names add 828cf76 [SPARK-34776][SQL][3.0][2.4] Window class should override producedAttributes No new revisions were added by this update. Summary of changes: .../apache/spark/sql/catalyst/plans/logical/basicLogicalOperators.scala | 2 ++ 1 file changed, 2 insertions(+) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated (620cae0 -> 2ff0032)
This is an automated email from the ASF dual-hosted git repository. yamamuro pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from 620cae0 [SPARK-33122][SQL] Remove redundant aggregates in the Optimzier add 2ff0032 [SPARK-34796][SQL] Initialize counter variable for LIMIT code-gen in doProduce() No new revisions were added by this update. Summary of changes: .../scala/org/apache/spark/sql/execution/limit.scala | 12 .../scala/org/apache/spark/sql/SQLQuerySuite.scala| 19 +++ 2 files changed, 27 insertions(+), 4 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated (7a8a600 -> 620cae0)
This is an automated email from the ASF dual-hosted git repository. yamamuro pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from 7a8a600 [SPARK-34776][SQL] Nested column pruning should not prune Window produced attributes add 620cae0 [SPARK-33122][SQL] Remove redundant aggregates in the Optimzier No new revisions were added by this update. Summary of changes: .../spark/sql/catalyst/analysis/Analyzer.scala | 50 --- .../analysis/PullOutNondeterministic.scala | 74 ++ .../spark/sql/catalyst/optimizer/Optimizer.scala | 45 ++ .../plans/logical/basicLogicalOperators.scala | 2 +- .../optimizer/RemoveRedundantAggregatesSuite.scala | 163 + .../execution/RemoveRedundantProjectsSuite.scala | 2 +- 6 files changed, 284 insertions(+), 52 deletions(-) create mode 100644 sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/PullOutNondeterministic.scala create mode 100644 sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/RemoveRedundantAggregatesSuite.scala - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch branch-3.0 updated: [SPARK-34719][SQL][3.0] Correctly resolve the view query with duplicated column names
This is an automated email from the ASF dual-hosted git repository. yamamuro pushed a commit to branch branch-3.0 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.0 by this push: new 25d7219 [SPARK-34719][SQL][3.0] Correctly resolve the view query with duplicated column names 25d7219 is described below commit 25d72191de7c842aa2acd4b7307ba8e6585dd182 Author: Wenchen Fan AuthorDate: Sat Mar 20 11:09:50 2021 +0900 [SPARK-34719][SQL][3.0] Correctly resolve the view query with duplicated column names backport https://github.com/apache/spark/pull/31811 to 3.0 ### What changes were proposed in this pull request? For permanent views (and the new SQL temp view in Spark 3.1), we store the view SQL text and re-parse/analyze the view SQL text when reading the view. In the case of `SELECT * FROM ...`, we want to avoid view schema change (e.g. the referenced table changes its schema) and will record the view query output column names when creating the view, so that when reading the view we can add a `SELECT recorded_column_names FROM ...` to retain the original view query schema. In Spark 3.1 and before, the final SELECT is added after the analysis phase: https://github.com/apache/spark/blob/branch-3.1/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/view.scala#L67 If the view query has duplicated output column names, we always pick the first column when reading a view. 
A simple repro: ``` scala> sql("create view c(x, y) as select 1 a, 2 a") res0: org.apache.spark.sql.DataFrame = [] scala> sql("select * from c").show +---+---+ | x| y| +---+---+ | 1| 1| +---+---+ ``` In the master branch, we will fail at the view reading time due to https://github.com/apache/spark/commit/b891862fb6b740b103d5a09530626ee4e0e8f6e3 , which adds the final SELECT during analysis, so that the query fails with `Reference 'a' is ambiguous` This PR proposes to resolve the view query output column names from the matching attributes by ordinal. For example, `create view c(x, y) as select 1 a, 2 a`, the view query output column names are `[a, a]`. When we reading the view, there are 2 matching attributes (e.g.`[a#1, a#2]`) and we can simply match them by ordinal. A negative example is ``` create table t(a int) create view v as select *, 1 as col from t replace table t(a int, col int) ``` When reading the view, the view query output column names are `[a, col]`, and there are two matching attributes of `col`, and we should fail the query. See the tests for details. ### Why are the changes needed? bug fix ### Does this PR introduce _any_ user-facing change? yes ### How was this patch tested? new test Closes #31894 from cloud-fan/backport. 
Authored-by: Wenchen Fan Signed-off-by: Takeshi Yamamuro --- .../apache/spark/sql/catalyst/analysis/view.scala | 44 ++--- .../apache/spark/sql/execution/SQLViewSuite.scala | 45 +- 2 files changed, 82 insertions(+), 7 deletions(-) diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/view.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/view.scala index 6560164..013a303 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/view.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/view.scala @@ -17,7 +17,10 @@ package org.apache.spark.sql.catalyst.analysis -import org.apache.spark.sql.catalyst.expressions.Alias +import java.util.Locale + +import org.apache.spark.sql.AnalysisException +import org.apache.spark.sql.catalyst.expressions.{Alias, Attribute} import org.apache.spark.sql.catalyst.plans.logical.{LogicalPlan, Project, View} import org.apache.spark.sql.catalyst.rules.Rule import org.apache.spark.sql.internal.SQLConf @@ -60,15 +63,44 @@ object EliminateView extends Rule[LogicalPlan] with CastSupport { // The child has the different output attributes with the View operator. Adds a Project over // the child of the view. case v @ View(desc, output, child) if child.resolved && !v.sameOutput(child) => + // Use the stored view query output column names to find the matching attributes. The column + // names may have duplication, e.g. `CREATE VIEW v(x, y) AS SELECT 1 col, 2 col`. We need to + // make sure the that matching attributes have the same number of duplications, and pick the + // corresponding attribute by ordinal. val resolver = conf.resolver val queryColumnNames = desc.viewQueryColumnNames val queryOutput = if (queryColumnNames.nonEmpty) { -// Find the attribute that has the expected at
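The by-ordinal matching described above can be sketched as follows. The helper is hypothetical — Spark's real code works on `Attribute`s and the session's configured resolver — but it captures both rules: pick the i-th duplicate by ordinal, and fail when the duplication counts disagree (the negative `replace table` example above):

```scala
// Attributes modeled as (name, exprId) pairs; hypothetical helper, not
// Spark's actual implementation.
def resolveByOrdinal(recorded: Seq[String],
                     attrs: Seq[(String, Int)]): Seq[(String, Int)] =
  recorded.zipWithIndex.map { case (name, i) =>
    // How many earlier recorded columns share this name fixes the ordinal.
    val ordinal  = recorded.take(i).count(_.equalsIgnoreCase(name))
    val expected = recorded.count(_.equalsIgnoreCase(name))
    val matches  = attrs.filter { case (n, _) => n.equalsIgnoreCase(name) }
    // Duplication counts must agree, otherwise the view is ambiguous.
    require(matches.length == expected,
      s"cannot resolve view column '$name' unambiguously")
    matches(ordinal)
  }

// CREATE VIEW c(x, y) AS SELECT 1 a, 2 a: recorded = ["a", "a"] and the
// matching attributes [("a", 1), ("a", 2)] are picked by ordinal, instead
// of the first match being taken twice.
```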
[spark] branch branch-3.1 updated (1b70aad -> c2629a7)
This is an automated email from the ASF dual-hosted git repository. yamamuro pushed a change to branch branch-3.1 in repository https://gitbox.apache.org/repos/asf/spark.git. from 1b70aad [SPARK-34747][SQL][DOCS] Add virtual operators to the built-in function document add c2629a7 [SPARK-34719][SQL][3.1] Correctly resolve the view query with duplicated column names No new revisions were added by this update. Summary of changes: .../apache/spark/sql/catalyst/analysis/view.scala | 44 +--- .../spark/sql/execution/SQLViewTestSuite.scala | 48 ++ 2 files changed, 86 insertions(+), 6 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated: [SPARK-34781][SQL] Eliminate LEFT SEMI/ANTI joins to its left child side in AQE
This is an automated email from the ASF dual-hosted git repository. yamamuro pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 8207e2f [SPARK-34781][SQL] Eliminate LEFT SEMI/ANTI joins to its left child side in AQE 8207e2f is described below commit 8207e2f65cc2ce2d87ee60ee05a2c1ee896cf93e Author: Cheng Su AuthorDate: Fri Mar 19 09:41:52 2021 +0900 [SPARK-34781][SQL] Eliminate LEFT SEMI/ANTI joins to its left child side in AQE ### What changes were proposed in this pull request? In `EliminateJoinToEmptyRelation.scala`, we can extend it to cover more cases for LEFT SEMI and LEFT ANTI joins: * The join is a left semi join, its right side is non-empty, and the join condition is empty: eliminate the join to its left side. * The join is a left anti join and its right side is empty: eliminate the join to its left side. Since we now eliminate joins to their left side here, the rule is renamed to `EliminateUnnecessaryJoin`. In addition, we also change to use `checkRowCount()` to check the runtime row count, instead of using `EmptyHashedRelation`, so this can cover `BroadcastNestedLoopJoin` as well. (`BroadcastNestedLoopJoin`'s broadcast side is `Array[InternalRow]`, not `HashedRelation`). ### Why are the changes needed? Cover more join cases, and improve query performance for affected queries. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Added unit tests in `AdaptiveQueryExecSuite.scala`. Closes #31873 from c21/aqe-join. 
Authored-by: Cheng Su Signed-off-by: Takeshi Yamamuro --- .../sql/execution/adaptive/AQEOptimizer.scala | 2 +- .../adaptive/EliminateJoinToEmptyRelation.scala| 71 - .../adaptive/EliminateUnnecessaryJoin.scala| 91 ++ .../spark/sql/DynamicPartitionPruningSuite.scala | 2 +- .../adaptive/AdaptiveQueryExecSuite.scala | 51 5 files changed, 127 insertions(+), 90 deletions(-) diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/AQEOptimizer.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/AQEOptimizer.scala index 04b8ade..901637d 100644 --- a/sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/AQEOptimizer.scala +++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/AQEOptimizer.scala @@ -29,7 +29,7 @@ class AQEOptimizer(conf: SQLConf) extends RuleExecutor[LogicalPlan] { private val defaultBatches = Seq( Batch("Demote BroadcastHashJoin", Once, DemoteBroadcastHashJoin), -Batch("Eliminate Join to Empty Relation", Once, EliminateJoinToEmptyRelation) +Batch("Eliminate Unnecessary Join", Once, EliminateUnnecessaryJoin) ) final override protected def batches: Seq[Batch] = { diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/EliminateJoinToEmptyRelation.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/EliminateJoinToEmptyRelation.scala deleted file mode 100644 index d6df522..000 --- a/sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/EliminateJoinToEmptyRelation.scala +++ /dev/null @@ -1,71 +0,0 @@ -/* - * Licensed to the Apache Software Foundation (ASF) under one or more - * contributor license agreements. See the NOTICE file distributed with - * this work for additional information regarding copyright ownership. - * The ASF licenses this file to You under the Apache License, Version 2.0 - * (the "License"); you may not use this file except in compliance with - * the License. 
You may obtain a copy of the License at - * - *http://www.apache.org/licenses/LICENSE-2.0 - * - * Unless required by applicable law or agreed to in writing, software - * distributed under the License is distributed on an "AS IS" BASIS, - * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. - * See the License for the specific language governing permissions and - * limitations under the License. - */ - -package org.apache.spark.sql.execution.adaptive - -import org.apache.spark.sql.catalyst.planning.ExtractSingleColumnNullAwareAntiJoin -import org.apache.spark.sql.catalyst.plans.{Inner, LeftAnti, LeftSemi} -import org.apache.spark.sql.catalyst.plans.logical.{Join, LocalRelation, LogicalPlan} -import org.apache.spark.sql.catalyst.rules.Rule -import org.apache.spark.sql.execution.joins.{EmptyHashedRelation, HashedRelation, HashedRelationWithAllNullKeys} - -/** - * This optimization rule detects and converts a Join to an empty [[LocalRelation]]: - * 1. Join is single column NULL-aware anti join (NAAJ), and broadcasted [[HashedRelation]] - *is [[HashedRelationWithAllNullKeys]]. - * - * 2. Join is in
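The two new elimination cases listed in the commit message reduce to a small decision over the join type, the run-time row count of the right side, and whether a join condition is present. The sketch below is plain Scala with illustrative names (`shouldEliminateToLeftChild` is not Spark's actual API); it only models the run-time check the PR describes, not the Catalyst rule itself.

```scala
// Hypothetical model of the new decision logic; names are illustrative.
sealed trait SketchJoinType
case object LeftSemi extends SketchJoinType
case object LeftAnti extends SketchJoinType

// rightRowCount: run-time row count of the (broadcast) right side as seen by
// AQE; hasCondition: whether the join carries a predicate.
def shouldEliminateToLeftChild(
    joinType: SketchJoinType,
    rightRowCount: Long,
    hasCondition: Boolean): Boolean = joinType match {
  // LEFT SEMI: every left row survives when the right side is non-empty and
  // there is no condition left to evaluate.
  case LeftSemi => rightRowCount > 0 && !hasCondition
  // LEFT ANTI: every left row survives when the right side is empty,
  // regardless of the condition.
  case LeftAnti => rightRowCount == 0
}

assert(shouldEliminateToLeftChild(LeftSemi, rightRowCount = 10L, hasCondition = false))
assert(!shouldEliminateToLeftChild(LeftSemi, rightRowCount = 10L, hasCondition = true))
assert(shouldEliminateToLeftChild(LeftAnti, rightRowCount = 0L, hasCondition = true))
assert(!shouldEliminateToLeftChild(LeftAnti, rightRowCount = 5L, hasCondition = false))
```

Note that a LEFT SEMI join with an *empty* right side is not handled here: that case collapses to an empty relation, which is what the pre-existing part of the rule already did.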
[spark] branch branch-3.1 updated: [SPARK-34749][SQL][3.1] Simplify ResolveCreateNamedStruct
This is an automated email from the ASF dual-hosted git repository.

yamamuro pushed a commit to branch branch-3.1
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/branch-3.1 by this push:
     new 448b8d0  [SPARK-34749][SQL][3.1] Simplify ResolveCreateNamedStruct
448b8d0 is described below

commit 448b8d07df41040058c21e6102406e1656727599
Author: Wenchen Fan
AuthorDate: Thu Mar 18 07:44:11 2021 +0900

    [SPARK-34749][SQL][3.1] Simplify ResolveCreateNamedStruct

    backports https://github.com/apache/spark/pull/31843

    ### What changes were proposed in this pull request?

    This is a follow-up of https://github.com/apache/spark/pull/31808 and simplifies its fix to one line (excluding comments).

    ### Why are the changes needed?

    code simplification

    ### Does this PR introduce _any_ user-facing change?

    no

    ### How was this patch tested?

    N/A

    Closes #31867 from cloud-fan/backport.

    Authored-by: Wenchen Fan
    Signed-off-by: Takeshi Yamamuro
---
 .../org/apache/spark/sql/catalyst/analysis/Analyzer.scala   |  2 --
 .../spark/sql/catalyst/expressions/complexTypeCreator.scala | 10 +-
 .../sql/catalyst/expressions/complexTypeExtractors.scala    | 11 +--
 .../spark/sql/catalyst/parser/ExpressionParserSuite.scala   |  2 +-
 4 files changed, 11 insertions(+), 14 deletions(-)

diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
index f98f33b..f4cdeab 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
@@ -3840,8 +3840,6 @@ object ResolveCreateNamedStruct extends Rule[LogicalPlan] {
       val children = e.children.grouped(2).flatMap {
         case Seq(NamePlaceholder, e: NamedExpression) if e.resolved =>
           Seq(Literal(e.name), e)
-        case Seq(NamePlaceholder, e: ExtractValue) if e.resolved && e.name.isDefined =>
-          Seq(Literal(e.name.get), e)
         case kv => kv
       }
diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/complexTypeCreator.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/complexTypeCreator.scala
index cb59fbd..1779d41 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/complexTypeCreator.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/complexTypeCreator.scala
@@ -20,7 +20,7 @@ package org.apache.spark.sql.catalyst.expressions
 import scala.collection.mutable.ArrayBuffer

 import org.apache.spark.sql.catalyst.InternalRow
-import org.apache.spark.sql.catalyst.analysis.{Resolver, TypeCheckResult, TypeCoercion, UnresolvedExtractValue}
+import org.apache.spark.sql.catalyst.analysis.{Resolver, TypeCheckResult, TypeCoercion, UnresolvedAttribute, UnresolvedExtractValue}
 import org.apache.spark.sql.catalyst.analysis.FunctionRegistry.{FUNC_ALIAS, FunctionBuilder}
 import org.apache.spark.sql.catalyst.expressions.codegen._
 import org.apache.spark.sql.catalyst.expressions.codegen.Block._
@@ -336,6 +336,14 @@ object CreateStruct {
    */
   def apply(children: Seq[Expression]): CreateNamedStruct = {
     CreateNamedStruct(children.zipWithIndex.flatMap {
+      // For multi-part column name like `struct(a.b.c)`, it may be resolved into:
+      //   1. Attribute if `a.b.c` is simply a qualified column name.
+      //   2. GetStructField if `a.b` refers to a struct-type column.
+      //   3. GetArrayStructFields if `a.b` refers to a array-of-struct-type column.
+      //   4. GetMapValue if `a.b` refers to a map-type column.
+      // We should always use the last part of the column name (`c` in the above example) as the
+      // alias name inside CreateNamedStruct.
+      case (u: UnresolvedAttribute, _) => Seq(Literal(u.nameParts.last), u)
       case (e: NamedExpression, _) if e.resolved => Seq(Literal(e.name), e)
       case (e: NamedExpression, _) => Seq(NamePlaceholder, e)
       case (e, index) => Seq(Literal(s"col${index + 1}"), e)
diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/complexTypeExtractors.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/complexTypeExtractors.scala
index 9b80140..ef247ef 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/complexTypeExtractors.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/complexTypeExtractors.scala
@@ -94,10 +94,7 @@ object ExtractValue {
     }
   }

-trait ExtractValue extends Expression {
-  // The name that is used to extract the value.
-  def name: Option[String]
-}
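The naming behaviour described in the new comment — `struct(a.b.c)` produces a field named `c` (the last name part), and unnamed expressions fall back to `col1`, `col2`, … — can be illustrated with a tiny stand-alone function. `fieldName` below is a hypothetical helper over (optional name parts, position) pairs, not Spark's actual code.

```scala
// Sketch of the field-naming cases in CreateStruct.apply shown in the diff
// above, reduced to plain Scala. Illustrative only.
def fieldName(nameParts: Option[Seq[String]], index: Int): String =
  nameParts match {
    // struct(a.b.c): use the last part of the multi-part name as the alias.
    case Some(parts) if parts.nonEmpty => parts.last
    // Unnamed expression, e.g. struct(1 + 1): fall back to col1, col2, ...
    case _ => s"col${index + 1}"
  }

assert(fieldName(Some(Seq("a", "b", "c")), 0) == "c")
assert(fieldName(Some(Seq("col")), 0) == "col")
assert(fieldName(None, 0) == "col1")
assert(fieldName(None, 2) == "col3")
```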
[spark] branch master updated (bf4570b -> 9f7b0a0)
This is an automated email from the ASF dual-hosted git repository.

yamamuro pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

from bf4570b  [SPARK-34749][SQL] Simplify ResolveCreateNamedStruct
add  9f7b0a0  [SPARK-34758][SQL] Simplify Analyzer.resolveLiteralFunction

No new revisions were added by this update.

Summary of changes:
 .../spark/sql/catalyst/analysis/Analyzer.scala | 29 ++
 1 file changed, 7 insertions(+), 22 deletions(-)

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated (48637a9 -> bf4570b)
This is an automated email from the ASF dual-hosted git repository.

yamamuro pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

from 48637a9  [SPARK-34766][SQL] Do not capture maven config for views
add  bf4570b  [SPARK-34749][SQL] Simplify ResolveCreateNamedStruct

No new revisions were added by this update.

Summary of changes:
 .../org/apache/spark/sql/catalyst/analysis/Analyzer.scala |  2 --
 .../sql/catalyst/expressions/complexTypeCreator.scala     | 10 +-
 .../sql/catalyst/expressions/complexTypeExtractors.scala  | 14 +-
 .../spark/sql/catalyst/parser/ExpressionParserSuite.scala |  2 +-
 4 files changed, 11 insertions(+), 17 deletions(-)
[spark] branch master updated: [SPARK-34665][SQL][DOCS] Revise the type coercion section of ANSI Compliance
This is an automated email from the ASF dual-hosted git repository.

yamamuro pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new ee756fd  [SPARK-34665][SQL][DOCS] Revise the type coercion section of ANSI Compliance
ee756fd is described below

commit ee756fd69528f90f63ffd45edc821c6b69a8a35e
Author: Gengliang Wang
AuthorDate: Tue Mar 9 13:19:14 2021 +0900

    [SPARK-34665][SQL][DOCS] Revise the type coercion section of ANSI Compliance

    ### What changes were proposed in this pull request?

    1. Fix the table of valid type coercion combinations. Binary type should be allowed casting to String type and disallowed casting to Numeric types.
    2. Summarize all the `CAST`s that can cause runtime exceptions.

    ### Why are the changes needed?

    Fix a mistake in the docs.

    ### Does this PR introduce _any_ user-facing change?

    No

    ### How was this patch tested?

    Run `jekyll serve` and preview:
    ![image](https://user-images.githubusercontent.com/1097932/110334374-8fab5a80-7fd7-11eb-86e7-c519cfa41b99.png)

    Closes #31781 from gengliangwang/reviseAnsiDoc2.

    Authored-by: Gengliang Wang
    Signed-off-by: Takeshi Yamamuro
---
 docs/sql-ref-ansi-compliance.md | 22 +-
 1 file changed, 13 insertions(+), 9 deletions(-)

diff --git a/docs/sql-ref-ansi-compliance.md b/docs/sql-ref-ansi-compliance.md
index 99e230b..4b3ff46 100644
--- a/docs/sql-ref-ansi-compliance.md
+++ b/docs/sql-ref-ansi-compliance.md
@@ -72,16 +72,23 @@ The type conversion of Spark ANSI mode follows the syntax rules of section 6.13

 | Source\Target | Numeric | String | Date | Timestamp | Interval | Boolean | Binary | Array | Map | Struct |
 |---|-||--|---|--|-||---|-||
-| Numeric | Y | Y | N| N | N| Y | N | N | N | N |
-| String| Y | Y | Y| Y | Y| Y | Y | N | N | N |
+| Numeric | **Y** | Y | N| N | N| Y | N | N | N | N |
+| String| **Y** | Y | **Y** | **Y** | **Y** | **Y** | Y | N | N | N |
 | Date | N | Y | Y| Y | N| N | N | N | N | N |
 | Timestamp | N | Y | Y| Y | N| N | N | N | N | N |
 | Interval | N | Y | N| N | Y| N | N | N | N | N |
 | Boolean | Y | Y | N| N | N| Y | N | N | N | N |
-| Binary| Y | N | N| N | N| N | Y | N | N | N |
-| Array | N | N | N| N | N| N | N | Y | N | N |
-| Map | N | N | N| N | N| N | N | N | Y | N |
-| Struct| N | N | N| N | N| N | N | N | N | Y |
+| Binary| N | Y | N| N | N| N | Y | N | N | N |
+| Array | N | N | N| N | N| N | N | **Y** | N | N |
+| Map | N | N | N| N | N| N | N | N | **Y** | N |
+| Struct| N | N | N| N | N| N | N | N | N | **Y** |
+
+In the table above, all the `CAST`s that can cause runtime exceptions are marked as red **Y**:
+* CAST(Numeric AS Numeric): raise an overflow exception if the value is out of the target data type's range.
+* CAST(String AS (Numeric/Date/Timestamp/Interval/Boolean)): raise a runtime exception if the value can't be parsed as the target data type.
+* CAST(Array AS Array): raise an exception if there is any on the conversion of the elements.
+* CAST(Map AS Map): raise an exception if there is any on the conversion of the keys and the values.
+* CAST(Struct AS Struct): raise an exception if there is any on the conversion of the struct fields.

 Currently, the ANSI mode affects explicit casting and assignment casting only.
 In future releases, the behaviour of type coercion might change along with the other two type conversion rules.
@@ -163,9 +170,6 @@ The behavior of some SQL functions can be different under ANSI mode (`spark.sql.

 The behavior of some SQL operators can be different under ANSI mode (`spark.sql.ansi.enabled=true`).
   - `array_col[index]`: This operator throws `ArrayIndexOutOfBoundsException` if using invalid indices.
   - `map_col[key]`: This operator throws `NoSuchElementException` if key does not exist in map.
-  - `CAST(string_col AS TIMESTAMP)`: This operator should fail with an except
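Two of the failing-cast behaviours listed in the revised docs can be modelled in a few lines of plain Scala. This is a rough sketch of the *semantics* (overflow raises instead of wrapping, an unparseable string raises instead of yielding null), not Spark's actual ANSI cast implementation; the function names are illustrative.

```scala
import scala.util.Try

// ANSI-style CAST(Long AS Int): out-of-range values raise instead of wrapping.
def ansiCastLongToInt(v: Long): Int =
  Math.toIntExact(v) // throws ArithmeticException when v is outside Int range

// ANSI-style CAST(String AS Int): unparseable input raises instead of
// silently producing null, as the pre-ANSI behaviour would.
def ansiCastStringToInt(s: String): Int = s.trim.toInt

assert(ansiCastLongToInt(42L) == 42)
assert(Try(ansiCastLongToInt(Long.MaxValue)).isFailure) // overflow -> exception
assert(ansiCastStringToInt(" 7 ") == 7)
assert(Try(ansiCastStringToInt("abc")).isFailure)       // can't be parsed -> exception
```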
[spark] branch master updated (f72b906 -> 1a97224)
This is an automated email from the ASF dual-hosted git repository.

yamamuro pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

from f72b906  [SPARK-34643][R][DOCS] Use CRAN URL in canonical form
add  1a97224  [SPARK-34595][SQL] DPP support RLIKE

No new revisions were added by this update.

Summary of changes:
 .../dynamicpruning/PartitionPruning.scala        |  2 +-
 .../spark/sql/DynamicPartitionPruningSuite.scala | 26 ++
 2 files changed, 27 insertions(+), 1 deletion(-)
[spark] branch master updated (9ac5ee2e -> dbce74d)
This is an automated email from the ASF dual-hosted git repository.

yamamuro pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

from 9ac5ee2e  [SPARK-32924][WEBUI] Make duration column in master UI sorted in the correct order
add  dbce74d   [SPARK-34607][SQL] Add `Utils.isMemberClass` to fix a malformed class name error on jdk8u

No new revisions were added by this update.

Summary of changes:
 .../main/scala/org/apache/spark/util/Utils.scala   | 28 +
 .../spark/sql/catalyst/encoders/OuterScopes.scala  |  2 +-
 .../sql/catalyst/expressions/objects/objects.scala |  2 +-
 .../catalyst/encoders/ExpressionEncoderSuite.scala | 70 ++
 4 files changed, 100 insertions(+), 2 deletions(-)
[spark] branch master updated (499f620 -> 56edb81)
This is an automated email from the ASF dual-hosted git repository.

yamamuro pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

from 499f620  [MINOR][SQL][DOCS] Fix some wrong default values in SQL tuning guide's AQE section
add  56edb81  [SPARK-33474][SQL] Support TypeConstructed partition spec value

No new revisions were added by this update.

Summary of changes:
 docs/sql-migration-guide.md                        |  2 +
 docs/sql-ref-syntax-ddl-alter-table.md             |  8 ++--
 docs/sql-ref-syntax-dml-insert-into.md             | 15 ++-
 docs/sql-ref-syntax-dml-insert-overwrite-table.md  | 25 ++-
 .../spark/sql/catalyst/parser/AstBuilder.scala     | 14 +--
 .../spark/sql/catalyst/parser/DDLParserSuite.scala | 30 --
 .../org/apache/spark/sql/SQLInsertTestSuite.scala  | 48 ++
 .../command/AlterTableAddPartitionSuiteBase.scala  |  8
 .../command/AlterTableDropPartitionSuiteBase.scala | 10 +
 .../AlterTableRenamePartitionSuiteBase.scala       | 11 +
 10 files changed, 158 insertions(+), 13 deletions(-)
[spark] branch master updated (499cc79 -> b13a4b8)
This is an automated email from the ASF dual-hosted git repository.

yamamuro pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

from 499cc79  [SPARK-34503][DOCS][FOLLOWUP] Document available codecs for event log compression
add  b13a4b8  [SPARK-34573][SQL] Avoid global locking in SQLConf object for sqlConfEntries map

No new revisions were added by this update.

Summary of changes:
 .../org/apache/spark/sql/internal/SQLConf.scala | 22 +++---
 1 file changed, 11 insertions(+), 11 deletions(-)
[spark] branch master updated (d574308 -> 1afe284)
This is an automated email from the ASF dual-hosted git repository.

yamamuro pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

from d574308  [SPARK-34579][SQL][TEST] Fix wrong UT in SQLQuerySuite
add  1afe284  [SPARK-34570][SQL] Remove dead code from constructors of [Hive]SessionStateBuilder

No new revisions were added by this update.

Summary of changes:
 .../src/main/scala/org/apache/spark/sql/SparkSession.scala    | 11 ---
 .../apache/spark/sql/internal/BaseSessionStateBuilder.scala   |  3 +--
 .../scala/org/apache/spark/sql/internal/SessionState.scala    |  7 +++
 .../test/scala/org/apache/spark/sql/test/TestSQLContext.scala |  9 -
 .../org/apache/spark/sql/hive/HiveSessionStateBuilder.scala   |  7 +++
 .../test/scala/org/apache/spark/sql/hive/test/TestHive.scala  |  9 -
 6 files changed, 19 insertions(+), 27 deletions(-)
[spark] branch master updated: [SPARK-34506][CORE] ADD JAR with ivy coordinates should be compatible with Hive transitive behavior
This is an automated email from the ASF dual-hosted git repository.

yamamuro pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new 0216051  [SPARK-34506][CORE] ADD JAR with ivy coordinates should be compatible with Hive transitive behavior
0216051 is described below

commit 0216051acadedcc7e9bcd840aa78776159b200d1
Author: Shardul Mahadik
AuthorDate: Mon Mar 1 09:10:20 2021 +0900

    [SPARK-34506][CORE] ADD JAR with ivy coordinates should be compatible with Hive transitive behavior

    ### What changes were proposed in this pull request?

    SPARK-33084 added the ability to use ivy coordinates with `SparkContext.addJar`. PR #29966 claims to mimic Hive behavior, although I found a few cases where it doesn't:

    1) The default value of the `transitive` parameter is false, both when the parameter is not specified in the coordinate and when its value is invalid. The Hive behavior is that transitive is [true if not specified](https://github.com/apache/hive/blob/cb2ac3dcc6af276c6f64ee00f034f082fe75222b/ql/src/java/org/apache/hadoop/hive/ql/util/DependencyResolver.java#L169) in the coordinate and [false for invalid values](https://github.com/apache/hive/blob/cb2ac3dcc6af276c6f64ee00f034f082fe752 [...]

    2) The value of the `transitive` parameter is regarded as case-sensitive [based on the understanding](https://github.com/apache/spark/pull/29966#discussion_r547752259) that Hive behavior is case-sensitive. However, this is not correct: Hive [treats the parameter value case-insensitively](https://github.com/apache/hive/blob/cb2ac3dcc6af276c6f64ee00f034f082fe75222b/ql/src/java/org/apache/hadoop/hive/ql/util/DependencyResolver.java#L122).

    I propose that we be compatible with Hive for these behaviors.

    ### Why are the changes needed?

    To make `ADD JAR` with ivy coordinates compatible with Hive's transitive behavior.

    ### Does this PR introduce _any_ user-facing change?

    The user-facing changes here are within master, as the feature introduced in SPARK-33084 has not been released yet.

    1. Previously, an ivy coordinate without the `transitive` parameter specified did not resolve transitive dependencies; now it does.
    2. Previously, a `transitive` parameter value was treated case-sensitively; e.g. `transitive=TRUE` would be treated as false because it did not exactly match `true`. Now it is treated case-insensitively.

    ### How was this patch tested?

    Modified existing unit tests to test the new behavior.
    Added a new unit test to cover usage of `exclude` with unspecified `transitive`.

    Closes #31623 from shardulm94/spark-34506.

    Authored-by: Shardul Mahadik
    Signed-off-by: Takeshi Yamamuro
---
 .../org/apache/spark/util/DependencyUtils.scala    | 14 +
 .../scala/org/apache/spark/SparkContextSuite.scala | 33 +++---
 docs/sql-ref-syntax-aux-resource-mgmt-add-jar.md   |  2 +-
 .../scala/org/apache/spark/sql/SQLQuerySuite.scala |  8 +++---
 .../spark/sql/hive/execution/HiveQuerySuite.scala  |  7 -
 5 files changed, 42 insertions(+), 22 deletions(-)

diff --git a/core/src/main/scala/org/apache/spark/util/DependencyUtils.scala b/core/src/main/scala/org/apache/spark/util/DependencyUtils.scala
index 60e866a..f7135edd 100644
--- a/core/src/main/scala/org/apache/spark/util/DependencyUtils.scala
+++ b/core/src/main/scala/org/apache/spark/util/DependencyUtils.scala
@@ -59,8 +59,9 @@ private[spark] object DependencyUtils extends Logging {
    * @param uri Ivy URI need to be downloaded.
    * @return Tuple value of parameter `transitive` and `exclude` value.
    *
-   * 1. transitive: whether to download dependency jar of Ivy URI, default value is false
-   *    and this parameter value is case-sensitive. Invalid value will be treat as false.
+   * 1. transitive: whether to download dependency jar of Ivy URI, default value is true
+   *    and this parameter value is case-insensitive. This mimics Hive's behaviour for
+   *    parsing the transitive parameter. Invalid value will be treat as false.
    *    Example: Input:  exclude=org.mortbay.jetty:jetty&transitive=true
    *    Output:  true
    *
@@ -72,7 +73,7 @@ private[spark] object DependencyUtils extends Logging {
   private def parseQueryParams(uri: URI): (Boolean, String) = {
     val uriQuery = uri.getQuery
     if (uriQuery == null) {
-      (false, "")
+      (true, "")
     } else {
       val mapTokens = uriQuery.split("&").map(_.split("="))
       if (mapTokens.exists(isInvalidQueryString)) {
@@ -81,14 +82,15 @@ private[spark] object DependencyUtils extends Logging {
       }
       val groupedParams = mapTokens.map(kv => (kv(0), kv(1))).groupBy(_._1)
-      // Parse transitive paramet
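The documented semantics of the `transitive` query parameter — default `true` when absent (the Hive-compatible behaviour this change introduces), case-insensitive parsing, and invalid values treated as `false` — can be sketched stand-alone. The function below mirrors the behaviour described above, not the exact `DependencyUtils.parseQueryParams` code.

```scala
// Stand-alone model of the transitive-parameter behaviour; illustrative only.
// `query` is the raw query string of an ivy:// URI, e.g.
// "exclude=org.mortbay.jetty:jetty&transitive=true", or None when absent.
def parseTransitive(query: Option[String]): Boolean =
  query match {
    case None => true // no query string: Hive-compatible default is transitive
    case Some(q) =>
      q.split("&").map(_.split("=", 2)).collectFirst {
        // case-insensitive compare; anything other than "true" counts as false
        case Array("transitive", value) => value.equalsIgnoreCase("true")
      }.getOrElse(true) // parameter not present in the query string
  }

assert(parseTransitive(None))                                    // default is true
assert(parseTransitive(Some("transitive=TRUE")))                 // case-insensitive
assert(!parseTransitive(Some("transitive=foo")))                 // invalid -> false
assert(parseTransitive(Some("exclude=org.mortbay.jetty:jetty"))) // unspecified -> true
```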
[spark] branch master updated (5a48eb8 -> d07fc30)
This is an automated email from the ASF dual-hosted git repository.

yamamuro pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

from 5a48eb8  [SPARK-34415][ML] Python example
add  d07fc30  [SPARK-33687][SQL] Support analyze all tables in a specific database

No new revisions were added by this update.

Summary of changes:
 docs/_data/menu-sql.yaml                           |   2 +
 docs/sql-ref-syntax-aux-analyze-table.md           |   6 +-
 docs/sql-ref-syntax-aux-analyze-tables.md          | 110 +
 docs/sql-ref-syntax-aux-analyze.md                 |   1 +
 docs/sql-ref-syntax.md                             |   1 +
 .../apache/spark/sql/catalyst/parser/SqlBase.g4    |   2 +
 .../spark/sql/catalyst/analysis/Analyzer.scala     |   2 +
 .../spark/sql/catalyst/parser/AstBuilder.scala     |  19
 .../sql/catalyst/plans/logical/v2Commands.scala    |   9 ++
 .../spark/sql/catalyst/parser/DDLParserSuite.scala |   9 ++
 .../catalyst/analysis/ResolveSessionCatalog.scala  |   3 +
 .../execution/command/AnalyzeTableCommand.scala    |  36 +--
 .../{cache.scala => AnalyzeTablesCommand.scala}    |  22 -
 .../spark/sql/execution/command/CommandUtils.scala |  37 ++-
 .../spark/sql/StatisticsCollectionSuite.scala      |  37 +++
 15 files changed, 257 insertions(+), 39 deletions(-)
 create mode 100644 docs/sql-ref-syntax-aux-analyze-tables.md
 copy sql/core/src/main/scala/org/apache/spark/sql/execution/command/{cache.scala => AnalyzeTablesCommand.scala} (60%)
[spark] branch master updated (56e664c -> 05069ff)
This is an automated email from the ASF dual-hosted git repository.

yamamuro pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

from 56e664c  [SPARK-34392][SQL] Support ZoneOffset +h:mm in DateTimeUtils.getZoneId
add  05069ff  [SPARK-34353][SQL] CollectLimitExec avoid shuffle if input rdd has 0/1 partition

No new revisions were added by this update.

Summary of changes:
 .../org/apache/spark/sql/execution/limit.scala     | 76 +-
 .../sql/execution/TakeOrderedAndProjectSuite.scala | 56 +---
 2 files changed, 78 insertions(+), 54 deletions(-)
[spark] branch master updated: [SPARK-33971][SQL] Eliminate distinct from more aggregates
This is an automated email from the ASF dual-hosted git repository.

yamamuro pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new 67ec4f7  [SPARK-33971][SQL] Eliminate distinct from more aggregates
67ec4f7 is described below

commit 67ec4f7f67dc494c2619b7faf1b1145f2200b65c
Author: tanel.k...@gmail.com
AuthorDate: Fri Feb 26 21:59:02 2021 +0900

    [SPARK-33971][SQL] Eliminate distinct from more aggregates

    ### What changes were proposed in this pull request?

    Add more aggregate expressions to the `EliminateDistinct` rule.

    ### Why are the changes needed?

    Distinct aggregation can add a significant overhead. It's better to remove distinct whenever possible.

    ### Does this PR introduce _any_ user-facing change?

    No

    ### How was this patch tested?

    UT

    Closes #30999 from tanelk/SPARK-33971_eliminate_distinct.

    Authored-by: tanel.k...@gmail.com
    Signed-off-by: Takeshi Yamamuro
---
 .../spark/sql/catalyst/optimizer/Optimizer.scala | 16 ++---
 .../optimizer/EliminateDistinctSuite.scala       | 41 +++---
 2 files changed, 32 insertions(+), 25 deletions(-)

diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala
index 717770f..cb24180 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala
@@ -352,11 +352,17 @@ abstract class Optimizer(catalogManager: CatalogManager)
  */
 object EliminateDistinct extends Rule[LogicalPlan] {
   override def apply(plan: LogicalPlan): LogicalPlan = plan transformExpressions {
-    case ae: AggregateExpression if ae.isDistinct =>
-      ae.aggregateFunction match {
-        case _: Max | _: Min => ae.copy(isDistinct = false)
-        case _ => ae
-      }
+    case ae: AggregateExpression if ae.isDistinct &&
+      isDuplicateAgnostic(ae.aggregateFunction) =>
+      ae.copy(isDistinct = false)
+  }
+
+  private def isDuplicateAgnostic(af: AggregateFunction): Boolean = af match {
+    case _: Max => true
+    case _: Min => true
+    case _: BitAndAgg => true
+    case _: BitOrAgg => true
+    case _: CollectSet => true
+    case _ => false
   }
 }

diff --git a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/EliminateDistinctSuite.scala b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/EliminateDistinctSuite.scala
index 51c7519..0848d56 100644
--- a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/EliminateDistinctSuite.scala
+++ b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/EliminateDistinctSuite.scala
@@ -18,6 +18,8 @@ package org.apache.spark.sql.catalyst.optimizer

 import org.apache.spark.sql.catalyst.dsl.expressions._
 import org.apache.spark.sql.catalyst.dsl.plans._
+import org.apache.spark.sql.catalyst.expressions.Expression
+import org.apache.spark.sql.catalyst.expressions.aggregate._
 import org.apache.spark.sql.catalyst.plans.PlanTest
 import org.apache.spark.sql.catalyst.plans.logical.{LocalRelation, LogicalPlan}
 import org.apache.spark.sql.catalyst.rules.RuleExecutor
@@ -32,25 +34,24 @@ class EliminateDistinctSuite extends PlanTest {

   val testRelation = LocalRelation('a.int)

-  test("Eliminate Distinct in Max") {
-    val query = testRelation
-      .select(maxDistinct('a).as('result))
-      .analyze
-    val answer = testRelation
-      .select(max('a).as('result))
-      .analyze
-    assert(query != answer)
-    comparePlans(Optimize.execute(query), answer)
-  }
-
-  test("Eliminate Distinct in Min") {
-    val query = testRelation
-      .select(minDistinct('a).as('result))
-      .analyze
-    val answer = testRelation
-      .select(min('a).as('result))
-      .analyze
-    assert(query != answer)
-    comparePlans(Optimize.execute(query), answer)
+  Seq(
+    Max(_),
+    Min(_),
+    BitAndAgg(_),
+    BitOrAgg(_),
+    CollectSet(_: Expression)
+  ).foreach {
+    aggBuilder =>
+      val agg = aggBuilder('a)
+      test(s"Eliminate Distinct in ${agg.prettyName}") {
+        val query = testRelation
+          .select(agg.toAggregateExpression(isDistinct = true).as('result))
+          .analyze
+        val answer = testRelation
+          .select(agg.toAggregateExpression(isDistinct = false).as('result))
+          .analyze
+        assert(query != answer)
+        comparePlans(Optimize.execute(query), answer)
+      }
   }
 }
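The safety argument behind the rule is that a "duplicate-agnostic" aggregate returns the same result whether or not its input is de-duplicated, so the `DISTINCT` modifier can be dropped without changing the answer. This is easy to check on plain Scala collections (standing in for Catalyst aggregates):

```scala
val xs = Seq(3, 1, 3, 7, 1)

// max/min are duplicate-agnostic: de-duplicating the input changes nothing.
assert(xs.max == xs.distinct.max)
assert(xs.min == xs.distinct.min)
// So are bitwise AND/OR (idempotent: x | x == x, x & x == x) ...
assert(xs.reduce(_ | _) == xs.distinct.reduce(_ | _))
assert(xs.reduce(_ & _) == xs.distinct.reduce(_ & _))
// ... and collect_set, which de-duplicates by definition.
assert(xs.toSet == xs.distinct.toSet)

// sum is NOT duplicate-agnostic, which is why it is absent from the rule:
// SUM(DISTINCT a) and SUM(a) genuinely differ on inputs with duplicates.
assert(xs.sum != xs.distinct.sum)
```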
[spark] branch master updated (e6753c9 -> 0a37a95)
This is an automated email from the ASF dual-hosted git repository.

yamamuro pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

from e6753c9  [SPARK-33995][SQL] Expose make_interval as a Scala function
add  0a37a95  [SPARK-31816][SQL][DOCS] Added high level description about JDBC connection providers for users/developers

No new revisions were added by this update.

Summary of changes:
 docs/sql-data-sources-jdbc.md                      | 14
 ...rg.apache.spark.sql.jdbc.JdbcConnectionProvider |  1 +
 .../sql/jdbc/ExampleJdbcConnectionProvider.scala   | 19 +++--
 .../main/scala/org/apache/spark/sql/jdbc/README.md | 84 ++
 4 files changed, 108 insertions(+), 10 deletions(-)
 create mode 100644 examples/src/main/resources/META-INF/services/org.apache.spark.sql.jdbc.JdbcConnectionProvider
 copy sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/jdbc/connection/IntentionallyFaultyConnectionProvider.scala => examples/src/main/scala/org/apache/spark/examples/sql/jdbc/ExampleJdbcConnectionProvider.scala (70%)
 create mode 100644 sql/core/src/main/scala/org/apache/spark/sql/jdbc/README.md
[spark] branch master updated (3e12e9d -> f79305a)
This is an automated email from the ASF dual-hosted git repository.

yamamuro pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

from 3e12e9d  [SPARK-34238][SQL][FOLLOW_UP] SHOW PARTITIONS Keep consistence with other `SHOW` command
add  f79305a  [SPARK-34311][SQL] PostgresDialect can't treat arrays of some types

No new revisions were added by this update.

Summary of changes:
 .../spark/sql/jdbc/PostgresIntegrationSuite.scala | 45 +-
 .../apache/spark/sql/jdbc/PostgresDialect.scala   |  5 ++-
 2 files changed, 47 insertions(+), 3 deletions(-)
[spark] branch master updated (76baaf7 -> 1f4135c)
This is an automated email from the ASF dual-hosted git repository.

yamamuro pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

from 76baaf7  [SPARK-32985][SQL] Decouple bucket scan and bucket filter pruning for data source v1
add  1f4135c  [SPARK-34350][SQL][TESTS] replace withTimeZone defined in OracleIntegrationSuite with DateTimeTestUtils.withDefaultTimeZone

No new revisions were added by this update.

Summary of changes:
 .../spark/sql/jdbc/OracleIntegrationSuite.scala | 22 +++---
 1 file changed, 3 insertions(+), 19 deletions(-)
[spark] branch master updated (361d702 -> 55399eb)
This is an automated email from the ASF dual-hosted git repository.

yamamuro pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

from 361d702  [SPARK-34359][SQL] Add a legacy config to restore the output schema of SHOW DATABASES
add  55399eb  [SPARK-34343][SQL][TESTS] Add missing test for some non-array types in PostgreSQL

No new revisions were added by this update.

Summary of changes:
 .../spark/sql/jdbc/PostgresIntegrationSuite.scala | 75 --
 1 file changed, 69 insertions(+), 6 deletions(-)
[spark] branch master updated (5acc5b8 -> 66f3480)
This is an automated email from the ASF dual-hosted git repository.

yamamuro pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

from 5acc5b8  [SPARK-34323][BUILD] Upgrade zstd-jni to 1.4.8-3
add  66f3480  [SPARK-34318][SQL] Dataset.colRegex should work with column names and qualifiers which contain newlines

No new revisions were added by this update.

Summary of changes:
 .../scala/org/apache/spark/sql/catalyst/parser/ParserUtils.scala | 4 ++--
 .../src/test/scala/org/apache/spark/sql/DataFrameSuite.scala     | 9 +
 2 files changed, 11 insertions(+), 2 deletions(-)