[spark] branch master updated (499f620 -> 56edb81)
This is an automated email from the ASF dual-hosted git repository.

yamamuro pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

    from 499f620  [MINOR][SQL][DOCS] Fix some wrong default values in SQL tuning guide's AQE section
     add 56edb81  [SPARK-33474][SQL] Support TypeConstructed partition spec value

No new revisions were added by this update.

Summary of changes:
 docs/sql-migration-guide.md                        |  2 +
 docs/sql-ref-syntax-ddl-alter-table.md             |  8 ++--
 docs/sql-ref-syntax-dml-insert-into.md             | 15 ++-
 docs/sql-ref-syntax-dml-insert-overwrite-table.md  | 25 ++-
 .../spark/sql/catalyst/parser/AstBuilder.scala     | 14 +--
 .../spark/sql/catalyst/parser/DDLParserSuite.scala | 30 --
 .../org/apache/spark/sql/SQLInsertTestSuite.scala  | 48 ++
 .../command/AlterTableAddPartitionSuiteBase.scala  |  8
 .../command/AlterTableDropPartitionSuiteBase.scala | 10 +
 .../AlterTableRenamePartitionSuiteBase.scala       | 11 +
 10 files changed, 158 insertions(+), 13 deletions(-)

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated (9ac5ee2e -> dbce74d)
yamamuro pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

    from 9ac5ee2e  [SPARK-32924][WEBUI] Make duration column in master UI sorted in the correct order
     add dbce74d   [SPARK-34607][SQL] Add `Utils.isMemberClass` to fix a malformed class name error on jdk8u

No new revisions were added by this update.

Summary of changes:
 .../main/scala/org/apache/spark/util/Utils.scala   | 28 +
 .../spark/sql/catalyst/encoders/OuterScopes.scala  |  2 +-
 .../sql/catalyst/expressions/objects/objects.scala |  2 +-
 .../catalyst/encoders/ExpressionEncoderSuite.scala | 70 ++
 4 files changed, 100 insertions(+), 2 deletions(-)
[spark] branch master updated (f72b906 -> 1a97224)
yamamuro pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

    from f72b906  [SPARK-34643][R][DOCS] Use CRAN URL in canonical form
     add 1a97224  [SPARK-34595][SQL] DPP support RLIKE

No new revisions were added by this update.

Summary of changes:
 .../dynamicpruning/PartitionPruning.scala          |  2 +-
 .../spark/sql/DynamicPartitionPruningSuite.scala   | 26 ++
 2 files changed, 27 insertions(+), 1 deletion(-)
[spark] branch master updated: [SPARK-34665][SQL][DOCS] Revise the type coercion section of ANSI Compliance
yamamuro pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new ee756fd  [SPARK-34665][SQL][DOCS] Revise the type coercion section of ANSI Compliance

ee756fd is described below

commit ee756fd69528f90f63ffd45edc821c6b69a8a35e
Author: Gengliang Wang
AuthorDate: Tue Mar 9 13:19:14 2021 +0900

    [SPARK-34665][SQL][DOCS] Revise the type coercion section of ANSI Compliance

    ### What changes were proposed in this pull request?

    1. Fix the table of valid type coercion combinations. Binary type should be allowed to cast to String type and disallowed to cast to Numeric types.
    2. Summarize all the `CAST`s that can cause runtime exceptions.

    ### Why are the changes needed?

    Fix a mistake in the docs.

    ### Does this PR introduce _any_ user-facing change?

    No

    ### How was this patch tested?

    Run `jekyll serve` and preview:

    ![image](https://user-images.githubusercontent.com/1097932/110334374-8fab5a80-7fd7-11eb-86e7-c519cfa41b99.png)

    Closes #31781 from gengliangwang/reviseAnsiDoc2.
    Authored-by: Gengliang Wang
    Signed-off-by: Takeshi Yamamuro
---
 docs/sql-ref-ansi-compliance.md | 22 +-
 1 file changed, 13 insertions(+), 9 deletions(-)

diff --git a/docs/sql-ref-ansi-compliance.md b/docs/sql-ref-ansi-compliance.md
index 99e230b..4b3ff46 100644
--- a/docs/sql-ref-ansi-compliance.md
+++ b/docs/sql-ref-ansi-compliance.md
@@ -72,16 +72,23 @@ The type conversion of Spark ANSI mode follows the syntax rules of section 6.13
 | Source\Target | Numeric | String | Date | Timestamp | Interval | Boolean | Binary | Array | Map | Struct |
 |---------------|---------|--------|-------|-----------|----------|---------|--------|-------|-------|--------|
-| Numeric       | Y       | Y      | N     | N         | N        | Y       | N      | N     | N     | N      |
-| String        | Y       | Y      | Y     | Y         | Y        | Y       | Y      | N     | N     | N      |
+| Numeric       | **Y**   | Y      | N     | N         | N        | Y       | N      | N     | N     | N      |
+| String        | **Y**   | Y      | **Y** | **Y**     | **Y**    | **Y**   | Y      | N     | N     | N      |
 | Date          | N       | Y      | Y     | Y         | N        | N       | N      | N     | N     | N      |
 | Timestamp     | N       | Y      | Y     | Y         | N        | N       | N      | N     | N     | N      |
 | Interval      | N       | Y      | N     | N         | Y        | N       | N      | N     | N     | N      |
 | Boolean       | Y       | Y      | N     | N         | N        | Y       | N      | N     | N     | N      |
-| Binary        | Y       | N      | N     | N         | N        | N       | Y      | N     | N     | N      |
-| Array         | N       | N      | N     | N         | N        | N       | N      | Y     | N     | N      |
-| Map           | N       | N      | N     | N         | N        | N       | N      | N     | Y     | N      |
-| Struct        | N       | N      | N     | N         | N        | N       | N      | N     | N     | Y      |
+| Binary        | N       | Y      | N     | N         | N        | N       | Y      | N     | N     | N      |
+| Array         | N       | N      | N     | N         | N        | N       | N      | **Y** | N     | N      |
+| Map           | N       | N      | N     | N         | N        | N       | N      | N     | **Y** | N      |
+| Struct        | N       | N      | N     | N         | N        | N       | N      | N     | N     | **Y**  |
+
+In the table above, all the `CAST`s that can cause runtime exceptions are marked as red **Y**:
+* CAST(Numeric AS Numeric): raise an overflow exception if the value is out of the target data type's range.
+* CAST(String AS (Numeric/Date/Timestamp/Interval/Boolean)): raise a runtime exception if the value can't be parsed as the target data type.
+* CAST(Array AS Array): raise an exception if there is any on the conversion of the elements.
+* CAST(Map AS Map): raise an exception if there is any on the conversion of the keys and the values.
+* CAST(Struct AS Struct): raise an exception if there is any on the conversion of the struct fields.

 Currently, the ANSI mode affects explicit casting and assignment casting only.
 In future releases, the behaviour of type coercion might change along with the other two type conversion rules.
@@ -163,9 +170,6 @@ The behavior of some SQL functions can be different under ANSI mode (`spark.sql.
 The behavior of some SQL operators can be different under ANSI mode (`spark.sql.ansi.enabled=true`).
   - `array_col[index]`: This operator throws `ArrayIndexOutOfBoundsException` if using invalid indices.
   - `map_col[key]`: This operator throws `NoSuchElementException` if key does not exist in map.
   - `CAST(string_col AS TIMESTAMP)`: This operator should fail with an except
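The String-to-Numeric rule documented above is easy to model. The sketch below is not Spark code; `ansi_cast_to_int` is a hypothetical Python helper that only mimics the documented contract of `CAST(String AS Int)`: with ANSI mode on, an unparseable string raises a runtime error, while the legacy behavior silently yields NULL.

```python
def ansi_cast_to_int(value: str, ansi_enabled: bool = True):
    """Mimic CAST(String AS Int): raise under ANSI mode, NULL otherwise."""
    try:
        return int(value.strip())
    except ValueError:
        if ansi_enabled:
            # ANSI mode: unparseable input is a runtime error
            raise ValueError(f"invalid input syntax for type int: '{value}'")
        return None  # legacy (non-ANSI) behavior: silently produce NULL

print(ansi_cast_to_int(" 42 "))                       # 42
print(ansi_cast_to_int("abc", ansi_enabled=False))    # None
```

The same on/off split applies to the other red **Y** entries in the table: ANSI mode trades silent NULLs for loud failures.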
[spark] branch master updated (48637a9 -> bf4570b)
yamamuro pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

    from 48637a9  [SPARK-34766][SQL] Do not capture maven config for views
     add bf4570b  [SPARK-34749][SQL] Simplify ResolveCreateNamedStruct

No new revisions were added by this update.

Summary of changes:
 .../org/apache/spark/sql/catalyst/analysis/Analyzer.scala |  2 --
 .../sql/catalyst/expressions/complexTypeCreator.scala     | 10 +-
 .../sql/catalyst/expressions/complexTypeExtractors.scala  | 14 +-
 .../spark/sql/catalyst/parser/ExpressionParserSuite.scala |  2 +-
 4 files changed, 11 insertions(+), 17 deletions(-)
[spark] branch master updated (bf4570b -> 9f7b0a0)
yamamuro pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

    from bf4570b  [SPARK-34749][SQL] Simplify ResolveCreateNamedStruct
     add 9f7b0a0  [SPARK-34758][SQL] Simplify Analyzer.resolveLiteralFunction

No new revisions were added by this update.

Summary of changes:
 .../spark/sql/catalyst/analysis/Analyzer.scala | 29 ++
 1 file changed, 7 insertions(+), 22 deletions(-)
[spark] branch branch-3.1 updated: [SPARK-34749][SQL][3.1] Simplify ResolveCreateNamedStruct
yamamuro pushed a commit to branch branch-3.1
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/branch-3.1 by this push:
     new 448b8d0  [SPARK-34749][SQL][3.1] Simplify ResolveCreateNamedStruct

448b8d0 is described below

commit 448b8d07df41040058c21e6102406e1656727599
Author: Wenchen Fan
AuthorDate: Thu Mar 18 07:44:11 2021 +0900

    [SPARK-34749][SQL][3.1] Simplify ResolveCreateNamedStruct

    backports https://github.com/apache/spark/pull/31843

    ### What changes were proposed in this pull request?

    This is a follow-up of https://github.com/apache/spark/pull/31808 and simplifies its fix to one line (excluding comments).

    ### Why are the changes needed?

    code simplification

    ### Does this PR introduce _any_ user-facing change?

    no

    ### How was this patch tested?

    N/A

    Closes #31867 from cloud-fan/backport.

    Authored-by: Wenchen Fan
    Signed-off-by: Takeshi Yamamuro
---
 .../org/apache/spark/sql/catalyst/analysis/Analyzer.scala   |  2 --
 .../spark/sql/catalyst/expressions/complexTypeCreator.scala | 10 +-
 .../sql/catalyst/expressions/complexTypeExtractors.scala    | 11 +--
 .../spark/sql/catalyst/parser/ExpressionParserSuite.scala   |  2 +-
 4 files changed, 11 insertions(+), 14 deletions(-)

diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
index f98f33b..f4cdeab 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
@@ -3840,8 +3840,6 @@ object ResolveCreateNamedStruct extends Rule[LogicalPlan] {
     val children = e.children.grouped(2).flatMap {
       case Seq(NamePlaceholder, e: NamedExpression) if e.resolved =>
         Seq(Literal(e.name), e)
-      case Seq(NamePlaceholder, e: ExtractValue) if e.resolved && e.name.isDefined =>
-        Seq(Literal(e.name.get), e)
       case kv => kv
     }

diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/complexTypeCreator.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/complexTypeCreator.scala
index cb59fbd..1779d41 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/complexTypeCreator.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/complexTypeCreator.scala
@@ -20,7 +20,7 @@ package org.apache.spark.sql.catalyst.expressions
 import scala.collection.mutable.ArrayBuffer

 import org.apache.spark.sql.catalyst.InternalRow
-import org.apache.spark.sql.catalyst.analysis.{Resolver, TypeCheckResult, TypeCoercion, UnresolvedExtractValue}
+import org.apache.spark.sql.catalyst.analysis.{Resolver, TypeCheckResult, TypeCoercion, UnresolvedAttribute, UnresolvedExtractValue}
 import org.apache.spark.sql.catalyst.analysis.FunctionRegistry.{FUNC_ALIAS, FunctionBuilder}
 import org.apache.spark.sql.catalyst.expressions.codegen._
 import org.apache.spark.sql.catalyst.expressions.codegen.Block._
@@ -336,6 +336,14 @@ object CreateStruct {
    */
   def apply(children: Seq[Expression]): CreateNamedStruct = {
     CreateNamedStruct(children.zipWithIndex.flatMap {
+      // For multi-part column name like `struct(a.b.c)`, it may be resolved into:
+      //   1. Attribute if `a.b.c` is simply a qualified column name.
+      //   2. GetStructField if `a.b` refers to a struct-type column.
+      //   3. GetArrayStructFields if `a.b` refers to a array-of-struct-type column.
+      //   4. GetMapValue if `a.b` refers to a map-type column.
+      // We should always use the last part of the column name (`c` in the above example) as the
+      // alias name inside CreateNamedStruct.
+      case (u: UnresolvedAttribute, _) => Seq(Literal(u.nameParts.last), u)
       case (e: NamedExpression, _) if e.resolved => Seq(Literal(e.name), e)
       case (e: NamedExpression, _) => Seq(NamePlaceholder, e)
       case (e, index) => Seq(Literal(s"col${index + 1}"), e)

diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/complexTypeExtractors.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/complexTypeExtractors.scala
index 9b80140..ef247ef 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/complexTypeExtractors.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/complexTypeExtractors.scala
@@ -94,10 +94,7 @@ object ExtractValue {
   }
 }

-trait ExtractValue extends Expression {
-  // The name that is used to extract the value.
-  def name: Option[String]
-}
[spark] branch master updated: [SPARK-34781][SQL] Eliminate LEFT SEMI/ANTI joins to its left child side in AQE
yamamuro pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new 8207e2f  [SPARK-34781][SQL] Eliminate LEFT SEMI/ANTI joins to its left child side in AQE

8207e2f is described below

commit 8207e2f65cc2ce2d87ee60ee05a2c1ee896cf93e
Author: Cheng Su
AuthorDate: Fri Mar 19 09:41:52 2021 +0900

    [SPARK-34781][SQL] Eliminate LEFT SEMI/ANTI joins to its left child side in AQE

    ### What changes were proposed in this pull request?

    In `EliminateJoinToEmptyRelation.scala`, we can extend it to cover more cases for LEFT SEMI and LEFT ANTI joins:

    * Join is left semi join, join right side is non-empty and condition is empty. Eliminate join to its left side.
    * Join is left anti join, join right side is empty. Eliminate join to its left side.

    Given we eliminate the join to its left side here, rename the current optimization rule to `EliminateUnnecessaryJoin` instead.

    In addition, also change to use `checkRowCount()` to check the run-time row count, instead of using `EmptyHashedRelation`. This also covers `BroadcastNestedLoopJoin` (`BroadcastNestedLoopJoin`'s broadcast side is `Array[InternalRow]`, not `HashedRelation`).

    ### Why are the changes needed?

    Cover more join cases, and improve query performance for affected queries.

    ### Does this PR introduce _any_ user-facing change?

    No.

    ### How was this patch tested?

    Added unit tests in `AdaptiveQueryExecSuite.scala`.

    Closes #31873 from c21/aqe-join.
    Authored-by: Cheng Su
    Signed-off-by: Takeshi Yamamuro
---
 .../sql/execution/adaptive/AQEOptimizer.scala      |  2 +-
 .../adaptive/EliminateJoinToEmptyRelation.scala    | 71 -
 .../adaptive/EliminateUnnecessaryJoin.scala        | 91 ++
 .../spark/sql/DynamicPartitionPruningSuite.scala   |  2 +-
 .../adaptive/AdaptiveQueryExecSuite.scala          | 51
 5 files changed, 127 insertions(+), 90 deletions(-)

diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/AQEOptimizer.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/AQEOptimizer.scala
index 04b8ade..901637d 100644
--- a/sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/AQEOptimizer.scala
+++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/AQEOptimizer.scala
@@ -29,7 +29,7 @@ class AQEOptimizer(conf: SQLConf) extends RuleExecutor[LogicalPlan] {
   private val defaultBatches = Seq(
     Batch("Demote BroadcastHashJoin", Once,
       DemoteBroadcastHashJoin),
-    Batch("Eliminate Join to Empty Relation", Once, EliminateJoinToEmptyRelation)
+    Batch("Eliminate Unnecessary Join", Once, EliminateUnnecessaryJoin)
   )

   final override protected def batches: Seq[Batch] = {

diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/EliminateJoinToEmptyRelation.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/EliminateJoinToEmptyRelation.scala
deleted file mode 100644
index d6df522..000
--- a/sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/EliminateJoinToEmptyRelation.scala
+++ /dev/null
@@ -1,71 +0,0 @@
-/*
- * Licensed to the Apache Software Foundation (ASF) under one or more
- * contributor license agreements.  See the NOTICE file distributed with
- * this work for additional information regarding copyright ownership.
- * The ASF licenses this file to You under the Apache License, Version 2.0
- * (the "License"); you may not use this file except in compliance with
- * the License.  You may obtain a copy of the License at
- *
- *    http://www.apache.org/licenses/LICENSE-2.0
- *
- * Unless required by applicable law or agreed to in writing, software
- * distributed under the License is distributed on an "AS IS" BASIS,
- * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
- * See the License for the specific language governing permissions and
- * limitations under the License.
- */
-
-package org.apache.spark.sql.execution.adaptive
-
-import org.apache.spark.sql.catalyst.planning.ExtractSingleColumnNullAwareAntiJoin
-import org.apache.spark.sql.catalyst.plans.{Inner, LeftAnti, LeftSemi}
-import org.apache.spark.sql.catalyst.plans.logical.{Join, LocalRelation, LogicalPlan}
-import org.apache.spark.sql.catalyst.rules.Rule
-import org.apache.spark.sql.execution.joins.{EmptyHashedRelation, HashedRelation, HashedRelationWithAllNullKeys}
-
-/**
- * This optimization rule detects and converts a Join to an empty [[LocalRelation]]:
- * 1. Join is single column NULL-aware anti join (NAAJ), and broadcasted [[HashedRelation]]
- *    is [[HashedRelationWithAllNullKeys]].
- *
- * 2. Join is in
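The two new rewrites described in the commit message above can be modeled outside of Spark. The sketch below is a hypothetical Python rendering of the rule's logic only; the real rule operates on Catalyst plans and runtime statistics, and `Join`, `right_row_count`, and `eliminate_unnecessary_join` are illustrative names, not Spark's API.

```python
from dataclasses import dataclass
from typing import Any, Optional

@dataclass
class Join:
    join_type: str            # "left_semi" or "left_anti"
    left: Any                 # the left child plan
    right_row_count: int      # runtime row count of the (broadcast) right side
    condition: Optional[str]  # None models an empty join condition

def eliminate_unnecessary_join(plan: Any) -> Any:
    if not isinstance(plan, Join):
        return plan
    # LEFT SEMI: with a non-empty right side and no condition, every left row
    # matches, so the join can be replaced by its left child.
    if plan.join_type == "left_semi" and plan.right_row_count > 0 \
            and plan.condition is None:
        return plan.left
    # LEFT ANTI: with an empty right side, no left row ever matches,
    # so the join also collapses to its left child.
    if plan.join_type == "left_anti" and plan.right_row_count == 0:
        return plan.left
    return plan
```

Checking a runtime row count rather than the shape of the broadcast relation is what lets the real rule also cover `BroadcastNestedLoopJoin`, whose broadcast side is an `Array[InternalRow]` rather than a `HashedRelation`.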
[spark] branch branch-3.1 updated (1b70aad -> c2629a7)
yamamuro pushed a change to branch branch-3.1
in repository https://gitbox.apache.org/repos/asf/spark.git.

    from 1b70aad  [SPARK-34747][SQL][DOCS] Add virtual operators to the built-in function document
     add c2629a7  [SPARK-34719][SQL][3.1] Correctly resolve the view query with duplicated column names

No new revisions were added by this update.

Summary of changes:
 .../apache/spark/sql/catalyst/analysis/view.scala | 44 +---
 .../spark/sql/execution/SQLViewTestSuite.scala    | 48 ++
 2 files changed, 86 insertions(+), 6 deletions(-)
[spark] branch branch-3.0 updated: [SPARK-34719][SQL][3.0] Correctly resolve the view query with duplicated column names
yamamuro pushed a commit to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/branch-3.0 by this push:
     new 25d7219  [SPARK-34719][SQL][3.0] Correctly resolve the view query with duplicated column names

25d7219 is described below

commit 25d72191de7c842aa2acd4b7307ba8e6585dd182
Author: Wenchen Fan
AuthorDate: Sat Mar 20 11:09:50 2021 +0900

    [SPARK-34719][SQL][3.0] Correctly resolve the view query with duplicated column names

    backport https://github.com/apache/spark/pull/31811 to 3.0

    ### What changes were proposed in this pull request?

    For permanent views (and the new SQL temp view in Spark 3.1), we store the view SQL text and re-parse/analyze the view SQL text when reading the view. In the case of `SELECT * FROM ...`, we want to avoid view schema change (e.g. the referenced table changes its schema) and will record the view query output column names when creating the view, so that when reading the view we can add a `SELECT recorded_column_names FROM ...` to retain the original view query schema.

    In Spark 3.1 and before, the final SELECT is added after the analysis phase:
    https://github.com/apache/spark/blob/branch-3.1/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/view.scala#L67

    If the view query has duplicated output column names, we always pick the first column when reading the view.
    A simple repro:
    ```
    scala> sql("create view c(x, y) as select 1 a, 2 a")
    res0: org.apache.spark.sql.DataFrame = []

    scala> sql("select * from c").show
    +---+---+
    |  x|  y|
    +---+---+
    |  1|  1|
    +---+---+
    ```

    In the master branch, we will fail at view reading time due to https://github.com/apache/spark/commit/b891862fb6b740b103d5a09530626ee4e0e8f6e3 , which adds the final SELECT during analysis, so that the query fails with `Reference 'a' is ambiguous`.

    This PR proposes to resolve the view query output column names from the matching attributes by ordinal. For example, for `create view c(x, y) as select 1 a, 2 a`, the view query output column names are `[a, a]`. When reading the view, there are 2 matching attributes (e.g. `[a#1, a#2]`) and we can simply match them by ordinal.

    A negative example is:
    ```
    create table t(a int)
    create view v as select *, 1 as col from t
    replace table t(a int, col int)
    ```
    When reading the view, the view query output column names are `[a, col]`, and there are two matching attributes of `col`, so we should fail the query. See the tests for details.

    ### Why are the changes needed?

    bug fix

    ### Does this PR introduce _any_ user-facing change?

    yes

    ### How was this patch tested?

    new test

    Closes #31894 from cloud-fan/backport.
    Authored-by: Wenchen Fan
    Signed-off-by: Takeshi Yamamuro
---
 .../apache/spark/sql/catalyst/analysis/view.scala | 44 ++---
 .../apache/spark/sql/execution/SQLViewSuite.scala | 45 +-
 2 files changed, 82 insertions(+), 7 deletions(-)

diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/view.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/view.scala
index 6560164..013a303 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/view.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/view.scala
@@ -17,7 +17,10 @@
 package org.apache.spark.sql.catalyst.analysis

-import org.apache.spark.sql.catalyst.expressions.Alias
+import java.util.Locale
+
+import org.apache.spark.sql.AnalysisException
+import org.apache.spark.sql.catalyst.expressions.{Alias, Attribute}
 import org.apache.spark.sql.catalyst.plans.logical.{LogicalPlan, Project, View}
 import org.apache.spark.sql.catalyst.rules.Rule
 import org.apache.spark.sql.internal.SQLConf
@@ -60,15 +63,44 @@ object EliminateView extends Rule[LogicalPlan] with CastSupport {
     // The child has the different output attributes with the View operator. Adds a Project over
     // the child of the view.
     case v @ View(desc, output, child) if child.resolved && !v.sameOutput(child) =>
+      // Use the stored view query output column names to find the matching attributes. The column
+      // names may have duplication, e.g. `CREATE VIEW v(x, y) AS SELECT 1 col, 2 col`. We need to
+      // make sure that the matching attributes have the same number of duplications, and pick the
+      // corresponding attribute by ordinal.
       val resolver = conf.resolver
       val queryColumnNames = desc.viewQueryColumnNames
       val queryOutput = if (queryColumnNames.nonEmpty) {
-        // Find the attribute that has the expected at
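The ordinal-matching idea from the commit message can be sketched as follows. This is an illustrative Python model, not the Scala implementation in `view.scala`; `resolve_view_columns` and its tuple-based attributes are hypothetical names introduced only for this example.

```python
from collections import Counter

def resolve_view_columns(recorded_names, child_attrs):
    """Match recorded view-query column names to analyzed output attributes.

    recorded_names: names saved at CREATE VIEW time, e.g. ["a", "a"].
    child_attrs: (name, expr_id) pairs from re-analyzing the view SQL text.
    """
    recorded_counts = Counter(n.lower() for n in recorded_names)
    taken = Counter()   # occurrences of each name consumed so far
    resolved = []
    for name in recorded_names:
        key = name.lower()
        matches = [a for a in child_attrs if a[0].lower() == key]
        # The analyzed output must duplicate the name exactly as many times
        # as it was recorded; otherwise resolution is ambiguous and we fail.
        if len(matches) != recorded_counts[key]:
            raise ValueError(f"cannot resolve view column '{name}'")
        resolved.append(matches[taken[key]])  # pick by ordinal
        taken[key] += 1
    return resolved
```

With recorded names `["a", "a"]` and analyzed attributes `[a#1, a#2]`, each occurrence maps to the attribute at the same ordinal; with recorded names `["a", "col"]` but two `col` attributes, the count mismatch makes the lookup fail, mirroring the negative example above.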
[spark] branch master updated (7a8a600 -> 620cae0)
yamamuro pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

    from 7a8a600  [SPARK-34776][SQL] Nested column pruning should not prune Window produced attributes
     add 620cae0  [SPARK-33122][SQL] Remove redundant aggregates in the Optimzier

No new revisions were added by this update.

Summary of changes:
 .../spark/sql/catalyst/analysis/Analyzer.scala     |  50 ---
 .../analysis/PullOutNondeterministic.scala         |  74 ++
 .../spark/sql/catalyst/optimizer/Optimizer.scala   |  45 ++
 .../plans/logical/basicLogicalOperators.scala      |   2 +-
 .../optimizer/RemoveRedundantAggregatesSuite.scala | 163 +
 .../execution/RemoveRedundantProjectsSuite.scala   |   2 +-
 6 files changed, 284 insertions(+), 52 deletions(-)
 create mode 100644 sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/PullOutNondeterministic.scala
 create mode 100644 sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/RemoveRedundantAggregatesSuite.scala
[spark] branch master updated (620cae0 -> 2ff0032)
yamamuro pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

    from 620cae0  [SPARK-33122][SQL] Remove redundant aggregates in the Optimzier
     add 2ff0032  [SPARK-34796][SQL] Initialize counter variable for LIMIT code-gen in doProduce()

No new revisions were added by this update.

Summary of changes:
 .../scala/org/apache/spark/sql/execution/limit.scala | 12
 .../scala/org/apache/spark/sql/SQLQuerySuite.scala   | 19 +++
 2 files changed, 27 insertions(+), 4 deletions(-)
[spark] branch branch-3.0 updated (25d7219 -> 828cf76)
yamamuro pushed a change to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git.

    from 25d7219  [SPARK-34719][SQL][3.0] Correctly resolve the view query with duplicated column names
     add 828cf76  [SPARK-34776][SQL][3.0][2.4] Window class should override producedAttributes

No new revisions were added by this update.

Summary of changes:
 .../apache/spark/sql/catalyst/plans/logical/basicLogicalOperators.scala | 2 ++
 1 file changed, 2 insertions(+)
[spark] branch branch-2.4 updated: [SPARK-34776][SQL][3.0][2.4] Window class should override producedAttributes
yamamuro pushed a commit to branch branch-2.4
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/branch-2.4 by this push:
     new 59e4ae4  [SPARK-34776][SQL][3.0][2.4] Window class should override producedAttributes

59e4ae4 is described below

commit 59e4ae4149ff93bd64c8b3210c27dc2fbebe2a96
Author: Liang-Chi Hsieh
AuthorDate: Sat Mar 20 11:26:01 2021 +0900

    [SPARK-34776][SQL][3.0][2.4] Window class should override producedAttributes

    ### What changes were proposed in this pull request?

    This patch proposes to override `producedAttributes` of the `Window` class.

    ### Why are the changes needed?

    This is a backport of #31897 to branch-3.0/2.4. Unlike the original PR, nested column pruning does not allow pushing through `Window` in branch-3.0/2.4 yet. But `Window` doesn't override `producedAttributes`; that is wrong and could cause potential issues, so backport the `Window`-related change.

    ### Does this PR introduce _any_ user-facing change?

    No

    ### How was this patch tested?

    Existing tests.

    Closes #31904 from viirya/SPARK-34776-3.0.
    Authored-by: Liang-Chi Hsieh
    Signed-off-by: Takeshi Yamamuro
    (cherry picked from commit 828cf76bced1b70769b0453f3e9ba95faaa84e39)
    Signed-off-by: Takeshi Yamamuro
---
 .../apache/spark/sql/catalyst/plans/logical/basicLogicalOperators.scala | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicLogicalOperators.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicLogicalOperators.scala
index a0086c1..2fe9cd4 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicLogicalOperators.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicLogicalOperators.scala
@@ -621,6 +621,8 @@ case class Window(

   override def output: Seq[Attribute] = child.output ++ windowExpressions.map(_.toAttribute)

+  override def producedAttributes: AttributeSet = windowOutputSet
+
   def windowOutputSet: AttributeSet = AttributeSet(windowExpressions.map(_.toAttribute))
 }
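Why `producedAttributes` matters: a plan operator's output must be covered by its children's output plus the attributes the operator itself introduces. The toy Python model below is only a sketch of that bookkeeping invariant; `unaccounted_output` and the column names are illustrative, not Spark's API.

```python
def unaccounted_output(output, child_output, produced):
    """Toy invariant: every output attribute must come from a child or be
    introduced by the operator itself (Spark tracks the latter set via
    producedAttributes)."""
    return set(output) - set(child_output) - set(produced)

child_cols = {"a", "b"}
window_cols = {"w"}  # e.g. rank() OVER (...) AS w

# Window.output = child.output ++ windowExpressions.map(_.toAttribute).
# Without overriding producedAttributes, `w` looks unaccounted for:
assert unaccounted_output(child_cols | window_cols, child_cols, set()) == {"w"}
# With producedAttributes = windowOutputSet, the bookkeeping is consistent:
assert unaccounted_output(child_cols | window_cols, child_cols, window_cols) == set()
```

In the diff above, `windowOutputSet` is exactly the set of attributes the window expressions introduce, which is why the one-line override is sufficient.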
[spark] branch branch-3.1 updated (da013d0 -> 250c820)
yamamuro pushed a change to branch branch-3.1
in repository https://gitbox.apache.org/repos/asf/spark.git.

    from da013d0  [MINOR][DOCS][ML] Doc 'mode' as a supported Imputer strategy in Pyspark
     add 250c820  [SPARK-34796][SQL][3.1] Initialize counter variable for LIMIT code-gen in doProduce()

No new revisions were added by this update.

Summary of changes:
 .../scala/org/apache/spark/sql/execution/limit.scala | 12
 .../scala/org/apache/spark/sql/SQLQuerySuite.scala   | 19 +++
 2 files changed, 27 insertions(+), 4 deletions(-)
[spark] branch master updated: [SPARK-34853][SQL] Remove duplicated definition of output partitioning/ordering for limit operator
yamamuro pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new 35c70e4  [SPARK-34853][SQL] Remove duplicated definition of output partitioning/ordering for limit operator

35c70e4 is described below

commit 35c70e417d8c6e3958e0da8a4bec731f9e394a28
Author: Cheng Su
AuthorDate: Wed Mar 24 23:06:35 2021 +0900

    [SPARK-34853][SQL] Remove duplicated definition of output partitioning/ordering for limit operator

    ### What changes were proposed in this pull request?

    Both local limit and global limit define the output partitioning and output ordering in the same way, and this is duplicated (https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/limit.scala#L159-L175). We can move the output partitioning and ordering into their parent trait - `BaseLimitExec`. This is doable as `BaseLimitExec` has no other child class. This is a minor code refactoring.

    ### Why are the changes needed?

    Clean up the code a little bit. Better readability.

    ### Does this PR introduce _any_ user-facing change?

    No.

    ### How was this patch tested?

    Pure refactoring. Rely on existing unit tests.

    Closes #31950 from c21/limit-cleanup.
    Authored-by: Cheng Su
    Signed-off-by: Takeshi Yamamuro
---
 .../main/scala/org/apache/spark/sql/execution/limit.scala | 15 +--
 1 file changed, 5 insertions(+), 10 deletions(-)

diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/limit.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/limit.scala
index d8f67fb..e5a2995 100644
--- a/sql/core/src/main/scala/org/apache/spark/sql/execution/limit.scala
+++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/limit.scala
@@ -113,6 +113,10 @@ object BaseLimitExec {
 trait BaseLimitExec extends LimitExec with CodegenSupport {
   override def output: Seq[Attribute] = child.output

+  override def outputPartitioning: Partitioning = child.outputPartitioning
+
+  override def outputOrdering: Seq[SortOrder] = child.outputOrdering
+
   protected override def doExecute(): RDD[InternalRow] = child.execute().mapPartitions { iter =>
     iter.take(limit)
   }
@@ -156,12 +160,7 @@ trait BaseLimitExec extends LimitExec with CodegenSupport {
 /**
  * Take the first `limit` elements of each child partition, but do not collect or shuffle them.
  */
-case class LocalLimitExec(limit: Int, child: SparkPlan) extends BaseLimitExec {
-
-  override def outputOrdering: Seq[SortOrder] = child.outputOrdering
-
-  override def outputPartitioning: Partitioning = child.outputPartitioning
-}
+case class LocalLimitExec(limit: Int, child: SparkPlan) extends BaseLimitExec

 /**
  * Take the first `limit` elements of the child's single output partition.
@@ -169,10 +168,6 @@ case class LocalLimitExec(limit: Int, child: SparkPlan) extends BaseLimitExec {
 case class GlobalLimitExec(limit: Int, child: SparkPlan) extends BaseLimitExec {

   override def requiredChildDistribution: List[Distribution] = AllTuples :: Nil
-
-  override def outputPartitioning: Partitioning = child.outputPartitioning
-
-  override def outputOrdering: Seq[SortOrder] = child.outputOrdering
 }

 /**
[spark] branch master updated (88cf86f -> 150769b)
This is an automated email from the ASF dual-hosted git repository. yamamuro pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from 88cf86f [SPARK-34797][ML] Refactor Logistic Aggregator - support virtual centering add 150769b [SPARK-34833][SQL] Apply right-padding correctly for correlated subqueries No new revisions were added by this update. Summary of changes: .../spark/sql/catalyst/analysis/Analyzer.scala | 45 ++--- .../apache/spark/sql/CharVarcharTestSuite.scala| 57 -- 2 files changed, 79 insertions(+), 23 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch branch-3.1 updated: [SPARK-34833][SQL] Apply right-padding correctly for correlated subqueries
This is an automated email from the ASF dual-hosted git repository. yamamuro pushed a commit to branch branch-3.1 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.1 by this push: new 5ecf306 [SPARK-34833][SQL] Apply right-padding correctly for correlated subqueries 5ecf306 is described below commit 5ecf306245d17053e25b68c844828878a66b593a Author: Takeshi Yamamuro AuthorDate: Thu Mar 25 08:31:57 2021 +0900 [SPARK-34833][SQL] Apply right-padding correctly for correlated subqueries

### What changes were proposed in this pull request? This PR fixes a bug where right-padding is not applied for char types inside correlated subqueries. For example, the query below returns nothing in master, but the correct result is `c`.

```
scala> sql(s"CREATE TABLE t1(v VARCHAR(3), c CHAR(5)) USING parquet")
scala> sql(s"CREATE TABLE t2(v VARCHAR(5), c CHAR(7)) USING parquet")
scala> sql("INSERT INTO t1 VALUES ('c', 'b')")
scala> sql("INSERT INTO t2 VALUES ('a', 'b')")
scala> val df = sql("""
  |SELECT v FROM t1
  |WHERE 'a' IN (SELECT v FROM t2 WHERE t2.c = t1.c )""".stripMargin)
scala> df.show()
+---+
|  v|
+---+
+---+
```

This is because `ApplyCharTypePadding` does not handle the case above, so no right-padding is applied to the outer reference in the comparison. This PR modifies the code in `ApplyCharTypePadding` to handle it correctly.
```
// Before this PR:
scala> df.explain(true)
== Analyzed Logical Plan ==
v: string
Project [v#13]
+- Filter a IN (list#12 [c#14])
   :  +- Project [v#15]
   :     +- Filter (c#16 = outer(c#14))
   :        +- SubqueryAlias spark_catalog.default.t2
   :           +- Relation default.t2[v#15,c#16] parquet
   +- SubqueryAlias spark_catalog.default.t1
      +- Relation default.t1[v#13,c#14] parquet

scala> df.show()
+---+
|  v|
+---+
+---+

// After this PR:
scala> df.explain(true)
== Analyzed Logical Plan ==
v: string
Project [v#43]
+- Filter a IN (list#42 [c#44])
   :  +- Project [v#45]
   :     +- Filter (c#46 = rpad(outer(c#44), 7, ))
   :        +- SubqueryAlias spark_catalog.default.t2
   :           +- Relation default.t2[v#45,c#46] parquet
   +- SubqueryAlias spark_catalog.default.t1
      +- Relation default.t1[v#43,c#44] parquet

scala> df.show()
+---+
|  v|
+---+
|  c|
+---+
```

This fix is related to TPCDS q17; the query returns nothing because of this bug: https://github.com/apache/spark/pull/31886/files#r599333799

### Why are the changes needed? Bugfix.

### Does this PR introduce _any_ user-facing change? No.

### How was this patch tested? Unit tests added.

Closes #31940 from maropu/FixCharPadding.
Authored-by: Takeshi Yamamuro Signed-off-by: Takeshi Yamamuro (cherry picked from commit 150769bcedb6e4a97596e0f04d686482cd09e92a) Signed-off-by: Takeshi Yamamuro --- .../spark/sql/catalyst/analysis/Analyzer.scala | 45 ++--- .../apache/spark/sql/CharVarcharTestSuite.scala| 57 -- 2 files changed, 79 insertions(+), 23 deletions(-) diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala index f4cdeab..d490845 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala @@ -3921,16 +3921,28 @@ object ApplyCharTypePadding extends Rule[LogicalPlan] { override def apply(plan: LogicalPlan): LogicalPlan = { plan.resolveOperatorsUp { - case operator if operator.resolved => operator.transformExpressionsUp { + case operator => operator.transformExpressionsUp { +case e if !e.childrenResolved => e + // String literal is treated as char type when it's compared to a char type column. // We should pad the shorter one to the longer length. case b @ BinaryComparison(attr: Attribute, lit) if lit.foldable => - padAttrLitCmp(attr, lit).map { newChildren => + padAttrLitCmp(attr, attr.metadata, lit).map { newChildren => b.withNewChildren(newChildren) }.getOrElse(b) case b @ BinaryComparison(lit, attr: Attribute) if lit.foldable => - padAttrLitCmp(attr, lit).map { newChildren => + padAttrLitCmp(attr, attr.metadata, lit).map { newChildren => +b.withNewChildren(newChildren.reverse)
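Why the padding matters can be seen from the analyzed plans above: a `CHAR(7)` column stores values right-padded to 7 characters, so a `CHAR(5)` outer reference must itself be padded to 7 before the equality can hold. A minimal sketch of that semantics, in illustrative Python (not Spark's API; the names are mine):

```python
def rpad(s, length, pad=' '):
    # Mirrors SQL rpad: right-pad to `length`, truncating if longer.
    return s.ljust(length, pad)[:length]

stored = rpad('b', 7)  # what the CHAR(7) column t2.c actually holds
outer = rpad('b', 5)   # outer(t1.c) as CHAR(5): 'b' plus 4 spaces

print(stored == outer)           # False -> the subquery matches nothing
print(stored == rpad(outer, 7))  # True  -> after ApplyCharTypePadding
```

This is exactly the `c#16 = outer(c#14)` vs. `c#46 = rpad(outer(c#44), 7, )` difference between the two plans.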
[spark] branch master updated (6d88212 -> 658e95c)
This is an automated email from the ASF dual-hosted git repository. yamamuro pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from 6d88212 [SPARK-34840][SHUFFLE] Fixes cases of corruption in merged shuffle … add 658e95c [SPARK-34833][SQL][FOLLOWUP] Handle outer references in all the places No new revisions were added by this update. Summary of changes: .../spark/sql/catalyst/analysis/Analyzer.scala | 67 +- 1 file changed, 41 insertions(+), 26 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch branch-3.1 updated: [SPARK-34833][SQL][FOLLOWUP] Handle outer references in all the places
This is an automated email from the ASF dual-hosted git repository. yamamuro pushed a commit to branch branch-3.1 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.1 by this push: new f3c1298 [SPARK-34833][SQL][FOLLOWUP] Handle outer references in all the places f3c1298 is described below commit f3c129827986ba06c8a9ab00bd687e8d025103d1 Author: Wenchen Fan AuthorDate: Fri Mar 26 09:10:03 2021 +0900 [SPARK-34833][SQL][FOLLOWUP] Handle outer references in all the places ### What changes were proposed in this pull request? This is a follow-up of https://github.com/apache/spark/pull/31940 . This PR generalizes the matching of attributes and outer references, so that outer references are handled everywhere. Note that correlated subqueries currently have a lot of limitations in Spark, so the newly covered cases cannot occur yet; this PR is therefore a code refactor. ### Why are the changes needed? code cleanup ### Does this PR introduce _any_ user-facing change? no ### How was this patch tested? existing tests Closes #31959 from cloud-fan/follow.
Authored-by: Wenchen Fan Signed-off-by: Takeshi Yamamuro (cherry picked from commit 658e95c345d5aa2a98b8d2a854e003a5c77ed581) Signed-off-by: Takeshi Yamamuro --- .../spark/sql/catalyst/analysis/Analyzer.scala | 67 +- 1 file changed, 41 insertions(+), 26 deletions(-) diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala index d490845..600a5af 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala @@ -3919,6 +3919,14 @@ object UpdateOuterReferences extends Rule[LogicalPlan] { */ object ApplyCharTypePadding extends Rule[LogicalPlan] { + object AttrOrOuterRef { +def unapply(e: Expression): Option[Attribute] = e match { + case a: Attribute => Some(a) + case OuterReference(a: Attribute) => Some(a) + case _ => None +} + } + override def apply(plan: LogicalPlan): LogicalPlan = { plan.resolveOperatorsUp { case operator => operator.transformExpressionsUp { @@ -3926,27 +3934,17 @@ object ApplyCharTypePadding extends Rule[LogicalPlan] { // String literal is treated as char type when it's compared to a char type column. // We should pad the shorter one to the longer length. 
-case b @ BinaryComparison(attr: Attribute, lit) if lit.foldable => - padAttrLitCmp(attr, attr.metadata, lit).map { newChildren => -b.withNewChildren(newChildren) - }.getOrElse(b) - -case b @ BinaryComparison(lit, attr: Attribute) if lit.foldable => - padAttrLitCmp(attr, attr.metadata, lit).map { newChildren => -b.withNewChildren(newChildren.reverse) - }.getOrElse(b) - -case b @ BinaryComparison(or @ OuterReference(attr: Attribute), lit) if lit.foldable => - padAttrLitCmp(or, attr.metadata, lit).map { newChildren => +case b @ BinaryComparison(e @ AttrOrOuterRef(attr), lit) if lit.foldable => + padAttrLitCmp(e, attr.metadata, lit).map { newChildren => b.withNewChildren(newChildren) }.getOrElse(b) -case b @ BinaryComparison(lit, or @ OuterReference(attr: Attribute)) if lit.foldable => - padAttrLitCmp(or, attr.metadata, lit).map { newChildren => +case b @ BinaryComparison(lit, e @ AttrOrOuterRef(attr)) if lit.foldable => + padAttrLitCmp(e, attr.metadata, lit).map { newChildren => b.withNewChildren(newChildren.reverse) }.getOrElse(b) -case i @ In(attr: Attribute, list) +case i @ In(e @ AttrOrOuterRef(attr), list) if attr.dataType == StringType && list.forall(_.foldable) => CharVarcharUtils.getRawType(attr.metadata).flatMap { case CharType(length) => @@ -3955,7 +3953,7 @@ object ApplyCharTypePadding extends Rule[LogicalPlan] { val literalCharLengths = literalChars.map(_.numChars()) val targetLen = (length +: literalCharLengths).max Some(i.copy( -value = addPadding(attr, length, targetLen), +value = addPadding(e, length, targetLen), list = list.zip(literalCharLengths).map { case (lit, charLength) => addPadding(lit, charLength, targetLen) } ++ nulls.map(Literal.create(_, StringType @@ -3963,19 +3961,36 @@ object ApplyCharTypePadding extends Rule[LogicalPlan] { }.getOrElse(i) // For char type colum
[spark] branch master updated (b2bfe98 -> fcef237)
This is an automated email from the ASF dual-hosted git repository. yamamuro pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from b2bfe98 [SPARK-34845][CORE] ProcfsMetricsGetter shouldn't return partial procfs metrics add fcef237 [SPARK-34622][SQL] Push down limit through Project with Join No new revisions were added by this update. Summary of changes: .../spark/sql/catalyst/optimizer/Optimizer.scala | 33 +- .../catalyst/optimizer/LimitPushdownSuite.scala| 9 ++ 2 files changed, 29 insertions(+), 13 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated (3951e33 -> 90f2d4d)
This is an automated email from the ASF dual-hosted git repository. yamamuro pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from 3951e33 [SPARK-34881][SQL] New SQL Function: TRY_CAST add 90f2d4d [SPARK-34882][SQL] Replace if with filter clause in RewriteDistinctAggregates No new revisions were added by this update. Summary of changes: .../optimizer/RewriteDistinctAggregates.scala | 47 ++ .../org/apache/spark/sql/DataFrameSuite.scala | 29 - 2 files changed, 49 insertions(+), 27 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated (a9ca197 -> 39d5677)
This is an automated email from the ASF dual-hosted git repository. yamamuro pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from a9ca197 [SPARK-34949][CORE] Prevent BlockManager reregister when Executor is shutting down add 39d5677 [SPARK-34932][SQL] deprecate GROUP BY ... GROUPING SETS (...) and promote GROUP BY GROUPING SETS (...) No new revisions were added by this update. Summary of changes: docs/sql-ref-syntax-qry-select-groupby.md | 34 +++ .../spark/sql/catalyst/analysis/Analyzer.scala | 36 +++ .../spark/sql/catalyst/expressions/grouping.scala | 46 --- .../spark/sql/catalyst/parser/AstBuilder.scala | 13 +++--- .../analysis/ResolveGroupingAnalyticsSuite.scala | 51 +- .../sql/catalyst/parser/PlanParserSuite.scala | 2 +- 6 files changed, 72 insertions(+), 110 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated (39d5677 -> 7cfface)
This is an automated email from the ASF dual-hosted git repository. yamamuro pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from 39d5677 [SPARK-34932][SQL] deprecate GROUP BY ... GROUPING SETS (...) and promote GROUP BY GROUPING SETS (...) add 7cfface [SPARK-34935][SQL] CREATE TABLE LIKE should respect the reserved table properties No new revisions were added by this update. Summary of changes: docs/sql-migration-guide.md | 2 ++ .../scala/org/apache/spark/sql/execution/SparkSqlParser.scala | 3 ++- .../org/apache/spark/sql/execution/SparkSqlParserSuite.scala | 8 3 files changed, 12 insertions(+), 1 deletion(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated (390d5bd -> 7c8dc5e)
This is an automated email from the ASF dual-hosted git repository. yamamuro pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from 390d5bd [SPARK-34968][TEST][PYTHON] Add the `-fr` argument to xargs rm add 7c8dc5e [SPARK-34922][SQL] Use a relative cost comparison function in the CBO No new revisions were added by this update. Summary of changes: .../catalyst/optimizer/CostBasedJoinReorder.scala | 28 +- .../org/apache/spark/sql/internal/SQLConf.scala| 6 +- .../optimizer/joinReorder/JoinReorderSuite.scala | 3 - .../StarJoinCostBasedReorderSuite.scala| 9 +- .../approved-plans-modified/q73.sf100/explain.txt | 86 +-- .../q73.sf100/simplified.txt | 20 +- .../approved-plans-v1_4/q12.sf100/explain.txt | 178 +++ .../approved-plans-v1_4/q12.sf100/simplified.txt | 52 +- .../approved-plans-v1_4/q13.sf100/explain.txt | 134 ++--- .../approved-plans-v1_4/q13.sf100/simplified.txt | 38 +- .../approved-plans-v1_4/q18.sf100/explain.txt | 152 +++--- .../approved-plans-v1_4/q18.sf100/simplified.txt | 50 +- .../approved-plans-v1_4/q19.sf100/explain.txt | 376 ++--- .../approved-plans-v1_4/q19.sf100/simplified.txt | 118 ++--- .../approved-plans-v1_4/q20.sf100/explain.txt | 178 +++ .../approved-plans-v1_4/q20.sf100/simplified.txt | 52 +- .../approved-plans-v1_4/q24a.sf100/explain.txt | 116 ++-- .../approved-plans-v1_4/q24a.sf100/simplified.txt | 34 +- .../approved-plans-v1_4/q24b.sf100/explain.txt | 116 ++-- .../approved-plans-v1_4/q24b.sf100/simplified.txt | 34 +- .../approved-plans-v1_4/q25.sf100/explain.txt | 192 +++ .../approved-plans-v1_4/q25.sf100/simplified.txt | 138 ++--- .../approved-plans-v1_4/q33.sf100/explain.txt | 264 +- .../approved-plans-v1_4/q33.sf100/simplified.txt | 58 +- .../approved-plans-v1_4/q52.sf100/explain.txt | 146 +++--- .../approved-plans-v1_4/q52.sf100/simplified.txt | 30 +- .../approved-plans-v1_4/q55.sf100/explain.txt | 142 ++--- .../approved-plans-v1_4/q55.sf100/simplified.txt | 30 +- 
.../approved-plans-v1_4/q72.sf100/explain.txt | 326 ++-- .../approved-plans-v1_4/q72.sf100/simplified.txt | 154 +++--- .../approved-plans-v1_4/q81.sf100/explain.txt | 582 ++--- .../approved-plans-v1_4/q81.sf100/simplified.txt | 146 +++--- .../approved-plans-v1_4/q91.sf100/explain.txt | 312 +-- .../approved-plans-v1_4/q91.sf100/simplified.txt | 66 +-- .../approved-plans-v1_4/q98.sf100/explain.txt | 186 +++ .../approved-plans-v1_4/q98.sf100/simplified.txt | 52 +- .../approved-plans-v2_7/q12.sf100/explain.txt | 178 +++ .../approved-plans-v2_7/q12.sf100/simplified.txt | 52 +- .../approved-plans-v2_7/q18a.sf100/explain.txt | 172 +++--- .../approved-plans-v2_7/q18a.sf100/simplified.txt | 54 +- .../approved-plans-v2_7/q20.sf100/explain.txt | 178 +++ .../approved-plans-v2_7/q20.sf100/simplified.txt | 52 +- .../approved-plans-v2_7/q72.sf100/explain.txt | 326 ++-- .../approved-plans-v2_7/q72.sf100/simplified.txt | 154 +++--- .../approved-plans-v2_7/q98.sf100/explain.txt | 182 +++ .../approved-plans-v2_7/q98.sf100/simplified.txt | 52 +- 46 files changed, 3011 insertions(+), 2993 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch branch-3.1 updated (f6b5c6f -> 84d96e8)
This is an automated email from the ASF dual-hosted git repository. yamamuro pushed a change to branch branch-3.1 in repository https://gitbox.apache.org/repos/asf/spark.git. from f6b5c6f [SPARK-34970][SQL][SERCURITY][3.1] Redact map-type options in the output of explain() add 84d96e8 [SPARK-34922][SQL][3.1] Use a relative cost comparison function in the CBO No new revisions were added by this update. Summary of changes: .../catalyst/optimizer/CostBasedJoinReorder.scala | 28 +- .../org/apache/spark/sql/internal/SQLConf.scala| 6 +- .../optimizer/joinReorder/JoinReorderSuite.scala | 3 - .../StarJoinCostBasedReorderSuite.scala| 9 +- .../approved-plans-modified/q73.sf100/explain.txt | 8 +- .../approved-plans-v1_4/q12.sf100/explain.txt | 174 ++--- .../approved-plans-v1_4/q12.sf100/simplified.txt | 52 +- .../approved-plans-v1_4/q13.sf100/explain.txt | 138 ++-- .../approved-plans-v1_4/q13.sf100/simplified.txt | 34 +- .../approved-plans-v1_4/q18.sf100/explain.txt | 303 .../approved-plans-v1_4/q18.sf100/simplified.txt | 50 +- .../approved-plans-v1_4/q19.sf100/explain.txt | 368 - .../approved-plans-v1_4/q19.sf100/simplified.txt | 116 +-- .../approved-plans-v1_4/q20.sf100/explain.txt | 174 ++--- .../approved-plans-v1_4/q20.sf100/simplified.txt | 52 +- .../approved-plans-v1_4/q24a.sf100/explain.txt | 832 +++-- .../approved-plans-v1_4/q24a.sf100/simplified.txt | 34 +- .../approved-plans-v1_4/q24b.sf100/explain.txt | 832 +++-- .../approved-plans-v1_4/q24b.sf100/simplified.txt | 34 +- .../approved-plans-v1_4/q25.sf100/explain.txt | 186 ++--- .../approved-plans-v1_4/q25.sf100/simplified.txt | 130 ++-- .../approved-plans-v1_4/q33.sf100/explain.txt | 395 +- .../approved-plans-v1_4/q33.sf100/simplified.txt | 58 +- .../approved-plans-v1_4/q52.sf100/explain.txt | 138 ++-- .../approved-plans-v1_4/q52.sf100/simplified.txt | 26 +- .../approved-plans-v1_4/q55.sf100/explain.txt | 134 ++-- .../approved-plans-v1_4/q55.sf100/simplified.txt | 26 +- .../approved-plans-v1_4/q72.sf100/explain.txt 
| 260 +++ .../approved-plans-v1_4/q72.sf100/simplified.txt | 150 ++-- .../approved-plans-v1_4/q81.sf100/explain.txt | 570 +++--- .../approved-plans-v1_4/q81.sf100/simplified.txt | 142 ++-- .../approved-plans-v1_4/q91.sf100/explain.txt | 304 .../approved-plans-v1_4/q91.sf100/simplified.txt | 62 +- .../approved-plans-v1_4/q98.sf100/explain.txt | 182 ++--- .../approved-plans-v1_4/q98.sf100/simplified.txt | 52 +- .../approved-plans-v2_7/q12.sf100/explain.txt | 174 ++--- .../approved-plans-v2_7/q12.sf100/simplified.txt | 52 +- .../approved-plans-v2_7/q18a.sf100/explain.txt | 737 +- .../approved-plans-v2_7/q18a.sf100/simplified.txt | 54 +- .../approved-plans-v2_7/q20.sf100/explain.txt | 174 ++--- .../approved-plans-v2_7/q20.sf100/simplified.txt | 52 +- .../approved-plans-v2_7/q72.sf100/explain.txt | 260 +++ .../approved-plans-v2_7/q72.sf100/simplified.txt | 150 ++-- .../approved-plans-v2_7/q98.sf100/explain.txt | 178 ++--- .../approved-plans-v2_7/q98.sf100/simplified.txt | 52 +- 45 files changed, 4024 insertions(+), 3921 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch branch-3.0 updated: [SPARK-34922][SQL][3.0] Use a relative cost comparison function in the CBO
This is an automated email from the ASF dual-hosted git repository. yamamuro pushed a commit to branch branch-3.0 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.0 by this push: new b9ee41f [SPARK-34922][SQL][3.0] Use a relative cost comparison function in the CBO b9ee41f is described below commit b9ee41fa9957631ca0f859ee928358c108fbd9a9 Author: Tanel Kiis AuthorDate: Thu Apr 8 11:03:59 2021 +0900 [SPARK-34922][SQL][3.0] Use a relative cost comparison function in the CBO

### What changes were proposed in this pull request? Changed the cost comparison function of the CBO to use the ratios of row counts and sizes in bytes.

### Why are the changes needed? In #30965 we changed the CBO cost comparison function so it would be "symmetric": `A.betterThan(B)` now implies that `!B.betterThan(A)`. With that we caused performance regressions in some queries - TPCDS q19 for example. The original cost comparison function used the ratios `relativeRows = A.rowCount / B.rowCount` and `relativeSize = A.size / B.size`. The changed function compared "absolute" cost values `costA = w*A.rowCount + (1-w)*A.size` and `costB = w*B.rowCount + (1-w)*B.size`. Given the input from wzhfy we decided to go back to the relative values, because otherwise one (size) may overwhelm the other (rowCount). But this time we avoid adding up the ratios. Originally `A.betterThan(B) => w*relativeRows + (1-w)*relativeSize < 1` was used. Besides being "non-symmetric", this can also exhibit one ratio overwhelming the other. For `w=0.5`, if the size (bytes) of `A` is at least 2x larger than `B`, then no matter how many times more rows the `B` plan has, `B` will always be considered to be better - `0.5*2 + 0.5*0.01 > 1`. When working with ratios, it is better to multiply them. The proposed cost comparison function is: `A.betterThan(B) => relativeRows^w * relativeSize^(1-w) < 1`.

### Does this PR introduce _any_ user-facing change?
Comparison of the changed TPCDS v1.4 query execution times at sf=10:

| query | absolute (ms) | multiplicative (ms) | change | additive (ms) | change |
| -- | -- | -- | -- | -- | -- |
| q12 | 145 | 137 | -5.52% | 141 | -2.76% |
| q13 | 264 | 271 | 2.65% | 271 | 2.65% |
| q17 | 4521 | 4243 | -6.15% | 4348 | -3.83% |
| q18 | 758 | 466 | -38.52% | 480 | -36.68% |
| q19 | 38503 | 2167 | -94.37% | 2176 | -94.35% |
| q20 | 119 | 120 | 0.84% | 126 | 5.88% |
| q24a | 16429 | 16838 | 2.49% | 17103 | 4.10% |
| q24b | 16592 | 16999 | 2.45% | 17268 | 4.07% |
| q25 | 3558 | 3556 | -0.06% | 3675 | 3.29% |
| q33 | 362 | 361 | -0.28% | 380 | 4.97% |
| q52 | 1020 | 1032 | 1.18% | 1052 | 3.14% |
| q55 | 927 | 938 | 1.19% | 961 | 3.67% |
| q72 | 24169 | 13377 | -44.65% | 24306 | 0.57% |
| q81 | 1285 | 1185 | -7.78% | 1168 | -9.11% |
| q91 | 324 | 336 | 3.70% | 337 | 4.01% |
| q98 | 126 | 129 | 2.38% | 131 | 3.97% |

All times are in ms; the change is compared to the situation in the master branch (absolute). The proposed cost function (multiplicative) significantly improves the performance on q18, q19 and q72. The original cost function (additive) has similar improvements at q18 and q19. All other changes are within the error bars and I would ignore them - perhaps q81 has also improved.

### How was this patch tested? PlanStabilitySuite

Closes #32076 from tanelk/SPARK-34922_cbo_better_cost_function_3.0.
Lead-authored-by: Tanel Kiis Co-authored-by: tanel.k...@gmail.com Signed-off-by: Takeshi Yamamuro --- .../catalyst/optimizer/CostBasedJoinReorder.scala | 28 ++ .../org/apache/spark/sql/internal/SQLConf.scala| 6 +++-- .../sql/catalyst/optimizer/JoinReorderSuite.scala | 3 --- .../optimizer/StarJoinCostBasedReorderSuite.scala | 9 +++ 4 files changed, 32 insertions(+), 14 deletions(-) diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/CostBasedJoinReorder.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/CostBasedJoinReorder.scala index 93c608dc..ed7d92e 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/CostBasedJoinReorder.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/CostBasedJoinReorder.scala @@ -343,12 +343,30 @@ object JoinReorderDP extends PredicateHelper with Logging { } } +/** + * To identify the plan with smaller computational cost, + * we use the weighted geometric mean of ratio of rows and the ratio of sizes in bytes. + * + * There are other ways to combine these values as a cost comparison function. + * Some of these, that we have experimented with, but have gotten worse result, + * than with the current one: + * 1) Weighted ar
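The difference between the additive and the multiplicative (weighted geometric mean) comparisons described in the commit message above can be sketched numerically. Illustrative Python only; the function names are mine, not Spark's:

```python
def better_than_multiplicative(a_rows, a_size, b_rows, b_size, w=0.5):
    # Proposed form: weighted geometric mean of the two ratios.
    # A.betterThan(B) <=> relativeRows^w * relativeSize^(1-w) < 1
    relative_rows = a_rows / b_rows
    relative_size = a_size / b_size
    return relative_rows ** w * relative_size ** (1 - w) < 1

def better_than_additive(a_rows, a_size, b_rows, b_size, w=0.5):
    # Original relative form: w*relativeRows + (1-w)*relativeSize < 1.
    return w * (a_rows / b_rows) + (1 - w) * (a_size / b_size) < 1

# Plan A: 100x fewer rows but 2x the size of plan B.
print(better_than_multiplicative(1, 2, 100, 1))  # True: the row ratio still counts
print(better_than_additive(1, 2, 100, 1))        # False: 0.5*0.01 + 0.5*2 > 1

# The multiplicative form is also symmetric: A better than B
# implies B not better than A.
print(better_than_multiplicative(100, 1, 1, 2))  # False
```

This reproduces the `0.5*2 + 0.5*0.01 > 1` example from the commit message: under the additive form a 2x size ratio overwhelms any row-count advantage, while the multiplicative form lets the two ratios trade off.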
[spark] branch master updated (9c1f807 -> 278203d)
This is an automated email from the ASF dual-hosted git repository. yamamuro pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from 9c1f807 [SPARK-35031][PYTHON] Port Koalas operations on different frames tests into PySpark add 278203d [SPARK-28227][SQL] Support projection, aggregate/window functions, and lateral view in the TRANSFORM clause No new revisions were added by this update. Summary of changes: .../apache/spark/sql/catalyst/parser/SqlBase.g4| 6 +- .../spark/sql/catalyst/analysis/Analyzer.scala | 9 +- .../spark/sql/catalyst/parser/AstBuilder.scala | 79 -- .../sql/catalyst/parser/PlanParserSuite.scala | 18 +- .../test/resources/sql-tests/inputs/transform.sql | 132 + .../resources/sql-tests/results/transform.sql.out | 316 - .../spark/sql/execution/SparkSqlParserSuite.scala | 164 +-- .../sql/execution/command/DDLParserSuite.scala | 14 +- 8 files changed, 662 insertions(+), 76 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated (26f312e -> caf33be)
This is an automated email from the ASF dual-hosted git repository. yamamuro pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from 26f312e [SPARK-35037][SQL] Recognize sign before the interval string in literals add caf33be [SPARK-33411][SQL] Cardinality estimation of union, sort and range operator No new revisions were added by this update. Summary of changes: .../plans/logical/LogicalPlanVisitor.scala | 3 + .../plans/logical/basicLogicalOperators.scala | 22 ++- .../statsEstimation/BasicStatsPlanVisitor.scala| 12 +- .../SizeInBytesOnlyStatsPlanVisitor.scala | 2 + .../logical/statsEstimation/UnionEstimation.scala | 120 + .../BasicStatsEstimationSuite.scala| 136 +-- .../statsEstimation/UnionEstimationSuite.scala | 194 + .../spark/sql/StatisticsCollectionSuite.scala | 4 +- 8 files changed, 473 insertions(+), 20 deletions(-) create mode 100644 sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/statsEstimation/UnionEstimation.scala create mode 100644 sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/statsEstimation/UnionEstimationSuite.scala - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated (12abfe7 -> 074f770)
This is an automated email from the ASF dual-hosted git repository. yamamuro pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from 12abfe7 [SPARK-34716][SQL] Support ANSI SQL intervals by the aggregate function `sum` add 074f770 [SPARK-35115][SQL][TESTS] Check ANSI intervals in `MutableProjectionSuite` No new revisions were added by this update. Summary of changes: .../spark/sql/catalyst/expressions/MutableProjectionSuite.scala | 8 +--- 1 file changed, 5 insertions(+), 3 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated (978cd0b -> fd08c93)
This is an automated email from the ASF dual-hosted git repository. yamamuro pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from 978cd0b [SPARK-35092][UI] the auto-generated rdd's name in the storage tab should be truncated if it is too long add fd08c93 [SPARK-35109][SQL] Fix minor exception messages of HashedRelation and HashJoin No new revisions were added by this update. Summary of changes: .../apache/spark/sql/errors/QueryExecutionErrors.scala | 18 ++ .../spark/sql/execution/joins/HashedRelation.scala | 6 ++ 2 files changed, 8 insertions(+), 16 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch branch-3.1 updated (034ba76 -> 5f48abe)
This is an automated email from the ASF dual-hosted git repository. yamamuro pushed a change to branch branch-3.1 in repository https://gitbox.apache.org/repos/asf/spark.git. from 034ba76 [SPARK-35080][SQL] Only allow a subset of correlated equality predicates when a subquery is aggregated add 5f48abe [SPARK-34639][SQL][3.1] RelationalGroupedDataset.alias should not create UnresolvedAlias No new revisions were added by this update. Summary of changes: .../main/scala/org/apache/spark/sql/RelationalGroupedDataset.scala | 6 +- sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala| 3 +++ 2 files changed, 4 insertions(+), 5 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated (9af338c -> e503b9c)
This is an automated email from the ASF dual-hosted git repository. yamamuro pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from 9af338c [SPARK-35078][SQL] Add tree traversal pruning in expression rules add e503b9c [SPARK-35201][SQL] Format empty grouping set exception in CUBE/ROLLUP No new revisions were added by this update. Summary of changes: .../scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala | 6 ++ .../main/scala/org/apache/spark/sql/errors/QueryParsingErrors.scala | 3 +++ 2 files changed, 5 insertions(+), 4 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated: [SPARK-33976][SQL][DOCS][FOLLOWUP] Fix syntax error in select doc page
This is an automated email from the ASF dual-hosted git repository. yamamuro pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 26a5e33 [SPARK-33976][SQL][DOCS][FOLLOWUP] Fix syntax error in select doc page 26a5e33 is described below commit 26a5e339a61ab06fb2949166db705f1b575addd3 Author: Angerszh AuthorDate: Wed Apr 28 16:47:02 2021 +0900 [SPARK-33976][SQL][DOCS][FOLLOWUP] Fix syntax error in select doc page ### What changes were proposed in this pull request? Add doc about `TRANSFORM` and related function. ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? Not need Closes #32257 from AngersZh/SPARK-33976-followup. Authored-by: Angerszh Signed-off-by: Takeshi Yamamuro --- docs/sql-ref-syntax-qry-select.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/sql-ref-syntax-qry-select.md b/docs/sql-ref-syntax-qry-select.md index 62a7f5f..500eda1 100644 --- a/docs/sql-ref-syntax-qry-select.md +++ b/docs/sql-ref-syntax-qry-select.md @@ -41,7 +41,7 @@ select_statement [ { UNION | INTERSECT | EXCEPT } [ ALL | DISTINCT ] select_stat While `select_statement` is defined as ```sql -SELECT [ hints , ... ] [ ALL | DISTINCT ] { [[ named_expression | regex_column_names ] [ , ... ] | TRANSFORM (...)) ] } +SELECT [ hints , ... ] [ ALL | DISTINCT ] { [ [ named_expression | regex_column_names ] [ , ... ] | TRANSFORM (...) ] } FROM { from_item [ , ... ] } [ PIVOT clause ] [ LATERAL VIEW clause ] [ ... ] - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch branch-3.1 updated (e58055b -> 361e684)
This is an automated email from the ASF dual-hosted git repository. yamamuro pushed a change to branch branch-3.1 in repository https://gitbox.apache.org/repos/asf/spark.git. from e58055b [SPARK-35244][SQL] Invoke should throw the original exception add 361e684 [SPARK-33976][SQL][DOCS][3.1] Add a SQL doc page for a TRANSFORM clause No new revisions were added by this update. Summary of changes: docs/_data/menu-sql.yaml| 2 + docs/sql-ref-syntax-qry-select-transform.md | 235 docs/sql-ref-syntax-qry-select.md | 7 +- docs/sql-ref-syntax-qry.md | 1 + docs/sql-ref-syntax.md | 1 + 5 files changed, 245 insertions(+), 1 deletion(-) create mode 100644 docs/sql-ref-syntax-qry-select-transform.md - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch branch-3.0 updated (6e83789b -> a556bc8)
This is an automated email from the ASF dual-hosted git repository. yamamuro pushed a change to branch branch-3.0 in repository https://gitbox.apache.org/repos/asf/spark.git. from 6e83789b [SPARK-35244][SQL] Invoke should throw the original exception add a556bc8 [SPARK-33976][SQL][DOCS][3.0] Add a SQL doc page for a TRANSFORM clause No new revisions were added by this update. Summary of changes: docs/_data/menu-sql.yaml| 2 + docs/sql-ref-syntax-qry-select-transform.md | 235 docs/sql-ref-syntax-qry-select.md | 7 +- docs/sql-ref-syntax-qry.md | 1 + docs/sql-ref-syntax.md | 1 + 5 files changed, 245 insertions(+), 1 deletion(-) create mode 100644 docs/sql-ref-syntax-qry-select-transform.md - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated (26a5e33 -> 8b62c29)
This is an automated email from the ASF dual-hosted git repository. yamamuro pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from 26a5e33 [SPARK-33976][SQL][DOCS][FOLLOWUP] Fix syntax error in select doc page add 8b62c29 [SPARK-35214][SQL] OptimizeSkewedJoin support ShuffledHashJoinExec No new revisions were added by this update. Summary of changes: .../org/apache/spark/sql/internal/SQLConf.scala| 4 +- .../execution/adaptive/OptimizeSkewedJoin.scala| 189 - .../execution/exchange/EnsureRequirements.scala| 9 +- .../sql/execution/joins/ShuffledHashJoinExec.scala | 3 +- .../spark/sql/execution/joins/ShuffledJoin.scala | 18 +- .../sql/execution/joins/SortMergeJoinExec.scala| 17 -- .../adaptive/AdaptiveQueryExecSuite.scala | 130 +++--- 7 files changed, 204 insertions(+), 166 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch branch-3.1 updated (361e684 -> db8204e)
This is an automated email from the ASF dual-hosted git repository. yamamuro pushed a change to branch branch-3.1 in repository https://gitbox.apache.org/repos/asf/spark.git. from 361e684 [SPARK-33976][SQL][DOCS][3.1] Add a SQL doc page for a TRANSFORM clause add db8204e [SPARK-35159][SQL][DOCS][3.1] Extract hive format doc No new revisions were added by this update. Summary of changes: docs/sql-ref-syntax-ddl-create-table-hiveformat.md | 52 +-- docs/sql-ref-syntax-hive-format.md | 73 ++ docs/sql-ref-syntax-qry-select-transform.md| 48 +- 3 files changed, 77 insertions(+), 96 deletions(-) create mode 100644 docs/sql-ref-syntax-hive-format.md - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch branch-3.0 updated (a556bc8 -> c6659e6)
This is an automated email from the ASF dual-hosted git repository. yamamuro pushed a change to branch branch-3.0 in repository https://gitbox.apache.org/repos/asf/spark.git. from a556bc8 [SPARK-33976][SQL][DOCS][3.0] Add a SQL doc page for a TRANSFORM clause add c6659e6 [SPARK-35159][SQL][DOCS][3.0] Extract hive format doc No new revisions were added by this update. Summary of changes: docs/sql-ref-syntax-ddl-create-table-hiveformat.md | 52 +-- docs/sql-ref-syntax-hive-format.md | 73 ++ docs/sql-ref-syntax-qry-select-transform.md| 48 +- 3 files changed, 77 insertions(+), 96 deletions(-) create mode 100644 docs/sql-ref-syntax-hive-format.md - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated (86d3bb5 -> 403e479)
This is an automated email from the ASF dual-hosted git repository. yamamuro pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from 86d3bb5 [SPARK-34981][SQL] Implement V2 function resolution and evaluation add 403e479 [SPARK-35244][SQL][FOLLOWUP] Add null check for the exception cause No new revisions were added by this update. Summary of changes: .../org/apache/spark/sql/catalyst/expressions/objects/objects.scala| 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated (caa46ce -> cd689c9)
This is an automated email from the ASF dual-hosted git repository. yamamuro pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from caa46ce [SPARK-35112][SQL] Support Cast string to day-second interval add cd689c9 [SPARK-35192][SQL][TESTS] Port minimal TPC-DS datagen code from databricks/spark-sql-perf No new revisions were added by this update. Summary of changes: .github/workflows/build_and_test.yml | 31 +- .../scala/org/apache/spark/sql/GenTPCDSData.scala | 445 + .../scala/org/apache/spark/sql/TPCDSBase.scala | 537 + .../sql/{TPCDSBase.scala => TPCDSSchema.scala} | 92 +--- 4 files changed, 466 insertions(+), 639 deletions(-) create mode 100644 sql/core/src/test/scala/org/apache/spark/sql/GenTPCDSData.scala copy sql/core/src/test/scala/org/apache/spark/sql/{TPCDSBase.scala => TPCDSSchema.scala} (83%) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated (7fd3f8f -> f550e03)
This is an automated email from the ASF dual-hosted git repository. yamamuro pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from 7fd3f8f [SPARK-35294][SQL] Add tree traversal pruning in rules with dedicated files under optimizer add f550e03 [SPARK-34794][SQL] Fix lambda variable name issues in nested DataFrame functions No new revisions were added by this update. Summary of changes: .../expressions/higherOrderFunctions.scala | 12 ++- .../scala/org/apache/spark/sql/functions.scala | 12 +-- .../apache/spark/sql/DataFrameFunctionsSuite.scala | 23 ++ 3 files changed, 40 insertions(+), 7 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch branch-3.1 updated: [SPARK-34794][SQL] Fix lambda variable name issues in nested DataFrame functions
This is an automated email from the ASF dual-hosted git repository. yamamuro pushed a commit to branch branch-3.1 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.1 by this push: new 6df4ec0 [SPARK-34794][SQL] Fix lambda variable name issues in nested DataFrame functions 6df4ec0 is described below commit 6df4ec09a17077c2a0b114a7bf5736711ba268e4 Author: dsolow AuthorDate: Wed May 5 12:46:13 2021 +0900 [SPARK-34794][SQL] Fix lambda variable name issues in nested DataFrame functions ### What changes were proposed in this pull request? To fix lambda variable name issues in nested DataFrame functions, this PR modifies code to use a global counter for `LambdaVariables` names created by higher order functions. This is the rework of #31887. Closes #31887. ### Why are the changes needed? This moves away from the current hard-coded variable names which break on nested function calls. There is currently a bug where nested transforms in particular fail (the inner variable shadows the outer variable) For this query: ``` val df = Seq( (Seq(1,2,3), Seq("a", "b", "c")) ).toDF("numbers", "letters") df.select( f.flatten( f.transform( $"numbers", (number: Column) => { f.transform( $"letters", (letter: Column) => { f.struct( number.as("number"), letter.as("letter") ) } ) } ) ).as("zipped") ).show(10, false) ``` This is the current (incorrect) output: ``` ++ |zipped | ++ |[{a, a}, {b, b}, {c, c}, {a, a}, {b, b}, {c, c}, {a, a}, {b, b}, {c, c}]| ++ ``` And this is the correct output after fix: ``` ++ |zipped | ++ |[{1, a}, {1, b}, {1, c}, {2, a}, {2, b}, {2, c}, {3, a}, {3, b}, {3, c}]| ++ ``` ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? Added the new test in `DataFrameFunctionsSuite`. Closes #32424 from maropu/pr31887. 
Lead-authored-by: dsolow Co-authored-by: Takeshi Yamamuro Co-authored-by: dmsolow Signed-off-by: Takeshi Yamamuro (cherry picked from commit f550e03b96638de93381734c4eada2ace02d9a4f) Signed-off-by: Takeshi Yamamuro --- .../expressions/higherOrderFunctions.scala | 12 ++- .../scala/org/apache/spark/sql/functions.scala | 12 +-- .../apache/spark/sql/DataFrameFunctionsSuite.scala | 23 ++ 3 files changed, 40 insertions(+), 7 deletions(-) diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/higherOrderFunctions.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/higherOrderFunctions.scala index ba447ea..a4e069d 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/higherOrderFunctions.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/higherOrderFunctions.scala @@ -18,7 +18,7 @@ package org.apache.spark.sql.catalyst.expressions import java.util.Comparator -import java.util.concurrent.atomic.AtomicReference +import java.util.concurrent.atomic.{AtomicInteger, AtomicReference} import scala.collection.mutable @@ -52,6 +52,16 @@ case class UnresolvedNamedLambdaVariable(nameParts: Seq[String]) override def sql: String = name } +object UnresolvedNamedLambdaVariable { + + // Counter to ensure lambda variable names are unique + private val nextVarNameId = new AtomicInteger(0) + + def freshVarName(name: String): String = { +s"${name}_${nextVarNameId.getAndIncrement()}" + } +} + /** * A named lambda variable. */ diff --git a/sql/core/src/main/scala/org/apache/spark/sql/functions.scala b/sql/core/src/main/scala/org/apache/spark/sql/functions.scala index e6b41cd..6bc49b6 100644 --- a/sql/core/src/main/scala/org/apache/spark/sql/functions.scala +++ b/sql/core/src/main/scala/org/apache/spark/sql/functions.scala @@ -3644,22 +3644,22 @@ object functions { } private def createLambda(f: Column
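The core idea of the patch above — a process-wide atomic counter that suffixes every generated lambda variable name so an inner higher-order function can never shadow an outer one — can be sketched outside Spark in plain Java (class and method names here mirror the patch but are illustrative, not Spark's public API):

```java
import java.util.concurrent.atomic.AtomicInteger;

// Sketch of the fix: instead of hard-coded lambda variable names
// (which collide when higher-order functions nest), every fresh
// name gets a unique numeric suffix from a global atomic counter.
public class FreshNames {
    private static final AtomicInteger nextVarNameId = new AtomicInteger(0);

    static String freshVarName(String name) {
        return name + "_" + nextVarNameId.getAndIncrement();
    }

    public static void main(String[] args) {
        String outer = freshVarName("x"); // e.g. "x_0"
        String inner = freshVarName("x"); // e.g. "x_1", never equal to outer
        System.out.println(outer + " != " + inner);
    }
}
```

Because the counter is shared and atomic, two nested `transform` calls each get a distinct variable name, which is exactly what restores the correct `{1, a} ... {3, c}` output shown in the commit message.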
[spark] branch branch-3.0 updated: [SPARK-34794][SQL] Fix lambda variable name issues in nested DataFrame functions
This is an automated email from the ASF dual-hosted git repository. yamamuro pushed a commit to branch branch-3.0 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.0 by this push: new 8ef4023 [SPARK-34794][SQL] Fix lambda variable name issues in nested DataFrame functions 8ef4023 is described below commit 8ef4023683dee537a40d376d93c329a802a929bd Author: dsolow AuthorDate: Wed May 5 12:46:13 2021 +0900 [SPARK-34794][SQL] Fix lambda variable name issues in nested DataFrame functions ### What changes were proposed in this pull request? To fix lambda variable name issues in nested DataFrame functions, this PR modifies code to use a global counter for `LambdaVariables` names created by higher order functions. This is the rework of #31887. Closes #31887. ### Why are the changes needed? This moves away from the current hard-coded variable names which break on nested function calls. There is currently a bug where nested transforms in particular fail (the inner variable shadows the outer variable) For this query: ``` val df = Seq( (Seq(1,2,3), Seq("a", "b", "c")) ).toDF("numbers", "letters") df.select( f.flatten( f.transform( $"numbers", (number: Column) => { f.transform( $"letters", (letter: Column) => { f.struct( number.as("number"), letter.as("letter") ) } ) } ) ).as("zipped") ).show(10, false) ``` This is the current (incorrect) output: ``` ++ |zipped | ++ |[{a, a}, {b, b}, {c, c}, {a, a}, {b, b}, {c, c}, {a, a}, {b, b}, {c, c}]| ++ ``` And this is the correct output after fix: ``` ++ |zipped | ++ |[{1, a}, {1, b}, {1, c}, {2, a}, {2, b}, {2, c}, {3, a}, {3, b}, {3, c}]| ++ ``` ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? Added the new test in `DataFrameFunctionsSuite`. Closes #32424 from maropu/pr31887. 
Lead-authored-by: dsolow Co-authored-by: Takeshi Yamamuro Co-authored-by: dmsolow Signed-off-by: Takeshi Yamamuro (cherry picked from commit f550e03b96638de93381734c4eada2ace02d9a4f) Signed-off-by: Takeshi Yamamuro --- .../expressions/higherOrderFunctions.scala | 12 ++- .../scala/org/apache/spark/sql/functions.scala | 12 +-- .../apache/spark/sql/DataFrameFunctionsSuite.scala | 23 ++ 3 files changed, 40 insertions(+), 7 deletions(-) diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/higherOrderFunctions.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/higherOrderFunctions.scala index e5cf8c0..a530ce5 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/higherOrderFunctions.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/higherOrderFunctions.scala @@ -18,7 +18,7 @@ package org.apache.spark.sql.catalyst.expressions import java.util.Comparator -import java.util.concurrent.atomic.AtomicReference +import java.util.concurrent.atomic.{AtomicInteger, AtomicReference} import scala.collection.mutable @@ -52,6 +52,16 @@ case class UnresolvedNamedLambdaVariable(nameParts: Seq[String]) override def sql: String = name } +object UnresolvedNamedLambdaVariable { + + // Counter to ensure lambda variable names are unique + private val nextVarNameId = new AtomicInteger(0) + + def freshVarName(name: String): String = { +s"${name}_${nextVarNameId.getAndIncrement()}" + } +} + /** * A named lambda variable. */ diff --git a/sql/core/src/main/scala/org/apache/spark/sql/functions.scala b/sql/core/src/main/scala/org/apache/spark/sql/functions.scala index bb77c7e..f6d6200 100644 --- a/sql/core/src/main/scala/org/apache/spark/sql/functions.scala +++ b/sql/core/src/main/scala/org/apache/spark/sql/functions.scala @@ -3489,22 +3489,22 @@ object functions { } private def createLambda(f: Column
[spark] 01/09: Update docs to reflect alternative key value notation
This is an automated email from the ASF dual-hosted git repository. yamamuro pushed a commit to branch pr31899 in repository https://gitbox.apache.org/repos/asf/spark.git commit c685abe33681fcbf0bfa6aa86ba229f19e4d451f Author: Niklas Riekenbrauck AuthorDate: Fri Mar 19 14:53:53 2021 +0100 Update docs to reflect alternative key value notation --- docs/sql-ref-syntax-ddl-create-table-datasource.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/docs/sql-ref-syntax-ddl-create-table-datasource.md b/docs/sql-ref-syntax-ddl-create-table-datasource.md index ba0516a..82d3a09 100644 --- a/docs/sql-ref-syntax-ddl-create-table-datasource.md +++ b/docs/sql-ref-syntax-ddl-create-table-datasource.md @@ -29,14 +29,14 @@ The `CREATE TABLE` statement defines a new table using a Data Source. CREATE TABLE [ IF NOT EXISTS ] table_identifier [ ( col_name1 col_type1 [ COMMENT col_comment1 ], ... ) ] USING data_source -[ OPTIONS ( key1=val1, key2=val2, ... ) ] +[ OPTIONS [ ( key1=val1, key2=val2, ... ) | ( key1 val1, key2 val2, ... ) ] ] [ PARTITIONED BY ( col_name1, col_name2, ... ) ] [ CLUSTERED BY ( col_name3, col_name4, ... ) [ SORTED BY ( col_name [ ASC | DESC ], ... ) ] INTO num_buckets BUCKETS ] [ LOCATION path ] [ COMMENT table_comment ] -[ TBLPROPERTIES ( key1=val1, key2=val2, ... ) ] +[ TBLPROPERTIES [ ( key1=val1, key2=val2, ... ) | ( key1 val1, key2 val2, ... ) ] ] [ AS select_statement ] ``` - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
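Per the doc change above, `OPTIONS` and `TBLPROPERTIES` accept key-value pairs either with `=` or with whitespace as the separator. A short illustration (hypothetical table and property names):

```sql
-- The two pair notations are interchangeable under the documented grammar:
-- space-separated in OPTIONS, '='-separated in TBLPROPERTIES.
CREATE TABLE student (id INT, name STRING)
  USING CSV
  OPTIONS (header 'true', inferSchema 'true')
  TBLPROPERTIES (created.by.user = 'niklas');
```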
[spark] 04/09: Update to eaasier KV syntax
This is an automated email from the ASF dual-hosted git repository. yamamuro pushed a commit to branch pr31899 in repository https://gitbox.apache.org/repos/asf/spark.git commit 1d157ed7209b355294b8e07e672ba8b5916e93f5 Author: Niklas Riekenbrauck AuthorDate: Sat Mar 27 15:17:26 2021 +0100 Update to eaasier KV syntax --- docs/sql-ref-syntax-ddl-alter-database.md | 2 +- docs/sql-ref-syntax-ddl-alter-table.md | 8 docs/sql-ref-syntax-ddl-alter-view.md | 2 +- docs/sql-ref-syntax-ddl-create-database.md | 4 ++-- docs/sql-ref-syntax-ddl-create-table-datasource.md | 4 ++-- docs/sql-ref-syntax-ddl-create-table-hiveformat.md | 2 +- docs/sql-ref-syntax-ddl-create-table-like.md | 2 +- 7 files changed, 12 insertions(+), 12 deletions(-) diff --git a/docs/sql-ref-syntax-ddl-alter-database.md b/docs/sql-ref-syntax-ddl-alter-database.md index fbc454e..2de9675 100644 --- a/docs/sql-ref-syntax-ddl-alter-database.md +++ b/docs/sql-ref-syntax-ddl-alter-database.md @@ -31,7 +31,7 @@ for a database and may be used for auditing purposes. ```sql ALTER { DATABASE | SCHEMA } database_name -SET DBPROPERTIES ( property_name = property_value [ , ... ] ) +SET DBPROPERTIES ( ( property_name [=] property_value [ , ... ] | ( property_name property_value [ , ... ] ) ``` ### Parameters diff --git a/docs/sql-ref-syntax-ddl-alter-table.md b/docs/sql-ref-syntax-ddl-alter-table.md index 2d42eb4..912de0f 100644 --- a/docs/sql-ref-syntax-ddl-alter-table.md +++ b/docs/sql-ref-syntax-ddl-alter-table.md @@ -169,7 +169,7 @@ this overrides the old value with the new one. ```sql -- Set Table Properties -ALTER TABLE table_identifier SET TBLPROPERTIES ( key1 = val1, key2 = val2, ... ) +ALTER TABLE table_identifier SET TBLPROPERTIES ( ( key1 [=] val1, key2 [=] val2, ... ) ) -- Unset Table Properties ALTER TABLE table_identifier UNSET TBLPROPERTIES [ IF EXISTS ] ( key1, key2, ... ) @@ -184,10 +184,10 @@ ALTER TABLE table_identifier UNSET TBLPROPERTIES [ IF EXISTS ] ( key1, key2, ... 
```sql -- Set SERDE Properties ALTER TABLE table_identifier [ partition_spec ] -SET SERDEPROPERTIES ( key1 = val1, key2 = val2, ... ) +SET SERDEPROPERTIES ( ( key1 = val1, key2 = val2, ... ) | ( key1 val1, key2 val2, ... ) ) ALTER TABLE table_identifier [ partition_spec ] SET SERDE serde_class_name -[ WITH SERDEPROPERTIES ( key1 = val1, key2 = val2, ... ) ] +[ WITH SERDEPROPERTIES ( ( key1 = val1, key2 = val2, ... ) | ( key1 val1, key2 val2, ... ) ) ] ``` SET LOCATION And SET FILE FORMAT @@ -221,7 +221,7 @@ ALTER TABLE table_identifier [ partition_spec ] SET LOCATION 'new_location' **Syntax:** `PARTITION ( partition_col_name = partition_col_val [ , ... ] )` -* **SERDEPROPERTIES ( key1 = val1, key2 = val2, ... )** +* **SERDEPROPERTIES ( ( key1 = val1, key2 = val2, ... ) | ( key1 val1, key2 val2, ... ) ) ** Specifies the SERDE properties to be set. diff --git a/docs/sql-ref-syntax-ddl-alter-view.md b/docs/sql-ref-syntax-ddl-alter-view.md index d69f246..25280c4 100644 --- a/docs/sql-ref-syntax-ddl-alter-view.md +++ b/docs/sql-ref-syntax-ddl-alter-view.md @@ -49,7 +49,7 @@ the properties. Syntax ```sql -ALTER VIEW view_identifier SET TBLPROPERTIES ( property_key = property_val [ , ... ] ) +ALTER VIEW view_identifier SET TBLPROPERTIES ( property_key [=] property_val [ , ... ] ) ``` Parameters diff --git a/docs/sql-ref-syntax-ddl-create-database.md b/docs/sql-ref-syntax-ddl-create-database.md index 9d8bf47..7db410e 100644 --- a/docs/sql-ref-syntax-ddl-create-database.md +++ b/docs/sql-ref-syntax-ddl-create-database.md @@ -29,7 +29,7 @@ Creates a database with the specified name. If database with the same name alrea CREATE { DATABASE | SCHEMA } [ IF NOT EXISTS ] database_name [ COMMENT database_comment ] [ LOCATION database_directory ] -[ WITH DBPROPERTIES ( property_name = property_value [ , ... ] ) ] +[ WITH DBPROPERTIES ( property_name [=] property_value [ , ... 
] ) ] ``` ### Parameters @@ -50,7 +50,7 @@ CREATE { DATABASE | SCHEMA } [ IF NOT EXISTS ] database_name Specifies the description for the database. -* **WITH DBPROPERTIES ( property_name=property_value [ , ... ] )** +* **WITH DBPROPERTIES ( property_name [=] property_value [ , ... ] )** Specifies the properties for the database in key-value pairs. diff --git a/docs/sql-ref-syntax-ddl-create-table-datasource.md b/docs/sql-ref-syntax-ddl-create-table-datasource.md index 9926bc6..7d8e692 100644 --- a/docs/sql-ref-syntax-ddl-create-table-datasource.md +++ b/docs/sql-ref-syntax-ddl-create-table-datasource.md @@ -29,14 +29,14 @@ The `CREATE TABLE` statement defines a new table using a Data Source. CREATE TABLE [ IF NOT EXISTS ] table_identifier [ ( col_name1 col_type1 [ COMMENT col_comment1 ], ... ) ] USING data_source -[ OPTIONS ( ( key1=val1, key2=val2, ... ) | ( key1 val1,
[spark] 07/09: Remove unnecessary change
This is an automated email from the ASF dual-hosted git repository. yamamuro pushed a commit to branch pr31899 in repository https://gitbox.apache.org/repos/asf/spark.git commit 42cd52e297b141a8b837a8315ca4c84a5ffc3def Author: Niklas Riekenbrauck AuthorDate: Sat Mar 27 15:27:52 2021 +0100 Remove unnecessary change --- docs/sql-ref-syntax-ddl-alter-database.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/sql-ref-syntax-ddl-alter-database.md b/docs/sql-ref-syntax-ddl-alter-database.md index 2de9675..6ac6863 100644 --- a/docs/sql-ref-syntax-ddl-alter-database.md +++ b/docs/sql-ref-syntax-ddl-alter-database.md @@ -31,7 +31,7 @@ for a database and may be used for auditing purposes. ```sql ALTER { DATABASE | SCHEMA } database_name -SET DBPROPERTIES ( ( property_name [=] property_value [ , ... ] | ( property_name property_value [ , ... ] ) +SET DBPROPERTIES ( property_name [=] property_value [ , ... ] ) ``` ### Parameters - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch pr31899 created (now 0c4e71e)
This is an automated email from the ASF dual-hosted git repository. yamamuro pushed a change to branch pr31899 in repository https://gitbox.apache.org/repos/asf/spark.git. at 0c4e71e Fix This branch includes the following new commits: new c685abe Update docs to reflect alternative key value notation new 2ff9703 Update docs other create table docs new 8245a55 Fix alternatives with subrule grammar new 1d157ed Update to eaasier KV syntax new 83ec2ee Commit missing doc updates new fff449b Some more fixes new 42cd52e Remove unnecessary change new 2ebb2aa remove space new 0c4e71e Fix The 9 revisions listed above as "new" are entirely new to this repository and will be described in separate emails. The revisions listed as "add" were already present in the repository and have only been added to this reference. - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] 02/09: Update docs other create table docs
This is an automated email from the ASF dual-hosted git repository. yamamuro pushed a commit to branch pr31899 in repository https://gitbox.apache.org/repos/asf/spark.git commit 2ff970350427835a7b7f7f9d0ec7bc8f1049f7fd Author: Niklas Riekenbrauck AuthorDate: Fri Mar 19 15:17:48 2021 +0100 Update docs other create table docs --- docs/sql-ref-syntax-ddl-create-table-hiveformat.md | 2 +- docs/sql-ref-syntax-ddl-create-table-like.md | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/docs/sql-ref-syntax-ddl-create-table-hiveformat.md b/docs/sql-ref-syntax-ddl-create-table-hiveformat.md index b2f5957..63880d5 100644 --- a/docs/sql-ref-syntax-ddl-create-table-hiveformat.md +++ b/docs/sql-ref-syntax-ddl-create-table-hiveformat.md @@ -37,7 +37,7 @@ CREATE [ EXTERNAL ] TABLE [ IF NOT EXISTS ] table_identifier [ ROW FORMAT row_format ] [ STORED AS file_format ] [ LOCATION path ] -[ TBLPROPERTIES ( key1=val1, key2=val2, ... ) ] +[ TBLPROPERTIES [ ( key1=val1, key2=val2, ... ) | ( key1 val1, key2 val2, ... ) ] ] [ AS select_statement ] ``` diff --git a/docs/sql-ref-syntax-ddl-create-table-like.md b/docs/sql-ref-syntax-ddl-create-table-like.md index cfb959c..a374296a 100644 --- a/docs/sql-ref-syntax-ddl-create-table-like.md +++ b/docs/sql-ref-syntax-ddl-create-table-like.md @@ -30,7 +30,7 @@ CREATE TABLE [IF NOT EXISTS] table_identifier LIKE source_table_identifier USING data_source [ ROW FORMAT row_format ] [ STORED AS file_format ] -[ TBLPROPERTIES ( key1=val1, key2=val2, ... ) ] +[ TBLPROPERTIES [ ( key1=val1, key2=val2, ... ) | ( key1 val1, key2 val2, ... ) ] ] [ LOCATION path ] ``` - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] 03/09: Fix alternatives with subrule grammar
This is an automated email from the ASF dual-hosted git repository. yamamuro pushed a commit to branch pr31899 in repository https://gitbox.apache.org/repos/asf/spark.git commit 8245a55dd1092fe9ef3fbcacb5cf07d1888ac23a Author: Niklas Riekenbrauck AuthorDate: Sat Mar 20 13:00:08 2021 +0100 Fix alternatives with subrule grammar --- docs/sql-ref-syntax-ddl-create-table-datasource.md | 4 ++-- docs/sql-ref-syntax-ddl-create-table-hiveformat.md | 2 +- docs/sql-ref-syntax-ddl-create-table-like.md | 2 +- 3 files changed, 4 insertions(+), 4 deletions(-) diff --git a/docs/sql-ref-syntax-ddl-create-table-datasource.md b/docs/sql-ref-syntax-ddl-create-table-datasource.md index 82d3a09..9926bc6 100644 --- a/docs/sql-ref-syntax-ddl-create-table-datasource.md +++ b/docs/sql-ref-syntax-ddl-create-table-datasource.md @@ -29,14 +29,14 @@ The `CREATE TABLE` statement defines a new table using a Data Source. CREATE TABLE [ IF NOT EXISTS ] table_identifier [ ( col_name1 col_type1 [ COMMENT col_comment1 ], ... ) ] USING data_source -[ OPTIONS [ ( key1=val1, key2=val2, ... ) | ( key1 val1, key2 val2, ... ) ] ] +[ OPTIONS ( ( key1=val1, key2=val2, ... ) | ( key1 val1, key2 val2, ... ) ) ] [ PARTITIONED BY ( col_name1, col_name2, ... ) ] [ CLUSTERED BY ( col_name3, col_name4, ... ) [ SORTED BY ( col_name [ ASC | DESC ], ... ) ] INTO num_buckets BUCKETS ] [ LOCATION path ] [ COMMENT table_comment ] -[ TBLPROPERTIES [ ( key1=val1, key2=val2, ... ) | ( key1 val1, key2 val2, ... ) ] ] +[ TBLPROPERTIES ( ( key1=val1, key2=val2, ... ) | ( key1 val1, key2 val2, ... 
) ) ] [ AS select_statement ] ``` diff --git a/docs/sql-ref-syntax-ddl-create-table-hiveformat.md b/docs/sql-ref-syntax-ddl-create-table-hiveformat.md index 63880d5..2e05e64 100644 --- a/docs/sql-ref-syntax-ddl-create-table-hiveformat.md +++ b/docs/sql-ref-syntax-ddl-create-table-hiveformat.md @@ -37,7 +37,7 @@ CREATE [ EXTERNAL ] TABLE [ IF NOT EXISTS ] table_identifier [ ROW FORMAT row_format ] [ STORED AS file_format ] [ LOCATION path ] -[ TBLPROPERTIES [ ( key1=val1, key2=val2, ... ) | ( key1 val1, key2 val2, ... ) ] ] +[ TBLPROPERTIES ( ( key1=val1, key2=val2, ... ) | ( key1 val1, key2 val2, ... ) ) ] [ AS select_statement ] ``` diff --git a/docs/sql-ref-syntax-ddl-create-table-like.md b/docs/sql-ref-syntax-ddl-create-table-like.md index a374296a..772b299 100644 --- a/docs/sql-ref-syntax-ddl-create-table-like.md +++ b/docs/sql-ref-syntax-ddl-create-table-like.md @@ -30,7 +30,7 @@ CREATE TABLE [IF NOT EXISTS] table_identifier LIKE source_table_identifier USING data_source [ ROW FORMAT row_format ] [ STORED AS file_format ] -[ TBLPROPERTIES [ ( key1=val1, key2=val2, ... ) | ( key1 val1, key2 val2, ... ) ] ] +[ TBLPROPERTIES ( ( key1=val1, key2=val2, ... ) | ( key1 val1, key2 val2, ... ) ) ] [ LOCATION path ] ``` - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] 05/09: Commit missing doc updates
This is an automated email from the ASF dual-hosted git repository. yamamuro pushed a commit to branch pr31899 in repository https://gitbox.apache.org/repos/asf/spark.git commit 83ec2ee71751142220464ea54ffc6e47ccc35ad4 Author: Niklas Riekenbrauck AuthorDate: Sat Mar 27 15:19:27 2021 +0100 Commit missing doc updates --- docs/sql-ref-syntax-ddl-alter-table.md | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/docs/sql-ref-syntax-ddl-alter-table.md b/docs/sql-ref-syntax-ddl-alter-table.md index 912de0f..866b596 100644 --- a/docs/sql-ref-syntax-ddl-alter-table.md +++ b/docs/sql-ref-syntax-ddl-alter-table.md @@ -184,10 +184,10 @@ ALTER TABLE table_identifier UNSET TBLPROPERTIES [ IF EXISTS ] ( key1, key2, ... ```sql -- Set SERDE Properties ALTER TABLE table_identifier [ partition_spec ] -SET SERDEPROPERTIES ( ( key1 = val1, key2 = val2, ... ) | ( key1 val1, key2 val2, ... ) ) +SET SERDEPROPERTIES ( key1 [=] val1, key2 [=] val2, ... ) ALTER TABLE table_identifier [ partition_spec ] SET SERDE serde_class_name -[ WITH SERDEPROPERTIES ( ( key1 = val1, key2 = val2, ... ) | ( key1 val1, key2 val2, ... ) ) ] +[ WITH SERDEPROPERTIES ( key1 [=] val1, key2 [=] val2, ... ) ] ``` SET LOCATION And SET FILE FORMAT @@ -221,7 +221,7 @@ ALTER TABLE table_identifier [ partition_spec ] SET LOCATION 'new_location' **Syntax:** `PARTITION ( partition_col_name = partition_col_val [ , ... ] )` -* **SERDEPROPERTIES ( ( key1 = val1, key2 = val2, ... ) | ( key1 val1, key2 val2, ... ) ) ** +* **SERDEPROPERTIES ( key1 [=] val1, key2 [=] val2, ... ) ** Specifies the SERDE properties to be set. - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] 06/09: Some more fixes
This is an automated email from the ASF dual-hosted git repository. yamamuro pushed a commit to branch pr31899 in repository https://gitbox.apache.org/repos/asf/spark.git commit fff449bd54f2204d7cfc7a5fcf5c8877aa37a992 Author: Niklas Riekenbrauck AuthorDate: Sat Mar 27 15:22:11 2021 +0100 Some more fixes --- docs/sql-ref-syntax-ddl-alter-table.md | 4 ++-- docs/sql-ref-syntax-ddl-create-table-hiveformat.md | 2 +- 2 files changed, 3 insertions(+), 3 deletions(-) diff --git a/docs/sql-ref-syntax-ddl-alter-table.md b/docs/sql-ref-syntax-ddl-alter-table.md index 866b596..915ccf8 100644 --- a/docs/sql-ref-syntax-ddl-alter-table.md +++ b/docs/sql-ref-syntax-ddl-alter-table.md @@ -169,7 +169,7 @@ this overrides the old value with the new one. ```sql -- Set Table Properties -ALTER TABLE table_identifier SET TBLPROPERTIES ( ( key1 [=] val1, key2 [=] val2, ... ) ) +ALTER TABLE table_identifier SET TBLPROPERTIES ( key1 [=] val1, key2 [=] val2, ... ) -- Unset Table Properties ALTER TABLE table_identifier UNSET TBLPROPERTIES [ IF EXISTS ] ( key1, key2, ... ) @@ -219,7 +219,7 @@ ALTER TABLE table_identifier [ partition_spec ] SET LOCATION 'new_location' Specifies the partition on which the property has to be set. Note that one can use a typed literal (e.g., date'2019-01-02') in the partition spec. -**Syntax:** `PARTITION ( partition_col_name = partition_col_val [ , ... ] )` +**Syntax:** `PARTITION ( partition_col_name = partition_col_val [ , ... ] )` * **SERDEPROPERTIES ( key1 [=] val1, key2 [=] val2, ... ) ** diff --git a/docs/sql-ref-syntax-ddl-create-table-hiveformat.md b/docs/sql-ref-syntax-ddl-create-table-hiveformat.md index 48d089d..3231b66 100644 --- a/docs/sql-ref-syntax-ddl-create-table-hiveformat.md +++ b/docs/sql-ref-syntax-ddl-create-table-hiveformat.md @@ -37,7 +37,7 @@ CREATE [ EXTERNAL ] TABLE [ IF NOT EXISTS ] table_identifier [ ROW FORMAT row_format ] [ STORED AS file_format ] [ LOCATION path ] -[ TBLPROPERTIES ( ( key1 [=] val1, key2 [=] val2, ... 
) ] +[ TBLPROPERTIES ( key1 [=] val1, key2 [=] val2, ... ) ] [ AS select_statement ] ``` - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] 08/09: remove space
This is an automated email from the ASF dual-hosted git repository. yamamuro pushed a commit to branch pr31899 in repository https://gitbox.apache.org/repos/asf/spark.git commit 2ebb2aac7a0c87c929e72bc7c8c080096c55a8f1 Author: Niklas Riekenbrauck AuthorDate: Tue Mar 30 13:20:32 2021 +0200 remove space --- docs/sql-ref-syntax-ddl-alter-table.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/sql-ref-syntax-ddl-alter-table.md b/docs/sql-ref-syntax-ddl-alter-table.md index 915ccf8..ae40fe4 100644 --- a/docs/sql-ref-syntax-ddl-alter-table.md +++ b/docs/sql-ref-syntax-ddl-alter-table.md @@ -221,7 +221,7 @@ ALTER TABLE table_identifier [ partition_spec ] SET LOCATION 'new_location' **Syntax:** `PARTITION ( partition_col_name = partition_col_val [ , ... ] )` -* **SERDEPROPERTIES ( key1 [=] val1, key2 [=] val2, ... ) ** +* **SERDEPROPERTIES ( key1 [=] val1, key2 [=] val2, ... )** Specifies the SERDE properties to be set. - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] 09/09: Fix
This is an automated email from the ASF dual-hosted git repository. yamamuro pushed a commit to branch pr31899 in repository https://gitbox.apache.org/repos/asf/spark.git commit 0c4e71e00129bcffe933a8cceffab7cf51cf33ce Author: Takeshi Yamamuro AuthorDate: Thu May 6 10:14:25 2021 +0900 Fix --- docs/sql-ref-syntax-hive-format.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/sql-ref-syntax-hive-format.md b/docs/sql-ref-syntax-hive-format.md index 8092e58..01b8d3f 100644 --- a/docs/sql-ref-syntax-hive-format.md +++ b/docs/sql-ref-syntax-hive-format.md @@ -30,7 +30,7 @@ There are two ways to define a row format in `row_format` of `CREATE TABLE` and ```sql row_format: -SERDE serde_class [ WITH SERDEPROPERTIES (k1=v1, k2=v2, ... ) ] +SERDE serde_class [ WITH SERDEPROPERTIES (k1 [=] v1, k2 [=] v2, ... ) ] | DELIMITED [ FIELDS TERMINATED BY fields_terminated_char [ ESCAPED BY escaped_char ] ] [ COLLECTION ITEMS TERMINATED BY collection_items_terminated_char ] [ MAP KEYS TERMINATED BY map_key_terminated_char ] - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
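The `row_format` rule being corrected above is shared by `CREATE TABLE` and `TRANSFORM` clauses. A sketch of a Hive-format table using the `SERDE` branch of that rule (table name and delimiter are illustrative; the serde class is Hive's standard `LazySimpleSerDe`):

```sql
CREATE TABLE events (id INT, payload STRING)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
WITH SERDEPROPERTIES ( 'field.delim' '\t' )  -- '=' may be omitted per the [=] notation
STORED AS TEXTFILE;
```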
[spark] 04/09: Update to eaasier KV syntax
This is an automated email from the ASF dual-hosted git repository. yamamuro pushed a commit to branch pull/31899 in repository https://gitbox.apache.org/repos/asf/spark.git commit 1d157ed7209b355294b8e07e672ba8b5916e93f5 Author: Niklas Riekenbrauck AuthorDate: Sat Mar 27 15:17:26 2021 +0100 Update to eaasier KV syntax --- docs/sql-ref-syntax-ddl-alter-database.md | 2 +- docs/sql-ref-syntax-ddl-alter-table.md | 8 docs/sql-ref-syntax-ddl-alter-view.md | 2 +- docs/sql-ref-syntax-ddl-create-database.md | 4 ++-- docs/sql-ref-syntax-ddl-create-table-datasource.md | 4 ++-- docs/sql-ref-syntax-ddl-create-table-hiveformat.md | 2 +- docs/sql-ref-syntax-ddl-create-table-like.md | 2 +- 7 files changed, 12 insertions(+), 12 deletions(-) diff --git a/docs/sql-ref-syntax-ddl-alter-database.md b/docs/sql-ref-syntax-ddl-alter-database.md index fbc454e..2de9675 100644 --- a/docs/sql-ref-syntax-ddl-alter-database.md +++ b/docs/sql-ref-syntax-ddl-alter-database.md @@ -31,7 +31,7 @@ for a database and may be used for auditing purposes. ```sql ALTER { DATABASE | SCHEMA } database_name -SET DBPROPERTIES ( property_name = property_value [ , ... ] ) +SET DBPROPERTIES ( ( property_name [=] property_value [ , ... ] | ( property_name property_value [ , ... ] ) ``` ### Parameters diff --git a/docs/sql-ref-syntax-ddl-alter-table.md b/docs/sql-ref-syntax-ddl-alter-table.md index 2d42eb4..912de0f 100644 --- a/docs/sql-ref-syntax-ddl-alter-table.md +++ b/docs/sql-ref-syntax-ddl-alter-table.md @@ -169,7 +169,7 @@ this overrides the old value with the new one. ```sql -- Set Table Properties -ALTER TABLE table_identifier SET TBLPROPERTIES ( key1 = val1, key2 = val2, ... ) +ALTER TABLE table_identifier SET TBLPROPERTIES ( ( key1 [=] val1, key2 [=] val2, ... ) ) -- Unset Table Properties ALTER TABLE table_identifier UNSET TBLPROPERTIES [ IF EXISTS ] ( key1, key2, ... ) @@ -184,10 +184,10 @@ ALTER TABLE table_identifier UNSET TBLPROPERTIES [ IF EXISTS ] ( key1, key2, ... 
```sql -- Set SERDE Properties ALTER TABLE table_identifier [ partition_spec ] -SET SERDEPROPERTIES ( key1 = val1, key2 = val2, ... ) +SET SERDEPROPERTIES ( ( key1 = val1, key2 = val2, ... ) | ( key1 val1, key2 val2, ... ) ) ALTER TABLE table_identifier [ partition_spec ] SET SERDE serde_class_name -[ WITH SERDEPROPERTIES ( key1 = val1, key2 = val2, ... ) ] +[ WITH SERDEPROPERTIES ( ( key1 = val1, key2 = val2, ... ) | ( key1 val1, key2 val2, ... ) ) ] ``` SET LOCATION And SET FILE FORMAT @@ -221,7 +221,7 @@ ALTER TABLE table_identifier [ partition_spec ] SET LOCATION 'new_location' **Syntax:** `PARTITION ( partition_col_name = partition_col_val [ , ... ] )` -* **SERDEPROPERTIES ( key1 = val1, key2 = val2, ... )** +* **SERDEPROPERTIES ( ( key1 = val1, key2 = val2, ... ) | ( key1 val1, key2 val2, ... ) ) ** Specifies the SERDE properties to be set. diff --git a/docs/sql-ref-syntax-ddl-alter-view.md b/docs/sql-ref-syntax-ddl-alter-view.md index d69f246..25280c4 100644 --- a/docs/sql-ref-syntax-ddl-alter-view.md +++ b/docs/sql-ref-syntax-ddl-alter-view.md @@ -49,7 +49,7 @@ the properties. Syntax ```sql -ALTER VIEW view_identifier SET TBLPROPERTIES ( property_key = property_val [ , ... ] ) +ALTER VIEW view_identifier SET TBLPROPERTIES ( property_key [=] property_val [ , ... ] ) ``` Parameters diff --git a/docs/sql-ref-syntax-ddl-create-database.md b/docs/sql-ref-syntax-ddl-create-database.md index 9d8bf47..7db410e 100644 --- a/docs/sql-ref-syntax-ddl-create-database.md +++ b/docs/sql-ref-syntax-ddl-create-database.md @@ -29,7 +29,7 @@ Creates a database with the specified name. If database with the same name alrea CREATE { DATABASE | SCHEMA } [ IF NOT EXISTS ] database_name [ COMMENT database_comment ] [ LOCATION database_directory ] -[ WITH DBPROPERTIES ( property_name = property_value [ , ... ] ) ] +[ WITH DBPROPERTIES ( property_name [=] property_value [ , ... 
] ) ] ``` ### Parameters @@ -50,7 +50,7 @@ CREATE { DATABASE | SCHEMA } [ IF NOT EXISTS ] database_name Specifies the description for the database. -* **WITH DBPROPERTIES ( property_name=property_value [ , ... ] )** +* **WITH DBPROPERTIES ( property_name [=] property_value [ , ... ] )** Specifies the properties for the database in key-value pairs. diff --git a/docs/sql-ref-syntax-ddl-create-table-datasource.md b/docs/sql-ref-syntax-ddl-create-table-datasource.md index 9926bc6..7d8e692 100644 --- a/docs/sql-ref-syntax-ddl-create-table-datasource.md +++ b/docs/sql-ref-syntax-ddl-create-table-datasource.md @@ -29,14 +29,14 @@ The `CREATE TABLE` statement defines a new table using a Data Source. CREATE TABLE [ IF NOT EXISTS ] table_identifier [ ( col_name1 col_type1 [ COMMENT col_comment1 ], ... ) ] USING data_source -[ OPTIONS ( ( key1=val1, key2=val2, ... ) | ( key1 val1,
[spark] 01/09: Update docs to reflect alternative key value notation
This is an automated email from the ASF dual-hosted git repository. yamamuro pushed a commit to branch pull/31899 in repository https://gitbox.apache.org/repos/asf/spark.git commit c685abe33681fcbf0bfa6aa86ba229f19e4d451f Author: Niklas Riekenbrauck AuthorDate: Fri Mar 19 14:53:53 2021 +0100 Update docs to reflect alternative key value notation --- docs/sql-ref-syntax-ddl-create-table-datasource.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/docs/sql-ref-syntax-ddl-create-table-datasource.md b/docs/sql-ref-syntax-ddl-create-table-datasource.md index ba0516a..82d3a09 100644 --- a/docs/sql-ref-syntax-ddl-create-table-datasource.md +++ b/docs/sql-ref-syntax-ddl-create-table-datasource.md @@ -29,14 +29,14 @@ The `CREATE TABLE` statement defines a new table using a Data Source. CREATE TABLE [ IF NOT EXISTS ] table_identifier [ ( col_name1 col_type1 [ COMMENT col_comment1 ], ... ) ] USING data_source -[ OPTIONS ( key1=val1, key2=val2, ... ) ] +[ OPTIONS [ ( key1=val1, key2=val2, ... ) | ( key1 val1, key2 val2, ... ) ] ] [ PARTITIONED BY ( col_name1, col_name2, ... ) ] [ CLUSTERED BY ( col_name3, col_name4, ... ) [ SORTED BY ( col_name [ ASC | DESC ], ... ) ] INTO num_buckets BUCKETS ] [ LOCATION path ] [ COMMENT table_comment ] -[ TBLPROPERTIES ( key1=val1, key2=val2, ... ) ] +[ TBLPROPERTIES [ ( key1=val1, key2=val2, ... ) | ( key1 val1, key2 val2, ... ) ] ] [ AS select_statement ] ``` - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
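The alternative key-value notation documented in the diff above applies to both `OPTIONS` and `TBLPROPERTIES` of data source tables. A sketch with hypothetical names, mixing the two accepted spellings:

```sql
CREATE TABLE IF NOT EXISTS clicks (ts TIMESTAMP, url STRING)
USING parquet
OPTIONS ( 'compression' 'snappy' )               -- space-separated key/value pair
TBLPROPERTIES ( 'created.by' = 'docs-example' ); -- conventional '=' form
```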
[spark] 03/09: Fix alternatives with subrule grammar
This is an automated email from the ASF dual-hosted git repository. yamamuro pushed a commit to branch pull/31899 in repository https://gitbox.apache.org/repos/asf/spark.git commit 8245a55dd1092fe9ef3fbcacb5cf07d1888ac23a Author: Niklas Riekenbrauck AuthorDate: Sat Mar 20 13:00:08 2021 +0100 Fix alternatives with subrule grammar --- docs/sql-ref-syntax-ddl-create-table-datasource.md | 4 ++-- docs/sql-ref-syntax-ddl-create-table-hiveformat.md | 2 +- docs/sql-ref-syntax-ddl-create-table-like.md | 2 +- 3 files changed, 4 insertions(+), 4 deletions(-) diff --git a/docs/sql-ref-syntax-ddl-create-table-datasource.md b/docs/sql-ref-syntax-ddl-create-table-datasource.md index 82d3a09..9926bc6 100644 --- a/docs/sql-ref-syntax-ddl-create-table-datasource.md +++ b/docs/sql-ref-syntax-ddl-create-table-datasource.md @@ -29,14 +29,14 @@ The `CREATE TABLE` statement defines a new table using a Data Source. CREATE TABLE [ IF NOT EXISTS ] table_identifier [ ( col_name1 col_type1 [ COMMENT col_comment1 ], ... ) ] USING data_source -[ OPTIONS [ ( key1=val1, key2=val2, ... ) | ( key1 val1, key2 val2, ... ) ] ] +[ OPTIONS ( ( key1=val1, key2=val2, ... ) | ( key1 val1, key2 val2, ... ) ) ] [ PARTITIONED BY ( col_name1, col_name2, ... ) ] [ CLUSTERED BY ( col_name3, col_name4, ... ) [ SORTED BY ( col_name [ ASC | DESC ], ... ) ] INTO num_buckets BUCKETS ] [ LOCATION path ] [ COMMENT table_comment ] -[ TBLPROPERTIES [ ( key1=val1, key2=val2, ... ) | ( key1 val1, key2 val2, ... ) ] ] +[ TBLPROPERTIES ( ( key1=val1, key2=val2, ... ) | ( key1 val1, key2 val2, ... 
) ) ] [ AS select_statement ] ``` diff --git a/docs/sql-ref-syntax-ddl-create-table-hiveformat.md b/docs/sql-ref-syntax-ddl-create-table-hiveformat.md index 63880d5..2e05e64 100644 --- a/docs/sql-ref-syntax-ddl-create-table-hiveformat.md +++ b/docs/sql-ref-syntax-ddl-create-table-hiveformat.md @@ -37,7 +37,7 @@ CREATE [ EXTERNAL ] TABLE [ IF NOT EXISTS ] table_identifier [ ROW FORMAT row_format ] [ STORED AS file_format ] [ LOCATION path ] -[ TBLPROPERTIES [ ( key1=val1, key2=val2, ... ) | ( key1 val1, key2 val2, ... ) ] ] +[ TBLPROPERTIES ( ( key1=val1, key2=val2, ... ) | ( key1 val1, key2 val2, ... ) ) ] [ AS select_statement ] ``` diff --git a/docs/sql-ref-syntax-ddl-create-table-like.md b/docs/sql-ref-syntax-ddl-create-table-like.md index a374296a..772b299 100644 --- a/docs/sql-ref-syntax-ddl-create-table-like.md +++ b/docs/sql-ref-syntax-ddl-create-table-like.md @@ -30,7 +30,7 @@ CREATE TABLE [IF NOT EXISTS] table_identifier LIKE source_table_identifier USING data_source [ ROW FORMAT row_format ] [ STORED AS file_format ] -[ TBLPROPERTIES [ ( key1=val1, key2=val2, ... ) | ( key1 val1, key2 val2, ... ) ] ] +[ TBLPROPERTIES ( ( key1=val1, key2=val2, ... ) | ( key1 val1, key2 val2, ... ) ) ] [ LOCATION path ] ``` - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] 02/09: Update docs other create table docs
This is an automated email from the ASF dual-hosted git repository. yamamuro pushed a commit to branch pull/31899 in repository https://gitbox.apache.org/repos/asf/spark.git commit 2ff970350427835a7b7f7f9d0ec7bc8f1049f7fd Author: Niklas Riekenbrauck AuthorDate: Fri Mar 19 15:17:48 2021 +0100 Update docs other create table docs --- docs/sql-ref-syntax-ddl-create-table-hiveformat.md | 2 +- docs/sql-ref-syntax-ddl-create-table-like.md | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/docs/sql-ref-syntax-ddl-create-table-hiveformat.md b/docs/sql-ref-syntax-ddl-create-table-hiveformat.md index b2f5957..63880d5 100644 --- a/docs/sql-ref-syntax-ddl-create-table-hiveformat.md +++ b/docs/sql-ref-syntax-ddl-create-table-hiveformat.md @@ -37,7 +37,7 @@ CREATE [ EXTERNAL ] TABLE [ IF NOT EXISTS ] table_identifier [ ROW FORMAT row_format ] [ STORED AS file_format ] [ LOCATION path ] -[ TBLPROPERTIES ( key1=val1, key2=val2, ... ) ] +[ TBLPROPERTIES [ ( key1=val1, key2=val2, ... ) | ( key1 val1, key2 val2, ... ) ] ] [ AS select_statement ] ``` diff --git a/docs/sql-ref-syntax-ddl-create-table-like.md b/docs/sql-ref-syntax-ddl-create-table-like.md index cfb959c..a374296a 100644 --- a/docs/sql-ref-syntax-ddl-create-table-like.md +++ b/docs/sql-ref-syntax-ddl-create-table-like.md @@ -30,7 +30,7 @@ CREATE TABLE [IF NOT EXISTS] table_identifier LIKE source_table_identifier USING data_source [ ROW FORMAT row_format ] [ STORED AS file_format ] -[ TBLPROPERTIES ( key1=val1, key2=val2, ... ) ] +[ TBLPROPERTIES [ ( key1=val1, key2=val2, ... ) | ( key1 val1, key2 val2, ... ) ] ] [ LOCATION path ] ``` - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch pull/31899 created (now 0c4e71e)
This is an automated email from the ASF dual-hosted git repository. yamamuro pushed a change to branch pull/31899 in repository https://gitbox.apache.org/repos/asf/spark.git. at 0c4e71e Fix This branch includes the following new commits: new c685abe Update docs to reflect alternative key value notation new 2ff9703 Update docs other create table docs new 8245a55 Fix alternatives with subrule grammar new 1d157ed Update to eaasier KV syntax new 83ec2ee Commit missing doc updates new fff449b Some more fixes new 42cd52e Remove unnecessary change new 2ebb2aa remove space new 0c4e71e Fix The 9 revisions listed above as "new" are entirely new to this repository and will be described in separate emails. The revisions listed as "add" were already present in the repository and have only been added to this reference. - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] 05/09: Commit missing doc updates
This is an automated email from the ASF dual-hosted git repository. yamamuro pushed a commit to branch pull/31899 in repository https://gitbox.apache.org/repos/asf/spark.git commit 83ec2ee71751142220464ea54ffc6e47ccc35ad4 Author: Niklas Riekenbrauck AuthorDate: Sat Mar 27 15:19:27 2021 +0100 Commit missing doc updates --- docs/sql-ref-syntax-ddl-alter-table.md | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/docs/sql-ref-syntax-ddl-alter-table.md b/docs/sql-ref-syntax-ddl-alter-table.md index 912de0f..866b596 100644 --- a/docs/sql-ref-syntax-ddl-alter-table.md +++ b/docs/sql-ref-syntax-ddl-alter-table.md @@ -184,10 +184,10 @@ ALTER TABLE table_identifier UNSET TBLPROPERTIES [ IF EXISTS ] ( key1, key2, ... ```sql -- Set SERDE Properties ALTER TABLE table_identifier [ partition_spec ] -SET SERDEPROPERTIES ( ( key1 = val1, key2 = val2, ... ) | ( key1 val1, key2 val2, ... ) ) +SET SERDEPROPERTIES ( key1 [=] val1, key2 [=] val2, ... ) ALTER TABLE table_identifier [ partition_spec ] SET SERDE serde_class_name -[ WITH SERDEPROPERTIES ( ( key1 = val1, key2 = val2, ... ) | ( key1 val1, key2 val2, ... ) ) ] +[ WITH SERDEPROPERTIES ( key1 [=] val1, key2 [=] val2, ... ) ] ``` SET LOCATION And SET FILE FORMAT @@ -221,7 +221,7 @@ ALTER TABLE table_identifier [ partition_spec ] SET LOCATION 'new_location' **Syntax:** `PARTITION ( partition_col_name = partition_col_val [ , ... ] )` -* **SERDEPROPERTIES ( ( key1 = val1, key2 = val2, ... ) | ( key1 val1, key2 val2, ... ) ) ** +* **SERDEPROPERTIES ( key1 [=] val1, key2 [=] val2, ... ) ** Specifies the SERDE properties to be set. - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] 07/09: Remove unnecessary change
This is an automated email from the ASF dual-hosted git repository. yamamuro pushed a commit to branch pull/31899 in repository https://gitbox.apache.org/repos/asf/spark.git commit 42cd52e297b141a8b837a8315ca4c84a5ffc3def Author: Niklas Riekenbrauck AuthorDate: Sat Mar 27 15:27:52 2021 +0100 Remove unnecessary change --- docs/sql-ref-syntax-ddl-alter-database.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/sql-ref-syntax-ddl-alter-database.md b/docs/sql-ref-syntax-ddl-alter-database.md index 2de9675..6ac6863 100644 --- a/docs/sql-ref-syntax-ddl-alter-database.md +++ b/docs/sql-ref-syntax-ddl-alter-database.md @@ -31,7 +31,7 @@ for a database and may be used for auditing purposes. ```sql ALTER { DATABASE | SCHEMA } database_name -SET DBPROPERTIES ( ( property_name [=] property_value [ , ... ] | ( property_name property_value [ , ... ] ) +SET DBPROPERTIES ( property_name [=] property_value [ , ... ] ) ``` ### Parameters - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
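After this revert, the database page documents a single `property_name [=] property_value` list. A sketch of the resulting syntax (database name hypothetical):

```sql
ALTER DATABASE analytics SET DBPROPERTIES ( 'owner' = 'alice' );
-- SCHEMA is a synonym for DATABASE, and '=' may be omitted
ALTER SCHEMA analytics SET DBPROPERTIES ( 'owner' 'alice' );
```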
[spark] branch master updated (19661f6 -> 5c67d0c)
This is an automated email from the ASF dual-hosted git repository. yamamuro pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from 19661f6 [SPARK-35325][SQL][TESTS] Add nested column ORC encryption test case add 5c67d0c [SPARK-35293][SQL][TESTS] Use the newer dsdgen for TPCDSQueryTestSuite No new revisions were added by this update. Summary of changes: .github/workflows/build_and_test.yml |6 +- .../resources/tpcds-query-results/v1_4/q1.sql.out | 184 +- .../resources/tpcds-query-results/v1_4/q10.sql.out | 11 +- .../resources/tpcds-query-results/v1_4/q11.sql.out |6 + .../resources/tpcds-query-results/v1_4/q12.sql.out | 200 +- .../resources/tpcds-query-results/v1_4/q13.sql.out |2 +- .../tpcds-query-results/v1_4/q14a.sql.out | 200 +- .../tpcds-query-results/v1_4/q14b.sql.out | 200 +- .../resources/tpcds-query-results/v1_4/q15.sql.out | 200 +- .../resources/tpcds-query-results/v1_4/q16.sql.out |2 +- .../resources/tpcds-query-results/v1_4/q17.sql.out |2 +- .../resources/tpcds-query-results/v1_4/q18.sql.out | 200 +- .../resources/tpcds-query-results/v1_4/q19.sql.out | 200 +- .../resources/tpcds-query-results/v1_4/q2.sql.out | 5026 +-- .../resources/tpcds-query-results/v1_4/q20.sql.out | 200 +- .../resources/tpcds-query-results/v1_4/q21.sql.out | 200 +- .../resources/tpcds-query-results/v1_4/q22.sql.out | 200 +- .../tpcds-query-results/v1_4/q23a.sql.out |2 +- .../tpcds-query-results/v1_4/q23b.sql.out |5 +- .../tpcds-query-results/v1_4/q24a.sql.out |8 +- .../tpcds-query-results/v1_4/q24b.sql.out |2 +- .../resources/tpcds-query-results/v1_4/q25.sql.out |2 +- .../resources/tpcds-query-results/v1_4/q26.sql.out | 200 +- .../resources/tpcds-query-results/v1_4/q27.sql.out | 200 +- .../resources/tpcds-query-results/v1_4/q28.sql.out |2 +- .../resources/tpcds-query-results/v1_4/q29.sql.out |3 +- .../resources/tpcds-query-results/v1_4/q3.sql.out | 172 +- .../resources/tpcds-query-results/v1_4/q30.sql.out | 200 +- 
.../resources/tpcds-query-results/v1_4/q31.sql.out | 112 +- .../resources/tpcds-query-results/v1_4/q32.sql.out |2 - .../resources/tpcds-query-results/v1_4/q33.sql.out | 200 +- .../resources/tpcds-query-results/v1_4/q34.sql.out | 434 +- .../resources/tpcds-query-results/v1_4/q35.sql.out | 188 +- .../resources/tpcds-query-results/v1_4/q36.sql.out | 200 +- .../resources/tpcds-query-results/v1_4/q37.sql.out |3 +- .../resources/tpcds-query-results/v1_4/q38.sql.out |2 +- .../tpcds-query-results/v1_4/q39a.sql.out | 449 +- .../tpcds-query-results/v1_4/q39b.sql.out | 24 +- .../resources/tpcds-query-results/v1_4/q4.sql.out | 10 +- .../resources/tpcds-query-results/v1_4/q40.sql.out | 200 +- .../resources/tpcds-query-results/v1_4/q41.sql.out |9 +- .../resources/tpcds-query-results/v1_4/q42.sql.out | 21 +- .../resources/tpcds-query-results/v1_4/q43.sql.out | 12 +- .../resources/tpcds-query-results/v1_4/q44.sql.out | 20 +- .../resources/tpcds-query-results/v1_4/q45.sql.out | 39 +- .../resources/tpcds-query-results/v1_4/q46.sql.out | 200 +- .../resources/tpcds-query-results/v1_4/q47.sql.out | 200 +- .../resources/tpcds-query-results/v1_4/q48.sql.out |2 +- .../resources/tpcds-query-results/v1_4/q49.sql.out | 64 +- .../resources/tpcds-query-results/v1_4/q5.sql.out | 200 +- .../resources/tpcds-query-results/v1_4/q50.sql.out | 12 +- .../resources/tpcds-query-results/v1_4/q51.sql.out | 200 +- .../resources/tpcds-query-results/v1_4/q52.sql.out | 200 +- .../resources/tpcds-query-results/v1_4/q53.sql.out | 200 +- .../resources/tpcds-query-results/v1_4/q54.sql.out |2 +- .../resources/tpcds-query-results/v1_4/q55.sql.out | 200 +- .../resources/tpcds-query-results/v1_4/q56.sql.out | 200 +- .../resources/tpcds-query-results/v1_4/q57.sql.out | 200 +- .../resources/tpcds-query-results/v1_4/q58.sql.out |4 +- .../resources/tpcds-query-results/v1_4/q59.sql.out | 200 +- .../resources/tpcds-query-results/v1_4/q6.sql.out | 91 +- .../resources/tpcds-query-results/v1_4/q60.sql.out | 200 +- 
.../resources/tpcds-query-results/v1_4/q61.sql.out |2 +- .../resources/tpcds-query-results/v1_4/q62.sql.out | 200 +- .../resources/tpcds-query-results/v1_4/q63.sql.out | 200 +- .../resources/tpcds-query-results/v1_4/q64.sql.out | 19 +- .../resources/tpcds-query-results/v1_4/q65.sql.out | 200 +- .../resources/tpcds-query-results/v1_4/q66.sql.out | 10 +- .../resources/tpcds-query-results/v1_4/q67.sql.out | 200 +- .../resources/tpcds-query-results/v1_4/q68.sql.out | 200 +- .../resources/tpcds-query-results/v1_4/q69.sql.out | 182 +- .../resources/tpcds-query-results/v1_4/q7.sql.out | 200 +- .../resources/tpcds-query-results/v1_4/q70.sql.out |6 +- .../resources/tpcds-query-results/v1_4
[spark] branch master updated (2634dba -> 6f0ef93)
This is an automated email from the ASF dual-hosted git repository. yamamuro pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from 2634dba [SPARK-35175][BUILD] Add linter for JavaScript source files add 6f0ef93 [SPARK-35297][CORE][DOC][MINOR] Modify the comment about the executor No new revisions were added by this update. Summary of changes: core/src/main/scala/org/apache/spark/executor/Executor.scala | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated (b025780 -> 06c4009)
This is an automated email from the ASF dual-hosted git repository. yamamuro pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from b025780 [SPARK-35331][SQL] Support resolving missing attrs for distribute/cluster by/repartition hint add 06c4009 [SPARK-35327][SQL][TESTS] Filters out the TPC-DS queries that can cause flaky test results No new revisions were added by this update. Summary of changes: .../resources/tpcds-query-results/v1_4/q6.sql.out | 51 -- .../resources/tpcds-query-results/v1_4/q75.sql.out | 105 - .../scala/org/apache/spark/sql/TPCDSBase.scala | 2 +- .../org/apache/spark/sql/TPCDSQueryTestSuite.scala | 6 ++ 4 files changed, 7 insertions(+), 157 deletions(-) delete mode 100644 sql/core/src/test/resources/tpcds-query-results/v1_4/q6.sql.out delete mode 100644 sql/core/src/test/resources/tpcds-query-results/v1_4/q75.sql.out - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated (5b65d8a -> 620f072)
This is an automated email from the ASF dual-hosted git repository. yamamuro pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from 5b65d8a [SPARK-35347][SQL] Use MethodUtils for looking up methods in Invoke and StaticInvoke add 620f072 [SPARK-35231][SQL] logical.Range override maxRowsPerPartition No new revisions were added by this update. Summary of changes: .../sql/catalyst/plans/logical/basicLogicalOperators.scala | 12 .../apache/spark/sql/catalyst/plans/LogicalPlanSuite.scala | 11 ++- 2 files changed, 22 insertions(+), 1 deletion(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated (620f072 -> 38eb5a6)
This is an automated email from the ASF dual-hosted git repository. yamamuro pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from 620f072 [SPARK-35231][SQL] logical.Range override maxRowsPerPartition add 38eb5a6 [SPARK-35354][SQL] Replace BaseJoinExec with ShuffledJoin in CoalesceBucketsInJoin No new revisions were added by this update. Summary of changes: .../sql/execution/bucketing/CoalesceBucketsInJoin.scala | 12 ++-- 1 file changed, 6 insertions(+), 6 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated (44bd0a8 -> c4ca232)
This is an automated email from the ASF dual-hosted git repository. yamamuro pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from 44bd0a8 [SPARK-35088][SQL][FOLLOWUP] Improve the error message for Sequence expression add c4ca232 [SPARK-35363][SQL] Refactor sort merge join code-gen be agnostic to join type No new revisions were added by this update. Summary of changes: .../spark/sql/execution/joins/ShuffledJoin.scala | 2 +- .../sql/execution/joins/SortMergeJoinExec.scala| 163 +++-- 2 files changed, 84 insertions(+), 81 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated (ae0579a -> 3241aeb)
This is an automated email from the ASF dual-hosted git repository. yamamuro pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from ae0579a [SPARK-35369][DOC] Document ExecutorAllocationManager metrics add 3241aeb [SPARK-35385][SQL][TESTS] Skip duplicate queries in the TPCDS-related tests No new revisions were added by this update. Summary of changes: sql/core/src/test/scala/org/apache/spark/sql/TPCDSBase.scala | 10 +- .../test/scala/org/apache/spark/sql/TPCDSQueryTestSuite.scala | 6 -- 2 files changed, 9 insertions(+), 7 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated (2eef2f9 -> 2390b9d)
This is an automated email from the ASF dual-hosted git repository. yamamuro pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from 2eef2f9 [SPARK-35412][SQL] Fix a bug in groupBy of year-month/day-time intervals add 2390b9d [SPARK-35413][INFRA] Use the SHA of the latest commit when checking out databricks/tpcds-kit No new revisions were added by this update. Summary of changes: .github/workflows/build_and_test.yml | 1 + 1 file changed, 1 insertion(+) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch branch-3.1 updated: [SPARK-35413][INFRA] Use the SHA of the latest commit when checking out databricks/tpcds-kit
This is an automated email from the ASF dual-hosted git repository. yamamuro pushed a commit to branch branch-3.1 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.1 by this push: new f9a396c [SPARK-35413][INFRA] Use the SHA of the latest commit when checking out databricks/tpcds-kit f9a396c is described below commit f9a396c37cb340671666379cb8d8a85435c7ad87 Author: Takeshi Yamamuro AuthorDate: Mon May 17 09:26:04 2021 +0900 [SPARK-35413][INFRA] Use the SHA of the latest commit when checking out databricks/tpcds-kit ### What changes were proposed in this pull request? This PR proposes to use the SHA of the latest commit ([2a5078a782192ddb6efbcead8de9973d6ab4f069](https://github.com/databricks/tpcds-kit/commit/2a5078a782192ddb6efbcead8de9973d6ab4f069)) when checking out `databricks/tpcds-kit`. This can prevent the test workflow from breaking accidentally if the repository changes drastically. ### Why are the changes needed? For better test workflow. ### Does this PR introduce _any_ user-facing change? No, dev-only. ### How was this patch tested? GA passed. Closes #32561 from maropu/UseRefInCheckout. Authored-by: Takeshi Yamamuro Signed-off-by: Takeshi Yamamuro (cherry picked from commit 2390b9dbcbc0b0377d694d2c3c2c0fa78179cbd6) Signed-off-by: Takeshi Yamamuro --- .github/workflows/build_and_test.yml | 1 + 1 file changed, 1 insertion(+) diff --git a/.github/workflows/build_and_test.yml b/.github/workflows/build_and_test.yml index 173cc0e..c8b4c77 100644 --- a/.github/workflows/build_and_test.yml +++ b/.github/workflows/build_and_test.yml @@ -481,6 +481,7 @@ jobs: uses: actions/checkout@v2 with: repository: databricks/tpcds-kit +ref: 2a5078a782192ddb6efbcead8de9973d6ab4f069 path: ./tpcds-kit - name: Build tpcds-kit if: steps.cache-tpcds-sf-1.outputs.cache-hit != 'true' - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch branch-3.0 updated: [SPARK-35413][INFRA] Use the SHA of the latest commit when checking out databricks/tpcds-kit
This is an automated email from the ASF dual-hosted git repository. yamamuro pushed a commit to branch branch-3.0 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.0 by this push: new 8ebc1d3 [SPARK-35413][INFRA] Use the SHA of the latest commit when checking out databricks/tpcds-kit 8ebc1d3 is described below commit 8ebc1d317f978e524d55449ecc88daa806dde009 Author: Takeshi Yamamuro AuthorDate: Mon May 17 09:26:04 2021 +0900 [SPARK-35413][INFRA] Use the SHA of the latest commit when checking out databricks/tpcds-kit ### What changes were proposed in this pull request? This PR proposes to use the SHA of the latest commit ([2a5078a782192ddb6efbcead8de9973d6ab4f069](https://github.com/databricks/tpcds-kit/commit/2a5078a782192ddb6efbcead8de9973d6ab4f069)) when checking out `databricks/tpcds-kit`. This can prevent the test workflow from breaking accidentally if the repository changes drastically. ### Why are the changes needed? For better test workflow. ### Does this PR introduce _any_ user-facing change? No, dev-only. ### How was this patch tested? GA passed. Closes #32561 from maropu/UseRefInCheckout. Authored-by: Takeshi Yamamuro Signed-off-by: Takeshi Yamamuro (cherry picked from commit 2390b9dbcbc0b0377d694d2c3c2c0fa78179cbd6) Signed-off-by: Takeshi Yamamuro --- .github/workflows/build_and_test.yml | 1 + 1 file changed, 1 insertion(+) diff --git a/.github/workflows/build_and_test.yml b/.github/workflows/build_and_test.yml index 936a256..77a2c79 100644 --- a/.github/workflows/build_and_test.yml +++ b/.github/workflows/build_and_test.yml @@ -428,6 +428,7 @@ jobs: uses: actions/checkout@v2 with: repository: databricks/tpcds-kit +ref: 2a5078a782192ddb6efbcead8de9973d6ab4f069 path: ./tpcds-kit - name: Build tpcds-kit if: steps.cache-tpcds-sf-1.outputs.cache-hit != 'true' - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
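The reason a plain branch checkout can break a workflow is that a branch like `master` is a moving target, while a full 40-character commit SHA is immutable. As a small illustrative sketch (not part of the workflow change itself), a `ref` is only "pinned" in this sense when it is a full hex SHA — branch names, tags, and abbreviated SHAs all remain mutable or ambiguous:

```python
import re

def is_pinned_ref(ref: str) -> bool:
    """Return True only for a full 40-hex-digit commit SHA.

    Branch names, tags, and abbreviated SHAs can move or be rewritten,
    so only a full SHA guarantees the checked-out tree never drifts.
    """
    return re.fullmatch(r"[0-9a-f]{40}", ref) is not None

# The ref pinned by this commit is a full SHA, so the checkout is stable:
assert is_pinned_ref("2a5078a782192ddb6efbcead8de9973d6ab4f069")
# A branch name or a short SHA would not qualify:
assert not is_pinned_ref("master")
assert not is_pinned_ref("2a5078a")
```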
[spark] branch master updated (7b942d5 -> cce0048)
This is an automated email from the ASF dual-hosted git repository. yamamuro pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from 7b942d5 [SPARK-35425][BUILD] Pin jinja2 in `spark-rm/Dockerfile` and add as a required dependency in the release README.md add cce0048 [SPARK-35351][SQL] Add code-gen for left anti sort merge join No new revisions were added by this update. Summary of changes: .../sql/execution/joins/SortMergeJoinExec.scala| 97 ++ .../approved-plans-v1_4/q16.sf100/explain.txt | 4 +- .../approved-plans-v1_4/q16.sf100/simplified.txt | 5 +- .../approved-plans-v1_4/q16/explain.txt| 4 +- .../approved-plans-v1_4/q16/simplified.txt | 5 +- .../approved-plans-v1_4/q69.sf100/explain.txt | 36 +++ .../approved-plans-v1_4/q69.sf100/simplified.txt | 110 +++-- .../approved-plans-v1_4/q87.sf100/explain.txt | 8 +- .../approved-plans-v1_4/q87.sf100/simplified.txt | 10 +- .../approved-plans-v1_4/q94.sf100/explain.txt | 4 +- .../approved-plans-v1_4/q94.sf100/simplified.txt | 5 +- .../approved-plans-v1_4/q94/explain.txt| 4 +- .../approved-plans-v1_4/q94/simplified.txt | 5 +- .../sql/execution/WholeStageCodegenSuite.scala | 22 + .../sql/execution/metric/SQLMetricsSuite.scala | 4 +- 15 files changed, 208 insertions(+), 115 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
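Spark's change here emits generated Java for `SortMergeJoinExec`; independently of that code-gen, the merge scan a left anti sort-merge join performs can be sketched as follows (keys only, both inputs assumed pre-sorted; the function name is illustrative, not Spark API):

```python
def left_anti_sort_merge_join(left, right):
    """Left anti join over two key-sorted lists: keep each left key
    that has no matching key on the right, advancing both sides in
    a single forward pass (the essence of a sort-merge anti join)."""
    out, j, n = [], 0, len(right)
    for key in left:
        while j < n and right[j] < key:   # skip right keys smaller than the probe
            j += 1
        if j >= n or right[j] != key:     # no match found -> row survives
            out.append(key)
    return out

assert left_anti_sort_merge_join([1, 2, 4, 6], [2, 3, 6]) == [1, 4]
```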
[spark] branch master updated (186477c -> b1493d8)
This is an automated email from the ASF dual-hosted git repository. yamamuro pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from 186477c [SPARK-35263][TEST] Refactor ShuffleBlockFetcherIteratorSuite to reduce duplicated code add b1493d8 [SPARK-35398][SQL] Simplify the way to get classes from ClassBodyEvaluator in `CodeGenerator.updateAndGetCompilationStats` method No new revisions were added by this update. Summary of changes: .../sql/catalyst/expressions/codegen/CodeGenerator.scala | 14 ++ 1 file changed, 2 insertions(+), 12 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated (a72d05c -> 46f7d78)
This is an automated email from the ASF dual-hosted git repository. yamamuro pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from a72d05c [SPARK-35106][CORE][SQL] Avoid failing rename caused by destination directory not exist add 46f7d78 [SPARK-35368][SQL] Update histogram statistics for RANGE operator for stats estimation No new revisions were added by this update. Summary of changes: .../plans/logical/basicLogicalOperators.scala | 43 +++- .../BasicStatsEstimationSuite.scala| 81 ++ 2 files changed, 108 insertions(+), 16 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
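The summary above gives no implementation detail, but the general idea behind stats for RANGE is that `Range(start, end, step)` produces a fully determined, uniformly spaced value set, so statistics such as an equi-height histogram can be derived exactly without executing the scan. A minimal illustration of the concept (not Spark's implementation):

```python
import math

def equi_height_bins(start: int, end: int, step: int, num_bins: int):
    """Equi-height histogram for the values a Range(start, end, step)
    operator would produce (end exclusive): each bin covers roughly the
    same number of rows, which is the shape a cost-based optimizer
    wants for selectivity estimates. Returns (lo, hi, count) per bin."""
    vals = list(range(start, end, step))
    if not vals:
        return []
    per = math.ceil(len(vals) / num_bins)
    return [(chunk[0], chunk[-1], len(chunk))
            for chunk in (vals[i:i + per] for i in range(0, len(vals), per))]

assert equi_height_bins(0, 10, 1, 2) == [(0, 4, 5), (5, 9, 5)]
```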
[spark] branch master updated (9283beb -> 1214213)
This is an automated email from the ASF dual-hosted git repository. yamamuro pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from 9283beb [SPARK-35418][SQL] Add sentences function to functions.{scala,py} add 1214213 [SPARK-35362][SQL] Update null count in the column stats for UNION operator stats estimation No new revisions were added by this update. Summary of changes: .../logical/statsEstimation/FilterEstimation.scala | 2 +- .../logical/statsEstimation/UnionEstimation.scala | 97 ++ .../BasicStatsEstimationSuite.scala| 2 +- .../statsEstimation/UnionEstimationSuite.scala | 65 +-- 4 files changed, 122 insertions(+), 44 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
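Spark's logical `Union` has UNION ALL semantics, so a column's null count in the output is simply the sum of the children's null counts for that column, and it becomes unknown as soon as any child lacks the statistic. A small data-only sketch of that rule (plain lists stand in for column statistics; this is not the Scala code in `UnionEstimation`):

```python
def union_null_counts(children):
    """Column-wise null counts for a UNION ALL output.

    `children` is a list of per-child rows of null counts, one entry per
    column; None marks a missing statistic, which poisons that column."""
    cols = zip(*children)  # regroup per-child rows into per-column tuples
    return [None if any(c is None for c in col) else sum(col) for col in cols]

# Column 0 has 0 + 2 nulls; column 1 is unknown because one child has no stat:
assert union_null_counts([[0, 3], [2, None]]) == [2, None]
```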
[spark] branch master updated (d1b24d8 -> 586caae)
This is an automated email from the ASF dual-hosted git repository. yamamuro pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from d1b24d8 [SPARK-35338][PYTHON] Separate arithmetic operations into data type based structures add 586caae [SPARK-35438][SQL][DOCS] Minor documentation fix for window physical operator No new revisions were added by this update. Summary of changes: .../main/scala/org/apache/spark/sql/execution/window/WindowExec.scala | 2 +- .../scala/org/apache/spark/sql/execution/window/WindowExecBase.scala| 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated (bdd8e1d -> e170e63)
This is an automated email from the ASF dual-hosted git repository. yamamuro pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from bdd8e1d [SPARK-28551][SQL] CTAS with LOCATION should not allow to a non-empty directory add e170e63 [SPARK-35457][BUILD] Bump ANTLR runtime version to 4.8 No new revisions were added by this update. Summary of changes: dev/deps/spark-deps-hadoop-2.7-hive-2.3 | 2 +- dev/deps/spark-deps-hadoop-3.2-hive-2.3 | 2 +- pom.xml | 2 +- 3 files changed, 3 insertions(+), 3 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated (fdd7ca5 -> 548e37b)
This is an automated email from the ASF dual-hosted git repository. yamamuro pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from fdd7ca5 [SPARK-35498][PYTHON] Add thread target wrapper API for pyspark pin thread mode add 548e37b [SPARK-33122][SQL][FOLLOWUP] Extend RemoveRedundantAggregates optimizer rule to apply to more cases No new revisions were added by this update. Summary of changes: .../spark/sql/catalyst/optimizer/Optimizer.scala | 43 + .../optimizer/RemoveRedundantAggregates.scala | 70 ++ .../optimizer/RemoveRedundantAggregatesSuite.scala | 16 - 3 files changed, 86 insertions(+), 43 deletions(-) create mode 100644 sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/RemoveRedundantAggregates.scala - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated (a59063d -> 08e6f63)
This is an automated email from the ASF dual-hosted git repository. yamamuro pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from a59063d [SPARK-35581][SQL] Support special datetime values in typed literals only add 08e6f63 [SPARK-35577][TESTS] Allow to log container output for docker integration tests No new revisions were added by this update. Summary of changes: .../apache/spark/sql/jdbc/DockerJDBCIntegrationSuite.scala | 14 +- 1 file changed, 13 insertions(+), 1 deletion(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated (cf07036 -> 912d60b)
This is an automated email from the ASF dual-hosted git repository. yamamuro pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from cf07036 [SPARK-35593][K8S][CORE] Support shuffle data recovery on the reused PVCs add 912d60b [SPARK-35709][DOCS] Remove the reference to third party Nomad integration project No new revisions were added by this update. Summary of changes: docs/cluster-overview.md | 3 --- 1 file changed, 3 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated (e9af457 -> c463472)
This is an automated email from the ASF dual-hosted git repository. yamamuro pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from e9af457 [SPARK-35718][SQL] Support casting of Date to timestamp without time zone type add c463472 [SPARK-35439][SQL][FOLLOWUP] ExpressionContainmentOrdering should not sort unrelated expressions No new revisions were added by this update. Summary of changes: .../expressions/EquivalentExpressions.scala| 45 -- .../SubexpressionEliminationSuite.scala| 21 ++ 2 files changed, 45 insertions(+), 21 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated (864ff67 -> 9709ee5)
This is an automated email from the ASF dual-hosted git repository. yamamuro pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from 864ff67 [SPARK-35429][CORE] Remove commons-httpclient from Hadoop-3.2 profile due to EOL and CVEs add 9709ee5 [SPARK-35760][SQL] Fix the max rows check for broadcast exchange No new revisions were added by this update. Summary of changes: .../execution/exchange/BroadcastExchangeExec.scala | 25 +++--- 1 file changed, 17 insertions(+), 8 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated (ac228d4 -> 11e96dc)
This is an automated email from the ASF dual-hosted git repository. yamamuro pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from ac228d4 [SPARK-35691][CORE] addFile/addJar/addDirectory should put CanonicalFile add 11e96dc [SPARK-35669][SQL] Quote the pushed column name only when nested column predicate pushdown is enabled No new revisions were added by this update. Summary of changes: .../org/apache/spark/sql/sources/filters.scala | 5 ++-- .../execution/datasources/DataSourceStrategy.scala | 31 +- .../spark/sql/FileBasedDataSourceSuite.scala | 10 +++ 3 files changed, 31 insertions(+), 15 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated (5c96d64 -> b08cf6e)
This is an automated email from the ASF dual-hosted git repository. yamamuro pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from 5c96d64 [SPARK-35707][ML] optimize sparse GEMM by skipping bound checking add b08cf6e [SPARK-35203][SQL] Improve Repartition statistics estimation No new revisions were added by this update. Summary of changes: .../logical/statsEstimation/BasicStatsPlanVisitor.scala | 4 ++-- .../SizeInBytesOnlyStatsPlanVisitor.scala | 4 ++-- .../statsEstimation/BasicStatsEstimationSuite.scala | 17 - 3 files changed, 16 insertions(+), 9 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated (37ef7bb -> f80be41)
This is an automated email from the ASF dual-hosted git repository. yamamuro pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from 37ef7bb [SPARK-35840][SQL] Add `apply()` for a single field to `YearMonthIntervalType` and `DayTimeIntervalType` add f80be41 [SPARK-34565][SQL] Collapse Window nodes with Project between them No new revisions were added by this update. Summary of changes: .../spark/sql/catalyst/optimizer/Optimizer.scala | 25 --- .../catalyst/optimizer/CollapseWindowSuite.scala | 50 +- 2 files changed, 68 insertions(+), 7 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
spark git commit: [MINOR][SQL] Combine the same codes in test cases
Repository: spark Updated Branches: refs/heads/master 261284842 -> 93f5592aa [MINOR][SQL] Combine the same codes in test cases ## What changes were proposed in this pull request? In DDLSuite, four test cases share the same code; extracting a helper function removes the duplication. ## How was this patch tested? existing tests. Closes #23194 from CarolinePeng/Update_temp. Authored-by: 彭灿00244106 <00244106@zte.intra> Signed-off-by: Takeshi Yamamuro Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/93f5592a Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/93f5592a Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/93f5592a Branch: refs/heads/master Commit: 93f5592aa8c1254a93524fda81cf0e418c22cb2f Parents: 2612848 Author: 彭灿00244106 <00244106@zte.intra> Authored: Tue Dec 4 22:08:16 2018 +0900 Committer: Takeshi Yamamuro Committed: Tue Dec 4 22:08:16 2018 +0900 -- .../spark/sql/execution/command/DDLSuite.scala | 40 1 file changed, 16 insertions(+), 24 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/93f5592a/sql/core/src/test/scala/org/apache/spark/sql/execution/command/DDLSuite.scala -- diff --git a/sql/core/src/test/scala/org/apache/spark/sql/execution/command/DDLSuite.scala b/sql/core/src/test/scala/org/apache/spark/sql/execution/command/DDLSuite.scala index 9d32fb6..052a5e7 100644 --- a/sql/core/src/test/scala/org/apache/spark/sql/execution/command/DDLSuite.scala +++ b/sql/core/src/test/scala/org/apache/spark/sql/execution/command/DDLSuite.scala @@ -377,41 +377,41 @@ abstract class DDLSuite extends QueryTest with SQLTestUtils { } } - test("CTAS a managed table with the existing empty directory") { -val tableLoc = new File(spark.sessionState.catalog.defaultTablePath(TableIdentifier("tab1"))) + private def withEmptyDirInTablePath(dirName: String)(f : File => Unit): Unit = { +val tableLoc = + new
File(spark.sessionState.catalog.defaultTablePath(TableIdentifier(dirName))) try { tableLoc.mkdir() + f(tableLoc) +} finally { + waitForTasksToFinish() + Utils.deleteRecursively(tableLoc) +} + } + + + test("CTAS a managed table with the existing empty directory") { +withEmptyDirInTablePath("tab1") { tableLoc => withTable("tab1") { sql(s"CREATE TABLE tab1 USING ${dataSource} AS SELECT 1, 'a'") checkAnswer(spark.table("tab1"), Row(1, "a")) } -} finally { - waitForTasksToFinish() - Utils.deleteRecursively(tableLoc) } } test("create a managed table with the existing empty directory") { -val tableLoc = new File(spark.sessionState.catalog.defaultTablePath(TableIdentifier("tab1"))) -try { - tableLoc.mkdir() +withEmptyDirInTablePath("tab1") { tableLoc => withTable("tab1") { sql(s"CREATE TABLE tab1 (col1 int, col2 string) USING ${dataSource}") sql("INSERT INTO tab1 VALUES (1, 'a')") checkAnswer(spark.table("tab1"), Row(1, "a")) } -} finally { - waitForTasksToFinish() - Utils.deleteRecursively(tableLoc) } } test("create a managed table with the existing non-empty directory") { withTable("tab1") { - val tableLoc = new File(spark.sessionState.catalog.defaultTablePath(TableIdentifier("tab1"))) - try { -// create an empty hidden file -tableLoc.mkdir() + withEmptyDirInTablePath("tab1") { tableLoc => val hiddenGarbageFile = new File(tableLoc.getCanonicalPath, ".garbage") hiddenGarbageFile.createNewFile() val exMsg = "Can not create the managed table('`tab1`'). 
The associated location" @@ -439,28 +439,20 @@ abstract class DDLSuite extends QueryTest with SQLTestUtils { }.getMessage assert(ex.contains(exMsgWithDefaultDB)) } - } finally { -waitForTasksToFinish() -Utils.deleteRecursively(tableLoc) } } } test("rename a managed table with existing empty directory") { -val tableLoc = new File(spark.sessionState.catalog.defaultTablePath(TableIdentifier("tab2"))) -try { +withEmptyDirInTablePath("tab2") { tableLoc => withTable("tab1") { sql(s"CREATE TABLE tab1 USING $dataSource AS SELECT 1, 'a'") -tableLoc.mkdir() val ex
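The extracted `withEmptyDirInTablePath` helper in the diff above is a loan pattern: it creates the directory, lends it to the test body, and guarantees cleanup afterwards. The same shape in Python, as an illustrative analogue using a context manager (names are made up for the sketch):

```python
import os
import shutil
import tempfile
from contextlib import contextmanager

@contextmanager
def with_empty_dir(name: str):
    """Loan-pattern fixture: create an empty directory, lend it to the
    caller's block, and always remove it again in the finally clause —
    the same cleanup guarantee the Scala helper gets from try/finally."""
    parent = tempfile.mkdtemp()
    path = os.path.join(parent, name)
    os.mkdir(path)
    try:
        yield path
    finally:
        shutil.rmtree(parent, ignore_errors=True)

with with_empty_dir("tab1") as d:
    assert os.path.isdir(d) and not os.listdir(d)  # empty dir is lent out
assert not os.path.exists(d)                       # and cleaned up after
```

The payoff is the same as in the Scala refactoring: each test states only its body, and the setup/teardown lives in one place.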
[spark] branch master updated: [SPARK-26459][SQL] replace UpdateNullabilityInAttributeReferences with FixNullability
This is an automated email from the ASF dual-hosted git repository. yamamuro pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 6955638 [SPARK-26459][SQL] replace UpdateNullabilityInAttributeReferences with FixNullability 6955638 is described below commit 6955638eae99cbe0a890a50e0c61c17641e7269f Author: Wenchen Fan AuthorDate: Thu Jan 10 20:15:25 2019 +0900 [SPARK-26459][SQL] replace UpdateNullabilityInAttributeReferences with FixNullability ## What changes were proposed in this pull request? This is a followup of https://github.com/apache/spark/pull/18576 The newly added rule `UpdateNullabilityInAttributeReferences` does the same thing as `FixNullability`, so we only need to keep one of them. This PR removes `UpdateNullabilityInAttributeReferences` and uses `FixNullability` in its place, renaming it to `UpdateAttributeNullability`. ## How was this patch tested? existing tests Closes #23390 from cloud-fan/nullable. 
Authored-by: Wenchen Fan Signed-off-by: Takeshi Yamamuro --- .../spark/sql/catalyst/analysis/Analyzer.scala | 38 +-- .../analysis/UpdateAttributeNullability.scala | 57 ++ .../spark/sql/catalyst/optimizer/Optimizer.scala | 18 +-- ...dateAttributeNullabilityInOptimizerSuite.scala} | 9 ++-- 4 files changed, 65 insertions(+), 57 deletions(-) diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala index 2aa0f21..a84bb76 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala @@ -197,8 +197,8 @@ class Analyzer( PullOutNondeterministic), Batch("UDF", Once, HandleNullInputsForUDF), -Batch("FixNullability", Once, - FixNullability), +Batch("UpdateNullability", Once, + UpdateAttributeNullability), Batch("Subquery", Once, UpdateOuterReferences), Batch("Cleanup", fixedPoint, @@ -1822,40 +1822,6 @@ class Analyzer( } /** - * Fixes nullability of Attributes in a resolved LogicalPlan by using the nullability of - * corresponding Attributes of its children output Attributes. This step is needed because - * users can use a resolved AttributeReference in the Dataset API and outer joins - * can change the nullability of an AttribtueReference. Without the fix, a nullable column's - * nullable field can be actually set as non-nullable, which cause illegal optimization - * (e.g., NULL propagation) and wrong answers. - * See SPARK-13484 and SPARK-13801 for the concrete queries of this case. - */ - object FixNullability extends Rule[LogicalPlan] { - -def apply(plan: LogicalPlan): LogicalPlan = plan resolveOperatorsUp { - case p if !p.resolved => p // Skip unresolved nodes. 
- case p: LogicalPlan if p.resolved => -val childrenOutput = p.children.flatMap(c => c.output).groupBy(_.exprId).flatMap { - case (exprId, attributes) => -// If there are multiple Attributes having the same ExprId, we need to resolve -// the conflict of nullable field. We do not really expect this happen. -val nullable = attributes.exists(_.nullable) -attributes.map(attr => attr.withNullability(nullable)) -}.toSeq -// At here, we create an AttributeMap that only compare the exprId for the lookup -// operation. So, we can find the corresponding input attribute's nullability. -val attributeMap = AttributeMap[Attribute](childrenOutput.map(attr => attr -> attr)) -// For an Attribute used by the current LogicalPlan, if it is from its children, -// we fix the nullable field by using the nullability setting of the corresponding -// output Attribute from the children. -p.transformExpressions { - case attr: Attribute if attributeMap.contains(attr) => -attr.withNullability(attributeMap(attr).nullable) -} -} - } - - /** * Extracts [[WindowExpression]]s from the projectList of a [[Project]] operator and * aggregateExpressions of an [[Aggregate]] operator and creates individual [[Window]] * operators for every distinct [[WindowSpecDefinition]]. diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/UpdateAttributeNullability.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/UpdateAttributeNullability.scala new file mode 100644 index 000..8655dec --- /dev/null +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analy
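The core of the rule, visible in the removed `FixNullability` code above, is: group the children's output attributes by expression id, treat an id as nullable if any attribute carrying that id is nullable, then rewrite the plan's attribute references with the corrected flag. A data-only sketch of that propagation ((id, nullable) tuples stand in for attributes; names are illustrative):

```python
def update_nullability(child_attrs, exprs):
    """Sketch of the UpdateAttributeNullability idea: derive each
    expression id's nullability from the children's outputs (nullable
    if ANY attribute with that id is nullable), then fix up references.
    Ids not produced by any child keep their existing flag."""
    nullable_by_id = {}
    for expr_id, nullable in child_attrs:
        nullable_by_id[expr_id] = nullable_by_id.get(expr_id, False) or nullable
    return [(eid, nullable_by_id.get(eid, nullable)) for eid, nullable in exprs]

# exprId 1 comes from an outer-join side, so its stale non-nullable
# reference is corrected to nullable; exprId 2 is untouched:
assert update_nullability([(1, True), (2, False)],
                          [(1, False), (2, False)]) == [(1, True), (2, False)]
```

Without this fix-up, a column that an outer join made nullable could still be marked non-nullable, enabling illegal optimizations such as NULL propagation (see SPARK-13484 and SPARK-13801 referenced above).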
svn commit: r31887 - /dev/spark/KEYS
Author: yamamuro Date: Fri Jan 11 06:45:33 2019 New Revision: 31887 Log: Update KEYS Modified: dev/spark/KEYS Modified: dev/spark/KEYS == --- dev/spark/KEYS (original) +++ dev/spark/KEYS Fri Jan 11 06:45:33 2019 @@ -829,3 +829,61 @@ aI9kX8V9gl5PZLw+LchGX5H7HKoRxZM3UbPkY5Mv ZIAzEigXrrsePyvHGf6H =6YJg -END PGP PUBLIC KEY BLOCK- + +pub rsa4096 2019-01-10 [SC] + 0E9925082727075EEE83D4B06EC5F1052DF08FF4 +uid [ultimate] Takeshi Yamamuro (CODE SIGNING KEY) +sub rsa4096 2019-01-10 [E] + +-BEGIN PGP PUBLIC KEY BLOCK- + +mQINBFw2q20BEADLW2BZbJO2YHmAmAumggCTm4aVWFRYH+NX0zqEX2bynA0GM5hR +euvLL6w5vq44S6zU+39o1s9wSDcBAqLNpPB2eDL8qqXKZa/AQTwCiitk9aDB1KZB +DzejoqtrtCK1WnCW7oB7mQIq+/txSyLgv1UgFijh2aAx0ChmMnb2WbeZAQz/5ids +ixMfZiRofZVJIjdNNe5kIBcc9uthoyLw3x16nLT3zrATtBSDAL8hAULOqXPMMf3T +xzm2cPnOnqFlKGkEWRuptnoPHJ8+Uwbb91oQmlFGolU9PvCQVdmtMWCmqvlg5SeZ +VSC+w4eUk8M2nWxPh+WrPP5eQMDVUdmWgC/ZzCoNW/AxY4T9G3h3XLpZoyoDEUmd +Xk95KiEq/fo2ZT2jF31tPsGPhlzGETnzDK1xdNtoFKqjvWxwdPmJgGBau2d30rxJ +gvrjMtvcJ8Z/L7D0hKR8r8eJB6GlfBTLARVQ/XygNS1sfR6+rv/kNFGR8932bNsf +OtxiAo1Ga3vn3Q3WK+9Ddz4HKhsoOwWYllRNE60xB2LGM7ZjvvY/I9Vx2Fqfew5z +MC1s3u1Bgu0FIepV+N0Qxs2yfavdfLSVCFZ2elXkyZ7vGAFikksgGRSLbYgx01Qx +gCx3nzYL1uol6s1z6jj039p/mEqSVMY1FiecmK3/inNMy4dLjg6s+Au+GwARAQAB +tDlUYWtlc2hpIFlhbWFtdXJvIChDT0RFIFNJR05JTkcgS0VZKSA8eWFtYW11cm9A +YXBhY2hlLm9yZz6JAk4EEwEIADgWIQQOmSUIJycHXu6D1LBuxfEFLfCP9AUCXDar +bQIbAwULCQgHAgYVCgkICwIEFgIDAQIeAQIXgAAKCRBuxfEFLfCP9OLWD/4uxh1M +3BiXxwqKoBbephYTI//iSVRmwSXQdm7fsPkZywc7K4W+jiYyf1Qe8mZ4ikNVnvcE +W7+FkLGWDFHcIXddXcruynrTeQ9YwO/RPY26qYGWfeXaIf7obVSRVT6wCg//rw/o +xglE2aBXM6kgEgcZRkIo5FeLxGK0VgQ56ANN4Aa4/Jev7/Fca+MkeXH6UlxnkMD5 +W/UgMWMEZFKJPXiLpxgmhzzq5T5ahvhRQfxtRAXz+w/SK+vo+jeZZ5/SqtDECsa4 +uG/iWjC5bNOsV97iCFx/KxNY5I4U4Q5svG6mz+IRgCMV3jpkslQfME2wXgC/k1bT +vr8ICOEzQguYgBYdXl99cMgy3ULPy1vbx4DycuKneKtkp25voy5rtU3+JBrxpwSa +TwD1gRiXFscZ5oomI3rn0jPq1dIKhrQaG0T2QwKn47spdPK0TWbec+SNo07dDaC0 +IsqgSZ1fkGk5ILTZ/AfYzdnHHeJ3IvrkVFLMMD35Rwcji8E85tMXV7GmlDejjMNk 
+QTQMQymXB+yRqIrHMAss1IY11UmQCtGSJfHwiAYW+iRBZfpB7fFvHMhwQFT4wEPW +St5JyUiRled8+1BtDUYeBjDr9UtAh/moD7xXtu8wiZjea87LUt+H/tTogsHWN/kJ +igCoSWXK5ugVy8sKI/Q+jQSgXzduChiTQQWIvrkCDQRcNqttARAA0WuzOkBGx6/S +0YV5GGwn0+Zqxhm0EV/G4cT+1IPKgiMTuTp/vRF7IDwZwh5oalG4Cl7YGygqEx/V +gHqtf0m1aFV4vndmmMaHKnYAl9/rk3Svu3BRXgu9sJPoMz3nDlRhcT3IvVPZw34E +PQg0tKhnAbvSwxpRL1jHhJgHTYmebja0UTSVr3NXAs8Z+XSEjZN//5B5m4N2UkUh +XVMzfDWaOa+EYlKmzhqIt6Q8/MNjFp7jeNOKUMBoIP0JKf3Y37M9NLolQihJ9RwE +2f0a8PN5xMVDJTcDMox+bXa0ohcYKiu6whIz82tg0hZmgtdg20lC15ZTXzJh3DRh +cklbMeLegwijHLuCBIgOtbuVknWqktx89Xdg9IG84eByDPxxuZwM9QNbfip9JHKH +Pv8M2W1wPMIIgIaRRzEu1NKUoZq14/Djn0t1hb2rjQarPOR3pqlO75TdMZJ8ZVK3 +OSUKWbLed+VI/X2I0iiH5Ag/Ajzh9qIqyKVxZI0Md7G7CWHfiVRHNzMlGP08z4sn +N6uu9vzL6GSiHU5cPtD34gPXMlWq42wXCat8GMMHZAdeCwhLVm3+wPucq8OO +S0cTmUzxdnMomUO9HST2a3aO8ulBhu4wh3Y+1gkxvJ19N+WsS6uBFBOnaWf3m1Y0 +2bKSKEtKunWfwfXHowyFwKpQF4cFClEAEQEAAYkCNgQYAQgAIBYhBA6ZJQgnJwde +7oPUsG7F8QUt8I/0BQJcNqttAhsMAAoJEG7F8QUt8I/0H54QAKtJvjP7dtCQF+pZ +oy9KgfdF0CSdpTwXbEn0VE/GcdkJxXoiDTTb9GVAm/ySpwRUcTub/jFjh3uKN1t5 +SbVUR6TfewhKZ5fsKqTbUKYXag+CRLy1n59RQPg9LcL6NwTk3+SJ4cLAnj0buVFa +nlZ0W2fC54TK2xvGcnU7S3dQdlyPuvR6ouNqzQxEuXTI0t9cXdQFpf8WLt0KknsH +kMEZpKWMnrfA5fusqiGQ+9GcjowvEc6tPiZ+bMJyJSj2kmTHnCU0krxPr/xuFfNa +YpJvIZFPwn9GKxejOcZVckKtdhXMmtFlwLnCcWuB0GRRQjd9r8R+KCJM6RlTp4yI +LBBWmPnJp0Sd/9xCdVZp1fFNZ+w72q5Z0l+6r+DuvThYhH5HdRxfmH33SzdpWEf8 +WcKCbbi9mN+2ZsJufR5LvKsNpv6DLTwCuMFlIptxSxGiYZxRYMKeZJ84AWHL7sit +ftDfwHakkfUZgprK5MBuEcjxXrsmcM25Ns+rhA80JCRmsqqreSC4M9XnKkya5hoJ +83pIuVIGxOVLhVWYkAGCqW+UVr1zBBBZYe8U3wDCFucHazqcaOHCUXAxM4rwpp/K +pqnGj9s6Uudh/FXfVN5MC0/pH/ySSACkXwCmKXAh2s8F9w199WRsNlya3Ce1Ryan +/G8Bpm/p4kbeqJtsx3t7nhPke7fG +=4noL +-END PGP PUBLIC KEY BLOCK- - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] tag v2.3.3-rc1 created (now 0e3d5fd)
This is an automated email from the ASF dual-hosted git repository. yamamuro pushed a change to tag v2.3.3-rc1 in repository https://gitbox.apache.org/repos/asf/spark.git. at 0e3d5fd (commit) This tag includes the following new commits: new 0e3d5fd Preparing Spark release v2.3.3-rc1 The 1 revisions listed above as "new" are entirely new to this repository and will be described in separate emails. The revisions listed as "add" were already present in the repository and have only been added to this reference. - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] 01/01: Preparing Spark release v2.3.3-rc1
This is an automated email from the ASF dual-hosted git repository. yamamuro pushed a commit to tag v2.3.3-rc1 in repository https://gitbox.apache.org/repos/asf/spark.git commit 0e3d5fd960927dd8ff1a909aba98b85fb9350c58 Author: Takeshi Yamamuro AuthorDate: Sun Jan 13 00:25:46 2019 + Preparing Spark release v2.3.3-rc1 --- assembly/pom.xml | 2 +- common/kvstore/pom.xml| 2 +- common/network-common/pom.xml | 2 +- common/network-shuffle/pom.xml| 2 +- common/network-yarn/pom.xml | 2 +- common/sketch/pom.xml | 2 +- common/tags/pom.xml | 2 +- common/unsafe/pom.xml | 2 +- core/pom.xml | 2 +- docs/_config.yml | 2 +- examples/pom.xml | 2 +- external/docker-integration-tests/pom.xml | 2 +- external/flume-assembly/pom.xml | 2 +- external/flume-sink/pom.xml | 2 +- external/flume/pom.xml| 2 +- external/kafka-0-10-assembly/pom.xml | 2 +- external/kafka-0-10-sql/pom.xml | 2 +- external/kafka-0-10/pom.xml | 2 +- external/kafka-0-8-assembly/pom.xml | 2 +- external/kafka-0-8/pom.xml| 2 +- external/kinesis-asl-assembly/pom.xml | 2 +- external/kinesis-asl/pom.xml | 2 +- external/spark-ganglia-lgpl/pom.xml | 2 +- graphx/pom.xml| 2 +- hadoop-cloud/pom.xml | 2 +- launcher/pom.xml | 2 +- mllib-local/pom.xml | 2 +- mllib/pom.xml | 2 +- pom.xml | 2 +- python/pyspark/version.py | 2 +- repl/pom.xml | 2 +- resource-managers/kubernetes/core/pom.xml | 2 +- resource-managers/mesos/pom.xml | 2 +- resource-managers/yarn/pom.xml| 2 +- sql/catalyst/pom.xml | 2 +- sql/core/pom.xml | 2 +- sql/hive-thriftserver/pom.xml | 2 +- sql/hive/pom.xml | 2 +- streaming/pom.xml | 2 +- tools/pom.xml | 2 +- 40 files changed, 40 insertions(+), 40 deletions(-) diff --git a/assembly/pom.xml b/assembly/pom.xml index f8b15cc..6a8cd4f 100644 --- a/assembly/pom.xml +++ b/assembly/pom.xml @@ -21,7 +21,7 @@ org.apache.spark spark-parent_2.11 -2.3.3-SNAPSHOT +2.3.3 ../pom.xml diff --git a/common/kvstore/pom.xml b/common/kvstore/pom.xml index e412a47..6010b6e 100644 --- a/common/kvstore/pom.xml +++ b/common/kvstore/pom.xml @@ 
-22,7 +22,7 @@ org.apache.spark spark-parent_2.11 -2.3.3-SNAPSHOT +2.3.3 ../../pom.xml diff --git a/common/network-common/pom.xml b/common/network-common/pom.xml index d8f9a3d..8b5d3c8 100644 --- a/common/network-common/pom.xml +++ b/common/network-common/pom.xml @@ -22,7 +22,7 @@ org.apache.spark spark-parent_2.11 -2.3.3-SNAPSHOT +2.3.3 ../../pom.xml diff --git a/common/network-shuffle/pom.xml b/common/network-shuffle/pom.xml index a1a4f87..dd27a24 100644 --- a/common/network-shuffle/pom.xml +++ b/common/network-shuffle/pom.xml @@ -22,7 +22,7 @@ org.apache.spark spark-parent_2.11 -2.3.3-SNAPSHOT +2.3.3 ../../pom.xml diff --git a/common/network-yarn/pom.xml b/common/network-yarn/pom.xml index e650978..aded5e7d 100644 --- a/common/network-yarn/pom.xml +++ b/common/network-yarn/pom.xml @@ -22,7 +22,7 @@ org.apache.spark spark-parent_2.11 -2.3.3-SNAPSHOT +2.3.3 ../../pom.xml diff --git a/common/sketch/pom.xml b/common/sketch/pom.xml index 350e3cb..a50f612 100644 --- a/common/sketch/pom.xml +++ b/common/sketch/pom.xml @@ -22,7 +22,7 @@ org.apache.spark spark-parent_2.11 -2.3.3-SNAPSHOT +2.3.3 ../../pom.xml diff --git a/common/tags/pom.xml b/common/tags/pom.xml index e7fea41..8112ca4 100644 --- a/common/tags/pom.xml +++ b/common/tags/pom.xml @@ -22,7 +22,7 @@ org.apache.spark spark-parent_2.11 -2.3.3-SNAPSHOT +2.3.3 ../../pom.xml diff --git a/common/unsafe/pom.xml b/common/unsafe/pom.xml index 601cc5d..0d5f61f 100644 --- a/common/unsafe/pom.xml +++ b/common/unsafe/pom.xml @@ -22,7 +22,7 @@ org.apache.spark spark-parent_2.11 -2.3.3-SNAPSHOT +2.3.3 ../../pom.xml diff --git a/core/pom.xml b/core/pom.xml index 2a7e644..930128d 100644 --- a/core/pom.xml +++ b/core/pom.xml @@ -21,7 +21,7 @@ org.apache.spark spark-parent_2.11 -2.3.3-SNAPSHOT +2.3.3 ../pom.xml diff --git a/docs/_config.yml b/docs/_config.yml index 7629f5f..8e9c3b5 100644 --- a/docs/_config.yml +++ b/docs/_config.yml @@ -14,7 +14,7 @@ include: # These allow
[spark] 01/01: Preparing development version 2.3.4-SNAPSHOT
This is an automated email from the ASF dual-hosted git repository. yamamuro pushed a commit to branch branch-2.3 in repository https://gitbox.apache.org/repos/asf/spark.git commit e46b0edd1046329fa3e3a730d59a6a263f72cbd0 Author: Takeshi Yamamuro AuthorDate: Sun Jan 13 00:26:02 2019 + Preparing development version 2.3.4-SNAPSHOT --- R/pkg/DESCRIPTION | 2 +- assembly/pom.xml | 2 +- common/kvstore/pom.xml| 2 +- common/network-common/pom.xml | 2 +- common/network-shuffle/pom.xml| 2 +- common/network-yarn/pom.xml | 2 +- common/sketch/pom.xml | 2 +- common/tags/pom.xml | 2 +- common/unsafe/pom.xml | 2 +- core/pom.xml | 2 +- docs/_config.yml | 4 ++-- examples/pom.xml | 2 +- external/docker-integration-tests/pom.xml | 2 +- external/flume-assembly/pom.xml | 2 +- external/flume-sink/pom.xml | 2 +- external/flume/pom.xml| 2 +- external/kafka-0-10-assembly/pom.xml | 2 +- external/kafka-0-10-sql/pom.xml | 2 +- external/kafka-0-10/pom.xml | 2 +- external/kafka-0-8-assembly/pom.xml | 2 +- external/kafka-0-8/pom.xml| 2 +- external/kinesis-asl-assembly/pom.xml | 2 +- external/kinesis-asl/pom.xml | 2 +- external/spark-ganglia-lgpl/pom.xml | 2 +- graphx/pom.xml| 2 +- hadoop-cloud/pom.xml | 2 +- launcher/pom.xml | 2 +- mllib-local/pom.xml | 2 +- mllib/pom.xml | 2 +- pom.xml | 2 +- python/pyspark/version.py | 2 +- repl/pom.xml | 2 +- resource-managers/kubernetes/core/pom.xml | 2 +- resource-managers/mesos/pom.xml | 2 +- resource-managers/yarn/pom.xml| 2 +- sql/catalyst/pom.xml | 2 +- sql/core/pom.xml | 2 +- sql/hive-thriftserver/pom.xml | 2 +- sql/hive/pom.xml | 2 +- streaming/pom.xml | 2 +- tools/pom.xml | 2 +- 41 files changed, 42 insertions(+), 42 deletions(-) diff --git a/R/pkg/DESCRIPTION b/R/pkg/DESCRIPTION index 6ec4966..a82446e 100644 --- a/R/pkg/DESCRIPTION +++ b/R/pkg/DESCRIPTION @@ -1,6 +1,6 @@ Package: SparkR Type: Package -Version: 2.3.3 +Version: 2.3.4 Title: R Frontend for Apache Spark Description: Provides an R Frontend for Apache Spark. 
Authors@R: c(person("Shivaram", "Venkataraman", role = c("aut", "cre"), diff --git a/assembly/pom.xml b/assembly/pom.xml index 6a8cd4f..612a1b8 100644 --- a/assembly/pom.xml +++ b/assembly/pom.xml @@ -21,7 +21,7 @@ org.apache.spark spark-parent_2.11 -2.3.3 +2.3.4-SNAPSHOT ../pom.xml diff --git a/common/kvstore/pom.xml b/common/kvstore/pom.xml index 6010b6e..5547e97 100644 --- a/common/kvstore/pom.xml +++ b/common/kvstore/pom.xml @@ -22,7 +22,7 @@ org.apache.spark spark-parent_2.11 -2.3.3 +2.3.4-SNAPSHOT ../../pom.xml diff --git a/common/network-common/pom.xml b/common/network-common/pom.xml index 8b5d3c8..119dde2 100644 --- a/common/network-common/pom.xml +++ b/common/network-common/pom.xml @@ -22,7 +22,7 @@ org.apache.spark spark-parent_2.11 -2.3.3 +2.3.4-SNAPSHOT ../../pom.xml diff --git a/common/network-shuffle/pom.xml b/common/network-shuffle/pom.xml index dd27a24..dba5224 100644 --- a/common/network-shuffle/pom.xml +++ b/common/network-shuffle/pom.xml @@ -22,7 +22,7 @@ org.apache.spark spark-parent_2.11 -2.3.3 +2.3.4-SNAPSHOT ../../pom.xml diff --git a/common/network-yarn/pom.xml b/common/network-yarn/pom.xml index aded5e7d..56902a3 100644 --- a/common/network-yarn/pom.xml +++ b/common/network-yarn/pom.xml @@ -22,7 +22,7 @@ org.apache.spark spark-parent_2.11 -2.3.3 +2.3.4-SNAPSHOT ../../pom.xml diff --git a/common/sketch/pom.xml b/common/sketch/pom.xml index a50f612..5302d95 100644 --- a/common/sketch/pom.xml +++ b/common/sketch/pom.xml @@ -22,7 +22,7 @@ org.apache.spark spark-parent_2.11 -2.3.3 +2.3.4-SNAPSHOT ../../pom.xml diff --git a/common/tags/pom.xml b/common/tags/pom.xml index 8112ca4..232ebfa 100644 --- a/common/tags/pom.xml +++ b/common/tags/pom.xml @@ -22,7 +22,7 @@ org.apache.spark spark-parent_2.11 -2.3.3 +2.3.4-SNAPSHOT ../../pom.xml diff --git a/common/unsafe/pom.xml b/common/unsafe/pom.xml index 0d5f61f..f0baa2a 100644 --- a/common/unsafe/pom.xml +++ b/common/unsafe/pom.xml @@ -22,7 +22,7 @@ org.apache
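The release commit above mechanically bumps every module's Maven version from 2.3.3 to 2.3.4-SNAPSHOT. A minimal shell sketch of that kind of bump, under stated assumptions: the directory layout and pom contents below are fabricated for the demo, and Spark's actual release tooling is more involved than a bare text substitution.

```shell
# Illustrative only: rewrite the module version in every pom.xml, as the
# commit above does. Runs in a throwaway directory with a fabricated pom.
set -e
dir=$(mktemp -d)
cd "$dir"
mkdir -p core
printf '<version>2.3.3</version>\n' > core/pom.xml
# -i.bak keeps a backup and works with both GNU and BSD sed
find . -name pom.xml -exec \
  sed -i.bak 's|<version>2.3.3</version>|<version>2.3.4-SNAPSHOT</version>|g' {} +
grep '2.3.4-SNAPSHOT' core/pom.xml   # prints: <version>2.3.4-SNAPSHOT</version>
```

Note that a naive substitution like this cannot tell the module's own parent version apart from a dependency that happens to be pinned to the same string, which is one reason real release scripts are more careful.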
[spark] branch branch-2.3 updated (6d063ee -> e46b0ed)
This is an automated email from the ASF dual-hosted git repository. yamamuro pushed a change to branch branch-2.3 in repository https://gitbox.apache.org/repos/asf/spark.git.

 from 6d063ee [SPARK-26538][SQL] Set default precision and scale for elements of postgres numeric array
 add  0e3d5fd Preparing Spark release v2.3.3-rc1
 new  e46b0ed Preparing development version 2.3.4-SNAPSHOT

The 1 revision listed above as "new" is entirely new to this repository and will be described in a separate email. The revisions listed as "add" were already present in the repository and have only been added to this reference.

Summary of changes:
 R/pkg/DESCRIPTION                         | 2 +-
 assembly/pom.xml                          | 2 +-
 common/kvstore/pom.xml                    | 2 +-
 common/network-common/pom.xml             | 2 +-
 common/network-shuffle/pom.xml            | 2 +-
 common/network-yarn/pom.xml               | 2 +-
 common/sketch/pom.xml                     | 2 +-
 common/tags/pom.xml                       | 2 +-
 common/unsafe/pom.xml                     | 2 +-
 core/pom.xml                              | 2 +-
 docs/_config.yml                          | 4 ++--
 examples/pom.xml                          | 2 +-
 external/docker-integration-tests/pom.xml | 2 +-
 external/flume-assembly/pom.xml           | 2 +-
 external/flume-sink/pom.xml               | 2 +-
 external/flume/pom.xml                    | 2 +-
 external/kafka-0-10-assembly/pom.xml      | 2 +-
 external/kafka-0-10-sql/pom.xml           | 2 +-
 external/kafka-0-10/pom.xml               | 2 +-
 external/kafka-0-8-assembly/pom.xml       | 2 +-
 external/kafka-0-8/pom.xml                | 2 +-
 external/kinesis-asl-assembly/pom.xml     | 2 +-
 external/kinesis-asl/pom.xml              | 2 +-
 external/spark-ganglia-lgpl/pom.xml       | 2 +-
 graphx/pom.xml                            | 2 +-
 hadoop-cloud/pom.xml                      | 2 +-
 launcher/pom.xml                          | 2 +-
 mllib-local/pom.xml                       | 2 +-
 mllib/pom.xml                             | 2 +-
 pom.xml                                   | 2 +-
 python/pyspark/version.py                 | 2 +-
 repl/pom.xml                              | 2 +-
 resource-managers/kubernetes/core/pom.xml | 2 +-
 resource-managers/mesos/pom.xml           | 2 +-
 resource-managers/yarn/pom.xml            | 2 +-
 sql/catalyst/pom.xml                      | 2 +-
 sql/core/pom.xml                          | 2 +-
 sql/hive-thriftserver/pom.xml             | 2 +-
 sql/hive/pom.xml                          | 2 +-
 streaming/pom.xml                         | 2 +-
 tools/pom.xml                             | 2 +-
 41 files changed, 42 insertions(+), 42 deletions(-)

- To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch branch-2.3 updated: [SPARK-25572][SPARKR] test only if not cran
This is an automated email from the ASF dual-hosted git repository. yamamuro pushed a commit to branch branch-2.3 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-2.3 by this push: new d397348 [SPARK-25572][SPARKR] test only if not cran d397348 is described below commit d397348b7bec20743f738694a135e4b67947fd99 Author: Felix Cheung AuthorDate: Sat Sep 29 14:48:32 2018 -0700 [SPARK-25572][SPARKR] test only if not cran ## What changes were proposed in this pull request? CRAN doesn't seem to respect the system requirements as running tests - we have seen cases where SparkR is run on Java 10, which unfortunately Spark does not start on. For 2.4, lets attempt skipping all tests ## How was this patch tested? manual, jenkins, appveyor Author: Felix Cheung Closes #22589 from felixcheung/ralltests. (cherry picked from commit f4b138082ff91be74b0f5bbe19cdb90dd9e5f131) Signed-off-by: Takeshi Yamamuro --- R/pkg/tests/run-all.R | 83 +++ 1 file changed, 44 insertions(+), 39 deletions(-) diff --git a/R/pkg/tests/run-all.R b/R/pkg/tests/run-all.R index 94d7518..1e96418 100644 --- a/R/pkg/tests/run-all.R +++ b/R/pkg/tests/run-all.R @@ -18,50 +18,55 @@ library(testthat) library(SparkR) -# Turn all warnings into errors -options("warn" = 2) +# SPARK-25572 +if (identical(Sys.getenv("NOT_CRAN"), "true")) { -if (.Platform$OS.type == "windows") { - Sys.setenv(TZ = "GMT") -} + # Turn all warnings into errors + options("warn" = 2) -# Setup global test environment -# Install Spark first to set SPARK_HOME + if (.Platform$OS.type == "windows") { +Sys.setenv(TZ = "GMT") + } -# NOTE(shivaram): We set overwrite to handle any old tar.gz files or directories left behind on -# CRAN machines. For Jenkins we should already have SPARK_HOME set. 
-install.spark(overwrite = TRUE) + # Setup global test environment + # Install Spark first to set SPARK_HOME -sparkRDir <- file.path(Sys.getenv("SPARK_HOME"), "R") -sparkRWhitelistSQLDirs <- c("spark-warehouse", "metastore_db") -invisible(lapply(sparkRWhitelistSQLDirs, - function(x) { unlink(file.path(sparkRDir, x), recursive = TRUE, force = TRUE)})) -sparkRFilesBefore <- list.files(path = sparkRDir, all.files = TRUE) + # NOTE(shivaram): We set overwrite to handle any old tar.gz files or directories left behind on + # CRAN machines. For Jenkins we should already have SPARK_HOME set. + install.spark(overwrite = TRUE) -sparkRTestMaster <- "local[1]" -sparkRTestConfig <- list() -if (identical(Sys.getenv("NOT_CRAN"), "true")) { - sparkRTestMaster <- "" -} else { - # Disable hsperfdata on CRAN - old_java_opt <- Sys.getenv("_JAVA_OPTIONS") - Sys.setenv("_JAVA_OPTIONS" = paste("-XX:-UsePerfData", old_java_opt)) - tmpDir <- tempdir() - tmpArg <- paste0("-Djava.io.tmpdir=", tmpDir) - sparkRTestConfig <- list(spark.driver.extraJavaOptions = tmpArg, - spark.executor.extraJavaOptions = tmpArg) -} + sparkRDir <- file.path(Sys.getenv("SPARK_HOME"), "R") + sparkRWhitelistSQLDirs <- c("spark-warehouse", "metastore_db") + invisible(lapply(sparkRWhitelistSQLDirs, + function(x) { unlink(file.path(sparkRDir, x), recursive = TRUE, force = TRUE)})) + sparkRFilesBefore <- list.files(path = sparkRDir, all.files = TRUE) -test_package("SparkR") + sparkRTestMaster <- "local[1]" + sparkRTestConfig <- list() + if (identical(Sys.getenv("NOT_CRAN"), "true")) { +sparkRTestMaster <- "" + } else { +# Disable hsperfdata on CRAN +old_java_opt <- Sys.getenv("_JAVA_OPTIONS") +Sys.setenv("_JAVA_OPTIONS" = paste("-XX:-UsePerfData", old_java_opt)) +tmpDir <- tempdir() +tmpArg <- paste0("-Djava.io.tmpdir=", tmpDir) +sparkRTestConfig <- list(spark.driver.extraJavaOptions = tmpArg, + spark.executor.extraJavaOptions = tmpArg) + } -if (identical(Sys.getenv("NOT_CRAN"), "true")) { - # set random seed for 
predictable results. mostly for base's sample() in tree and classification - set.seed(42) - # for testthat 1.0.2 later, change reporter from "summary" to default_reporter() - testthat:::run_tests("SparkR", - file.path(sparkRDir, "pkg", "tests", "fulltests"), - NULL, - "summary") -} + test_package("SparkR") + + if (identical(Sys.getenv("NOT_CRAN"), "true")) { +# set random seed for predictable results. mostly for base's sample() in tree and classification +set.seed(42) +# for testthat 1.0.2 later, change reporter from "summary" to default_reporter() +testthat:::run_tests("SparkR", + file.path(sparkRDir, "pkg", "tests", "fulltests"), + NULL, + "summary") + } +}
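The diff above wraps the entire SparkR test run in a check of the NOT_CRAN environment variable, so CRAN machines (which do not set it) skip the suite. The same opt-in gating pattern, sketched in shell for illustration (the R original uses identical(Sys.getenv("NOT_CRAN"), "true")):

```shell
# Gate expensive tests on an explicit opt-in environment variable,
# mirroring the R check identical(Sys.getenv("NOT_CRAN"), "true").
should_run_tests() {
  [ "${NOT_CRAN:-}" = "true" ]
}

NOT_CRAN=true
if should_run_tests; then echo "running full test suite"; fi   # prints: running full test suite
unset NOT_CRAN
if should_run_tests; then :; else echo "skipping tests"; fi    # prints: skipping tests
```

Making the gate an explicit opt-in (rather than opting out on CRAN) means any environment that forgets to set the variable fails safe by skipping, which is the behavior the commit wants on CRAN's Java-10 machines.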
[spark] tag v2.3.3-rc1 deleted (was 0e3d5fd)
This is an automated email from the ASF dual-hosted git repository. yamamuro pushed a change to tag v2.3.3-rc1 in repository https://gitbox.apache.org/repos/asf/spark.git. *** WARNING: tag v2.3.3-rc1 was deleted! *** was 0e3d5fd Preparing Spark release v2.3.3-rc1 The revisions that were on this tag are still contained in other references; therefore, this change does not discard any commits from the repository. - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
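As the notification above notes, deleting the v2.3.3-rc1 tag discards no commits, because the tagged commit remains reachable from other references. A self-contained sketch in a throwaway repository (the remote-deletion form is shown only as a comment, since there is no remote here):

```shell
# Deleting a tag removes only the reference; the tagged commit stays
# reachable through any branch that still contains it.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git -c user.email=rel@example.com -c user.name=release \
  commit -q --allow-empty -m "Preparing Spark release v2.3.3-rc1"
git tag v2.3.3-rc1
git tag -d v2.3.3-rc1        # prints: Deleted tag 'v2.3.3-rc1' (was ...)
# Against a real remote you would also delete the remote tag:
#   git push origin :refs/tags/v2.3.3-rc1
git rev-parse --verify HEAD >/dev/null && echo "commit still reachable"
```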
[spark] 01/02: [SPARK-26010][R] fix vignette eval with Java 11
This is an automated email from the ASF dual-hosted git repository. yamamuro pushed a commit to branch branch-2.3 in repository https://gitbox.apache.org/repos/asf/spark.git commit 20b749021bacaa2906775944e43597ccf37af62b Author: Felix Cheung AuthorDate: Mon Nov 12 19:03:30 2018 -0800 [SPARK-26010][R] fix vignette eval with Java 11 ## What changes were proposed in this pull request? changes in vignette only to disable eval ## How was this patch tested? Jenkins Author: Felix Cheung Closes #23007 from felixcheung/rjavavervig. (cherry picked from commit 88c82627267a9731b2438f0cc28dd656eb3dc834) Signed-off-by: Felix Cheung --- R/pkg/vignettes/sparkr-vignettes.Rmd | 14 ++ 1 file changed, 14 insertions(+) diff --git a/R/pkg/vignettes/sparkr-vignettes.Rmd b/R/pkg/vignettes/sparkr-vignettes.Rmd index d4713de..70970bd 100644 --- a/R/pkg/vignettes/sparkr-vignettes.Rmd +++ b/R/pkg/vignettes/sparkr-vignettes.Rmd @@ -57,6 +57,20 @@ First, let's load and attach the package. library(SparkR) ``` +```{r, include=FALSE} +# disable eval if java version not supported +override_eval <- tryCatch(!is.numeric(SparkR:::checkJavaVersion()), + error = function(e) { TRUE }, + warning = function(e) { TRUE }) + +if (override_eval) { + opts_hooks$set(eval = function(options) { +options$eval = FALSE +options + }) +} +``` + `SparkSession` is the entry point into SparkR which connects your R program to a Spark cluster. You can create a `SparkSession` using `sparkR.session` and pass in options such as the application name, any Spark packages depended on, etc. We use default settings in which it runs in local mode. It auto downloads Spark package in the background if no previous installation is found. For more details about setup, see [Spark Session](#SetupSparkSession). - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch branch-2.3 updated (d397348 -> 01511e4)
This is an automated email from the ASF dual-hosted git repository. yamamuro pushed a change to branch branch-2.3 in repository https://gitbox.apache.org/repos/asf/spark.git.

 discard d397348 [SPARK-25572][SPARKR] test only if not cran
 discard a9a1bc7 [SPARK-26010][R] fix vignette eval with Java 11
 discard e46b0ed Preparing development version 2.3.4-SNAPSHOT
 discard 0e3d5fd Preparing Spark release v2.3.3-rc1
 new     20b7490 [SPARK-26010][R] fix vignette eval with Java 11
 new     01511e4 [SPARK-25572][SPARKR] test only if not cran

This update added new revisions after undoing existing revisions. That is to say, some revisions that were in the old version of the branch are not in the new version. This situation occurs when a user --force pushes a change and generates a repository containing something like this:

 * -- * -- B -- O -- O -- O (d397348)
            \
             N -- N -- N refs/heads/branch-2.3 (01511e4)

You should already have received notification emails for all of the O revisions, and so the following emails describe only the N revisions from the common base, B. Any revisions marked "omit" are not gone; other references still refer to them. Any revisions marked "discard" are gone forever. The 2 revisions listed above as "new" are entirely new to this repository and will be described in separate emails. The revisions listed as "add" were already present in the repository and have only been added to this reference.
Summary of changes:
 R/pkg/DESCRIPTION                         | 2 +-
 assembly/pom.xml                          | 2 +-
 common/kvstore/pom.xml                    | 2 +-
 common/network-common/pom.xml             | 2 +-
 common/network-shuffle/pom.xml            | 2 +-
 common/network-yarn/pom.xml               | 2 +-
 common/sketch/pom.xml                     | 2 +-
 common/tags/pom.xml                       | 2 +-
 common/unsafe/pom.xml                     | 2 +-
 core/pom.xml                              | 2 +-
 docs/_config.yml                          | 4 ++--
 examples/pom.xml                          | 2 +-
 external/docker-integration-tests/pom.xml | 2 +-
 external/flume-assembly/pom.xml           | 2 +-
 external/flume-sink/pom.xml               | 2 +-
 external/flume/pom.xml                    | 2 +-
 external/kafka-0-10-assembly/pom.xml      | 2 +-
 external/kafka-0-10-sql/pom.xml           | 2 +-
 external/kafka-0-10/pom.xml               | 2 +-
 external/kafka-0-8-assembly/pom.xml       | 2 +-
 external/kafka-0-8/pom.xml                | 2 +-
 external/kinesis-asl-assembly/pom.xml     | 2 +-
 external/kinesis-asl/pom.xml              | 2 +-
 external/spark-ganglia-lgpl/pom.xml       | 2 +-
 graphx/pom.xml                            | 2 +-
 hadoop-cloud/pom.xml                      | 2 +-
 launcher/pom.xml                          | 2 +-
 mllib-local/pom.xml                       | 2 +-
 mllib/pom.xml                             | 2 +-
 pom.xml                                   | 2 +-
 python/pyspark/version.py                 | 2 +-
 repl/pom.xml                              | 2 +-
 resource-managers/kubernetes/core/pom.xml | 2 +-
 resource-managers/mesos/pom.xml           | 2 +-
 resource-managers/yarn/pom.xml            | 2 +-
 sql/catalyst/pom.xml                      | 2 +-
 sql/core/pom.xml                          | 2 +-
 sql/hive-thriftserver/pom.xml             | 2 +-
 sql/hive/pom.xml                          | 2 +-
 streaming/pom.xml                         | 2 +-
 tools/pom.xml                             | 2 +-
 41 files changed, 42 insertions(+), 42 deletions(-)

- To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org
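The O-versus-N diagram in the force-push notification above describes a branch whose tip was rewound past some commits and rebuilt with new ones, while other references keep the old commits alive. That situation can be reproduced locally (illustrative sketch; the commit subjects and the keep-old branch name are placeholders):

```shell
# Recreate the B/O/N picture: build B -- O, keep a second ref on O,
# rewind the branch to B, then commit N. O is "omitted", not gone.
set -e
work=$(mktemp -d)
cd "$work"
git init -q
g() { git -c user.email=a@example.com -c user.name=demo "$@"; }
g commit -q --allow-empty -m "B: common base"
base=$(git rev-parse HEAD)
g commit -q --allow-empty -m "O: old revision"
old=$(git rev-parse HEAD)
git branch keep-old            # another reference still points at O
git reset -q --hard "$base"    # rewind this branch to B
g commit -q --allow-empty -m "N: new revision"
git merge-base --is-ancestor "$old" keep-old && echo "O still reachable"
```

This is why the notification distinguishes "omit" (another ref, like keep-old here, still reaches the commit) from "discard" (no ref reaches it any more).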
[spark] 02/02: [SPARK-25572][SPARKR] test only if not cran
This is an automated email from the ASF dual-hosted git repository. yamamuro pushed a commit to branch branch-2.3 in repository https://gitbox.apache.org/repos/asf/spark.git commit 01511e479013c56d70fe8ffa805ecbd66591b57e Author: Felix Cheung AuthorDate: Sat Sep 29 14:48:32 2018 -0700 [SPARK-25572][SPARKR] test only if not cran ## What changes were proposed in this pull request? CRAN doesn't seem to respect the system requirements as running tests - we have seen cases where SparkR is run on Java 10, which unfortunately Spark does not start on. For 2.4, lets attempt skipping all tests ## How was this patch tested? manual, jenkins, appveyor Author: Felix Cheung Closes #22589 from felixcheung/ralltests. (cherry picked from commit f4b138082ff91be74b0f5bbe19cdb90dd9e5f131) Signed-off-by: Takeshi Yamamuro --- R/pkg/tests/run-all.R | 83 +++ 1 file changed, 44 insertions(+), 39 deletions(-) diff --git a/R/pkg/tests/run-all.R b/R/pkg/tests/run-all.R index 94d7518..1e96418 100644 --- a/R/pkg/tests/run-all.R +++ b/R/pkg/tests/run-all.R @@ -18,50 +18,55 @@ library(testthat) library(SparkR) -# Turn all warnings into errors -options("warn" = 2) +# SPARK-25572 +if (identical(Sys.getenv("NOT_CRAN"), "true")) { -if (.Platform$OS.type == "windows") { - Sys.setenv(TZ = "GMT") -} + # Turn all warnings into errors + options("warn" = 2) -# Setup global test environment -# Install Spark first to set SPARK_HOME + if (.Platform$OS.type == "windows") { +Sys.setenv(TZ = "GMT") + } -# NOTE(shivaram): We set overwrite to handle any old tar.gz files or directories left behind on -# CRAN machines. For Jenkins we should already have SPARK_HOME set. 
-install.spark(overwrite = TRUE) + # Setup global test environment + # Install Spark first to set SPARK_HOME -sparkRDir <- file.path(Sys.getenv("SPARK_HOME"), "R") -sparkRWhitelistSQLDirs <- c("spark-warehouse", "metastore_db") -invisible(lapply(sparkRWhitelistSQLDirs, - function(x) { unlink(file.path(sparkRDir, x), recursive = TRUE, force = TRUE)})) -sparkRFilesBefore <- list.files(path = sparkRDir, all.files = TRUE) + # NOTE(shivaram): We set overwrite to handle any old tar.gz files or directories left behind on + # CRAN machines. For Jenkins we should already have SPARK_HOME set. + install.spark(overwrite = TRUE) -sparkRTestMaster <- "local[1]" -sparkRTestConfig <- list() -if (identical(Sys.getenv("NOT_CRAN"), "true")) { - sparkRTestMaster <- "" -} else { - # Disable hsperfdata on CRAN - old_java_opt <- Sys.getenv("_JAVA_OPTIONS") - Sys.setenv("_JAVA_OPTIONS" = paste("-XX:-UsePerfData", old_java_opt)) - tmpDir <- tempdir() - tmpArg <- paste0("-Djava.io.tmpdir=", tmpDir) - sparkRTestConfig <- list(spark.driver.extraJavaOptions = tmpArg, - spark.executor.extraJavaOptions = tmpArg) -} + sparkRDir <- file.path(Sys.getenv("SPARK_HOME"), "R") + sparkRWhitelistSQLDirs <- c("spark-warehouse", "metastore_db") + invisible(lapply(sparkRWhitelistSQLDirs, + function(x) { unlink(file.path(sparkRDir, x), recursive = TRUE, force = TRUE)})) + sparkRFilesBefore <- list.files(path = sparkRDir, all.files = TRUE) -test_package("SparkR") + sparkRTestMaster <- "local[1]" + sparkRTestConfig <- list() + if (identical(Sys.getenv("NOT_CRAN"), "true")) { +sparkRTestMaster <- "" + } else { +# Disable hsperfdata on CRAN +old_java_opt <- Sys.getenv("_JAVA_OPTIONS") +Sys.setenv("_JAVA_OPTIONS" = paste("-XX:-UsePerfData", old_java_opt)) +tmpDir <- tempdir() +tmpArg <- paste0("-Djava.io.tmpdir=", tmpDir) +sparkRTestConfig <- list(spark.driver.extraJavaOptions = tmpArg, + spark.executor.extraJavaOptions = tmpArg) + } -if (identical(Sys.getenv("NOT_CRAN"), "true")) { - # set random seed for 
predictable results. mostly for base's sample() in tree and classification - set.seed(42) - # for testthat 1.0.2 later, change reporter from "summary" to default_reporter() - testthat:::run_tests("SparkR", - file.path(sparkRDir, "pkg", "tests", "fulltests"), - NULL, - "summary") -} + test_package("SparkR") + + if (identical(Sys.getenv("NOT_CRAN"), "true")) { +# set random seed for predictable results. mostly for base's sample() in tree and classification +set.seed(42) +# for testthat 1.0.2 later, change reporter from "summary" to default_reporter() +testthat:::run_tests("SparkR", + file.path(sparkRDir, "pkg", "tests", "fulltests"), + NULL, + "summary") + } +}