[spark] branch branch-3.1 updated: [SPARK-33653][SQL][3.1] DSv2: REFRESH TABLE should recache the table itself
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch branch-3.1
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/branch-3.1 by this push:
     new 5715322  [SPARK-33653][SQL][3.1] DSv2: REFRESH TABLE should recache the table itself
5715322 is described below

commit 5715322de3a44ad5d12adc663722168056b69957
Author: Chao Sun
AuthorDate: Mon Dec 14 22:11:10 2020 -0800

    [SPARK-33653][SQL][3.1] DSv2: REFRESH TABLE should recache the table itself

    This is a backport of #30742 for branch-3.1

    ### What changes were proposed in this pull request?

    This changes DSv2 refresh table semantics to also recache the target table itself.

    ### Why are the changes needed?

    Currently "REFRESH TABLE" in DSv2 only invalidates all caches referencing the table. With #30403 merged, which adds support for caching a DSv2 table, we should also recache the target table itself to make the behavior consistent with DSv1.

    ### Does this PR introduce _any_ user-facing change?

    Yes, refreshing a table in DSv2 now also recaches the target table itself.

    ### How was this patch tested?

    Added coverage of this new behavior in the existing UT for the v2 refresh table command.

    Closes #30769 from sunchao/SPARK-33653-branch-3.1.

    Authored-by: Chao Sun
    Signed-off-by: Dongjoon Hyun
---
 .../datasources/v2/DataSourceV2Strategy.scala       | 16 +---
 .../execution/datasources/v2/RefreshTableExec.scala |  1 -
 .../spark/sql/connector/DataSourceV2SQLSuite.scala  | 19 +++
 3 files changed, 32 insertions(+), 4 deletions(-)

diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DataSourceV2Strategy.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DataSourceV2Strategy.scala
index 5289d35..97dab4b 100644
--- a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DataSourceV2Strategy.scala
+++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DataSourceV2Strategy.scala
@@ -19,7 +19,7 @@ package org.apache.spark.sql.execution.datasources.v2

 import scala.collection.JavaConverters._

-import org.apache.spark.sql.{AnalysisException, SparkSession, Strategy}
+import org.apache.spark.sql.{AnalysisException, Dataset, SparkSession, Strategy}
 import org.apache.spark.sql.catalyst.analysis.{ResolvedNamespace, ResolvedPartitionSpec, ResolvedTable}
 import org.apache.spark.sql.catalyst.expressions.{And, Expression, NamedExpression, PredicateHelper, SubqueryExpression}
 import org.apache.spark.sql.catalyst.planning.PhysicalOperation
@@ -56,9 +56,19 @@ class DataSourceV2Strategy(session: SparkSession) extends Strategy with Predicat
     session.sharedState.cacheManager.recacheByPlan(session, r)
   }

-  private def invalidateCache(r: ResolvedTable)(): Unit = {
+  private def invalidateCache(r: ResolvedTable, recacheTable: Boolean = false)(): Unit = {
     val v2Relation = DataSourceV2Relation.create(r.table, Some(r.catalog), Some(r.identifier))
+    val cache = session.sharedState.cacheManager.lookupCachedData(v2Relation)
     session.sharedState.cacheManager.uncacheQuery(session, v2Relation, cascade = true)
+    if (recacheTable && cache.isDefined) {
+      // save the cache name and cache level for recreation
+      val cacheName = cache.get.cachedRepresentation.cacheBuilder.tableName
+      val cacheLevel = cache.get.cachedRepresentation.cacheBuilder.storageLevel
+
+      // recache with the same name and cache level.
+      val ds = Dataset.ofRows(session, v2Relation)
+      session.sharedState.cacheManager.cacheQuery(ds, cacheName, cacheLevel)
+    }
   }

   override def apply(plan: LogicalPlan): Seq[SparkPlan] = plan match {
@@ -137,7 +147,7 @@ class DataSourceV2Strategy(session: SparkSession) extends Strategy with Predicat
     }

     case RefreshTable(r: ResolvedTable) =>
-      RefreshTableExec(r.catalog, r.identifier, invalidateCache(r)) :: Nil
+      RefreshTableExec(r.catalog, r.identifier, invalidateCache(r, recacheTable = true)) :: Nil

     case ReplaceTable(catalog, ident, schema, parts, props, orCreate) =>
       val propsWithOwner = CatalogV2Util.withDefaultOwnership(props)
diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/RefreshTableExec.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/RefreshTableExec.scala
index 994583c..e66f0a1 100644
--- a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/RefreshTableExec.scala
+++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/RefreshTableExec.scala
@@ -29,7 +29,6 @@ case class RefreshTableExec(
     catalog.invalidateTable(ident)

     // invalidate all caches referencing the given table
-    // TODO(SPARK-33437): re-cache the table itself once we
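The key detail in the patch above is ordering: the cache entry must be looked up *before* `uncacheQuery` drops it, so its name and storage level can be reused for the recache. A minimal, self-contained sketch of that ordering with a toy cache manager (`ToyCacheManager` and `CacheEntry` are illustrative stand-ins, not Spark's real `CacheManager` API):

```scala
// Toy stand-ins for Spark's CacheManager types; names are illustrative only.
case class CacheEntry(name: Option[String], storageLevel: String)

class ToyCacheManager {
  private var cached = Map.empty[String, CacheEntry] // plan -> cache entry

  def lookupCachedData(plan: String): Option[CacheEntry] = cached.get(plan)
  def uncacheQuery(plan: String): Unit = cached -= plan
  def cacheQuery(plan: String, entry: CacheEntry): Unit = cached += (plan -> entry)

  // Mirrors the patch: capture the old entry *before* uncaching, then
  // recache under the same name and storage level when requested.
  def invalidate(plan: String, recacheTable: Boolean): Unit = {
    val old = lookupCachedData(plan) // must run before uncacheQuery
    uncacheQuery(plan)
    if (recacheTable && old.isDefined) {
      cacheQuery(plan, CacheEntry(old.get.name, old.get.storageLevel))
    }
  }
}

val cm = new ToyCacheManager
cm.cacheQuery("testcat.ns.t", CacheEntry(Some("t"), "MEMORY_AND_DISK"))

cm.invalidate("testcat.ns.t", recacheTable = true) // REFRESH TABLE: entry survives
assert(cm.lookupCachedData("testcat.ns.t").contains(CacheEntry(Some("t"), "MEMORY_AND_DISK")))

cm.invalidate("testcat.ns.t", recacheTable = false) // plain invalidation: entry is gone
assert(cm.lookupCachedData("testcat.ns.t").isEmpty)
```

Swapping the first two statements inside `invalidate` would lose the name and storage level, which is why the real patch calls `lookupCachedData` before `uncacheQuery`.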
[spark] branch master updated (366beda -> 141e26d)
This is an automated email from the ASF dual-hosted git repository.

wenchen pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

    from 366beda  [SPARK-33785][SQL] Migrate ALTER TABLE ... RECOVER PARTITIONS to use UnresolvedTable to resolve the identifier
     add 141e26d  [SPARK-33767][SQL][TESTS] Unify v1 and v2 ALTER TABLE .. DROP PARTITION tests

No new revisions were added by this update.

Summary of changes:
 .../spark/sql/catalyst/parser/DDLParserSuite.scala |  50 +--
 .../connector/AlterTablePartitionV2SQLSuite.scala  | 112
 .../AlterTableDropPartitionParserSuite.scala       |  88
 .../command/AlterTableDropPartitionSuiteBase.scala | 149 +
 .../spark/sql/execution/command/DDLSuite.scala     |  57
 ...te.scala => AlterTableDropPartitionSuite.scala} |  40 +++---
 ...te.scala => AlterTableDropPartitionSuite.scala} |  38 --
 .../spark/sql/hive/execution/HiveDDLSuite.scala    |   4 -
 ...te.scala => AlterTableDropPartitionSuite.scala} |  26 +++-
 9 files changed, 312 insertions(+), 252 deletions(-)
 create mode 100644 sql/core/src/test/scala/org/apache/spark/sql/execution/command/AlterTableDropPartitionParserSuite.scala
 create mode 100644 sql/core/src/test/scala/org/apache/spark/sql/execution/command/AlterTableDropPartitionSuiteBase.scala
 copy sql/core/src/test/scala/org/apache/spark/sql/execution/command/v1/{AlterTableAddPartitionSuite.scala => AlterTableDropPartitionSuite.scala} (55%)
 copy sql/core/src/test/scala/org/apache/spark/sql/execution/command/v2/{ShowPartitionsSuite.scala => AlterTableDropPartitionSuite.scala} (56%)
 copy sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/command/{AlterTableAddPartitionSuite.scala => AlterTableDropPartitionSuite.scala} (56%)

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated (a99a47c -> 366beda)
This is an automated email from the ASF dual-hosted git repository.

wenchen pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

    from a99a47c  [SPARK-33748][K8S] Respect environment variables and configurations for Python executables
     add 366beda  [SPARK-33785][SQL] Migrate ALTER TABLE ... RECOVER PARTITIONS to use UnresolvedTable to resolve the identifier

No new revisions were added by this update.

Summary of changes:
 .../scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala    | 7 +--
 .../org/apache/spark/sql/catalyst/plans/logical/statements.scala   | 6 --
 .../org/apache/spark/sql/catalyst/plans/logical/v2Commands.scala   | 7 +++
 .../org/apache/spark/sql/catalyst/parser/DDLParserSuite.scala      | 3 ++-
 .../apache/spark/sql/catalyst/analysis/ResolveSessionCatalog.scala | 5 ++---
 .../spark/sql/execution/datasources/v2/DataSourceV2Strategy.scala  | 4
 .../apache/spark/sql/connector/AlterTablePartitionV2SQLSuite.scala | 3 ++-
 .../test/scala/org/apache/spark/sql/execution/SQLViewSuite.scala   | 4 +++-
 .../scala/org/apache/spark/sql/hive/execution/HiveDDLSuite.scala   | 4 +++-
 9 files changed, 28 insertions(+), 15 deletions(-)
[spark] branch master updated (49d3256 -> a99a47c)
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

    from 49d3256  [SPARK-33653][SQL] DSv2: REFRESH TABLE should recache the table itself
     add a99a47c  [SPARK-33748][K8S] Respect environment variables and configurations for Python executables

No new revisions were added by this update.

Summary of changes:
 .../org/apache/spark/deploy/SparkSubmit.scala      | 54 +++---
 docs/running-on-kubernetes.md                      |  5 +-
 .../scala/org/apache/spark/deploy/k8s/Config.scala | 16 +++-
 .../org/apache/spark/deploy/k8s/Constants.scala    |  3 +-
 .../k8s/features/DriverCommandFeatureStep.scala    | 37 --
 .../features/DriverCommandFeatureStepSuite.scala   | 57 +--
 .../src/main/dockerfiles/spark/entrypoint.sh       | 10 +--
 .../k8s/integrationtest/DepsTestsSuite.scala       | 85 --
 .../k8s/integrationtest/KubernetesSuite.scala      |  6 +-
 .../integrationtest/KubernetesTestComponents.scala |  5 +-
 .../deploy/k8s/integrationtest/ProcessUtils.scala  |  5 +-
 .../spark/deploy/k8s/integrationtest/Utils.scala   |  9 ++-
 .../integration-tests/tests/py_container_checks.py |  2 +-
 .../{pyfiles.py => python_executable_check.py}     | 27 +++
 14 files changed, 228 insertions(+), 93 deletions(-)
 copy resource-managers/kubernetes/integration-tests/tests/{pyfiles.py => python_executable_check.py} (62%)
[spark] branch master updated (f156718 -> 49d3256)
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

    from f156718  [SPARK-33777][SQL] Sort output of V2 SHOW PARTITIONS
     add 49d3256  [SPARK-33653][SQL] DSv2: REFRESH TABLE should recache the table itself

No new revisions were added by this update.

Summary of changes:
 .../datasources/v2/DataSourceV2Strategy.scala       | 16 +---
 .../execution/datasources/v2/RefreshTableExec.scala |  1 -
 .../spark/sql/connector/DataSourceV2SQLSuite.scala  | 19 +++
 3 files changed, 32 insertions(+), 4 deletions(-)
[spark] branch branch-3.1 updated: [SPARK-33777][SQL] Sort output of V2 SHOW PARTITIONS
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch branch-3.1
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/branch-3.1 by this push:
     new ac4d04e  [SPARK-33777][SQL] Sort output of V2 SHOW PARTITIONS
ac4d04e is described below

commit ac4d04e480004e8b5de55e1323d27bd3c8bf28be
Author: Max Gekk
AuthorDate: Mon Dec 14 14:28:47 2020 -0800

    [SPARK-33777][SQL] Sort output of V2 SHOW PARTITIONS

    ### What changes were proposed in this pull request?

    List partitions returned by the V2 `SHOW PARTITIONS` command in alphabetical order.

    ### Why are the changes needed?

    To have the same behavior as:
    1. V1 in-memory catalog, see https://github.com/apache/spark/blob/a28ed86a387b286745b30cd4d90b3d558205a5a7/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/InMemoryCatalog.scala#L546
    2. V1 Hive catalogs, see https://github.com/apache/spark/blob/fab2995972761503563fa2aa547c67047c51bd33/sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClientImpl.scala#L715

    ### Does this PR introduce _any_ user-facing change?

    Yes, after the changes, V2 SHOW PARTITIONS sorts its output.

    ### How was this patch tested?

    Added new UT to the base trait `ShowPartitionsSuiteBase` which contains tests for V1 and V2.

    Closes #30764 from MaxGekk/sort-show-partitions.

    Authored-by: Max Gekk
    Signed-off-by: Dongjoon Hyun
    (cherry picked from commit f156718587fc33b9bf8e5abc4ae1f6fa0a5da887)
    Signed-off-by: Dongjoon Hyun
---
 .../execution/datasources/v2/ShowPartitionsExec.scala   |  5 +++--
 .../sql/execution/command/ShowPartitionsSuiteBase.scala | 17 +
 2 files changed, 20 insertions(+), 2 deletions(-)

diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/ShowPartitionsExec.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/ShowPartitionsExec.scala
index c4b6aa8..416dce6 100644
--- a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/ShowPartitionsExec.scala
+++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/ShowPartitionsExec.scala
@@ -49,7 +49,7 @@ case class ShowPartitionsExec(
     val len = schema.length
     val partitions = new Array[String](len)
     val timeZoneId = SQLConf.get.sessionLocalTimeZone
-    partitionIdentifiers.map { row =>
+    val output = partitionIdentifiers.map { row =>
       var i = 0
       while (i < len) {
         val dataType = schema(i).dataType
@@ -59,7 +59,8 @@ case class ShowPartitionsExec(
         partitions(i) = escapePathName(schema(i).name) + "=" + escapePathName(partValueStr)
         i += 1
       }
-      InternalRow(UTF8String.fromString(partitions.mkString("/")))
+      partitions.mkString("/")
     }
+    output.sorted.map(p => InternalRow(UTF8String.fromString(p)))
   }
 }
diff --git a/sql/core/src/test/scala/org/apache/spark/sql/execution/command/ShowPartitionsSuiteBase.scala b/sql/core/src/test/scala/org/apache/spark/sql/execution/command/ShowPartitionsSuiteBase.scala
index b695dec..56c6e5a 100644
--- a/sql/core/src/test/scala/org/apache/spark/sql/execution/command/ShowPartitionsSuiteBase.scala
+++ b/sql/core/src/test/scala/org/apache/spark/sql/execution/command/ShowPartitionsSuiteBase.scala
@@ -173,4 +173,21 @@ trait ShowPartitionsSuiteBase extends QueryTest with SQLTestUtils {
       }
     }
   }
+
+  test("SPARK-33777: sorted output") {
+    withNamespace(s"$catalog.ns") {
+      sql(s"CREATE NAMESPACE $catalog.ns")
+      val table = s"$catalog.ns.dateTable"
+      withTable(table) {
+        sql(s"""
+          |CREATE TABLE $table (id int, part string)
+          |$defaultUsing
+          |PARTITIONED BY (part)""".stripMargin)
+        sql(s"ALTER TABLE $table ADD PARTITION(part = 'b')")
+        sql(s"ALTER TABLE $table ADD PARTITION(part = 'a')")
+        val partitions = sql(s"show partitions $table")
+        assert(partitions.first().getString(0) === "part=a")
+      }
+    }
+  }
 }
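The behavioral core of the patch is easy to see in isolation: each partition spec is rendered as `col=value` path segments joined by `/`, and the resulting strings are sorted before being wrapped into rows. A simplified, self-contained sketch (path-name escaping and `InternalRow` wrapping, which the real `ShowPartitionsExec` performs, are omitted):

```scala
// Render each partition spec as "col=value" segments joined by "/", then
// sort the resulting strings -- the ordering V2 SHOW PARTITIONS now
// guarantees. Escaping and InternalRow wrapping are omitted here.
def renderSorted(schema: Seq[String], partitionValues: Seq[Seq[String]]): Seq[String] =
  partitionValues
    .map(row => schema.zip(row).map { case (col, v) => s"$col=$v" }.mkString("/"))
    .sorted

// Partitions added in the order 'b', 'a' still come back sorted,
// matching the new test's expectation that "part=a" is first.
val out = renderSorted(Seq("part"), Seq(Seq("b"), Seq("a")))
assert(out == Seq("part=a", "part=b"))
```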
[spark] branch master updated (5885cc1 -> 412d86e)
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

    from 5885cc1  [SPARK-33261][K8S] Add a developer API for custom feature steps
     add 412d86e  [SPARK-33771][SQL][TESTS] Fix Invalid value for HourOfAmPm when testing on JDK 14

No new revisions were added by this update.

Summary of changes:
 .../spark/sql/catalyst/util/TimestampFormatterSuite.scala | 10 --
 1 file changed, 8 insertions(+), 2 deletions(-)
[spark] branch master updated: [SPARK-33261][K8S] Add a developer API for custom feature steps
This is an automated email from the ASF dual-hosted git repository.

holden pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new 5885cc1  [SPARK-33261][K8S] Add a developer API for custom feature steps
5885cc1 is described below

commit 5885cc15cae9c9780530e235d2bd4bd6beda5dbb
Author: Holden Karau
AuthorDate: Mon Dec 14 12:05:28 2020 -0800

    [SPARK-33261][K8S] Add a developer API for custom feature steps

    ### What changes were proposed in this pull request?

    Add a developer API for custom driver & executor feature steps.

    ### Why are the changes needed?

    While we allow templates for the basis of pod creation, some deployments need more flexibility in how the pods are configured. This adds a developer API for custom deployments.

    ### Does this PR introduce _any_ user-facing change?

    New developer API.

    ### How was this patch tested?

    Extended tests to verify the custom step is applied when configured.

    Closes #30206 from holdenk/SPARK-33261-allow-people-to-extend-pod-feature-steps.

    Authored-by: Holden Karau
    Signed-off-by: Holden Karau
---
 .../scala/org/apache/spark/deploy/k8s/Config.scala | 20 ++
 .../org/apache/spark/deploy/k8s/SparkPod.scala     | 11 +++-
 .../k8s/features/KubernetesFeatureConfigStep.scala |  7 +-
 .../k8s/submit/KubernetesDriverBuilder.scala       |  8 ++-
 .../cluster/k8s/KubernetesExecutorBuilder.scala    |  8 ++-
 .../apache/spark/deploy/k8s/PodBuilderSuite.scala  | 76 ++
 .../k8s/submit/KubernetesDriverBuilderSuite.scala  |  5 +-
 .../k8s/KubernetesExecutorBuilderSuite.scala       |  4 ++
 8 files changed, 134 insertions(+), 5 deletions(-)

diff --git a/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/Config.scala b/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/Config.scala
index c28d6fd..40609ae 100644
--- a/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/Config.scala
+++ b/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/Config.scala
@@ -219,6 +219,26 @@ private[spark] object Config extends Logging {
       .stringConf
       .createOptional

+  val KUBERNETES_DRIVER_POD_FEATURE_STEPS =
+    ConfigBuilder("spark.kubernetes.driver.pod.featureSteps")
+      .doc("Class names of an extra driver pod feature step implementing " +
+        "KubernetesFeatureConfigStep. This is a developer API. Comma separated. " +
+        "Runs after all of Spark internal feature steps.")
+      .version("3.2.0")
+      .stringConf
+      .toSequence
+      .createWithDefault(Nil)
+
+  val KUBERNETES_EXECUTOR_POD_FEATURE_STEPS =
+    ConfigBuilder("spark.kubernetes.executor.pod.featureSteps")
+      .doc("Class name of an extra executor pod feature step implementing " +
+        "KubernetesFeatureConfigStep. This is a developer API. Comma separated. " +
+        "Runs after all of Spark internal feature steps.")
+      .version("3.2.0")
+      .stringConf
+      .toSequence
+      .createWithDefault(Nil)
+
   val KUBERNETES_ALLOCATION_BATCH_SIZE =
     ConfigBuilder("spark.kubernetes.allocation.batch.size")
       .doc("Number of pods to launch at once in each round of executor allocation.")
diff --git a/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/SparkPod.scala b/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/SparkPod.scala
index fd11963..c2298e7 100644
--- a/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/SparkPod.scala
+++ b/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/SparkPod.scala
@@ -18,7 +18,16 @@ package org.apache.spark.deploy.k8s

 import io.fabric8.kubernetes.api.model.{Container, ContainerBuilder, Pod, PodBuilder}

-private[spark] case class SparkPod(pod: Pod, container: Container) {
+import org.apache.spark.annotation.{DeveloperApi, Unstable}
+
+/**
+ * :: DeveloperApi ::
+ *
+ * Represents a SparkPod consisting of pod and the container within the pod.
+ */
+@Unstable
+@DeveloperApi
+case class SparkPod(pod: Pod, container: Container) {

   /**
    * Convenience method to apply a series of chained transformations to a pod.
diff --git a/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/features/KubernetesFeatureConfigStep.scala b/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/features/KubernetesFeatureConfigStep.scala
index 58cdaa3..3fec926 100644
--- a/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/features/KubernetesFeatureConfigStep.scala
+++ b/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/features/KubernetesFeatureConfigStep.scala
@@ -18,13 +18,18 @@ package
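The shape of the new API is: the config value is a comma-separated list of class names (hence `.toSequence` on the `stringConf`), and each loaded step transforms the pod in order, after all built-in steps. A hedged sketch of that idea — `FeatureStep`, `parseStepClasses`, and `applySteps` are illustrative stand-ins for `KubernetesFeatureConfigStep` and the driver/executor builders, and the pod is modeled as a plain `Map`:

```scala
// Each step transforms the pod; custom steps run after all built-in ones.
trait FeatureStep {
  def configurePod(pod: Map[String, String]): Map[String, String]
}

class BasicStep extends FeatureStep { // stand-in for a built-in step
  def configurePod(pod: Map[String, String]) = pod + ("image" -> "spark:3.1")
}

class CustomLabelStep extends FeatureStep { // stand-in for a user-provided step
  def configurePod(pod: Map[String, String]) = pod + ("label" -> "team-a")
}

// Comma-separated config value -> ordered list of step class names,
// mirroring what .stringConf.toSequence produces.
def parseStepClasses(conf: String): Seq[String] =
  conf.split(",").map(_.trim).filter(_.nonEmpty).toSeq

// Built-in steps first, then custom steps, folding the pod through each.
def applySteps(initial: Map[String, String], steps: Seq[FeatureStep]): Map[String, String] =
  steps.foldLeft(initial)((pod, step) => step.configurePod(pod))

val classes = parseStepClasses("com.example.CustomLabelStep, ")
assert(classes == Seq("com.example.CustomLabelStep"))

val pod = applySteps(Map.empty, Seq(new BasicStep, new CustomLabelStep))
assert(pod == Map("image" -> "spark:3.1", "label" -> "team-a"))
```

In Spark itself the named classes are instantiated reflectively rather than constructed directly as above; the sketch only shows the ordering and the comma-separated parsing.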
[spark] branch master updated (82aca7e -> bb60fb1)
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

    from 82aca7e  [SPARK-33779][SQL] DataSource V2: API to request distribution and ordering on write
     add bb60fb1  [SPARK-33779][SQL][FOLLOW-UP] Fix Java Linter error

No new revisions were added by this update.

Summary of changes:
 .../main/java/org/apache/spark/sql/connector/write/WriteBuilder.java | 2 --
 1 file changed, 2 deletions(-)
[spark] branch master updated (839d689 -> 82aca7e)
This is an automated email from the ASF dual-hosted git repository.

blue pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

    from 839d689  [SPARK-33733][SQL] PullOutNondeterministic should check and collect deterministic field
     add 82aca7e  [SPARK-33779][SQL] DataSource V2: API to request distribution and ordering on write

No new revisions were added by this update.

Summary of changes:
 .../distributions/ClusteredDistribution.java       | 35
 .../sql/connector/distributions/Distribution.java  | 28 +++
 .../sql/connector/distributions/Distributions.java | 56 +
 .../distributions/OrderedDistribution.java         | 35
 .../distributions/UnspecifiedDistribution.java     | 28 +++
 .../sql/connector/expressions/Expressions.java     | 11 +++
 .../sql/connector/expressions/NullOrdering.java    | 42 ++
 .../sql/connector/expressions/SortDirection.java   | 42 ++
 .../spark/sql/connector/expressions/SortOrder.java | 43 ++
 .../write/RequiresDistributionAndOrdering.java     | 57 +
 .../write/{WriteBuilder.java => Write.java}        | 33 +---
 .../spark/sql/connector/write/WriteBuilder.java    | 39 ++---
 .../connector/distributions/distributions.scala    | 59 +
 .../sql/connector/expressions/expressions.scala    | 96 ++
 14 files changed, 581 insertions(+), 23 deletions(-)
 create mode 100644 sql/catalyst/src/main/java/org/apache/spark/sql/connector/distributions/ClusteredDistribution.java
 create mode 100644 sql/catalyst/src/main/java/org/apache/spark/sql/connector/distributions/Distribution.java
 create mode 100644 sql/catalyst/src/main/java/org/apache/spark/sql/connector/distributions/Distributions.java
 create mode 100644 sql/catalyst/src/main/java/org/apache/spark/sql/connector/distributions/OrderedDistribution.java
 create mode 100644 sql/catalyst/src/main/java/org/apache/spark/sql/connector/distributions/UnspecifiedDistribution.java
 create mode 100644 sql/catalyst/src/main/java/org/apache/spark/sql/connector/expressions/NullOrdering.java
 create mode 100644 sql/catalyst/src/main/java/org/apache/spark/sql/connector/expressions/SortDirection.java
 create mode 100644 sql/catalyst/src/main/java/org/apache/spark/sql/connector/expressions/SortOrder.java
 create mode 100644 sql/catalyst/src/main/java/org/apache/spark/sql/connector/write/RequiresDistributionAndOrdering.java
 copy sql/catalyst/src/main/java/org/apache/spark/sql/connector/write/{WriteBuilder.java => Write.java} (66%)
 create mode 100644 sql/catalyst/src/main/scala/org/apache/spark/sql/connector/distributions/distributions.scala
[spark] branch branch-3.1 updated: [SPARK-33733][SQL] PullOutNondeterministic should check and collect deterministic field
This is an automated email from the ASF dual-hosted git repository.

wenchen pushed a commit to branch branch-3.1
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/branch-3.1 by this push:
     new 7dc4e32  [SPARK-33733][SQL] PullOutNondeterministic should check and collect deterministic field
7dc4e32 is described below

commit 7dc4e32c114205ebb035512f6b7fd3f26154d1f0
Author: ulysses-you
AuthorDate: Mon Dec 14 14:35:24 2020 +0000

    [SPARK-33733][SQL] PullOutNondeterministic should check and collect deterministic field

    ### What changes were proposed in this pull request?

    The deterministic field is wider than `NonDeterministic`; we should keep the same range between pull-out and check analysis.

    ### Why are the changes needed?

    For example
    ```
    select * from values(1), (4) as t(c1) order by java_method('java.lang.Math', 'abs', c1)
    ```
    We will get an exception since `java_method`'s deterministic field is false but it is not a `NonDeterministic`
    ```
    Exception in thread "main" org.apache.spark.sql.AnalysisException: nondeterministic expressions are only allowed in Project, Filter, Aggregate or Window, found:
    java_method('java.lang.Math', 'abs', t.`c1`) ASC NULLS FIRST
    in operator Sort [java_method(java.lang.Math, abs, c1#1) ASC NULLS FIRST], true
    ;;
    ```

    ### Does this PR introduce _any_ user-facing change?

    Yes.

    ### How was this patch tested?

    Add test.

    Closes #30703 from ulysses-you/SPARK-33733.

    Authored-by: ulysses-you
    Signed-off-by: Wenchen Fan
    (cherry picked from commit 839d6899adafd9a0695667656d00220d4665895d)
    Signed-off-by: Wenchen Fan
---
 .../spark/sql/catalyst/analysis/Analyzer.scala |  5 -
 .../expressions/CallMethodViaReflection.scala  |  6 +++---
 .../sql/catalyst/analysis/AnalysisSuite.scala  | 22 ++
 3 files changed, 29 insertions(+), 4 deletions(-)

diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
index a688a24..c5c0c68 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
@@ -2947,7 +2947,10 @@ class Analyzer(override val catalogManager: CatalogManager)

   private def getNondeterToAttr(exprs: Seq[Expression]): Map[Expression, NamedExpression] = {
     exprs.filterNot(_.deterministic).flatMap { expr =>
-      val leafNondeterministic = expr.collect { case n: Nondeterministic => n }
+      val leafNondeterministic = expr.collect {
+        case n: Nondeterministic => n
+        case udf: UserDefinedExpression if !udf.deterministic => udf
+      }
       leafNondeterministic.distinct.map { e =>
         val ne = e match {
           case n: NamedExpression => n
diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/CallMethodViaReflection.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/CallMethodViaReflection.scala
index 4bd6418..0979a18 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/CallMethodViaReflection.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/CallMethodViaReflection.scala
@@ -54,7 +54,7 @@ import org.apache.spark.util.Utils
   """,
   since = "2.0.0")
 case class CallMethodViaReflection(children: Seq[Expression])
-  extends Expression with CodegenFallback {
+  extends Nondeterministic with CodegenFallback {

   override def prettyName: String = getTagValue(FunctionRegistry.FUNC_ALIAS).getOrElse("reflect")

@@ -77,11 +77,11 @@ case class CallMethodViaReflection(children: Seq[Expression])
     }
   }

-  override lazy val deterministic: Boolean = false
   override def nullable: Boolean = true
   override val dataType: DataType = StringType
+  override protected def initializeInternal(partitionIndex: Int): Unit = {}

-  override def eval(input: InternalRow): Any = {
+  override protected def evalInternal(input: InternalRow): Any = {
     var i = 0
     while (i < argExprs.length) {
       buffer(i) = argExprs(i).eval(input).asInstanceOf[Object]
diff --git a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/analysis/AnalysisSuite.scala b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/analysis/AnalysisSuite.scala
index f5bfdc5..468b8c0 100644
--- a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/analysis/AnalysisSuite.scala
+++ b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/analysis/AnalysisSuite.scala
@@ -984,4 +984,26 @@ class AnalysisSuite extends AnalysisTest with Matchers {
       s"please set '${SQLConf.ANALYZER_MAX_ITERATIONS.key}' to a larger value."))
   }
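The essence of the `Analyzer.scala` change above: before the patch, `PullOutNondeterministic` only collected leaves that were instances of the `Nondeterministic` trait, so an expression like `java_method` — whose `deterministic` flag is false but which did not extend `Nondeterministic` — slipped through and then failed check analysis. A toy model of the widened collection (these case classes are simplified stand-ins for Catalyst expressions, not the real API):

```scala
// Simplified expression tree with a deterministic flag per node.
sealed trait Expr { def deterministic: Boolean; def children: Seq[Expr] }
case class Literal(v: Int) extends Expr { val deterministic = true; val children = Nil }
// Stand-in for a Nondeterministic expression such as rand():
case object RandExpr extends Expr { val deterministic = false; val children = Nil }
// Stand-in for java_method: deterministic = false, but (before the fix)
// not an instance of the Nondeterministic trait:
case class JavaMethod(args: Seq[Expr]) extends Expr {
  val deterministic = false
  def children = args
}

// After the fix: collect any *leaf* nondeterministic expression, meaning
// one whose own flag is false while all of its children are deterministic.
def nondeterministicLeaves(e: Expr): Seq[Expr] =
  if (!e.deterministic && e.children.forall(_.deterministic)) Seq(e)
  else e.children.flatMap(nondeterministicLeaves)

// java_method('java.lang.Math', 'abs', 1) is now collected and can be
// pulled out into a Project, instead of failing check analysis:
assert(nondeterministicLeaves(JavaMethod(Seq(Literal(1)))) == Seq(JavaMethod(Seq(Literal(1)))))
assert(nondeterministicLeaves(RandExpr) == Seq(RandExpr))
assert(nondeterministicLeaves(Literal(3)).isEmpty)
```

The real rule pattern-matches on `Nondeterministic` and `UserDefinedExpression` rather than using the generic flag test above, but the collected set is the same idea: everything with `deterministic == false` at the leaves.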
[spark] branch master updated (5f9a7fe -> 839d689)
This is an automated email from the ASF dual-hosted git repository.

wenchen pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

    from 5f9a7fe  [SPARK-33428][SQL] Conv UDF use BigInt to avoid Long value overflow
     add 839d689  [SPARK-33733][SQL] PullOutNondeterministic should check and collect deterministic field

No new revisions were added by this update.

Summary of changes:
 .../spark/sql/catalyst/analysis/Analyzer.scala |  5 -
 .../expressions/CallMethodViaReflection.scala  |  6 +++---
 .../sql/catalyst/analysis/AnalysisSuite.scala  | 22 ++
 3 files changed, 29 insertions(+), 4 deletions(-)
[spark] branch master updated (bf2c88c -> 5f9a7fe)
This is an automated email from the ASF dual-hosted git repository.

wenchen pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

    from bf2c88c  [SPARK-33716][K8S] Fix potential race condition during pod termination
     add 5f9a7fe  [SPARK-33428][SQL] Conv UDF use BigInt to avoid Long value overflow

No new revisions were added by this update.

Summary of changes:
 .../spark/sql/catalyst/util/NumberConverter.scala  | 64 +-
 .../expressions/MathExpressionsSuite.scala         |  6 +-
 .../sql/catalyst/util/NumberConverterSuite.scala   |  4 +-
 .../org/apache/spark/sql/MathFunctionsSuite.scala  |  2 +-
 .../hive/execution/HiveCompatibilitySuite.scala    |  4 +-
 5 files changed, 23 insertions(+), 57 deletions(-)
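The SPARK-33428 summary line conveys the whole idea: base conversion backed by a signed 64-bit `Long` overflows for inputs at or above 2^63, while `BigInt` has arbitrary precision. A small illustration (this is not the actual `NumberConverter` code, just the overflow it avoids):

```scala
// Parse an unsigned hex string digit by digit into a BigInt. A Long-based
// path would overflow (or throw) for values above Long.MaxValue.
def hexToBigInt(s: String): BigInt =
  s.foldLeft(BigInt(0))((acc, c) => acc * 16 + Integer.parseInt(c.toString, 16))

val hex = "FFFFFFFFFFFFFFFF" // 2^64 - 1, too large for a signed Long

// java.lang.Long.parseLong(hex, 16) would throw NumberFormatException here,
// since it parses into a signed 64-bit value.
val n = hexToBigInt(hex)
assert(n == BigInt("18446744073709551615"))
assert(n > BigInt(Long.MaxValue))
```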
[spark] branch branch-2.4 updated: [SPARK-33770][SQL][TESTS][2.4] Fix the `ALTER TABLE .. DROP PARTITION` tests that delete files out of partition path
This is an automated email from the ASF dual-hosted git repository. gurwls223 pushed a commit to branch branch-2.4 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-2.4 by this push: new 2964626 [SPARK-33770][SQL][TESTS][2.4] Fix the `ALTER TABLE .. DROP PARTITION` tests that delete files out of partition path 2964626 is described below commit 296462636fc8e204052cc2b135400f4060c47291 Author: Max Gekk AuthorDate: Mon Dec 14 19:50:07 2020 +0900 [SPARK-33770][SQL][TESTS][2.4] Fix the `ALTER TABLE .. DROP PARTITION` tests that delete files out of partition path ### What changes were proposed in this pull request? Modify the tests that add partitions with `LOCATION`, and where the number of nested folders in `LOCATION` doesn't match to the number of partitioned columns. In that case, `ALTER TABLE .. DROP PARTITION` tries to access (delete) folder out of the "base" path in `LOCATION`. The problem belongs to Hive's MetaStore method `drop_partition_common`: https://github.com/apache/hive/blob/8696c82d07d303b6dbb69b4d443ab6f2b241b251/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java#L4876 which tries to delete empty partition sub-folders recursively starting from the most deeper partition sub-folder up to the base folder. In the case when the number of sub-folder is not equal to the number of partitioned columns `part_vals.size()`, the method will try to list and delete folders out of the base path. ### Why are the changes needed? To fix test failures like https://github.com/apache/spark/pull/30643#issuecomment-743774733: ``` org.apache.spark.sql.hive.execution.command.AlterTableAddPartitionSuite.ALTER TABLE .. 
ADD PARTITION Hive V1: SPARK-33521: universal type conversions of partition values sbt.ForkMain$ForkError: org.apache.spark.sql.AnalysisException: org.apache.hadoop.hive.ql.metadata.HiveException: File file:/home/jenkins/workspace/SparkPullRequestBuilder/target/tmp/spark-832cb19c-65fd-41f3-ae0b-937d76c07897 does not exist; at org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:112) at org.apache.spark.sql.hive.HiveExternalCatalog.dropPartitions(HiveExternalCatalog.scala:1014) ... Caused by: sbt.ForkMain$ForkError: org.apache.hadoop.hive.metastore.api.MetaException: File file:/home/jenkins/workspace/SparkPullRequestBuilder/target/tmp/spark-832cb19c-65fd-41f3-ae0b-937d76c07897 does not exist at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.drop_partition_with_environment_context(HiveMetaStore.java:3381) at sun.reflect.GeneratedMethodAccessor304.invoke(Unknown Source) ``` The issue can be reproduced by the following steps: 1. Create a base folder, for example: `/Users/maximgekk/tmp/part-location` 2. Create a sub-folder in the base folder and drop permissions for it: ``` $ mkdir /Users/maximgekk/tmp/part-location/aaa $ chmod a-rwx chmod a-rwx /Users/maximgekk/tmp/part-location/aaa $ ls -al /Users/maximgekk/tmp/part-location total 0 drwxr-xr-x 3 maximgekk staff96 Dec 13 18:42 . drwxr-xr-x 33 maximgekk staff 1056 Dec 13 18:32 .. d- 2 maximgekk staff64 Dec 13 18:42 aaa ``` 3. Create a table with a partition folder in the base folder: ```sql spark-sql> create table tbl (id int) partitioned by (part0 int, part1 int); spark-sql> alter table tbl add partition (part0=1,part1=2) location '/Users/maximgekk/tmp/part-location/tbl'; ``` 4. 
Try to drop this partition: ``` spark-sql> alter table tbl drop partition (part0=1,part1=2); 20/12/13 18:46:07 ERROR HiveClientImpl: == Attempt to drop the partition specs in table 'tbl' database 'default': Map(part0 -> 1, part1 -> 2) In this attempt, the following partitions have been dropped successfully: The remaining partitions have not been dropped: [1, 2] == Error in query: org.apache.hadoop.hive.ql.metadata.HiveException: Error accessing file:/Users/maximgekk/tmp/part-location/aaa; org.apache.spark.sql.AnalysisException: org.apache.hadoop.hive.ql.metadata.HiveException: Error accessing file:/Users/maximgekk/tmp/part-location/aaa; ``` The command fails because it tries to access the sub-folder `aaa`, which is outside the partition path `/Users/maximgekk/tmp/part-location/tbl`. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? By running the affected tests from local IDEA, which does not have access to folders outside of the partition paths. Lead-authored-by: Max Gekk Co-authored-by: Maxim Gekk Signed-off-by: HyukjinKwon (cherry picked from commit 9160d59ae379910ca3bbd04ee25d336afff28abd)
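The escape from the base path described above can be sketched as follows. This is a minimal Scala model, not Hive's actual `drop_partition_common` code: the cleanup walks up one parent folder per partition value, so when the `LOCATION` has fewer nested folders than partition columns, the walk reaches the base folder (and its unreadable sibling `aaa`). The helper name `foldersVisited` is illustrative.

```scala
import java.nio.file.Paths

// Simplified model of Hive's recursive empty-sub-folder cleanup: starting
// from the partition location, walk up one level per partition value. The
// partition column count drives the walk, not the actual folder depth.
def foldersVisited(partitionLocation: String, numPartitionValues: Int): Seq[String] = {
  var current = Paths.get(partitionLocation)
  val visited = scala.collection.mutable.ArrayBuffer[String]()
  for (_ <- 0 until numPartitionValues) {
    visited += current.toString
    current = current.getParent
  }
  visited.toSeq
}

// Two partition columns (part0, part1), but the LOCATION has only one folder
// under the base path, so the second step of the walk lands on the base
// folder itself; listing it then touches the unreadable sibling `aaa`.
val base = "/Users/maximgekk/tmp/part-location"
val visited = foldersVisited(s"$base/tbl", numPartitionValues = 2)
// visited == Seq("/Users/maximgekk/tmp/part-location/tbl",
//                "/Users/maximgekk/tmp/part-location")
```

With a properly nested location such as `$base/part0=1/part1=2`, the same walk stays inside the base path, which is what the fixed tests now guarantee.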
[spark] branch branch-3.1 updated: [SPARK-33716][K8S] Fix potential race condition during pod termination
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch branch-3.1 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.1 by this push: new b44e650 [SPARK-33716][K8S] Fix potential race condition during pod termination b44e650 is described below commit b44e65042a8ea6cfd44796b83601d0a28beb4305 Author: Holden Karau AuthorDate: Mon Dec 14 02:09:59 2020 -0800 [SPARK-33716][K8S] Fix potential race condition during pod termination ### What changes were proposed in this pull request? Check that the pod state is not pending or running even if there is a deletion timestamp. ### Why are the changes needed? This can occur when the pod state and deletion timestamp are not updated by etcd in sync, and we get a pod snapshot during an inconsistent view. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? Manual testing with a local version of Minikube on an overloaded computer that caused out-of-sync updates. Closes #30693 from holdenk/SPARK-33716-decommissioning-race-condition-during-pod-snapshot.
Authored-by: Holden Karau Signed-off-by: Dongjoon Hyun (cherry picked from commit bf2c88ccaebd8e27d9fc27c55c9955129541d3e1) Signed-off-by: Dongjoon Hyun --- .../org/apache/spark/scheduler/cluster/k8s/ExecutorPodsSnapshot.scala | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/scheduler/cluster/k8s/ExecutorPodsSnapshot.scala b/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/scheduler/cluster/k8s/ExecutorPodsSnapshot.scala index be75311..e81d213 100644 --- a/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/scheduler/cluster/k8s/ExecutorPodsSnapshot.scala +++ b/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/scheduler/cluster/k8s/ExecutorPodsSnapshot.scala @@ -93,7 +93,8 @@ object ExecutorPodsSnapshot extends Logging { ( pod.getStatus == null || pod.getStatus.getPhase == null || -pod.getStatus.getPhase.toLowerCase(Locale.ROOT) != "terminating" + (pod.getStatus.getPhase.toLowerCase(Locale.ROOT) != "terminating" && + pod.getStatus.getPhase.toLowerCase(Locale.ROOT) != "running") )) } } - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
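The predicate change in the diff above can be modeled in isolation. This is a simplified Scala sketch, not the fabric8 Kubernetes client API used by `ExecutorPodsSnapshot`; the `PodStatus` case class and `isDeleted` helper are illustrative stand-ins for the real types:

```scala
// Simplified model of the SPARK-33716 fix: a pod with a deletion timestamp
// is only treated as deleted if its phase is not an "alive" phase. etcd may
// expose the deletion timestamp before the phase catches up, so checking the
// timestamp alone races with the phase update.
case class PodStatus(phase: Option[String], deletionTimestamp: Option[String])

def isDeleted(status: PodStatus): Boolean =
  status.deletionTimestamp.isDefined && (status.phase match {
    case None => true // no phase info: trust the deletion timestamp
    case Some(p) =>
      val phase = p.toLowerCase(java.util.Locale.ROOT)
      // Before the fix only "terminating" was excluded; the patch also
      // excludes "running" to tolerate the inconsistent snapshot.
      phase != "terminating" && phase != "running"
  })

// Inconsistent snapshot: deletion timestamp already set, phase still Running.
// Before the fix this pod was considered deleted; after it, it is not.
val racy = PodStatus(Some("Running"), Some("2020-12-14T02:09:59Z"))
```

Here `isDeleted(racy)` is `false`, so the snapshot keeps treating the executor pod as alive until its phase actually leaves "running".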
[spark] branch master updated (cd0356df -> bf2c88c)
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from cd0356df [SPARK-33673][SQL] Avoid push down partition filters to ParquetScan for DataSourceV2 add bf2c88c [SPARK-33716][K8S] Fix potential race condition during pod termination No new revisions were added by this update. Summary of changes: .../org/apache/spark/scheduler/cluster/k8s/ExecutorPodsSnapshot.scala | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated (a84c8d8 -> cd0356df)
This is an automated email from the ASF dual-hosted git repository. yumwang pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from a84c8d8 [SPARK-33751][SQL] Migrate ALTER VIEW ... AS command to use UnresolvedView to resolve the identifier add cd0356df [SPARK-33673][SQL] Avoid push down partition filters to ParquetScan for DataSourceV2 No new revisions were added by this update. Summary of changes: .../spark/sql/execution/datasources/v2/parquet/ParquetScanBuilder.scala | 2 +- sql/core/src/test/scala/org/apache/spark/sql/ExplainSuite.scala | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch branch-3.0 updated: [SPARK-33770][SQL][TESTS][3.1][3.0] Fix the `ALTER TABLE .. DROP PARTITION` tests that delete files out of partition path
This is an automated email from the ASF dual-hosted git repository. gurwls223 pushed a commit to branch branch-3.0 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.0 by this push: new d652b47 [SPARK-33770][SQL][TESTS][3.1][3.0] Fix the `ALTER TABLE .. DROP PARTITION` tests that delete files out of partition path d652b47 is described below commit d652b47ce2693eedf3d465e50a252f6b80fe8ba7 Author: Max Gekk AuthorDate: Mon Dec 14 18:13:42 2020 +0900 [SPARK-33770][SQL][TESTS][3.1][3.0] Fix the `ALTER TABLE .. DROP PARTITION` tests that delete files out of partition path ### What changes were proposed in this pull request? Modify the tests that add partitions with `LOCATION`, and where the number of nested folders in `LOCATION` doesn't match the number of partitioned columns. In that case, `ALTER TABLE .. DROP PARTITION` tries to access (delete) a folder outside the "base" path in `LOCATION`. The problem lies in Hive's MetaStore method `drop_partition_common`: https://github.com/apache/hive/blob/8696c82d07d303b6dbb69b4d443ab6f2b241b251/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java#L4876 which tries to delete empty partition sub-folders recursively, starting from the deepest partition sub-folder up to the base folder. When the number of sub-folders is not equal to the number of partitioned columns `part_vals.size()`, the method will try to list and delete folders outside the base path. ### Why are the changes needed? To fix test failures like https://github.com/apache/spark/pull/30643#issuecomment-743774733: ``` org.apache.spark.sql.hive.execution.command.AlterTableAddPartitionSuite.ALTER TABLE ..
ADD PARTITION Hive V1: SPARK-33521: universal type conversions of partition values sbt.ForkMain$ForkError: org.apache.spark.sql.AnalysisException: org.apache.hadoop.hive.ql.metadata.HiveException: File file:/home/jenkins/workspace/SparkPullRequestBuilder/target/tmp/spark-832cb19c-65fd-41f3-ae0b-937d76c07897 does not exist; at org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:112) at org.apache.spark.sql.hive.HiveExternalCatalog.dropPartitions(HiveExternalCatalog.scala:1014) ... Caused by: sbt.ForkMain$ForkError: org.apache.hadoop.hive.metastore.api.MetaException: File file:/home/jenkins/workspace/SparkPullRequestBuilder/target/tmp/spark-832cb19c-65fd-41f3-ae0b-937d76c07897 does not exist at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.drop_partition_with_environment_context(HiveMetaStore.java:3381) at sun.reflect.GeneratedMethodAccessor304.invoke(Unknown Source) ``` The issue can be reproduced by the following steps: 1. Create a base folder, for example: `/Users/maximgekk/tmp/part-location` 2. Create a sub-folder in the base folder and drop permissions for it: ``` $ mkdir /Users/maximgekk/tmp/part-location/aaa $ chmod a-rwx /Users/maximgekk/tmp/part-location/aaa $ ls -al /Users/maximgekk/tmp/part-location total 0 drwxr-xr-x 3 maximgekk staff 96 Dec 13 18:42 . drwxr-xr-x 33 maximgekk staff 1056 Dec 13 18:32 .. d--------- 2 maximgekk staff 64 Dec 13 18:42 aaa ``` 3. Create a table with a partition folder in the base folder: ```sql spark-sql> create table tbl (id int) partitioned by (part0 int, part1 int); spark-sql> alter table tbl add partition (part0=1,part1=2) location '/Users/maximgekk/tmp/part-location/tbl'; ``` 4.
Try to drop this partition: ``` spark-sql> alter table tbl drop partition (part0=1,part1=2); 20/12/13 18:46:07 ERROR HiveClientImpl: == Attempt to drop the partition specs in table 'tbl' database 'default': Map(part0 -> 1, part1 -> 2) In this attempt, the following partitions have been dropped successfully: The remaining partitions have not been dropped: [1, 2] == Error in query: org.apache.hadoop.hive.ql.metadata.HiveException: Error accessing file:/Users/maximgekk/tmp/part-location/aaa; org.apache.spark.sql.AnalysisException: org.apache.hadoop.hive.ql.metadata.HiveException: Error accessing file:/Users/maximgekk/tmp/part-location/aaa; ``` The command fails because it tries to access the sub-folder `aaa`, which is outside the partition path `/Users/maximgekk/tmp/part-location/tbl`. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? By running the affected tests from local IDEA, which does not have access to folders outside of the partition paths. Lead-authored-by: Max Gekk Co-authored-by: Maxim Gekk Signed-off-by: HyukjinKwon (cherry picked from commit 9160d59ae379910ca3bbd04ee25d336afff28abd)
[spark] branch branch-3.1 updated (1559135 -> 01294f8)
This is an automated email from the ASF dual-hosted git repository. gurwls223 pushed a change to branch branch-3.1 in repository https://gitbox.apache.org/repos/asf/spark.git. from 1559135 [SPARK-33142][SPARK-33647][SQL][FOLLOW-UP] Add docs and test cases add 01294f8 [SPARK-33770][SQL][TESTS][3.1][3.0] Fix the `ALTER TABLE .. DROP PARTITION` tests that delete files out of partition path No new revisions were added by this update. Summary of changes: .../spark/sql/catalyst/catalog/ExternalCatalogSuite.scala| 9 +++-- .../scala/org/apache/spark/sql/hive/StatisticsSuite.scala| 12 .../org/apache/spark/sql/hive/execution/HiveDDLSuite.scala | 4 ++-- 3 files changed, 17 insertions(+), 8 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated (b7c8210 -> a84c8d8)
This is an automated email from the ASF dual-hosted git repository. wenchen pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from b7c8210 [SPARK-33142][SPARK-33647][SQL][FOLLOW-UP] Add docs and test cases add a84c8d8 [SPARK-33751][SQL] Migrate ALTER VIEW ... AS command to use UnresolvedView to resolve the identifier No new revisions were added by this update. Summary of changes: .../apache/spark/sql/catalyst/parser/AstBuilder.scala | 8 +--- .../spark/sql/catalyst/plans/logical/statements.scala | 8 .../spark/sql/catalyst/plans/logical/v2Commands.scala | 10 ++ .../spark/sql/catalyst/parser/DDLParserSuite.scala| 6 -- .../sql/catalyst/analysis/ResolveSessionCatalog.scala | 19 ++- .../apache/spark/sql/execution/command/views.scala| 3 --- .../spark/sql/connector/DataSourceV2SQLSuite.scala| 13 + .../org/apache/spark/sql/execution/SQLViewSuite.scala | 11 --- 8 files changed, 34 insertions(+), 44 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch branch-3.1 updated: [SPARK-33142][SPARK-33647][SQL][FOLLOW-UP] Add docs and test cases
This is an automated email from the ASF dual-hosted git repository. wenchen pushed a commit to branch branch-3.1 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.1 by this push: new 1559135 [SPARK-33142][SPARK-33647][SQL][FOLLOW-UP] Add docs and test cases 1559135 is described below commit 1559135ea7e5cc41916d3b22fe95cfa307088149 Author: Linhong Liu AuthorDate: Mon Dec 14 08:31:50 2020 + [SPARK-33142][SPARK-33647][SQL][FOLLOW-UP] Add docs and test cases ### What changes were proposed in this pull request? Addressed comments in PR #30567, including: 1. add test cases for SPARK-33647 and SPARK-33142 2. add a migration guide 3. add `getRawTempView` and `getRawGlobalTempView` to return the raw view info (i.e. TemporaryViewRelation) 4. other minor code cleanup ### Why are the changes needed? Code cleanup and more test cases ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? Existing and newly added test cases Closes #30666 from linhongliu-db/SPARK-33142-followup.
Lead-authored-by: Linhong Liu Co-authored-by: Linhong Liu <67896261+linhongliu...@users.noreply.github.com> Signed-off-by: Wenchen Fan (cherry picked from commit b7c82101352078fb10ab1822bc745c8b4fbb2590) Signed-off-by: Wenchen Fan --- docs/sql-migration-guide.md | 4 +- .../sql/catalyst/catalog/SessionCatalog.scala | 44 ++ .../plans/logical/basicLogicalOperators.scala | 16  .../apache/spark/sql/execution/command/views.scala | 16 ++-- .../org/apache/spark/sql/CachedTableSuite.scala | 13 +++ .../apache/spark/sql/execution/SQLViewSuite.scala | 14 --- .../spark/sql/execution/SQLViewTestSuite.scala | 24 +++- 7 files changed, 79 insertions(+), 52 deletions(-) diff --git a/docs/sql-migration-guide.md b/docs/sql-migration-guide.md index 2bc04a0..d3ac76f 100644 --- a/docs/sql-migration-guide.md +++ b/docs/sql-migration-guide.md @@ -52,7 +52,9 @@ license: | - In Spark 3.1, refreshing a table will trigger an uncache operation for all other caches that reference the table, even if the table itself is not cached. In Spark 3.0 the operation will only be triggered if the table itself is cached. - - In Spark 3.1, creating or altering a view will capture runtime SQL configs and store them as view properties. These configs will be applied during the parsing and analysis phases of the view resolution. To restore the behavior before Spark 3.1, you can set `spark.sql.legacy.useCurrentConfigsForView` to `true`. + - In Spark 3.1, creating or altering a permanent view will capture runtime SQL configs and store them as view properties. These configs will be applied during the parsing and analysis phases of the view resolution. To restore the behavior before Spark 3.1, you can set `spark.sql.legacy.useCurrentConfigsForView` to `true`. + + - In Spark 3.1, a temporary view will have the same behavior as a permanent view, i.e. capture and store runtime SQL configs, SQL text, catalog and namespace.
The captured view properties will be applied during the parsing and analysis phases of the view resolution. To restore the behavior before Spark 3.1, you can set `spark.sql.legacy.storeAnalyzedPlanForView` to `true`. - Since Spark 3.1, CHAR/CHARACTER and VARCHAR types are supported in the table schema. Table scan/insertion will respect the char/varchar semantic. If char/varchar is used in places other than table schema, an exception will be thrown (CAST is an exception that simply treats char/varchar as string like before). To restore the behavior before Spark 3.1, which treats them as STRING types and ignores a length parameter, e.g. `CHAR(4)`, you can set `spark.sql.legacy.charVarcharAsString` to [...] diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala index 51d7e96..0d259c9 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala @@ -605,8 +605,16 @@ class SessionCatalog( /** * Return a local temporary view exactly as it was stored. */ + def getRawTempView(name: String): Option[LogicalPlan] = synchronized { +tempViews.get(formatTableName(name)) + } + + /** + * Generate a [[View]] operator from the view description if the view stores sql text, + * otherwise, it is the same as `getRawTempView` + */ def getTempView(name: String): Option[LogicalPlan] = synchronized { -getRawTempView(name).map(getTempViewPlan) } def getTempViewNames(): Seq[String] =
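The raw-vs-resolved lookup split introduced in the diff above can be sketched in miniature. This is a simplified Scala model, not Spark's actual `SessionCatalog` (the `TempViewRegistry`, `TemporaryViewRelation`, and `ResolvedView` types here are illustrative): `getRawTempView` returns the stored definition as-is, while `getTempView` additionally expands a SQL-text-backed definition into a resolved plan.

```scala
// Simplified model of the getRawTempView / getTempView split: callers that
// need to re-store or compare a view definition use the raw variant; callers
// that need an executable plan use the resolving variant.
sealed trait ViewPlan
case class TemporaryViewRelation(sqlText: String) extends ViewPlan
case class ResolvedView(sqlText: String) extends ViewPlan

class TempViewRegistry {
  private val tempViews = scala.collection.mutable.HashMap[String, ViewPlan]()
  private def formatName(name: String): String = name.toLowerCase

  def create(name: String, plan: ViewPlan): Unit =
    tempViews(formatName(name)) = plan

  // Return the view exactly as it was stored.
  def getRawTempView(name: String): Option[ViewPlan] =
    tempViews.get(formatName(name))

  // Resolve a SQL-text-backed view into a plan; pass others through unchanged.
  def getTempView(name: String): Option[ViewPlan] =
    getRawTempView(name).map {
      case TemporaryViewRelation(sql) => ResolvedView(sql)
      case other => other
    }
}
```

Defining `getTempView` in terms of `getRawTempView`, as the real patch does, keeps the name normalization and storage lookup in a single place.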
[spark] branch master updated (e7fe92f -> b7c8210)
This is an automated email from the ASF dual-hosted git repository. wenchen pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from e7fe92f [SPARK-33546][SQL] Enable row format file format validation in CREATE TABLE LIKE add b7c8210 [SPARK-33142][SPARK-33647][SQL][FOLLOW-UP] Add docs and test cases No new revisions were added by this update. Summary of changes: docs/sql-migration-guide.md| 4 +- .../sql/catalyst/catalog/SessionCatalog.scala | 44 ++ .../plans/logical/basicLogicalOperators.scala | 16 .../apache/spark/sql/execution/command/views.scala | 16 ++-- .../org/apache/spark/sql/CachedTableSuite.scala| 13 +++ .../apache/spark/sql/execution/SQLViewSuite.scala | 14 --- .../spark/sql/execution/SQLViewTestSuite.scala | 24 +++- 7 files changed, 79 insertions(+), 52 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated (817f58d -> e7fe92f)
This is an automated email from the ASF dual-hosted git repository. wenchen pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from 817f58d [SPARK-33768][SQL] Remove `retainData` from `AlterTableDropPartition` add e7fe92f [SPARK-33546][SQL] Enable row format file format validation in CREATE TABLE LIKE No new revisions were added by this update. Summary of changes: .../spark/sql/catalyst/parser/AstBuilder.scala | 5 +- .../spark/sql/execution/SparkSqlParser.scala | 9 +- .../spark/sql/hive/execution/HiveDDLSuite.scala| 130 - 3 files changed, 108 insertions(+), 36 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated (9160d59 -> 817f58d)
This is an automated email from the ASF dual-hosted git repository. wenchen pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from 9160d59 [SPARK-33770][SQL][TESTS] Fix the `ALTER TABLE .. DROP PARTITION` tests that delete files out of partition path add 817f58d [SPARK-33768][SQL] Remove `retainData` from `AlterTableDropPartition` No new revisions were added by this update. Summary of changes: .../org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala | 2 +- .../apache/spark/sql/catalyst/analysis/ResolvePartitionSpec.scala | 2 +- .../scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala | 3 +-- .../org/apache/spark/sql/catalyst/plans/logical/v2Commands.scala| 3 +-- .../scala/org/apache/spark/sql/catalyst/parser/DDLParserSuite.scala | 6 ++ .../apache/spark/sql/catalyst/analysis/ResolveSessionCatalog.scala | 4 ++-- .../spark/sql/execution/datasources/v2/DataSourceV2Strategy.scala | 2 +- 7 files changed, 9 insertions(+), 13 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org