[spark] branch branch-3.1 updated: [SPARK-33653][SQL][3.1] DSv2: REFRESH TABLE should recache the table itself
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch branch-3.1
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/branch-3.1 by this push:
     new 5715322  [SPARK-33653][SQL][3.1] DSv2: REFRESH TABLE should recache the table itself
5715322 is described below

commit 5715322de3a44ad5d12adc663722168056b69957
Author: Chao Sun
AuthorDate: Mon Dec 14 22:11:10 2020 -0800

    [SPARK-33653][SQL][3.1] DSv2: REFRESH TABLE should recache the table itself

    This is a backport of #30742 for branch-3.1

    ### What changes were proposed in this pull request?

    This changes DSv2 refresh table semantics to also recache the target table itself.

    ### Why are the changes needed?

    Currently "REFRESH TABLE" in DSv2 only invalidates all caches referencing the table. With #30403 merged, which adds support for caching a DSv2 table, we should also recache the target table itself to make the behavior consistent with DSv1.

    ### Does this PR introduce _any_ user-facing change?

    Yes, refreshing a table in DSv2 now also recaches the target table itself.

    ### How was this patch tested?

    Added coverage of this new behavior in the existing UT for the v2 refresh table command.

    Closes #30769 from sunchao/SPARK-33653-branch-3.1.

    Authored-by: Chao Sun
    Signed-off-by: Dongjoon Hyun
---
 .../datasources/v2/DataSourceV2Strategy.scala       | 16 +---
 .../execution/datasources/v2/RefreshTableExec.scala |  1 -
 .../spark/sql/connector/DataSourceV2SQLSuite.scala  | 19 +++
 3 files changed, 32 insertions(+), 4 deletions(-)

diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DataSourceV2Strategy.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DataSourceV2Strategy.scala
index 5289d35..97dab4b 100644
--- a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DataSourceV2Strategy.scala
+++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DataSourceV2Strategy.scala
@@ -19,7 +19,7 @@ package org.apache.spark.sql.execution.datasources.v2

 import scala.collection.JavaConverters._

-import org.apache.spark.sql.{AnalysisException, SparkSession, Strategy}
+import org.apache.spark.sql.{AnalysisException, Dataset, SparkSession, Strategy}
 import org.apache.spark.sql.catalyst.analysis.{ResolvedNamespace, ResolvedPartitionSpec, ResolvedTable}
 import org.apache.spark.sql.catalyst.expressions.{And, Expression, NamedExpression, PredicateHelper, SubqueryExpression}
 import org.apache.spark.sql.catalyst.planning.PhysicalOperation
@@ -56,9 +56,19 @@ class DataSourceV2Strategy(session: SparkSession) extends Strategy with Predicat
     session.sharedState.cacheManager.recacheByPlan(session, r)
   }

-  private def invalidateCache(r: ResolvedTable)(): Unit = {
+  private def invalidateCache(r: ResolvedTable, recacheTable: Boolean = false)(): Unit = {
     val v2Relation = DataSourceV2Relation.create(r.table, Some(r.catalog), Some(r.identifier))
+    val cache = session.sharedState.cacheManager.lookupCachedData(v2Relation)
     session.sharedState.cacheManager.uncacheQuery(session, v2Relation, cascade = true)
+    if (recacheTable && cache.isDefined) {
+      // save the cache name and cache level for recreation
+      val cacheName = cache.get.cachedRepresentation.cacheBuilder.tableName
+      val cacheLevel = cache.get.cachedRepresentation.cacheBuilder.storageLevel
+
+      // recache with the same name and cache level.
+      val ds = Dataset.ofRows(session, v2Relation)
+      session.sharedState.cacheManager.cacheQuery(ds, cacheName, cacheLevel)
+    }
   }

   override def apply(plan: LogicalPlan): Seq[SparkPlan] = plan match {
@@ -137,7 +147,7 @@ class DataSourceV2Strategy(session: SparkSession) extends Strategy with Predicat
     }

     case RefreshTable(r: ResolvedTable) =>
-      RefreshTableExec(r.catalog, r.identifier, invalidateCache(r)) :: Nil
+      RefreshTableExec(r.catalog, r.identifier, invalidateCache(r, recacheTable = true)) :: Nil

     case ReplaceTable(catalog, ident, schema, parts, props, orCreate) =>
       val propsWithOwner = CatalogV2Util.withDefaultOwnership(props)
diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/RefreshTableExec.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/RefreshTableExec.scala
index 994583c..e66f0a1 100644
--- a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/RefreshTableExec.scala
+++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/RefreshTableExec.scala
@@ -29,7 +29,6 @@ case class RefreshTableExec(
     catalog.invalidateTable(ident)

     // invalidate all caches referencing the given table
-    // TODO(SPARK-33437): re-cache the table itself once we
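The key detail in the patch above is ordering: the cache entry must be looked up *before* `uncacheQuery` drops it, so its name and storage level can be reused for the recache. A minimal, self-contained sketch of that ordering with a toy cache manager (`ToyCacheManager` and `CacheEntry` are illustrative stand-ins, not Spark's real `CacheManager` API):

```scala
// Toy stand-ins for Spark's CacheManager types; names are illustrative only.
case class CacheEntry(name: Option[String], storageLevel: String)

class ToyCacheManager {
  private var cached = Map.empty[String, CacheEntry] // plan -> cache entry

  def lookupCachedData(plan: String): Option[CacheEntry] = cached.get(plan)
  def uncacheQuery(plan: String): Unit = cached -= plan
  def cacheQuery(plan: String, entry: CacheEntry): Unit = cached += (plan -> entry)

  // Mirrors the patch: capture the old entry *before* uncaching, then
  // recache under the same name and storage level when requested.
  def invalidate(plan: String, recacheTable: Boolean): Unit = {
    val old = lookupCachedData(plan) // must run before uncacheQuery
    uncacheQuery(plan)
    if (recacheTable && old.isDefined) {
      cacheQuery(plan, CacheEntry(old.get.name, old.get.storageLevel))
    }
  }
}

val cm = new ToyCacheManager
cm.cacheQuery("testcat.ns.t", CacheEntry(Some("t"), "MEMORY_AND_DISK"))

cm.invalidate("testcat.ns.t", recacheTable = true) // REFRESH TABLE: entry survives
assert(cm.lookupCachedData("testcat.ns.t").contains(CacheEntry(Some("t"), "MEMORY_AND_DISK")))

cm.invalidate("testcat.ns.t", recacheTable = false) // plain invalidation: entry is gone
assert(cm.lookupCachedData("testcat.ns.t").isEmpty)
```

Swapping the first two statements inside `invalidate` would lose the name and storage level, which is why the real patch calls `lookupCachedData` before `uncacheQuery`.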
[spark] branch master updated (366beda -> 141e26d)
This is an automated email from the ASF dual-hosted git repository.

wenchen pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

    from 366beda  [SPARK-33785][SQL] Migrate ALTER TABLE ... RECOVER PARTITIONS to use UnresolvedTable to resolve the identifier
     add 141e26d  [SPARK-33767][SQL][TESTS] Unify v1 and v2 ALTER TABLE .. DROP PARTITION tests

No new revisions were added by this update.

Summary of changes:
 .../spark/sql/catalyst/parser/DDLParserSuite.scala |  50 +--
 .../connector/AlterTablePartitionV2SQLSuite.scala  | 112
 .../AlterTableDropPartitionParserSuite.scala       |  88
 .../command/AlterTableDropPartitionSuiteBase.scala | 149 +
 .../spark/sql/execution/command/DDLSuite.scala     |  57
 ...te.scala => AlterTableDropPartitionSuite.scala} |  40 +++---
 ...te.scala => AlterTableDropPartitionSuite.scala} |  38 --
 .../spark/sql/hive/execution/HiveDDLSuite.scala    |   4 -
 ...te.scala => AlterTableDropPartitionSuite.scala} |  26 +++-
 9 files changed, 312 insertions(+), 252 deletions(-)
 create mode 100644 sql/core/src/test/scala/org/apache/spark/sql/execution/command/AlterTableDropPartitionParserSuite.scala
 create mode 100644 sql/core/src/test/scala/org/apache/spark/sql/execution/command/AlterTableDropPartitionSuiteBase.scala
 copy sql/core/src/test/scala/org/apache/spark/sql/execution/command/v1/{AlterTableAddPartitionSuite.scala => AlterTableDropPartitionSuite.scala} (55%)
 copy sql/core/src/test/scala/org/apache/spark/sql/execution/command/v2/{ShowPartitionsSuite.scala => AlterTableDropPartitionSuite.scala} (56%)
 copy sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/command/{AlterTableAddPartitionSuite.scala => AlterTableDropPartitionSuite.scala} (56%)

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated (a99a47c -> 366beda)
This is an automated email from the ASF dual-hosted git repository.

wenchen pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

    from a99a47c  [SPARK-33748][K8S] Respect environment variables and configurations for Python executables
     add 366beda  [SPARK-33785][SQL] Migrate ALTER TABLE ... RECOVER PARTITIONS to use UnresolvedTable to resolve the identifier

No new revisions were added by this update.

Summary of changes:
 .../scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala    | 7 +--
 .../org/apache/spark/sql/catalyst/plans/logical/statements.scala   | 6 --
 .../org/apache/spark/sql/catalyst/plans/logical/v2Commands.scala   | 7 +++
 .../org/apache/spark/sql/catalyst/parser/DDLParserSuite.scala      | 3 ++-
 .../apache/spark/sql/catalyst/analysis/ResolveSessionCatalog.scala | 5 ++---
 .../spark/sql/execution/datasources/v2/DataSourceV2Strategy.scala  | 4
 .../apache/spark/sql/connector/AlterTablePartitionV2SQLSuite.scala | 3 ++-
 .../test/scala/org/apache/spark/sql/execution/SQLViewSuite.scala   | 4 +++-
 .../scala/org/apache/spark/sql/hive/execution/HiveDDLSuite.scala   | 4 +++-
 9 files changed, 28 insertions(+), 15 deletions(-)
[spark] branch master updated (49d3256 -> a99a47c)
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

    from 49d3256  [SPARK-33653][SQL] DSv2: REFRESH TABLE should recache the table itself
     add a99a47c  [SPARK-33748][K8S] Respect environment variables and configurations for Python executables

No new revisions were added by this update.

Summary of changes:
 .../org/apache/spark/deploy/SparkSubmit.scala      | 54 +++---
 docs/running-on-kubernetes.md                      |  5 +-
 .../scala/org/apache/spark/deploy/k8s/Config.scala | 16 +++-
 .../org/apache/spark/deploy/k8s/Constants.scala    |  3 +-
 .../k8s/features/DriverCommandFeatureStep.scala    | 37 --
 .../features/DriverCommandFeatureStepSuite.scala   | 57 +--
 .../src/main/dockerfiles/spark/entrypoint.sh       | 10 +--
 .../k8s/integrationtest/DepsTestsSuite.scala       | 85 --
 .../k8s/integrationtest/KubernetesSuite.scala      |  6 +-
 .../integrationtest/KubernetesTestComponents.scala |  5 +-
 .../deploy/k8s/integrationtest/ProcessUtils.scala  |  5 +-
 .../spark/deploy/k8s/integrationtest/Utils.scala   |  9 ++-
 .../integration-tests/tests/py_container_checks.py |  2 +-
 .../{pyfiles.py => python_executable_check.py}     | 27 +++
 14 files changed, 228 insertions(+), 93 deletions(-)
 copy resource-managers/kubernetes/integration-tests/tests/{pyfiles.py => python_executable_check.py} (62%)
[spark] branch master updated (f156718 -> 49d3256)
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

    from f156718  [SPARK-33777][SQL] Sort output of V2 SHOW PARTITIONS
     add 49d3256  [SPARK-33653][SQL] DSv2: REFRESH TABLE should recache the table itself

No new revisions were added by this update.

Summary of changes:
 .../datasources/v2/DataSourceV2Strategy.scala       | 16 +---
 .../execution/datasources/v2/RefreshTableExec.scala |  1 -
 .../spark/sql/connector/DataSourceV2SQLSuite.scala  | 19 +++
 3 files changed, 32 insertions(+), 4 deletions(-)
[spark] branch branch-3.1 updated: [SPARK-33777][SQL] Sort output of V2 SHOW PARTITIONS
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch branch-3.1
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/branch-3.1 by this push:
     new ac4d04e  [SPARK-33777][SQL] Sort output of V2 SHOW PARTITIONS
ac4d04e is described below

commit ac4d04e480004e8b5de55e1323d27bd3c8bf28be
Author: Max Gekk
AuthorDate: Mon Dec 14 14:28:47 2020 -0800

    [SPARK-33777][SQL] Sort output of V2 SHOW PARTITIONS

    ### What changes were proposed in this pull request?

    List partitions returned by the V2 `SHOW PARTITIONS` command in alphabetical order.

    ### Why are the changes needed?

    To have the same behavior as:
    1. V1 in-memory catalog, see https://github.com/apache/spark/blob/a28ed86a387b286745b30cd4d90b3d558205a5a7/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/InMemoryCatalog.scala#L546
    2. V1 Hive catalogs, see https://github.com/apache/spark/blob/fab2995972761503563fa2aa547c67047c51bd33/sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClientImpl.scala#L715

    ### Does this PR introduce _any_ user-facing change?

    Yes, after the changes, V2 SHOW PARTITIONS sorts its output.

    ### How was this patch tested?

    Added new UT to the base trait `ShowPartitionsSuiteBase` which contains tests for V1 and V2.

    Closes #30764 from MaxGekk/sort-show-partitions.

    Authored-by: Max Gekk
    Signed-off-by: Dongjoon Hyun
    (cherry picked from commit f156718587fc33b9bf8e5abc4ae1f6fa0a5da887)
    Signed-off-by: Dongjoon Hyun
---
 .../execution/datasources/v2/ShowPartitionsExec.scala   |  5 +++--
 .../sql/execution/command/ShowPartitionsSuiteBase.scala | 17 +
 2 files changed, 20 insertions(+), 2 deletions(-)

diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/ShowPartitionsExec.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/ShowPartitionsExec.scala
index c4b6aa8..416dce6 100644
--- a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/ShowPartitionsExec.scala
+++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/ShowPartitionsExec.scala
@@ -49,7 +49,7 @@ case class ShowPartitionsExec(
     val len = schema.length
     val partitions = new Array[String](len)
     val timeZoneId = SQLConf.get.sessionLocalTimeZone
-    partitionIdentifiers.map { row =>
+    val output = partitionIdentifiers.map { row =>
       var i = 0
       while (i < len) {
         val dataType = schema(i).dataType
@@ -59,7 +59,8 @@ case class ShowPartitionsExec(
         partitions(i) = escapePathName(schema(i).name) + "=" + escapePathName(partValueStr)
         i += 1
       }
-      InternalRow(UTF8String.fromString(partitions.mkString("/")))
+      partitions.mkString("/")
     }
+    output.sorted.map(p => InternalRow(UTF8String.fromString(p)))
   }
 }
diff --git a/sql/core/src/test/scala/org/apache/spark/sql/execution/command/ShowPartitionsSuiteBase.scala b/sql/core/src/test/scala/org/apache/spark/sql/execution/command/ShowPartitionsSuiteBase.scala
index b695dec..56c6e5a 100644
--- a/sql/core/src/test/scala/org/apache/spark/sql/execution/command/ShowPartitionsSuiteBase.scala
+++ b/sql/core/src/test/scala/org/apache/spark/sql/execution/command/ShowPartitionsSuiteBase.scala
@@ -173,4 +173,21 @@ trait ShowPartitionsSuiteBase extends QueryTest with SQLTestUtils {
       }
     }
   }
+
+  test("SPARK-33777: sorted output") {
+    withNamespace(s"$catalog.ns") {
+      sql(s"CREATE NAMESPACE $catalog.ns")
+      val table = s"$catalog.ns.dateTable"
+      withTable(table) {
+        sql(s"""
+          |CREATE TABLE $table (id int, part string)
+          |$defaultUsing
+          |PARTITIONED BY (part)""".stripMargin)
+        sql(s"ALTER TABLE $table ADD PARTITION(part = 'b')")
+        sql(s"ALTER TABLE $table ADD PARTITION(part = 'a')")
+        val partitions = sql(s"show partitions $table")
+        assert(partitions.first().getString(0) === "part=a")
+      }
+    }
+  }
 }
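The behavioral core of the patch is easy to see in isolation: each partition spec is rendered as `col=value` path segments joined by `/`, and the resulting strings are sorted before being wrapped into rows. A simplified, self-contained sketch (path-name escaping and `InternalRow` wrapping, which the real `ShowPartitionsExec` performs, are omitted):

```scala
// Render each partition spec as "col=value" segments joined by "/", then
// sort the resulting strings -- the ordering V2 SHOW PARTITIONS now
// guarantees. Escaping and InternalRow wrapping are omitted here.
def renderSorted(schema: Seq[String], partitionValues: Seq[Seq[String]]): Seq[String] =
  partitionValues
    .map(row => schema.zip(row).map { case (col, v) => s"$col=$v" }.mkString("/"))
    .sorted

// Partitions added in the order 'b', 'a' still come back sorted,
// matching the new test's expectation that "part=a" is first.
val out = renderSorted(Seq("part"), Seq(Seq("b"), Seq("a")))
assert(out == Seq("part=a", "part=b"))
```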
[spark] branch master updated (5885cc1 -> 412d86e)
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

    from 5885cc1  [SPARK-33261][K8S] Add a developer API for custom feature steps
     add 412d86e  [SPARK-33771][SQL][TESTS] Fix Invalid value for HourOfAmPm when testing on JDK 14

No new revisions were added by this update.

Summary of changes:
 .../spark/sql/catalyst/util/TimestampFormatterSuite.scala | 10 --
 1 file changed, 8 insertions(+), 2 deletions(-)
[spark] branch master updated: [SPARK-33261][K8S] Add a developer API for custom feature steps
This is an automated email from the ASF dual-hosted git repository.

holden pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new 5885cc1  [SPARK-33261][K8S] Add a developer API for custom feature steps
5885cc1 is described below

commit 5885cc15cae9c9780530e235d2bd4bd6beda5dbb
Author: Holden Karau
AuthorDate: Mon Dec 14 12:05:28 2020 -0800

    [SPARK-33261][K8S] Add a developer API for custom feature steps

    ### What changes were proposed in this pull request?

    Add a developer API for custom driver & executor feature steps.

    ### Why are the changes needed?

    While we allow templates for the basis of pod creation, some deployments need more flexibility in how the pods are configured. This adds a developer API for custom deployments.

    ### Does this PR introduce _any_ user-facing change?

    New developer API.

    ### How was this patch tested?

    Extended tests to verify the custom step is applied when configured.

    Closes #30206 from holdenk/SPARK-33261-allow-people-to-extend-pod-feature-steps.

    Authored-by: Holden Karau
    Signed-off-by: Holden Karau
---
 .../scala/org/apache/spark/deploy/k8s/Config.scala | 20 ++
 .../org/apache/spark/deploy/k8s/SparkPod.scala     | 11 +++-
 .../k8s/features/KubernetesFeatureConfigStep.scala |  7 +-
 .../k8s/submit/KubernetesDriverBuilder.scala       |  8 ++-
 .../cluster/k8s/KubernetesExecutorBuilder.scala    |  8 ++-
 .../apache/spark/deploy/k8s/PodBuilderSuite.scala  | 76 ++
 .../k8s/submit/KubernetesDriverBuilderSuite.scala  |  5 +-
 .../k8s/KubernetesExecutorBuilderSuite.scala       |  4 ++
 8 files changed, 134 insertions(+), 5 deletions(-)

diff --git a/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/Config.scala b/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/Config.scala
index c28d6fd..40609ae 100644
--- a/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/Config.scala
+++ b/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/Config.scala
@@ -219,6 +219,26 @@ private[spark] object Config extends Logging {
       .stringConf
       .createOptional

+  val KUBERNETES_DRIVER_POD_FEATURE_STEPS =
+    ConfigBuilder("spark.kubernetes.driver.pod.featureSteps")
+      .doc("Class names of an extra driver pod feature step implementing " +
+        "KubernetesFeatureConfigStep. This is a developer API. Comma separated. " +
+        "Runs after all of Spark internal feature steps.")
+      .version("3.2.0")
+      .stringConf
+      .toSequence
+      .createWithDefault(Nil)
+
+  val KUBERNETES_EXECUTOR_POD_FEATURE_STEPS =
+    ConfigBuilder("spark.kubernetes.executor.pod.featureSteps")
+      .doc("Class name of an extra executor pod feature step implementing " +
+        "KubernetesFeatureConfigStep. This is a developer API. Comma separated. " +
+        "Runs after all of Spark internal feature steps.")
+      .version("3.2.0")
+      .stringConf
+      .toSequence
+      .createWithDefault(Nil)
+
   val KUBERNETES_ALLOCATION_BATCH_SIZE =
     ConfigBuilder("spark.kubernetes.allocation.batch.size")
       .doc("Number of pods to launch at once in each round of executor allocation.")
diff --git a/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/SparkPod.scala b/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/SparkPod.scala
index fd11963..c2298e7 100644
--- a/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/SparkPod.scala
+++ b/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/SparkPod.scala
@@ -18,7 +18,16 @@ package org.apache.spark.deploy.k8s

 import io.fabric8.kubernetes.api.model.{Container, ContainerBuilder, Pod, PodBuilder}

-private[spark] case class SparkPod(pod: Pod, container: Container) {
+import org.apache.spark.annotation.{DeveloperApi, Unstable}
+
+/**
+ * :: DeveloperApi ::
+ *
+ * Represents a SparkPod consisting of pod and the container within the pod.
+ */
+@Unstable
+@DeveloperApi
+case class SparkPod(pod: Pod, container: Container) {

   /**
    * Convenience method to apply a series of chained transformations to a pod.
diff --git a/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/features/KubernetesFeatureConfigStep.scala b/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/features/KubernetesFeatureConfigStep.scala
index 58cdaa3..3fec926 100644
--- a/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/features/KubernetesFeatureConfigStep.scala
+++ b/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/features/KubernetesFeatureConfigStep.scala
@@ -18,13 +18,18 @@ package
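The shape of the new API is: the config value is a comma-separated list of class names (hence `.toSequence` on the `stringConf`), and each loaded step transforms the pod in order, after all built-in steps. A hedged sketch of that idea — `FeatureStep`, `parseStepClasses`, and `applySteps` are illustrative stand-ins for `KubernetesFeatureConfigStep` and the driver/executor builders, and the pod is modeled as a plain `Map`:

```scala
// Each step transforms the pod; custom steps run after all built-in ones.
trait FeatureStep {
  def configurePod(pod: Map[String, String]): Map[String, String]
}

class BasicStep extends FeatureStep { // stand-in for a built-in step
  def configurePod(pod: Map[String, String]) = pod + ("image" -> "spark:3.1")
}

class CustomLabelStep extends FeatureStep { // stand-in for a user-provided step
  def configurePod(pod: Map[String, String]) = pod + ("label" -> "team-a")
}

// Comma-separated config value -> ordered list of step class names,
// mirroring what .stringConf.toSequence produces.
def parseStepClasses(conf: String): Seq[String] =
  conf.split(",").map(_.trim).filter(_.nonEmpty).toSeq

// Built-in steps first, then custom steps, folding the pod through each.
def applySteps(initial: Map[String, String], steps: Seq[FeatureStep]): Map[String, String] =
  steps.foldLeft(initial)((pod, step) => step.configurePod(pod))

val classes = parseStepClasses("com.example.CustomLabelStep, ")
assert(classes == Seq("com.example.CustomLabelStep"))

val pod = applySteps(Map.empty, Seq(new BasicStep, new CustomLabelStep))
assert(pod == Map("image" -> "spark:3.1", "label" -> "team-a"))
```

In Spark itself the named classes are instantiated reflectively rather than constructed directly as above; the sketch only shows the ordering and the comma-separated parsing.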
[spark] branch master updated (82aca7e -> bb60fb1)
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

    from 82aca7e  [SPARK-33779][SQL] DataSource V2: API to request distribution and ordering on write
     add bb60fb1  [SPARK-33779][SQL][FOLLOW-UP] Fix Java Linter error

No new revisions were added by this update.

Summary of changes:
 .../main/java/org/apache/spark/sql/connector/write/WriteBuilder.java | 2 --
 1 file changed, 2 deletions(-)
[spark] branch master updated (839d689 -> 82aca7e)
This is an automated email from the ASF dual-hosted git repository.

blue pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

    from 839d689  [SPARK-33733][SQL] PullOutNondeterministic should check and collect deterministic field
     add 82aca7e  [SPARK-33779][SQL] DataSource V2: API to request distribution and ordering on write

No new revisions were added by this update.

Summary of changes:
 .../distributions/ClusteredDistribution.java       | 35
 .../sql/connector/distributions/Distribution.java  | 28 +++
 .../sql/connector/distributions/Distributions.java | 56 +
 .../distributions/OrderedDistribution.java         | 35
 .../distributions/UnspecifiedDistribution.java     | 28 +++
 .../sql/connector/expressions/Expressions.java     | 11 +++
 .../sql/connector/expressions/NullOrdering.java    | 42 ++
 .../sql/connector/expressions/SortDirection.java   | 42 ++
 .../spark/sql/connector/expressions/SortOrder.java | 43 ++
 .../write/RequiresDistributionAndOrdering.java     | 57 +
 .../write/{WriteBuilder.java => Write.java}        | 33 +---
 .../spark/sql/connector/write/WriteBuilder.java    | 39 ++---
 .../connector/distributions/distributions.scala    | 59 +
 .../sql/connector/expressions/expressions.scala    | 96 ++
 14 files changed, 581 insertions(+), 23 deletions(-)
 create mode 100644 sql/catalyst/src/main/java/org/apache/spark/sql/connector/distributions/ClusteredDistribution.java
 create mode 100644 sql/catalyst/src/main/java/org/apache/spark/sql/connector/distributions/Distribution.java
 create mode 100644 sql/catalyst/src/main/java/org/apache/spark/sql/connector/distributions/Distributions.java
 create mode 100644 sql/catalyst/src/main/java/org/apache/spark/sql/connector/distributions/OrderedDistribution.java
 create mode 100644 sql/catalyst/src/main/java/org/apache/spark/sql/connector/distributions/UnspecifiedDistribution.java
 create mode 100644 sql/catalyst/src/main/java/org/apache/spark/sql/connector/expressions/NullOrdering.java
 create mode 100644 sql/catalyst/src/main/java/org/apache/spark/sql/connector/expressions/SortDirection.java
 create mode 100644 sql/catalyst/src/main/java/org/apache/spark/sql/connector/expressions/SortOrder.java
 create mode 100644 sql/catalyst/src/main/java/org/apache/spark/sql/connector/write/RequiresDistributionAndOrdering.java
 copy sql/catalyst/src/main/java/org/apache/spark/sql/connector/write/{WriteBuilder.java => Write.java} (66%)
 create mode 100644 sql/catalyst/src/main/scala/org/apache/spark/sql/connector/distributions/distributions.scala
[spark] branch branch-3.1 updated: [SPARK-33733][SQL] PullOutNondeterministic should check and collect deterministic field
This is an automated email from the ASF dual-hosted git repository.

wenchen pushed a commit to branch branch-3.1
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/branch-3.1 by this push:
     new 7dc4e32  [SPARK-33733][SQL] PullOutNondeterministic should check and collect deterministic field
7dc4e32 is described below

commit 7dc4e32c114205ebb035512f6b7fd3f26154d1f0
Author: ulysses-you
AuthorDate: Mon Dec 14 14:35:24 2020 +0000

    [SPARK-33733][SQL] PullOutNondeterministic should check and collect deterministic field

    ### What changes were proposed in this pull request?

    The deterministic field is wider than `NonDeterministic`; we should keep the same range between pull-out and check analysis.

    ### Why are the changes needed?

    For example
    ```
    select * from values(1), (4) as t(c1) order by java_method('java.lang.Math', 'abs', c1)
    ```
    We will get an exception since `java_method`'s deterministic field is false but it is not a `NonDeterministic`
    ```
    Exception in thread "main" org.apache.spark.sql.AnalysisException: nondeterministic expressions are only allowed in Project, Filter, Aggregate or Window, found:
    java_method('java.lang.Math', 'abs', t.`c1`) ASC NULLS FIRST
    in operator Sort [java_method(java.lang.Math, abs, c1#1) ASC NULLS FIRST], true
    ;;
    ```

    ### Does this PR introduce _any_ user-facing change?

    Yes.

    ### How was this patch tested?

    Add test.

    Closes #30703 from ulysses-you/SPARK-33733.

    Authored-by: ulysses-you
    Signed-off-by: Wenchen Fan
    (cherry picked from commit 839d6899adafd9a0695667656d00220d4665895d)
    Signed-off-by: Wenchen Fan
---
 .../spark/sql/catalyst/analysis/Analyzer.scala |  5 -
 .../expressions/CallMethodViaReflection.scala  |  6 +++---
 .../sql/catalyst/analysis/AnalysisSuite.scala  | 22 ++
 3 files changed, 29 insertions(+), 4 deletions(-)

diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
index a688a24..c5c0c68 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
@@ -2947,7 +2947,10 @@ class Analyzer(override val catalogManager: CatalogManager)

   private def getNondeterToAttr(exprs: Seq[Expression]): Map[Expression, NamedExpression] = {
     exprs.filterNot(_.deterministic).flatMap { expr =>
-      val leafNondeterministic = expr.collect { case n: Nondeterministic => n }
+      val leafNondeterministic = expr.collect {
+        case n: Nondeterministic => n
+        case udf: UserDefinedExpression if !udf.deterministic => udf
+      }
       leafNondeterministic.distinct.map { e =>
         val ne = e match {
           case n: NamedExpression => n
diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/CallMethodViaReflection.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/CallMethodViaReflection.scala
index 4bd6418..0979a18 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/CallMethodViaReflection.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/CallMethodViaReflection.scala
@@ -54,7 +54,7 @@ import org.apache.spark.util.Utils
   """,
   since = "2.0.0")
 case class CallMethodViaReflection(children: Seq[Expression])
-  extends Expression with CodegenFallback {
+  extends Nondeterministic with CodegenFallback {

   override def prettyName: String = getTagValue(FunctionRegistry.FUNC_ALIAS).getOrElse("reflect")

@@ -77,11 +77,11 @@ case class CallMethodViaReflection(children: Seq[Expression])
     }
   }

-  override lazy val deterministic: Boolean = false
   override def nullable: Boolean = true
   override val dataType: DataType = StringType
+  override protected def initializeInternal(partitionIndex: Int): Unit = {}

-  override def eval(input: InternalRow): Any = {
+  override protected def evalInternal(input: InternalRow): Any = {
     var i = 0
     while (i < argExprs.length) {
       buffer(i) = argExprs(i).eval(input).asInstanceOf[Object]
diff --git a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/analysis/AnalysisSuite.scala b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/analysis/AnalysisSuite.scala
index f5bfdc5..468b8c0 100644
--- a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/analysis/AnalysisSuite.scala
+++ b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/analysis/AnalysisSuite.scala
@@ -984,4 +984,26 @@ class AnalysisSuite extends AnalysisTest with Matchers {
       s"please set '${SQLConf.ANALYZER_MAX_ITERATIONS.key}' to a larger value."))
   }
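The essence of the `Analyzer.scala` change above: before the patch, `PullOutNondeterministic` only collected leaves that were instances of the `Nondeterministic` trait, so an expression like `java_method` — whose `deterministic` flag is false but which did not extend `Nondeterministic` — slipped through and then failed check analysis. A toy model of the widened collection (these case classes are simplified stand-ins for Catalyst expressions, not the real API):

```scala
// Simplified expression tree with a deterministic flag per node.
sealed trait Expr { def deterministic: Boolean; def children: Seq[Expr] }
case class Literal(v: Int) extends Expr { val deterministic = true; val children = Nil }
// Stand-in for a Nondeterministic expression such as rand():
case object RandExpr extends Expr { val deterministic = false; val children = Nil }
// Stand-in for java_method: deterministic = false, but (before the fix)
// not an instance of the Nondeterministic trait:
case class JavaMethod(args: Seq[Expr]) extends Expr {
  val deterministic = false
  def children = args
}

// After the fix: collect any *leaf* nondeterministic expression, meaning
// one whose own flag is false while all of its children are deterministic.
def nondeterministicLeaves(e: Expr): Seq[Expr] =
  if (!e.deterministic && e.children.forall(_.deterministic)) Seq(e)
  else e.children.flatMap(nondeterministicLeaves)

// java_method('java.lang.Math', 'abs', 1) is now collected and can be
// pulled out into a Project, instead of failing check analysis:
assert(nondeterministicLeaves(JavaMethod(Seq(Literal(1)))) == Seq(JavaMethod(Seq(Literal(1)))))
assert(nondeterministicLeaves(RandExpr) == Seq(RandExpr))
assert(nondeterministicLeaves(Literal(3)).isEmpty)
```

The real rule pattern-matches on `Nondeterministic` and `UserDefinedExpression` rather than using the generic flag test above, but the collected set is the same idea: everything with `deterministic == false` at the leaves.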
[spark] branch master updated (5f9a7fe -> 839d689)
This is an automated email from the ASF dual-hosted git repository.

wenchen pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

    from 5f9a7fe  [SPARK-33428][SQL] Conv UDF use BigInt to avoid Long value overflow
     add 839d689  [SPARK-33733][SQL] PullOutNondeterministic should check and collect deterministic field

No new revisions were added by this update.

Summary of changes:
 .../spark/sql/catalyst/analysis/Analyzer.scala |  5 -
 .../expressions/CallMethodViaReflection.scala  |  6 +++---
 .../sql/catalyst/analysis/AnalysisSuite.scala  | 22 ++
 3 files changed, 29 insertions(+), 4 deletions(-)
[spark] branch master updated (bf2c88c -> 5f9a7fe)
This is an automated email from the ASF dual-hosted git repository.

wenchen pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

    from bf2c88c  [SPARK-33716][K8S] Fix potential race condition during pod termination
     add 5f9a7fe  [SPARK-33428][SQL] Conv UDF use BigInt to avoid Long value overflow

No new revisions were added by this update.

Summary of changes:
 .../spark/sql/catalyst/util/NumberConverter.scala  | 64 +-
 .../expressions/MathExpressionsSuite.scala         |  6 +-
 .../sql/catalyst/util/NumberConverterSuite.scala   |  4 +-
 .../org/apache/spark/sql/MathFunctionsSuite.scala  |  2 +-
 .../hive/execution/HiveCompatibilitySuite.scala    |  4 +-
 5 files changed, 23 insertions(+), 57 deletions(-)
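The SPARK-33428 summary line conveys the whole idea: base conversion backed by a signed 64-bit `Long` overflows for inputs at or above 2^63, while `BigInt` has arbitrary precision. A small illustration (this is not the actual `NumberConverter` code, just the overflow it avoids):

```scala
// Parse an unsigned hex string digit by digit into a BigInt. A Long-based
// path would overflow (or throw) for values above Long.MaxValue.
def hexToBigInt(s: String): BigInt =
  s.foldLeft(BigInt(0))((acc, c) => acc * 16 + Integer.parseInt(c.toString, 16))

val hex = "FFFFFFFFFFFFFFFF" // 2^64 - 1, too large for a signed Long

// java.lang.Long.parseLong(hex, 16) would throw NumberFormatException here,
// since it parses into a signed 64-bit value.
val n = hexToBigInt(hex)
assert(n == BigInt("18446744073709551615"))
assert(n > BigInt(Long.MaxValue))
```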
[spark] branch branch-2.4 updated: [SPARK-33770][SQL][TESTS][2.4] Fix the `ALTER TABLE .. DROP PARTITION` tests that delete files out of partition path
This is an automated email from the ASF dual-hosted git repository. gurwls223 pushed a commit to branch branch-2.4 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-2.4 by this push: new 2964626 [SPARK-33770][SQL][TESTS][2.4] Fix the `ALTER TABLE .. DROP PARTITION` tests that delete files out of partition path 2964626 is described below commit 296462636fc8e204052cc2b135400f4060c47291 Author: Max Gekk AuthorDate: Mon Dec 14 19:50:07 2020 +0900 [SPARK-33770][SQL][TESTS][2.4] Fix the `ALTER TABLE .. DROP PARTITION` tests that delete files out of partition path ### What changes were proposed in this pull request? Modify the tests that add partitions with `LOCATION`, and where the number of nested folders in `LOCATION` doesn't match to the number of partitioned columns. In that case, `ALTER TABLE .. DROP PARTITION` tries to access (delete) folder out of the "base" path in `LOCATION`. The problem belongs to Hive's MetaStore method `drop_partition_common`: https://github.com/apache/hive/blob/8696c82d07d303b6dbb69b4d443ab6f2b241b251/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java#L4876 which tries to delete empty partition sub-folders recursively starting from the most deeper partition sub-folder up to the base folder. In the case when the number of sub-folder is not equal to the number of partitioned columns `part_vals.size()`, the method will try to list and delete folders out of the base path. ### Why are the changes needed? To fix test failures like https://github.com/apache/spark/pull/30643#issuecomment-743774733: ``` org.apache.spark.sql.hive.execution.command.AlterTableAddPartitionSuite.ALTER TABLE .. 
ADD PARTITION Hive V1: SPARK-33521: universal type conversions of partition values sbt.ForkMain$ForkError: org.apache.spark.sql.AnalysisException: org.apache.hadoop.hive.ql.metadata.HiveException: File file:/home/jenkins/workspace/SparkPullRequestBuilder/target/tmp/spark-832cb19c-65fd-41f3-ae0b-937d76c07897 does not exist; at org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:112) at org.apache.spark.sql.hive.HiveExternalCatalog.dropPartitions(HiveExternalCatalog.scala:1014) ... Caused by: sbt.ForkMain$ForkError: org.apache.hadoop.hive.metastore.api.MetaException: File file:/home/jenkins/workspace/SparkPullRequestBuilder/target/tmp/spark-832cb19c-65fd-41f3-ae0b-937d76c07897 does not exist at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.drop_partition_with_environment_context(HiveMetaStore.java:3381) at sun.reflect.GeneratedMethodAccessor304.invoke(Unknown Source) ``` The issue can be reproduced by the following steps: 1. Create a base folder, for example: `/Users/maximgekk/tmp/part-location` 2. Create a sub-folder in the base folder and drop permissions for it: ``` $ mkdir /Users/maximgekk/tmp/part-location/aaa $ chmod a-rwx chmod a-rwx /Users/maximgekk/tmp/part-location/aaa $ ls -al /Users/maximgekk/tmp/part-location total 0 drwxr-xr-x 3 maximgekk staff96 Dec 13 18:42 . drwxr-xr-x 33 maximgekk staff 1056 Dec 13 18:32 .. d- 2 maximgekk staff64 Dec 13 18:42 aaa ``` 3. Create a table with a partition folder in the base folder: ```sql spark-sql> create table tbl (id int) partitioned by (part0 int, part1 int); spark-sql> alter table tbl add partition (part0=1,part1=2) location '/Users/maximgekk/tmp/part-location/tbl'; ``` 4. 
Try to drop this partition: ``` spark-sql> alter table tbl drop partition (part0=1,part1=2); 20/12/13 18:46:07 ERROR HiveClientImpl: == Attempt to drop the partition specs in table 'tbl' database 'default': Map(part0 -> 1, part1 -> 2) In this attempt, the following partitions have been dropped successfully: The remaining partitions have not been dropped: [1, 2] == Error in query: org.apache.hadoop.hive.ql.metadata.HiveException: Error accessing file:/Users/maximgekk/tmp/part-location/aaa; org.apache.spark.sql.AnalysisException: org.apache.hadoop.hive.ql.metadata.HiveException: Error accessing file:/Users/maximgekk/tmp/part-location/aaa; ``` The command fails because it tries to access the sub-folder `aaa`, which is outside the partition path `/Users/maximgekk/tmp/part-location/tbl`. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? By running the affected tests from local IDEA, which does not have access to folders outside of the partition paths. Lead-authored-by: Max Gekk Co-authored-by: Maxim Gekk Signed-off-by: HyukjinKwon (cherry picked from commit 9160d59ae379910ca3bbd04ee25d336afff28abd)
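The escape from the base path described above can be sketched as follows. This is a minimal Scala model, not Hive's actual `drop_partition_common` code: the cleanup walks up one parent folder per partition value, so when the `LOCATION` has fewer nested folders than partition columns, the walk reaches the base folder (and its unreadable sibling `aaa`). The helper name `foldersVisited` is illustrative.

```scala
import java.nio.file.Paths

// Simplified model of Hive's recursive empty-sub-folder cleanup: starting
// from the partition location, walk up one level per partition value. The
// partition column count drives the walk, not the actual folder depth.
def foldersVisited(partitionLocation: String, numPartitionValues: Int): Seq[String] = {
  var current = Paths.get(partitionLocation)
  val visited = scala.collection.mutable.ArrayBuffer[String]()
  for (_ <- 0 until numPartitionValues) {
    visited += current.toString
    current = current.getParent
  }
  visited.toSeq
}

// Two partition columns (part0, part1), but the LOCATION has only one folder
// under the base path, so the second step of the walk lands on the base
// folder itself; listing it then touches the unreadable sibling `aaa`.
val base = "/Users/maximgekk/tmp/part-location"
val visited = foldersVisited(s"$base/tbl", numPartitionValues = 2)
// visited == Seq("/Users/maximgekk/tmp/part-location/tbl",
//                "/Users/maximgekk/tmp/part-location")
```

With a properly nested location such as `$base/part0=1/part1=2`, the same walk stays inside the base path, which is what the fixed tests now guarantee.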
[spark] branch branch-3.1 updated: [SPARK-33716][K8S] Fix potential race condition during pod termination
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch branch-3.1 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.1 by this push: new b44e650 [SPARK-33716][K8S] Fix potential race condition during pod termination b44e650 is described below commit b44e65042a8ea6cfd44796b83601d0a28beb4305 Author: Holden Karau AuthorDate: Mon Dec 14 02:09:59 2020 -0800 [SPARK-33716][K8S] Fix potential race condition during pod termination ### What changes were proposed in this pull request? Check that the pod state is not pending or running even if there is a deletion timestamp. ### Why are the changes needed? This can occur when the pod state and deletion timestamp are not updated by etcd in sync, and we get a pod snapshot during an inconsistent view. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? Manual testing with a local version of Minikube on an overloaded computer that caused out-of-sync updates. Closes #30693 from holdenk/SPARK-33716-decommissioning-race-condition-during-pod-snapshot.
Authored-by: Holden Karau Signed-off-by: Dongjoon Hyun (cherry picked from commit bf2c88ccaebd8e27d9fc27c55c9955129541d3e1) Signed-off-by: Dongjoon Hyun --- .../org/apache/spark/scheduler/cluster/k8s/ExecutorPodsSnapshot.scala | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/scheduler/cluster/k8s/ExecutorPodsSnapshot.scala b/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/scheduler/cluster/k8s/ExecutorPodsSnapshot.scala index be75311..e81d213 100644 --- a/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/scheduler/cluster/k8s/ExecutorPodsSnapshot.scala +++ b/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/scheduler/cluster/k8s/ExecutorPodsSnapshot.scala @@ -93,7 +93,8 @@ object ExecutorPodsSnapshot extends Logging { ( pod.getStatus == null || pod.getStatus.getPhase == null || -pod.getStatus.getPhase.toLowerCase(Locale.ROOT) != "terminating" + (pod.getStatus.getPhase.toLowerCase(Locale.ROOT) != "terminating" && + pod.getStatus.getPhase.toLowerCase(Locale.ROOT) != "running") )) } } - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
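The predicate change in the diff above can be modeled in isolation. This is a simplified Scala sketch, not the fabric8 Kubernetes client API used by `ExecutorPodsSnapshot`; the `PodStatus` case class and `isDeleted` helper are illustrative stand-ins for the real types:

```scala
// Simplified model of the SPARK-33716 fix: a pod with a deletion timestamp
// is only treated as deleted if its phase is not an "alive" phase. etcd may
// expose the deletion timestamp before the phase catches up, so checking the
// timestamp alone races with the phase update.
case class PodStatus(phase: Option[String], deletionTimestamp: Option[String])

def isDeleted(status: PodStatus): Boolean =
  status.deletionTimestamp.isDefined && (status.phase match {
    case None => true // no phase info: trust the deletion timestamp
    case Some(p) =>
      val phase = p.toLowerCase(java.util.Locale.ROOT)
      // Before the fix only "terminating" was excluded; the patch also
      // excludes "running" to tolerate the inconsistent snapshot.
      phase != "terminating" && phase != "running"
  })

// Inconsistent snapshot: deletion timestamp already set, phase still Running.
// Before the fix this pod was considered deleted; after it, it is not.
val racy = PodStatus(Some("Running"), Some("2020-12-14T02:09:59Z"))
```

Here `isDeleted(racy)` is `false`, so the snapshot keeps treating the executor pod as alive until its phase actually leaves "running".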
[spark] branch master updated (cd0356df -> bf2c88c)
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from cd0356df [SPARK-33673][SQL] Avoid push down partition filters to ParquetScan for DataSourceV2 add bf2c88c [SPARK-33716][K8S] Fix potential race condition during pod termination No new revisions were added by this update. Summary of changes: .../org/apache/spark/scheduler/cluster/k8s/ExecutorPodsSnapshot.scala | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated (a84c8d8 -> cd0356df)
This is an automated email from the ASF dual-hosted git repository. yumwang pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from a84c8d8 [SPARK-33751][SQL] Migrate ALTER VIEW ... AS command to use UnresolvedView to resolve the identifier add cd0356df [SPARK-33673][SQL] Avoid push down partition filters to ParquetScan for DataSourceV2 No new revisions were added by this update. Summary of changes: .../spark/sql/execution/datasources/v2/parquet/ParquetScanBuilder.scala | 2 +- sql/core/src/test/scala/org/apache/spark/sql/ExplainSuite.scala | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch branch-3.0 updated: [SPARK-33770][SQL][TESTS][3.1][3.0] Fix the `ALTER TABLE .. DROP PARTITION` tests that delete files out of partition path
This is an automated email from the ASF dual-hosted git repository. gurwls223 pushed a commit to branch branch-3.0 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.0 by this push: new d652b47 [SPARK-33770][SQL][TESTS][3.1][3.0] Fix the `ALTER TABLE .. DROP PARTITION` tests that delete files out of partition path d652b47 is described below commit d652b47ce2693eedf3d465e50a252f6b80fe8ba7 Author: Max Gekk AuthorDate: Mon Dec 14 18:13:42 2020 +0900 [SPARK-33770][SQL][TESTS][3.1][3.0] Fix the `ALTER TABLE .. DROP PARTITION` tests that delete files out of partition path ### What changes were proposed in this pull request? Modify the tests that add partitions with `LOCATION`, and where the number of nested folders in `LOCATION` doesn't match the number of partitioned columns. In that case, `ALTER TABLE .. DROP PARTITION` tries to access (delete) a folder outside the "base" path in `LOCATION`. The problem lies in Hive's MetaStore method `drop_partition_common`: https://github.com/apache/hive/blob/8696c82d07d303b6dbb69b4d443ab6f2b241b251/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java#L4876 which tries to delete empty partition sub-folders recursively, starting from the deepest partition sub-folder up to the base folder. When the number of sub-folders is not equal to the number of partitioned columns `part_vals.size()`, the method will try to list and delete folders outside the base path. ### Why are the changes needed? To fix test failures like https://github.com/apache/spark/pull/30643#issuecomment-743774733: ``` org.apache.spark.sql.hive.execution.command.AlterTableAddPartitionSuite.ALTER TABLE ..
ADD PARTITION Hive V1: SPARK-33521: universal type conversions of partition values sbt.ForkMain$ForkError: org.apache.spark.sql.AnalysisException: org.apache.hadoop.hive.ql.metadata.HiveException: File file:/home/jenkins/workspace/SparkPullRequestBuilder/target/tmp/spark-832cb19c-65fd-41f3-ae0b-937d76c07897 does not exist; at org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:112) at org.apache.spark.sql.hive.HiveExternalCatalog.dropPartitions(HiveExternalCatalog.scala:1014) ... Caused by: sbt.ForkMain$ForkError: org.apache.hadoop.hive.metastore.api.MetaException: File file:/home/jenkins/workspace/SparkPullRequestBuilder/target/tmp/spark-832cb19c-65fd-41f3-ae0b-937d76c07897 does not exist at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.drop_partition_with_environment_context(HiveMetaStore.java:3381) at sun.reflect.GeneratedMethodAccessor304.invoke(Unknown Source) ``` The issue can be reproduced by the following steps: 1. Create a base folder, for example: `/Users/maximgekk/tmp/part-location` 2. Create a sub-folder in the base folder and drop permissions for it: ``` $ mkdir /Users/maximgekk/tmp/part-location/aaa $ chmod a-rwx /Users/maximgekk/tmp/part-location/aaa $ ls -al /Users/maximgekk/tmp/part-location total 0 drwxr-xr-x 3 maximgekk staff 96 Dec 13 18:42 . drwxr-xr-x 33 maximgekk staff 1056 Dec 13 18:32 .. d--------- 2 maximgekk staff 64 Dec 13 18:42 aaa ``` 3. Create a table with a partition folder in the base folder: ```sql spark-sql> create table tbl (id int) partitioned by (part0 int, part1 int); spark-sql> alter table tbl add partition (part0=1,part1=2) location '/Users/maximgekk/tmp/part-location/tbl'; ``` 4.
Try to drop this partition: ``` spark-sql> alter table tbl drop partition (part0=1,part1=2); 20/12/13 18:46:07 ERROR HiveClientImpl: == Attempt to drop the partition specs in table 'tbl' database 'default': Map(part0 -> 1, part1 -> 2) In this attempt, the following partitions have been dropped successfully: The remaining partitions have not been dropped: [1, 2] == Error in query: org.apache.hadoop.hive.ql.metadata.HiveException: Error accessing file:/Users/maximgekk/tmp/part-location/aaa; org.apache.spark.sql.AnalysisException: org.apache.hadoop.hive.ql.metadata.HiveException: Error accessing file:/Users/maximgekk/tmp/part-location/aaa; ``` The command fails because it tries to access the sub-folder `aaa`, which is outside the partition path `/Users/maximgekk/tmp/part-location/tbl`. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? By running the affected tests from local IDEA, which does not have access to folders outside of the partition paths. Lead-authored-by: Max Gekk Co-authored-by: Maxim Gekk Signed-off-by: HyukjinKwon (cherry picked from commit 9160d59ae379910ca3bbd04ee25d336afff28abd)
[spark] branch branch-3.1 updated (1559135 -> 01294f8)
This is an automated email from the ASF dual-hosted git repository. gurwls223 pushed a change to branch branch-3.1 in repository https://gitbox.apache.org/repos/asf/spark.git. from 1559135 [SPARK-33142][SPARK-33647][SQL][FOLLOW-UP] Add docs and test cases add 01294f8 [SPARK-33770][SQL][TESTS][3.1][3.0] Fix the `ALTER TABLE .. DROP PARTITION` tests that delete files out of partition path No new revisions were added by this update. Summary of changes: .../spark/sql/catalyst/catalog/ExternalCatalogSuite.scala| 9 +++-- .../scala/org/apache/spark/sql/hive/StatisticsSuite.scala| 12 .../org/apache/spark/sql/hive/execution/HiveDDLSuite.scala | 4 ++-- 3 files changed, 17 insertions(+), 8 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated (b7c8210 -> a84c8d8)
This is an automated email from the ASF dual-hosted git repository. wenchen pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from b7c8210 [SPARK-33142][SPARK-33647][SQL][FOLLOW-UP] Add docs and test cases add a84c8d8 [SPARK-33751][SQL] Migrate ALTER VIEW ... AS command to use UnresolvedView to resolve the identifier No new revisions were added by this update. Summary of changes: .../apache/spark/sql/catalyst/parser/AstBuilder.scala | 8 +--- .../spark/sql/catalyst/plans/logical/statements.scala | 8 .../spark/sql/catalyst/plans/logical/v2Commands.scala | 10 ++ .../spark/sql/catalyst/parser/DDLParserSuite.scala| 6 -- .../sql/catalyst/analysis/ResolveSessionCatalog.scala | 19 ++- .../apache/spark/sql/execution/command/views.scala| 3 --- .../spark/sql/connector/DataSourceV2SQLSuite.scala| 13 + .../org/apache/spark/sql/execution/SQLViewSuite.scala | 11 --- 8 files changed, 34 insertions(+), 44 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch branch-3.1 updated: [SPARK-33142][SPARK-33647][SQL][FOLLOW-UP] Add docs and test cases
This is an automated email from the ASF dual-hosted git repository. wenchen pushed a commit to branch branch-3.1 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.1 by this push: new 1559135 [SPARK-33142][SPARK-33647][SQL][FOLLOW-UP] Add docs and test cases 1559135 is described below commit 1559135ea7e5cc41916d3b22fe95cfa307088149 Author: Linhong Liu AuthorDate: Mon Dec 14 08:31:50 2020 + [SPARK-33142][SPARK-33647][SQL][FOLLOW-UP] Add docs and test cases ### What changes were proposed in this pull request? Addressed comments in PR #30567, including: 1. add test cases for SPARK-33647 and SPARK-33142 2. add a migration guide 3. add `getRawTempView` and `getRawGlobalTempView` to return the raw view info (i.e. TemporaryViewRelation) 4. other minor code cleanup ### Why are the changes needed? Code cleanup and more test cases ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? Existing and newly added test cases Closes #30666 from linhongliu-db/SPARK-33142-followup.
Lead-authored-by: Linhong Liu Co-authored-by: Linhong Liu <67896261+linhongliu...@users.noreply.github.com> Signed-off-by: Wenchen Fan (cherry picked from commit b7c82101352078fb10ab1822bc745c8b4fbb2590) Signed-off-by: Wenchen Fan --- docs/sql-migration-guide.md | 4 +- .../sql/catalyst/catalog/SessionCatalog.scala | 44 ++ .../plans/logical/basicLogicalOperators.scala | 16  .../apache/spark/sql/execution/command/views.scala | 16 ++-- .../org/apache/spark/sql/CachedTableSuite.scala | 13 +++ .../apache/spark/sql/execution/SQLViewSuite.scala | 14 --- .../spark/sql/execution/SQLViewTestSuite.scala | 24 +++- 7 files changed, 79 insertions(+), 52 deletions(-) diff --git a/docs/sql-migration-guide.md b/docs/sql-migration-guide.md index 2bc04a0..d3ac76f 100644 --- a/docs/sql-migration-guide.md +++ b/docs/sql-migration-guide.md @@ -52,7 +52,9 @@ license: | - In Spark 3.1, refreshing a table will trigger an uncache operation for all other caches that reference the table, even if the table itself is not cached. In Spark 3.0 the operation will only be triggered if the table itself is cached. - - In Spark 3.1, creating or altering a view will capture runtime SQL configs and store them as view properties. These configs will be applied during the parsing and analysis phases of the view resolution. To restore the behavior before Spark 3.1, you can set `spark.sql.legacy.useCurrentConfigsForView` to `true`. + - In Spark 3.1, creating or altering a permanent view will capture runtime SQL configs and store them as view properties. These configs will be applied during the parsing and analysis phases of the view resolution. To restore the behavior before Spark 3.1, you can set `spark.sql.legacy.useCurrentConfigsForView` to `true`. + + - In Spark 3.1, a temporary view will have the same behavior as a permanent view, i.e. capture and store runtime SQL configs, SQL text, catalog and namespace.
The captured view properties will be applied during the parsing and analysis phases of the view resolution. To restore the behavior before Spark 3.1, you can set `spark.sql.legacy.storeAnalyzedPlanForView` to `true`. - Since Spark 3.1, CHAR/CHARACTER and VARCHAR types are supported in the table schema. Table scan/insertion will respect the char/varchar semantic. If char/varchar is used in places other than table schema, an exception will be thrown (CAST is an exception that simply treats char/varchar as string like before). To restore the behavior before Spark 3.1, which treats them as STRING types and ignores a length parameter, e.g. `CHAR(4)`, you can set `spark.sql.legacy.charVarcharAsString` to [...] diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala index 51d7e96..0d259c9 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala @@ -605,8 +605,16 @@ class SessionCatalog( /** * Return a local temporary view exactly as it was stored. */ + def getRawTempView(name: String): Option[LogicalPlan] = synchronized { +tempViews.get(formatTableName(name)) + } + + /** + * Generate a [[View]] operator from the view description if the view stores sql text, + * otherwise, it is the same as `getRawTempView` + */ def getTempView(name: String): Option[LogicalPlan] = synchronized { -getRawTempView(name).map(getTempViewPlan) } def getTempViewNames(): Seq[String] =
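The raw-vs-resolved lookup split introduced in the diff above can be sketched in miniature. This is a simplified Scala model, not Spark's actual `SessionCatalog` (the `TempViewRegistry`, `TemporaryViewRelation`, and `ResolvedView` types here are illustrative): `getRawTempView` returns the stored definition as-is, while `getTempView` additionally expands a SQL-text-backed definition into a resolved plan.

```scala
// Simplified model of the getRawTempView / getTempView split: callers that
// need to re-store or compare a view definition use the raw variant; callers
// that need an executable plan use the resolving variant.
sealed trait ViewPlan
case class TemporaryViewRelation(sqlText: String) extends ViewPlan
case class ResolvedView(sqlText: String) extends ViewPlan

class TempViewRegistry {
  private val tempViews = scala.collection.mutable.HashMap[String, ViewPlan]()
  private def formatName(name: String): String = name.toLowerCase

  def create(name: String, plan: ViewPlan): Unit =
    tempViews(formatName(name)) = plan

  // Return the view exactly as it was stored.
  def getRawTempView(name: String): Option[ViewPlan] =
    tempViews.get(formatName(name))

  // Resolve a SQL-text-backed view into a plan; pass others through unchanged.
  def getTempView(name: String): Option[ViewPlan] =
    getRawTempView(name).map {
      case TemporaryViewRelation(sql) => ResolvedView(sql)
      case other => other
    }
}
```

Defining `getTempView` in terms of `getRawTempView`, as the real patch does, keeps the name normalization and storage lookup in a single place.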
[spark] branch master updated (e7fe92f -> b7c8210)
This is an automated email from the ASF dual-hosted git repository. wenchen pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from e7fe92f [SPARK-33546][SQL] Enable row format file format validation in CREATE TABLE LIKE add b7c8210 [SPARK-33142][SPARK-33647][SQL][FOLLOW-UP] Add docs and test cases No new revisions were added by this update. Summary of changes: docs/sql-migration-guide.md| 4 +- .../sql/catalyst/catalog/SessionCatalog.scala | 44 ++ .../plans/logical/basicLogicalOperators.scala | 16 .../apache/spark/sql/execution/command/views.scala | 16 ++-- .../org/apache/spark/sql/CachedTableSuite.scala| 13 +++ .../apache/spark/sql/execution/SQLViewSuite.scala | 14 --- .../spark/sql/execution/SQLViewTestSuite.scala | 24 +++- 7 files changed, 79 insertions(+), 52 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated (817f58d -> e7fe92f)
This is an automated email from the ASF dual-hosted git repository. wenchen pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from 817f58d [SPARK-33768][SQL] Remove `retainData` from `AlterTableDropPartition` add e7fe92f [SPARK-33546][SQL] Enable row format file format validation in CREATE TABLE LIKE No new revisions were added by this update. Summary of changes: .../spark/sql/catalyst/parser/AstBuilder.scala | 5 +- .../spark/sql/execution/SparkSqlParser.scala | 9 +- .../spark/sql/hive/execution/HiveDDLSuite.scala| 130 - 3 files changed, 108 insertions(+), 36 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated (9160d59 -> 817f58d)
This is an automated email from the ASF dual-hosted git repository. wenchen pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from 9160d59 [SPARK-33770][SQL][TESTS] Fix the `ALTER TABLE .. DROP PARTITION` tests that delete files out of partition path add 817f58d [SPARK-33768][SQL] Remove `retainData` from `AlterTableDropPartition` No new revisions were added by this update. Summary of changes: .../org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala | 2 +- .../apache/spark/sql/catalyst/analysis/ResolvePartitionSpec.scala | 2 +- .../scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala | 3 +-- .../org/apache/spark/sql/catalyst/plans/logical/v2Commands.scala| 3 +-- .../scala/org/apache/spark/sql/catalyst/parser/DDLParserSuite.scala | 6 ++ .../apache/spark/sql/catalyst/analysis/ResolveSessionCatalog.scala | 4 ++-- .../spark/sql/execution/datasources/v2/DataSourceV2Strategy.scala | 2 +- 7 files changed, 9 insertions(+), 13 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org