[spark] branch master updated: [SPARK-33832][SQL] Force skew join code simplification and improvement

2021-09-23 Thread wenchen
This is an automated email from the ASF dual-hosted git repository.

wenchen pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 1b2bb38  [SPARK-33832][SQL] Force skew join code simplification and 
improvement
1b2bb38 is described below

commit 1b2bb38ecda7c8826bdf7a47f4dde8a60a3378da
Author: Wenchen Fan 
AuthorDate: Fri Sep 24 13:13:35 2021 +0800

[SPARK-33832][SQL] Force skew join code simplification and improvement

### What changes were proposed in this pull request?

This is a follow-up of https://github.com/apache/spark/pull/32816 to 
simplify and improve the code:
1. Add a `SkewJoinChildWrapper` to wrap the skew join children, so that the 
`EnsureRequirements` rule will skip them and save time.
2. Remove `SkewJoinAwareCost` and keep using `SimpleCost`. We can put 
`numSkews` in the first 32 bits, as sketched below.
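
The second point relies on packing two counters into a single 64-bit cost value so that plain `Long` comparison considers the skew counter before the shuffle counter. Below is a minimal, illustrative sketch of that packing idea; the names and exact field semantics are assumptions for illustration, not the actual `SimpleCost` code:

```scala
// Illustrative only: pack two 32-bit counters into one Long so that ordinary
// Long comparison checks the high (skew) field first and only falls back to
// the low (shuffle) field on a tie.
object PackedCostSketch {
  def pack(numSkews: Int, numShuffles: Int): Long =
    (numSkews.toLong << 32) | (numShuffles.toLong & 0xFFFFFFFFL)

  def main(args: Array[String]): Unit = {
    val a = pack(numSkews = 0, numShuffles = 5) // no skews left, more shuffles
    val b = pack(numSkews = 1, numShuffles = 2) // one skew left, fewer shuffles
    assert(a < b) // the skew counter dominates the comparison
  }
}
```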

### Why are the changes needed?

code simplification and improvement

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

existing tests

Closes #34080 from cloud-fan/follow.

Authored-by: Wenchen Fan 
Signed-off-by: Wenchen Fan 
---
 .../execution/adaptive/OptimizeSkewedJoin.scala| 48 ++
 .../sql/execution/adaptive/simpleCosting.scala | 34 ++-
 .../execution/exchange/EnsureRequirements.scala| 12 +++---
 3 files changed, 39 insertions(+), 55 deletions(-)

diff --git 
a/sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/OptimizeSkewedJoin.scala
 
b/sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/OptimizeSkewedJoin.scala
index 2fe5b18..4e97f11 100644
--- 
a/sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/OptimizeSkewedJoin.scala
+++ 
b/sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/OptimizeSkewedJoin.scala
@@ -21,7 +21,11 @@ import scala.collection.mutable
 
 import org.apache.commons.io.FileUtils
 
+import org.apache.spark.rdd.RDD
+import org.apache.spark.sql.catalyst.InternalRow
+import org.apache.spark.sql.catalyst.expressions.{Attribute, SortOrder}
 import org.apache.spark.sql.catalyst.plans._
+import org.apache.spark.sql.catalyst.plans.physical.Partitioning
 import org.apache.spark.sql.catalyst.rules.Rule
 import org.apache.spark.sql.execution._
 import org.apache.spark.sql.execution.exchange.{ENSURE_REQUIREMENTS, 
EnsureRequirements}
@@ -196,8 +200,10 @@ case class OptimizeSkewedJoin(
 }
 logDebug(s"number of skewed partitions: left $numSkewedLeft, right 
$numSkewedRight")
 if (numSkewedLeft > 0 || numSkewedRight > 0) {
-  Some((AQEShuffleReadExec(left, leftSidePartitions.toSeq),
-AQEShuffleReadExec(right, rightSidePartitions.toSeq)))
+  Some((
+SkewJoinChildWrapper(AQEShuffleReadExec(left, 
leftSidePartitions.toSeq)),
+SkewJoinChildWrapper(AQEShuffleReadExec(right, 
rightSidePartitions.toSeq))
+  ))
 } else {
   None
 }
@@ -207,25 +213,19 @@ case class OptimizeSkewedJoin(
 case smj @ SortMergeJoinExec(_, _, joinType, _,
 s1 @ SortExec(_, _, ShuffleStage(left: ShuffleQueryStageExec), _),
 s2 @ SortExec(_, _, ShuffleStage(right: ShuffleQueryStageExec), _), 
false) =>
-  val newChildren = tryOptimizeJoinChildren(left, right, joinType)
-  if (newChildren.isDefined) {
-val (newLeft, newRight) = newChildren.get
-smj.copy(
-  left = s1.copy(child = newLeft), right = s2.copy(child = newRight), 
isSkewJoin = true)
-  } else {
-smj
-  }
+  tryOptimizeJoinChildren(left, right, joinType).map {
+case (newLeft, newRight) =>
+  smj.copy(
+left = s1.copy(child = newLeft), right = s2.copy(child = 
newRight), isSkewJoin = true)
+  }.getOrElse(smj)
 
 case shj @ ShuffledHashJoinExec(_, _, joinType, _, _,
 ShuffleStage(left: ShuffleQueryStageExec),
 ShuffleStage(right: ShuffleQueryStageExec), false) =>
-  val newChildren = tryOptimizeJoinChildren(left, right, joinType)
-  if (newChildren.isDefined) {
-val (newLeft, newRight) = newChildren.get
-shj.copy(left = newLeft, right = newRight, isSkewJoin = true)
-  } else {
-shj
-  }
+  tryOptimizeJoinChildren(left, right, joinType).map {
+case (newLeft, newRight) =>
+  shj.copy(left = newLeft, right = newRight, isSkewJoin = true)
+  }.getOrElse(shj)
   }
 
   override def apply(plan: SparkPlan): SparkPlan = {
@@ -252,7 +252,9 @@ case class OptimizeSkewedJoin(
   // SHJ
   //   Shuffle
   //   Shuffle
-  val optimized = ensureRequirements.apply(optimizeSkewJoin(plan))
+  val optimized = 
ensureRequirements.apply(optimizeSkewJoin(plan)).transform {
+case SkewJoinChildWrapper(child) => child
+  }
   val originCost = costEvaluator.evaluateCost(plan)

[spark] branch branch-3.1 updated: Revert "[SPARK-35672][CORE][YARN][3.1] Pass user classpath entries to executors using config instead of command line"

2021-09-23 Thread gurwls223
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch branch-3.1
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.1 by this push:
 new 1b54580  Revert "[SPARK-35672][CORE][YARN][3.1] Pass user classpath 
entries to executors using config instead of command line"
1b54580 is described below

commit 1b545804c47fc37b7d54ad8967bf788251dc10fc
Author: Hyukjin Kwon 
AuthorDate: Fri Sep 24 12:50:07 2021 +0900

Revert "[SPARK-35672][CORE][YARN][3.1] Pass user classpath entries to 
executors using config instead of command line"

This reverts commit b4916d4a410820ba00125c00b55ca724b27cc853.
---
 .../executor/CoarseGrainedExecutorBackend.scala| 17 +++---
 .../scala/org/apache/spark/executor/Executor.scala |  2 -
 .../CoarseGrainedExecutorBackendSuite.scala| 17 +++---
 .../spark/deploy/yarn/ApplicationMaster.scala  |  9 ++--
 .../org/apache/spark/deploy/yarn/Client.scala  | 31 +--
 .../spark/deploy/yarn/ExecutorRunnable.scala   | 12 +
 .../YarnCoarseGrainedExecutorBackend.scala |  8 ++-
 .../org/apache/spark/deploy/yarn/ClientSuite.scala | 35 
 .../spark/deploy/yarn/YarnClusterSuite.scala   | 63 --
 9 files changed, 52 insertions(+), 142 deletions(-)

diff --git 
a/core/src/main/scala/org/apache/spark/executor/CoarseGrainedExecutorBackend.scala
 
b/core/src/main/scala/org/apache/spark/executor/CoarseGrainedExecutorBackend.scala
index f8024b0..ca207ae 100644
--- 
a/core/src/main/scala/org/apache/spark/executor/CoarseGrainedExecutorBackend.scala
+++ 
b/core/src/main/scala/org/apache/spark/executor/CoarseGrainedExecutorBackend.scala
@@ -51,6 +51,7 @@ private[spark] class CoarseGrainedExecutorBackend(
 bindAddress: String,
 hostname: String,
 cores: Int,
+userClassPath: Seq[URL],
 env: SparkEnv,
 resourcesFileOpt: Option[String],
 resourceProfile: ResourceProfile)
@@ -113,7 +114,7 @@ private[spark] class CoarseGrainedExecutorBackend(
*/
   private def createClassLoader(): MutableURLClassLoader = {
 val currentLoader = Utils.getContextOrSparkClassLoader
-val urls = getUserClassPath.toArray
+val urls = userClassPath.toArray
 if (env.conf.get(EXECUTOR_USER_CLASS_PATH_FIRST)) {
   new ChildFirstURLClassLoader(urls, currentLoader)
 } else {
@@ -138,8 +139,6 @@ private[spark] class CoarseGrainedExecutorBackend(
 }
   }
 
-  def getUserClassPath: Seq[URL] = Nil
-
   def extractLogUrls: Map[String, String] = {
 val prefix = "SPARK_LOG_URL_"
 sys.env.filterKeys(_.startsWith(prefix))
@@ -156,7 +155,7 @@ private[spark] class CoarseGrainedExecutorBackend(
 case RegisteredExecutor =>
   logInfo("Successfully registered with driver")
   try {
-executor = new Executor(executorId, hostname, env, getUserClassPath, 
isLocal = false,
+executor = new Executor(executorId, hostname, env, userClassPath, 
isLocal = false,
   resources = _resources)
 driver.get.send(LaunchedExecutor(executorId))
   } catch {
@@ -378,6 +377,7 @@ private[spark] object CoarseGrainedExecutorBackend extends 
Logging {
   cores: Int,
   appId: String,
   workerUrl: Option[String],
+  userClassPath: mutable.ListBuffer[URL],
   resourcesFileOpt: Option[String],
   resourceProfileId: Int)
 
@@ -385,7 +385,7 @@ private[spark] object CoarseGrainedExecutorBackend extends 
Logging {
 val createFn: (RpcEnv, Arguments, SparkEnv, ResourceProfile) =>
   CoarseGrainedExecutorBackend = { case (rpcEnv, arguments, env, 
resourceProfile) =>
   new CoarseGrainedExecutorBackend(rpcEnv, arguments.driverUrl, 
arguments.executorId,
-arguments.bindAddress, arguments.hostname, arguments.cores,
+arguments.bindAddress, arguments.hostname, arguments.cores, 
arguments.userClassPath.toSeq,
 env, arguments.resourcesFileOpt, resourceProfile)
 }
 run(parseArguments(args, this.getClass.getCanonicalName.stripSuffix("$")), 
createFn)
@@ -469,6 +469,7 @@ private[spark] object CoarseGrainedExecutorBackend extends 
Logging {
 var resourcesFileOpt: Option[String] = None
 var appId: String = null
 var workerUrl: Option[String] = None
+val userClassPath = new mutable.ListBuffer[URL]()
 var resourceProfileId: Int = DEFAULT_RESOURCE_PROFILE_ID
 
 var argv = args.toList
@@ -499,6 +500,9 @@ private[spark] object CoarseGrainedExecutorBackend extends 
Logging {
   // Worker url is used in spark standalone mode to enforce 
fate-sharing with worker
   workerUrl = Some(value)
   argv = tail
+case ("--user-class-path") :: value :: tail =>
+  userClassPath += new URL(value)
+  argv = tail
 case ("--resourceProfileId") :: value :: tail =>
   resourceProfileId = value.toInt
   argv = tail
@@ -525,7 +529,7 @@ private[spark] object 

[spark] branch branch-3.2 updated: Revert "[SPARK-35672][CORE][YARN] Pass user classpath entries to exec…

2021-09-23 Thread gurwls223
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch branch-3.2
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.2 by this push:
 new 09a8535  Revert "[SPARK-35672][CORE][YARN] Pass user classpath entries 
to exec…
09a8535 is described below

commit 09a8535cc407091c44e3eb9960292658a9c426c9
Author: Gengliang Wang 
AuthorDate: Fri Sep 24 12:46:22 2021 +0900

Revert "[SPARK-35672][CORE][YARN] Pass user classpath entries to exec…

…utors using config instead of command line"

### What changes were proposed in this pull request?
This reverts commit 866df69c6290b2f8e2726f1325969d23c938c0f2.

### Why are the changes needed?
After the change, environment variables were not substituted in user 
classpath entries. Please find an example in SPARK-35672.
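
As a rough illustration of why this matters (a hedged sketch, not Spark code): a string such as `file:$PWD/app.jar` is expanded by the container's launch shell when it appears on the executor command line, but the same string carried inside a config value reaches the JVM verbatim and `$PWD` is never substituted.

```scala
import java.net.URL

// Minimal sketch: env-var placeholders in a plain string are not expanded by
// the JVM, so a classpath entry delivered via config keeps the literal "$PWD".
object EnvSubstitutionSketch {
  def main(args: Array[String]): Unit = {
    val entry = "file:$PWD/app.jar"   // as it would arrive in a config value
    val url = new URL(entry)          // no shell involved, so no expansion
    println(url.getPath)              // prints the literal "$PWD/app.jar"
  }
}
```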

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
Existing tests.

Closes #34088 from gengliangwang/revertSPARK-35672.

Authored-by: Gengliang Wang 
Signed-off-by: Hyukjin Kwon 
---
 .../executor/CoarseGrainedExecutorBackend.scala| 17 +++---
 .../scala/org/apache/spark/executor/Executor.scala |  2 -
 .../CoarseGrainedExecutorBackendSuite.scala| 17 +++---
 .../spark/deploy/yarn/ApplicationMaster.scala  |  9 ++--
 .../org/apache/spark/deploy/yarn/Client.scala  | 32 ++-
 .../spark/deploy/yarn/ExecutorRunnable.scala   | 12 +
 .../YarnCoarseGrainedExecutorBackend.scala |  8 ++-
 .../org/apache/spark/deploy/yarn/ClientSuite.scala | 35 
 .../spark/deploy/yarn/YarnClusterSuite.scala   | 63 --
 9 files changed, 53 insertions(+), 142 deletions(-)

diff --git 
a/core/src/main/scala/org/apache/spark/executor/CoarseGrainedExecutorBackend.scala
 
b/core/src/main/scala/org/apache/spark/executor/CoarseGrainedExecutorBackend.scala
index 0f8e6d1..e3be6eb 100644
--- 
a/core/src/main/scala/org/apache/spark/executor/CoarseGrainedExecutorBackend.scala
+++ 
b/core/src/main/scala/org/apache/spark/executor/CoarseGrainedExecutorBackend.scala
@@ -52,6 +52,7 @@ private[spark] class CoarseGrainedExecutorBackend(
 bindAddress: String,
 hostname: String,
 cores: Int,
+userClassPath: Seq[URL],
 env: SparkEnv,
 resourcesFileOpt: Option[String],
 resourceProfile: ResourceProfile)
@@ -123,7 +124,7 @@ private[spark] class CoarseGrainedExecutorBackend(
*/
   private def createClassLoader(): MutableURLClassLoader = {
 val currentLoader = Utils.getContextOrSparkClassLoader
-val urls = getUserClassPath.toArray
+val urls = userClassPath.toArray
 if (env.conf.get(EXECUTOR_USER_CLASS_PATH_FIRST)) {
   new ChildFirstURLClassLoader(urls, currentLoader)
 } else {
@@ -148,8 +149,6 @@ private[spark] class CoarseGrainedExecutorBackend(
 }
   }
 
-  def getUserClassPath: Seq[URL] = Nil
-
   def extractLogUrls: Map[String, String] = {
 val prefix = "SPARK_LOG_URL_"
 sys.env.filterKeys(_.startsWith(prefix))
@@ -166,7 +165,7 @@ private[spark] class CoarseGrainedExecutorBackend(
 case RegisteredExecutor =>
   logInfo("Successfully registered with driver")
   try {
-executor = new Executor(executorId, hostname, env, getUserClassPath, 
isLocal = false,
+executor = new Executor(executorId, hostname, env, userClassPath, 
isLocal = false,
   resources = _resources)
 driver.get.send(LaunchedExecutor(executorId))
   } catch {
@@ -395,6 +394,7 @@ private[spark] object CoarseGrainedExecutorBackend extends 
Logging {
   cores: Int,
   appId: String,
   workerUrl: Option[String],
+  userClassPath: mutable.ListBuffer[URL],
   resourcesFileOpt: Option[String],
   resourceProfileId: Int)
 
@@ -402,7 +402,7 @@ private[spark] object CoarseGrainedExecutorBackend extends 
Logging {
 val createFn: (RpcEnv, Arguments, SparkEnv, ResourceProfile) =>
   CoarseGrainedExecutorBackend = { case (rpcEnv, arguments, env, 
resourceProfile) =>
   new CoarseGrainedExecutorBackend(rpcEnv, arguments.driverUrl, 
arguments.executorId,
-arguments.bindAddress, arguments.hostname, arguments.cores,
+arguments.bindAddress, arguments.hostname, arguments.cores, 
arguments.userClassPath.toSeq,
 env, arguments.resourcesFileOpt, resourceProfile)
 }
 run(parseArguments(args, this.getClass.getCanonicalName.stripSuffix("$")), 
createFn)
@@ -490,6 +490,7 @@ private[spark] object CoarseGrainedExecutorBackend extends 
Logging {
 var resourcesFileOpt: Option[String] = None
 var appId: String = null
 var workerUrl: Option[String] = None
+val userClassPath = new mutable.ListBuffer[URL]()
 var resourceProfileId: Int = DEFAULT_RESOURCE_PROFILE_ID
 
 var argv = args.toList
@@ -520,6 +521,9 @@ private[spark] object CoarseGrainedExecutorBackend extends 

[spark] branch branch-3.2 updated: [SPARK-36835][BUILD] Enable createDependencyReducedPom for Maven shaded plugin

2021-09-23 Thread gengliang
This is an automated email from the ASF dual-hosted git repository.

gengliang pushed a commit to branch branch-3.2
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.2 by this push:
 new 09283d3  [SPARK-36835][BUILD] Enable createDependencyReducedPom for 
Maven shaded plugin
09283d3 is described below

commit 09283d32106f8cb6b5e177200a64f600391f3026
Author: Chao Sun 
AuthorDate: Fri Sep 24 10:16:30 2021 +0800

[SPARK-36835][BUILD] Enable createDependencyReducedPom for Maven shaded 
plugin

### What changes were proposed in this pull request?

Enable `createDependencyReducedPom` for Spark's Maven shaded plugin so that 
the effective POM won't contain shaded artifacts such as 
`org.eclipse.jetty`.

### Why are the changes needed?

At the moment, the effective POM leaks transitive dependencies of the shaded 
artifacts to downstream apps, which can potentially cause issues.

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

I manually tested and the `core/dependency-reduced-pom.xml` no longer 
contains dependencies such as `jetty-XX`.

Closes #34085 from sunchao/SPARK-36835.

Authored-by: Chao Sun 
Signed-off-by: Gengliang Wang 
(cherry picked from commit ed88e610f04a1015cff9b11bea13b87affc29ef3)
Signed-off-by: Gengliang Wang 
---
 pom.xml | 1 -
 1 file changed, 1 deletion(-)

diff --git a/pom.xml b/pom.xml
index 4a3bd71..861ebb2 100644
--- a/pom.xml
+++ b/pom.xml
@@ -3015,7 +3015,6 @@
 maven-shade-plugin
 
   false
-  false
   
 
   org.spark-project.spark:unused

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated (c2c4a48 -> ed88e61)

2021-09-23 Thread gengliang
This is an automated email from the ASF dual-hosted git repository.

gengliang pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from c2c4a48  Revert "[SPARK-35672][CORE][YARN] Pass user classpath entries 
to executors using config instead of command line"
 add ed88e61  [SPARK-36835][BUILD] Enable createDependencyReducedPom for 
Maven shaded plugin

No new revisions were added by this update.

Summary of changes:
 pom.xml | 1 -
 1 file changed, 1 deletion(-)

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated (ef7441b -> c2c4a48)

2021-09-23 Thread gurwls223
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from ef7441b  [SPARK-36642][SQL] Add df.withMetadata pyspark API
 add c2c4a48  Revert "[SPARK-35672][CORE][YARN] Pass user classpath entries 
to executors using config instead of command line"

No new revisions were added by this update.

Summary of changes:
 .../executor/CoarseGrainedExecutorBackend.scala| 17 +++---
 .../scala/org/apache/spark/executor/Executor.scala |  2 -
 .../CoarseGrainedExecutorBackendSuite.scala| 17 +++---
 .../cluster/k8s/KubernetesExecutorBackend.scala|  2 +-
 .../spark/deploy/yarn/ApplicationMaster.scala  |  9 ++--
 .../org/apache/spark/deploy/yarn/Client.scala  | 32 ++-
 .../spark/deploy/yarn/ExecutorRunnable.scala   | 12 +
 .../YarnCoarseGrainedExecutorBackend.scala |  8 ++-
 .../org/apache/spark/deploy/yarn/ClientSuite.scala | 35 
 .../spark/deploy/yarn/YarnClusterSuite.scala   | 63 --
 10 files changed, 54 insertions(+), 143 deletions(-)

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated: [SPARK-36642][SQL] Add df.withMetadata pyspark API

2021-09-23 Thread weichenxu123
This is an automated email from the ASF dual-hosted git repository.

weichenxu123 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new ef7441b  [SPARK-36642][SQL] Add df.withMetadata pyspark API
ef7441b is described below

commit ef7441bb4245e2fb08d70a8025a2c6d8bebfe75b
Author: Liang Zhang 
AuthorDate: Fri Sep 24 08:30:18 2021 +0800

[SPARK-36642][SQL] Add df.withMetadata pyspark API

This PR adds the PySpark API `df.withMetadata(columnName, metadata)`. The 
Scala API was added in https://github.com/apache/spark/pull/33853.

### What changes were proposed in this pull request?

To make it easy to use/modify the semantic annotation, we want to have a 
shorter API to update the metadata in a DataFrame. Currently we have 
`df.withColumn("col1", col("col1").alias("col1", metadata=metadata))` to update 
the metadata without changing the column name, and this is too verbose. We want 
to have a syntax sugar API `df.withMetadata("col1", metadata=metadata)` to 
achieve the same functionality.
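
For reference, here is a minimal sketch of the equivalent usage through the Scala API referenced above (assumes Spark 3.3+, where `Dataset.withMetadata` exists; the session setup and column names are illustrative):

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.types.MetadataBuilder

object WithMetadataSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[1]").appName("withMetadata-sketch").getOrCreate()
    val df = spark.range(3).withColumnRenamed("id", "age")
    val meta = new MetadataBuilder().putString("foo", "bar").build()

    // Verbose form: re-alias the column just to attach the metadata.
    val viaAlias = df.withColumn("age", df("age").as("age", meta))

    // Shorter form, analogous to the Python API described above.
    val viaWithMetadata = df.withMetadata("age", meta)

    assert(viaAlias.schema("age").metadata == viaWithMetadata.schema("age").metadata)
    println(viaWithMetadata.schema("age").metadata.json)  // {"foo":"bar"}
    spark.stop()
  }
}
```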

### Why are the changes needed?

A bit of background on how frequently the metadata is updated: we are working on 
inferring semantic data types, using them in AutoML, and storing the semantic 
annotation in the metadata. So in many cases, we will suggest that the user update 
the metadata to correct a wrong inference or add an annotation for a weak 
inference.

### Does this PR introduce _any_ user-facing change?

Yes.
A syntax sugar API `df.withMetadata("col1", metadata=metadata)` to achieve 
the same functionality as `df.withColumn("col1", col("col1").alias("col1", 
metadata=metadata))`.

### How was this patch tested?

doctest.

Closes #34021 from liangz1/withMetadataPython.

Authored-by: Liang Zhang 
Signed-off-by: Weichen Xu 
---
 python/pyspark/sql/dataframe.py | 27 +++
 1 file changed, 27 insertions(+)

diff --git a/python/pyspark/sql/dataframe.py b/python/pyspark/sql/dataframe.py
index 081c06e..5a2e8cf 100644
--- a/python/pyspark/sql/dataframe.py
+++ b/python/pyspark/sql/dataframe.py
@@ -15,6 +15,7 @@
 # limitations under the License.
 #
 
+import json
 import sys
 import random
 import warnings
@@ -23,6 +24,7 @@ from functools import reduce
 from html import escape as html_escape
 
 from pyspark import copy_func, since, _NoValue
+from pyspark.context import SparkContext
 from pyspark.rdd import RDD, _load_from_socket, _local_iterator_from_socket
 from pyspark.serializers import BatchedSerializer, PickleSerializer, \
 UTF8Deserializer
@@ -2536,6 +2538,31 @@ class DataFrame(PandasMapOpsMixin, 
PandasConversionMixin):
 """
 return DataFrame(self._jdf.withColumnRenamed(existing, new), 
self.sql_ctx)
 
+def withMetadata(self, columnName, metadata):
+"""Returns a new :class:`DataFrame` by updating an existing column 
with metadata.
+
+.. versionadded:: 3.3.0
+
+Parameters
+----------
+columnName : str
+string, name of the existing column to update the metadata.
+metadata : dict
+dict, new metadata to be assigned to df.schema[columnName].metadata
+
+Examples
+--------
+>>> df_meta = df.withMetadata('age', {'foo': 'bar'})
+>>> df_meta.schema['age'].metadata
+{'foo': 'bar'}
+"""
+if not isinstance(metadata, dict):
+raise TypeError("metadata should be a dict")
+sc = SparkContext._active_spark_context
+jmeta = sc._jvm.org.apache.spark.sql.types.Metadata.fromJson(
+json.dumps(metadata))
+return DataFrame(self._jdf.withMetadata(columnName, jmeta), 
self.sql_ctx)
+
 def drop(self, *cols):
 """Returns a new :class:`DataFrame` that drops the specified column.
 This is a no-op if schema doesn't contain the given column name(s).

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated (6e815da -> 0b65daa)

2021-09-23 Thread sarutak
This is an automated email from the ASF dual-hosted git repository.

sarutak pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 6e815da  [SPARK-36760][SQL][FOLLOWUP] Add interface 
SupportsPushDownV2Filters
 add 0b65daa  [SPARK-36760][DOCS][FOLLOWUP] Fix wrong JavaDoc style

No new revisions were added by this update.

Summary of changes:
 .../org/apache/spark/sql/connector/read/SupportsPushDownV2Filters.java  | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated (9d8ac7c -> 6e815da)

2021-09-23 Thread wenchen
This is an automated email from the ASF dual-hosted git repository.

wenchen pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 9d8ac7c  [SPARK-36782][CORE][FOLLOW-UP] Only handle shuffle block in 
separate thread pool
 add 6e815da  [SPARK-36760][SQL][FOLLOWUP] Add interface 
SupportsPushDownV2Filters

No new revisions were added by this update.

Summary of changes:
 .../apache/spark/sql/connector/read/SupportsPushDownV2Filters.java   | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



svn commit: r50057 - in /dev/spark/v3.2.0-rc4-docs: ./ _site/ _site/api/ _site/api/R/ _site/api/java/ _site/api/java/lib/ _site/api/java/org/ _site/api/java/org/apache/ _site/api/java/org/apache/parqu

2021-09-23 Thread gengliang
Author: gengliang
Date: Thu Sep 23 11:19:12 2021
New Revision: 50057

Log:
Apache Spark v3.2.0-rc4 docs


[This commit notification would consist of 2344 parts, 
which exceeds the limit of 50 ones, so it was shortened to the summary.]

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



svn commit: r50056 - /dev/spark/v3.2.0-rc4-bin/

2021-09-23 Thread gengliang
Author: gengliang
Date: Thu Sep 23 10:53:07 2021
New Revision: 50056

Log:
Apache Spark v3.2.0-rc4

Added:
dev/spark/v3.2.0-rc4-bin/
dev/spark/v3.2.0-rc4-bin/SparkR_3.2.0.tar.gz   (with props)
dev/spark/v3.2.0-rc4-bin/SparkR_3.2.0.tar.gz.asc
dev/spark/v3.2.0-rc4-bin/SparkR_3.2.0.tar.gz.sha512
dev/spark/v3.2.0-rc4-bin/pyspark-3.2.0.tar.gz   (with props)
dev/spark/v3.2.0-rc4-bin/pyspark-3.2.0.tar.gz.asc
dev/spark/v3.2.0-rc4-bin/pyspark-3.2.0.tar.gz.sha512
dev/spark/v3.2.0-rc4-bin/spark-3.2.0-bin-hadoop2.7.tgz   (with props)
dev/spark/v3.2.0-rc4-bin/spark-3.2.0-bin-hadoop2.7.tgz.asc
dev/spark/v3.2.0-rc4-bin/spark-3.2.0-bin-hadoop2.7.tgz.sha512
dev/spark/v3.2.0-rc4-bin/spark-3.2.0-bin-hadoop3.2-scala2.13.tgz   (with 
props)
dev/spark/v3.2.0-rc4-bin/spark-3.2.0-bin-hadoop3.2-scala2.13.tgz.asc
dev/spark/v3.2.0-rc4-bin/spark-3.2.0-bin-hadoop3.2-scala2.13.tgz.sha512
dev/spark/v3.2.0-rc4-bin/spark-3.2.0-bin-hadoop3.2.tgz   (with props)
dev/spark/v3.2.0-rc4-bin/spark-3.2.0-bin-hadoop3.2.tgz.asc
dev/spark/v3.2.0-rc4-bin/spark-3.2.0-bin-hadoop3.2.tgz.sha512
dev/spark/v3.2.0-rc4-bin/spark-3.2.0-bin-without-hadoop.tgz   (with props)
dev/spark/v3.2.0-rc4-bin/spark-3.2.0-bin-without-hadoop.tgz.asc
dev/spark/v3.2.0-rc4-bin/spark-3.2.0-bin-without-hadoop.tgz.sha512
dev/spark/v3.2.0-rc4-bin/spark-3.2.0.tgz   (with props)
dev/spark/v3.2.0-rc4-bin/spark-3.2.0.tgz.asc
dev/spark/v3.2.0-rc4-bin/spark-3.2.0.tgz.sha512

Added: dev/spark/v3.2.0-rc4-bin/SparkR_3.2.0.tar.gz
==
Binary file - no diff available.

Propchange: dev/spark/v3.2.0-rc4-bin/SparkR_3.2.0.tar.gz
--
svn:mime-type = application/octet-stream

Added: dev/spark/v3.2.0-rc4-bin/SparkR_3.2.0.tar.gz.asc
==
--- dev/spark/v3.2.0-rc4-bin/SparkR_3.2.0.tar.gz.asc (added)
+++ dev/spark/v3.2.0-rc4-bin/SparkR_3.2.0.tar.gz.asc Thu Sep 23 10:53:07 2021
@@ -0,0 +1,17 @@
+-BEGIN PGP SIGNATURE-
+
+iQJJBAABCgAzFiEEVNI/8r5LtJnK7c4wCYJ7fOVh7E4FAmFMVSoVHGdlbmdsaWFu
+Z0BhcGFjaGUub3JnAAoJEAmCe3zlYexOrT4QAKEK46wn07tMrPTNVwSZBNKZekcl
+/speI9pM0nX94ILwHeFzqCEQMOMJ7rkfmOTw4k6FhAml2zunV6zRPiU7eKgb2QTn
+02nhrukX6zTIxr+z82MJIS5bToyYNhEN1VlmOTQVXezUqw7wT2I/lQxceAU84Dpy
+sj73D1cPUCntv1i/wH0VCHYmU1vVHsOuIoPCBhYqRhARcXb3Dz9GHVwcoXRH/lTH
+5IACddRvtvmc5lincPVRWu1haN4hGoRF5OboTyXImvMCdkNcoBeL4y7bMF+8i+KV
+j2ub0EMoEUoL/i2MuXADjBPFgPlJwK1Zi2OrujM4Luw9CXEfJ8Br91RoOOXSd3zQ
+07sDBFbiL3Yl8T+1H274tN7sCqC1FDRNnu8AMes24KDC76uU23UUvgmxotByeTgv
+dTM0tyGvZf+uD0+9uVyWjkowwY2i+txozZqOcG489i3kcjOTLtZVcu0txCBHssS4
+a/3KT9WmQHqAn8m11ygwzI8c1JO1ObMPlAmXYMtSv+Fl6sxdZQWBKb5pgYVM2E/g
+lIJZlmg0zpyD7pNeNAtMCLvVN6mSFpV2lMZPlYv9r/sM/o+v2DH6Y/vX/ZpMpe+s
+IaF3WXU9YKFdrTOaS9wsAeqWjm/BMYLLBi3+tZxDG4vMpE0wdBLzCuZ1HZ3oRw3d
+7cLq14i7vjQOfNeJ
+=V6iL
+-END PGP SIGNATURE-

Added: dev/spark/v3.2.0-rc4-bin/SparkR_3.2.0.tar.gz.sha512
==
--- dev/spark/v3.2.0-rc4-bin/SparkR_3.2.0.tar.gz.sha512 (added)
+++ dev/spark/v3.2.0-rc4-bin/SparkR_3.2.0.tar.gz.sha512 Thu Sep 23 10:53:07 2021
@@ -0,0 +1,3 @@
+SparkR_3.2.0.tar.gz: 163BE5BC 6DDEFEC5 0BE37C42 56C86489 97BAA21E 61FD489F
+ A759E15A C1C8D5CB BD142A17 CCE56482 EF8C230D 43823B46
+ 4BA68FD0 B07989C5 48AD3560 0ABC48A0

Added: dev/spark/v3.2.0-rc4-bin/pyspark-3.2.0.tar.gz
==
Binary file - no diff available.

Propchange: dev/spark/v3.2.0-rc4-bin/pyspark-3.2.0.tar.gz
--
svn:mime-type = application/octet-stream

Added: dev/spark/v3.2.0-rc4-bin/pyspark-3.2.0.tar.gz.asc
==
--- dev/spark/v3.2.0-rc4-bin/pyspark-3.2.0.tar.gz.asc (added)
+++ dev/spark/v3.2.0-rc4-bin/pyspark-3.2.0.tar.gz.asc Thu Sep 23 10:53:07 2021
@@ -0,0 +1,17 @@
+-BEGIN PGP SIGNATURE-
+
+iQJJBAABCgAzFiEEVNI/8r5LtJnK7c4wCYJ7fOVh7E4FAmFMVSwVHGdlbmdsaWFu
+Z0BhcGFjaGUub3JnAAoJEAmCe3zlYexOPakQALRSVqRFLWh8nvIj+yvIzRvsnUzw
+FdsUYZtwqkZuQ00OsyrQUEShG/GiosPaU/QI3usDCl6SAlXPWMClRzBBWCN6Mt7h
+rThfLth65K/g65nzJEG9hYrw0ZIyQfBZ0ZBrsvCvlCNXZ9EXZEaKKTvfwJ7KBkfE
+oRv8c0ZDdf9O0VL9BOw4+JJL5XuvMGfAVq+yD9jZQKy7pO9X72DiyJaFLDURwYly
+pndc581RFft0n3x9uYZs1WWbzTpd4vHAYkk1d28VNvgOYLJzcuj8vjM+RByxmtGy
+5IZEYFKWm/feoPqOQKr7fmNJnCkpLikr12gjuk/l2jIXMmxTOAn3M+OGLQ7XEqk9
+rDrxjxyBuFSBNTKaYZubQpaUs65Cy5Uz67RTUxYsnThK1p/Z1hQt76GSjST7klGf
+CASl2IZbmjW11eZrxa5OoeaoCVfx6I5NDYCtfoEqlqQG9bBo5O9KErcvTFRqHtLU
+W/wzFKiouB/xOaTAjhkv7h8E/Bi0miYYAEb3Y2po3oTjo5eqZlM0Fg7AWCaIfq9t
+srydGoote1feUSsPeZho2RYkW1lMos74kVl8J6C+O3bPmeV8frTLBwpdP8XWPDyo

[spark] branch branch-3.1 updated: [SPARK-36782][CORE][FOLLOW-UP] Only handle shuffle block in separate thread pool

2021-09-23 Thread gengliang
This is an automated email from the ASF dual-hosted git repository.

gengliang pushed a commit to branch branch-3.1
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.1 by this push:
 new 8e36217  [SPARK-36782][CORE][FOLLOW-UP] Only handle shuffle block in 
separate thread pool
8e36217 is described below

commit 8e36217793f5120c4fbe59b48b686a572321414e
Author: yi.wu 
AuthorDate: Thu Sep 23 16:29:54 2021 +0800

[SPARK-36782][CORE][FOLLOW-UP] Only handle shuffle block in separate thread 
pool

### What changes were proposed in this pull request?

This is a follow-up of https://github.com/apache/spark/pull/34043. This PR 
proposes to only handle shuffle blocks in the separate thread pool and leave 
other blocks with the same behavior as before.

### Why are the changes needed?

To avoid any potential overhead.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Pass existing tests.

Closes #34076 from Ngone51/spark-36782-follow-up.

Authored-by: yi.wu 
Signed-off-by: Gengliang Wang 
(cherry picked from commit 9d8ac7c8e90c1c2a6060e8f1e8f2f21e19622567)
Signed-off-by: Gengliang Wang 
---
 .../spark/storage/BlockManagerMasterEndpoint.scala | 69 +-
 1 file changed, 41 insertions(+), 28 deletions(-)

diff --git 
a/core/src/main/scala/org/apache/spark/storage/BlockManagerMasterEndpoint.scala 
b/core/src/main/scala/org/apache/spark/storage/BlockManagerMasterEndpoint.scala
index a7289ef..c22b16a 100644
--- 
a/core/src/main/scala/org/apache/spark/storage/BlockManagerMasterEndpoint.scala
+++ 
b/core/src/main/scala/org/apache/spark/storage/BlockManagerMasterEndpoint.scala
@@ -117,15 +117,20 @@ class BlockManagerMasterEndpoint(
 
 case _updateBlockInfo @
 UpdateBlockInfo(blockManagerId, blockId, storageLevel, 
deserializedSize, size) =>
-  val response = updateBlockInfo(blockManagerId, blockId, storageLevel, 
deserializedSize, size)
 
-  response.foreach { isSuccess =>
+  @inline def handleResult(success: Boolean): Unit = {
 // SPARK-30594: we should not post `SparkListenerBlockUpdated` when 
updateBlockInfo
 // returns false since the block info would be updated again later.
-if (isSuccess) {
+if (success) {
   
listenerBus.post(SparkListenerBlockUpdated(BlockUpdatedInfo(_updateBlockInfo)))
 }
-context.reply(isSuccess)
+context.reply(success)
+  }
+
+  if (blockId.isShuffle) {
+updateShuffleBlockInfo(blockId, blockManagerId).foreach(handleResult)
+  } else {
+handleResult(updateBlockInfo(blockManagerId, blockId, storageLevel, 
deserializedSize, size))
   }
 
 case GetLocations(blockId) =>
@@ -566,46 +571,54 @@ class BlockManagerMasterEndpoint(
 id
   }
 
+ private def updateShuffleBlockInfo(blockId: BlockId, blockManagerId: 
BlockManagerId)
+: Future[Boolean] = {
+   blockId match {
+ case ShuffleIndexBlockId(shuffleId, mapId, _) =>
+   // SPARK-36782: Invoke `MapOutputTracker.updateMapOutput` within the 
thread
+   // `dispatcher-BlockManagerMaster` could lead to the deadlock when
+   // `MapOutputTracker.serializeOutputStatuses` broadcasts the serialized 
mapstatues under
+   // the acquired write lock. The broadcast block would report its status 
to
+   // `BlockManagerMasterEndpoint`, while the `BlockManagerMasterEndpoint` 
is occupied by
+   // `updateMapOutput` since it's waiting for the write lock. Thus, we 
use `Future` to call
+   // `updateMapOutput` in a separate thread to avoid the deadlock.
+   Future {
+ // We need to update this at index file because there exists the 
index-only block
+ logDebug(s"Received shuffle index block update for ${shuffleId} 
${mapId}, updating.")
+ mapOutputTracker.updateMapOutput(shuffleId, mapId, blockManagerId)
+ true
+   }
+ case ShuffleDataBlockId(shuffleId: Int, mapId: Long, reduceId: Int) =>
+   logDebug(s"Received shuffle data block update for ${shuffleId} 
${mapId}, ignore.")
+   Future.successful(true)
+ case _ =>
+   logDebug(s"Unexpected shuffle block type ${blockId}" +
+ s"as ${blockId.getClass().getSimpleName()}")
+   Future.successful(false)
+   }
+ }
+
   private def updateBlockInfo(
   blockManagerId: BlockManagerId,
   blockId: BlockId,
   storageLevel: StorageLevel,
   memSize: Long,
-  diskSize: Long): Future[Boolean] = {
+  diskSize: Long): Boolean = {
 logDebug(s"Updating block info on master ${blockId} for ${blockManagerId}")
 
-if (blockId.isShuffle) {
-  blockId match {
-case ShuffleIndexBlockId(shuffleId, mapId, _) =>
-  // We need to update this at index file because there exists the 
index-only block
-  return Future {
-

[spark] branch branch-3.1 updated: [SPARK-36782][CORE] Avoid blocking dispatcher-BlockManagerMaster during UpdateBlockInfo

2021-09-23 Thread gengliang
This is an automated email from the ASF dual-hosted git repository.

gengliang pushed a commit to branch branch-3.1
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.1 by this push:
 new 1b48a48  [SPARK-36782][CORE] Avoid blocking 
dispatcher-BlockManagerMaster during UpdateBlockInfo
1b48a48 is described below

commit 1b48a487e88cd639eac3117298090ad8ec5c4c47
Author: Fabian A.J. Thiele 
AuthorDate: Thu Sep 23 12:56:49 2021 +0800

[SPARK-36782][CORE] Avoid blocking dispatcher-BlockManagerMaster during 
UpdateBlockInfo

### What changes were proposed in this pull request?
Delegate the potentially blocking call to `mapOutputTracker.updateMapOutput` 
from within `UpdateBlockInfo` handling on `dispatcher-BlockManagerMaster` to the 
thread pool to avoid blocking the endpoint. This code path is only taken for 
`ShuffleIndexBlockId`; other blocks are still handled on the 
`dispatcher-BlockManagerMaster` itself.

Change `updateBlockInfo` to return `Future[Boolean]` instead of `Boolean`. 
The response will be sent to the RPC caller upon successful completion of the future.

Introduce a unit test that forces `MapOutputTracker` to make a broadcast as 
part of `MapOutputTracker.serializeOutputStatuses` when running decommission 
tests.

### Why are the changes needed?
[SPARK-36782](https://issues.apache.org/jira/browse/SPARK-36782) describes 
a deadlock occurring if the `dispatcher-BlockManagerMaster` is allowed to block 
while waiting for write access to data structures.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Unit test as introduced in this PR.

---

Ping eejbyfeldt for notice.

Closes #34043 from f-thiele/SPARK-36782.

Lead-authored-by: Fabian A.J. Thiele 
Co-authored-by: Emil Ejbyfeldt 
Co-authored-by: Fabian A.J. Thiele 
Signed-off-by: Gengliang Wang 
---
 .../spark/storage/BlockManagerMasterEndpoint.scala | 37 --
 .../BlockManagerDecommissionIntegrationSuite.scala | 13 +++-
 2 files changed, 33 insertions(+), 17 deletions(-)

diff --git 
a/core/src/main/scala/org/apache/spark/storage/BlockManagerMasterEndpoint.scala 
b/core/src/main/scala/org/apache/spark/storage/BlockManagerMasterEndpoint.scala
index 61591b0..a7289ef 100644
--- 
a/core/src/main/scala/org/apache/spark/storage/BlockManagerMasterEndpoint.scala
+++ 
b/core/src/main/scala/org/apache/spark/storage/BlockManagerMasterEndpoint.scala
@@ -117,12 +117,15 @@ class BlockManagerMasterEndpoint(
 
 case _updateBlockInfo @
 UpdateBlockInfo(blockManagerId, blockId, storageLevel, 
deserializedSize, size) =>
-  val isSuccess = updateBlockInfo(blockManagerId, blockId, storageLevel, 
deserializedSize, size)
-  context.reply(isSuccess)
-  // SPARK-30594: we should not post `SparkListenerBlockUpdated` when 
updateBlockInfo
-  // returns false since the block info would be updated again later.
-  if (isSuccess) {
-
listenerBus.post(SparkListenerBlockUpdated(BlockUpdatedInfo(_updateBlockInfo)))
+  val response = updateBlockInfo(blockManagerId, blockId, storageLevel, 
deserializedSize, size)
+
+  response.foreach { isSuccess =>
+// SPARK-30594: we should not post `SparkListenerBlockUpdated` when 
updateBlockInfo
+// returns false since the block info would be updated again later.
+if (isSuccess) {
+  
listenerBus.post(SparkListenerBlockUpdated(BlockUpdatedInfo(_updateBlockInfo)))
+}
+context.reply(isSuccess)
   }
 
 case GetLocations(blockId) =>
@@ -568,23 +571,25 @@ class BlockManagerMasterEndpoint(
   blockId: BlockId,
   storageLevel: StorageLevel,
   memSize: Long,
-  diskSize: Long): Boolean = {
+  diskSize: Long): Future[Boolean] = {
 logDebug(s"Updating block info on master ${blockId} for ${blockManagerId}")
 
 if (blockId.isShuffle) {
   blockId match {
 case ShuffleIndexBlockId(shuffleId, mapId, _) =>
   // We need to update this at index file because there exists the 
index-only block
-  logDebug(s"Received shuffle index block update for ${shuffleId} 
${mapId}, updating.")
-  mapOutputTracker.updateMapOutput(shuffleId, mapId, blockManagerId)
-  return true
+  return Future {
+logDebug(s"Received shuffle index block update for ${shuffleId} 
${mapId}, updating.")
+mapOutputTracker.updateMapOutput(shuffleId, mapId, blockManagerId)
+true
+  }
 case ShuffleDataBlockId(shuffleId: Int, mapId: Long, reduceId: Int) =>
   logDebug(s"Received shuffle data block update for ${shuffleId} 
${mapId}, ignore.")
-  return true
+  return Future.successful(true)
 case _ =>
   logDebug(s"Unexpected shuffle block type ${blockId}" +
 s"as 

[spark] 01/01: Preparing development version 3.2.1-SNAPSHOT

2021-09-23 Thread gengliang
This is an automated email from the ASF dual-hosted git repository.

gengliang pushed a commit to branch branch-3.2
in repository https://gitbox.apache.org/repos/asf/spark.git

commit 0fb7127f85de4d1434cecbacb3bb50052b5bbb34
Author: Gengliang Wang 
AuthorDate: Thu Sep 23 08:46:28 2021 +

Preparing development version 3.2.1-SNAPSHOT
---
 R/pkg/DESCRIPTION  | 2 +-
 assembly/pom.xml   | 2 +-
 common/kvstore/pom.xml | 2 +-
 common/network-common/pom.xml  | 2 +-
 common/network-shuffle/pom.xml | 2 +-
 common/network-yarn/pom.xml| 2 +-
 common/sketch/pom.xml  | 2 +-
 common/tags/pom.xml| 2 +-
 common/unsafe/pom.xml  | 2 +-
 core/pom.xml   | 2 +-
 docs/_config.yml   | 4 ++--
 examples/pom.xml   | 2 +-
 external/avro/pom.xml  | 2 +-
 external/docker-integration-tests/pom.xml  | 2 +-
 external/kafka-0-10-assembly/pom.xml   | 2 +-
 external/kafka-0-10-sql/pom.xml| 2 +-
 external/kafka-0-10-token-provider/pom.xml | 2 +-
 external/kafka-0-10/pom.xml| 2 +-
 external/kinesis-asl-assembly/pom.xml  | 2 +-
 external/kinesis-asl/pom.xml   | 2 +-
 external/spark-ganglia-lgpl/pom.xml| 2 +-
 graphx/pom.xml | 2 +-
 hadoop-cloud/pom.xml   | 2 +-
 launcher/pom.xml   | 2 +-
 mllib-local/pom.xml| 2 +-
 mllib/pom.xml  | 2 +-
 pom.xml| 2 +-
 python/pyspark/version.py  | 2 +-
 repl/pom.xml   | 2 +-
 resource-managers/kubernetes/core/pom.xml  | 2 +-
 resource-managers/kubernetes/integration-tests/pom.xml | 2 +-
 resource-managers/mesos/pom.xml| 2 +-
 resource-managers/yarn/pom.xml | 2 +-
 sql/catalyst/pom.xml   | 2 +-
 sql/core/pom.xml   | 2 +-
 sql/hive-thriftserver/pom.xml  | 2 +-
 sql/hive/pom.xml   | 2 +-
 streaming/pom.xml  | 2 +-
 tools/pom.xml  | 2 +-
 39 files changed, 40 insertions(+), 40 deletions(-)

diff --git a/R/pkg/DESCRIPTION b/R/pkg/DESCRIPTION
index 4a310e6..2abad61 100644
--- a/R/pkg/DESCRIPTION
+++ b/R/pkg/DESCRIPTION
@@ -1,6 +1,6 @@
 Package: SparkR
 Type: Package
-Version: 3.2.0
+Version: 3.2.1
 Title: R Front End for 'Apache Spark'
 Description: Provides an R Front end for 'Apache Spark' 
.
 Authors@R: c(person("Shivaram", "Venkataraman", role = "aut",
diff --git a/assembly/pom.xml b/assembly/pom.xml
index 83a1f46..5b2f449 100644
--- a/assembly/pom.xml
+++ b/assembly/pom.xml
@@ -21,7 +21,7 @@
   
 org.apache.spark
 spark-parent_2.12
-3.2.0
+3.2.1-SNAPSHOT
 ../pom.xml
   
 
diff --git a/common/kvstore/pom.xml b/common/kvstore/pom.xml
index f371fe3..ff66ac6 100644
--- a/common/kvstore/pom.xml
+++ b/common/kvstore/pom.xml
@@ -22,7 +22,7 @@
   
 org.apache.spark
 spark-parent_2.12
-3.2.0
+3.2.1-SNAPSHOT
 ../../pom.xml
   
 
diff --git a/common/network-common/pom.xml b/common/network-common/pom.xml
index 0499fe1..9db4231 100644
--- a/common/network-common/pom.xml
+++ b/common/network-common/pom.xml
@@ -22,7 +22,7 @@
   
 org.apache.spark
 spark-parent_2.12
-3.2.0
+3.2.1-SNAPSHOT
 ../../pom.xml
   
 
diff --git a/common/network-shuffle/pom.xml b/common/network-shuffle/pom.xml
index 6c55007..1788b0f 100644
--- a/common/network-shuffle/pom.xml
+++ b/common/network-shuffle/pom.xml
@@ -22,7 +22,7 @@
   
 org.apache.spark
 spark-parent_2.12
-3.2.0
+3.2.1-SNAPSHOT
 ../../pom.xml
   
 
diff --git a/common/network-yarn/pom.xml b/common/network-yarn/pom.xml
index c837226..ffe0c1d 100644
--- a/common/network-yarn/pom.xml
+++ b/common/network-yarn/pom.xml
@@ -22,7 +22,7 @@
   
 org.apache.spark
 spark-parent_2.12
-3.2.0
+3.2.1-SNAPSHOT
 ../../pom.xml
   
 
diff --git a/common/sketch/pom.xml b/common/sketch/pom.xml
index ace7354..be18e9b 100644
--- a/common/sketch/pom.xml
+++ b/common/sketch/pom.xml
@@ -22,7 +22,7 @@
   
 org.apache.spark
 spark-parent_2.12
-3.2.0
+3.2.1-SNAPSHOT
 ../../pom.xml
   
 
diff --git a/common/tags/pom.xml b/common/tags/pom.xml
index 

[spark] branch branch-3.2 updated (0ad3827 -> 0fb7127)

2021-09-23 Thread gengliang
This is an automated email from the ASF dual-hosted git repository.

gengliang pushed a change to branch branch-3.2
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 0ad3827  [SPARK-36782][CORE][FOLLOW-UP] Only handle shuffle block in 
separate thread pool
 add b609f2f  Preparing Spark release v3.2.0-rc4
 new 0fb7127  Preparing development version 3.2.1-SNAPSHOT

The 1 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails.  The revisions
listed as "add" were already present in the repository and have only
been added to this reference.


Summary of changes:

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] 01/01: Preparing Spark release v3.2.0-rc4

2021-09-23 Thread gengliang
This is an automated email from the ASF dual-hosted git repository.

gengliang pushed a commit to tag v3.2.0-rc4
in repository https://gitbox.apache.org/repos/asf/spark.git

commit b609f2fe0c1dd9a7e7b3aedd31ab81e6311b9b3f
Author: Gengliang Wang 
AuthorDate: Thu Sep 23 08:46:22 2021 +

Preparing Spark release v3.2.0-rc4
---
 R/pkg/DESCRIPTION  | 2 +-
 assembly/pom.xml   | 2 +-
 common/kvstore/pom.xml | 2 +-
 common/network-common/pom.xml  | 2 +-
 common/network-shuffle/pom.xml | 2 +-
 common/network-yarn/pom.xml| 2 +-
 common/sketch/pom.xml  | 2 +-
 common/tags/pom.xml| 2 +-
 common/unsafe/pom.xml  | 2 +-
 core/pom.xml   | 2 +-
 docs/_config.yml   | 4 ++--
 examples/pom.xml   | 2 +-
 external/avro/pom.xml  | 2 +-
 external/docker-integration-tests/pom.xml  | 2 +-
 external/kafka-0-10-assembly/pom.xml   | 2 +-
 external/kafka-0-10-sql/pom.xml| 2 +-
 external/kafka-0-10-token-provider/pom.xml | 2 +-
 external/kafka-0-10/pom.xml| 2 +-
 external/kinesis-asl-assembly/pom.xml  | 2 +-
 external/kinesis-asl/pom.xml   | 2 +-
 external/spark-ganglia-lgpl/pom.xml| 2 +-
 graphx/pom.xml | 2 +-
 hadoop-cloud/pom.xml   | 2 +-
 launcher/pom.xml   | 2 +-
 mllib-local/pom.xml| 2 +-
 mllib/pom.xml  | 2 +-
 pom.xml| 2 +-
 python/pyspark/version.py  | 2 +-
 repl/pom.xml   | 2 +-
 resource-managers/kubernetes/core/pom.xml  | 2 +-
 resource-managers/kubernetes/integration-tests/pom.xml | 2 +-
 resource-managers/mesos/pom.xml| 2 +-
 resource-managers/yarn/pom.xml | 2 +-
 sql/catalyst/pom.xml   | 2 +-
 sql/core/pom.xml   | 2 +-
 sql/hive-thriftserver/pom.xml  | 2 +-
 sql/hive/pom.xml   | 2 +-
 streaming/pom.xml  | 2 +-
 tools/pom.xml  | 2 +-
 39 files changed, 40 insertions(+), 40 deletions(-)

diff --git a/R/pkg/DESCRIPTION b/R/pkg/DESCRIPTION
index 2abad61..4a310e6 100644
--- a/R/pkg/DESCRIPTION
+++ b/R/pkg/DESCRIPTION
@@ -1,6 +1,6 @@
 Package: SparkR
 Type: Package
-Version: 3.2.1
+Version: 3.2.0
 Title: R Front End for 'Apache Spark'
 Description: Provides an R Front end for 'Apache Spark' 
.
 Authors@R: c(person("Shivaram", "Venkataraman", role = "aut",
diff --git a/assembly/pom.xml b/assembly/pom.xml
index 5b2f449..83a1f46 100644
--- a/assembly/pom.xml
+++ b/assembly/pom.xml
@@ -21,7 +21,7 @@
   
 org.apache.spark
 spark-parent_2.12
-3.2.1-SNAPSHOT
+3.2.0
 ../pom.xml
   
 
diff --git a/common/kvstore/pom.xml b/common/kvstore/pom.xml
index ff66ac6..f371fe3 100644
--- a/common/kvstore/pom.xml
+++ b/common/kvstore/pom.xml
@@ -22,7 +22,7 @@
   
 org.apache.spark
 spark-parent_2.12
-3.2.1-SNAPSHOT
+3.2.0
 ../../pom.xml
   
 
diff --git a/common/network-common/pom.xml b/common/network-common/pom.xml
index 9db4231..0499fe1 100644
--- a/common/network-common/pom.xml
+++ b/common/network-common/pom.xml
@@ -22,7 +22,7 @@
   
 org.apache.spark
 spark-parent_2.12
-3.2.1-SNAPSHOT
+3.2.0
 ../../pom.xml
   
 
diff --git a/common/network-shuffle/pom.xml b/common/network-shuffle/pom.xml
index 1788b0f..6c55007 100644
--- a/common/network-shuffle/pom.xml
+++ b/common/network-shuffle/pom.xml
@@ -22,7 +22,7 @@
   
 org.apache.spark
 spark-parent_2.12
-3.2.1-SNAPSHOT
+3.2.0
 ../../pom.xml
   
 
diff --git a/common/network-yarn/pom.xml b/common/network-yarn/pom.xml
index ffe0c1d..c837226 100644
--- a/common/network-yarn/pom.xml
+++ b/common/network-yarn/pom.xml
@@ -22,7 +22,7 @@
   
 org.apache.spark
 spark-parent_2.12
-3.2.1-SNAPSHOT
+3.2.0
 ../../pom.xml
   
 
diff --git a/common/sketch/pom.xml b/common/sketch/pom.xml
index be18e9b..ace7354 100644
--- a/common/sketch/pom.xml
+++ b/common/sketch/pom.xml
@@ -22,7 +22,7 @@
   
 org.apache.spark
 spark-parent_2.12
-3.2.1-SNAPSHOT
+3.2.0
 ../../pom.xml
   
 
diff --git a/common/tags/pom.xml b/common/tags/pom.xml
index 

[spark] tag v3.2.0-rc4 created (now b609f2f)

2021-09-23 Thread gengliang
This is an automated email from the ASF dual-hosted git repository.

gengliang pushed a change to tag v3.2.0-rc4
in repository https://gitbox.apache.org/repos/asf/spark.git.


  at b609f2f  (commit)
This tag includes the following new commits:

 new b609f2f  Preparing Spark release v3.2.0-rc4

The 1 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails.  The revisions
listed as "add" were already present in the repository and have only
been added to this reference.


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch branch-3.2 updated: [SPARK-36782][CORE][FOLLOW-UP] Only handle shuffle block in separate thread pool

2021-09-23 Thread gengliang
This is an automated email from the ASF dual-hosted git repository.

gengliang pushed a commit to branch branch-3.2
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.2 by this push:
 new 0ad3827  [SPARK-36782][CORE][FOLLOW-UP] Only handle shuffle block in 
separate thread pool
0ad3827 is described below

commit 0ad382747d689ab445ce72cd12c940e727c0d05a
Author: yi.wu 
AuthorDate: Thu Sep 23 16:29:54 2021 +0800

[SPARK-36782][CORE][FOLLOW-UP] Only handle shuffle block in separate thread 
pool

### What changes were proposed in this pull request?

This is a follow-up of https://github.com/apache/spark/pull/34043. This PR 
proposes to only handle shuffle blocks in the separate thread pool and leave 
other blocks with the same behavior as before.

### Why are the changes needed?

To avoid any potential overhead.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Pass existing tests.

Closes #34076 from Ngone51/spark-36782-follow-up.

Authored-by: yi.wu 
Signed-off-by: Gengliang Wang 
(cherry picked from commit 9d8ac7c8e90c1c2a6060e8f1e8f2f21e19622567)
Signed-off-by: Gengliang Wang 
---
 .../spark/storage/BlockManagerMasterEndpoint.scala | 69 +-
 1 file changed, 41 insertions(+), 28 deletions(-)

diff --git 
a/core/src/main/scala/org/apache/spark/storage/BlockManagerMasterEndpoint.scala 
b/core/src/main/scala/org/apache/spark/storage/BlockManagerMasterEndpoint.scala
index 2225822..ca0e740 100644
--- 
a/core/src/main/scala/org/apache/spark/storage/BlockManagerMasterEndpoint.scala
+++ 
b/core/src/main/scala/org/apache/spark/storage/BlockManagerMasterEndpoint.scala
@@ -117,15 +117,20 @@ class BlockManagerMasterEndpoint(
 
 case _updateBlockInfo @
 UpdateBlockInfo(blockManagerId, blockId, storageLevel, 
deserializedSize, size) =>
-  val response = updateBlockInfo(blockManagerId, blockId, storageLevel, 
deserializedSize, size)
 
-  response.foreach { isSuccess =>
+  @inline def handleResult(success: Boolean): Unit = {
 // SPARK-30594: we should not post `SparkListenerBlockUpdated` when 
updateBlockInfo
 // returns false since the block info would be updated again later.
-if (isSuccess) {
+if (success) {
   
listenerBus.post(SparkListenerBlockUpdated(BlockUpdatedInfo(_updateBlockInfo)))
 }
-context.reply(isSuccess)
+context.reply(success)
+  }
+
+  if (blockId.isShuffle) {
+updateShuffleBlockInfo(blockId, blockManagerId).foreach(handleResult)
+  } else {
+handleResult(updateBlockInfo(blockManagerId, blockId, storageLevel, 
deserializedSize, size))
   }
 
 case GetLocations(blockId) =>
@@ -571,46 +576,54 @@ class BlockManagerMasterEndpoint(
 id
   }
 
+ private def updateShuffleBlockInfo(blockId: BlockId, blockManagerId: 
BlockManagerId)
+: Future[Boolean] = {
+   blockId match {
+ case ShuffleIndexBlockId(shuffleId, mapId, _) =>
+   // SPARK-36782: Invoke `MapOutputTracker.updateMapOutput` within the 
thread
+   // `dispatcher-BlockManagerMaster` could lead to the deadlock when
+   // `MapOutputTracker.serializeOutputStatuses` broadcasts the serialized 
mapstatues under
+   // the acquired write lock. The broadcast block would report its status 
to
+   // `BlockManagerMasterEndpoint`, while the `BlockManagerMasterEndpoint` 
is occupied by
+   // `updateMapOutput` since it's waiting for the write lock. Thus, we 
use `Future` to call
+   // `updateMapOutput` in a separate thread to avoid the deadlock.
+   Future {
+ // We need to update this at index file because there exists the 
index-only block
+ logDebug(s"Received shuffle index block update for ${shuffleId} 
${mapId}, updating.")
+ mapOutputTracker.updateMapOutput(shuffleId, mapId, blockManagerId)
+ true
+   }
+ case ShuffleDataBlockId(shuffleId: Int, mapId: Long, reduceId: Int) =>
+   logDebug(s"Received shuffle data block update for ${shuffleId} 
${mapId}, ignore.")
+   Future.successful(true)
+ case _ =>
+   logDebug(s"Unexpected shuffle block type ${blockId}" +
+ s"as ${blockId.getClass().getSimpleName()}")
+   Future.successful(false)
+   }
+ }
+
   private def updateBlockInfo(
   blockManagerId: BlockManagerId,
   blockId: BlockId,
   storageLevel: StorageLevel,
   memSize: Long,
-  diskSize: Long): Future[Boolean] = {
+  diskSize: Long): Boolean = {
 logDebug(s"Updating block info on master ${blockId} for ${blockManagerId}")
 
-if (blockId.isShuffle) {
-  blockId match {
-case ShuffleIndexBlockId(shuffleId, mapId, _) =>
-  // We need to update this at index file because there exists the 
index-only block
-  return Future {
-

[spark] branch master updated (6d7ab7b -> 9d8ac7c)

2021-09-23 Thread gengliang
This is an automated email from the ASF dual-hosted git repository.

gengliang pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 6d7ab7b  [SPARK-36795][SQL] Explain Formatted has Duplicate Node IDs
 add 9d8ac7c  [SPARK-36782][CORE][FOLLOW-UP] Only handle shuffle block in 
separate thread pool

No new revisions were added by this update.

Summary of changes:
 .../spark/storage/BlockManagerMasterEndpoint.scala | 69 +-
 1 file changed, 41 insertions(+), 28 deletions(-)

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch branch-3.2 updated: [SPARK-36795][SQL] Explain Formatted has Duplicate Node IDs

2021-09-23 Thread gurwls223
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch branch-3.2
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.2 by this push:
 new 89894a4  [SPARK-36795][SQL] Explain Formatted has Duplicate Node IDs
89894a4 is described below

commit 89894a4b1dcebf411bd5bd046165faa74fffdee5
Author: Michael Chen 
AuthorDate: Thu Sep 23 15:54:33 2021 +0900

[SPARK-36795][SQL] Explain Formatted has Duplicate Node IDs

Fixed explain formatted mode so it doesn't have duplicate node IDs when 
InMemoryRelation is present in the query plan.

Having duplicated node IDs in the plan makes it confusing.

Yes, the explain formatted string will change.
Notice how `ColumnarToRow` and `InMemoryRelation` both have a node ID of 2 before the change.
Before changes =>
```
== Physical Plan ==
AdaptiveSparkPlan (14)
+- == Final Plan ==
   * BroadcastHashJoin Inner BuildLeft (9)
   :- BroadcastQueryStage (5)
   :  +- BroadcastExchange (4)
   : +- * Filter (3)
   :+- * ColumnarToRow (2)
   :   +- InMemoryTableScan (1)
   : +- InMemoryRelation (2)
   :   +- * ColumnarToRow (4)
   :  +- Scan parquet default.t1 (3)
   +- * Filter (8)
  +- * ColumnarToRow (7)
 +- Scan parquet default.t2 (6)
+- == Initial Plan ==
   BroadcastHashJoin Inner BuildLeft (13)
   :- BroadcastExchange (11)
   :  +- Filter (10)
   : +- InMemoryTableScan (1)
   :   +- InMemoryRelation (2)
   : +- * ColumnarToRow (4)
   :+- Scan parquet default.t1 (3)
   +- Filter (12)
  +- Scan parquet default.t2 (6)

(1) InMemoryTableScan
Output [1]: [k#x]
Arguments: [k#x], [isnotnull(k#x)]

(2) InMemoryRelation
Arguments: [k#x], 
CachedRDDBuilder(org.apache.spark.sql.execution.columnar.DefaultCachedBatchSerializer401788d5,StorageLevel(disk,
 memory, deserialized, 1 replicas),*(1) ColumnarToRow
+- FileScan parquet default.t1[k#x] Batched: true, DataFilters: [], Format: 
Parquet, Location: InMemoryFileIndex(1 
paths)[file:/Users/mike.chen/code/apacheSpark/spark/spark-warehouse/org.apach...,
 PartitionFilters: [], PushedFilters: [], ReadSchema: struct
,None)

(3) Scan parquet default.t1
Output [1]: [k#x]
Batched: true
Location: InMemoryFileIndex 
[file:/Users/mike.chen/code/apacheSpark/spark/spark-warehouse/org.apache.spark.sql.ExplainSuiteAE/t1]
ReadSchema: struct

(4) ColumnarToRow [codegen id : 1]
Input [1]: [k#x]

(5) BroadcastQueryStage
Output [1]: [k#x]
Arguments: 0

(6) Scan parquet default.t2
Output [1]: [key#x]
Batched: true
Location: InMemoryFileIndex 
[file:/Users/mike.chen/code/apacheSpark/spark/spark-warehouse/org.apache.spark.sql.ExplainSuiteAE/t2]
PushedFilters: [IsNotNull(key)]
ReadSchema: struct

(7) ColumnarToRow
Input [1]: [key#x]

(8) Filter
Input [1]: [key#x]
Condition : isnotnull(key#x)

(9) BroadcastHashJoin [codegen id : 2]
Left keys [1]: [k#x]
Right keys [1]: [key#x]
Join condition: None

(10) Filter
Input [1]: [k#x]
Condition : isnotnull(k#x)

(11) BroadcastExchange
Input [1]: [k#x]
Arguments: HashedRelationBroadcastMode(List(cast(input[0, int, false] as 
bigint)),false), [id=#x]

(12) Filter
Input [1]: [key#x]
Condition : isnotnull(key#x)

(13) BroadcastHashJoin
Left keys [1]: [k#x]
Right keys [1]: [key#x]
Join condition: None

(14) AdaptiveSparkPlan
Output [2]: [k#x, key#x]
Arguments: isFinalPlan=true
```

After Changes =>
```
== Physical Plan ==
AdaptiveSparkPlan (17)
+- == Final Plan ==
   * BroadcastHashJoin Inner BuildLeft (12)
   :- BroadcastQueryStage (8)
   :  +- BroadcastExchange (7)
   : +- * Filter (6)
   :+- * ColumnarToRow (5)
   :   +- InMemoryTableScan (1)
   : +- InMemoryRelation (2)
   :   +- * ColumnarToRow (4)
   :  +- Scan parquet default.t1 (3)
   +- * Filter (11)
  +- * ColumnarToRow (10)
 +- Scan parquet default.t2 (9)
+- == Initial Plan ==
   BroadcastHashJoin Inner BuildLeft (16)
   :- BroadcastExchange (14)
   :  +- Filter (13)
   : +- InMemoryTableScan (1)
   :   +- InMemoryRelation (2)
   : +- * ColumnarToRow (4)
   :+- Scan parquet default.t1 (3)
   +- Filter (15)
  +- Scan parquet default.t2 (9)

(1) InMemoryTableScan
Output [1]: [k#x]
Arguments: [k#x], [isnotnull(k#x)]

(2) InMemoryRelation
Arguments: [k#x], 

[spark] branch master updated (0076eba -> 6d7ab7b)

2021-09-23 Thread gurwls223
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 0076eba  [MINOR][SQL][DOCS] Correct the 'options' description on UnresolvedRelation
 add 6d7ab7b  [SPARK-36795][SQL] Explain Formatted has Duplicate Node IDs

No new revisions were added by this update.

Summary of changes:
 .../org/apache/spark/sql/execution/ExplainUtils.scala |  2 +-
 .../scala/org/apache/spark/sql/ExplainSuite.scala | 19 +++
 2 files changed, 20 insertions(+), 1 deletion(-)
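
For context, a minimal reproduction sketch of the kind of query this fix concerns. It is not taken from the commit: the table names, the data, and the local session setup are assumptions chosen only to produce an EXPLAIN FORMATTED plan containing an InMemoryRelation, where duplicated node IDs could previously show up.

```scala
// Hedged reproduction sketch for SPARK-36795 (not from the commit): the
// tables, data, and session config below are illustrative assumptions.
import org.apache.spark.sql.SparkSession

object ExplainFormattedDuplicateIds {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[2]")
      .appName("explain-formatted-duplicate-node-ids")
      .getOrCreate()
    import spark.implicits._

    // Two small tables; t1 is cached so the join plan contains an
    // InMemoryRelation whose cached plan is printed with node IDs too.
    Seq(1, 2, 3).toDF("k").write.mode("overwrite").saveAsTable("t1")
    Seq(1, 2, 3).toDF("key").write.mode("overwrite").saveAsTable("t2")
    spark.table("t1").cache().count()

    // EXPLAIN FORMATTED numbers every node; before this fix the IDs assigned
    // inside the cached plan could duplicate IDs used elsewhere in the plan.
    spark.sql("SELECT * FROM t1 JOIN t2 ON t1.k = t2.key")
      .explain(mode = "formatted")

    spark.stop()
  }
}
```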




[spark] branch branch-3.2 updated: [MINOR][SQL][DOCS] Correct the 'options' description on UnresolvedRelation

2021-09-23 Thread huaxingao
This is an automated email from the ASF dual-hosted git repository.

huaxingao pushed a commit to branch branch-3.2
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.2 by this push:
 new af569d1  [MINOR][SQL][DOCS] Correct the 'options' description on UnresolvedRelation
af569d1 is described below

commit af569d1b0ac6b25dbd500804a395964ef7f9e60f
Author: Hyukjin Kwon 
AuthorDate: Wed Sep 22 23:00:15 2021 -0700

[MINOR][SQL][DOCS] Correct the 'options' description on UnresolvedRelation

### What changes were proposed in this pull request?

This PR fixes the 'options' description on `UnresolvedRelation`. The comment was added in https://github.com/apache/spark/pull/29535 but is no longer valid because V1 also uses this `options` (and merges the options with the table properties) per https://github.com/apache/spark/pull/29712.

This PR can go through from `master` to `branch-3.1`.
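
As a hedged illustration of why the old comment is too narrow (not part of this commit; the table name and option key below are assumptions), options set on `DataFrameReader` travel with the `UnresolvedRelation` produced by `table()` and are no longer relevant to v2 scans only:

```scala
// Hedged spark-shell sketch, not from this commit: "t1" and the option key
// are assumptions used only to show where UnresolvedRelation.options comes from.
import spark.implicits._

Seq(1, 2, 3).toDF("k").write.mode("overwrite").saveAsTable("t1")

// The read option is attached to the UnresolvedRelation created by table();
// per the PR referenced above it is also merged with a v1 table's properties,
// so the "v2 only" wording in the old comment no longer holds.
val df = spark.read.option("mergeSchema", "true").table("t1")
df.explain(extended = true)
```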

### Why are the changes needed?

To make `UnresolvedRelation.options`'s description clearer.

### Does this PR introduce _any_ user-facing change?

No, dev-only.

### How was this patch tested?

Scala linter by `dev/linter-scala`.

Closes #34075 from HyukjinKwon/minor-comment-unresolved-releation.

Authored-by: Hyukjin Kwon 
Signed-off-by: Huaxin Gao 
(cherry picked from commit 0076eba8d066936c32790ebc4058c52e2d21a207)
Signed-off-by: Huaxin Gao 
---
 .../main/scala/org/apache/spark/sql/catalyst/analysis/unresolved.scala  | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/unresolved.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/unresolved.scala
index 9f05367..9db038d 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/unresolved.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/unresolved.scala
@@ -41,7 +41,7 @@ class UnresolvedException(function: String)
  * Holds the name of a relation that has yet to be looked up in a catalog.
  *
  * @param multipartIdentifier table name
- * @param options options to scan this relation. Only applicable to v2 table scan.
+ * @param options options to scan this relation.
  */
 case class UnresolvedRelation(
 multipartIdentifier: Seq[String],




[spark] branch branch-3.1 updated: [MINOR][SQL][DOCS] Correct the 'options' description on UnresolvedRelation

2021-09-23 Thread huaxingao
This is an automated email from the ASF dual-hosted git repository.

huaxingao pushed a commit to branch branch-3.1
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.1 by this push:
 new b5cb3b6  [MINOR][SQL][DOCS] Correct the 'options' description on UnresolvedRelation
b5cb3b6 is described below

commit b5cb3b682a2cecae6d826f7610a2606c48fc9643
Author: Hyukjin Kwon 
AuthorDate: Wed Sep 22 23:00:15 2021 -0700

[MINOR][SQL][DOCS] Correct the 'options' description on UnresolvedRelation

### What changes were proposed in this pull request?

This PR fixes the 'options' description on `UnresolvedRelation`. The comment was added in https://github.com/apache/spark/pull/29535 but is no longer valid because V1 also uses this `options` (and merges the options with the table properties) per https://github.com/apache/spark/pull/29712.

This PR can go through from `master` to `branch-3.1`.

### Why are the changes needed?

To make `UnresolvedRelation.options`'s description clearer.

### Does this PR introduce _any_ user-facing change?

No, dev-only.

### How was this patch tested?

Scala linter by `dev/linter-scala`.

Closes #34075 from HyukjinKwon/minor-comment-unresolved-releation.

Authored-by: Hyukjin Kwon 
Signed-off-by: Huaxin Gao 
(cherry picked from commit 0076eba8d066936c32790ebc4058c52e2d21a207)
Signed-off-by: Huaxin Gao 
---
 .../main/scala/org/apache/spark/sql/catalyst/analysis/unresolved.scala  | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/unresolved.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/unresolved.scala
index 55eca63..ec420c4 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/unresolved.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/unresolved.scala
@@ -41,7 +41,7 @@ class UnresolvedException[TreeType <: TreeNode[_]](tree: TreeType, function: Str
  * Holds the name of a relation that has yet to be looked up in a catalog.
  *
  * @param multipartIdentifier table name
- * @param options options to scan this relation. Only applicable to v2 table scan.
+ * @param options options to scan this relation.
  */
 case class UnresolvedRelation(
 multipartIdentifier: Seq[String],




[spark] branch master updated: [MINOR][SQL][DOCS] Correct the 'options' description on UnresolvedRelation

2021-09-23 Thread huaxingao
This is an automated email from the ASF dual-hosted git repository.

huaxingao pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 0076eba  [MINOR][SQL][DOCS] Correct the 'options' description on UnresolvedRelation
0076eba is described below

commit 0076eba8d066936c32790ebc4058c52e2d21a207
Author: Hyukjin Kwon 
AuthorDate: Wed Sep 22 23:00:15 2021 -0700

[MINOR][SQL][DOCS] Correct the 'options' description on UnresolvedRelation

### What changes were proposed in this pull request?

This PR fixes the 'options' description on `UnresolvedRelation`. The comment was added in https://github.com/apache/spark/pull/29535 but is no longer valid because V1 also uses this `options` (and merges the options with the table properties) per https://github.com/apache/spark/pull/29712.

This PR can go through from `master` to `branch-3.1`.
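
For reference, a hedged REPL-style sketch of what the corrected wording describes when constructing the node directly (the identifier and option key are illustrative assumptions, not taken from this commit):

```scala
// Hedged sketch, not from this commit: builds an UnresolvedRelation with scan
// options directly; the identifier and option key are assumptions.
import org.apache.spark.sql.catalyst.analysis.UnresolvedRelation
import org.apache.spark.sql.util.CaseInsensitiveStringMap

val scanOptions = new java.util.HashMap[String, String]()
scanOptions.put("mergeSchema", "true")

// These options describe how to scan the relation once it is resolved,
// regardless of whether the name ends up as a v1 or a v2 table.
val relation = UnresolvedRelation(
  multipartIdentifier = Seq("default", "t1"),
  options = new CaseInsensitiveStringMap(scanOptions),
  isStreaming = false)
```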

### Why are the changes needed?

To make `UnresolvedRelation.options`'s description clearer.

### Does this PR introduce _any_ user-facing change?

No, dev-only.

### How was this patch tested?

Scala linter by `dev/linter-scala`.

Closes #34075 from HyukjinKwon/minor-comment-unresolved-releation.

Authored-by: Hyukjin Kwon 
Signed-off-by: Huaxin Gao 
---
 .../main/scala/org/apache/spark/sql/catalyst/analysis/unresolved.scala  | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/unresolved.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/unresolved.scala
index 8417203..0785336 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/unresolved.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/unresolved.scala
@@ -40,7 +40,7 @@ class UnresolvedException(function: String)
  * Holds the name of a relation that has yet to be looked up in a catalog.
  *
  * @param multipartIdentifier table name
- * @param options options to scan this relation. Only applicable to v2 table scan.
+ * @param options options to scan this relation.
  */
 case class UnresolvedRelation(
 multipartIdentifier: Seq[String],
