[spark] branch master updated (cc20154 -> b95a847)

2021-01-07 Thread wenchen
This is an automated email from the ASF dual-hosted git repository.

wenchen pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from cc20154  [SPARK-34005][CORE] Update peak memory metrics for each 
Executor on task end
 add b95a847  [SPARK-34046][SQL][TESTS] Use join hint for constructing 
joins in JoinSuite and WholeStageCodegenSuite

No new revisions were added by this update.

Summary of changes:
 .../scala/org/apache/spark/sql/JoinSuite.scala | 123 -
 .../sql/execution/WholeStageCodegenSuite.scala |  41 ---
 2 files changed, 66 insertions(+), 98 deletions(-)





[spark] branch master updated: [SPARK-34005][CORE] Update peak memory metrics for each Executor on task end

2021-01-07 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new cc20154  [SPARK-34005][CORE] Update peak memory metrics for each 
Executor on task end
cc20154 is described below

commit cc201545626ffe556682f45edc370ac6fe29e9df
Author: Kousuke Saruta 
AuthorDate: Thu Jan 7 21:24:15 2021 -0800

[SPARK-34005][CORE] Update peak memory metrics for each Executor on task end

### What changes were proposed in this pull request?

This PR makes `AppStatusListener` update the peak memory metrics for each Executor on task end, like the other peak memory metrics (e.g., stage, executors in a stage).

### Why are the changes needed?

When `AppStatusListener#onExecutorMetricsUpdate` is called, the peak memory metrics for Executors, stages, and executors in a stage are all updated, but on task end the Executor-level metrics are currently the only ones that are not updated.
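
To illustrate the idea outside of Spark, here is a minimal sketch (not the actual `AppStatusListener` code) of folding per-task metric snapshots into a running per-executor peak on task end; the `TaskEnd` case class and metric names are hypothetical stand-ins for `SparkListenerTaskEnd` and `ExecutorMetrics`:

```scala
// Minimal sketch (not Spark's implementation): keep a running element-wise maximum
// of metric values per executor, refreshed every time a task finishes.
case class TaskEnd(executorId: String, taskMetrics: Map[String, Long])

final class PeakMetricsTracker {
  private val peaks = scala.collection.mutable.Map.empty[String, Map[String, Long]]

  // Element-wise max, conceptually what compareAndUpdatePeakValues does in Spark.
  def onTaskEnd(event: TaskEnd): Unit = {
    val current = peaks.getOrElse(event.executorId, Map.empty[String, Long])
    val merged = (current.keySet ++ event.taskMetrics.keySet).map { key =>
      key -> math.max(current.getOrElse(key, 0L), event.taskMetrics.getOrElse(key, 0L))
    }.toMap
    peaks(event.executorId) = merged
  }

  def peakFor(executorId: String): Map[String, Long] =
    peaks.getOrElse(executorId, Map.empty[String, Long])
}

object PeakMetricsTrackerDemo extends App {
  val tracker = new PeakMetricsTracker
  tracker.onTaskEnd(TaskEnd("exec-1", Map("JVMHeapMemory" -> 100L)))
  tracker.onTaskEnd(TaskEnd("exec-1", Map("JVMHeapMemory" -> 80L, "DirectPoolMemory" -> 5L)))
  // The peak keeps the maximum seen so far: JVMHeapMemory -> 100, DirectPoolMemory -> 5
  println(tracker.peakFor("exec-1"))
}
```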

### Does this PR introduce _any_ user-facing change?

Yes. Executor peak memory metrics are updated more accurately.

### How was this patch tested?

After running a job with `local-cluster[1,1,1024]` and visiting `/api/v1//executors`, I confirmed the `peakExecutorMemory` metrics are shown for an Executor even though the lifetime of each job is very short.
I also modified the JSON files for `HistoryServerSuite`.

Closes #31029 from sarutak/update-executor-metrics-on-taskend.

Authored-by: Kousuke Saruta 
Signed-off-by: Dongjoon Hyun 
---
 .../apache/spark/status/AppStatusListener.scala|  1 +
 .../executor_list_json_expectation.json| 22 ++
 .../executor_memory_usage_expectation.json | 88 ++
 ...executor_node_excludeOnFailure_expectation.json | 88 ++
 ...e_excludeOnFailure_unexcluding_expectation.json | 88 ++
 5 files changed, 287 insertions(+)

diff --git 
a/core/src/main/scala/org/apache/spark/status/AppStatusListener.scala 
b/core/src/main/scala/org/apache/spark/status/AppStatusListener.scala
index 6cb013b..52d41cd 100644
--- a/core/src/main/scala/org/apache/spark/status/AppStatusListener.scala
+++ b/core/src/main/scala/org/apache/spark/status/AppStatusListener.scala
@@ -759,6 +759,7 @@ private[spark] class AppStatusListener(
   exec.completedTasks += completedDelta
   exec.failedTasks += failedDelta
   exec.totalDuration += event.taskInfo.duration
+  
exec.peakExecutorMetrics.compareAndUpdatePeakValues(event.taskExecutorMetrics)
 
   // Note: For resubmitted tasks, we continue to use the metrics that 
belong to the
   // first attempt of this task. This may not be 100% accurate because the 
first attempt
diff --git 
a/core/src/test/resources/HistoryServerExpectations/executor_list_json_expectation.json
 
b/core/src/test/resources/HistoryServerExpectations/executor_list_json_expectation.json
index c18a2e3..be12507 100644
--- 
a/core/src/test/resources/HistoryServerExpectations/executor_list_json_expectation.json
+++ 
b/core/src/test/resources/HistoryServerExpectations/executor_list_json_expectation.json
@@ -21,6 +21,28 @@
   "addTime" : "2015-02-03T16:43:00.906GMT",
   "executorLogs" : { },
   "blacklistedInStages" : [ ],
+  "peakMemoryMetrics" : {
+"JVMHeapMemory" : 0,
+"JVMOffHeapMemory" : 0,
+"OnHeapExecutionMemory" : 0,
+"OffHeapExecutionMemory" : 0,
+"OnHeapStorageMemory" : 0,
+"OffHeapStorageMemory" : 0,
+"OnHeapUnifiedMemory" : 0,
+"OffHeapUnifiedMemory" : 0,
+"DirectPoolMemory" : 0,
+"MappedPoolMemory" : 0,
+"ProcessTreeJVMVMemory" : 0,
+"ProcessTreeJVMRSSMemory" : 0,
+"ProcessTreePythonVMemory" : 0,
+"ProcessTreePythonRSSMemory" : 0,
+"ProcessTreeOtherVMemory" : 0,
+"ProcessTreeOtherRSSMemory" : 0,
+"MinorGCCount" : 0,
+"MinorGCTime" : 0,
+"MajorGCCount" : 0,
+"MajorGCTime" : 0
+  },
   "attributes" : { },
   "resources" : { },
   "resourceProfileId" : 0,
diff --git 
a/core/src/test/resources/HistoryServerExpectations/executor_memory_usage_expectation.json
 
b/core/src/test/resources/HistoryServerExpectations/executor_memory_usage_expectation.json
index 5144934..0a3eb81 100644
--- 
a/core/src/test/resources/HistoryServerExpectations/executor_memory_usage_expectation.json
+++ 
b/core/src/test/resources/HistoryServerExpectations/executor_memory_usage_expectation.json
@@ -64,6 +64,28 @@
 "totalOffHeapStorageMemory" : 524288000
   },
   "blacklistedInStages" : [ ],
+  "peakMemoryMetrics" : {
+"JVMHeapMemory" : 0,
+"JVMOffHeapMemory" : 0,
+"OnHeapExecutionMemory" : 0,
+"OffHeapExecutionMemory" : 0,
+"OnHeapStorageMemory" : 0,
+"OffHeapStorageMemory" : 0,
+"OnHeapUnifiedMemory" : 0,
+"OffHeapUnifiedMemory" : 0,
+"DirectPoolMemory" : 0,
+"MappedPoolMemory" : 0,
+"ProcessTreeJ

[spark] branch branch-3.1 updated (78d29fe -> 5685a05)

2021-01-07 Thread wenchen
This is an automated email from the ASF dual-hosted git repository.

wenchen pushed a change to branch branch-3.1
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 78d29fe  [SPARK-33818][SQL][DOC] Add descriptions about 
`spark.sql.parser.quotedRegexColumnNames` in the SQL documents
 add 5685a05  [SPARK-33938][SQL][3.1] Optimize Like Any/All by 
LikeSimplification

No new revisions were added by this update.

Summary of changes:
 .../catalyst/expressions/regexpExpressions.scala   |  6 +-
 .../spark/sql/catalyst/optimizer/expressions.scala | 79 +++---
 .../optimizer/LikeSimplificationSuite.scala| 68 +++
 3 files changed, 127 insertions(+), 26 deletions(-)





[spark] branch master updated (9b54da4 -> 0de7f2f)

2021-01-07 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 9b54da4  [SPARK-33818][SQL][DOC] Add descriptions about 
`spark.sql.parser.quotedRegexColumnNames` in the SQL documents
 add 0de7f2f  [SPARK-34039][SQL] ReplaceTable should invalidate cache

No new revisions were added by this update.

Summary of changes:
 .../datasources/v2/DataSourceV2Strategy.scala  | 21 ++---
 .../datasources/v2/ReplaceTableExec.scala  | 14 +++---
 .../datasources/v2/WriteToDataSourceV2Exec.scala   | 22 ++
 .../spark/sql/connector/DataSourceV2SQLSuite.scala | 17 +
 4 files changed, 48 insertions(+), 26 deletions(-)





[spark] branch branch-3.1 updated: [SPARK-33818][SQL][DOC] Add descriptions about `spark.sql.parser.quotedRegexColumnNames` in the SQL documents

2021-01-07 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch branch-3.1
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.1 by this push:
 new 78d29fe  [SPARK-33818][SQL][DOC] Add descriptions about 
`spark.sql.parser.quotedRegexColumnNames` in the SQL documents
78d29fe is described below

commit 78d29feca632d000783d7a1293bf164e19277fb9
Author: angerszhu 
AuthorDate: Thu Jan 7 18:55:27 2021 -0800

[SPARK-33818][SQL][DOC] Add descriptions about 
`spark.sql.parser.quotedRegexColumnNames` in the SQL documents

### What changes were proposed in this pull request?
According to https://github.com/apache/spark/pull/30805#issuecomment-747179899,
this documents `spark.sql.parser.quotedRegexColumnNames`, since users need to know about this option from the docs and it is useful.


![image](https://user-images.githubusercontent.com/46485123/103656543-afa4aa80-4fa3-11eb-8cd3-a9d1b87a3489.png)

![image](https://user-images.githubusercontent.com/46485123/103656551-b2070480-4fa3-11eb-9ce7-95cc424242a6.png)

### Why are the changes needed?
Completes the documentation.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Not needed.
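
For context (not part of this PR), a minimal sketch of how the documented option can be exercised from Scala, assuming an existing local `SparkSession`:

```scala
// Sketch: with spark.sql.parser.quotedRegexColumnNames enabled, a backquoted
// identifier in SELECT is interpreted as a regular expression over column names.
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local[*]").appName("regex-columns").getOrCreate()
spark.conf.set("spark.sql.parser.quotedRegexColumnNames", "true")

spark.range(1).selectExpr("1 AS a", "2 AS b", "3 AS c").createOrReplaceTempView("t")
// The pattern excludes columns a and b, so only column c is returned.
spark.sql("SELECT `(a|b)?+.+` FROM t").show()
```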

Closes #30816 from AngersZh/SPARK-33818.

Authored-by: angerszhu 
Signed-off-by: Dongjoon Hyun 
(cherry picked from commit 9b54da490d55d8c12e0a6b2b4b6e3a2d5b6bed86)
Signed-off-by: Dongjoon Hyun 
---
 docs/sql-ref-syntax-qry-select.md | 14 +-
 1 file changed, 13 insertions(+), 1 deletion(-)

diff --git a/docs/sql-ref-syntax-qry-select.md 
b/docs/sql-ref-syntax-qry-select.md
index bac7c2b..5820a5c 100644
--- a/docs/sql-ref-syntax-qry-select.md
+++ b/docs/sql-ref-syntax-qry-select.md
@@ -41,7 +41,7 @@ select_statement [ { UNION | INTERSECT | EXCEPT } [ ALL | 
DISTINCT ] select_stat
 
 While `select_statement` is defined as
 ```sql
-SELECT [ hints , ... ] [ ALL | DISTINCT ] { named_expression [ , ... ] }
+SELECT [ hints , ... ] [ ALL | DISTINCT ] { [ named_expression | 
regex_column_names ] [ , ... ] }
 FROM { from_item [ , ... ] }
 [ PIVOT clause ]
 [ LATERAL VIEW clause ] [ ... ] 
@@ -151,6 +151,18 @@ SELECT [ hints , ... ] [ ALL | DISTINCT ] { 
named_expression [ , ... ] }
 
  Specifies aliases for one or more source window specifications. The 
source window specifications can
  be referenced in the widow definitions in the query.
+ 
+* **regex_column_names**
+
+ When `spark.sql.parser.quotedRegexColumnNames` is true, quoted 
identifiers (using backticks) in `SELECT`
+ statement are interpreted as regular expressions and `SELECT` statement 
can take regex-based column specification.
+ For example, below SQL will only take column `c`:
+
+ ```sql
+ SELECT `(a|b)?+.+` FROM (
+   SELECT 1 as a, 2 as b, 3 as c
+ )
+ ```
 
 ### Related Statements
 





[spark] branch master updated (8e11ce5 -> 9b54da4)

2021-01-07 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 8e11ce5  [SPARK-34018][K8S] NPE in ExecutorPodsSnapshot
 add 9b54da4  [SPARK-33818][SQL][DOC] Add descriptions about 
`spark.sql.parser.quotedRegexColumnNames` in the SQL documents

No new revisions were added by this update.

Summary of changes:
 docs/sql-ref-syntax-qry-select.md | 14 +-
 1 file changed, 13 insertions(+), 1 deletion(-)





[spark] branch branch-3.0 updated: [SPARK-33100][SQL][3.0] Ignore a semicolon inside a bracketed comment in spark-sql

2021-01-07 Thread yamamuro
This is an automated email from the ASF dual-hosted git repository.

yamamuro pushed a commit to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.0 by this push:
 new e7d5344  [SPARK-33100][SQL][3.0] Ignore a semicolon inside a bracketed 
comment in spark-sql
e7d5344 is described below

commit e7d53449f198bd8c5ee97d58f285994e31ea2d1a
Author: fwang12 
AuthorDate: Fri Jan 8 10:44:12 2021 +0900

[SPARK-33100][SQL][3.0] Ignore a semicolon inside a bracketed comment in 
spark-sql

### What changes were proposed in this pull request?
Currently, spark-sql does not support parsing SQL statements that contain bracketed comments.
For the sql statements:
```
/* SELECT 'test'; */
SELECT 'test';
```
It would be split into two statements:
The first one: `/* SELECT 'test'`
The second one: `*/ SELECT 'test'`

Then it would throw an exception because the first one is illegal.
In this PR, we ignore the content of bracketed comments while splitting the SQL statements.
We also ignore comments that have no content.

NOTE: This backport comes from https://github.com/apache/spark/pull/29982

### Why are the changes needed?
Spark-sql might split statements inside bracketed comments, which is incorrect.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
Added UT.
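
A simplified, illustrative sketch of the splitting idea (not the `SparkSQLCLIDriver` code; it ignores quoting and simple `--` comments for brevity):

```scala
// Simplified sketch (not Spark's implementation): split a line on ';' while
// ignoring semicolons that appear inside bracketed /* ... */ comments.
def splitStatements(line: String): Seq[String] = {
  val out = scala.collection.mutable.ArrayBuffer.empty[String]
  var depth = 0          // nesting level of bracketed comments
  var begin = 0
  var i = 0
  while (i < line.length) {
    val c = line.charAt(i)
    if (c == '/' && i + 1 < line.length && line.charAt(i + 1) == '*') { depth += 1; i += 1 }
    else if (c == '*' && i + 1 < line.length && line.charAt(i + 1) == '/' && depth > 0) { depth -= 1; i += 1 }
    else if (c == ';' && depth == 0) {
      val stmt = line.substring(begin, i).trim
      if (stmt.nonEmpty) out += stmt   // skip empty statements such as ";;"
      begin = i + 1
    }
    i += 1
  }
  val tail = line.substring(begin).trim
  if (tail.nonEmpty) out += tail
  out.toSeq
}

// splitStatements("/* SELECT 'test'; */ SELECT 'test';")
//   => Seq("/* SELECT 'test'; */ SELECT 'test'")
```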

Closes #31033 from turboFei/SPARK-33100.

Authored-by: fwang12 
Signed-off-by: Takeshi Yamamuro 
---
 .../sql/hive/thriftserver/SparkSQLCLIDriver.scala  | 50 ++
 .../spark/sql/hive/thriftserver/CliSuite.scala | 23 ++
 2 files changed, 65 insertions(+), 8 deletions(-)

diff --git 
a/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkSQLCLIDriver.scala
 
b/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkSQLCLIDriver.scala
index 6abb905..581aa68 100644
--- 
a/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkSQLCLIDriver.scala
+++ 
b/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkSQLCLIDriver.scala
@@ -518,15 +518,32 @@ private[hive] class SparkSQLCLIDriver extends CliDriver 
with Logging {
   // Note: [SPARK-31595] if there is a `'` in a double quoted string, or a `"` 
in a single quoted
   // string, the origin implementation from Hive will not drop the trailing 
semicolon as expected,
   // hence we refined this function a little bit.
+  // Note: [SPARK-33100] Ignore a semicolon inside a bracketed comment in 
spark-sql.
   private def splitSemiColon(line: String): JList[String] = {
 var insideSingleQuote = false
 var insideDoubleQuote = false
-var insideComment = false
+var insideSimpleComment = false
+var bracketedCommentLevel = 0
 var escape = false
 var beginIndex = 0
+var leavingBracketedComment = false
+var isStatement = false
 val ret = new JArrayList[String]
 
+def insideBracketedComment: Boolean = bracketedCommentLevel > 0
+def insideComment: Boolean = insideSimpleComment || insideBracketedComment
+def statementInProgress(index: Int): Boolean = isStatement || 
(!insideComment &&
+  index > beginIndex && !s"${line.charAt(index)}".trim.isEmpty)
+
 for (index <- 0 until line.length) {
+  // Checks if we need to decrement a bracketed comment level; the last 
character '/' of
+  // bracketed comments is still inside the comment, so 
`insideBracketedComment` must keep true
+  // in the previous loop and we decrement the level here if needed.
+  if (leavingBracketedComment) {
+bracketedCommentLevel -= 1
+leavingBracketedComment = false
+  }
+
   if (line.charAt(index) == '\'' && !insideComment) {
 // take a look to see if it is escaped
 // See the comment above about SPARK-31595
@@ -549,21 +566,34 @@ private[hive] class SparkSQLCLIDriver extends CliDriver 
with Logging {
   // Sample query: select "quoted value --"
   //^^ avoids starting a comment 
if it's inside quotes.
 } else if (hasNext && line.charAt(index + 1) == '-') {
-  // ignore quotes and ;
-  insideComment = true
+  // ignore quotes and ; in simple comment
+  insideSimpleComment = true
 }
   } else if (line.charAt(index) == ';') {
 if (insideSingleQuote || insideDoubleQuote || insideComment) {
   // do not split
 } else {
-  // split, do not include ; itself
-  ret.add(line.substring(beginIndex, index))
+  if (isStatement) {
+// split, do not include ; itself
+ret.add(line.substring(beginIndex, index))
+  }
   beginIndex = index + 1
+  isStatement = false
 }
   } else if (

[spark] branch branch-3.1 updated: [SPARK-34018][K8S] NPE in ExecutorPodsSnapshot

2021-01-07 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch branch-3.1
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.1 by this push:
 new b98f170  [SPARK-34018][K8S] NPE in ExecutorPodsSnapshot
b98f170 is described below

commit b98f17019d3a1f9c4e71da507b872e7e97a80769
Author: Holden Karau 
AuthorDate: Thu Jan 7 16:47:37 2021 -0800

[SPARK-34018][K8S] NPE in ExecutorPodsSnapshot

### What changes were proposed in this pull request?

Label both container statuses and ensure the `ExecutorPodsSnapshot` starts with the default config so that they match.

### Why are the changes needed?

The current test depends on the order rather than testing the desired 
property.

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

Labeled the container statuses, observed failures, added the default label as the initialization point, and the tests passed again.

Built Spark, ran it on a K8s cluster, and verified there was no NPE in the driver log.
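
For background, the null-safe lookup pattern used by the fix can be sketched with hypothetical bean-style types (not the actual Kubernetes client model):

```scala
// Sketch: wrap possibly-null Java getters in Option and flatMap through the chain,
// so a missing state/terminated/exitCode yields None instead of an NPE.
class Terminated(val getExitCode: Integer)          // getExitCode may be null
class State(val getTerminated: Terminated)          // getTerminated may be null
class ContainerStatus(val getName: String, val getState: State)

def exitCodeOf(statuses: Seq[ContainerStatus], containerName: String): Option[Int] =
  statuses.find(_.getName == containerName)
    .flatMap(s => Option(s.getState))
    .flatMap(s => Option(s.getTerminated))
    .flatMap(t => Option(t.getExitCode))
    .map(_.toInt)

// None        -> no exit code yet, treat the container as still running
// Some(0)     -> succeeded; Some(n != 0) -> failed
```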

Closes #31071 from 
holdenk/SPARK-34018-finishedExecutorWithRunningSidecar-doesnt-correctly-constructt-the-test-case.

Authored-by: Holden Karau 
Signed-off-by: Dongjoon Hyun 
(cherry picked from commit 8e11ce5378a2cf69ec87501e86f7ed5963649cbf)
Signed-off-by: Dongjoon Hyun 
---
 .../cluster/k8s/ExecutorPodsSnapshot.scala | 27 ++
 .../cluster/k8s/ExecutorLifecycleTestUtils.scala   |  3 +++
 2 files changed, 15 insertions(+), 15 deletions(-)

diff --git 
a/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/scheduler/cluster/k8s/ExecutorPodsSnapshot.scala
 
b/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/scheduler/cluster/k8s/ExecutorPodsSnapshot.scala
index 37aaca7..cb4d881 100644
--- 
a/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/scheduler/cluster/k8s/ExecutorPodsSnapshot.scala
+++ 
b/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/scheduler/cluster/k8s/ExecutorPodsSnapshot.scala
@@ -41,7 +41,7 @@ private[spark] case class ExecutorPodsSnapshot(executorPods: 
Map[Long, ExecutorP
 
 object ExecutorPodsSnapshot extends Logging {
   private var shouldCheckAllContainers: Boolean = _
-  private var sparkContainerName: String = _
+  private var sparkContainerName: String = DEFAULT_EXECUTOR_CONTAINER_NAME
 
   def apply(executorPods: Seq[Pod]): ExecutorPodsSnapshot = {
 ExecutorPodsSnapshot(toStatesByExecutorId(executorPods))
@@ -80,24 +80,21 @@ object ExecutorPodsSnapshot extends Logging {
   .anyMatch(t => t != null && t.getExitCode != 0)) {
 PodFailed(pod)
   } else {
-// Otherwise look for the Spark container
-val sparkContainerStatusOpt = 
pod.getStatus.getContainerStatuses.asScala
-  .find(_.getName() == sparkContainerName)
-sparkContainerStatusOpt match {
-  case Some(sparkContainerStatus) =>
-sparkContainerStatus.getState.getTerminated match {
-  case t if t.getExitCode != 0 =>
-PodFailed(pod)
-  case t if t.getExitCode == 0 =>
+// Otherwise look for the Spark container and get the exit code if 
present.
+val sparkContainerExitCode = 
pod.getStatus.getContainerStatuses.asScala
+  .find(_.getName() == sparkContainerName).flatMap(x => 
Option(x.getState))
+  .flatMap(x => Option(x.getTerminated)).flatMap(x => 
Option(x.getExitCode))
+  .map(_.toInt)
+sparkContainerExitCode match {
+  case Some(t) =>
+t match {
+  case 0 =>
 PodSucceeded(pod)
   case _ =>
-PodRunning(pod)
+PodFailed(pod)
 }
-  // If we can't find the Spark container status, fall back to the 
pod status. This is
-  // expected to occur during pod startup and other situations.
+  // No exit code means we are running.
   case _ =>
-logDebug(s"Unable to find container ${sparkContainerName} in 
pod ${pod} " +
-  "defaulting to entire pod status (running).")
 PodRunning(pod)
 }
   }
diff --git 
a/resource-managers/kubernetes/core/src/test/scala/org/apache/spark/scheduler/cluster/k8s/ExecutorLifecycleTestUtils.scala
 
b/resource-managers/kubernetes/core/src/test/scala/org/apache/spark/scheduler/cluster/k8s/ExecutorLifecycleTestUtils.scala
index 225278c..41cba57 100644
--- 
a/resource-managers/kubernetes/core/src/test/scala/org/apache/spark/scheduler/cluster/k8s/ExecutorLifecycleTestUtils.scala
+++ 
b/resource-managers/kubernetes/core/src/test/scala/org/apache/spark/scheduler/cluster/k8s/ExecutorLifecycleTestUtils.scala
@@ -115,13 +1

[spark] branch master updated: [SPARK-34018][K8S] NPE in ExecutorPodsSnapshot

2021-01-07 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 8e11ce5  [SPARK-34018][K8S] NPE in ExecutorPodsSnapshot
8e11ce5 is described below

commit 8e11ce5378a2cf69ec87501e86f7ed5963649cbf
Author: Holden Karau 
AuthorDate: Thu Jan 7 16:47:37 2021 -0800

[SPARK-34018][K8S] NPE in ExecutorPodsSnapshot

### What changes were proposed in this pull request?

Label both container statuses and ensure the `ExecutorPodsSnapshot` starts with the default config so that they match.

### Why are the changes needed?

The current test depends on the order rather than testing the desired 
property.

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

Labeled the container statuses, observed failures, added the default label as the initialization point, and the tests passed again.

Built Spark, ran it on a K8s cluster, and verified there was no NPE in the driver log.

Closes #31071 from 
holdenk/SPARK-34018-finishedExecutorWithRunningSidecar-doesnt-correctly-constructt-the-test-case.

Authored-by: Holden Karau 
Signed-off-by: Dongjoon Hyun 
---
 .../cluster/k8s/ExecutorPodsSnapshot.scala | 27 ++
 .../cluster/k8s/ExecutorLifecycleTestUtils.scala   |  3 +++
 2 files changed, 15 insertions(+), 15 deletions(-)

diff --git 
a/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/scheduler/cluster/k8s/ExecutorPodsSnapshot.scala
 
b/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/scheduler/cluster/k8s/ExecutorPodsSnapshot.scala
index 37aaca7..cb4d881 100644
--- 
a/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/scheduler/cluster/k8s/ExecutorPodsSnapshot.scala
+++ 
b/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/scheduler/cluster/k8s/ExecutorPodsSnapshot.scala
@@ -41,7 +41,7 @@ private[spark] case class ExecutorPodsSnapshot(executorPods: 
Map[Long, ExecutorP
 
 object ExecutorPodsSnapshot extends Logging {
   private var shouldCheckAllContainers: Boolean = _
-  private var sparkContainerName: String = _
+  private var sparkContainerName: String = DEFAULT_EXECUTOR_CONTAINER_NAME
 
   def apply(executorPods: Seq[Pod]): ExecutorPodsSnapshot = {
 ExecutorPodsSnapshot(toStatesByExecutorId(executorPods))
@@ -80,24 +80,21 @@ object ExecutorPodsSnapshot extends Logging {
   .anyMatch(t => t != null && t.getExitCode != 0)) {
 PodFailed(pod)
   } else {
-// Otherwise look for the Spark container
-val sparkContainerStatusOpt = 
pod.getStatus.getContainerStatuses.asScala
-  .find(_.getName() == sparkContainerName)
-sparkContainerStatusOpt match {
-  case Some(sparkContainerStatus) =>
-sparkContainerStatus.getState.getTerminated match {
-  case t if t.getExitCode != 0 =>
-PodFailed(pod)
-  case t if t.getExitCode == 0 =>
+// Otherwise look for the Spark container and get the exit code if 
present.
+val sparkContainerExitCode = 
pod.getStatus.getContainerStatuses.asScala
+  .find(_.getName() == sparkContainerName).flatMap(x => 
Option(x.getState))
+  .flatMap(x => Option(x.getTerminated)).flatMap(x => 
Option(x.getExitCode))
+  .map(_.toInt)
+sparkContainerExitCode match {
+  case Some(t) =>
+t match {
+  case 0 =>
 PodSucceeded(pod)
   case _ =>
-PodRunning(pod)
+PodFailed(pod)
 }
-  // If we can't find the Spark container status, fall back to the 
pod status. This is
-  // expected to occur during pod startup and other situations.
+  // No exit code means we are running.
   case _ =>
-logDebug(s"Unable to find container ${sparkContainerName} in 
pod ${pod} " +
-  "defaulting to entire pod status (running).")
 PodRunning(pod)
 }
   }
diff --git 
a/resource-managers/kubernetes/core/src/test/scala/org/apache/spark/scheduler/cluster/k8s/ExecutorLifecycleTestUtils.scala
 
b/resource-managers/kubernetes/core/src/test/scala/org/apache/spark/scheduler/cluster/k8s/ExecutorLifecycleTestUtils.scala
index 225278c..41cba57 100644
--- 
a/resource-managers/kubernetes/core/src/test/scala/org/apache/spark/scheduler/cluster/k8s/ExecutorLifecycleTestUtils.scala
+++ 
b/resource-managers/kubernetes/core/src/test/scala/org/apache/spark/scheduler/cluster/k8s/ExecutorLifecycleTestUtils.scala
@@ -115,13 +115,16 @@ object ExecutorLifecycleTestUtils {
   .editOrNewStatus()
 .withPhase("running")
 .add

[spark] branch branch-3.1 updated: [SPARK-34044][DOCS] Add spark.sql.hive.metastore.jars.path to sql-data-sources-hive-tables.md

2021-01-07 Thread gurwls223
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch branch-3.1
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.1 by this push:
 new 980774a  [SPARK-34044][DOCS] Add spark.sql.hive.metastore.jars.path to 
sql-data-sources-hive-tables.md
980774a is described below

commit 980774a8f0484873d5187798d8247aa081cb2ff4
Author: Dongjoon Hyun 
AuthorDate: Fri Jan 8 09:34:40 2021 +0900

[SPARK-34044][DOCS] Add spark.sql.hive.metastore.jars.path to 
sql-data-sources-hive-tables.md

### What changes were proposed in this pull request?

This PR adds the new configuration to `sql-data-sources-hive-tables.md`.

### Why are the changes needed?

SPARK-32852 added a new configuration, `spark.sql.hive.metastore.jars.path`.
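
A hedged usage sketch of the option being documented (the jar locations below are hypothetical placeholders):

```scala
// Sketch: point Spark at pre-staged Hive client jars instead of the built-in
// ones or a Maven download, using the "path" mode and a comma-separated list.
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("hive-metastore-jars-path-example")
  .config("spark.sql.hive.metastore.version", "2.3.7")
  .config("spark.sql.hive.metastore.jars", "path")
  .config("spark.sql.hive.metastore.jars.path",
    "hdfs://namenode/libs/hive/*,file:///opt/hive-jars/*.jar") // wildcards allowed
  .enableHiveSupport()
  .getOrCreate()
```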

### Does this PR introduce _any_ user-facing change?

Yes, but a document only.

### How was this patch tested?

**BEFORE**
![Screen Shot 2021-01-07 at 2 57 57 
PM](https://user-images.githubusercontent.com/9700541/103954318-cc9ec200-50f8-11eb-86d3-cd89b07fcd21.png)

**AFTER**
![Screen Shot 2021-01-07 at 2 56 34 
PM](https://user-images.githubusercontent.com/9700541/103954221-9d885080-50f8-11eb-8938-fb91394a33cb.png)

Closes #31085 from dongjoon-hyun/SPARK-34044.

Authored-by: Dongjoon Hyun 
Signed-off-by: HyukjinKwon 
(cherry picked from commit 5b16d70d6a51720660e7607c859fae4f28691952)
Signed-off-by: HyukjinKwon 
---
 docs/sql-data-sources-hive-tables.md | 27 ++-
 1 file changed, 26 insertions(+), 1 deletion(-)

diff --git a/docs/sql-data-sources-hive-tables.md 
b/docs/sql-data-sources-hive-tables.md
index ae3572c..376c204 100644
--- a/docs/sql-data-sources-hive-tables.md
+++ b/docs/sql-data-sources-hive-tables.md
@@ -139,7 +139,7 @@ The following options can be used to configure the version 
of Hive that is used
 builtin
 
   Location of the jars that should be used to instantiate the 
HiveMetastoreClient. This
-  property can be one of three options:
+  property can be one of four options:
   
 builtin
 Use Hive 2.3.7, which is bundled with the Spark assembly when 
-Phive is
@@ -148,6 +148,9 @@ The following options can be used to configure the version 
of Hive that is used
 maven
 Use Hive jars of specified version downloaded from Maven repositories. 
This configuration
 is not generally recommended for production deployments.
+path
+Use Hive jars configured by 
spark.sql.hive.metastore.jars.path
+in comma separated format. Support both local or remote paths.
 A classpath in the standard format for the JVM. This classpath 
must include all of Hive
 and its dependencies, including the correct version of Hadoop. These 
jars only need to be
 present on the driver, but if you are running in yarn cluster mode 
then you must ensure
@@ -157,6 +160,28 @@ The following options can be used to configure the version 
of Hive that is used
 1.4.0
   
   
+spark.sql.hive.metastore.jars.path
+(empty)
+
+  Comma-separated paths of the jars that used to instantiate the 
HiveMetastoreClient.
+  This configuration is useful only when 
spark.sql.hive.metastore.jars is set as path. 
+  
+  The paths can be any of the following format:
+  
+file://path/to/jar/foo.jar
+hdfs://nameservice/path/to/jar/foo.jar
+/path/to/jar/(path without URI scheme follow conf 
fs.defaultFS's URI schema)
+[http/https/ftp]://path/to/jar/foo.jar
+  
+  Note that 1, 2, and 3 support wildcard. For example:
+  
+file://path/to/jar/*,file://path2/to/jar/*/*.jar
+
hdfs://nameservice/path/to/jar/*,hdfs://nameservice2/path/to/jar/*/*.jar
+  
+
+3.1.0
+  
+  
 spark.sql.hive.metastore.sharedPrefixes
 
com.mysql.jdbc,org.postgresql,com.microsoft.sqlserver,oracle.jdbc
 





[spark] branch master updated: [SPARK-34044][DOCS] Add spark.sql.hive.metastore.jars.path to sql-data-sources-hive-tables.md

2021-01-07 Thread gurwls223
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 5b16d70  [SPARK-34044][DOCS] Add spark.sql.hive.metastore.jars.path to 
sql-data-sources-hive-tables.md
5b16d70 is described below

commit 5b16d70d6a51720660e7607c859fae4f28691952
Author: Dongjoon Hyun 
AuthorDate: Fri Jan 8 09:34:40 2021 +0900

[SPARK-34044][DOCS] Add spark.sql.hive.metastore.jars.path to 
sql-data-sources-hive-tables.md

### What changes were proposed in this pull request?

This PR adds the new configuration to `sql-data-sources-hive-tables.md`.

### Why are the changes needed?

SPARK-32852 added a new configuration, `spark.sql.hive.metastore.jars.path`.

### Does this PR introduce _any_ user-facing change?

Yes, but a document only.

### How was this patch tested?

**BEFORE**
![Screen Shot 2021-01-07 at 2 57 57 
PM](https://user-images.githubusercontent.com/9700541/103954318-cc9ec200-50f8-11eb-86d3-cd89b07fcd21.png)

**AFTER**
![Screen Shot 2021-01-07 at 2 56 34 
PM](https://user-images.githubusercontent.com/9700541/103954221-9d885080-50f8-11eb-8938-fb91394a33cb.png)

Closes #31085 from dongjoon-hyun/SPARK-34044.

Authored-by: Dongjoon Hyun 
Signed-off-by: HyukjinKwon 
---
 docs/sql-data-sources-hive-tables.md | 27 ++-
 1 file changed, 26 insertions(+), 1 deletion(-)

diff --git a/docs/sql-data-sources-hive-tables.md 
b/docs/sql-data-sources-hive-tables.md
index ae3572c..376c204 100644
--- a/docs/sql-data-sources-hive-tables.md
+++ b/docs/sql-data-sources-hive-tables.md
@@ -139,7 +139,7 @@ The following options can be used to configure the version 
of Hive that is used
 builtin
 
   Location of the jars that should be used to instantiate the 
HiveMetastoreClient. This
-  property can be one of three options:
+  property can be one of four options:
   
 builtin
 Use Hive 2.3.7, which is bundled with the Spark assembly when 
-Phive is
@@ -148,6 +148,9 @@ The following options can be used to configure the version 
of Hive that is used
 maven
 Use Hive jars of specified version downloaded from Maven repositories. 
This configuration
 is not generally recommended for production deployments.
+path
+Use Hive jars configured by 
spark.sql.hive.metastore.jars.path
+in comma separated format. Support both local or remote paths.
 A classpath in the standard format for the JVM. This classpath 
must include all of Hive
 and its dependencies, including the correct version of Hadoop. These 
jars only need to be
 present on the driver, but if you are running in yarn cluster mode 
then you must ensure
@@ -157,6 +160,28 @@ The following options can be used to configure the version 
of Hive that is used
 1.4.0
   
   
+spark.sql.hive.metastore.jars.path
+(empty)
+
+  Comma-separated paths of the jars that used to instantiate the 
HiveMetastoreClient.
+  This configuration is useful only when 
spark.sql.hive.metastore.jars is set as path. 
+  
+  The paths can be any of the following format:
+  
+file://path/to/jar/foo.jar
+hdfs://nameservice/path/to/jar/foo.jar
+/path/to/jar/(path without URI scheme follow conf 
fs.defaultFS's URI schema)
+[http/https/ftp]://path/to/jar/foo.jar
+  
+  Note that 1, 2, and 3 support wildcard. For example:
+  
+file://path/to/jar/*,file://path2/to/jar/*/*.jar
+
hdfs://nameservice/path/to/jar/*,hdfs://nameservice2/path/to/jar/*/*.jar
+  
+
+3.1.0
+  
+  
 spark.sql.hive.metastore.sharedPrefixes
 
com.mysql.jdbc,org.postgresql,com.microsoft.sqlserver,oracle.jdbc
 





[spark] branch branch-3.1 updated: [SPARK-34041][PYTHON][DOCS] Miscellaneous cleanup for new PySpark documentation

2021-01-07 Thread gurwls223
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch branch-3.1
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.1 by this push:
 new bfb42d4  [SPARK-34041][PYTHON][DOCS] Miscellaneous cleanup for new 
PySpark documentation
bfb42d4 is described below

commit bfb42d4f14db66b13d6a3791bec09f0bd8b397bc
Author: HyukjinKwon 
AuthorDate: Fri Jan 8 09:28:31 2021 +0900

[SPARK-34041][PYTHON][DOCS] Miscellaneous cleanup for new PySpark 
documentation

### What changes were proposed in this pull request?

This PR proposes to:
- Add a link of quick start in PySpark docs into "Programming Guides" in 
Spark main docs
- `ML` / `MLlib` -> `MLlib (DataFrame-based)` / `MLlib (RDD-based)` in API 
reference page
- Mention other user guides as well, such as the [ML](http://spark.apache.org/docs/latest/ml-guide.html) and [SQL](http://spark.apache.org/docs/latest/sql-programming-guide.html) guides.
- Mention other migration guides as well because PySpark can be affected by them.

### Why are the changes needed?

For better documentation.

### Does this PR introduce _any_ user-facing change?

It fixes user-facing docs. However, they have not been released yet.

### How was this patch tested?

Manually tested by running:

```bash
cd docs
SKIP_SCALADOC=1 SKIP_RDOC=1 SKIP_SQLDOC=1 jekyll serve --watch
```

Closes #31082 from HyukjinKwon/SPARK-34041.

Authored-by: HyukjinKwon 
Signed-off-by: HyukjinKwon 
(cherry picked from commit aa388cf3d0ff230eb0397876fe2db03bbe51658e)
Signed-off-by: HyukjinKwon 
---
 docs/_layouts/global.html  |  1 +
 docs/index.md  |  2 ++
 python/docs/source/getting_started/index.rst   |  3 +++
 python/docs/source/migration_guide/index.rst   | 12 ++--
 python/docs/source/reference/pyspark.ml.rst| 12 ++--
 python/docs/source/reference/pyspark.mllib.rst |  4 ++--
 python/docs/source/user_guide/index.rst| 12 
 7 files changed, 36 insertions(+), 10 deletions(-)

diff --git a/docs/_layouts/global.html b/docs/_layouts/global.html
index de98f29..f10d467 100755
--- a/docs/_layouts/global.html
+++ b/docs/_layouts/global.html
@@ -84,6 +84,7 @@
 MLlib (Machine Learning)
 GraphX (Graph Processing)
 SparkR (R on Spark)
+PySpark (Python on Spark)
 
 
 
diff --git a/docs/index.md b/docs/index.md
index 8fd169e..c4c2d72 100644
--- a/docs/index.md
+++ b/docs/index.md
@@ -113,6 +113,8 @@ options for deployment:
 * [Spark Streaming](streaming-programming-guide.html): processing data streams 
using DStreams (old API)
 * [MLlib](ml-guide.html): applying machine learning algorithms
 * [GraphX](graphx-programming-guide.html): processing graphs 
+* [SparkR](sparkr.html): processing data with Spark in R
+* [PySpark](api/python/getting_started/index.html): processing data with Spark 
in Python
 
 **API Docs:**
 
diff --git a/python/docs/source/getting_started/index.rst 
b/python/docs/source/getting_started/index.rst
index 9fa3352..38b9c93 100644
--- a/python/docs/source/getting_started/index.rst
+++ b/python/docs/source/getting_started/index.rst
@@ -21,6 +21,9 @@ Getting Started
 ===
 
 This page summarizes the basic steps required to setup and get started with 
PySpark.
+There are more guides shared with other languages such as
+`Quick Start `_ in 
Programming Guides
+at `the Spark documentation 
`_.
 
 .. toctree::
 :maxdepth: 2
diff --git a/python/docs/source/migration_guide/index.rst 
b/python/docs/source/migration_guide/index.rst
index 41e36b1..88e768d 100644
--- a/python/docs/source/migration_guide/index.rst
+++ b/python/docs/source/migration_guide/index.rst
@@ -21,8 +21,6 @@ Migration Guide
 ===
 
 This page describes the migration guide specific to PySpark.
-Many items of other migration guides can also be applied when migrating 
PySpark to higher versions because PySpark internally shares other components.
-Please also refer other migration guides such as `Migration Guide: SQL, 
Datasets and DataFrame 
`_.
 
 .. toctree::
:maxdepth: 2
@@ -33,3 +31,13 @@ Please also refer other migration guides such as `Migration 
Guide: SQL, Datasets
pyspark_2.2_to_2.3
pyspark_1.4_to_1.5
pyspark_1.0_1.2_to_1.3
+
+
+Many items of other migration guides can also be applied when migrating 
PySpark to higher versions because PySpark internally shares other components.
+Please also refer other migratio

[spark] branch master updated (7b06acc -> aa388cf)

2021-01-07 Thread gurwls223
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 7b06acc  [SPARK-33100][SQL][FOLLOWUP] Find correct bound of bracketed 
comment in spark-sql
 add aa388cf  [SPARK-34041][PYTHON][DOCS] Miscellaneous cleanup for new 
PySpark documentation

No new revisions were added by this update.

Summary of changes:
 docs/_layouts/global.html  |  1 +
 docs/index.md  |  2 ++
 python/docs/source/getting_started/index.rst   |  3 +++
 python/docs/source/migration_guide/index.rst   | 12 ++--
 python/docs/source/reference/pyspark.ml.rst| 12 ++--
 python/docs/source/reference/pyspark.mllib.rst |  4 ++--
 python/docs/source/user_guide/index.rst| 12 
 7 files changed, 36 insertions(+), 10 deletions(-)





[GitHub] [spark-website] dongjoon-hyun commented on a change in pull request #301: Add Spark 3.1.0 accident as news

2021-01-07 Thread GitBox


dongjoon-hyun commented on a change in pull request #301:
URL: https://github.com/apache/spark-website/pull/301#discussion_r553522692



##
File path: news/_posts/2021-01-07-next-official-release-spark-3.1.1.md
##
@@ -0,0 +1,26 @@
+---
+layout: post
+title: "Next official release: Spark 3.1.1" 
+categories:
+- News
+tags: []
+status: publish
+type: post
+published: true
+meta:
+  _edit_last: '4'
+  _wpas_done_all: '1'
+---
+The next official Spark release is Spark 3.1.1 instead of Spark 3.1.0.
+There was an accident during Spark 3.1.0 RC1 preparation,
+see [[VOTE] Release Spark 3.1.0 
(RC1)](http://apache-spark-developers-list.1001551.n3.nabble.com/VOTE-Release-Spark-3-1-0-RC1-td30524.html)
 in the Spark dev mailing list.

Review comment:
   Since this is a reference to an Apache project decision, shall we use the official Apache mailing list link instead of a third-party URL (`nabble.com`)?








[GitHub] [spark-website] HyukjinKwon commented on a change in pull request #301: Add Spark 3.1.0 accident as news

2021-01-07 Thread GitBox


HyukjinKwon commented on a change in pull request #301:
URL: https://github.com/apache/spark-website/pull/301#discussion_r553325368



##
File path: news/_posts/2021-01-07-next-official-release-spark-3.1.1.md
##
@@ -0,0 +1,26 @@
+---
+layout: post
+title: "Next official release: Spark 3.1.1" 
+categories:
+- News
+tags: []
+status: publish
+type: post
+published: true
+meta:
+  _edit_last: '4'
+  _wpas_done_all: '1'
+---
+The next official Spark release is Spark 3.1.1 instead of Spark 3.1.0.
+There was an accident during Spark 3.1.0 RC1 preparation,
+see [[VOTE] Release Spark 3.1.0 
(RC1)](http://apache-spark-developers-list.1001551.n3.nabble.com/VOTE-Release-Spark-3-1-0-RC1-td30524.html)
 in the Spark dev mailing list.
+
+In short, Spark 3.1.0 RC1 was [unexpectedly published into Maven as Spark 
3.1.0](https://repo1.maven.org/maven2/org/apache/spark/spark-core_2.12/3.1.0/)
+while it is not officially released to Apache mirrors. We plan to skip this 
release
+and choose 3.1.1 as the next release to prevent the potential problems to the 
end
+users.
+
+Therefore, Spark 3.1.1 will supersede the unofficial Spark 3.1.0 unexpectedly
+published to Maven. We discourage to use this Spark 3.1.0 for any purpose, and 
there
+are no guarantees on using it such as binary compatibility.

Review comment:
   There are two things: one is that the next release is 3.1.1, and the other is that 3.1.0 should not be used. I think this post is short enough that we don't need to make both sentences bold.








[GitHub] [spark-website] HyukjinKwon commented on a change in pull request #301: Add Spark 3.1.0 accident as news

2021-01-07 Thread GitBox


HyukjinKwon commented on a change in pull request #301:
URL: https://github.com/apache/spark-website/pull/301#discussion_r553320256



##
File path: news/_posts/2021-01-07-next-official-release-spark-3.1.1.md
##
@@ -0,0 +1,26 @@
+---
+layout: post
+title: "Next official release: Spark 3.1.1" 

Review comment:
   I think the title is usually not a sentence. I thought the current title was concise enough, but I don't mind changing it if others prefer.








[GitHub] [spark-website] gengliangwang commented on a change in pull request #301: Add Spark 3.1.0 accident as news

2021-01-07 Thread GitBox


gengliangwang commented on a change in pull request #301:
URL: https://github.com/apache/spark-website/pull/301#discussion_r553308061



##
File path: news/_posts/2021-01-07-next-official-release-spark-3.1.1.md
##
@@ -0,0 +1,26 @@
+---
+layout: post
+title: "Next official release: Spark 3.1.1" 

Review comment:
   How about 
   ```
   Next official Spark release is 3.1.1 instead of 3.1.0
   ```








[GitHub] [spark-website] gengliangwang commented on a change in pull request #301: Add Spark 3.1.0 accident as news

2021-01-07 Thread GitBox


gengliangwang commented on a change in pull request #301:
URL: https://github.com/apache/spark-website/pull/301#discussion_r553303723



##
File path: news/_posts/2021-01-07-next-official-release-spark-3.1.1.md
##
@@ -0,0 +1,26 @@
+---
+layout: post
+title: "Next official release: Spark 3.1.1" 
+categories:
+- News
+tags: []
+status: publish
+type: post
+published: true
+meta:
+  _edit_last: '4'
+  _wpas_done_all: '1'
+---
+The next official Spark release is Spark 3.1.1 instead of Spark 3.1.0.
+There was an accident during Spark 3.1.0 RC1 preparation,
+see [[VOTE] Release Spark 3.1.0 
(RC1)](http://apache-spark-developers-list.1001551.n3.nabble.com/VOTE-Release-Spark-3-1-0-RC1-td30524.html)
 in the Spark dev mailing list.
+
+In short, Spark 3.1.0 RC1 was [unexpectedly published into Maven as Spark 
3.1.0](https://repo1.maven.org/maven2/org/apache/spark/spark-core_2.12/3.1.0/)
+while it is not officially released to Apache mirrors. We plan to skip this 
release

Review comment:
   Nit: let's make it more specific
   `We plan to skip this release` => `We plan to skip 3.1.0 release`








[GitHub] [spark-website] maropu commented on a change in pull request #301: Add Spark 3.1.0 accident as news

2021-01-07 Thread GitBox


maropu commented on a change in pull request #301:
URL: https://github.com/apache/spark-website/pull/301#discussion_r553300023



##
File path: news/_posts/2021-01-07-next-official-release-spark-3.1.1.md
##
@@ -0,0 +1,26 @@
+---
+layout: post
+title: "Next official release: Spark 3.1.1" 
+categories:
+- News
+tags: []
+status: publish
+type: post
+published: true
+meta:
+  _edit_last: '4'
+  _wpas_done_all: '1'
+---
+The next official Spark release is Spark 3.1.1 instead of Spark 3.1.0.
+There was an accident during Spark 3.1.0 RC1 preparation,
+see [[VOTE] Release Spark 3.1.0 
(RC1)](http://apache-spark-developers-list.1001551.n3.nabble.com/VOTE-Release-Spark-3-1-0-RC1-td30524.html)
 in the Spark dev mailing list.
+
+In short, Spark 3.1.0 RC1 was [unexpectedly published into Maven as Spark 
3.1.0](https://repo1.maven.org/maven2/org/apache/spark/spark-core_2.12/3.1.0/)
+while it is not officially released to Apache mirrors. We plan to skip this 
release
+and choose 3.1.1 as the next release to prevent the potential problems to the 
end
+users.
+
+Therefore, Spark 3.1.1 will supersede the unofficial Spark 3.1.0 unexpectedly
+published to Maven. We discourage to use this Spark 3.1.0 for any purpose, and 
there
+are no guarantees on using it such as binary compatibility.

Review comment:
   nit: since it seems the last statement is important for users, how about 
making it bold? 








[spark] branch branch-3.1 updated: [SPARK-33100][SQL][FOLLOWUP] Find correct bound of bracketed comment in spark-sql

2021-01-07 Thread yamamuro
This is an automated email from the ASF dual-hosted git repository.

yamamuro pushed a commit to branch branch-3.1
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.1 by this push:
 new 874c404  [SPARK-33100][SQL][FOLLOWUP] Find correct bound of bracketed 
comment in spark-sql
874c404 is described below

commit 874c40429c5f99ab02cdfd928d9d7b6caaea16ea
Author: fwang12 
AuthorDate: Thu Jan 7 20:49:37 2021 +0900

[SPARK-33100][SQL][FOLLOWUP] Find correct bound of bracketed comment in 
spark-sql

### What changes were proposed in this pull request?

This PR helps find the correct bound of a bracketed comment in spark-sql.

Here is the log for UT of SPARK-33100 in CliSuite before:
```
2021-01-05 13:22:34.768 - stdout> spark-sql> /* SELECT 'test';*/ SELECT 
'test';
2021-01-05 13:22:41.523 - stderr> Time taken: 6.716 seconds, Fetched 1 
row(s)
2021-01-05 13:22:41.599 - stdout> test
2021-01-05 13:22:41.6 - stdout> spark-sql> ;;/* SELECT 'test';*/ SELECT 
'test';
2021-01-05 13:22:41.709 - stdout> test
2021-01-05 13:22:41.709 - stdout> spark-sql> /* SELECT 'test';*/;; SELECT 
'test';
2021-01-05 13:22:41.902 - stdout> spark-sql> SELECT 'test'; -- SELECT 
'test';
2021-01-05 13:22:41.902 - stderr> Time taken: 0.129 seconds, Fetched 1 
row(s)
2021-01-05 13:22:41.902 - stderr> Error in query:
2021-01-05 13:22:41.902 - stderr> mismatched input '' expecting {'(', 
'ADD', 'ALTER', 'ANALYZE', 'CACHE', 'CLEAR', 'COMMENT', 'COMMIT', 'CREATE', 
'DELETE', 'DESC', 'DESCRIBE', 'DFS', 'DROP', 'EXPLAIN', 'EXPORT', 'FROM', 
'GRANT', 'IMPORT', 'INSERT', 'LIST', 'LOAD', 'LOCK', 'MAP', 'MERGE', 'MSCK', 
'REDUCE', 'REFRESH', 'REPLACE', 'RESET', 'REVOKE', 'ROLLBACK', 'SELECT', 'SET', 
'SHOW', 'START', 'TABLE', 'TRUNCATE', 'UNCACHE', 'UNLOCK', 'UPDATE', 'USE', 
'VALUES', 'WITH'}(line 1, pos 19)
2021-01-05 13:22:42.006 - stderr>
2021-01-05 13:22:42.006 - stderr> == SQL ==
2021-01-05 13:22:42.006 - stderr> /* SELECT 'test';*/
2021-01-05 13:22:42.006 - stderr> ---^^^
2021-01-05 13:22:42.006 - stderr>
2021-01-05 13:22:42.006 - stderr> Time taken: 0.226 seconds, Fetched 1 
row(s)
2021-01-05 13:22:42.006 - stdout> test
```
The root cause is that `insideBracketedComment` is not accurate.

For `/* comment */`, the last character `/` is not considered to be inside the bracketed comment, so it would be treated as the beginning of a statement.

In this PR, this issue is fixed.
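
The deferred-decrement idea can be sketched in isolation as follows (illustrative only, not the actual driver code; it ignores quoting and `--` comments):

```scala
// Sketch: when scanning "*/", keep the comment level unchanged for the closing '/'
// itself and only decrement the nesting level at the start of the next iteration.
def bracketedCommentLevels(line: String): Seq[Int] = {
  var level = 0
  var leaving = false
  val levels = Array.newBuilder[Int]
  for (i <- 0 until line.length) {
    if (leaving) { level -= 1; leaving = false }   // decrement deferred from the closing '/'
    val c = line.charAt(i)
    if (c == '/' && i + 1 < line.length && line.charAt(i + 1) == '*') level += 1
    else if (c == '/' && i > 0 && line.charAt(i - 1) == '*' && level > 0) leaving = true
    levels += level
  }
  levels.result().toIndexedSeq
}

// For "/* x */ SELECT 1", the '/' that closes the comment still reports level 1,
// so it is never mistaken for the beginning of a statement.
```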

### Why are the changes needed?
To fix the issue described above.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Existing UT

Closes #31054 from turboFei/SPARK-33100-followup.

Authored-by: fwang12 
Signed-off-by: Takeshi Yamamuro 
(cherry picked from commit 7b06acc28b5c37da6c48bc44c3d921309d4ad3a8)
Signed-off-by: Takeshi Yamamuro 
---
 .../sql/hive/thriftserver/SparkSQLCLIDriver.scala  | 24 +++---
 .../spark/sql/hive/thriftserver/CliSuite.scala |  4 ++--
 2 files changed, 19 insertions(+), 9 deletions(-)

diff --git 
a/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkSQLCLIDriver.scala
 
b/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkSQLCLIDriver.scala
index 9155eac..8606aaa 100644
--- 
a/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkSQLCLIDriver.scala
+++ 
b/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkSQLCLIDriver.scala
@@ -530,15 +530,24 @@ private[hive] class SparkSQLCLIDriver extends CliDriver 
with Logging {
 var bracketedCommentLevel = 0
 var escape = false
 var beginIndex = 0
-var includingStatement = false
+var leavingBracketedComment = false
+var isStatement = false
 val ret = new JArrayList[String]
 
 def insideBracketedComment: Boolean = bracketedCommentLevel > 0
 def insideComment: Boolean = insideSimpleComment || insideBracketedComment
-def statementBegin(index: Int): Boolean = includingStatement || 
(!insideComment &&
+def statementInProgress(index: Int): Boolean = isStatement || 
(!insideComment &&
   index > beginIndex && !s"${line.charAt(index)}".trim.isEmpty)
 
 for (index <- 0 until line.length) {
+  // Checks if we need to decrement a bracketed comment level; the last 
character '/' of
+  // bracketed comments is still inside the comment, so 
`insideBracketedComment` must keep true
+  // in the previous loop and we decrement the level here if needed.
+  if (leavingBracketedComment) {
+bracketedCommentLevel -= 1
+leavingBracketedComment = false
+  }
+
   if (line.charAt(index) == '\'' && !insideComment) {
 // take a look to see if it is escaped
 // See the comment above about SPARK-31595
@@ -568,12 +577,12 @@ priv

[spark] branch master updated (d36cdd5 -> 7b06acc)

2021-01-07 Thread yamamuro
This is an automated email from the ASF dual-hosted git repository.

yamamuro pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from d36cdd5  [SPARK-33933][SQL] Materialize BroadcastQueryStage first to 
avoid broadcast timeout in AQE
 add 7b06acc  [SPARK-33100][SQL][FOLLOWUP] Find correct bound of bracketed 
comment in spark-sql

No new revisions were added by this update.

Summary of changes:
 .../sql/hive/thriftserver/SparkSQLCLIDriver.scala  | 24 +++---
 .../spark/sql/hive/thriftserver/CliSuite.scala |  4 ++--
 2 files changed, 19 insertions(+), 9 deletions(-)





[GitHub] [spark-website] HyukjinKwon commented on pull request #301: Add Spark 3.1.0 accident as news

2021-01-07 Thread GitBox


HyukjinKwon commented on pull request #301:
URL: https://github.com/apache/spark-website/pull/301#issuecomment-756059735


   @rxin and @mateiz too FYI.






[GitHub] [spark-website] HyukjinKwon commented on pull request #301: Add Spark 3.1.0 accident as news

2021-01-07 Thread GitBox


HyukjinKwon commented on pull request #301:
URL: https://github.com/apache/spark-website/pull/301#issuecomment-756055929


   cc @srowen, @dongjoon-hyun, @cloud-fan, @tgravescs, @holdenk, @HeartSaVioR 
and @jaceklaskowski FYI






[GitHub] [spark-website] HyukjinKwon opened a new pull request #301: Add Spark 3.1.0 accident as news

2021-01-07 Thread GitBox


HyukjinKwon opened a new pull request #301:
URL: https://github.com/apache/spark-website/pull/301


   This PR adds news about the recent Spark 3.1.0 accident.
   Note that we should still wait for a response from the infra team; this PR is blocked on it.

   I manually generated the site via `jekyll build` and excluded unrelated changes in the HTML files.






[spark] branch branch-3.1 updated: [SPARK-33933][SQL] Materialize BroadcastQueryStage first to avoid broadcast timeout in AQE

2021-01-07 Thread wenchen
This is an automated email from the ASF dual-hosted git repository.

wenchen pushed a commit to branch branch-3.1
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.1 by this push:
 new 86c2a58  [SPARK-33933][SQL] Materialize BroadcastQueryStage first to 
avoid broadcast timeout in AQE
86c2a58 is described below

commit 86c2a5829e46b1def9324fdcf68d502555e65be0
Author: Yu Zhong 
AuthorDate: Thu Jan 7 08:59:26 2021 +

[SPARK-33933][SQL] Materialize BroadcastQueryStage first to avoid broadcast 
timeout in AQE

### What changes were proposed in this pull request?
In `AdaptiveSparkPlanExec.getFinalPhysicalPlan`, when new stages are generated, sort the new stages by class type to make sure `BroadcastQueryStageExec` stages precede the others.
This makes sure the broadcast jobs are submitted before the map jobs, to avoid waiting for job scheduling and causing broadcast timeouts.

### Why are the changes needed?
When AQE is enabled, in `getFinalPhysicalPlan` Spark traverses the physical plan bottom-up, creates query stages for the materialized parts via `createQueryStages`, and materializes those newly created query stages to submit map stages or broadcasts. When a `ShuffleQueryStage` is materialized before a `BroadcastQueryStage`, the map job and the broadcast job are submitted at almost the same time, but the map job will hold all the computing resources. If the map job runs slowly (when lots of data needs to process an [...]
The workaround of increasing `spark.sql.broadcastTimeout` is neither sensible nor graceful, because the data to broadcast is very small.
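
The reordering can be sketched on plain data as follows (hypothetical stage types standing in for `BroadcastQueryStageExec` and the other query stages):

```scala
// Sketch: order newly created stages so broadcast stages are materialized first,
// ensuring their jobs are submitted before map stages grab the cluster's resources.
sealed trait Stage { def id: Int }
case class BroadcastStage(id: Int) extends Stage
case class ShuffleStage(id: Int) extends Stage

def reorder(newStages: Seq[Stage]): Seq[Stage] =
  newStages.sortWith {
    case (_: BroadcastStage, _: BroadcastStage) => false  // keep relative order among broadcasts
    case (_: BroadcastStage, _)                 => true   // broadcast before anything else
    case _                                      => false
  }

// reorder(Seq(ShuffleStage(1), BroadcastStage(2), ShuffleStage(3)))
//   => Seq(BroadcastStage(2), ShuffleStage(1), ShuffleStage(3))
```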

### Does this PR introduce _any_ user-facing change?
NO

### How was this patch tested?
1. Added a unit test.
2. Tested the code in a dev environment; see 
https://issues.apache.org/jira/browse/SPARK-33933

Closes #30998 from zhongyu09/aqe-broadcast.

Authored-by: Yu Zhong 
Signed-off-by: Wenchen Fan 
(cherry picked from commit d36cdd55419c104134f88930206bedccdbe4f3c0)
Signed-off-by: Wenchen Fan 
---
 .../execution/adaptive/AdaptiveSparkPlanExec.scala | 11 +-
 .../adaptive/AdaptiveQueryExecSuite.scala  | 24 ++
 2 files changed, 34 insertions(+), 1 deletion(-)

diff --git 
a/sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/AdaptiveSparkPlanExec.scala
 
b/sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/AdaptiveSparkPlanExec.scala
index 89d3b53..aa09f21 100644
--- 
a/sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/AdaptiveSparkPlanExec.scala
+++ 
b/sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/AdaptiveSparkPlanExec.scala
@@ -189,8 +189,17 @@ case class AdaptiveSparkPlanExec(
   stagesToReplace = result.newStages ++ stagesToReplace
   executionId.foreach(onUpdatePlan(_, result.newStages.map(_.plan)))
 
+  // SPARK-33933: we should submit tasks of broadcast stages first, to avoid waiting
+  // for tasks to be scheduled and leading to broadcast timeout.
+  val reorderedNewStages = result.newStages
+.sortWith {
+  case (_: BroadcastQueryStageExec, _: BroadcastQueryStageExec) => false
+  case (_: BroadcastQueryStageExec, _) => true
+  case _ => false
+}
+
   // Start materialization of all new stages and fail fast if any stages failed eagerly
-  result.newStages.foreach { stage =>
+  reorderedNewStages.foreach { stage =>
 try {
   stage.materialize().onComplete { res =>
 if (res.isSuccess) {
diff --git 
a/sql/core/src/test/scala/org/apache/spark/sql/execution/adaptive/AdaptiveQueryExecSuite.scala
 
b/sql/core/src/test/scala/org/apache/spark/sql/execution/adaptive/AdaptiveQueryExecSuite.scala
index 69f1565..75993d4 100644
--- 
a/sql/core/src/test/scala/org/apache/spark/sql/execution/adaptive/AdaptiveQueryExecSuite.scala
+++ 
b/sql/core/src/test/scala/org/apache/spark/sql/execution/adaptive/AdaptiveQueryExecSuite.scala
@@ -1431,4 +1431,28 @@ class AdaptiveQueryExecSuite
   }
 }
   }
+
+  test("SPARK-33933: AQE broadcast should not timeout with slow map tasks") {
+val broadcastTimeoutInSec = 1
+val df = spark.sparkContext.parallelize(Range(0, 100), 100)
+  .flatMap(x => {
+Thread.sleep(20)
+for (i <- Range(0, 100)) yield (x % 26, x % 10)
+  }).toDF("index", "pv")
+val dim = Range(0, 26).map(x => (x, ('a' + x).toChar.toString))
+  .toDF("index", "name")
+val testDf = df.groupBy("index")
+  .agg(sum($"pv").alias("pv"))
+  .join(dim, Seq("index"))
+withSQLConf(SQLConf.BROADCAST_TIMEOUT.key -> broadcastTimeoutInSec.toString,
+  SQLConf.ADAPTIVE_EXECUTION_ENABLED.key -> "true") {
+  val startTime = System.currentTimeMillis()
+  val result = testDf.collect()
+  val queryTime = System.currentTimeMillis() - startTime
+  assert(resu

[spark] branch master updated: [SPARK-33933][SQL] Materialize BroadcastQueryStage first to avoid broadcast timeout in AQE

2021-01-07 Thread wenchen
This is an automated email from the ASF dual-hosted git repository.

wenchen pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new d36cdd5  [SPARK-33933][SQL] Materialize BroadcastQueryStage first to 
avoid broadcast timeout in AQE
d36cdd5 is described below

commit d36cdd55419c104134f88930206bedccdbe4f3c0
Author: Yu Zhong 
AuthorDate: Thu Jan 7 08:59:26 2021 +

[SPARK-33933][SQL] Materialize BroadcastQueryStage first to avoid broadcast 
timeout in AQE

### What changes were proposed in this pull request?
In AdaptiveSparkPlanExec.getFinalPhysicalPlan, when newStages are 
generated, sort the new stages by class type so that BroadcastQueryStage 
instances precede the others.
This ensures broadcast jobs are submitted before map jobs, so they do not 
wait on job scheduling and hit the broadcast timeout.

### Why are the changes needed?
When AQE is enabled, getFinalPhysicalPlan traverses the physical plan 
bottom up, creates query stages for the materialized parts via createQueryStages, 
and materializes those newly created query stages to submit map stages or 
broadcasts. When a ShuffleQueryStage is materialized before a 
BroadcastQueryStage, the map job and the broadcast job are submitted at almost the 
same time, but the map job holds all of the computing resources. If the map job 
runs slowly (when lots of data needs to be processed an [...]
The workaround of increasing spark.sql.broadcastTimeout is neither sensible 
nor graceful, because the data to broadcast is very small.
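
A toy model of the scheduling problem described above, using plain Scala 
Futures on a one-thread pool that stands in for a cluster whose resources are 
fully held by the map job (illustration only; this is not Spark code):

```scala
import java.util.concurrent.{Executors, TimeoutException}

import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration._

object SchedulingOrderDemo extends App {
  // One worker thread plays the role of a fully occupied cluster.
  val pool = Executors.newFixedThreadPool(1)
  implicit val ec: ExecutionContext = ExecutionContext.fromExecutor(pool)

  def submit(name: String, workMillis: Long): Future[String] =
    Future { Thread.sleep(workMillis); name }

  // The long map work is submitted first and holds the only thread for two
  // seconds; the tiny broadcast work is queued behind it.
  val mapJob = submit("map", 2000)
  val broadcastJob = submit("broadcast", 10)

  // Waiting on the broadcast with a one-second "broadcast timeout" fails, even
  // though the broadcast itself only needs 10 ms. Submitting the broadcast
  // first, as this patch does, avoids the wait.
  try {
    Await.result(broadcastJob, 1.second)
    println("broadcast finished in time")
  } catch {
    case _: TimeoutException => println("broadcast timed out behind the map job")
  }

  Await.result(mapJob, 10.seconds)
  pool.shutdown()
}
```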

### Does this PR introduce _any_ user-facing change?
NO

### How was this patch tested?
1. Added a unit test.
2. Tested the code in a dev environment; see 
https://issues.apache.org/jira/browse/SPARK-33933

Closes #30998 from zhongyu09/aqe-broadcast.

Authored-by: Yu Zhong 
Signed-off-by: Wenchen Fan 
---
 .../execution/adaptive/AdaptiveSparkPlanExec.scala | 11 +-
 .../adaptive/AdaptiveQueryExecSuite.scala  | 24 ++
 2 files changed, 34 insertions(+), 1 deletion(-)

diff --git 
a/sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/AdaptiveSparkPlanExec.scala
 
b/sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/AdaptiveSparkPlanExec.scala
index 89d3b53..aa09f21 100644
--- 
a/sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/AdaptiveSparkPlanExec.scala
+++ 
b/sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/AdaptiveSparkPlanExec.scala
@@ -189,8 +189,17 @@ case class AdaptiveSparkPlanExec(
   stagesToReplace = result.newStages ++ stagesToReplace
   executionId.foreach(onUpdatePlan(_, result.newStages.map(_.plan)))
 
+  // SPARK-33933: we should submit tasks of broadcast stages first, to avoid waiting
+  // for tasks to be scheduled and leading to broadcast timeout.
+  val reorderedNewStages = result.newStages
+.sortWith {
+  case (_: BroadcastQueryStageExec, _: BroadcastQueryStageExec) => false
+  case (_: BroadcastQueryStageExec, _) => true
+  case _ => false
+}
+
   // Start materialization of all new stages and fail fast if any stages failed eagerly
-  result.newStages.foreach { stage =>
+  reorderedNewStages.foreach { stage =>
 try {
   stage.materialize().onComplete { res =>
 if (res.isSuccess) {
diff --git 
a/sql/core/src/test/scala/org/apache/spark/sql/execution/adaptive/AdaptiveQueryExecSuite.scala
 
b/sql/core/src/test/scala/org/apache/spark/sql/execution/adaptive/AdaptiveQueryExecSuite.scala
index 69f1565..75993d4 100644
--- 
a/sql/core/src/test/scala/org/apache/spark/sql/execution/adaptive/AdaptiveQueryExecSuite.scala
+++ 
b/sql/core/src/test/scala/org/apache/spark/sql/execution/adaptive/AdaptiveQueryExecSuite.scala
@@ -1431,4 +1431,28 @@ class AdaptiveQueryExecSuite
   }
 }
   }
+
+  test("SPARK-33933: AQE broadcast should not timeout with slow map tasks") {
+val broadcastTimeoutInSec = 1
+val df = spark.sparkContext.parallelize(Range(0, 100), 100)
+  .flatMap(x => {
+Thread.sleep(20)
+for (i <- Range(0, 100)) yield (x % 26, x % 10)
+  }).toDF("index", "pv")
+val dim = Range(0, 26).map(x => (x, ('a' + x).toChar.toString))
+  .toDF("index", "name")
+val testDf = df.groupBy("index")
+  .agg(sum($"pv").alias("pv"))
+  .join(dim, Seq("index"))
+withSQLConf(SQLConf.BROADCAST_TIMEOUT.key -> broadcastTimeoutInSec.toString,
+  SQLConf.ADAPTIVE_EXECUTION_ENABLED.key -> "true") {
+  val startTime = System.currentTimeMillis()
+  val result = testDf.collect()
+  val queryTime = System.currentTimeMillis() - startTime
+  assert(result.length == 26)
+  // make sure the execution time is large enough
+  assert(queryTime > (broadcastTimeo