[spark] branch branch-3.1 updated: [SPARK-36193][CORE] Recover SparkSubmit.runMain not to stop SparkContext in non-K8s env
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch branch-3.1
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/branch-3.1 by this push:
     new ab94702  [SPARK-36193][CORE] Recover SparkSubmit.runMain not to stop SparkContext in non-K8s env

ab94702 is described below

commit ab94702a4f3a81942fc26c13d84574506a70eff2
Author: Dongjoon Hyun
AuthorDate: Sun Jul 18 22:26:23 2021 -0700

    [SPARK-36193][CORE] Recover SparkSubmit.runMain not to stop SparkContext in non-K8s env

    ### What changes were proposed in this pull request?

    Following the discussion on https://github.com/apache/spark/pull/32283, this PR aims to limit the feature of SPARK-34674 to the K8s environment only.

    ### Why are the changes needed?

    To reduce the behavior change in non-K8s environments.

    ### Does this PR introduce _any_ user-facing change?

    The changed behavior is consistent with 3.1.1 and older Spark releases.

    ### How was this patch tested?

    N/A

    Closes #33403 from dongjoon-hyun/SPARK-36193.

    Authored-by: Dongjoon Hyun
    Signed-off-by: Dongjoon Hyun
    (cherry picked from commit fd3e9ce0b9ee09c7dce9f2e029fe96eac51eab96)
    Signed-off-by: Dongjoon Hyun
---
 core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala b/core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala
index fa86da9..818b263 100644
--- a/core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala
+++ b/core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala
@@ -953,8 +953,8 @@ private[spark] class SparkSubmit extends Logging {
       case t: Throwable =>
         throw findCause(t)
     } finally {
-      if (!isShell(args.primaryResource) && !isSqlShell(args.mainClass) &&
-          !isThriftServer(args.mainClass)) {
+      if (args.master.startsWith("k8s") && !isShell(args.primaryResource) &&
+          !isSqlShell(args.mainClass) && !isThriftServer(args.mainClass)) {
         try {
           SparkContext.getActive.foreach(_.stop())
         } catch {

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch branch-3.2 updated: [SPARK-36193][CORE] Recover SparkSubmit.runMain not to stop SparkContext in non-K8s env
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch branch-3.2
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/branch-3.2 by this push:
     new c3a23ce  [SPARK-36193][CORE] Recover SparkSubmit.runMain not to stop SparkContext in non-K8s env

c3a23ce is described below

commit c3a23ce49bb81682575d1b2d11b9fa51de5e8bd7
Author: Dongjoon Hyun
AuthorDate: Sun Jul 18 22:26:23 2021 -0700

    [SPARK-36193][CORE] Recover SparkSubmit.runMain not to stop SparkContext in non-K8s env

    ### What changes were proposed in this pull request?

    Following the discussion on https://github.com/apache/spark/pull/32283, this PR aims to limit the feature of SPARK-34674 to the K8s environment only.

    ### Why are the changes needed?

    To reduce the behavior change in non-K8s environments.

    ### Does this PR introduce _any_ user-facing change?

    The changed behavior is consistent with 3.1.1 and older Spark releases.

    ### How was this patch tested?

    N/A

    Closes #33403 from dongjoon-hyun/SPARK-36193.

    Authored-by: Dongjoon Hyun
    Signed-off-by: Dongjoon Hyun
    (cherry picked from commit fd3e9ce0b9ee09c7dce9f2e029fe96eac51eab96)
    Signed-off-by: Dongjoon Hyun
---
 core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala b/core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala
index a65be54..8124650 100644
--- a/core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala
+++ b/core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala
@@ -957,8 +957,8 @@ private[spark] class SparkSubmit extends Logging {
       case t: Throwable =>
         throw findCause(t)
     } finally {
-      if (!isShell(args.primaryResource) && !isSqlShell(args.mainClass) &&
-          !isThriftServer(args.mainClass)) {
+      if (args.master.startsWith("k8s") && !isShell(args.primaryResource) &&
+          !isSqlShell(args.mainClass) && !isThriftServer(args.mainClass)) {
         try {
           SparkContext.getActive.foreach(_.stop())
         } catch {
[spark] branch master updated: [SPARK-36193][CORE] Recover SparkSubmit.runMain not to stop SparkContext in non-K8s env
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new fd3e9ce  [SPARK-36193][CORE] Recover SparkSubmit.runMain not to stop SparkContext in non-K8s env

fd3e9ce is described below

commit fd3e9ce0b9ee09c7dce9f2e029fe96eac51eab96
Author: Dongjoon Hyun
AuthorDate: Sun Jul 18 22:26:23 2021 -0700

    [SPARK-36193][CORE] Recover SparkSubmit.runMain not to stop SparkContext in non-K8s env

    ### What changes were proposed in this pull request?

    Following the discussion on https://github.com/apache/spark/pull/32283, this PR aims to limit the feature of SPARK-34674 to the K8s environment only.

    ### Why are the changes needed?

    To reduce the behavior change in non-K8s environments.

    ### Does this PR introduce _any_ user-facing change?

    The changed behavior is consistent with 3.1.1 and older Spark releases.

    ### How was this patch tested?

    N/A

    Closes #33403 from dongjoon-hyun/SPARK-36193.

    Authored-by: Dongjoon Hyun
    Signed-off-by: Dongjoon Hyun
---
 core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala b/core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala
index a65be54..8124650 100644
--- a/core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala
+++ b/core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala
@@ -957,8 +957,8 @@ private[spark] class SparkSubmit extends Logging {
       case t: Throwable =>
         throw findCause(t)
     } finally {
-      if (!isShell(args.primaryResource) && !isSqlShell(args.mainClass) &&
-          !isThriftServer(args.mainClass)) {
+      if (args.master.startsWith("k8s") && !isShell(args.primaryResource) &&
+          !isSqlShell(args.mainClass) && !isThriftServer(args.mainClass)) {
         try {
           SparkContext.getActive.foreach(_.stop())
         } catch {
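The guard in the SPARK-36193 diff can be sketched as a standalone predicate. This is a hypothetical Python analogue of the Scala condition, not Spark code; the function and parameter names are illustrative:

```python
def should_stop_spark_context(master, is_shell, is_sql_shell, is_thrift_server):
    # After SPARK-36193, SparkSubmit.runMain only stops an active SparkContext
    # when the master is a K8s one ("k8s://..."), and never for interactive
    # shells or the Thrift server, which manage their own context lifecycle.
    return (master.startswith("k8s")
            and not is_shell
            and not is_sql_shell
            and not is_thrift_server)

print(should_stop_spark_context("k8s://https://1.2.3.4", False, False, False))  # True
print(should_stop_spark_context("yarn", False, False, False))                   # False: pre-3.1.2 behavior restored
print(should_stop_spark_context("k8s://https://1.2.3.4", True, False, False))   # False: shells keep their context
```

The extra `startsWith("k8s")` term is the entire behavior change: on YARN, Mesos, and standalone masters the context is left untouched, as in 3.1.1 and earlier.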
[spark] branch branch-3.2 updated: [SPARK-36199][BUILD] Bump scalatest-maven-plugin to 2.0.2
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch branch-3.2
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/branch-3.2 by this push:
     new b93fa15  [SPARK-36199][BUILD] Bump scalatest-maven-plugin to 2.0.2

b93fa15 is described below

commit b93fa15ce2b86c1f4c4b1bda1f612aea947b08c8
Author: William Hyun
AuthorDate: Sun Jul 18 22:14:24 2021 -0700

    [SPARK-36199][BUILD] Bump scalatest-maven-plugin to 2.0.2

    ### What changes were proposed in this pull request?

    This PR aims to upgrade scalatest-maven-plugin to version 2.0.2.

    ### Why are the changes needed?

    2.0.2 officially supports building on JDK 11.
    - https://github.com/scalatest/scalatest-maven-plugin/commit/f45ce192f313553efc29c201593950e38f419a80

    ### Does this PR introduce _any_ user-facing change?

    No.

    ### How was this patch tested?

    Pass the CIs.

    Closes #33408 from williamhyun/SMP.

    Authored-by: William Hyun
    Signed-off-by: Dongjoon Hyun
    (cherry picked from commit df8bae0689d93ece72a271ed8a3b0243ac77dca2)
    Signed-off-by: Dongjoon Hyun
---
 pom.xml | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/pom.xml b/pom.xml
index 3b32207..3054401 100644
--- a/pom.xml
+++ b/pom.xml
@@ -162,7 +162,7 @@
     3.2.2
     2.12.14
     2.12
-    2.0.0
+    2.0.2
     --test
     true
[spark] branch master updated: [SPARK-36199][BUILD] Bump scalatest-maven-plugin to 2.0.2
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new df8bae0  [SPARK-36199][BUILD] Bump scalatest-maven-plugin to 2.0.2

df8bae0 is described below

commit df8bae0689d93ece72a271ed8a3b0243ac77dca2
Author: William Hyun
AuthorDate: Sun Jul 18 22:14:24 2021 -0700

    [SPARK-36199][BUILD] Bump scalatest-maven-plugin to 2.0.2

    ### What changes were proposed in this pull request?

    This PR aims to upgrade scalatest-maven-plugin to version 2.0.2.

    ### Why are the changes needed?

    2.0.2 officially supports building on JDK 11.
    - https://github.com/scalatest/scalatest-maven-plugin/commit/f45ce192f313553efc29c201593950e38f419a80

    ### Does this PR introduce _any_ user-facing change?

    No.

    ### How was this patch tested?

    Pass the CIs.

    Closes #33408 from williamhyun/SMP.

    Authored-by: William Hyun
    Signed-off-by: Dongjoon Hyun
---
 pom.xml | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/pom.xml b/pom.xml
index 59a03bc..1461f31 100644
--- a/pom.xml
+++ b/pom.xml
@@ -162,7 +162,7 @@
     3.2.2
     2.12.14
     2.12
-    2.0.0
+    2.0.2
     --test
     true
[spark] branch branch-3.2 updated: [SPARK-35810][PYTHON] Deprecate ps.broadcast API
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch branch-3.2
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/branch-3.2 by this push:
     new 80a9644  [SPARK-35810][PYTHON] Deprecate ps.broadcast API

80a9644 is described below

commit 80a96443725a32053220409051d1937035141e40
Author: itholic
AuthorDate: Mon Jul 19 10:44:59 2021 +0900

    [SPARK-35810][PYTHON] Deprecate ps.broadcast API

    ### What changes were proposed in this pull request?

    The `broadcast` function in `pyspark.pandas` duplicates `DataFrame.spark.hint` with `"broadcast"`.

    ```python
    # The below 2 lines are the same
    df.spark.hint("broadcast")
    ps.broadcast(df)
    ```

    So, we should remove `broadcast` in the future, and show a deprecation warning for now.

    ### Why are the changes needed?

    For deduplication of functions.

    ### Does this PR introduce _any_ user-facing change?

    Users see a deprecation warning when using `broadcast` in `pyspark.pandas`.

    ```python
    >>> ps.broadcast(df)
    FutureWarning: `broadcast` has been deprecated and will be removed in a future version.
    use `DataFrame.spark.hint` with 'broadcast' for `name` parameter instead.
      warnings.warn(
    ```

    ### How was this patch tested?

    Manually checked the warning message and saw that the build passed.

    Closes #33379 from itholic/SPARK-35810.

    Lead-authored-by: itholic
    Co-authored-by: Hyukjin Kwon
    Co-authored-by: Haejoon Lee <44108233+itho...@users.noreply.github.com>
    Signed-off-by: Hyukjin Kwon
    (cherry picked from commit 67e6120a851066f183e41f57cc3b10f2f3704df7)
    Signed-off-by: Hyukjin Kwon
---
 python/pyspark/pandas/generic.py   | 10 ++++++++++
 python/pyspark/pandas/namespace.py |  8 ++++++++
 2 files changed, 18 insertions(+)

diff --git a/python/pyspark/pandas/generic.py b/python/pyspark/pandas/generic.py
index c60097e..c1009b0 100644
--- a/python/pyspark/pandas/generic.py
+++ b/python/pyspark/pandas/generic.py
@@ -860,6 +860,11 @@ class Frame(object, metaclass=ABCMeta):
         )

         if num_files is not None:
+            warnings.warn(
+                "`num_files` has been deprecated and might be removed in a future version. "
+                "Use `DataFrame.spark.repartition` instead.",
+                FutureWarning,
+            )
             sdf = sdf.repartition(num_files)

         builder = sdf.write.mode(mode)
@@ -998,6 +1003,11 @@ class Frame(object, metaclass=ABCMeta):
         sdf = psdf.to_spark(index_col=index_col)  # type: ignore

         if num_files is not None:
+            warnings.warn(
+                "`num_files` has been deprecated and might be removed in a future version. "
+                "Use `DataFrame.spark.repartition` instead.",
+                FutureWarning,
+            )
             sdf = sdf.repartition(num_files)

         builder = sdf.write.mode(mode)
diff --git a/python/pyspark/pandas/namespace.py b/python/pyspark/pandas/namespace.py
index a46926d..9af91cb 100644
--- a/python/pyspark/pandas/namespace.py
+++ b/python/pyspark/pandas/namespace.py
@@ -39,6 +39,7 @@ from distutils.version import LooseVersion
 from functools import reduce
 from io import BytesIO
 import json
+import warnings

 import numpy as np
 import pandas as pd
@@ -2822,6 +2823,8 @@ def broadcast(obj: DataFrame) -> DataFrame:
     """
     Marks a DataFrame as small enough for use in broadcast joins.

+    .. deprecated:: 3.2.0
+        Use :func:`DataFrame.spark.hint` instead.

     Parameters
     ----------
     obj : DataFrame
@@ -2852,6 +2855,11 @@ def broadcast(obj: DataFrame) -> DataFrame:
     ...        BroadcastHashJoin...
     ...
     """
+    warnings.warn(
+        "`broadcast` has been deprecated and might be removed in a future version. "
+        "Use `DataFrame.spark.hint` with 'broadcast' for `name` parameter instead.",
+        FutureWarning,
+    )
     if not isinstance(obj, DataFrame):
         raise TypeError("Invalid type : expected DataFrame got {}".format(type(obj).__name__))
     return DataFrame(
[spark] branch master updated: [SPARK-35810][PYTHON] Deprecate ps.broadcast API
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new 67e6120  [SPARK-35810][PYTHON] Deprecate ps.broadcast API

67e6120 is described below

commit 67e6120a851066f183e41f57cc3b10f2f3704df7
Author: itholic
AuthorDate: Mon Jul 19 10:44:59 2021 +0900

    [SPARK-35810][PYTHON] Deprecate ps.broadcast API

    ### What changes were proposed in this pull request?

    The `broadcast` function in `pyspark.pandas` duplicates `DataFrame.spark.hint` with `"broadcast"`.

    ```python
    # The below 2 lines are the same
    df.spark.hint("broadcast")
    ps.broadcast(df)
    ```

    So, we should remove `broadcast` in the future, and show a deprecation warning for now.

    ### Why are the changes needed?

    For deduplication of functions.

    ### Does this PR introduce _any_ user-facing change?

    Users see a deprecation warning when using `broadcast` in `pyspark.pandas`.

    ```python
    >>> ps.broadcast(df)
    FutureWarning: `broadcast` has been deprecated and will be removed in a future version.
    use `DataFrame.spark.hint` with 'broadcast' for `name` parameter instead.
      warnings.warn(
    ```

    ### How was this patch tested?

    Manually checked the warning message and saw that the build passed.

    Closes #33379 from itholic/SPARK-35810.

    Lead-authored-by: itholic
    Co-authored-by: Hyukjin Kwon
    Co-authored-by: Haejoon Lee <44108233+itho...@users.noreply.github.com>
    Signed-off-by: Hyukjin Kwon
---
 python/pyspark/pandas/generic.py   | 10 ++++++++++
 python/pyspark/pandas/namespace.py |  8 ++++++++
 2 files changed, 18 insertions(+)

diff --git a/python/pyspark/pandas/generic.py b/python/pyspark/pandas/generic.py
index c60097e..c1009b0 100644
--- a/python/pyspark/pandas/generic.py
+++ b/python/pyspark/pandas/generic.py
@@ -860,6 +860,11 @@ class Frame(object, metaclass=ABCMeta):
         )

         if num_files is not None:
+            warnings.warn(
+                "`num_files` has been deprecated and might be removed in a future version. "
+                "Use `DataFrame.spark.repartition` instead.",
+                FutureWarning,
+            )
             sdf = sdf.repartition(num_files)

         builder = sdf.write.mode(mode)
@@ -998,6 +1003,11 @@ class Frame(object, metaclass=ABCMeta):
         sdf = psdf.to_spark(index_col=index_col)  # type: ignore

         if num_files is not None:
+            warnings.warn(
+                "`num_files` has been deprecated and might be removed in a future version. "
+                "Use `DataFrame.spark.repartition` instead.",
+                FutureWarning,
+            )
             sdf = sdf.repartition(num_files)

         builder = sdf.write.mode(mode)
diff --git a/python/pyspark/pandas/namespace.py b/python/pyspark/pandas/namespace.py
index a46926d..9af91cb 100644
--- a/python/pyspark/pandas/namespace.py
+++ b/python/pyspark/pandas/namespace.py
@@ -39,6 +39,7 @@ from distutils.version import LooseVersion
 from functools import reduce
 from io import BytesIO
 import json
+import warnings

 import numpy as np
 import pandas as pd
@@ -2822,6 +2823,8 @@ def broadcast(obj: DataFrame) -> DataFrame:
     """
     Marks a DataFrame as small enough for use in broadcast joins.

+    .. deprecated:: 3.2.0
+        Use :func:`DataFrame.spark.hint` instead.

     Parameters
     ----------
     obj : DataFrame
@@ -2852,6 +2855,11 @@ def broadcast(obj: DataFrame) -> DataFrame:
     ...        BroadcastHashJoin...
     ...
     """
+    warnings.warn(
+        "`broadcast` has been deprecated and might be removed in a future version. "
+        "Use `DataFrame.spark.hint` with 'broadcast' for `name` parameter instead.",
+        FutureWarning,
+    )
    if not isinstance(obj, DataFrame):
         raise TypeError("Invalid type : expected DataFrame got {}".format(type(obj).__name__))
     return DataFrame(
[spark] branch branch-3.2 updated: [SPARK-36198][TESTS] Skip UNIDOC generation in PySpark GHA job
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch branch-3.2
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/branch-3.2 by this push:
     new d5cec45  [SPARK-36198][TESTS] Skip UNIDOC generation in PySpark GHA job

d5cec45 is described below

commit d5cec45c0b0feaf2dd6014cf82bf0d7d25f5ac87
Author: William Hyun
AuthorDate: Sun Jul 18 17:52:28 2021 -0700

    [SPARK-36198][TESTS] Skip UNIDOC generation in PySpark GHA job

    ### What changes were proposed in this pull request?

    This PR aims to skip UNIDOC generation in the PySpark GHA job.

    ### Why are the changes needed?

    PySpark GHA jobs do not need to generate Java/Scala docs. This will save about 13 minutes in total.
    - https://github.com/apache/spark/runs/3098268973?check_suite_focus=true

    ```
    ...
    Building Unidoc API Documentation
    [info] Building Spark unidoc using SBT with these arguments:
      -Phadoop-3.2 -Phive-2.3 -Pscala-2.12 -Phive-thriftserver -Pmesos -Pdocker-integration-tests
      -Phive -Pkinesis-asl -Pspark-ganglia-lgpl -Pkubernetes -Phadoop-cloud -Pyarn unidoc
    ...
    [info] Main Java API documentation successful.
    [success] Total time: 192 s (03:12), completed Jul 18, 2021 6:08:40 PM
    ```

    ### Does this PR introduce _any_ user-facing change?

    No.

    ### How was this patch tested?

    Pass the GHA.

    Closes #33407 from williamhyun/SKIP_UNIDOC.

    Authored-by: William Hyun
    Signed-off-by: Dongjoon Hyun
    (cherry picked from commit c336f73ccddc1d163caa0a619919f3bbc9bf34ab)
    Signed-off-by: Dongjoon Hyun
---
 .github/workflows/build_and_test.yml | 1 +
 dev/run-tests.py                     | 2 +-
 2 files changed, 2 insertions(+), 1 deletion(-)

diff --git a/.github/workflows/build_and_test.yml b/.github/workflows/build_and_test.yml
index 62f37d3..66a0eda 100644
--- a/.github/workflows/build_and_test.yml
+++ b/.github/workflows/build_and_test.yml
@@ -170,6 +170,7 @@ jobs:
       HIVE_PROFILE: hive2.3
       GITHUB_PREV_SHA: ${{ github.event.before }}
       SPARK_LOCAL_IP: localhost
+      SKIP_UNIDOC: true
     steps:
     - name: Checkout Spark repository
       uses: actions/checkout@v2
diff --git a/dev/run-tests.py b/dev/run-tests.py
index 3055dcc..97523e7 100755
--- a/dev/run-tests.py
+++ b/dev/run-tests.py
@@ -397,7 +397,7 @@ def build_spark_assembly_sbt(extra_profiles, checkstyle=False):
     if checkstyle:
         run_java_style_checks(build_profiles)

-    if not os.environ.get("AMPLAB_JENKINS"):
+    if not os.environ.get("AMPLAB_JENKINS") and not os.environ.get("SKIP_UNIDOC"):
         build_spark_unidoc_sbt(extra_profiles)
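The `run-tests.py` change is an ordinary environment-variable gate. A small sketch of the same idiom; the predicate function is illustrative (only the two environment variable names come from the diff):

```python
def should_build_unidoc(env):
    # Unidoc is skipped either on Jenkins (AMPLAB_JENKINS, which handles
    # doc builds separately) or when a GHA job opts out via SKIP_UNIDOC.
    # Any non-empty value counts as "set", matching os.environ.get semantics.
    return not env.get("AMPLAB_JENKINS") and not env.get("SKIP_UNIDOC")

print(should_build_unidoc({}))                       # True
print(should_build_unidoc({"SKIP_UNIDOC": "true"}))  # False: the new PySpark GHA path
print(should_build_unidoc({"AMPLAB_JENKINS": "1"}))  # False
```

Passing the environment as a dict rather than reading `os.environ` directly keeps the gate trivially testable, which is why the sketch differs slightly from the in-place check in `run-tests.py`.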
[spark] branch master updated (f85855c -> c336f73)
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

  from f85855c  [SPARK-36075][K8S] Support for specifying executor/driver node selector
   add c336f73  [SPARK-36198][TESTS] Skip UNIDOC generation in PySpark GHA job

No new revisions were added by this update.

Summary of changes:
 .github/workflows/build_and_test.yml | 1 +
 dev/run-tests.py                     | 2 +-
 2 files changed, 2 insertions(+), 1 deletion(-)
[spark] branch master updated (a9e2156 -> f85855c)
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

  from a9e2156  [SPARK-35460][K8S] Verify the content of `spark.kubernetes.executor.podNamePrefix` before posting it to the k8s api-server
   add f85855c  [SPARK-36075][K8S] Support for specifying executor/driver node selector

No new revisions were added by this update.

Summary of changes:
 docs/running-on-kubernetes.md                       | 22 +
 .../scala/org/apache/spark/deploy/k8s/Config.scala  |  4
 .../apache/spark/deploy/k8s/KubernetesConf.scala    |  6 +
 .../k8s/features/BasicDriverFeatureStep.scala       |  1 +
 .../k8s/features/BasicExecutorFeatureStep.scala     |  1 +
 .../spark/deploy/k8s/KubernetesConfSuite.scala      | 28 ++
 .../k8s/features/BasicDriverFeatureStepSuite.scala  | 15
 .../features/BasicExecutorFeatureStepSuite.scala    | 17 +
 8 files changed, 94 insertions(+)
[spark] branch master updated (fe94bf0 -> a9e2156)
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

  from fe94bf0  [SPARK-36014][K8S] Use uuid as app id in kubernetes client mode
   add a9e2156  [SPARK-35460][K8S] Verify the content of `spark.kubernetes.executor.podNamePrefix` before posting it to the k8s api-server

No new revisions were added by this update.

Summary of changes:
 docs/running-on-kubernetes.md                       |  5 +-
 .../scala/org/apache/spark/deploy/k8s/Config.scala  | 25 +-
 .../k8s/features/BasicExecutorFeatureStep.scala     | 10 ++--
 .../k8s/features/DriverServiceFeatureStep.scala     |  3 +-
 .../deploy/k8s/submit/KubernetesClientUtils.scala   |  6 ++-
 .../features/BasicExecutorFeatureStepSuite.scala    | 56 --
 6 files changed, 82 insertions(+), 23 deletions(-)
[spark] branch master updated: [SPARK-36014][K8S] Use uuid as app id in kubernetes client mode
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new fe94bf0  [SPARK-36014][K8S] Use uuid as app id in kubernetes client mode

fe94bf0 is described below

commit fe94bf07f9acec302e7d8becd7e576c777337331
Author: ulysses-you
AuthorDate: Sun Jul 18 15:41:47 2021 -0700

    [SPARK-36014][K8S] Use uuid as app id in kubernetes client mode

    ### What changes were proposed in this pull request?

    Use a uuid instead of `System.currentTimeMillis` as the app id in Kubernetes client mode.

    ### Why are the changes needed?

    Currently, Spark on Kubernetes in client mode uses `"spark-application-" + System.currentTimeMillis` as the app id by default. This can cause app id conflicts if several Spark applications are submitted to a Kubernetes cluster in a short time.

    Unfortunately, the event log uses the app id as its file name. With conflicting event log files, the following exception is thrown:

    ```
    Caused by: java.io.FileNotFoundException: File does not exist: xxx/spark-application-1624766876324.lz4.inprogress (inode 5984170846) Holder does not have any open files.
            at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:2697)
            at org.apache.hadoop.hdfs.server.namenode.FSDirWriteFileOp.analyzeFileState(FSDirWriteFileOp.java:521)
            at org.apache.hadoop.hdfs.server.namenode.FSDirWriteFileOp.validateAddBlock(FSDirWriteFileOp.java:161)
            at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2579)
            at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:846)
            at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:510)
            at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
            at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:503)
            at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:989)
            at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:871)
            at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:817)
    ```

    ### Does this PR introduce _any_ user-facing change?

    Yes.

    ### How was this patch tested?

    Manual test.
    ![image](https://user-images.githubusercontent.com/12025282/124435341-7a88e180-dda7-11eb-8e62-bdfec6a0ee3b.png)

    Closes #33211 from ulysses-you/k8s-appid.

    Authored-by: ulysses-you
    Signed-off-by: Dongjoon Hyun
---
 .../main/scala/org/apache/spark/deploy/k8s/KubernetesConf.scala   | 5 +-
 .../spark/deploy/k8s/submit/KubernetesClientApplication.scala     | 4 +---
 .../scheduler/cluster/k8s/KubernetesClusterSchedulerBackend.scala | 7 ---
 3 files changed, 9 insertions(+), 7 deletions(-)

diff --git a/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/KubernetesConf.scala b/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/KubernetesConf.scala
index 937c5f5..de084da 100644
--- a/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/KubernetesConf.scala
+++ b/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/KubernetesConf.scala
@@ -16,7 +16,7 @@
  */
 package org.apache.spark.deploy.k8s

-import java.util.Locale
+import java.util.{Locale, UUID}

 import io.fabric8.kubernetes.api.model.{LocalObjectReference, LocalObjectReferenceBuilder, Pod}

@@ -225,6 +225,9 @@ private[spark] object KubernetesConf {
     new KubernetesExecutorConf(sparkConf.clone(), appId, executorId, driverPod, resourceProfileId)
   }

+  def getKubernetesAppId(): String =
+    s"spark-${UUID.randomUUID().toString.replaceAll("-", "")}"
+
   def getResourceNamePrefix(appName: String): String = {
     val id = KubernetesUtils.uniqueID()
     s"$appName-$id"
diff --git a/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/submit/KubernetesClientApplication.scala b/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/submit/KubernetesClientApplication.scala
index 3140502..e3b80b1 100644
--- a/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/submit/KubernetesClientApplication.scala
+++ b/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/submit/KubernetesClientApplication.scala
@@ -16,8 +16,6 @@
  */
 package org.apache.spark.deploy.k8s.submit

-import java.util.UUID
-
 import scala.collection.JavaConverters._
 import scala.collection.mutable
 import scala.util.control.Breaks._
@@ -191,7
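The new `getKubernetesAppId` helper simply strips the dashes from a random UUID. The following is a hypothetical Python rendering of that one-line Scala method, shown only to illustrate the format of the generated app id:

```python
import re
import uuid

def get_kubernetes_app_id():
    # Mirrors the Scala s"spark-${UUID.randomUUID().toString.replaceAll("-", "")}":
    # a collision-resistant id, unlike the old millisecond timestamp which could
    # repeat across near-simultaneous submissions. uuid4().hex is the UUID's
    # 32 hex digits with the dashes already removed.
    return "spark-" + uuid.uuid4().hex

app_id = get_kubernetes_app_id()
assert re.fullmatch(r"spark-[0-9a-f]{32}", app_id)
print(app_id)
```

Because the event log file is named after the app id, making the id random rather than time-based is what removes the `FileNotFoundException` failure mode described above.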
[spark] branch branch-3.1 updated: [SPARK-36040][DOCS][K8S] Add reference to kubernetes-client's version
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch branch-3.1
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/branch-3.1 by this push:
     new 62f6761  [SPARK-36040][DOCS][K8S] Add reference to kubernetes-client's version

62f6761 is described below

commit 62f6761883f855ec97fbc0c69a7da3b0db7f4170
Author: yoda-mon
AuthorDate: Sun Jul 18 14:26:15 2021 -0700

    [SPARK-36040][DOCS][K8S] Add reference to kubernetes-client's version

    ### What changes were proposed in this pull request?

    Add a reference to kubernetes-client's version.

    ### Why are the changes needed?

    Running Spark on Kubernetes potentially has an upper limit on the supported Kubernetes version. I think it is better for users to be aware of this, because Kubernetes updates so quickly that users tend to run Spark jobs on unsupported versions.

    ### Does this PR introduce _any_ user-facing change?

    No.

    ### How was this patch tested?

    SKIP_API=1 bundle exec jekyll build

    Closes #33255 from yoda-mon/add-reference-kubernetes-client.

    Authored-by: yoda-mon
    Signed-off-by: Dongjoon Hyun
    (cherry picked from commit eea69c122f20577956c4a87a6d8eb59943c1c6f0)
    Signed-off-by: Dongjoon Hyun
---
 docs/running-on-kubernetes.md | 1 +
 1 file changed, 1 insertion(+)

diff --git a/docs/running-on-kubernetes.md b/docs/running-on-kubernetes.md
index b9a018a..125c952 100644
--- a/docs/running-on-kubernetes.md
+++ b/docs/running-on-kubernetes.md
@@ -51,6 +51,7 @@ you may set up a test cluster on your local machine using
 
 * Be aware that the default minikube configuration is not enough for running Spark applications.
   We recommend 3 CPUs and 4g of memory to be able to start a simple Spark application with a single executor.
+* Check [kubernetes-client library](https://github.com/fabric8io/kubernetes-client)'s version of your Spark environment, and its compatibility with your Kubernetes cluster's version.
 * You must have appropriate permissions to list, create, edit and delete
   [pods](https://kubernetes.io/docs/user-guide/pods/) in your cluster. You can verify that you can list these resources
   by running `kubectl auth can-i pods`.
[spark] branch master updated (92d4563 -> eea69c1)
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

  from 92d4563  [MINOR][SQL] Fix typo for config hint in SQLConf.scala
   add eea69c1  [SPARK-36040][DOCS][K8S] Add reference to kubernetes-client's version

No new revisions were added by this update.

Summary of changes:
 docs/running-on-kubernetes.md | 1 +
 1 file changed, 1 insertion(+)
[spark] branch branch-3.2 updated: [SPARK-36040][DOCS][K8S] Add reference to kubernetes-client's version
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch branch-3.2 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.2 by this push: new 46ddb17 [SPARK-36040][DOCS][K8S] Add reference to kubernetes-client's version 46ddb17 is described below commit 46ddb17da4673beb9edeef1886868eadd78cd883 Author: yoda-mon AuthorDate: Sun Jul 18 14:26:15 2021 -0700 [SPARK-36040][DOCS][K8S] Add reference to kubernetes-client's version ### What changes were proposed in this pull request? Add reference to kubernetes-client's version ### Why are the changes needed? Running Spark on Kubernetes potentially has upper limitation of Kubernetes version. I think it is better for users to notice it because Kubernetes update speed is so fast that users tends to run Spark Jobs on unsupported version. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? SKIP_API=1 bundle exec jekyll build Closes #33255 from yoda-mon/add-reference-kubernetes-client. Authored-by: yoda-mon Signed-off-by: Dongjoon Hyun (cherry picked from commit eea69c122f20577956c4a87a6d8eb59943c1c6f0) Signed-off-by: Dongjoon Hyun --- docs/running-on-kubernetes.md | 1 + 1 file changed, 1 insertion(+) diff --git a/docs/running-on-kubernetes.md b/docs/running-on-kubernetes.md index 530951e..6ca3375 100644 --- a/docs/running-on-kubernetes.md +++ b/docs/running-on-kubernetes.md @@ -53,6 +53,7 @@ you may set up a test cluster on your local machine using * Be aware that the default minikube configuration is not enough for running Spark applications. We recommend 3 CPUs and 4g of memory to be able to start a simple Spark application with a single executor. + * Check [kubernetes-client library](https://github.com/fabric8io/kubernetes-client)'s version of your Spark environment, and its compatibility with your Kubernetes cluster's version. 
* You must have appropriate permissions to list, create, edit and delete [pods](https://kubernetes.io/docs/user-guide/pods/) in your cluster. You can verify that you can list these resources by running `kubectl auth can-i <list|create|edit|delete> pods`.
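The compatibility check the added documentation line recommends can be sketched as a small version-range lookup. This is an illustrative model only: the client versions and ranges below are placeholders, not the real fabric8 kubernetes-client compatibility matrix, which should be consulted directly.

```python
# Illustrative sketch of a kubernetes-client / Kubernetes compatibility check.
# The matrix values below are PLACEHOLDERS -- the real tested ranges are
# published in the fabric8io/kubernetes-client compatibility table.

ILLUSTRATIVE_COMPAT = {
    # client version -> (oldest tested K8s, newest tested K8s), inclusive
    "5.4.1": ((1, 19), (1, 21)),
    "5.12.2": ((1, 21), (1, 23)),
}

def is_supported(client_version: str, k8s_version: tuple) -> bool:
    """Return True if k8s_version falls inside the client's tested range."""
    try:
        lo, hi = ILLUSTRATIVE_COMPAT[client_version]
    except KeyError:
        return False  # unknown client version: assume untested
    return lo <= k8s_version <= hi

print(is_supported("5.4.1", (1, 20)))  # inside the tested range -> True
print(is_supported("5.4.1", (1, 24)))  # newer than the tested range -> False
```

The point of the doc change is exactly this gap: a cluster upgraded past the bundled client's tested range may fail in non-obvious ways, so the version pairing is worth checking before submitting jobs.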
[spark] branch master updated: [MINOR][SQL] Fix typo for config hint in SQLConf.scala
This is an automated email from the ASF dual-hosted git repository. srowen pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 92d4563 [MINOR][SQL] Fix typo for config hint in SQLConf.scala 92d4563 is described below commit 92d45631246e206bdc11f702972306b59f5beb15 Author: Bessenyei Balázs Donát <9086834+bes...@users.noreply.github.com> AuthorDate: Sun Jul 18 15:33:26 2021 -0500 [MINOR][SQL] Fix typo for config hint in SQLConf.scala ### What changes were proposed in this pull request? This PR fixes a typo for `spark.sql.legacy.allowCreatingManagedTableUsingNonemptyLocation` in `SQLConf.scala`. ### Why are the changes needed? This is a [Broken windows theory](https://en.wikipedia.org/wiki/Broken_windows_theory) change. ### Does this PR introduce _any_ user-facing change? Yes. After merging this PR, users running commands such as ```python spark.conf.set("spark.sql.legacy.allowCreatingManagedTableUsingNonemptyLocation", "true") ``` will get a typo-free exception. ### How was this patch tested? This is a trivial change. Closes #33389 from bessbd/patch-1. 
Authored-by: Bessenyei Balázs Donát <9086834+bes...@users.noreply.github.com> Signed-off-by: Sean Owen --- sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala index b9663bb..0add7f5 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala @@ -3437,7 +3437,7 @@ object SQLConf { "It was removed to prevent errors like SPARK-23173 for non-default value."), RemovedConfig( "spark.sql.legacy.allowCreatingManagedTableUsingNonemptyLocation", "3.0.0", "false", -"It was removed to prevent loosing of users data for non-default value."), +"It was removed to prevent loss of user data for non-default value."), RemovedConfig("spark.sql.legacy.compareDateTimestampInTimestamp", "3.0.0", "true", "It was removed to prevent errors like SPARK-23549 for non-default value."), RemovedConfig("spark.sql.parquet.int64AsTimestampMillis", "3.0.0", "false",
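The `RemovedConfig` mechanism touched by this diff can be modeled in a few lines: Spark keeps a table of removed config keys, and setting one to a non-default value raises with the recorded reason. The sketch below is in Python for illustration (the real implementation is in `SQLConf.scala`), and the exact exception wording here is an approximation, not Spark's literal error format.

```python
# Minimal model of SQLConf's RemovedConfig check, with the typo-fixed reason.
REMOVED_CONFIGS = {
    # key: (removed-in version, tolerated default value, reason)
    "spark.sql.legacy.allowCreatingManagedTableUsingNonemptyLocation": (
        "3.0.0", "false",
        "It was removed to prevent loss of user data for non-default value."),
}

def set_conf(key: str, value: str) -> None:
    if key in REMOVED_CONFIGS:
        version, default, reason = REMOVED_CONFIGS[key]
        if value != default:  # setting the old default back is a no-op
            raise ValueError(
                f"The SQL config '{key}' was removed in version {version}. {reason}")

# Setting the default is tolerated; a non-default value raises.
set_conf("spark.sql.legacy.allowCreatingManagedTableUsingNonemptyLocation", "false")
try:
    set_conf("spark.sql.legacy.allowCreatingManagedTableUsingNonemptyLocation", "true")
except ValueError as e:
    print(e)
```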
[spark] branch branch-3.2 updated: [SPARK-36090][SQL] Support TimestampNTZType in expression Sequence
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a commit to branch branch-3.2 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.2 by this push: new 85f70a1 [SPARK-36090][SQL] Support TimestampNTZType in expression Sequence 85f70a1 is described below commit 85f70a1181b1b11417c197cee411e0ec9ced2373 Author: gengjiaan AuthorDate: Sun Jul 18 20:46:23 2021 +0300 [SPARK-36090][SQL] Support TimestampNTZType in expression Sequence ### What changes were proposed in this pull request? The current implementation of `Sequence` accepts `TimestampType`, `DateType` and `IntegralType`. This PR lets `Sequence` also accept `TimestampNTZType`. ### Why are the changes needed? We can generate sequences for timestamps without time zone. ### Does this PR introduce _any_ user-facing change? Yes. This PR lets `Sequence` accept `TimestampNTZType`. ### How was this patch tested? New tests. Closes #33360 from beliefer/SPARK-36090. 
Lead-authored-by: gengjiaan Co-authored-by: Jiaan Geng Signed-off-by: Max Gekk (cherry picked from commit 42275bb20d6849ee9df488d9ec1fa402f394ac89) Signed-off-by: Max Gekk --- .../expressions/collectionOperations.scala | 48 +--- .../spark/sql/catalyst/util/DateTimeUtils.scala| 21 +++- .../expressions/CollectionExpressionsSuite.scala | 122 - 3 files changed, 172 insertions(+), 19 deletions(-) diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala index 2883d8d..730b8d0 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala @@ -2568,7 +2568,7 @@ case class Sequence( val typesCorrect = startType.sameType(stop.dataType) && (startType match { - case TimestampType => + case TimestampType | TimestampNTZType => stepOpt.isEmpty || CalendarIntervalType.acceptsType(stepType) || YearMonthIntervalType.acceptsType(stepType) || DayTimeIntervalType.acceptsType(stepType) @@ -2614,20 +2614,20 @@ case class Sequence( val ct = ClassTag[T](iType.tag.mirror.runtimeClass(iType.tag.tpe)) new IntegralSequenceImpl(iType)(ct, iType.integral) -case TimestampType => +case TimestampType | TimestampNTZType => if (stepOpt.isEmpty || CalendarIntervalType.acceptsType(stepOpt.get.dataType)) { -new TemporalSequenceImpl[Long](LongType, 1, identity, zoneId) +new TemporalSequenceImpl[Long](LongType, start.dataType, 1, identity, zoneId) } else if (YearMonthIntervalType.acceptsType(stepOpt.get.dataType)) { -new PeriodSequenceImpl[Long](LongType, 1, identity, zoneId) +new PeriodSequenceImpl[Long](LongType, start.dataType, 1, identity, zoneId) } else { -new DurationSequenceImpl[Long](LongType, 1, identity, zoneId) +new DurationSequenceImpl[Long](LongType, start.dataType, 1, identity, zoneId) } case 
DateType => if (stepOpt.isEmpty || CalendarIntervalType.acceptsType(stepOpt.get.dataType)) { -new TemporalSequenceImpl[Int](IntegerType, MICROS_PER_DAY, _.toInt, zoneId) +new TemporalSequenceImpl[Int](IntegerType, start.dataType, MICROS_PER_DAY, _.toInt, zoneId) } else { -new PeriodSequenceImpl[Int](IntegerType, MICROS_PER_DAY, _.toInt, zoneId) +new PeriodSequenceImpl[Int](IntegerType, start.dataType, MICROS_PER_DAY, _.toInt, zoneId) } } @@ -2769,8 +2769,9 @@ object Sequence { } private class PeriodSequenceImpl[T: ClassTag] - (dt: IntegralType, scale: Long, fromLong: Long => T, zoneId: ZoneId) - (implicit num: Integral[T]) extends InternalSequenceBase(dt, scale, fromLong, zoneId) { + (dt: IntegralType, outerDataType: DataType, scale: Long, fromLong: Long => T, zoneId: ZoneId) + (implicit num: Integral[T]) +extends InternalSequenceBase(dt, outerDataType, scale, fromLong, zoneId) { override val defaultStep: DefaultStep = new DefaultStep( (dt.ordering.lteq _).asInstanceOf[LessThanOrEqualFn], @@ -2794,8 +2795,9 @@ object Sequence { } private class DurationSequenceImpl[T: ClassTag] - (dt: IntegralType, scale: Long, fromLong: Long => T, zoneId: ZoneId) - (implicit num: Integral[T]) extends InternalSequenceBase(dt, scale, fromLong, zoneId) { + (dt: IntegralType, outerDataType: DataType, scale: Long, fromLong: Long => T, zoneId: ZoneId) + (implicit num: Integral[T]) +extends InternalSequenceBase(dt, outerDataType, scale, fromLong, zoneId) { override val
[spark] branch master updated (d7df7a8 -> 42275bb)
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from d7df7a8 [SPARK-36195][BUILD] Set MaxMetaspaceSize JVM option to 2g add 42275bb [SPARK-36090][SQL] Support TimestampNTZType in expression Sequence No new revisions were added by this update. Summary of changes: .../expressions/collectionOperations.scala | 48 +--- .../spark/sql/catalyst/util/DateTimeUtils.scala| 21 +++- .../expressions/CollectionExpressionsSuite.scala | 122 - 3 files changed, 172 insertions(+), 19 deletions(-)
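What SPARK-36090 enables can be illustrated by modeling SQL's `sequence(start, stop, step)` over timestamps without time zone as naive datetime values. This is a behavioral sketch only: it covers fixed-duration steps, while Spark's implementation also handles calendar-interval steps (e.g. `INTERVAL '1' MONTH`), which need calendar arithmetic rather than a constant delta.

```python
# Sketch of sequence(start, stop, step) semantics for TIMESTAMP_NTZ values,
# modeled with naive (zone-less) datetimes and a fixed-duration step.
from datetime import datetime, timedelta

def ntz_sequence(start: datetime, stop: datetime, step: timedelta):
    if step == timedelta(0):
        raise ValueError("sequence step must be non-zero")
    out, cur = [], start
    if step > timedelta(0):
        while cur <= stop:       # ascending sequence, stop is inclusive
            out.append(cur)
            cur += step
    else:
        while cur >= stop:       # descending sequence for negative steps
            out.append(cur)
            cur += step
    return out

seq = ntz_sequence(datetime(2021, 7, 1, 0, 0),
                   datetime(2021, 7, 1, 3, 0),
                   timedelta(hours=1))
print(len(seq))  # 4 values: 00:00, 01:00, 02:00, 03:00
```

Because the values carry no time zone, no `zoneId` conversion is involved in the arithmetic itself; in the diff above this shows up as the `TimestampNTZType` case reusing the same sequence implementations as `TimestampType`, parameterized by the outer data type.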
[spark] branch branch-3.1 updated: [SPARK-36195][BUILD] Set MaxMetaspaceSize JVM option to 2g
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch branch-3.1 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.1 by this push: new 8a7fa43 [SPARK-36195][BUILD] Set MaxMetaspaceSize JVM option to 2g 8a7fa43 is described below commit 8a7fa439fad5fd13b29fe919ce178908cbbe816c Author: Dongjoon Hyun AuthorDate: Sun Jul 18 10:15:15 2021 -0700 [SPARK-36195][BUILD] Set MaxMetaspaceSize JVM option to 2g This PR aims to set `MaxMetaspaceSize` to `2g` because, by default, metaspace grows native memory consumption without bound. This unbounded memory growth causes GitHub Actions flakiness. The value I observed during the `hive` module test was over 1.8G and growing. - https://docs.oracle.com/javase/10/gctuning/other-considerations.htm#JSGCT-GUID-BFB89453-60C0-42AC-81CA-87D59B0ACE2E > Starting with JDK 8, the permanent generation was removed and the class metadata is allocated in native memory. The amount of native memory that can be used for class metadata is by default unlimited. Use the option -XX:MaxMetaspaceSize to put an upper limit on the amount of native memory used for class metadata. In addition, I increased the following memory limit to 4g consistently in two places. ```xml - -Xms2048m - -Xmx2048m + -Xms4g + -Xmx4g ``` ```scala - javaOptions += "-Xmx3g", + javaOptions ++= "-Xmx4g -XX:MaxMetaspaceSize=2g".split(" ").toSeq, ``` This will reduce the flakiness in the CI environment by limiting the memory usage explicitly. When we limit it to `1g`, the Hive module fails with an `OOM` like the following. ``` java.lang.OutOfMemoryError: Metaspace Error: Exception in thread "dispatcher-event-loop-110" java.lang.OutOfMemoryError: Metaspace ``` No. Pass the CIs. Closes #33405 from dongjoon-hyun/SPARK-36195. 
Lead-authored-by: Dongjoon Hyun Co-authored-by: Kyle Bendickson Signed-off-by: Dongjoon Hyun (cherry picked from commit d7df7a805fcbdf2435df1e78abd9899d3ca10dd2) Signed-off-by: Dongjoon Hyun --- pom.xml | 5 +++-- project/SparkBuild.scala | 4 ++-- 2 files changed, 5 insertions(+), 4 deletions(-) diff --git a/pom.xml b/pom.xml index a7e3a73..1fb7c5a 100644 --- a/pom.xml +++ b/pom.xml @@ -2517,6 +2517,7 @@ -Xms1024m -Xmx1024m + -XX:MaxMetaspaceSize=2g -XX:ReservedCodeCacheSize=${CodeCacheSize} @@ -2565,7 +2566,7 @@ **/*Suite.java ${project.build.directory}/surefire-reports --ea -Xmx4g -Xss4m -XX:ReservedCodeCacheSize=${CodeCacheSize} -Dio.netty.tryReflectionSetAccessible=true +-ea -Xmx4g -Xss4m -XX:MaxMetaspaceSize=2g -XX:ReservedCodeCacheSize=${CodeCacheSize} -Dio.netty.tryReflectionSetAccessible=true
[spark] branch branch-3.2 updated: [SPARK-36195][BUILD] Set MaxMetaspaceSize JVM option to 2g
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch branch-3.2 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.2 by this push: new 8059a7e [SPARK-36195][BUILD] Set MaxMetaspaceSize JVM option to 2g 8059a7e is described below commit 8059a7e5e6726e7ca1401416be90b92c305c5060 Author: Dongjoon Hyun AuthorDate: Sun Jul 18 10:15:15 2021 -0700 [SPARK-36195][BUILD] Set MaxMetaspaceSize JVM option to 2g ### What changes were proposed in this pull request? This PR aims to set `MaxMetaspaceSize` to `2g` because, by default, metaspace grows native memory consumption without bound. This unbounded memory growth causes GitHub Actions flakiness. The value I observed during the `hive` module test was over 1.8G and growing. - https://docs.oracle.com/javase/10/gctuning/other-considerations.htm#JSGCT-GUID-BFB89453-60C0-42AC-81CA-87D59B0ACE2E > Starting with JDK 8, the permanent generation was removed and the class metadata is allocated in native memory. The amount of native memory that can be used for class metadata is by default unlimited. Use the option -XX:MaxMetaspaceSize to put an upper limit on the amount of native memory used for class metadata. In addition, I increased the following memory limit to 4g consistently in two places. ```xml - -Xms2048m - -Xmx2048m + -Xms4g + -Xmx4g ``` ```scala - javaOptions += "-Xmx3g", + javaOptions ++= "-Xmx4g -XX:MaxMetaspaceSize=2g".split(" ").toSeq, ``` ### Why are the changes needed? This will reduce the flakiness in the CI environment by limiting the memory usage explicitly. When we limit it to `1g`, the Hive module fails with an `OOM` like the following. ``` java.lang.OutOfMemoryError: Metaspace Error: Exception in thread "dispatcher-event-loop-110" java.lang.OutOfMemoryError: Metaspace ``` ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Pass the CIs. 
Closes #33405 from dongjoon-hyun/SPARK-36195. Lead-authored-by: Dongjoon Hyun Co-authored-by: Kyle Bendickson Signed-off-by: Dongjoon Hyun (cherry picked from commit d7df7a805fcbdf2435df1e78abd9899d3ca10dd2) Signed-off-by: Dongjoon Hyun --- pom.xml | 9 + project/SparkBuild.scala | 4 ++-- 2 files changed, 7 insertions(+), 6 deletions(-) diff --git a/pom.xml b/pom.xml index a49894e..3b32207 100644 --- a/pom.xml +++ b/pom.xml @@ -2611,8 +2611,9 @@ -Xss128m - -Xms2048m - -Xmx2048m + -Xms4g + -Xmx4g + -XX:MaxMetaspaceSize=2g -XX:ReservedCodeCacheSize=${CodeCacheSize} @@ -2661,7 +2662,7 @@ **/*Suite.java ${project.build.directory}/surefire-reports --ea -Xmx4g -Xss4m -XX:ReservedCodeCacheSize=${CodeCacheSize} -Dio.netty.tryReflectionSetAccessible=true +-ea -Xmx4g -Xss4m -XX:MaxMetaspaceSize=2g -XX:ReservedCodeCacheSize=${CodeCacheSize} -Dio.netty.tryReflectionSetAccessible=true
[spark] branch master updated: [SPARK-36195][BUILD] Set MaxMetaspaceSize JVM option to 2g
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new d7df7a8 [SPARK-36195][BUILD] Set MaxMetaspaceSize JVM option to 2g d7df7a8 is described below commit d7df7a805fcbdf2435df1e78abd9899d3ca10dd2 Author: Dongjoon Hyun AuthorDate: Sun Jul 18 10:15:15 2021 -0700 [SPARK-36195][BUILD] Set MaxMetaspaceSize JVM option to 2g ### What changes were proposed in this pull request? This PR aims to set `MaxMetaspaceSize` to `2g` because, by default, metaspace grows native memory consumption without bound. This unbounded memory growth causes GitHub Actions flakiness. The value I observed during the `hive` module test was over 1.8G and growing. - https://docs.oracle.com/javase/10/gctuning/other-considerations.htm#JSGCT-GUID-BFB89453-60C0-42AC-81CA-87D59B0ACE2E > Starting with JDK 8, the permanent generation was removed and the class metadata is allocated in native memory. The amount of native memory that can be used for class metadata is by default unlimited. Use the option -XX:MaxMetaspaceSize to put an upper limit on the amount of native memory used for class metadata. In addition, I increased the following memory limit to 4g consistently in two places. ```xml - -Xms2048m - -Xmx2048m + -Xms4g + -Xmx4g ``` ```scala - javaOptions += "-Xmx3g", + javaOptions ++= "-Xmx4g -XX:MaxMetaspaceSize=2g".split(" ").toSeq, ``` ### Why are the changes needed? This will reduce the flakiness in the CI environment by limiting the memory usage explicitly. When we limit it to `1g`, the Hive module fails with an `OOM` like the following. ``` java.lang.OutOfMemoryError: Metaspace Error: Exception in thread "dispatcher-event-loop-110" java.lang.OutOfMemoryError: Metaspace ``` ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Pass the CIs. Closes #33405 from dongjoon-hyun/SPARK-36195. 
Lead-authored-by: Dongjoon Hyun Co-authored-by: Kyle Bendickson Signed-off-by: Dongjoon Hyun --- pom.xml | 9 + project/SparkBuild.scala | 4 ++-- 2 files changed, 7 insertions(+), 6 deletions(-) diff --git a/pom.xml b/pom.xml index aa69c0e..59a03bc 100644 --- a/pom.xml +++ b/pom.xml @@ -2611,8 +2611,9 @@ -Xss128m - -Xms2048m - -Xmx2048m + -Xms4g + -Xmx4g + -XX:MaxMetaspaceSize=2g -XX:ReservedCodeCacheSize=${CodeCacheSize} @@ -2661,7 +2662,7 @@ **/*Suite.java ${project.build.directory}/surefire-reports --ea -Xmx4g -Xss4m -XX:ReservedCodeCacheSize=${CodeCacheSize} -Dio.netty.tryReflectionSetAccessible=true +-ea -Xmx4g -Xss4m -XX:MaxMetaspaceSize=2g -XX:ReservedCodeCacheSize=${CodeCacheSize} -Dio.netty.tryReflectionSetAccessible=true
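Taken together, the build change amounts to the following sbt fragment for forked test JVMs. This is a consolidated sketch of the options shown in the diffs above, not a drop-in copy of `project/SparkBuild.scala`:

```scala
// Consolidated test JVM options after SPARK-36195 (illustrative placement;
// the real line lives in the shared test settings of SparkBuild.scala).
javaOptions ++= "-Xmx4g -XX:MaxMetaspaceSize=2g".split(" ").toSeq
```

The corresponding Maven change adds `-XX:MaxMetaspaceSize=2g` to the surefire/scalatest argument lines in `pom.xml`, so both build systems cap class-metadata memory the same way.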