[spark] branch branch-3.1 updated: [SPARK-36193][CORE] Recover SparkSubmit.runMain not to stop SparkContext in non-K8s env
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch branch-3.1
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/branch-3.1 by this push:
     new ab94702  [SPARK-36193][CORE] Recover SparkSubmit.runMain not to stop SparkContext in non-K8s env

ab94702 is described below

commit ab94702a4f3a81942fc26c13d84574506a70eff2
Author: Dongjoon Hyun
AuthorDate: Sun Jul 18 22:26:23 2021 -0700

    [SPARK-36193][CORE] Recover SparkSubmit.runMain not to stop SparkContext in non-K8s env

    ### What changes were proposed in this pull request?

    Following the discussion on https://github.com/apache/spark/pull/32283, this PR aims to limit the feature of SPARK-34674 to the K8s environment only.

    ### Why are the changes needed?

    To reduce the behavior change in non-K8s environments.

    ### Does this PR introduce _any_ user-facing change?

    The changed behavior is consistent with 3.1.1 and older Spark releases.

    ### How was this patch tested?

    N/A

    Closes #33403 from dongjoon-hyun/SPARK-36193.

    Authored-by: Dongjoon Hyun
    Signed-off-by: Dongjoon Hyun
    (cherry picked from commit fd3e9ce0b9ee09c7dce9f2e029fe96eac51eab96)
    Signed-off-by: Dongjoon Hyun
---
 core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala b/core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala
index fa86da9..818b263 100644
--- a/core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala
+++ b/core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala
@@ -953,8 +953,8 @@ private[spark] class SparkSubmit extends Logging {
       case t: Throwable =>
         throw findCause(t)
     } finally {
-      if (!isShell(args.primaryResource) && !isSqlShell(args.mainClass) &&
-          !isThriftServer(args.mainClass)) {
+      if (args.master.startsWith("k8s") && !isShell(args.primaryResource) &&
+          !isSqlShell(args.mainClass) && !isThriftServer(args.mainClass)) {
         try {
           SparkContext.getActive.foreach(_.stop())
         } catch {

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch branch-3.2 updated: [SPARK-36193][CORE] Recover SparkSubmit.runMain not to stop SparkContext in non-K8s env
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch branch-3.2
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/branch-3.2 by this push:
     new c3a23ce  [SPARK-36193][CORE] Recover SparkSubmit.runMain not to stop SparkContext in non-K8s env

c3a23ce is described below

commit c3a23ce49bb81682575d1b2d11b9fa51de5e8bd7
Author: Dongjoon Hyun
AuthorDate: Sun Jul 18 22:26:23 2021 -0700

    [SPARK-36193][CORE] Recover SparkSubmit.runMain not to stop SparkContext in non-K8s env

    ### What changes were proposed in this pull request?

    Following the discussion on https://github.com/apache/spark/pull/32283, this PR aims to limit the feature of SPARK-34674 to the K8s environment only.

    ### Why are the changes needed?

    To reduce the behavior change in non-K8s environments.

    ### Does this PR introduce _any_ user-facing change?

    The changed behavior is consistent with 3.1.1 and older Spark releases.

    ### How was this patch tested?

    N/A

    Closes #33403 from dongjoon-hyun/SPARK-36193.

    Authored-by: Dongjoon Hyun
    Signed-off-by: Dongjoon Hyun
    (cherry picked from commit fd3e9ce0b9ee09c7dce9f2e029fe96eac51eab96)
    Signed-off-by: Dongjoon Hyun
---
 core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala b/core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala
index a65be54..8124650 100644
--- a/core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala
+++ b/core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala
@@ -957,8 +957,8 @@ private[spark] class SparkSubmit extends Logging {
       case t: Throwable =>
         throw findCause(t)
     } finally {
-      if (!isShell(args.primaryResource) && !isSqlShell(args.mainClass) &&
-          !isThriftServer(args.mainClass)) {
+      if (args.master.startsWith("k8s") && !isShell(args.primaryResource) &&
+          !isSqlShell(args.mainClass) && !isThriftServer(args.mainClass)) {
         try {
           SparkContext.getActive.foreach(_.stop())
         } catch {
[spark] branch master updated: [SPARK-36193][CORE] Recover SparkSubmit.runMain not to stop SparkContext in non-K8s env
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new fd3e9ce  [SPARK-36193][CORE] Recover SparkSubmit.runMain not to stop SparkContext in non-K8s env

fd3e9ce is described below

commit fd3e9ce0b9ee09c7dce9f2e029fe96eac51eab96
Author: Dongjoon Hyun
AuthorDate: Sun Jul 18 22:26:23 2021 -0700

    [SPARK-36193][CORE] Recover SparkSubmit.runMain not to stop SparkContext in non-K8s env

    ### What changes were proposed in this pull request?

    Following the discussion on https://github.com/apache/spark/pull/32283, this PR aims to limit the feature of SPARK-34674 to the K8s environment only.

    ### Why are the changes needed?

    To reduce the behavior change in non-K8s environments.

    ### Does this PR introduce _any_ user-facing change?

    The changed behavior is consistent with 3.1.1 and older Spark releases.

    ### How was this patch tested?

    N/A

    Closes #33403 from dongjoon-hyun/SPARK-36193.

    Authored-by: Dongjoon Hyun
    Signed-off-by: Dongjoon Hyun
---
 core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala b/core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala
index a65be54..8124650 100644
--- a/core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala
+++ b/core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala
@@ -957,8 +957,8 @@ private[spark] class SparkSubmit extends Logging {
       case t: Throwable =>
         throw findCause(t)
     } finally {
-      if (!isShell(args.primaryResource) && !isSqlShell(args.mainClass) &&
-          !isThriftServer(args.mainClass)) {
+      if (args.master.startsWith("k8s") && !isShell(args.primaryResource) &&
+          !isSqlShell(args.mainClass) && !isThriftServer(args.mainClass)) {
         try {
           SparkContext.getActive.foreach(_.stop())
         } catch {
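The guard in the SPARK-36193 diff can be sketched as a standalone predicate. This is a hypothetical Python analogue of the Scala condition, not Spark code; the function and parameter names are illustrative:

```python
def should_stop_spark_context(master, is_shell, is_sql_shell, is_thrift_server):
    # After SPARK-36193, SparkSubmit.runMain only stops an active SparkContext
    # when the master is a K8s one ("k8s://..."), and never for interactive
    # shells or the Thrift server, which manage their own context lifecycle.
    return (master.startswith("k8s")
            and not is_shell
            and not is_sql_shell
            and not is_thrift_server)

print(should_stop_spark_context("k8s://https://1.2.3.4", False, False, False))  # True
print(should_stop_spark_context("yarn", False, False, False))                   # False: pre-3.1.2 behavior restored
print(should_stop_spark_context("k8s://https://1.2.3.4", True, False, False))   # False: shells keep their context
```

The extra `startsWith("k8s")` term is the entire behavior change: on YARN, Mesos, and standalone masters the context is left untouched, as in 3.1.1 and earlier.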
[spark] branch branch-3.2 updated: [SPARK-36199][BUILD] Bump scalatest-maven-plugin to 2.0.2
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch branch-3.2
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/branch-3.2 by this push:
     new b93fa15  [SPARK-36199][BUILD] Bump scalatest-maven-plugin to 2.0.2

b93fa15 is described below

commit b93fa15ce2b86c1f4c4b1bda1f612aea947b08c8
Author: William Hyun
AuthorDate: Sun Jul 18 22:14:24 2021 -0700

    [SPARK-36199][BUILD] Bump scalatest-maven-plugin to 2.0.2

    ### What changes were proposed in this pull request?

    This PR aims to upgrade scalatest-maven-plugin to version 2.0.2.

    ### Why are the changes needed?

    2.0.2 officially supports building on JDK 11.
    - https://github.com/scalatest/scalatest-maven-plugin/commit/f45ce192f313553efc29c201593950e38f419a80

    ### Does this PR introduce _any_ user-facing change?

    No.

    ### How was this patch tested?

    Pass the CIs.

    Closes #33408 from williamhyun/SMP.

    Authored-by: William Hyun
    Signed-off-by: Dongjoon Hyun
    (cherry picked from commit df8bae0689d93ece72a271ed8a3b0243ac77dca2)
    Signed-off-by: Dongjoon Hyun
---
 pom.xml | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/pom.xml b/pom.xml
index 3b32207..3054401 100644
--- a/pom.xml
+++ b/pom.xml
@@ -162,7 +162,7 @@
     3.2.2
     2.12.14
     2.12
-    2.0.0
+    2.0.2
     --test
     true
[spark] branch master updated: [SPARK-36199][BUILD] Bump scalatest-maven-plugin to 2.0.2
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new df8bae0  [SPARK-36199][BUILD] Bump scalatest-maven-plugin to 2.0.2

df8bae0 is described below

commit df8bae0689d93ece72a271ed8a3b0243ac77dca2
Author: William Hyun
AuthorDate: Sun Jul 18 22:14:24 2021 -0700

    [SPARK-36199][BUILD] Bump scalatest-maven-plugin to 2.0.2

    ### What changes were proposed in this pull request?

    This PR aims to upgrade scalatest-maven-plugin to version 2.0.2.

    ### Why are the changes needed?

    2.0.2 officially supports building on JDK 11.
    - https://github.com/scalatest/scalatest-maven-plugin/commit/f45ce192f313553efc29c201593950e38f419a80

    ### Does this PR introduce _any_ user-facing change?

    No.

    ### How was this patch tested?

    Pass the CIs.

    Closes #33408 from williamhyun/SMP.

    Authored-by: William Hyun
    Signed-off-by: Dongjoon Hyun
---
 pom.xml | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/pom.xml b/pom.xml
index 59a03bc..1461f31 100644
--- a/pom.xml
+++ b/pom.xml
@@ -162,7 +162,7 @@
     3.2.2
     2.12.14
     2.12
-    2.0.0
+    2.0.2
     --test
     true
[spark] branch branch-3.2 updated: [SPARK-35810][PYTHON] Deprecate ps.broadcast API
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch branch-3.2
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/branch-3.2 by this push:
     new 80a9644  [SPARK-35810][PYTHON] Deprecate ps.broadcast API

80a9644 is described below

commit 80a96443725a32053220409051d1937035141e40
Author: itholic
AuthorDate: Mon Jul 19 10:44:59 2021 +0900

    [SPARK-35810][PYTHON] Deprecate ps.broadcast API

    ### What changes were proposed in this pull request?

    The `broadcast` function in `pyspark.pandas` duplicates `DataFrame.spark.hint` with `"broadcast"`.

    ```python
    # The below 2 lines are the same
    df.spark.hint("broadcast")
    ps.broadcast(df)
    ```

    So, we should remove `broadcast` in the future, and show a deprecation warning for now.

    ### Why are the changes needed?

    For deduplication of functions.

    ### Does this PR introduce _any_ user-facing change?

    Users see a deprecation warning when using `broadcast` in `pyspark.pandas`.

    ```python
    >>> ps.broadcast(df)
    FutureWarning: `broadcast` has been deprecated and will be removed in a future version.
    use `DataFrame.spark.hint` with 'broadcast' for `name` parameter instead.
      warnings.warn(
    ```

    ### How was this patch tested?

    Manually checked the warning message and saw that the build passed.

    Closes #33379 from itholic/SPARK-35810.

    Lead-authored-by: itholic
    Co-authored-by: Hyukjin Kwon
    Co-authored-by: Haejoon Lee <44108233+itho...@users.noreply.github.com>
    Signed-off-by: Hyukjin Kwon
    (cherry picked from commit 67e6120a851066f183e41f57cc3b10f2f3704df7)
    Signed-off-by: Hyukjin Kwon
---
 python/pyspark/pandas/generic.py   | 10 ++++++++++
 python/pyspark/pandas/namespace.py |  8 ++++++++
 2 files changed, 18 insertions(+)

diff --git a/python/pyspark/pandas/generic.py b/python/pyspark/pandas/generic.py
index c60097e..c1009b0 100644
--- a/python/pyspark/pandas/generic.py
+++ b/python/pyspark/pandas/generic.py
@@ -860,6 +860,11 @@ class Frame(object, metaclass=ABCMeta):
         )

         if num_files is not None:
+            warnings.warn(
+                "`num_files` has been deprecated and might be removed in a future version. "
+                "Use `DataFrame.spark.repartition` instead.",
+                FutureWarning,
+            )
             sdf = sdf.repartition(num_files)

         builder = sdf.write.mode(mode)
@@ -998,6 +1003,11 @@ class Frame(object, metaclass=ABCMeta):
         sdf = psdf.to_spark(index_col=index_col)  # type: ignore

         if num_files is not None:
+            warnings.warn(
+                "`num_files` has been deprecated and might be removed in a future version. "
+                "Use `DataFrame.spark.repartition` instead.",
+                FutureWarning,
+            )
             sdf = sdf.repartition(num_files)

         builder = sdf.write.mode(mode)
diff --git a/python/pyspark/pandas/namespace.py b/python/pyspark/pandas/namespace.py
index a46926d..9af91cb 100644
--- a/python/pyspark/pandas/namespace.py
+++ b/python/pyspark/pandas/namespace.py
@@ -39,6 +39,7 @@ from distutils.version import LooseVersion
 from functools import reduce
 from io import BytesIO
 import json
+import warnings

 import numpy as np
 import pandas as pd
@@ -2822,6 +2823,8 @@ def broadcast(obj: DataFrame) -> DataFrame:
     """
     Marks a DataFrame as small enough for use in broadcast joins.

+    .. deprecated:: 3.2.0
+        Use :func:`DataFrame.spark.hint` instead.

     Parameters
     ----------
     obj : DataFrame
@@ -2852,6 +2855,11 @@ def broadcast(obj: DataFrame) -> DataFrame:
     ...        BroadcastHashJoin...
     ...
     """
+    warnings.warn(
+        "`broadcast` has been deprecated and might be removed in a future version. "
+        "Use `DataFrame.spark.hint` with 'broadcast' for `name` parameter instead.",
+        FutureWarning,
+    )
     if not isinstance(obj, DataFrame):
         raise TypeError("Invalid type : expected DataFrame got {}".format(type(obj).__name__))
     return DataFrame(
[spark] branch master updated: [SPARK-35810][PYTHON] Deprecate ps.broadcast API
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new 67e6120  [SPARK-35810][PYTHON] Deprecate ps.broadcast API

67e6120 is described below

commit 67e6120a851066f183e41f57cc3b10f2f3704df7
Author: itholic
AuthorDate: Mon Jul 19 10:44:59 2021 +0900

    [SPARK-35810][PYTHON] Deprecate ps.broadcast API

    ### What changes were proposed in this pull request?

    The `broadcast` function in `pyspark.pandas` duplicates `DataFrame.spark.hint` with `"broadcast"`.

    ```python
    # The below 2 lines are the same
    df.spark.hint("broadcast")
    ps.broadcast(df)
    ```

    So, we should remove `broadcast` in the future, and show a deprecation warning for now.

    ### Why are the changes needed?

    For deduplication of functions.

    ### Does this PR introduce _any_ user-facing change?

    Users see a deprecation warning when using `broadcast` in `pyspark.pandas`.

    ```python
    >>> ps.broadcast(df)
    FutureWarning: `broadcast` has been deprecated and will be removed in a future version.
    use `DataFrame.spark.hint` with 'broadcast' for `name` parameter instead.
      warnings.warn(
    ```

    ### How was this patch tested?

    Manually checked the warning message and saw that the build passed.

    Closes #33379 from itholic/SPARK-35810.

    Lead-authored-by: itholic
    Co-authored-by: Hyukjin Kwon
    Co-authored-by: Haejoon Lee <44108233+itho...@users.noreply.github.com>
    Signed-off-by: Hyukjin Kwon
---
 python/pyspark/pandas/generic.py   | 10 ++++++++++
 python/pyspark/pandas/namespace.py |  8 ++++++++
 2 files changed, 18 insertions(+)

diff --git a/python/pyspark/pandas/generic.py b/python/pyspark/pandas/generic.py
index c60097e..c1009b0 100644
--- a/python/pyspark/pandas/generic.py
+++ b/python/pyspark/pandas/generic.py
@@ -860,6 +860,11 @@ class Frame(object, metaclass=ABCMeta):
         )

         if num_files is not None:
+            warnings.warn(
+                "`num_files` has been deprecated and might be removed in a future version. "
+                "Use `DataFrame.spark.repartition` instead.",
+                FutureWarning,
+            )
             sdf = sdf.repartition(num_files)

         builder = sdf.write.mode(mode)
@@ -998,6 +1003,11 @@ class Frame(object, metaclass=ABCMeta):
         sdf = psdf.to_spark(index_col=index_col)  # type: ignore

         if num_files is not None:
+            warnings.warn(
+                "`num_files` has been deprecated and might be removed in a future version. "
+                "Use `DataFrame.spark.repartition` instead.",
+                FutureWarning,
+            )
             sdf = sdf.repartition(num_files)

         builder = sdf.write.mode(mode)
diff --git a/python/pyspark/pandas/namespace.py b/python/pyspark/pandas/namespace.py
index a46926d..9af91cb 100644
--- a/python/pyspark/pandas/namespace.py
+++ b/python/pyspark/pandas/namespace.py
@@ -39,6 +39,7 @@ from distutils.version import LooseVersion
 from functools import reduce
 from io import BytesIO
 import json
+import warnings

 import numpy as np
 import pandas as pd
@@ -2822,6 +2823,8 @@ def broadcast(obj: DataFrame) -> DataFrame:
     """
     Marks a DataFrame as small enough for use in broadcast joins.

+    .. deprecated:: 3.2.0
+        Use :func:`DataFrame.spark.hint` instead.

     Parameters
     ----------
     obj : DataFrame
@@ -2852,6 +2855,11 @@ def broadcast(obj: DataFrame) -> DataFrame:
     ...        BroadcastHashJoin...
     ...
     """
+    warnings.warn(
+        "`broadcast` has been deprecated and might be removed in a future version. "
+        "Use `DataFrame.spark.hint` with 'broadcast' for `name` parameter instead.",
+        FutureWarning,
+    )
    if not isinstance(obj, DataFrame):
         raise TypeError("Invalid type : expected DataFrame got {}".format(type(obj).__name__))
     return DataFrame(
[spark] branch branch-3.2 updated: [SPARK-36198][TESTS] Skip UNIDOC generation in PySpark GHA job
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch branch-3.2
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/branch-3.2 by this push:
     new d5cec45  [SPARK-36198][TESTS] Skip UNIDOC generation in PySpark GHA job

d5cec45 is described below

commit d5cec45c0b0feaf2dd6014cf82bf0d7d25f5ac87
Author: William Hyun
AuthorDate: Sun Jul 18 17:52:28 2021 -0700

    [SPARK-36198][TESTS] Skip UNIDOC generation in PySpark GHA job

    ### What changes were proposed in this pull request?

    This PR aims to skip UNIDOC generation in the PySpark GHA job.

    ### Why are the changes needed?

    PySpark GHA jobs do not need to generate Java/Scala docs. This will save about 13 minutes in total.
    - https://github.com/apache/spark/runs/3098268973?check_suite_focus=true

    ```
    ...
    Building Unidoc API Documentation
    [info] Building Spark unidoc using SBT with these arguments:
      -Phadoop-3.2 -Phive-2.3 -Pscala-2.12 -Phive-thriftserver -Pmesos -Pdocker-integration-tests
      -Phive -Pkinesis-asl -Pspark-ganglia-lgpl -Pkubernetes -Phadoop-cloud -Pyarn unidoc
    ...
    [info] Main Java API documentation successful.
    [success] Total time: 192 s (03:12), completed Jul 18, 2021 6:08:40 PM
    ```

    ### Does this PR introduce _any_ user-facing change?

    No.

    ### How was this patch tested?

    Pass the GHA.

    Closes #33407 from williamhyun/SKIP_UNIDOC.

    Authored-by: William Hyun
    Signed-off-by: Dongjoon Hyun
    (cherry picked from commit c336f73ccddc1d163caa0a619919f3bbc9bf34ab)
    Signed-off-by: Dongjoon Hyun
---
 .github/workflows/build_and_test.yml | 1 +
 dev/run-tests.py                     | 2 +-
 2 files changed, 2 insertions(+), 1 deletion(-)

diff --git a/.github/workflows/build_and_test.yml b/.github/workflows/build_and_test.yml
index 62f37d3..66a0eda 100644
--- a/.github/workflows/build_and_test.yml
+++ b/.github/workflows/build_and_test.yml
@@ -170,6 +170,7 @@ jobs:
       HIVE_PROFILE: hive2.3
       GITHUB_PREV_SHA: ${{ github.event.before }}
       SPARK_LOCAL_IP: localhost
+      SKIP_UNIDOC: true
     steps:
     - name: Checkout Spark repository
       uses: actions/checkout@v2
diff --git a/dev/run-tests.py b/dev/run-tests.py
index 3055dcc..97523e7 100755
--- a/dev/run-tests.py
+++ b/dev/run-tests.py
@@ -397,7 +397,7 @@ def build_spark_assembly_sbt(extra_profiles, checkstyle=False):
     if checkstyle:
         run_java_style_checks(build_profiles)

-    if not os.environ.get("AMPLAB_JENKINS"):
+    if not os.environ.get("AMPLAB_JENKINS") and not os.environ.get("SKIP_UNIDOC"):
         build_spark_unidoc_sbt(extra_profiles)
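The `run-tests.py` change is an ordinary environment-variable gate. A small sketch of the same idiom; the predicate function is illustrative (only the two environment variable names come from the diff):

```python
def should_build_unidoc(env):
    # Unidoc is skipped either on Jenkins (AMPLAB_JENKINS, which handles
    # doc builds separately) or when a GHA job opts out via SKIP_UNIDOC.
    # Any non-empty value counts as "set", matching os.environ.get semantics.
    return not env.get("AMPLAB_JENKINS") and not env.get("SKIP_UNIDOC")

print(should_build_unidoc({}))                       # True
print(should_build_unidoc({"SKIP_UNIDOC": "true"}))  # False: the new PySpark GHA path
print(should_build_unidoc({"AMPLAB_JENKINS": "1"}))  # False
```

Passing the environment as a dict rather than reading `os.environ` directly keeps the gate trivially testable, which is why the sketch differs slightly from the in-place check in `run-tests.py`.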
[spark] branch master updated (f85855c -> c336f73)
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

  from f85855c  [SPARK-36075][K8S] Support for specifying executor/driver node selector
   add c336f73  [SPARK-36198][TESTS] Skip UNIDOC generation in PySpark GHA job

No new revisions were added by this update.

Summary of changes:
 .github/workflows/build_and_test.yml | 1 +
 dev/run-tests.py                     | 2 +-
 2 files changed, 2 insertions(+), 1 deletion(-)
[spark] branch master updated (a9e2156 -> f85855c)
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

  from a9e2156  [SPARK-35460][K8S] Verify the content of `spark.kubernetes.executor.podNamePrefix` before posting it to the k8s api-server
   add f85855c  [SPARK-36075][K8S] Support for specifying executor/driver node selector

No new revisions were added by this update.

Summary of changes:
 docs/running-on-kubernetes.md                       | 22 +
 .../scala/org/apache/spark/deploy/k8s/Config.scala  |  4
 .../apache/spark/deploy/k8s/KubernetesConf.scala    |  6 +
 .../k8s/features/BasicDriverFeatureStep.scala       |  1 +
 .../k8s/features/BasicExecutorFeatureStep.scala     |  1 +
 .../spark/deploy/k8s/KubernetesConfSuite.scala      | 28 ++
 .../k8s/features/BasicDriverFeatureStepSuite.scala  | 15
 .../features/BasicExecutorFeatureStepSuite.scala    | 17 +
 8 files changed, 94 insertions(+)
[spark] branch master updated (fe94bf0 -> a9e2156)
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

  from fe94bf0  [SPARK-36014][K8S] Use uuid as app id in kubernetes client mode
   add a9e2156  [SPARK-35460][K8S] Verify the content of `spark.kubernetes.executor.podNamePrefix` before posting it to the k8s api-server

No new revisions were added by this update.

Summary of changes:
 docs/running-on-kubernetes.md                       |  5 +-
 .../scala/org/apache/spark/deploy/k8s/Config.scala  | 25 +-
 .../k8s/features/BasicExecutorFeatureStep.scala     | 10 ++--
 .../k8s/features/DriverServiceFeatureStep.scala     |  3 +-
 .../deploy/k8s/submit/KubernetesClientUtils.scala   |  6 ++-
 .../features/BasicExecutorFeatureStepSuite.scala    | 56 --
 6 files changed, 82 insertions(+), 23 deletions(-)
[spark] branch master updated: [SPARK-36014][K8S] Use uuid as app id in kubernetes client mode
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new fe94bf0  [SPARK-36014][K8S] Use uuid as app id in kubernetes client mode

fe94bf0 is described below

commit fe94bf07f9acec302e7d8becd7e576c777337331
Author: ulysses-you
AuthorDate: Sun Jul 18 15:41:47 2021 -0700

    [SPARK-36014][K8S] Use uuid as app id in kubernetes client mode

    ### What changes were proposed in this pull request?

    Use a uuid instead of `System.currentTimeMillis` as the app id in Kubernetes client mode.

    ### Why are the changes needed?

    Currently, Spark on Kubernetes in client mode uses `"spark-application-" + System.currentTimeMillis` as the app id by default. This can cause app id conflicts if several Spark applications are submitted to a Kubernetes cluster in a short time.

    Unfortunately, the event log uses the app id as its file name. With conflicting event log files, the following exception is thrown:

    ```
    Caused by: java.io.FileNotFoundException: File does not exist: xxx/spark-application-1624766876324.lz4.inprogress (inode 5984170846) Holder does not have any open files.
            at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:2697)
            at org.apache.hadoop.hdfs.server.namenode.FSDirWriteFileOp.analyzeFileState(FSDirWriteFileOp.java:521)
            at org.apache.hadoop.hdfs.server.namenode.FSDirWriteFileOp.validateAddBlock(FSDirWriteFileOp.java:161)
            at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2579)
            at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:846)
            at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:510)
            at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
            at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:503)
            at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:989)
            at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:871)
            at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:817)
    ```

    ### Does this PR introduce _any_ user-facing change?

    Yes.

    ### How was this patch tested?

    Manual test.
    ![image](https://user-images.githubusercontent.com/12025282/124435341-7a88e180-dda7-11eb-8e62-bdfec6a0ee3b.png)

    Closes #33211 from ulysses-you/k8s-appid.

    Authored-by: ulysses-you
    Signed-off-by: Dongjoon Hyun
---
 .../main/scala/org/apache/spark/deploy/k8s/KubernetesConf.scala   | 5 +-
 .../spark/deploy/k8s/submit/KubernetesClientApplication.scala     | 4 +---
 .../scheduler/cluster/k8s/KubernetesClusterSchedulerBackend.scala | 7 ---
 3 files changed, 9 insertions(+), 7 deletions(-)

diff --git a/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/KubernetesConf.scala b/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/KubernetesConf.scala
index 937c5f5..de084da 100644
--- a/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/KubernetesConf.scala
+++ b/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/KubernetesConf.scala
@@ -16,7 +16,7 @@
  */
 package org.apache.spark.deploy.k8s

-import java.util.Locale
+import java.util.{Locale, UUID}

 import io.fabric8.kubernetes.api.model.{LocalObjectReference, LocalObjectReferenceBuilder, Pod}

@@ -225,6 +225,9 @@ private[spark] object KubernetesConf {
     new KubernetesExecutorConf(sparkConf.clone(), appId, executorId, driverPod, resourceProfileId)
   }

+  def getKubernetesAppId(): String =
+    s"spark-${UUID.randomUUID().toString.replaceAll("-", "")}"
+
   def getResourceNamePrefix(appName: String): String = {
     val id = KubernetesUtils.uniqueID()
     s"$appName-$id"
diff --git a/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/submit/KubernetesClientApplication.scala b/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/submit/KubernetesClientApplication.scala
index 3140502..e3b80b1 100644
--- a/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/submit/KubernetesClientApplication.scala
+++ b/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/submit/KubernetesClientApplication.scala
@@ -16,8 +16,6 @@
  */
 package org.apache.spark.deploy.k8s.submit

-import java.util.UUID
-
 import scala.collection.JavaConverters._
 import scala.collection.mutable
 import scala.util.control.Breaks._
@@ -191,7
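The new `getKubernetesAppId` helper simply strips the dashes from a random UUID. The following is a hypothetical Python rendering of that one-line Scala method, shown only to illustrate the format of the generated app id:

```python
import re
import uuid

def get_kubernetes_app_id():
    # Mirrors the Scala s"spark-${UUID.randomUUID().toString.replaceAll("-", "")}":
    # a collision-resistant id, unlike the old millisecond timestamp which could
    # repeat across near-simultaneous submissions. uuid4().hex is the UUID's
    # 32 hex digits with the dashes already removed.
    return "spark-" + uuid.uuid4().hex

app_id = get_kubernetes_app_id()
assert re.fullmatch(r"spark-[0-9a-f]{32}", app_id)
print(app_id)
```

Because the event log file is named after the app id, making the id random rather than time-based is what removes the `FileNotFoundException` failure mode described above.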
[spark] branch branch-3.1 updated: [SPARK-36040][DOCS][K8S] Add reference to kubernetes-client's version
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch branch-3.1
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/branch-3.1 by this push:
     new 62f6761  [SPARK-36040][DOCS][K8S] Add reference to kubernetes-client's version

62f6761 is described below

commit 62f6761883f855ec97fbc0c69a7da3b0db7f4170
Author: yoda-mon
AuthorDate: Sun Jul 18 14:26:15 2021 -0700

    [SPARK-36040][DOCS][K8S] Add reference to kubernetes-client's version

    ### What changes were proposed in this pull request?

    Add a reference to kubernetes-client's version.

    ### Why are the changes needed?

    Running Spark on Kubernetes potentially has an upper limit on the supported Kubernetes version. I think it is better for users to be aware of this, because Kubernetes updates so quickly that users tend to run Spark jobs on unsupported versions.

    ### Does this PR introduce _any_ user-facing change?

    No.

    ### How was this patch tested?

    SKIP_API=1 bundle exec jekyll build

    Closes #33255 from yoda-mon/add-reference-kubernetes-client.

    Authored-by: yoda-mon
    Signed-off-by: Dongjoon Hyun
    (cherry picked from commit eea69c122f20577956c4a87a6d8eb59943c1c6f0)
    Signed-off-by: Dongjoon Hyun
---
 docs/running-on-kubernetes.md | 1 +
 1 file changed, 1 insertion(+)

diff --git a/docs/running-on-kubernetes.md b/docs/running-on-kubernetes.md
index b9a018a..125c952 100644
--- a/docs/running-on-kubernetes.md
+++ b/docs/running-on-kubernetes.md
@@ -51,6 +51,7 @@ you may set up a test cluster on your local machine using
 
 * Be aware that the default minikube configuration is not enough for running Spark applications.
   We recommend 3 CPUs and 4g of memory to be able to start a simple Spark application with a single executor.
+* Check [kubernetes-client library](https://github.com/fabric8io/kubernetes-client)'s version of your Spark environment, and its compatibility with your Kubernetes cluster's version.
 * You must have appropriate permissions to list, create, edit and delete
   [pods](https://kubernetes.io/docs/user-guide/pods/) in your cluster. You can verify that you can list these resources
   by running `kubectl auth can-i pods`.
[spark] branch master updated (92d4563 -> eea69c1)
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

  from 92d4563  [MINOR][SQL] Fix typo for config hint in SQLConf.scala
   add eea69c1  [SPARK-36040][DOCS][K8S] Add reference to kubernetes-client's version

No new revisions were added by this update.

Summary of changes:
 docs/running-on-kubernetes.md | 1 +
 1 file changed, 1 insertion(+)
[spark] branch branch-3.2 updated: [SPARK-36040][DOCS][K8S] Add reference to kubernetes-client's version
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch branch-3.2 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.2 by this push: new 46ddb17 [SPARK-36040][DOCS][K8S] Add reference to kubernetes-client's version 46ddb17 is described below commit 46ddb17da4673beb9edeef1886868eadd78cd883 Author: yoda-mon AuthorDate: Sun Jul 18 14:26:15 2021 -0700 [SPARK-36040][DOCS][K8S] Add reference to kubernetes-client's version ### What changes were proposed in this pull request? Add reference to kubernetes-client's version ### Why are the changes needed? Running Spark on Kubernetes potentially has upper limitation of Kubernetes version. I think it is better for users to notice it because Kubernetes update speed is so fast that users tends to run Spark Jobs on unsupported version. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? SKIP_API=1 bundle exec jekyll build Closes #33255 from yoda-mon/add-reference-kubernetes-client. Authored-by: yoda-mon Signed-off-by: Dongjoon Hyun (cherry picked from commit eea69c122f20577956c4a87a6d8eb59943c1c6f0) Signed-off-by: Dongjoon Hyun --- docs/running-on-kubernetes.md | 1 + 1 file changed, 1 insertion(+) diff --git a/docs/running-on-kubernetes.md b/docs/running-on-kubernetes.md index 530951e..6ca3375 100644 --- a/docs/running-on-kubernetes.md +++ b/docs/running-on-kubernetes.md @@ -53,6 +53,7 @@ you may set up a test cluster on your local machine using * Be aware that the default minikube configuration is not enough for running Spark applications. We recommend 3 CPUs and 4g of memory to be able to start a simple Spark application with a single executor. + * Check [kubernetes-client library](https://github.com/fabric8io/kubernetes-client)'s version of your Spark environment, and its compatibility with your Kubernetes cluster's version. 
* You must have appropriate permissions to list, create, edit and delete [pods](https://kubernetes.io/docs/user-guide/pods/) in your cluster. You can verify that you can list these resources by running `kubectl auth can-i <list|create|edit|delete> pods`.
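The compatibility check the added documentation line recommends can be sketched as a small version-range lookup. This is an illustrative model only: the client versions and ranges below are placeholders, not the real fabric8 kubernetes-client compatibility matrix, which should be consulted directly.

```python
# Illustrative sketch of a kubernetes-client / Kubernetes compatibility check.
# The matrix values below are PLACEHOLDERS -- the real tested ranges are
# published in the fabric8io/kubernetes-client compatibility table.

ILLUSTRATIVE_COMPAT = {
    # client version -> (oldest tested K8s, newest tested K8s), inclusive
    "5.4.1": ((1, 19), (1, 21)),
    "5.12.2": ((1, 21), (1, 23)),
}

def is_supported(client_version: str, k8s_version: tuple) -> bool:
    """Return True if k8s_version falls inside the client's tested range."""
    try:
        lo, hi = ILLUSTRATIVE_COMPAT[client_version]
    except KeyError:
        return False  # unknown client version: assume untested
    return lo <= k8s_version <= hi

print(is_supported("5.4.1", (1, 20)))  # inside the tested range -> True
print(is_supported("5.4.1", (1, 24)))  # newer than the tested range -> False
```

The point of the doc change is exactly this gap: a cluster upgraded past the bundled client's tested range may fail in non-obvious ways, so the version pairing is worth checking before submitting jobs.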
[spark] branch master updated: [MINOR][SQL] Fix typo for config hint in SQLConf.scala
This is an automated email from the ASF dual-hosted git repository. srowen pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 92d4563 [MINOR][SQL] Fix typo for config hint in SQLConf.scala 92d4563 is described below commit 92d45631246e206bdc11f702972306b59f5beb15 Author: Bessenyei Balázs Donát <9086834+bes...@users.noreply.github.com> AuthorDate: Sun Jul 18 15:33:26 2021 -0500 [MINOR][SQL] Fix typo for config hint in SQLConf.scala ### What changes were proposed in this pull request? This PR fixes a typo for `spark.sql.legacy.allowCreatingManagedTableUsingNonemptyLocation` in `SQLConf.scala`. ### Why are the changes needed? This is a [Broken windows theory](https://en.wikipedia.org/wiki/Broken_windows_theory) change. ### Does this PR introduce _any_ user-facing change? Yes. After merging this PR, users running commands such as ```python spark.conf.set("spark.sql.legacy.allowCreatingManagedTableUsingNonemptyLocation", "true") ``` will get a typo-free exception. ### How was this patch tested? This is a trivial change. Closes #33389 from bessbd/patch-1. 
Authored-by: Bessenyei Balázs Donát <9086834+bes...@users.noreply.github.com> Signed-off-by: Sean Owen --- sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala index b9663bb..0add7f5 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala @@ -3437,7 +3437,7 @@ object SQLConf { "It was removed to prevent errors like SPARK-23173 for non-default value."), RemovedConfig( "spark.sql.legacy.allowCreatingManagedTableUsingNonemptyLocation", "3.0.0", "false", -"It was removed to prevent loosing of users data for non-default value."), +"It was removed to prevent loss of user data for non-default value."), RemovedConfig("spark.sql.legacy.compareDateTimestampInTimestamp", "3.0.0", "true", "It was removed to prevent errors like SPARK-23549 for non-default value."), RemovedConfig("spark.sql.parquet.int64AsTimestampMillis", "3.0.0", "false",
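The `RemovedConfig` mechanism touched by this diff can be modeled in a few lines: Spark keeps a table of removed config keys, and setting one to a non-default value raises with the recorded reason. The sketch below is in Python for illustration (the real implementation is in `SQLConf.scala`), and the exact exception wording here is an approximation, not Spark's literal error format.

```python
# Minimal model of SQLConf's RemovedConfig check, with the typo-fixed reason.
REMOVED_CONFIGS = {
    # key: (removed-in version, tolerated default value, reason)
    "spark.sql.legacy.allowCreatingManagedTableUsingNonemptyLocation": (
        "3.0.0", "false",
        "It was removed to prevent loss of user data for non-default value."),
}

def set_conf(key: str, value: str) -> None:
    if key in REMOVED_CONFIGS:
        version, default, reason = REMOVED_CONFIGS[key]
        if value != default:  # setting the old default back is a no-op
            raise ValueError(
                f"The SQL config '{key}' was removed in version {version}. {reason}")

# Setting the default is tolerated; a non-default value raises.
set_conf("spark.sql.legacy.allowCreatingManagedTableUsingNonemptyLocation", "false")
try:
    set_conf("spark.sql.legacy.allowCreatingManagedTableUsingNonemptyLocation", "true")
except ValueError as e:
    print(e)
```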
[spark] branch branch-3.2 updated: [SPARK-36090][SQL] Support TimestampNTZType in expression Sequence
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a commit to branch branch-3.2 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.2 by this push: new 85f70a1 [SPARK-36090][SQL] Support TimestampNTZType in expression Sequence 85f70a1 is described below commit 85f70a1181b1b11417c197cee411e0ec9ced2373 Author: gengjiaan AuthorDate: Sun Jul 18 20:46:23 2021 +0300 [SPARK-36090][SQL] Support TimestampNTZType in expression Sequence ### What changes were proposed in this pull request? The current implementation of `Sequence` accepts `TimestampType`, `DateType` and `IntegralType`. This PR lets `Sequence` also accept `TimestampNTZType`. ### Why are the changes needed? We can generate sequences for timestamps without time zone. ### Does this PR introduce _any_ user-facing change? Yes. This PR lets `Sequence` accept `TimestampNTZType`. ### How was this patch tested? New tests. Closes #33360 from beliefer/SPARK-36090. 
Lead-authored-by: gengjiaan Co-authored-by: Jiaan Geng Signed-off-by: Max Gekk (cherry picked from commit 42275bb20d6849ee9df488d9ec1fa402f394ac89) Signed-off-by: Max Gekk --- .../expressions/collectionOperations.scala | 48 +--- .../spark/sql/catalyst/util/DateTimeUtils.scala| 21 +++- .../expressions/CollectionExpressionsSuite.scala | 122 - 3 files changed, 172 insertions(+), 19 deletions(-) diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala index 2883d8d..730b8d0 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala @@ -2568,7 +2568,7 @@ case class Sequence( val typesCorrect = startType.sameType(stop.dataType) && (startType match { - case TimestampType => + case TimestampType | TimestampNTZType => stepOpt.isEmpty || CalendarIntervalType.acceptsType(stepType) || YearMonthIntervalType.acceptsType(stepType) || DayTimeIntervalType.acceptsType(stepType) @@ -2614,20 +2614,20 @@ case class Sequence( val ct = ClassTag[T](iType.tag.mirror.runtimeClass(iType.tag.tpe)) new IntegralSequenceImpl(iType)(ct, iType.integral) -case TimestampType => +case TimestampType | TimestampNTZType => if (stepOpt.isEmpty || CalendarIntervalType.acceptsType(stepOpt.get.dataType)) { -new TemporalSequenceImpl[Long](LongType, 1, identity, zoneId) +new TemporalSequenceImpl[Long](LongType, start.dataType, 1, identity, zoneId) } else if (YearMonthIntervalType.acceptsType(stepOpt.get.dataType)) { -new PeriodSequenceImpl[Long](LongType, 1, identity, zoneId) +new PeriodSequenceImpl[Long](LongType, start.dataType, 1, identity, zoneId) } else { -new DurationSequenceImpl[Long](LongType, 1, identity, zoneId) +new DurationSequenceImpl[Long](LongType, start.dataType, 1, identity, zoneId) } case 
DateType => if (stepOpt.isEmpty || CalendarIntervalType.acceptsType(stepOpt.get.dataType)) { -new TemporalSequenceImpl[Int](IntegerType, MICROS_PER_DAY, _.toInt, zoneId) +new TemporalSequenceImpl[Int](IntegerType, start.dataType, MICROS_PER_DAY, _.toInt, zoneId) } else { -new PeriodSequenceImpl[Int](IntegerType, MICROS_PER_DAY, _.toInt, zoneId) +new PeriodSequenceImpl[Int](IntegerType, start.dataType, MICROS_PER_DAY, _.toInt, zoneId) } } @@ -2769,8 +2769,9 @@ object Sequence { } private class PeriodSequenceImpl[T: ClassTag] - (dt: IntegralType, scale: Long, fromLong: Long => T, zoneId: ZoneId) - (implicit num: Integral[T]) extends InternalSequenceBase(dt, scale, fromLong, zoneId) { + (dt: IntegralType, outerDataType: DataType, scale: Long, fromLong: Long => T, zoneId: ZoneId) + (implicit num: Integral[T]) +extends InternalSequenceBase(dt, outerDataType, scale, fromLong, zoneId) { override val defaultStep: DefaultStep = new DefaultStep( (dt.ordering.lteq _).asInstanceOf[LessThanOrEqualFn], @@ -2794,8 +2795,9 @@ object Sequence { } private class DurationSequenceImpl[T: ClassTag] - (dt: IntegralType, scale: Long, fromLong: Long => T, zoneId: ZoneId) - (implicit num: Integral[T]) extends InternalSequenceBase(dt, scale, fromLong, zoneId) { + (dt: IntegralType, outerDataType: DataType, scale: Long, fromLong: Long => T, zoneId: ZoneId) + (implicit num: Integral[T]) +extends InternalSequenceBase(dt, outerDataType, scale, fromLong, zoneId) { override val
[spark] branch master updated (d7df7a8 -> 42275bb)
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from d7df7a8 [SPARK-36195][BUILD] Set MaxMetaspaceSize JVM option to 2g add 42275bb [SPARK-36090][SQL] Support TimestampNTZType in expression Sequence No new revisions were added by this update. Summary of changes: .../expressions/collectionOperations.scala | 48 +--- .../spark/sql/catalyst/util/DateTimeUtils.scala| 21 +++- .../expressions/CollectionExpressionsSuite.scala | 122 - 3 files changed, 172 insertions(+), 19 deletions(-)
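What SPARK-36090 enables can be illustrated by modeling SQL's `sequence(start, stop, step)` over timestamps without time zone as naive datetime values. This is a behavioral sketch only: it covers fixed-duration steps, while Spark's implementation also handles calendar-interval steps (e.g. `INTERVAL '1' MONTH`), which need calendar arithmetic rather than a constant delta.

```python
# Sketch of sequence(start, stop, step) semantics for TIMESTAMP_NTZ values,
# modeled with naive (zone-less) datetimes and a fixed-duration step.
from datetime import datetime, timedelta

def ntz_sequence(start: datetime, stop: datetime, step: timedelta):
    if step == timedelta(0):
        raise ValueError("sequence step must be non-zero")
    out, cur = [], start
    if step > timedelta(0):
        while cur <= stop:       # ascending sequence, stop is inclusive
            out.append(cur)
            cur += step
    else:
        while cur >= stop:       # descending sequence for negative steps
            out.append(cur)
            cur += step
    return out

seq = ntz_sequence(datetime(2021, 7, 1, 0, 0),
                   datetime(2021, 7, 1, 3, 0),
                   timedelta(hours=1))
print(len(seq))  # 4 values: 00:00, 01:00, 02:00, 03:00
```

Because the values carry no time zone, no `zoneId` conversion is involved in the arithmetic itself; in the diff above this shows up as the `TimestampNTZType` case reusing the same sequence implementations as `TimestampType`, parameterized by the outer data type.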
[spark] branch branch-3.1 updated: [SPARK-36195][BUILD] Set MaxMetaspaceSize JVM option to 2g
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch branch-3.1 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.1 by this push: new 8a7fa43 [SPARK-36195][BUILD] Set MaxMetaspaceSize JVM option to 2g 8a7fa43 is described below commit 8a7fa439fad5fd13b29fe919ce178908cbbe816c Author: Dongjoon Hyun AuthorDate: Sun Jul 18 10:15:15 2021 -0700 [SPARK-36195][BUILD] Set MaxMetaspaceSize JVM option to 2g This PR aims to set `MaxMetaspaceSize` to `2g` because, by default, metaspace grows native memory consumption without bound. This unbounded memory growth causes GitHub Actions flakiness. The value I observed during the `hive` module test was over 1.8G and growing. - https://docs.oracle.com/javase/10/gctuning/other-considerations.htm#JSGCT-GUID-BFB89453-60C0-42AC-81CA-87D59B0ACE2E > Starting with JDK 8, the permanent generation was removed and the class metadata is allocated in native memory. The amount of native memory that can be used for class metadata is by default unlimited. Use the option -XX:MaxMetaspaceSize to put an upper limit on the amount of native memory used for class metadata. In addition, I increased the following memory limit to 4g consistently in two places. ```xml - -Xms2048m - -Xmx2048m + -Xms4g + -Xmx4g ``` ```scala - javaOptions += "-Xmx3g", + javaOptions ++= "-Xmx4g -XX:MaxMetaspaceSize=2g".split(" ").toSeq, ``` This will reduce the flakiness in the CI environment by limiting the memory usage explicitly. When we limit it to `1g`, the Hive module fails with an `OOM` like the following. ``` java.lang.OutOfMemoryError: Metaspace Error: Exception in thread "dispatcher-event-loop-110" java.lang.OutOfMemoryError: Metaspace ``` No. Pass the CIs. Closes #33405 from dongjoon-hyun/SPARK-36195. 
Lead-authored-by: Dongjoon Hyun Co-authored-by: Kyle Bendickson Signed-off-by: Dongjoon Hyun (cherry picked from commit d7df7a805fcbdf2435df1e78abd9899d3ca10dd2) Signed-off-by: Dongjoon Hyun --- pom.xml | 5 +++-- project/SparkBuild.scala | 4 ++-- 2 files changed, 5 insertions(+), 4 deletions(-) diff --git a/pom.xml b/pom.xml index a7e3a73..1fb7c5a 100644 --- a/pom.xml +++ b/pom.xml @@ -2517,6 +2517,7 @@ -Xms1024m -Xmx1024m + -XX:MaxMetaspaceSize=2g -XX:ReservedCodeCacheSize=${CodeCacheSize} @@ -2565,7 +2566,7 @@ **/*Suite.java ${project.build.directory}/surefire-reports --ea -Xmx4g -Xss4m -XX:ReservedCodeCacheSize=${CodeCacheSize} -Dio.netty.tryReflectionSetAccessible=true +-ea -Xmx4g -Xss4m -XX:MaxMetaspaceSize=2g -XX:ReservedCodeCacheSize=${CodeCacheSize} -Dio.netty.tryReflectionSetAccessible=true
[spark] branch branch-3.2 updated: [SPARK-36195][BUILD] Set MaxMetaspaceSize JVM option to 2g
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch branch-3.2 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.2 by this push: new 8059a7e [SPARK-36195][BUILD] Set MaxMetaspaceSize JVM option to 2g 8059a7e is described below commit 8059a7e5e6726e7ca1401416be90b92c305c5060 Author: Dongjoon Hyun AuthorDate: Sun Jul 18 10:15:15 2021 -0700 [SPARK-36195][BUILD] Set MaxMetaspaceSize JVM option to 2g ### What changes were proposed in this pull request? This PR aims to set `MaxMetaspaceSize` to `2g` because, by default, metaspace grows native memory consumption without bound. This unbounded memory growth causes GitHub Actions flakiness. The value I observed during the `hive` module test was over 1.8G and growing. - https://docs.oracle.com/javase/10/gctuning/other-considerations.htm#JSGCT-GUID-BFB89453-60C0-42AC-81CA-87D59B0ACE2E > Starting with JDK 8, the permanent generation was removed and the class metadata is allocated in native memory. The amount of native memory that can be used for class metadata is by default unlimited. Use the option -XX:MaxMetaspaceSize to put an upper limit on the amount of native memory used for class metadata. In addition, I increased the following memory limit to 4g consistently in two places. ```xml - -Xms2048m - -Xmx2048m + -Xms4g + -Xmx4g ``` ```scala - javaOptions += "-Xmx3g", + javaOptions ++= "-Xmx4g -XX:MaxMetaspaceSize=2g".split(" ").toSeq, ``` ### Why are the changes needed? This will reduce the flakiness in the CI environment by limiting the memory usage explicitly. When we limit it to `1g`, the Hive module fails with an `OOM` like the following. ``` java.lang.OutOfMemoryError: Metaspace Error: Exception in thread "dispatcher-event-loop-110" java.lang.OutOfMemoryError: Metaspace ``` ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Pass the CIs. 
Closes #33405 from dongjoon-hyun/SPARK-36195. Lead-authored-by: Dongjoon Hyun Co-authored-by: Kyle Bendickson Signed-off-by: Dongjoon Hyun (cherry picked from commit d7df7a805fcbdf2435df1e78abd9899d3ca10dd2) Signed-off-by: Dongjoon Hyun --- pom.xml | 9 + project/SparkBuild.scala | 4 ++-- 2 files changed, 7 insertions(+), 6 deletions(-) diff --git a/pom.xml b/pom.xml index a49894e..3b32207 100644 --- a/pom.xml +++ b/pom.xml @@ -2611,8 +2611,9 @@ -Xss128m - -Xms2048m - -Xmx2048m + -Xms4g + -Xmx4g + -XX:MaxMetaspaceSize=2g -XX:ReservedCodeCacheSize=${CodeCacheSize} @@ -2661,7 +2662,7 @@ **/*Suite.java ${project.build.directory}/surefire-reports --ea -Xmx4g -Xss4m -XX:ReservedCodeCacheSize=${CodeCacheSize} -Dio.netty.tryReflectionSetAccessible=true +-ea -Xmx4g -Xss4m -XX:MaxMetaspaceSize=2g -XX:ReservedCodeCacheSize=${CodeCacheSize} -Dio.netty.tryReflectionSetAccessible=true
[spark] branch master updated: [SPARK-36195][BUILD] Set MaxMetaspaceSize JVM option to 2g
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new d7df7a8 [SPARK-36195][BUILD] Set MaxMetaspaceSize JVM option to 2g d7df7a8 is described below commit d7df7a805fcbdf2435df1e78abd9899d3ca10dd2 Author: Dongjoon Hyun AuthorDate: Sun Jul 18 10:15:15 2021 -0700 [SPARK-36195][BUILD] Set MaxMetaspaceSize JVM option to 2g ### What changes were proposed in this pull request? This PR aims to set `MaxMetaspaceSize` to `2g` because, by default, metaspace grows native memory consumption without bound. This unbounded memory growth causes GitHub Actions flakiness. The value I observed during the `hive` module test was over 1.8G and growing. - https://docs.oracle.com/javase/10/gctuning/other-considerations.htm#JSGCT-GUID-BFB89453-60C0-42AC-81CA-87D59B0ACE2E > Starting with JDK 8, the permanent generation was removed and the class metadata is allocated in native memory. The amount of native memory that can be used for class metadata is by default unlimited. Use the option -XX:MaxMetaspaceSize to put an upper limit on the amount of native memory used for class metadata. In addition, I increased the following memory limit to 4g consistently in two places. ```xml - -Xms2048m - -Xmx2048m + -Xms4g + -Xmx4g ``` ```scala - javaOptions += "-Xmx3g", + javaOptions ++= "-Xmx4g -XX:MaxMetaspaceSize=2g".split(" ").toSeq, ``` ### Why are the changes needed? This will reduce the flakiness in the CI environment by limiting the memory usage explicitly. When we limit it to `1g`, the Hive module fails with an `OOM` like the following. ``` java.lang.OutOfMemoryError: Metaspace Error: Exception in thread "dispatcher-event-loop-110" java.lang.OutOfMemoryError: Metaspace ``` ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Pass the CIs. Closes #33405 from dongjoon-hyun/SPARK-36195. 
Lead-authored-by: Dongjoon Hyun Co-authored-by: Kyle Bendickson Signed-off-by: Dongjoon Hyun --- pom.xml | 9 + project/SparkBuild.scala | 4 ++-- 2 files changed, 7 insertions(+), 6 deletions(-) diff --git a/pom.xml b/pom.xml index aa69c0e..59a03bc 100644 --- a/pom.xml +++ b/pom.xml @@ -2611,8 +2611,9 @@ -Xss128m - -Xms2048m - -Xmx2048m + -Xms4g + -Xmx4g + -XX:MaxMetaspaceSize=2g -XX:ReservedCodeCacheSize=${CodeCacheSize} @@ -2661,7 +2662,7 @@ **/*Suite.java ${project.build.directory}/surefire-reports --ea -Xmx4g -Xss4m -XX:ReservedCodeCacheSize=${CodeCacheSize} -Dio.netty.tryReflectionSetAccessible=true +-ea -Xmx4g -Xss4m -XX:MaxMetaspaceSize=2g -XX:ReservedCodeCacheSize=${CodeCacheSize} -Dio.netty.tryReflectionSetAccessible=true
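Taken together, the build change amounts to the following sbt fragment for forked test JVMs. This is a consolidated sketch of the options shown in the diffs above, not a drop-in copy of `project/SparkBuild.scala`:

```scala
// Consolidated test JVM options after SPARK-36195 (illustrative placement;
// the real line lives in the shared test settings of SparkBuild.scala).
javaOptions ++= "-Xmx4g -XX:MaxMetaspaceSize=2g".split(" ").toSeq
```

The corresponding Maven change adds `-XX:MaxMetaspaceSize=2g` to the surefire/scalatest argument lines in `pom.xml`, so both build systems cap class-metadata memory the same way.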