[spark] branch master updated (89d9b7c -> 81b0785)
This is an automated email from the ASF dual-hosted git repository.

huaxingao pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

    from 89d9b7c [SPARK-32010][PYTHON][CORE] Add InheritableThread for local properties and fixing a thread leak issue in pinned thread mode
     add 81b0785 [SPARK-32455][ML] LogisticRegressionModel prediction optimization

No new revisions were added by this update.

Summary of changes:
 .../ml/classification/LogisticRegression.scala | 89 --
 1 file changed, 49 insertions(+), 40 deletions(-)

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated (08a66f8 -> 89d9b7c)
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

    from 08a66f8 [SPARK-32248][BUILD] Recover Java 11 build in Github Actions
     add 89d9b7c [SPARK-32010][PYTHON][CORE] Add InheritableThread for local properties and fixing a thread leak issue in pinned thread mode

No new revisions were added by this update.

Summary of changes:
 docs/job-scheduling.md                  |  8 ++---
 python/pyspark/__init__.py              |  5 ++-
 python/pyspark/context.py               | 18 ++
 python/pyspark/rdd.py                   | 10 --
 python/pyspark/tests/test_pin_thread.py | 23 -
 python/pyspark/util.py                  | 61 +
 6 files changed, 110 insertions(+), 15 deletions(-)

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated (1638674 -> 08a66f8)
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

    from 1638674 [SPARK-32487][CORE] Remove j.w.r.NotFoundException from `import` in [Stages|OneApplication]Resource
     add 08a66f8 [SPARK-32248][BUILD] Recover Java 11 build in Github Actions

No new revisions were added by this update.

Summary of changes:
 .github/workflows/master.yml | 25 +
 1 file changed, 25 insertions(+)

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated (50911df -> 1638674)
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

    from 50911df [SPARK-32397][BUILD] Allow specifying of time for build to keep time consistent between modules
     add 1638674 [SPARK-32487][CORE] Remove j.w.r.NotFoundException from `import` in [Stages|OneApplication]Resource

No new revisions were added by this update.

Summary of changes:
 .../scala/org/apache/spark/status/api/v1/OneApplicationResource.scala   | 2 +-
 core/src/main/scala/org/apache/spark/status/api/v1/StagesResource.scala | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch branch-2.4 updated: [SPARK-32397][BUILD] Allow specifying of time for build to keep time consistent between modules
This is an automated email from the ASF dual-hosted git repository.

dbtsai pushed a commit to branch branch-2.4
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/branch-2.4 by this push:
     new 4a8f692 [SPARK-32397][BUILD] Allow specifying of time for build to keep time consistent between modules
4a8f692 is described below

commit 4a8f692f40a53ebf34292dfc28d6a0f95515166c
Author: Holden Karau
AuthorDate: Wed Jul 29 21:39:14 2020 +

    [SPARK-32397][BUILD] Allow specifying of time for build to keep time consistent between modules

    ### What changes were proposed in this pull request?

    Upgrade codehaus maven build helper to allow people to specify a time during the build to avoid snapshot artifacts with different version strings.

    ### Why are the changes needed?

    During builds of snapshots the maven may assign different versions to different artifacts based on the time each individual sub-module starts building.

    The timestamp is used as part of the version string when run `maven deploy` on a snapshot build. This results in different sub-modules having different version strings.

    ### Does this PR introduce _any_ user-facing change?

    No

    ### How was this patch tested?

    Manual build while specifying the current time, ensured the time is consistent in the sub components.

    Open question: Ideally I'd like to backport this as well since it's sort of a bug fix and while it does change a dependency version it's not one that is propagated. I'd like to hear folks thoughts about this.

    Closes #29274 from holdenk/SPARK-32397-snapshot-artifact-timestamp-differences.

    Authored-by: Holden Karau
    Signed-off-by: DB Tsai
    (cherry picked from commit 50911df08eb7a27494dc83bcec3d09701c2babfe)
    Signed-off-by: DB Tsai
---
 pom.xml | 40 +++-
 1 file changed, 35 insertions(+), 5 deletions(-)

diff --git a/pom.xml b/pom.xml
index 03a08a1..eccd45a 100644
--- a/pom.xml
+++ b/pom.xml
@@ -228,6 +228,8 @@
 ${session.executionRootDirectory}
 1g
+
+-MM-dd HH:mm:ss z
@@ -2107,11 +2109,39 @@
-
-org.codehaus.mojo
-build-helper-maven-plugin
-3.0.0
-
+
+org.codehaus.mojo
+build-helper-maven-plugin
+3.2.0
+
+
+module-timestamp-property
+validate
+
+timestamp-property
+
+
+module.build.timestamp
+${maven.build.timestamp.format}
+current
+America/Los_Angeles
+
+
+
+local-timestamp-property
+validate
+
+timestamp-property
+
+
+local.build.timestamp
+${maven.build.timestamp.format}
+build
+America/Los_Angeles
+
+
+
+
 net.alchim31.maven
 scala-maven-plugin

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org
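Editor's note: the archive stripped the XML element names from the pom.xml diff above, leaving only the text values. The fragment below is a hedged sketch of what the added plugin block plausibly looks like. Every element name (`groupId`, `artifactId`, `version`, `executions`, `id`, `phase`, `goals`, `configuration`, `name`, `pattern`, `timeSource`, `timeZone`) is an assumption inferred from standard Maven POM structure and the documented parameters of the `timestamp-property` goal. Only the values come from the diff. It is not a verified copy of the commit.

```xml
<!-- Sketch only: element names are reconstructed assumptions; values are from the diff. -->
<plugin>
  <groupId>org.codehaus.mojo</groupId>
  <artifactId>build-helper-maven-plugin</artifactId>
  <version>3.2.0</version>
  <executions>
    <execution>
      <!-- Timestamp taken when this module's execution runs -->
      <id>module-timestamp-property</id>
      <phase>validate</phase>
      <goals>
        <goal>timestamp-property</goal>
      </goals>
      <configuration>
        <name>module.build.timestamp</name>
        <pattern>${maven.build.timestamp.format}</pattern>
        <timeSource>current</timeSource>
        <timeZone>America/Los_Angeles</timeZone>
      </configuration>
    </execution>
    <execution>
      <!-- Timestamp taken from the start of the overall build -->
      <id>local-timestamp-property</id>
      <phase>validate</phase>
      <goals>
        <goal>timestamp-property</goal>
      </goals>
      <configuration>
        <name>local.build.timestamp</name>
        <pattern>${maven.build.timestamp.format}</pattern>
        <timeSource>build</timeSource>
        <timeZone>America/Los_Angeles</timeZone>
      </configuration>
    </execution>
  </executions>
</plugin>
```

Under this reading, the two executions differ only in the time source: `current` would give each module its own timestamp, while `build` would give every module the same build-start timestamp. That matches the commit's stated goal of consistent version strings across sub-modules.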
[spark] branch branch-3.0 updated: [SPARK-32397][BUILD] Allow specifying of time for build to keep time consistent between modules
This is an automated email from the ASF dual-hosted git repository.

dbtsai pushed a commit to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/branch-3.0 by this push:
     new d00e104 [SPARK-32397][BUILD] Allow specifying of time for build to keep time consistent between modules
d00e104 is described below

commit d00e1040010ac82df7e41e88314a6162b016f807
Author: Holden Karau
AuthorDate: Wed Jul 29 21:39:14 2020 +

    [SPARK-32397][BUILD] Allow specifying of time for build to keep time consistent between modules

    ### What changes were proposed in this pull request?

    Upgrade codehaus maven build helper to allow people to specify a time during the build to avoid snapshot artifacts with different version strings.

    ### Why are the changes needed?

    During builds of snapshots the maven may assign different versions to different artifacts based on the time each individual sub-module starts building.

    The timestamp is used as part of the version string when run `maven deploy` on a snapshot build. This results in different sub-modules having different version strings.

    ### Does this PR introduce _any_ user-facing change?

    No

    ### How was this patch tested?

    Manual build while specifying the current time, ensured the time is consistent in the sub components.

    Open question: Ideally I'd like to backport this as well since it's sort of a bug fix and while it does change a dependency version it's not one that is propagated. I'd like to hear folks thoughts about this.

    Closes #29274 from holdenk/SPARK-32397-snapshot-artifact-timestamp-differences.

    Authored-by: Holden Karau
    Signed-off-by: DB Tsai
    (cherry picked from commit 50911df08eb7a27494dc83bcec3d09701c2babfe)
    Signed-off-by: DB Tsai
---
 pom.xml | 40 +++-
 1 file changed, 35 insertions(+), 5 deletions(-)

diff --git a/pom.xml b/pom.xml
index 6c67ce4..86cdd1d 100644
--- a/pom.xml
+++ b/pom.xml
@@ -251,6 +251,8 @@
 1g
+
+-MM-dd HH:mm:ss z
@@ -2344,11 +2346,39 @@
-
-org.codehaus.mojo
-build-helper-maven-plugin
-3.0.0
-
+
+org.codehaus.mojo
+build-helper-maven-plugin
+3.2.0
+
+
+module-timestamp-property
+validate
+
+timestamp-property
+
+
+module.build.timestamp
+${maven.build.timestamp.format}
+current
+America/Los_Angeles
+
+
+
+local-timestamp-property
+validate
+
+timestamp-property
+
+
+local.build.timestamp
+${maven.build.timestamp.format}
+build
+America/Los_Angeles
+
+
+
+
 net.alchim31.maven
 scala-maven-plugin

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated (a025a89 -> 50911df)
This is an automated email from the ASF dual-hosted git repository.

dbtsai pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

    from a025a89 [SPARK-32332][SQL] Support columnar exchanges
     add 50911df [SPARK-32397][BUILD] Allow specifying of time for build to keep time consistent between modules

No new revisions were added by this update.

Summary of changes:
 pom.xml | 40 +++-
 1 file changed, 35 insertions(+), 5 deletions(-)

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch branch-2.4 updated: [SPARK-32397][BUILD] Allow specifying of time for build to keep time consistent between modules
This is an automated email from the ASF dual-hosted git repository. dbtsai pushed a commit to branch branch-2.4 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-2.4 by this push: new 4a8f692 [SPARK-32397][BUILD] Allow specifying of time for build to keep time consistent between modules 4a8f692 is described below commit 4a8f692f40a53ebf34292dfc28d6a0f95515166c Author: Holden Karau AuthorDate: Wed Jul 29 21:39:14 2020 + [SPARK-32397][BUILD] Allow specifying of time for build to keep time consistent between modules ### What changes were proposed in this pull request? Upgrade codehaus maven build helper to allow people to specify a time during the build to avoid snapshot artifacts with different version strings. ### Why are the changes needed? During builds of snapshots the maven may assign different versions to different artifacts based on the time each individual sub-module starts building. The timestamp is used as part of the version string when run `maven deploy` on a snapshot build. This results in different sub-modules having different version strings. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? Manual build while specifying the current time, ensured the time is consistent in the sub components. Open question: Ideally I'd like to backport this as well since it's sort of a bug fix and while it does change a dependency version it's not one that is propagated. I'd like to hear folks thoughts about this. Closes #29274 from holdenk/SPARK-32397-snapshot-artifact-timestamp-differences. 
Authored-by: Holden Karau Signed-off-by: DB Tsai (cherry picked from commit 50911df08eb7a27494dc83bcec3d09701c2babfe) Signed-off-by: DB Tsai --- pom.xml | 40 +++- 1 file changed, 35 insertions(+), 5 deletions(-) diff --git a/pom.xml b/pom.xml index 03a08a1..eccd45a 100644 --- a/pom.xml +++ b/pom.xml @@ -228,6 +228,8 @@ ${session.executionRootDirectory} 1g + +-MM-dd HH:mm:ss z @@ -2107,11 +2109,39 @@ - - org.codehaus.mojo - build-helper-maven-plugin - 3.0.0 - + + org.codehaus.mojo + build-helper-maven-plugin + 3.2.0 + + + module-timestamp-property + validate + + timestamp-property + + + module.build.timestamp + ${maven.build.timestamp.format} + current + America/Los_Angeles + + + + local-timestamp-property + validate + + timestamp-property + + + local.build.timestamp + ${maven.build.timestamp.format} + build + America/Los_Angeles + + + + net.alchim31.maven scala-maven-plugin - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch branch-3.0 updated: [SPARK-32397][BUILD] Allow specifying of time for build to keep time consistent between modules
This is an automated email from the ASF dual-hosted git repository. dbtsai pushed a commit to branch branch-3.0 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.0 by this push: new d00e104 [SPARK-32397][BUILD] Allow specifying of time for build to keep time consistent between modules d00e104 is described below commit d00e1040010ac82df7e41e88314a6162b016f807 Author: Holden Karau AuthorDate: Wed Jul 29 21:39:14 2020 + [SPARK-32397][BUILD] Allow specifying of time for build to keep time consistent between modules ### What changes were proposed in this pull request? Upgrade codehaus maven build helper to allow people to specify a time during the build to avoid snapshot artifacts with different version strings. ### Why are the changes needed? During builds of snapshots the maven may assign different versions to different artifacts based on the time each individual sub-module starts building. The timestamp is used as part of the version string when run `maven deploy` on a snapshot build. This results in different sub-modules having different version strings. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? Manual build while specifying the current time, ensured the time is consistent in the sub components. Open question: Ideally I'd like to backport this as well since it's sort of a bug fix and while it does change a dependency version it's not one that is propagated. I'd like to hear folks thoughts about this. Closes #29274 from holdenk/SPARK-32397-snapshot-artifact-timestamp-differences. 
Authored-by: Holden Karau Signed-off-by: DB Tsai (cherry picked from commit 50911df08eb7a27494dc83bcec3d09701c2babfe) Signed-off-by: DB Tsai --- pom.xml | 40 +++- 1 file changed, 35 insertions(+), 5 deletions(-) diff --git a/pom.xml b/pom.xml index 6c67ce4..86cdd1d 100644 --- a/pom.xml +++ b/pom.xml @@ -251,6 +251,8 @@ 1g + +-MM-dd HH:mm:ss z @@ -2344,11 +2346,39 @@ - - org.codehaus.mojo - build-helper-maven-plugin - 3.0.0 - + + org.codehaus.mojo + build-helper-maven-plugin + 3.2.0 + + + module-timestamp-property + validate + + timestamp-property + + + module.build.timestamp + ${maven.build.timestamp.format} + current + America/Los_Angeles + + + + local-timestamp-property + validate + + timestamp-property + + + local.build.timestamp + ${maven.build.timestamp.format} + build + America/Los_Angeles + + + + net.alchim31.maven scala-maven-plugin - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated (a025a89 -> 50911df)
This is an automated email from the ASF dual-hosted git repository.

dbtsai pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

    from a025a89  [SPARK-32332][SQL] Support columnar exchanges
     add 50911df  [SPARK-32397][BUILD] Allow specifying of time for build to keep time consistent between modules

No new revisions were added by this update.

Summary of changes:
 pom.xml | 40 +++-
 1 file changed, 35 insertions(+), 5 deletions(-)
[spark] branch branch-2.4 updated: [SPARK-32397][BUILD] Allow specifying of time for build to keep time consistent between modules
This is an automated email from the ASF dual-hosted git repository.

dbtsai pushed a commit to branch branch-2.4
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/branch-2.4 by this push:
     new 4a8f692  [SPARK-32397][BUILD] Allow specifying of time for build to keep time consistent between modules
4a8f692 is described below

commit 4a8f692f40a53ebf34292dfc28d6a0f95515166c
Author: Holden Karau
AuthorDate: Wed Jul 29 21:39:14 2020 +

    [SPARK-32397][BUILD] Allow specifying of time for build to keep time consistent between modules

    ### What changes were proposed in this pull request?

    Upgrade the codehaus maven build helper to allow people to specify a time during the build, to avoid snapshot artifacts with different version strings.

    ### Why are the changes needed?

    During builds of snapshots, Maven may assign different versions to different artifacts based on the time each individual sub-module starts building.

    The timestamp is used as part of the version string when running `maven deploy` on a snapshot build. This results in different sub-modules having different version strings.

    ### Does this PR introduce _any_ user-facing change?

    No

    ### How was this patch tested?

    Manual build while specifying the current time, ensured the time is consistent in the sub components.

    Open question: Ideally I'd like to backport this as well since it's sort of a bug fix and while it does change a dependency version it's not one that is propagated. I'd like to hear folks' thoughts about this.

    Closes #29274 from holdenk/SPARK-32397-snapshot-artifact-timestamp-differences.

    Authored-by: Holden Karau
    Signed-off-by: DB Tsai
    (cherry picked from commit 50911df08eb7a27494dc83bcec3d09701c2babfe)
    Signed-off-by: DB Tsai
---
 pom.xml | 40 +++-
 1 file changed, 35 insertions(+), 5 deletions(-)

diff --git a/pom.xml b/pom.xml
index 03a08a1..eccd45a 100644
--- a/pom.xml
+++ b/pom.xml
@@ -228,6 +228,8 @@
 ${session.executionRootDirectory}
 1g
+
+-MM-dd HH:mm:ss z
@@ -2107,11 +2109,39 @@
-
- org.codehaus.mojo
- build-helper-maven-plugin
- 3.0.0
-
+
+ org.codehaus.mojo
+ build-helper-maven-plugin
+ 3.2.0
+
+
+ module-timestamp-property
+ validate
+
+ timestamp-property
+
+
+ module.build.timestamp
+ ${maven.build.timestamp.format}
+ current
+ America/Los_Angeles
+
+
+
+ local-timestamp-property
+ validate
+
+ timestamp-property
+
+
+ local.build.timestamp
+ ${maven.build.timestamp.format}
+ build
+ America/Los_Angeles
+
+
+
+
 net.alchim31.maven
 scala-maven-plugin
[spark] branch master updated: [SPARK-32332][SQL] Support columnar exchanges
This is an automated email from the ASF dual-hosted git repository.

tgraves pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new a025a89  [SPARK-32332][SQL] Support columnar exchanges
a025a89 is described below

commit a025a89f4ef3a05d7e70c02f03a9826bb97eceac
Author: Wenchen Fan
AuthorDate: Wed Jul 29 14:21:47 2020 -0500

    [SPARK-32332][SQL] Support columnar exchanges

    ### What changes were proposed in this pull request?

    This PR adds abstract classes for shuffle and broadcast, so that users can provide their columnar implementations.

    This PR updates several places to use the abstract exchange classes, and also updates `AdaptiveSparkPlanExec` so that the columnar rules can see exchange nodes.

    This is an alternative to https://github.com/apache/spark/pull/29134 .
    Closes https://github.com/apache/spark/pull/29134

    ### Why are the changes needed?

    To allow columnar exchanges.

    ### Does this PR introduce _any_ user-facing change?

    no

    ### How was this patch tested?

    new tests

    Closes #29262 from cloud-fan/columnar.

    Authored-by: Wenchen Fan
    Signed-off-by: Thomas Graves
---
 .../execution/adaptive/AdaptiveSparkPlanExec.scala | 30 --
 .../adaptive/CustomShuffleReaderExec.scala | 21 ++--
 .../adaptive/OptimizeLocalShuffleReader.scala | 5 +-
 .../execution/adaptive/OptimizeSkewedJoin.scala | 4 +-
 .../sql/execution/adaptive/QueryStageExec.scala | 37 ---
 .../sql/execution/adaptive/simpleCosting.scala | 6 +-
 .../execution/exchange/BroadcastExchangeExec.scala | 46 ++--
 .../execution/exchange/ShuffleExchangeExec.scala | 57 +-
 .../execution/streaming/IncrementalExecution.scala | 4 +-
 .../spark/sql/SparkSessionExtensionSuite.scala | 120 +
 10 files changed, 260 insertions(+), 70 deletions(-)

diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/AdaptiveSparkPlanExec.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/AdaptiveSparkPlanExec.scala
index 34db0a3..b160b8a 100644
--- a/sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/AdaptiveSparkPlanExec.scala
+++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/AdaptiveSparkPlanExec.scala
@@ -100,7 +100,12 @@ case class AdaptiveSparkPlanExec(
     // The following two rules need to make use of 'CustomShuffleReaderExec.partitionSpecs'
     // added by `CoalesceShufflePartitions`. So they must be executed after it.
     OptimizeSkewedJoin(conf),
-    OptimizeLocalShuffleReader(conf),
+    OptimizeLocalShuffleReader(conf)
+  )
+
+  // A list of physical optimizer rules to be applied right after a new stage is created. The input
+  // plan to these rules has exchange as its root node.
+  @transient private val postStageCreationRules = Seq(
     ApplyColumnarRulesAndInsertTransitions(conf, context.session.sessionState.columnarRules),
     CollapseCodegenStages(conf)
   )
@@ -227,7 +232,8 @@ case class AdaptiveSparkPlanExec(
       }

       // Run the final plan when there's no more unfinished stages.
-      currentPhysicalPlan = applyPhysicalRules(result.newPlan, queryStageOptimizerRules)
+      currentPhysicalPlan = applyPhysicalRules(
+        result.newPlan, queryStageOptimizerRules ++ postStageCreationRules)
       isFinalPlan = true
       executionId.foreach(onUpdatePlan(_, Seq(currentPhysicalPlan)))
       currentPhysicalPlan
@@ -376,10 +382,22 @@ case class AdaptiveSparkPlanExec(
   private def newQueryStage(e: Exchange): QueryStageExec = {
     val optimizedPlan = applyPhysicalRules(e.child, queryStageOptimizerRules)
     val queryStage = e match {
-      case s: ShuffleExchangeExec =>
-        ShuffleQueryStageExec(currentStageId, s.copy(child = optimizedPlan))
-      case b: BroadcastExchangeExec =>
-        BroadcastQueryStageExec(currentStageId, b.copy(child = optimizedPlan))
+      case s: ShuffleExchangeLike =>
+        val newShuffle = applyPhysicalRules(
+          s.withNewChildren(Seq(optimizedPlan)), postStageCreationRules)
+        if (!newShuffle.isInstanceOf[ShuffleExchangeLike]) {
+          throw new IllegalStateException(
+            "Custom columnar rules cannot transform shuffle node to something else.")
+        }
+        ShuffleQueryStageExec(currentStageId, newShuffle)
+      case b: BroadcastExchangeLike =>
+        val newBroadcast = applyPhysicalRules(
+          b.withNewChildren(Seq(optimizedPlan)), postStageCreationRules)
+        if (!newBroadcast.isInstanceOf[BroadcastExchangeLike]) {
+          throw new IllegalStateException(
+            "Custom columnar rules cannot transform broadcast node to something else.")
+        }
+        BroadcastQueryStageExec(currentStageId, newBroadcast)
     }
     currentStageId += 1
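The `newQueryStage` hunk above applies the post-stage-creation rules to an exchange node and then insists that the result is still the same kind of exchange. A minimal, self-contained sketch of that guard follows. None of the types below are Spark classes: `Plan`, `ShuffleLike`, `RowShuffle`, `ColumnarShuffle`, `Project`, and the two example rules are hypothetical stand-ins chosen only to show the shape of the check.

```scala
object StageCreationSketch {
  // Hypothetical stand-ins for physical plan nodes; these are not Spark classes.
  trait Plan
  trait ShuffleLike extends Plan {
    def child: Plan
    def withChild(newChild: Plan): Plan
  }
  final case class Scan(table: String) extends Plan
  final case class Project(child: Plan) extends Plan
  final case class RowShuffle(child: Plan) extends ShuffleLike {
    def withChild(newChild: Plan): Plan = copy(child = newChild)
  }
  final case class ColumnarShuffle(child: Plan) extends ShuffleLike {
    def withChild(newChild: Plan): Plan = copy(child = newChild)
  }

  type Rule = Plan => Plan

  // Apply each rule in order to the plan.
  def applyRules(plan: Plan, rules: Seq[Rule]): Plan =
    rules.foldLeft(plan)((current, rule) => rule(current))

  // Swap in the optimized child, run the post-stage rules, and require that
  // the root is still some kind of shuffle afterwards.
  def newShuffleStage(
      shuffle: ShuffleLike,
      optimizedChild: Plan,
      postStageRules: Seq[Rule]): ShuffleLike =
    applyRules(shuffle.withChild(optimizedChild), postStageRules) match {
      case stillShuffle: ShuffleLike => stillShuffle
      case _ =>
        throw new IllegalStateException(
          "Custom columnar rules cannot transform shuffle node to something else.")
    }

  // A well-behaved rule: replaces the row shuffle with a columnar one.
  val toColumnar: Rule = {
    case RowShuffle(child) => ColumnarShuffle(child)
    case other => other
  }

  // A misbehaving rule: hides the shuffle under another node.
  val wrapInProject: Rule = plan => Project(plan)
}
```

A rule such as `toColumnar` passes the check because the root remains a `ShuffleLike`, whereas `wrapInProject` triggers the `IllegalStateException`, mirroring the two outcomes the diff allows for.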
[spark] branch master updated: [SPARK-30322][DOCS] Add stage level scheduling docs
This is an automated email from the ASF dual-hosted git repository.

tgraves pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new e926d41  [SPARK-30322][DOCS] Add stage level scheduling docs
e926d41 is described below

commit e926d419d305c9400f6f2426ca3e8d04a9180005
Author: Thomas Graves
AuthorDate: Wed Jul 29 13:46:28 2020 -0500

    [SPARK-30322][DOCS] Add stage level scheduling docs

    ### What changes were proposed in this pull request?

    Document the stage level scheduling feature.

    ### Why are the changes needed?

    Document the stage level scheduling feature.

    ### Does this PR introduce _any_ user-facing change?

    Documentation.

    ### How was this patch tested?

    n/a docs only

    Closes #29292 from tgravescs/SPARK-30322.

    Authored-by: Thomas Graves
    Signed-off-by: Thomas Graves
---
 docs/configuration.md | 7 +++
 docs/running-on-yarn.md | 4 
 2 files changed, 11 insertions(+)

diff --git a/docs/configuration.md b/docs/configuration.md
index abf7610..62799db 100644
--- a/docs/configuration.md
+++ b/docs/configuration.md
@@ -3028,3 +3028,10 @@ There are configurations available to request resources for the driver: sp
 Spark will use the configurations specified to first request containers with the corresponding resources from the cluster manager. Once it gets the container, Spark launches an Executor in that container which will discover what resources the container has and the addresses associated with each resource. The Executor will register with the Driver and report back the resources available to that Executor. The Spark scheduler can then schedule tasks to each Executor and assign specific reso [...]
 
 See your cluster manager specific page for requirements and details on each of - [YARN](running-on-yarn.html#resource-allocation-and-configuration-overview), [Kubernetes](running-on-kubernetes.html#resource-allocation-and-configuration-overview) and [Standalone Mode](spark-standalone.html#resource-allocation-and-configuration-overview). It is currently not available with Mesos or local mode. And please also note that local-cluster mode with multiple workers is not supported(see Standalon [...]
+
+# Stage Level Scheduling Overview
+
+The stage level scheduling feature allows users to specify task and executor resource requirements at the stage level. This allows for different stages to run with executors that have different resources. A prime example of this is one ETL stage runs with executors with just CPUs, the next stage is an ML stage that needs GPUs. Stage level scheduling allows for user to request different executors that have GPUs when the ML stage runs rather then having to acquire executors with GPUs at th [...]
+This is only available for the RDD API in Scala, Java, and Python and requires dynamic allocation to be enabled. It is only available on YARN at this time. See the [YARN](running-on-yarn.html#stage-level-scheduling-overview) page for more implementation details.
+
+See the `RDD.withResources` and `ResourceProfileBuilder` API's for using this feature. The current implementation acquires new executors for each `ResourceProfile` created and currently has to be an exact match. Spark does not try to fit tasks into an executor that require a different ResourceProfile than the executor was created with. Executors that are not in use will idle timeout with the dynamic allocation logic. The default configuration for this feature is to only allow one Resour [...]
diff --git a/docs/running-on-yarn.md b/docs/running-on-yarn.md
index 36d8f0b..6f7aaf2b 100644
--- a/docs/running-on-yarn.md
+++ b/docs/running-on-yarn.md
@@ -641,6 +641,10 @@ If the user has a user defined YARN resource, lets call it `acceleratorX` then t
 YARN does not tell Spark the addresses of the resources allocated to each container. For that reason, the user must specify a discovery script that gets run by the executor on startup to discover what resources are available to that executor. You can find an example scripts in `examples/src/main/scripts/getGpusResources.sh`. The script must have execute permissions set and the user should setup permissions to not allow malicious users to modify it. The script should write to STDOUT a JSO [...]
 
+# Stage Level Scheduling Overview
+
+Stage level scheduling is supported on YARN when dynamic allocation is enabled. One thing to note that is YARN specific is that each ResourceProfile requires a different container priority on YARN. The mapping is simply the ResourceProfile id becomes the priority, on YARN lower numbers are higher priority. This means that profiles created earlier will have a higher priority in YARN. Normally this won't matter as Spark finishes one stage before starting another one, the only case this mig [...]
+
 #
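The documentation added in this commit names `RDD.withResources` and `ResourceProfileBuilder` as the entry points for stage level scheduling. As a rough sketch of how the ETL-then-ML example from the docs might look in code, assuming a Spark release that ships these APIs on the classpath: the object name, the resource amounts, the RDD element types, and the location of the discovery script are illustrative assumptions, not values taken from the commit.

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.resource.{ExecutorResourceRequests, ResourceProfile, ResourceProfileBuilder, TaskResourceRequests}

object StageLevelSchedulingSketch {
  // Build a profile for the GPU-hungry ML stage. All amounts are illustrative.
  def gpuProfile(discoveryScript: String): ResourceProfile = {
    val executorReqs = new ExecutorResourceRequests()
      .cores(4)
      .memory("8g")
      .resource("gpu", 1, discoveryScript) // script reports the GPU addresses

    val taskReqs = new TaskResourceRequests()
      .cpus(1)
      .resource("gpu", 1)

    new ResourceProfileBuilder()
      .require(executorReqs)
      .require(taskReqs)
      .build
  }

  // The ETL output was computed on CPU-only executors; the stage that
  // computes the RDD returned here asks for executors matching `profile`.
  def runMlStage(etlOutput: RDD[Array[Double]], profile: ResourceProfile): RDD[Double] =
    etlOutput.map(_.sum).withResources(profile)
}
```

As the docs above state, this only takes effect on YARN with dynamic allocation enabled. Each distinct `ResourceProfile` gets its own set of executors, and on YARN the profile id doubles as the container priority, so profiles created earlier are requested at a higher priority.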
[spark] branch master updated (d897825d -> 9dc0237)
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

    from d897825d  [SPARK-32346][SQL] Support filters pushdown in Avro datasource
     add 9dc0237  [SPARK-32476][CORE] ResourceAllocator.availableAddrs should be deterministic

No new revisions were added by this update.

Summary of changes:
 core/src/main/scala/org/apache/spark/resource/ResourceAllocator.scala | 4 ++--
 core/src/test/scala/org/apache/spark/deploy/JsonProtocolSuite.scala | 4 ++--
 2 files changed, 4 insertions(+), 4 deletions(-)
[spark] branch master updated (40e6a5b -> d897825d)
This is an automated email from the ASF dual-hosted git repository.

gengliang pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

    from 40e6a5b  [SPARK-32449][ML][PYSPARK] Add summary to MultilayerPerceptronClassificationModel
     add d897825d  [SPARK-32346][SQL] Support filters pushdown in Avro datasource

No new revisions were added by this update.

Summary of changes:
 .../benchmarks/AvroReadBenchmark-jdk11-results.txt | 72 --
 .../avro/benchmarks/AvroReadBenchmark-results.txt | 72 --
 .../apache/spark/sql/avro/AvroDataToCatalyst.scala | 5 +-
 .../apache/spark/sql/avro/AvroDeserializer.scala | 40 +++-
 .../org/apache/spark/sql/avro/AvroFileFormat.scala | 44 +
 .../org/apache/spark/sql/avro/AvroUtils.scala | 36 ++-
 .../sql/v2/avro/AvroPartitionReaderFactory.scala | 48 ++-
 .../org/apache/spark/sql/v2/avro/AvroScan.scala | 24 ++--
 .../apache/spark/sql/v2/avro/AvroScanBuilder.scala | 27 +++-
 .../sql/avro/AvroCatalystDataConversionSuite.scala | 71 +
 .../org/apache/spark/sql/avro/AvroSuite.scala | 34 +-
 .../execution/benchmark/AvroReadBenchmark.scala | 64 ++-
 .../{csv/CSVFilters.scala => OrderedFilters.scala} | 68 ++--
 .../spark/sql/catalyst/csv/UnivocityParser.scala | 9 ++-
 .../org/apache/spark/sql/internal/SQLConf.scala | 8 +++
 ...iltersSuite.scala => OrderedFiltersSuite.scala} | 7 +--
 16 files changed, 430 insertions(+), 199 deletions(-)
 rename sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/{csv/CSVFilters.scala => OrderedFilters.scala} (60%)
 rename sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/{csv/CSVFiltersSuite.scala => OrderedFiltersSuite.scala} (83%)
[spark] branch master updated (40e6a5b -> d897825d)
This is an automated email from the ASF dual-hosted git repository. gengliang pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from 40e6a5b [SPARK-32449][ML][PYSPARK] Add summary to MultilayerPerceptronClassificationModel add d897825d [SPARK-32346][SQL] Support filters pushdown in Avro datasource No new revisions were added by this update. Summary of changes: .../benchmarks/AvroReadBenchmark-jdk11-results.txt | 72 -- .../avro/benchmarks/AvroReadBenchmark-results.txt | 72 -- .../apache/spark/sql/avro/AvroDataToCatalyst.scala | 5 +- .../apache/spark/sql/avro/AvroDeserializer.scala | 40 +++- .../org/apache/spark/sql/avro/AvroFileFormat.scala | 44 + .../org/apache/spark/sql/avro/AvroUtils.scala | 36 ++- .../sql/v2/avro/AvroPartitionReaderFactory.scala | 48 ++- .../org/apache/spark/sql/v2/avro/AvroScan.scala| 24 ++-- .../apache/spark/sql/v2/avro/AvroScanBuilder.scala | 27 +++- .../sql/avro/AvroCatalystDataConversionSuite.scala | 71 + .../org/apache/spark/sql/avro/AvroSuite.scala | 34 +- .../execution/benchmark/AvroReadBenchmark.scala| 64 ++- .../{csv/CSVFilters.scala => OrderedFilters.scala} | 68 ++-- .../spark/sql/catalyst/csv/UnivocityParser.scala | 9 ++- .../org/apache/spark/sql/internal/SQLConf.scala| 8 +++ ...iltersSuite.scala => OrderedFiltersSuite.scala} | 7 +-- 16 files changed, 430 insertions(+), 199 deletions(-) rename sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/{csv/CSVFilters.scala => OrderedFilters.scala} (60%) rename sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/{csv/CSVFiltersSuite.scala => OrderedFiltersSuite.scala} (83%) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated (5eab8d2 -> 40e6a5b)
This is an automated email from the ASF dual-hosted git repository.

srowen pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

    from 5eab8d2 [SPARK-32477][CORE] JsonProtocol.accumulablesToJson should be deterministic
     add 40e6a5b [SPARK-32449][ML][PYSPARK] Add summary to MultilayerPerceptronClassificationModel

No new revisions were added by this update.

Summary of changes:
 .../main/scala/org/apache/spark/ml/ann/Layer.scala | 11 ++-
 .../MultilayerPerceptronClassifier.scala           | 93 +-
 .../scala/org/apache/spark/ml/ann/ANNSuite.scala   | 4 +-
 .../MultilayerPerceptronClassifierSuite.scala      | 32
 python/docs/source/reference/pyspark.ml.rst        | 2 +
 python/pyspark/ml/classification.py                | 49 +++-
 python/pyspark/ml/tests/test_training_summary.py   | 45 ++-
 7 files changed, 222 insertions(+), 14 deletions(-)
[spark] branch master updated (9be0883 -> 5eab8d2)
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

    from 9be0883 [SPARK-32175][CORE] Fix the order between initialization for ExecutorPlugin and starting heartbeat thread
     add 5eab8d2 [SPARK-32477][CORE] JsonProtocol.accumulablesToJson should be deterministic

No new revisions were added by this update.

Summary of changes:
 .../scala/org/apache/spark/util/JsonProtocol.scala | 2 +-
 .../org/apache/spark/util/JsonProtocolSuite.scala  | 96 +++---
 2 files changed, 49 insertions(+), 49 deletions(-)
[spark] branch branch-3.0 updated: [SPARK-32175][CORE] Fix the order between initialization for ExecutorPlugin and starting heartbeat thread
This is an automated email from the ASF dual-hosted git repository.

tgraves pushed a commit to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/branch-3.0 by this push:
     new e5b5b7e [SPARK-32175][CORE] Fix the order between initialization for ExecutorPlugin and starting heartbeat thread
e5b5b7e is described below

commit e5b5b7e507ab974bbca3abb0bbf56bf67696d53e
Author: Kousuke Saruta
AuthorDate: Wed Jul 29 08:44:56 2020 -0500

    [SPARK-32175][CORE] Fix the order between initialization for ExecutorPlugin and starting heartbeat thread

    ### What changes were proposed in this pull request?

    This PR changes the order between initialization for ExecutorPlugin and starting heartbeat thread in Executor.

    ### Why are the changes needed?

    In the current master, heartbeat thread in a executor starts after plugin initialization so if the initialization takes long time, heartbeat is not sent to driver and the executor will be removed from cluster.

    ### Does this PR introduce _any_ user-facing change?

    Yes. Plugins for executors will be allowed to take long time for initialization.

    ### How was this patch tested?

    New testcase.

    Closes #29002 from sarutak/fix-heartbeat-issue.

    Authored-by: Kousuke Saruta
    Signed-off-by: Thomas Graves
    (cherry picked from commit 9be088357eff4328248b29a3a49a816756745345)
    Signed-off-by: Thomas Graves
---
 .../main/scala/org/apache/spark/TestUtils.scala    | 15 -
 .../scala/org/apache/spark/executor/Executor.scala | 12 ++--
 .../org/apache/spark/executor/ExecutorSuite.scala  | 72 +-
 3 files changed, 89 insertions(+), 10 deletions(-)

diff --git a/core/src/main/scala/org/apache/spark/TestUtils.scala b/core/src/main/scala/org/apache/spark/TestUtils.scala
index d459627..1e00769 100644
--- a/core/src/main/scala/org/apache/spark/TestUtils.scala
+++ b/core/src/main/scala/org/apache/spark/TestUtils.scala
@@ -179,11 +179,20 @@ private[spark] object TestUtils {
       destDir: File,
       toStringValue: String = "",
       baseClass: String = null,
-      classpathUrls: Seq[URL] = Seq.empty): File = {
+      classpathUrls: Seq[URL] = Seq.empty,
+      implementsClasses: Seq[String] = Seq.empty,
+      extraCodeBody: String = ""): File = {
     val extendsText = Option(baseClass).map { c => s" extends ${c}" }.getOrElse("")
+    val implementsText =
+      "implements " + (implementsClasses :+ "java.io.Serializable").mkString(", ")
     val sourceFile = new JavaSourceFromString(className,
-      "public class " + className + extendsText + " implements java.io.Serializable {" +
-      " @Override public String toString() { return \"" + toStringValue + "\"; }}")
+      s"""
+         |public class $className $extendsText $implementsText {
+         |  @Override public String toString() { return "$toStringValue"; }
+         |
+         |  $extraCodeBody
+         |}
+        """.stripMargin)
     createCompiledClass(className, destDir, sourceFile, classpathUrls)
   }
 
diff --git a/core/src/main/scala/org/apache/spark/executor/Executor.scala b/core/src/main/scala/org/apache/spark/executor/Executor.scala
index 8aeb16f..e9f1d9c 100644
--- a/core/src/main/scala/org/apache/spark/executor/Executor.scala
+++ b/core/src/main/scala/org/apache/spark/executor/Executor.scala
@@ -153,11 +153,6 @@ private[spark] class Executor(
   // for fetching remote cached RDD blocks, so need to make sure it uses the right classloader too.
   env.serializerManager.setDefaultClassLoader(replClassLoader)
 
-  // Plugins need to load using a class loader that includes the executor's user classpath
-  private val plugins: Option[PluginContainer] = Utils.withContextClassLoader(replClassLoader) {
-    PluginContainer(env, resources.asJava)
-  }
-
   // Max size of direct result. If task result is bigger than this, we use the block manager
   // to send the result back.
   private val maxDirectResultSize = Math.min(
@@ -218,6 +213,13 @@ private[spark] class Executor(
 
   heartbeater.start()
 
+  // Plugins need to load using a class loader that includes the executor's user classpath.
+  // Plugins also needs to be initialized after the heartbeater started
+  // to avoid blocking to send heartbeat (see SPARK-32175).
+  private val plugins: Option[PluginContainer] = Utils.withContextClassLoader(replClassLoader) {
+    PluginContainer(env, resources.asJava)
+  }
+
   metricsPoller.start()
 
   private[executor] def numRunningTasks: Int = runningTasks.size()
diff --git a/core/src/test/scala/org/apache/spark/executor/ExecutorSuite.scala b/core/src/test/scala/org/apache/spark/executor/ExecutorSuite.scala
index 31049d1..b198448 100644
--- a/core/src/test/scala/org/apache/spark/executor/ExecutorSuite.scala
+++ b/core/src/test/scala/org/apache/spark/executor/ExecutorSuite.scala
@@ -17,7 +17,7 @@
 package
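The ordering problem described in the SPARK-32175 commit message above can be sketched outside Spark. The snippet below is only an illustration under stated assumptions: `HeartbeatOrderSketch`, `slowPluginInit`, and `heartbeatsDuringInit` are hypothetical names, the heartbeater is simulated with a plain `ScheduledExecutorService`, and nothing here is Spark's actual `Executor` or `ExecutorPlugin` code. It counts how many simulated heartbeats are sent while a slow plugin initialization is still running, for each of the two orderings.

```scala
import java.util.concurrent.{Executors, TimeUnit}
import java.util.concurrent.atomic.AtomicInteger

// Hypothetical, self-contained illustration; not Spark code.
object HeartbeatOrderSketch {

  // Stand-in for a plugin whose initialization blocks for a while (assumption).
  def slowPluginInit(millis: Long): Unit = Thread.sleep(millis)

  /** Returns the number of simulated heartbeats sent while the plugin was initializing. */
  def heartbeatsDuringInit(startHeartbeatFirst: Boolean, initMillis: Long): Int = {
    val sent = new AtomicInteger(0)
    val scheduler = Executors.newSingleThreadScheduledExecutor()
    val beat = new Runnable {
      override def run(): Unit = { sent.incrementAndGet(); () }
    }
    // Simulated heartbeater: one beat every 10 ms on a background thread.
    def startHeartbeater(): Unit = {
      scheduler.scheduleAtFixedRate(beat, 0L, 10L, TimeUnit.MILLISECONDS)
      ()
    }
    try {
      if (startHeartbeatFirst) {
        // Ordering after the fix: heartbeats keep flowing during initialization.
        startHeartbeater()
        slowPluginInit(initMillis)
        sent.get()
      } else {
        // Ordering before the fix: nothing is sent until initialization returns.
        slowPluginInit(initMillis)
        val duringInit = sent.get()
        startHeartbeater()
        duringInit
      }
    } finally {
      scheduler.shutdownNow()
    }
  }

  def main(args: Array[String]): Unit = {
    println(s"plugin first:    ${heartbeatsDuringInit(startHeartbeatFirst = false, initMillis = 200L)}")
    println(s"heartbeat first: ${heartbeatsDuringInit(startHeartbeatFirst = true, initMillis = 200L)}")
  }
}
```

With the plugin initialized first, the count during initialization is always zero, which corresponds to the window in which a driver could time out the executor. With the heartbeater started first, the count grows with the initialization time.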
[spark] branch master updated: [SPARK-32175][CORE] Fix the order between initialization for ExecutorPlugin and starting heartbeat thread
This is an automated email from the ASF dual-hosted git repository.

tgraves pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new 9be0883 [SPARK-32175][CORE] Fix the order between initialization for ExecutorPlugin and starting heartbeat thread
9be0883 is described below

commit 9be088357eff4328248b29a3a49a816756745345
Author: Kousuke Saruta
AuthorDate: Wed Jul 29 08:44:56 2020 -0500

    [SPARK-32175][CORE] Fix the order between initialization for ExecutorPlugin and starting heartbeat thread

    ### What changes were proposed in this pull request?

    This PR changes the order between initialization for ExecutorPlugin and starting heartbeat thread in Executor.

    ### Why are the changes needed?

    In the current master, heartbeat thread in a executor starts after plugin initialization so if the initialization takes long time, heartbeat is not sent to driver and the executor will be removed from cluster.

    ### Does this PR introduce _any_ user-facing change?

    Yes. Plugins for executors will be allowed to take long time for initialization.

    ### How was this patch tested?

    New testcase.

    Closes #29002 from sarutak/fix-heartbeat-issue.

    Authored-by: Kousuke Saruta
    Signed-off-by: Thomas Graves
---
 .../main/scala/org/apache/spark/TestUtils.scala    | 15 -
 .../scala/org/apache/spark/executor/Executor.scala | 12 ++--
 .../org/apache/spark/executor/ExecutorSuite.scala  | 72 +-
 3 files changed, 89 insertions(+), 10 deletions(-)

diff --git a/core/src/main/scala/org/apache/spark/TestUtils.scala b/core/src/main/scala/org/apache/spark/TestUtils.scala
index 259cc43..6947d1c 100644
--- a/core/src/main/scala/org/apache/spark/TestUtils.scala
+++ b/core/src/main/scala/org/apache/spark/TestUtils.scala
@@ -179,11 +179,20 @@ private[spark] object TestUtils {
       destDir: File,
       toStringValue: String = "",
       baseClass: String = null,
-      classpathUrls: Seq[URL] = Seq.empty): File = {
+      classpathUrls: Seq[URL] = Seq.empty,
+      implementsClasses: Seq[String] = Seq.empty,
+      extraCodeBody: String = ""): File = {
     val extendsText = Option(baseClass).map { c => s" extends ${c}" }.getOrElse("")
+    val implementsText =
+      "implements " + (implementsClasses :+ "java.io.Serializable").mkString(", ")
     val sourceFile = new JavaSourceFromString(className,
-      "public class " + className + extendsText + " implements java.io.Serializable {" +
-      " @Override public String toString() { return \"" + toStringValue + "\"; }}")
+      s"""
+         |public class $className $extendsText $implementsText {
+         |  @Override public String toString() { return "$toStringValue"; }
+         |
+         |  $extraCodeBody
+         |}
+        """.stripMargin)
     createCompiledClass(className, destDir, sourceFile, classpathUrls)
   }
 
diff --git a/core/src/main/scala/org/apache/spark/executor/Executor.scala b/core/src/main/scala/org/apache/spark/executor/Executor.scala
index bc0f0c0..d220029 100644
--- a/core/src/main/scala/org/apache/spark/executor/Executor.scala
+++ b/core/src/main/scala/org/apache/spark/executor/Executor.scala
@@ -154,11 +154,6 @@ private[spark] class Executor(
   // for fetching remote cached RDD blocks, so need to make sure it uses the right classloader too.
   env.serializerManager.setDefaultClassLoader(replClassLoader)
 
-  // Plugins need to load using a class loader that includes the executor's user classpath
-  private val plugins: Option[PluginContainer] = Utils.withContextClassLoader(replClassLoader) {
-    PluginContainer(env, resources.asJava)
-  }
-
   // Max size of direct result. If task result is bigger than this, we use the block manager
   // to send the result back.
   private val maxDirectResultSize = Math.min(
@@ -225,6 +220,13 @@ private[spark] class Executor(
 
   heartbeater.start()
 
+  // Plugins need to load using a class loader that includes the executor's user classpath.
+  // Plugins also needs to be initialized after the heartbeater started
+  // to avoid blocking to send heartbeat (see SPARK-32175).
+  private val plugins: Option[PluginContainer] = Utils.withContextClassLoader(replClassLoader) {
+    PluginContainer(env, resources.asJava)
+  }
+
   metricsPoller.start()
 
   private[executor] def numRunningTasks: Int = runningTasks.size()
diff --git a/core/src/test/scala/org/apache/spark/executor/ExecutorSuite.scala b/core/src/test/scala/org/apache/spark/executor/ExecutorSuite.scala
index 31049d1..b198448 100644
--- a/core/src/test/scala/org/apache/spark/executor/ExecutorSuite.scala
+++ b/core/src/test/scala/org/apache/spark/executor/ExecutorSuite.scala
@@ -17,7 +17,7 @@
 package org.apache.spark.executor
 
-import java.io.{Externalizable, ObjectInput, ObjectOutput}
+import java.io.{Externalizable, File,