[spark] branch master updated (8a96f69bb53 -> f0950fea814)
This is an automated email from the ASF dual-hosted git repository.

ruifengz pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

    from 8a96f69bb53 [SPARK-40874][PYTHON] Fix broadcasts in Python UDFs when encryption enabled
     add f0950fea814 [SPARK-40878][INFRA] pin 'grpcio==1.48.1' 'protobuf==4.21.6'

No new revisions were added by this update.

Summary of changes:
 .github/workflows/build_and_test.yml | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)
[spark] branch master updated (eac40927e7f -> 8a96f69bb53)
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

    from eac40927e7f [SPARK-40871][INFRA] Upgrade actions/github-script to v6 and fix notify workflow
     add 8a96f69bb53 [SPARK-40874][PYTHON] Fix broadcasts in Python UDFs when encryption enabled

No new revisions were added by this update.

Summary of changes:
 .../scala/org/apache/spark/api/python/PythonRunner.scala |  2 +-
 python/pyspark/tests/test_broadcast.py                   | 14 ++
 2 files changed, 15 insertions(+), 1 deletion(-)
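For readers unfamiliar with SPARK-40874: the issue concerns broadcast variables that are read from inside Python UDFs while `spark.io.encryption.enabled` is set. The snippet below is a minimal, hypothetical sketch of that pattern, not the test added in `python/pyspark/tests/test_broadcast.py`; the dictionary contents and column names are illustrative.

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import udf
from pyspark.sql.types import IntegerType

# The configuration under which the broadcast-in-UDF path was reported broken.
spark = (
    SparkSession.builder
    .config("spark.io.encryption.enabled", "true")
    .getOrCreate()
)

# A broadcast variable that a Python UDF closes over.
lookup = spark.sparkContext.broadcast({"a": 1, "b": 2})

@udf(returnType=IntegerType())
def lookup_value(key):
    # Reading broadcast.value inside the UDF is the code path exercised here.
    return lookup.value.get(key, -1)

df = spark.createDataFrame([("a",), ("b",), ("c",)], ["key"])
df.select("key", lookup_value("key").alias("value")).show()
```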
[spark] branch master updated: [SPARK-40871][INFRA] Upgrade actions/github-script to v6 and fix notify workflow
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new eac40927e7f [SPARK-40871][INFRA] Upgrade actions/github-script to v6 and fix notify workflow

eac40927e7f is described below

commit eac40927e7f0e63d254bc4ad1f790b184cd45887
Author: Yikun Jiang
AuthorDate: Sat Oct 22 10:37:41 2022 +0900

    [SPARK-40871][INFRA] Upgrade actions/github-script to v6 and fix notify workflow

    ### What changes were proposed in this pull request?
    Upgrade actions/github-script from v3 to v6 and fix the notify workflow.

    ### Why are the changes needed?
    Node.js 12 actions are deprecated. For more information see:
    https://github.blog/changelog/2022-09-22-github-actions-all-actions-will-begin-running-on-node16-instead-of-node12/
    - Since github-script v5, calls change from `github.*` to `github.rest.*`, while `request` and `paginate` are unchanged. See also https://github.com/actions/github-script#breaking-changes-in-v5
    - Since github-script v6, the runtime is upgraded from node12 to node16.

    ### Does this PR introduce _any_ user-facing change?
    No, dev only.

    ### How was this patch tested?
    - Because of `pull_request_target`, the change in the current PR does not take effect for its own run, so it could only be tested locally: set the default branch to the v6 version and submit the PR https://github.com/Yikun/spark/pull/181

    Notify works well: https://user-images.githubusercontent.com/1736354/197310102-6c709716-8a99-422d-8d38-3f770b6925f0.png

    Update status is set to failed as expected: https://user-images.githubusercontent.com/1736354/197310119-30332769-0553-4ffa-816c-97a5ec0b3c27.png

    And `See test results` is set correctly: https://github.com/Yikun/spark/pull/181/checks?check_run_id=9029035780

    Closes #38341 from Yikun/upgrade-actions.

    Authored-by: Yikun Jiang
    Signed-off-by: Hyukjin Kwon
---
 .github/workflows/notify_test_workflow.yml | 6 +++---
 .github/workflows/update_build_status.yml  | 2 +-
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/.github/workflows/notify_test_workflow.yml b/.github/workflows/notify_test_workflow.yml
index c9b5c54f362..6fb776d7083 100644
--- a/.github/workflows/notify_test_workflow.yml
+++ b/.github/workflows/notify_test_workflow.yml
@@ -36,7 +36,7 @@ jobs:
       checks: write
     steps:
       - name: "Notify test workflow"
-        uses: actions/github-script@f05a81df23035049204b043b50c3322045ce7eb3 # pin@v3
+        uses: actions/github-script@v6
         with:
           github-token: ${{ secrets.GITHUB_TOKEN }}
           script: |
@@ -80,7 +80,7 @@ jobs:
             status = 'completed'
             const conclusion = 'action_required'
-            github.checks.create({
+            github.rest.checks.create({
               owner: context.repo.owner,
               repo: context.repo.repo,
               name: name,
@@ -132,7 +132,7 @@ jobs:
               + '/actions/runs/'
               + run_id
-            github.checks.create({
+            github.rest.checks.create({
               owner: context.repo.owner,
               repo: context.repo.repo,
               name: name,
diff --git a/.github/workflows/update_build_status.yml b/.github/workflows/update_build_status.yml
index 7f3826817df..05cf4914a25 100644
--- a/.github/workflows/update_build_status.yml
+++ b/.github/workflows/update_build_status.yml
@@ -32,7 +32,7 @@ jobs:
       checks: write
     steps:
       - name: "Update build status"
-        uses: actions/github-script@f05a81df23035049204b043b50c3322045ce7eb3 # pin@v3
+        uses: actions/github-script@v6
         with:
           github-token: ${{ secrets.GITHUB_TOKEN }}
           script: |
[spark] branch master updated (98f9edabb45 -> 6545a0873df)
This is an automated email from the ASF dual-hosted git repository.

ruifengz pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

    from 98f9edabb45 [SPARK-40796][CONNECT][FOLLOW-UP] Improve README for proto generated files in Connect Python client
     add 6545a0873df [SPARK-40796][CONNECT][DOC][FOLLOW-UP] Add check command in Readme

No new revisions were added by this update.

Summary of changes:
 python/pyspark/sql/connect/README.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)
[spark] branch master updated (748fa2792e4 -> 98f9edabb45)
This is an automated email from the ASF dual-hosted git repository.

ruifengz pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

    from 748fa2792e4 [SPARK-40863][BUILD] Upgrade dropwizard metrics 4.2.12
     add 98f9edabb45 [SPARK-40796][CONNECT][FOLLOW-UP] Improve README for proto generated files in Connect Python client

No new revisions were added by this update.

Summary of changes:
 dev/check-codegen-python.py          | 4 +++-
 python/pyspark/sql/connect/README.md | 5 +
 2 files changed, 8 insertions(+), 1 deletion(-)
[spark] branch master updated: [SPARK-40863][BUILD] Upgrade dropwizard metrics 4.2.12
This is an automated email from the ASF dual-hosted git repository.

srowen pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new 748fa2792e4 [SPARK-40863][BUILD] Upgrade dropwizard metrics 4.2.12

748fa2792e4 is described below

commit 748fa2792e488a6b923b32e2898d9bb6e16fb4ca
Author: yangjie01
AuthorDate: Fri Oct 21 08:53:29 2022 -0500

    [SPARK-40863][BUILD] Upgrade dropwizard metrics 4.2.12

    ### What changes were proposed in this pull request?
    This PR aims to upgrade dropwizard metrics from 4.2.10 to 4.2.12.

    ### Why are the changes needed?
    The release notes are as follows:
    - https://github.com/dropwizard/metrics/releases/tag/v4.2.11
    - https://github.com/dropwizard/metrics/releases/tag/v4.2.12

    ### Does this PR introduce _any_ user-facing change?
    No

    ### How was this patch tested?
    Pass GitHub Actions

    Closes #38328 from LuciferYang/metrics-4212.

    Authored-by: yangjie01
    Signed-off-by: Sean Owen
---
 dev/deps/spark-deps-hadoop-2-hive-2.3 | 10 +-
 dev/deps/spark-deps-hadoop-3-hive-2.3 | 10 +-
 pom.xml                               |  2 +-
 3 files changed, 11 insertions(+), 11 deletions(-)

diff --git a/dev/deps/spark-deps-hadoop-2-hive-2.3 b/dev/deps/spark-deps-hadoop-2-hive-2.3
index b7850d2fe60..1d1061aaadb 100644
--- a/dev/deps/spark-deps-hadoop-2-hive-2.3
+++ b/dev/deps/spark-deps-hadoop-2-hive-2.3
@@ -195,11 +195,11 @@ log4j-slf4j2-impl/2.19.0//log4j-slf4j2-impl-2.19.0.jar
 logging-interceptor/3.12.12//logging-interceptor-3.12.12.jar
 lz4-java/1.8.0//lz4-java-1.8.0.jar
 mesos/1.4.3/shaded-protobuf/mesos-1.4.3-shaded-protobuf.jar
-metrics-core/4.2.10//metrics-core-4.2.10.jar
-metrics-graphite/4.2.10//metrics-graphite-4.2.10.jar
-metrics-jmx/4.2.10//metrics-jmx-4.2.10.jar
-metrics-json/4.2.10//metrics-json-4.2.10.jar
-metrics-jvm/4.2.10//metrics-jvm-4.2.10.jar
+metrics-core/4.2.12//metrics-core-4.2.12.jar
+metrics-graphite/4.2.12//metrics-graphite-4.2.12.jar
+metrics-jmx/4.2.12//metrics-jmx-4.2.12.jar
+metrics-json/4.2.12//metrics-json-4.2.12.jar
+metrics-jvm/4.2.12//metrics-jvm-4.2.12.jar
 minlog/1.3.0//minlog-1.3.0.jar
 netty-all/4.1.84.Final//netty-all-4.1.84.Final.jar
 netty-buffer/4.1.84.Final//netty-buffer-4.1.84.Final.jar
diff --git a/dev/deps/spark-deps-hadoop-3-hive-2.3 b/dev/deps/spark-deps-hadoop-3-hive-2.3
index 0f497c99ff9..39a0e617058 100644
--- a/dev/deps/spark-deps-hadoop-3-hive-2.3
+++ b/dev/deps/spark-deps-hadoop-3-hive-2.3
@@ -179,11 +179,11 @@ log4j-slf4j2-impl/2.19.0//log4j-slf4j2-impl-2.19.0.jar
 logging-interceptor/3.12.12//logging-interceptor-3.12.12.jar
 lz4-java/1.8.0//lz4-java-1.8.0.jar
 mesos/1.4.3/shaded-protobuf/mesos-1.4.3-shaded-protobuf.jar
-metrics-core/4.2.10//metrics-core-4.2.10.jar
-metrics-graphite/4.2.10//metrics-graphite-4.2.10.jar
-metrics-jmx/4.2.10//metrics-jmx-4.2.10.jar
-metrics-json/4.2.10//metrics-json-4.2.10.jar
-metrics-jvm/4.2.10//metrics-jvm-4.2.10.jar
+metrics-core/4.2.12//metrics-core-4.2.12.jar
+metrics-graphite/4.2.12//metrics-graphite-4.2.12.jar
+metrics-jmx/4.2.12//metrics-jmx-4.2.12.jar
+metrics-json/4.2.12//metrics-json-4.2.12.jar
+metrics-jvm/4.2.12//metrics-jvm-4.2.12.jar
 minlog/1.3.0//minlog-1.3.0.jar
 netty-all/4.1.84.Final//netty-all-4.1.84.Final.jar
 netty-buffer/4.1.84.Final//netty-buffer-4.1.84.Final.jar
diff --git a/pom.xml b/pom.xml
index f8f3aa2fd4f..d933c1c6f6d 100644
--- a/pom.xml
+++ b/pom.xml
@@ -145,7 +145,7 @@
       If you changes codahale.metrics.version, you also need to change
       the link to metrics.dropwizard.io in docs/monitoring.md.
     -->
-    4.2.10
+    4.2.12
     1.11.1
     1.12.0
[spark] branch master updated: [SPARK-40865][BUILD] Upgrade jodatime to 2.12.0
This is an automated email from the ASF dual-hosted git repository.

srowen pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new 9b7c9051930 [SPARK-40865][BUILD] Upgrade jodatime to 2.12.0

9b7c9051930 is described below

commit 9b7c90519307eb40b6eaa641d98c894915b1bcdc
Author: yangjie01
AuthorDate: Fri Oct 21 08:52:41 2022 -0500

    [SPARK-40865][BUILD] Upgrade jodatime to 2.12.0

    ### What changes were proposed in this pull request?
    This PR aims to upgrade jodatime to 2.12.0.

    ### Why are the changes needed?
    This version includes:
    - Add translations for ca, el, eu, fi, hi, hu, in, iw, ms, nn, ro, sk, sv, zh.
    - DateTimeZone data updated to version 2022egtz.

    The release notes are as follows:
    - https://www.joda.org/joda-time/changes-report.html#a2.12.0

    ### Does this PR introduce _any_ user-facing change?
    No

    ### How was this patch tested?
    Pass GitHub Actions

    Closes #38329 from LuciferYang/joda-212.

    Authored-by: yangjie01
    Signed-off-by: Sean Owen
---
 dev/deps/spark-deps-hadoop-2-hive-2.3 | 2 +-
 dev/deps/spark-deps-hadoop-3-hive-2.3 | 2 +-
 pom.xml                               | 2 +-
 3 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/dev/deps/spark-deps-hadoop-2-hive-2.3 b/dev/deps/spark-deps-hadoop-2-hive-2.3
index ee9977e2592..b7850d2fe60 100644
--- a/dev/deps/spark-deps-hadoop-2-hive-2.3
+++ b/dev/deps/spark-deps-hadoop-2-hive-2.3
@@ -147,7 +147,7 @@ jetty-util/6.1.26//jetty-util-6.1.26.jar
 jetty-util/9.4.49.v20220914//jetty-util-9.4.49.v20220914.jar
 jetty/6.1.26//jetty-6.1.26.jar
 jline/2.14.6//jline-2.14.6.jar
-joda-time/2.11.2//joda-time-2.11.2.jar
+joda-time/2.12.0//joda-time-2.12.0.jar
 jodd-core/3.5.2//jodd-core-3.5.2.jar
 jpam/1.1//jpam-1.1.jar
 json/1.8//json-1.8.jar
diff --git a/dev/deps/spark-deps-hadoop-3-hive-2.3 b/dev/deps/spark-deps-hadoop-3-hive-2.3
index 766a28503e4..0f497c99ff9 100644
--- a/dev/deps/spark-deps-hadoop-3-hive-2.3
+++ b/dev/deps/spark-deps-hadoop-3-hive-2.3
@@ -132,7 +132,7 @@ jettison/1.1//jettison-1.1.jar
 jetty-util-ajax/9.4.49.v20220914//jetty-util-ajax-9.4.49.v20220914.jar
 jetty-util/9.4.49.v20220914//jetty-util-9.4.49.v20220914.jar
 jline/2.14.6//jline-2.14.6.jar
-joda-time/2.11.2//joda-time-2.11.2.jar
+joda-time/2.12.0//joda-time-2.12.0.jar
 jodd-core/3.5.2//jodd-core-3.5.2.jar
 jpam/1.1//jpam-1.1.jar
 json/1.8//json-1.8.jar
diff --git a/pom.xml b/pom.xml
index 65dfcdb2234..f8f3aa2fd4f 100644
--- a/pom.xml
+++ b/pom.xml
@@ -192,7 +192,7 @@
     14.0.1
     3.1.7
     2.36
-    2.11.2
+    2.12.0
     3.5.2
     3.0.0
     0.12.0
[spark] branch master updated (aea9fb74ca6 -> 26e258c88a5)
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

    from aea9fb74ca6 [MINOR][CORE][SQL][FOLLOWUP] Add missing s prefix to enable string interpolation
     add 26e258c88a5 [SPARK-40854][CONNECT] Use proper JSON encoding until we have Arrow collection

No new revisions were added by this update.

Summary of changes:
 .../src/main/protobuf/spark/connect/base.proto  |  9 ++-
 .../service/SparkConnectStreamHandler.scala     | 76 ++
 python/pyspark/sql/connect/client.py            |  4 +-
 python/pyspark/sql/connect/proto/base_pb2.py    | 36 +-
 python/pyspark/sql/connect/proto/base_pb2.pyi   | 27 
 5 files changed, 104 insertions(+), 48 deletions(-)
[spark] branch master updated: [MINOR][CORE][SQL][FOLLOWUP] Add missing s prefix to enable string interpolation
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new aea9fb74ca6 [MINOR][CORE][SQL][FOLLOWUP] Add missing s prefix to enable string interpolation

aea9fb74ca6 is described below

commit aea9fb74ca6bc91604b32696a5638e5c93933d1f
Author: Enrico Minack
AuthorDate: Fri Oct 21 20:20:04 2022 +0900

    [MINOR][CORE][SQL][FOLLOWUP] Add missing s prefix to enable string interpolation

    ### What changes were proposed in this pull request?
    Adds the missing `s` prefix to enable string interpolation. Complements #38297.

    ### Why are the changes needed?
    Without the prefix, the strings contain the literal variable names instead of the substituted values.

    ### Does this PR introduce _any_ user-facing change?
    Log messages will change.

    ### How was this patch tested?
    Not tested.

    Closes #38307 from EnricoMi/branch-fix-string-interpolation-2.

    Authored-by: Enrico Minack
    Signed-off-by: Hyukjin Kwon
---
 core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala | 2 +-
 .../main/scala/org/apache/spark/scheduler/TaskSchedulerImpl.scala | 2 +-
 .../apache/spark/executor/CoarseGrainedExecutorBackendSuite.scala | 4 ++--
 .../spark/ml/regression/GeneralizedLinearRegressionSuite.scala    | 2 +-
 .../scala/org/apache/spark/mllib/tree/DecisionTreeSuite.scala     | 8 
 5 files changed, 9 insertions(+), 9 deletions(-)

diff --git a/core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala b/core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala
index b6d6441925a..4efce34b18c 100644
--- a/core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala
+++ b/core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala
@@ -1883,7 +1883,7 @@ private[spark] class DAGScheduler(
         if (ignoreStageFailure) {
           logInfo(s"Ignoring fetch failure from $task of $failedStage attempt " +
             s"${task.stageAttemptId} when count spark.stage.maxConsecutiveAttempts " +
-            "as executor ${bmAddress.executorId} is decommissioned and " +
+            s"as executor ${bmAddress.executorId} is decommissioned and " +
             s" ${config.STAGE_IGNORE_DECOMMISSION_FETCH_FAILURE.key}=true")
         } else {
           failedStage.failedAttemptIds.add(task.stageAttemptId)
diff --git a/core/src/main/scala/org/apache/spark/scheduler/TaskSchedulerImpl.scala b/core/src/main/scala/org/apache/spark/scheduler/TaskSchedulerImpl.scala
index 5004262a71c..1eb588124a7 100644
--- a/core/src/main/scala/org/apache/spark/scheduler/TaskSchedulerImpl.scala
+++ b/core/src/main/scala/org/apache/spark/scheduler/TaskSchedulerImpl.scala
@@ -641,7 +641,7 @@ private[spark] class TaskSchedulerImpl(
       if (!unschedulableTaskSetToExpiryTime.contains(taskSet)) {
         logInfo("Notifying ExecutorAllocationManager to allocate more executors to" +
           " schedule the unschedulable task before aborting" +
-          " stage ${taskSet.stageId}.")
+          s" stage ${taskSet.stageId}.")
         dagScheduler.unschedulableTaskSetAdded(taskSet.taskSet.stageId,
           taskSet.taskSet.stageAttemptId)
         updateUnschedulableTaskSetTimeoutAndStartAbortTimer(taskSet, taskIndex)
diff --git a/core/src/test/scala/org/apache/spark/executor/CoarseGrainedExecutorBackendSuite.scala b/core/src/test/scala/org/apache/spark/executor/CoarseGrainedExecutorBackendSuite.scala
index a12b7034a6d..a8b1304b76f 100644
--- a/core/src/test/scala/org/apache/spark/executor/CoarseGrainedExecutorBackendSuite.scala
+++ b/core/src/test/scala/org/apache/spark/executor/CoarseGrainedExecutorBackendSuite.scala
@@ -395,7 +395,7 @@ class CoarseGrainedExecutorBackendSuite extends SparkFunSuite
       // Fake tasks with different taskIds.
       val taskDescriptions = (1 to numTasks).map {
-        taskId => new TaskDescription(taskId, 2, "1", "TASK ${taskId}", 19,
+        taskId => new TaskDescription(taskId, 2, "1", s"TASK $taskId", 19,
           1, mutable.Map.empty, mutable.Map.empty, mutable.Map.empty, new Properties, 1,
           Map(GPU -> new ResourceInformation(GPU, Array("0", "1"))), data)
       }
@@ -483,7 +483,7 @@ class CoarseGrainedExecutorBackendSuite extends SparkFunSuite
       // Fake tasks with different taskIds.
       val taskDescriptions = (1 to numTasks).map {
-        taskId => new TaskDescription(taskId, 2, "1", "TASK ${taskId}", 19,
+        taskId => new TaskDescription(taskId, 2, "1", s"TASK $taskId", 19,
           1, mutable.Map.empty, mutable.Map.empty, mutable.Map.empty, new Properties, 1,
           Map(GPU -> new ResourceInformation(GPU, Array("0", "1"))), data)
       }
diff --git
[spark-docker] branch master updated: [SPARK-40864] Remove pip/setuptools dynamic upgrade
This is an automated email from the ASF dual-hosted git repository.

yikun pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark-docker.git

The following commit(s) were added to refs/heads/master by this push:
     new 52e5856 [SPARK-40864] Remove pip/setuptools dynamic upgrade

52e5856 is described below

commit 52e5856d81e70a9d9e87292c6caf42587ce433df
Author: Yikun Jiang
AuthorDate: Fri Oct 21 17:02:54 2022 +0800

    [SPARK-40864] Remove pip/setuptools dynamic upgrade

    ### What changes were proposed in this pull request?
    Remove the pip/setuptools dynamic upgrade from the Dockerfiles.

    ### Why are the changes needed?
    According to the [official image suggestion](https://github.com/docker-library/official-images#repeatability), `Rebuilding the same Dockerfile should result in the same version of the image being packaged`. But we were upgrading pip/setuptools to the latest version, and there is no reason we actually need the latest pip/setuptools. I also looked at the [initial commit](https://github.com/apache-spark-on-k8s/spark/commit/befcf0a30651d0335bb57c242a824e43748db33f) for this line; according to the merge history, there is no longer a reason to keep it.

    ### Does this PR introduce _any_ user-facing change?
    The OS-recommended pip/setuptools version is used.

    ### How was this patch tested?
    CI passed.

    Closes #17 from Yikun/remove-pip.

    Authored-by: Yikun Jiang
    Signed-off-by: Yikun Jiang
---
 3.3.0/scala2.12-java11-python3-r-ubuntu/Dockerfile | 1 -
 3.3.0/scala2.12-java11-python3-ubuntu/Dockerfile   | 1 -
 Dockerfile.template                                | 1 -
 3 files changed, 3 deletions(-)

diff --git a/3.3.0/scala2.12-java11-python3-r-ubuntu/Dockerfile b/3.3.0/scala2.12-java11-python3-r-ubuntu/Dockerfile
index ac16bdd..8c2761e 100644
--- a/3.3.0/scala2.12-java11-python3-r-ubuntu/Dockerfile
+++ b/3.3.0/scala2.12-java11-python3-r-ubuntu/Dockerfile
@@ -26,7 +26,6 @@ RUN set -ex && \
     ln -s /lib /lib64 && \
     apt install -y gnupg2 wget bash tini libc6 libpam-modules krb5-user libnss3 procps net-tools gosu && \
     apt install -y python3 python3-pip && \
-    pip3 install --upgrade pip setuptools && \
    apt install -y r-base r-base-dev && \
     mkdir -p /opt/spark && \
     mkdir /opt/spark/python && \
diff --git a/3.3.0/scala2.12-java11-python3-ubuntu/Dockerfile b/3.3.0/scala2.12-java11-python3-ubuntu/Dockerfile
index c6e433d..6a0017a 100644
--- a/3.3.0/scala2.12-java11-python3-ubuntu/Dockerfile
+++ b/3.3.0/scala2.12-java11-python3-ubuntu/Dockerfile
@@ -26,7 +26,6 @@ RUN set -ex && \
     ln -s /lib /lib64 && \
     apt install -y gnupg2 wget bash tini libc6 libpam-modules krb5-user libnss3 procps net-tools gosu && \
     apt install -y python3 python3-pip && \
-    pip3 install --upgrade pip setuptools && \
     mkdir -p /opt/spark && \
     mkdir /opt/spark/python && \
     mkdir -p /opt/spark/examples && \
diff --git a/Dockerfile.template b/Dockerfile.template
index 2b90fe5..a220247 100644
--- a/Dockerfile.template
+++ b/Dockerfile.template
@@ -27,7 +27,6 @@ RUN set -ex && \
     apt install -y gnupg2 wget bash tini libc6 libpam-modules krb5-user libnss3 procps net-tools gosu && \
     {%- if HAVE_PY %}
     apt install -y python3 python3-pip && \
-    pip3 install --upgrade pip setuptools && \
     {%- endif %}
     {%- if HAVE_R %}
     apt install -y r-base r-base-dev && \
[spark] branch master updated: [SPARK-40799][BUILD][CONNECT][FOLLOW-UP] Keep the console output consistent of lint-scala script as was
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new 140c99c69dd [SPARK-40799][BUILD][CONNECT][FOLLOW-UP] Keep the console output consistent of lint-scala script as was

140c99c69dd is described below

commit 140c99c69dd2516726552019afb945f9b06a8c1b
Author: Hyukjin Kwon
AuthorDate: Fri Oct 21 17:38:37 2022 +0900

    [SPARK-40799][BUILD][CONNECT][FOLLOW-UP] Keep the console output consistent of lint-scala script as was

    ### What changes were proposed in this pull request?
    This PR proposes to keep `dev/lint-scala` quiet as it was.

    ### Why are the changes needed?
    To remove noisy output from the `dev/lint-scala` script.

    **Before**

    Success
    ```
    Scalastyle checks passed.
    Using `mvn` from path: /.../spark/build/apache-maven-3.8.6/bin/mvn
    [INFO] Scanning for projects...
    [INFO]
    [INFO] Detecting the operating system and CPU architecture
    [INFO]
    [INFO] os.detected.name: osx
    [INFO] os.detected.arch: x86_64
    [INFO] os.detected.version: 10.16
    [INFO] os.detected.version.major: 10
    [INFO] os.detected.version.minor: 16
    [INFO] os.detected.classifier: osx-x86_64
    [INFO]
    [INFO] < org.apache.spark:spark-connect_2.12 >-
    [INFO] Building Spark Project Connect 3.4.0-SNAPSHOT
    [INFO] [ jar ]-
    [INFO]
    [INFO] --- mvn-scalafmt_2.12:1.1.1640084764.9f463a9:format (default-cli) spark-connect_2.12 ---
    [INFO] parsed config (v3.5.9): dev/.scalafmt.conf
    [INFO] Scalafmt results: 0 of 11 were unformatted
    Details:
    Formatted: Connect.scala
    Formatted: DataTypeProtoConverter.scala
    Formatted: SparkConnectPlanner.scala
    Formatted: SparkConnectPlugin.scala
    Formatted: SparkConnectCommandPlanner.scala
    Formatted: SparkConnectStreamHandler.scala
    Formatted: SparkConnectService.scala
    Formatted: package.scala
    Formatted: SparkConnectProtoSuite.scala
    Formatted: SparkConnectPlannerSuite.scala
    Formatted: SparkConnectCommandPlannerSuite.scala
    [INFO]
    [INFO] BUILD SUCCESS
    [INFO]
    [INFO] Total time: 5.257 s
    [INFO] Finished at: 2022-10-21T11:18:19+09:00
    [INFO]
    ```

    Failure
    ```
    Scalastyle checks passed.
    Using `mvn` from path: /Users/hyukjin.kwon/workspace/forked/spark/build/apache-maven-3.8.6/bin/mvn
    [INFO] Scanning for projects...
    [INFO]
    [INFO] Detecting the operating system and CPU architecture
    [INFO]
    [INFO] os.detected.name: osx
    [INFO] os.detected.arch: x86_64
    [INFO] os.detected.version: 10.16
    [INFO] os.detected.version.major: 10
    [INFO] os.detected.version.minor: 16
    [INFO] os.detected.classifier: osx-x86_64
    [INFO]
    [INFO] < org.apache.spark:spark-connect_2.12 >-
    [INFO] Building Spark Project Connect 3.4.0-SNAPSHOT
    [INFO] [ jar ]-
    [INFO]
    [INFO] --- mvn-scalafmt_2.12:1.1.1640084764.9f463a9:format (default-cli) spark-connect_2.12 ---
    [INFO] parsed config (v3.5.9): dev/.scalafmt.conf
    [INFO] Scalafmt results: 0 of 11 were unformatted
    Details:
    Formatted: Connect.scala
    Formatted: DataTypeProtoConverter.scala
    Formatted: SparkConnectPlanner.scala
    Formatted: SparkConnectPlugin.scala
    Formatted: SparkConnectCommandPlanner.scala
    Formatted: SparkConnectStreamHandler.scala
    Formatted: SparkConnectService.scala
    Formatted: package.scala
    Formatted: SparkConnectProtoSuite.scala
    Formatted: SparkConnectPlannerSuite.scala
    Formatted: SparkConnectCommandPlannerSuite.scala
    [INFO]
    [INFO] BUILD SUCCESS
    [INFO]
    [INFO] Total time: 5.257 s
    [INFO] Finished at: 2022-10-21T11:18:19+09:00
    [INFO]

    (python3.9) ➜  spark git:(master) ./dev/lint-scala
    Scalastyle checks passed.
    Using `mvn` from path: /Users/hyukjin.kwon/workspace/forked/spark/build/apache-maven-3.8.6/bin/mvn
    [INFO]
    ```
[spark] branch master updated: [SPARK-40839][CONNECT][PYTHON] Implement `DataFrame.sample`
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new 7934f00d124 [SPARK-40839][CONNECT][PYTHON] Implement `DataFrame.sample`

7934f00d124 is described below

commit 7934f00d1241431dd59207650693aaad1a319a70
Author: Ruifeng Zheng
AuthorDate: Fri Oct 21 17:18:34 2022 +0900

    [SPARK-40839][CONNECT][PYTHON] Implement `DataFrame.sample`

    ### What changes were proposed in this pull request?
    Implement `DataFrame.sample` in Connect.

    ### Why are the changes needed?
    For DataFrame API coverage.

    ### Does this PR introduce _any_ user-facing change?
    Yes, new API:
    ```
    def sample(
        self,
        fraction: float,
        *,
        withReplacement: bool = False,
        seed: Optional[int] = None,
    ) -> "DataFrame":
    ```

    ### How was this patch tested?
    Added UT.

    Closes #38310 from zhengruifeng/connect_df_sample.

    Authored-by: Ruifeng Zheng
    Signed-off-by: Hyukjin Kwon
---
 .../main/protobuf/spark/connect/relations.proto     |  6 ++-
 .../org/apache/spark/sql/connect/dsl/package.scala  |  3 +-
 .../sql/connect/planner/SparkConnectPlanner.scala   |  5 ++-
 python/pyspark/sql/connect/dataframe.py             | 27 
 python/pyspark/sql/connect/plan.py                  | 50 ++
 python/pyspark/sql/connect/proto/relations_pb2.py   |  6 ++-
 python/pyspark/sql/connect/proto/relations_pb2.pyi  | 19 ++--
 .../sql/tests/connect/test_connect_plan_only.py     | 18 
 8 files changed, 125 insertions(+), 9 deletions(-)

diff --git a/connector/connect/src/main/protobuf/spark/connect/relations.proto b/connector/connect/src/main/protobuf/spark/connect/relations.proto
index 6adf0831ea2..7dbde775ee8 100644
--- a/connector/connect/src/main/protobuf/spark/connect/relations.proto
+++ b/connector/connect/src/main/protobuf/spark/connect/relations.proto
@@ -201,5 +201,9 @@ message Sample {
   double lower_bound = 2;
   double upper_bound = 3;
   bool with_replacement = 4;
-  int64 seed = 5;
+  Seed seed = 5;
+
+  message Seed {
+    int64 seed = 1;
+  }
 }
diff --git a/connector/connect/src/main/scala/org/apache/spark/sql/connect/dsl/package.scala b/connector/connect/src/main/scala/org/apache/spark/sql/connect/dsl/package.scala
index 68bbc0487f9..4630c86049c 100644
--- a/connector/connect/src/main/scala/org/apache/spark/sql/connect/dsl/package.scala
+++ b/connector/connect/src/main/scala/org/apache/spark/sql/connect/dsl/package.scala
@@ -272,7 +272,8 @@ package object dsl {
           .setUpperBound(upperBound)
           .setLowerBound(lowerBound)
           .setWithReplacement(withReplacement)
-          .setSeed(seed))
+          .setSeed(proto.Sample.Seed.newBuilder().setSeed(seed).build())
+          .build())
         .build()
     }
diff --git a/connector/connect/src/main/scala/org/apache/spark/sql/connect/planner/SparkConnectPlanner.scala b/connector/connect/src/main/scala/org/apache/spark/sql/connect/planner/SparkConnectPlanner.scala
index 92c8bf01cba..880618cc333 100644
--- a/connector/connect/src/main/scala/org/apache/spark/sql/connect/planner/SparkConnectPlanner.scala
+++ b/connector/connect/src/main/scala/org/apache/spark/sql/connect/planner/SparkConnectPlanner.scala
@@ -31,6 +31,7 @@ import org.apache.spark.sql.catalyst.plans.logical.{Deduplicate, LogicalPlan, Sa
 import org.apache.spark.sql.catalyst.util.CaseInsensitiveMap
 import org.apache.spark.sql.execution.QueryExecution
 import org.apache.spark.sql.types._
+import org.apache.spark.util.Utils

 final case class InvalidPlanInput(
     private val message: String = "",
@@ -80,7 +81,7 @@ class SparkConnectPlanner(plan: proto.Relation, session: SparkSession) {
   /**
    * All fields of [[proto.Sample]] are optional. However, given those are proto primitive types,
-   * we cannot differentiate if the fied is not or set when the field's value equals to the type
+   * we cannot differentiate if the field is not or set when the field's value equals to the type
    * default value. In the future if this ever become a problem, one solution could be that to
    * wrap such fields into proto messages.
    */
@@ -89,7 +90,7 @@ class SparkConnectPlanner(plan: proto.Relation, session: SparkSession) {
       rel.getLowerBound,
       rel.getUpperBound,
       rel.getWithReplacement,
-      rel.getSeed,
+      if (rel.hasSeed) rel.getSeed.getSeed else Utils.random.nextLong,
       transformRelation(rel.getInput))
   }
diff --git a/python/pyspark/sql/connect/dataframe.py b/python/pyspark/sql/connect/dataframe.py
index 5ca747fdd6a..eabcf433ae9 100644
--- a/python/pyspark/sql/connect/dataframe.py
+++ b/python/pyspark/sql/connect/dataframe.py
@@ -206,6 +206,33 @@ class DataFrame(object):
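To make the new API concrete, here is a small usage sketch. It uses a regular PySpark session (the commit adds the same call shape to the Connect client); the fraction and seed values are illustrative, and the note about the server-side fallback comes from the planner change above.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
df = spark.range(100)

# Same call shape as the Connect signature quoted in the commit message:
# sample(fraction, *, withReplacement=False, seed=None)
sampled = df.sample(fraction=0.1, withReplacement=False, seed=42)
print(sampled.count())

# Omitting the seed is also valid; in Connect, the planner then falls back
# to a random seed on the server side (Utils.random.nextLong above).
sampled_no_seed = df.sample(fraction=0.5)
```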
[spark-docker] branch master updated: [SPARK-40866][INFRA] Rename Spark repository as Spark Docker repository in GA
This is an automated email from the ASF dual-hosted git repository.

yikun pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark-docker.git

The following commit(s) were added to refs/heads/master by this push:
     new 6f56ef1 [SPARK-40866][INFRA] Rename Spark repository as Spark Docker repository in GA

6f56ef1 is described below

commit 6f56ef1c8c8bccd05069d4590f7ae084d4c72b4d
Author: Qian.Sun
AuthorDate: Fri Oct 21 16:02:50 2022 +0800

    [SPARK-40866][INFRA] Rename Spark repository as Spark Docker repository in GA

    ### What changes were proposed in this pull request?
    This PR aims to rename `Spark repository` to `Spark Docker repository` in GA, as discussed in https://github.com/apache/spark-docker/pull/15#discussion_r1001440707.

    ### Why are the changes needed?
    The repository being checked out is actually apache/spark-docker.

    ### Does this PR introduce _any_ user-facing change?
    No

    ### How was this patch tested?
    Pass the GA.

    Closes #18 from dcoliversun/SPARK-40866.

    Authored-by: Qian.Sun
    Signed-off-by: Yikun Jiang
---
 .github/workflows/main.yml | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/.github/workflows/main.yml b/.github/workflows/main.yml
index b47245b..08bba68 100644
--- a/.github/workflows/main.yml
+++ b/.github/workflows/main.yml
@@ -60,7 +60,7 @@ jobs:
           - ${{ inputs.java }}
         image_suffix: [python3-ubuntu, ubuntu, r-ubuntu, python3-r-ubuntu]
     steps:
-      - name: Checkout Spark repository
+      - name: Checkout Spark Docker repository
         uses: actions/checkout@v2

       - name: Set up QEMU
[spark] branch master updated: [SPARK-40657] Add support for Java classes in Protobuf functions
This is an automated email from the ASF dual-hosted git repository.

kabhwan pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new fd9e5760bae [SPARK-40657] Add support for Java classes in Protobuf functions

fd9e5760bae is described below

commit fd9e5760bae847f47c9c108f0e58814748e0d9b1
Author: Raghu Angadi
AuthorDate: Fri Oct 21 15:46:50 2022 +0900

    [SPARK-40657] Add support for Java classes in Protobuf functions

    ### What changes were proposed in this pull request?
    Adds support for compiled Java classes to the Protobuf functions. This is tested with Protobuf v3 classes; v2 vs. v3 issues will be handled in a separate PR.

    The main changes in this PR:
    - Changes to the top-level API:
      - Adds a new version that takes just the class name.
      - Changes the order of arguments for the existing API with descriptor files (`messageName` and `descFilePath` are swapped).
    - Protobuf utils methods to create a descriptor from a Java class name.
    - Many unit tests are updated to check both versions: (1) with a descriptor file and (2) with a Java class name.
    - Maven build updates to generate Java classes for use in tests.
    - Miscellaneous changes:
      - Adds `proto` to the package name in the `proto` files used in tests.
      - A few TODO comments about improvements.

    ### Why are the changes needed?
    Compiled Java classes are a common way for users to provide Protobuf definitions.

    ### Does this PR introduce _any_ user-facing change?
    No. This updates the interface, but only for a new feature in active development.

    ### How was this patch tested?
    - Unit tests

    Closes #38286 from rangadi/protobuf-java.

    Authored-by: Raghu Angadi
    Signed-off-by: Jungtaek Lim
---
 connector/protobuf/pom.xml                          |  23 +-
 .../sql/protobuf/CatalystDataToProtobuf.scala       |  10 +-
 .../sql/protobuf/ProtobufDataToCatalyst.scala       |  34 ++-
 .../org/apache/spark/sql/protobuf/functions.scala   |  58 +++-
 .../spark/sql/protobuf/utils/ProtobufUtils.scala    |  65 -
 .../sql/protobuf/utils/SchemaConverters.scala       |   4 +
 .../test/resources/protobuf/catalyst_types.proto    |   4 +-
 .../test/resources/protobuf/functions_suite.proto   |   4 +-
 .../src/test/resources/protobuf/serde_suite.proto   |   6 +-
 .../ProtobufCatalystDataConversionSuite.scala       |  97 +--
 .../sql/protobuf/ProtobufFunctionsSuite.scala       | 318 +
 .../spark/sql/protobuf/ProtobufSerdeSuite.scala     |   9 +-
 project/SparkBuild.scala                            |   6 +-
 python/pyspark/sql/protobuf/functions.py            |  22 +-
 14 files changed, 437 insertions(+), 223 deletions(-)

diff --git a/connector/protobuf/pom.xml b/connector/protobuf/pom.xml
index 0515f128b8d..b934c7f831a 100644
--- a/connector/protobuf/pom.xml
+++ b/connector/protobuf/pom.xml
@@ -83,7 +83,6 @@
       ${protobuf.version}
       compile
-
     target/scala-${scala.binary.version}/classes
@@ -110,6 +109,28 @@
+
+      com.github.os72
+      protoc-jar-maven-plugin
+      3.11.4
+
+
+          generate-test-sources
+
+            run
+
+
+            com.google.protobuf:protoc:${protobuf.version}
+            ${protobuf.version}
+
+          src/test/resources/protobuf
+
+            test
+
+
+
+
diff --git a/connector/protobuf/src/main/scala/org/apache/spark/sql/protobuf/CatalystDataToProtobuf.scala b/connector/protobuf/src/main/scala/org/apache/spark/sql/protobuf/CatalystDataToProtobuf.scala
index 145100268c2..b9f7907ea8c 100644
--- a/connector/protobuf/src/main/scala/org/apache/spark/sql/protobuf/CatalystDataToProtobuf.scala
+++ b/connector/protobuf/src/main/scala/org/apache/spark/sql/protobuf/CatalystDataToProtobuf.scala
@@ -25,17 +25,17 @@ import org.apache.spark.sql.types.{BinaryType, DataType}

 private[protobuf] case class CatalystDataToProtobuf(
     child: Expression,
-    descFilePath: String,
-    messageName: String)
+    messageName: String,
+    descFilePath: Option[String] = None)
   extends UnaryExpression {

   override def dataType: DataType = BinaryType

-  @transient private lazy val protoType =
-    ProtobufUtils.buildDescriptor(descFilePath, messageName)
+  @transient private lazy val protoDescriptor =
+    ProtobufUtils.buildDescriptor(messageName, descFilePathOpt = descFilePath)

   @transient private lazy val serializer =
-    new ProtobufSerializer(child.dataType, protoType, child.nullable)
+    new ProtobufSerializer(child.dataType, protoDescriptor, child.nullable)

   override def nullSafeEval(input: Any): Any = {
     val dynamicMessage =
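To illustrate the API change, here is a small, hypothetical usage sketch of the Python `from_protobuf`/`to_protobuf` helpers (the commit also updates `python/pyspark/sql/protobuf/functions.py`). It assumes `df` is a DataFrame with a binary column named `payload`; the descriptor path, message name, and Java class name are placeholders, and the exact argument order (message name before the optional descriptor path) is inferred from the commit description rather than taken from documentation.

```python
from pyspark.sql.protobuf.functions import from_protobuf, to_protobuf

# Variant 1 (assumption): message name plus a descriptor file; the message
# name now comes before the descriptor path after this change.
parsed = df.select(
    from_protobuf("payload", "SimpleMessage", "/tmp/functions_suite.desc").alias("event")
)

# Variant 2 (assumption): only the fully qualified name of a compiled
# Protobuf v3 Java class available on the classpath.
parsed_from_class = df.select(
    from_protobuf("payload", "org.example.protos.SimpleMessage").alias("event")
)

# Serializing back to binary protobuf follows the same argument order.
binary = parsed.select(
    to_protobuf("event", "SimpleMessage", "/tmp/functions_suite.desc").alias("payload")
)
```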