[spark] branch master updated (8a96f69bb53 -> f0950fea814)

2022-10-21 Thread ruifengz
This is an automated email from the ASF dual-hosted git repository.

ruifengz pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


from 8a96f69bb53 [SPARK-40874][PYTHON] Fix broadcasts in Python UDFs when 
encryption enabled
 add f0950fea814 [SPARK-40878][INFRA] pin 'grpcio==1.48.1' 
'protobuf==4.21.6'

No new revisions were added by this update.

Summary of changes:
 .github/workflows/build_and_test.yml | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated (eac40927e7f -> 8a96f69bb53)

2022-10-21 Thread gurwls223
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


from eac40927e7f [SPARK-40871][INFRA] Upgrade actions/github-script to v6 
and fix notify workflow
 add 8a96f69bb53 [SPARK-40874][PYTHON] Fix broadcasts in Python UDFs when 
encryption enabled

No new revisions were added by this update.

Summary of changes:
 .../scala/org/apache/spark/api/python/PythonRunner.scala   |  2 +-
 python/pyspark/tests/test_broadcast.py | 14 ++
 2 files changed, 15 insertions(+), 1 deletion(-)
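For context, SPARK-40874 touches the path where an executor reads a broadcast value back through Spark's encrypted serializer. As a toy illustration of that serialize, encrypt, decrypt, deserialize round trip (using a stand-in XOR "cipher" for clarity; this is not Spark's actual I/O encryption and the names are illustrative):

```python
import pickle


def toy_encrypt(data: bytes, key: int = 0x5A) -> bytes:
    # Stand-in for an encryption layer; XOR is symmetric, so the
    # same function decrypts. NOT real cryptography.
    return bytes(b ^ key for b in data)


# Driver side: serialize the broadcast value and encrypt the bytes.
value = {"lookup": [1, 2, 3]}
wire = toy_encrypt(pickle.dumps(value))

# Worker side: decrypt and deserialize before the UDF uses it.
restored = pickle.loads(toy_encrypt(wire))
assert restored == value
```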


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated: [SPARK-40871][INFRA] Upgrade actions/github-script to v6 and fix notify workflow

2022-10-21 Thread gurwls223
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new eac40927e7f [SPARK-40871][INFRA] Upgrade actions/github-script to v6 
and fix notify workflow
eac40927e7f is described below

commit eac40927e7f0e63d254bc4ad1f790b184cd45887
Author: Yikun Jiang 
AuthorDate: Sat Oct 22 10:37:41 2022 +0900

[SPARK-40871][INFRA] Upgrade actions/github-script to v6 and fix notify 
workflow

### What changes were proposed in this pull request?
Upgrade actions/github-script from v3 to v6 and fix the notify workflow

### Why are the changes needed?
Node.js 12 actions are deprecated. For more information see: 
https://github.blog/changelog/2022-09-22-github-actions-all-actions-will-begin-running-on-node16-instead-of-node12/.

- Since github-script v5, API calls change from `github.*` to `github.rest.*` 
(`request` and `paginate` are unchanged).
  see also https://github.com/actions/github-script#breaking-changes-in-v5
- Since github-script v6, the runtime is upgraded from Node.js 12 to Node.js 16.

### Does this PR introduce _any_ user-facing change?
No, dev only

### How was this patch tested?
- Because the workflow uses `pull_request_target`, the change in this PR does 
not take effect for the PR itself, so it can only be tested on a fork: set the 
fork's default branch to the v6 change and submit a test PR 
(https://github.com/Yikun/spark/pull/181).

The notify workflow works as expected:
https://user-images.githubusercontent.com/1736354/197310102-6c709716-8a99-422d-8d38-3f770b6925f0.png

The update status is set to failed as expected:
https://user-images.githubusercontent.com/1736354/197310119-30332769-0553-4ffa-816c-97a5ec0b3c27.png

And the `See test results` link is set correctly.
https://github.com/Yikun/spark/pull/181/checks?check_run_id=9029035780

Closes #38341 from Yikun/upgrade-actions.

Authored-by: Yikun Jiang 
Signed-off-by: Hyukjin Kwon 
---
 .github/workflows/notify_test_workflow.yml | 6 +++---
 .github/workflows/update_build_status.yml  | 2 +-
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/.github/workflows/notify_test_workflow.yml 
b/.github/workflows/notify_test_workflow.yml
index c9b5c54f362..6fb776d7083 100644
--- a/.github/workflows/notify_test_workflow.yml
+++ b/.github/workflows/notify_test_workflow.yml
@@ -36,7 +36,7 @@ jobs:
   checks: write
 steps:
   - name: "Notify test workflow"
-uses: actions/github-script@f05a81df23035049204b043b50c3322045ce7eb3 # 
pin@v3
+uses: actions/github-script@v6
 with:
   github-token: ${{ secrets.GITHUB_TOKEN }}
   script: |
@@ -80,7 +80,7 @@ jobs:
   status = 'completed'
   const conclusion = 'action_required'
 
-  github.checks.create({
+  github.rest.checks.create({
 owner: context.repo.owner,
 repo: context.repo.repo,
 name: name,
@@ -132,7 +132,7 @@ jobs:
 + '/actions/runs/'
 + run_id
 
-  github.checks.create({
+  github.rest.checks.create({
 owner: context.repo.owner,
 repo: context.repo.repo,
 name: name,
diff --git a/.github/workflows/update_build_status.yml 
b/.github/workflows/update_build_status.yml
index 7f3826817df..05cf4914a25 100644
--- a/.github/workflows/update_build_status.yml
+++ b/.github/workflows/update_build_status.yml
@@ -32,7 +32,7 @@ jobs:
   checks: write
 steps:
   - name: "Update build status"
-uses: actions/github-script@f05a81df23035049204b043b50c3322045ce7eb3 # 
pin@v3
+uses: actions/github-script@v6
 with:
   github-token: ${{ secrets.GITHUB_TOKEN }}
   script: |


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated (98f9edabb45 -> 6545a0873df)

2022-10-21 Thread ruifengz
This is an automated email from the ASF dual-hosted git repository.

ruifengz pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


from 98f9edabb45 [SPARK-40796][CONNECT][FOLLOW-UP] Improve README for proto 
generated files in Connect Python client
 add 6545a0873df [SPARK-40796][CONNECT][DOC][FOLLOW-UP] Add check command 
in Readme

No new revisions were added by this update.

Summary of changes:
 python/pyspark/sql/connect/README.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated (748fa2792e4 -> 98f9edabb45)

2022-10-21 Thread ruifengz
This is an automated email from the ASF dual-hosted git repository.

ruifengz pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


from 748fa2792e4 [SPARK-40863][BUILD] Upgrade dropwizard metrics 4.2.12
 add 98f9edabb45 [SPARK-40796][CONNECT][FOLLOW-UP] Improve README for proto 
generated files in Connect Python client

No new revisions were added by this update.

Summary of changes:
 dev/check-codegen-python.py  | 4 +++-
 python/pyspark/sql/connect/README.md | 5 +
 2 files changed, 8 insertions(+), 1 deletion(-)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated: [SPARK-40863][BUILD] Upgrade dropwizard metrics 4.2.12

2022-10-21 Thread srowen
This is an automated email from the ASF dual-hosted git repository.

srowen pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 748fa2792e4 [SPARK-40863][BUILD] Upgrade dropwizard metrics 4.2.12
748fa2792e4 is described below

commit 748fa2792e488a6b923b32e2898d9bb6e16fb4ca
Author: yangjie01 
AuthorDate: Fri Oct 21 08:53:29 2022 -0500

[SPARK-40863][BUILD] Upgrade dropwizard metrics 4.2.12

### What changes were proposed in this pull request?
This PR upgrades dropwizard metrics from 4.2.10 to 4.2.12.

### Why are the changes needed?
The release notes are as follows:

- https://github.com/dropwizard/metrics/releases/tag/v4.2.11
- https://github.com/dropwizard/metrics/releases/tag/v4.2.12

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Pass Github Actions

Closes #38328 from LuciferYang/metrics-4212.

Authored-by: yangjie01 
Signed-off-by: Sean Owen 
---
 dev/deps/spark-deps-hadoop-2-hive-2.3 | 10 +-
 dev/deps/spark-deps-hadoop-3-hive-2.3 | 10 +-
 pom.xml   |  2 +-
 3 files changed, 11 insertions(+), 11 deletions(-)

diff --git a/dev/deps/spark-deps-hadoop-2-hive-2.3 
b/dev/deps/spark-deps-hadoop-2-hive-2.3
index b7850d2fe60..1d1061aaadb 100644
--- a/dev/deps/spark-deps-hadoop-2-hive-2.3
+++ b/dev/deps/spark-deps-hadoop-2-hive-2.3
@@ -195,11 +195,11 @@ log4j-slf4j2-impl/2.19.0//log4j-slf4j2-impl-2.19.0.jar
 logging-interceptor/3.12.12//logging-interceptor-3.12.12.jar
 lz4-java/1.8.0//lz4-java-1.8.0.jar
 mesos/1.4.3/shaded-protobuf/mesos-1.4.3-shaded-protobuf.jar
-metrics-core/4.2.10//metrics-core-4.2.10.jar
-metrics-graphite/4.2.10//metrics-graphite-4.2.10.jar
-metrics-jmx/4.2.10//metrics-jmx-4.2.10.jar
-metrics-json/4.2.10//metrics-json-4.2.10.jar
-metrics-jvm/4.2.10//metrics-jvm-4.2.10.jar
+metrics-core/4.2.12//metrics-core-4.2.12.jar
+metrics-graphite/4.2.12//metrics-graphite-4.2.12.jar
+metrics-jmx/4.2.12//metrics-jmx-4.2.12.jar
+metrics-json/4.2.12//metrics-json-4.2.12.jar
+metrics-jvm/4.2.12//metrics-jvm-4.2.12.jar
 minlog/1.3.0//minlog-1.3.0.jar
 netty-all/4.1.84.Final//netty-all-4.1.84.Final.jar
 netty-buffer/4.1.84.Final//netty-buffer-4.1.84.Final.jar
diff --git a/dev/deps/spark-deps-hadoop-3-hive-2.3 
b/dev/deps/spark-deps-hadoop-3-hive-2.3
index 0f497c99ff9..39a0e617058 100644
--- a/dev/deps/spark-deps-hadoop-3-hive-2.3
+++ b/dev/deps/spark-deps-hadoop-3-hive-2.3
@@ -179,11 +179,11 @@ log4j-slf4j2-impl/2.19.0//log4j-slf4j2-impl-2.19.0.jar
 logging-interceptor/3.12.12//logging-interceptor-3.12.12.jar
 lz4-java/1.8.0//lz4-java-1.8.0.jar
 mesos/1.4.3/shaded-protobuf/mesos-1.4.3-shaded-protobuf.jar
-metrics-core/4.2.10//metrics-core-4.2.10.jar
-metrics-graphite/4.2.10//metrics-graphite-4.2.10.jar
-metrics-jmx/4.2.10//metrics-jmx-4.2.10.jar
-metrics-json/4.2.10//metrics-json-4.2.10.jar
-metrics-jvm/4.2.10//metrics-jvm-4.2.10.jar
+metrics-core/4.2.12//metrics-core-4.2.12.jar
+metrics-graphite/4.2.12//metrics-graphite-4.2.12.jar
+metrics-jmx/4.2.12//metrics-jmx-4.2.12.jar
+metrics-json/4.2.12//metrics-json-4.2.12.jar
+metrics-jvm/4.2.12//metrics-jvm-4.2.12.jar
 minlog/1.3.0//minlog-1.3.0.jar
 netty-all/4.1.84.Final//netty-all-4.1.84.Final.jar
 netty-buffer/4.1.84.Final//netty-buffer-4.1.84.Final.jar
diff --git a/pom.xml b/pom.xml
index f8f3aa2fd4f..d933c1c6f6d 100644
--- a/pom.xml
+++ b/pom.xml
@@ -145,7 +145,7 @@
 If you changes codahale.metrics.version, you also need to change
 the link to metrics.dropwizard.io in docs/monitoring.md.
 -->
-4.2.10
+4.2.12
 
 1.11.1
 1.12.0


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated: [SPARK-40865][BUILD] Upgrade jodatime to 2.12.0

2022-10-21 Thread srowen
This is an automated email from the ASF dual-hosted git repository.

srowen pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 9b7c9051930 [SPARK-40865][BUILD] Upgrade jodatime to 2.12.0
9b7c9051930 is described below

commit 9b7c90519307eb40b6eaa641d98c894915b1bcdc
Author: yangjie01 
AuthorDate: Fri Oct 21 08:52:41 2022 -0500

[SPARK-40865][BUILD] Upgrade jodatime to 2.12.0

### What changes were proposed in this pull request?
This PR upgrades joda-time to 2.12.0.

### Why are the changes needed?
This version includes:

- Add translations for ca, el, eu, fi, hi, hu, in, iw, ms, nn, ro, sk, sv, 
zh.
- DateTimeZone data updated to version 2022egtz.

The release notes are as follows:

- https://www.joda.org/joda-time/changes-report.html#a2.12.0

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Pass GitHub Actions

Closes #38329 from LuciferYang/joda-212.

Authored-by: yangjie01 
Signed-off-by: Sean Owen 
---
 dev/deps/spark-deps-hadoop-2-hive-2.3 | 2 +-
 dev/deps/spark-deps-hadoop-3-hive-2.3 | 2 +-
 pom.xml   | 2 +-
 3 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/dev/deps/spark-deps-hadoop-2-hive-2.3 
b/dev/deps/spark-deps-hadoop-2-hive-2.3
index ee9977e2592..b7850d2fe60 100644
--- a/dev/deps/spark-deps-hadoop-2-hive-2.3
+++ b/dev/deps/spark-deps-hadoop-2-hive-2.3
@@ -147,7 +147,7 @@ jetty-util/6.1.26//jetty-util-6.1.26.jar
 jetty-util/9.4.49.v20220914//jetty-util-9.4.49.v20220914.jar
 jetty/6.1.26//jetty-6.1.26.jar
 jline/2.14.6//jline-2.14.6.jar
-joda-time/2.11.2//joda-time-2.11.2.jar
+joda-time/2.12.0//joda-time-2.12.0.jar
 jodd-core/3.5.2//jodd-core-3.5.2.jar
 jpam/1.1//jpam-1.1.jar
 json/1.8//json-1.8.jar
diff --git a/dev/deps/spark-deps-hadoop-3-hive-2.3 
b/dev/deps/spark-deps-hadoop-3-hive-2.3
index 766a28503e4..0f497c99ff9 100644
--- a/dev/deps/spark-deps-hadoop-3-hive-2.3
+++ b/dev/deps/spark-deps-hadoop-3-hive-2.3
@@ -132,7 +132,7 @@ jettison/1.1//jettison-1.1.jar
 jetty-util-ajax/9.4.49.v20220914//jetty-util-ajax-9.4.49.v20220914.jar
 jetty-util/9.4.49.v20220914//jetty-util-9.4.49.v20220914.jar
 jline/2.14.6//jline-2.14.6.jar
-joda-time/2.11.2//joda-time-2.11.2.jar
+joda-time/2.12.0//joda-time-2.12.0.jar
 jodd-core/3.5.2//jodd-core-3.5.2.jar
 jpam/1.1//jpam-1.1.jar
 json/1.8//json-1.8.jar
diff --git a/pom.xml b/pom.xml
index 65dfcdb2234..f8f3aa2fd4f 100644
--- a/pom.xml
+++ b/pom.xml
@@ -192,7 +192,7 @@
 14.0.1
 3.1.7
 2.36
-2.11.2
+2.12.0
 3.5.2
 3.0.0
 0.12.0


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated (aea9fb74ca6 -> 26e258c88a5)

2022-10-21 Thread gurwls223
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


from aea9fb74ca6 [MINOR][CORE][SQL][FOLLOWUP] Add missing s prefix to 
enable string interpolation
 add 26e258c88a5 [SPARK-40854][CONNECT] Use proper JSON encoding until we 
have Arrow collection

No new revisions were added by this update.

Summary of changes:
 .../src/main/protobuf/spark/connect/base.proto |  9 ++-
 .../service/SparkConnectStreamHandler.scala| 76 ++
 python/pyspark/sql/connect/client.py   |  4 +-
 python/pyspark/sql/connect/proto/base_pb2.py   | 36 +-
 python/pyspark/sql/connect/proto/base_pb2.pyi  | 27 
 5 files changed, 104 insertions(+), 48 deletions(-)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated: [MINOR][CORE][SQL][FOLLOWUP] Add missing s prefix to enable string interpolation

2022-10-21 Thread gurwls223
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new aea9fb74ca6 [MINOR][CORE][SQL][FOLLOWUP] Add missing s prefix to 
enable string interpolation
aea9fb74ca6 is described below

commit aea9fb74ca6bc91604b32696a5638e5c93933d1f
Author: Enrico Minack 
AuthorDate: Fri Oct 21 20:20:04 2022 +0900

[MINOR][CORE][SQL][FOLLOWUP] Add missing s prefix to enable string 
interpolation

### What changes were proposed in this pull request?
Adds missing `s` prefix to enable string interpolation. Complements #38297.

### Why are the changes needed?
Without the `s` prefix, the logged strings contain the literal variable names 
instead of their substituted values.
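The class of bug fixed here has a direct Python analogue: a format string without its `f` prefix keeps the placeholder text verbatim instead of substituting the variable.

```python
task_id = 42

# Missing prefix: the placeholder is kept as literal text,
# exactly like a Scala string missing its `s` interpolator.
broken = "TASK {task_id}"

# With the prefix, the variable is substituted.
fixed = f"TASK {task_id}"

assert broken == "TASK {task_id}"
assert fixed == "TASK 42"
```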

### Does this PR introduce _any_ user-facing change?
Log messages will change.

### How was this patch tested?
Not tested.

Closes #38307 from EnricoMi/branch-fix-string-interpolation-2.

Authored-by: Enrico Minack 
Signed-off-by: Hyukjin Kwon 
---
 core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala | 2 +-
 .../main/scala/org/apache/spark/scheduler/TaskSchedulerImpl.scala | 2 +-
 .../apache/spark/executor/CoarseGrainedExecutorBackendSuite.scala | 4 ++--
 .../spark/ml/regression/GeneralizedLinearRegressionSuite.scala| 2 +-
 .../scala/org/apache/spark/mllib/tree/DecisionTreeSuite.scala | 8 
 5 files changed, 9 insertions(+), 9 deletions(-)

diff --git a/core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala 
b/core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala
index b6d6441925a..4efce34b18c 100644
--- a/core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala
+++ b/core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala
@@ -1883,7 +1883,7 @@ private[spark] class DAGScheduler(
   if (ignoreStageFailure) {
 logInfo(s"Ignoring fetch failure from $task of $failedStage 
attempt " +
   s"${task.stageAttemptId} when count 
spark.stage.maxConsecutiveAttempts " +
-  "as executor ${bmAddress.executorId} is decommissioned and " +
+  s"as executor ${bmAddress.executorId} is decommissioned and " +
   s" ${config.STAGE_IGNORE_DECOMMISSION_FETCH_FAILURE.key}=true")
   } else {
 failedStage.failedAttemptIds.add(task.stageAttemptId)
diff --git 
a/core/src/main/scala/org/apache/spark/scheduler/TaskSchedulerImpl.scala 
b/core/src/main/scala/org/apache/spark/scheduler/TaskSchedulerImpl.scala
index 5004262a71c..1eb588124a7 100644
--- a/core/src/main/scala/org/apache/spark/scheduler/TaskSchedulerImpl.scala
+++ b/core/src/main/scala/org/apache/spark/scheduler/TaskSchedulerImpl.scala
@@ -641,7 +641,7 @@ private[spark] class TaskSchedulerImpl(
 if (!unschedulableTaskSetToExpiryTime.contains(taskSet)) {
   logInfo("Notifying ExecutorAllocationManager to allocate 
more executors to" +
 " schedule the unschedulable task before aborting" +
-" stage ${taskSet.stageId}.")
+s" stage ${taskSet.stageId}.")
   
dagScheduler.unschedulableTaskSetAdded(taskSet.taskSet.stageId,
 taskSet.taskSet.stageAttemptId)
   
updateUnschedulableTaskSetTimeoutAndStartAbortTimer(taskSet, taskIndex)
diff --git 
a/core/src/test/scala/org/apache/spark/executor/CoarseGrainedExecutorBackendSuite.scala
 
b/core/src/test/scala/org/apache/spark/executor/CoarseGrainedExecutorBackendSuite.scala
index a12b7034a6d..a8b1304b76f 100644
--- 
a/core/src/test/scala/org/apache/spark/executor/CoarseGrainedExecutorBackendSuite.scala
+++ 
b/core/src/test/scala/org/apache/spark/executor/CoarseGrainedExecutorBackendSuite.scala
@@ -395,7 +395,7 @@ class CoarseGrainedExecutorBackendSuite extends 
SparkFunSuite
 
   // Fake tasks with different taskIds.
   val taskDescriptions = (1 to numTasks).map {
-taskId => new TaskDescription(taskId, 2, "1", "TASK ${taskId}", 19,
+taskId => new TaskDescription(taskId, 2, "1", s"TASK $taskId", 19,
   1, mutable.Map.empty, mutable.Map.empty, mutable.Map.empty, new 
Properties, 1,
   Map(GPU -> new ResourceInformation(GPU, Array("0", "1"))), data)
   }
@@ -483,7 +483,7 @@ class CoarseGrainedExecutorBackendSuite extends 
SparkFunSuite
 
   // Fake tasks with different taskIds.
   val taskDescriptions = (1 to numTasks).map {
-taskId => new TaskDescription(taskId, 2, "1", "TASK ${taskId}", 19,
+taskId => new TaskDescription(taskId, 2, "1", s"TASK $taskId", 19,
   1, mutable.Map.empty, mutable.Map.empty, mutable.Map.empty, new 
Properties, 1,
   Map(GPU -> new ResourceInformation(GPU, Array("0", "1"))), data)
   }
diff --git 

[spark-docker] branch master updated: [SPARK-40864] Remove pip/setuptools dynamic upgrade

2022-10-21 Thread yikun
This is an automated email from the ASF dual-hosted git repository.

yikun pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark-docker.git


The following commit(s) were added to refs/heads/master by this push:
 new 52e5856  [SPARK-40864] Remove pip/setuptools dynamic upgrade
52e5856 is described below

commit 52e5856d81e70a9d9e87292c6caf42587ce433df
Author: Yikun Jiang 
AuthorDate: Fri Oct 21 17:02:54 2022 +0800

[SPARK-40864] Remove pip/setuptools dynamic upgrade

### What changes were proposed in this pull request?
Remove pip/setuptools dynamic upgrade in dockerfile

### Why are the changes needed?
According to the [official image 
suggestion](https://github.com/docker-library/official-images#repeatability), 
`Rebuilding the same Dockerfile should result in the same version of the image 
being packaged`.

But we used to upgrade pip/setuptools to the latest version, and there is no 
reason we actually need the latest pip/setuptools. I also looked at the 
[initial 
commit](https://github.com/apache-spark-on-k8s/spark/commit/befcf0a30651d0335bb57c242a824e43748db33f)
 that introduced this line; the merge history gives no remaining reason for it 
either.

### Does this PR introduce _any_ user-facing change?
The OS-provided pip/setuptools versions are used.

### How was this patch tested?

CI passed.

Closes #17 from Yikun/remove-pip.

Authored-by: Yikun Jiang 
Signed-off-by: Yikun Jiang 
---
 3.3.0/scala2.12-java11-python3-r-ubuntu/Dockerfile | 1 -
 3.3.0/scala2.12-java11-python3-ubuntu/Dockerfile   | 1 -
 Dockerfile.template| 1 -
 3 files changed, 3 deletions(-)

diff --git a/3.3.0/scala2.12-java11-python3-r-ubuntu/Dockerfile 
b/3.3.0/scala2.12-java11-python3-r-ubuntu/Dockerfile
index ac16bdd..8c2761e 100644
--- a/3.3.0/scala2.12-java11-python3-r-ubuntu/Dockerfile
+++ b/3.3.0/scala2.12-java11-python3-r-ubuntu/Dockerfile
@@ -26,7 +26,6 @@ RUN set -ex && \
 ln -s /lib /lib64 && \
 apt install -y gnupg2 wget bash tini libc6 libpam-modules krb5-user 
libnss3 procps net-tools gosu && \
 apt install -y python3 python3-pip && \
-pip3 install --upgrade pip setuptools && \
 apt install -y r-base r-base-dev && \
 mkdir -p /opt/spark && \
 mkdir /opt/spark/python && \
diff --git a/3.3.0/scala2.12-java11-python3-ubuntu/Dockerfile 
b/3.3.0/scala2.12-java11-python3-ubuntu/Dockerfile
index c6e433d..6a0017a 100644
--- a/3.3.0/scala2.12-java11-python3-ubuntu/Dockerfile
+++ b/3.3.0/scala2.12-java11-python3-ubuntu/Dockerfile
@@ -26,7 +26,6 @@ RUN set -ex && \
 ln -s /lib /lib64 && \
 apt install -y gnupg2 wget bash tini libc6 libpam-modules krb5-user 
libnss3 procps net-tools gosu && \
 apt install -y python3 python3-pip && \
-pip3 install --upgrade pip setuptools && \
 mkdir -p /opt/spark && \
 mkdir /opt/spark/python && \
 mkdir -p /opt/spark/examples && \
diff --git a/Dockerfile.template b/Dockerfile.template
index 2b90fe5..a220247 100644
--- a/Dockerfile.template
+++ b/Dockerfile.template
@@ -27,7 +27,6 @@ RUN set -ex && \
 apt install -y gnupg2 wget bash tini libc6 libpam-modules krb5-user 
libnss3 procps net-tools gosu && \
 {%- if HAVE_PY %}
 apt install -y python3 python3-pip && \
-pip3 install --upgrade pip setuptools && \
 {%- endif %}
 {%- if HAVE_R %}
 apt install -y r-base r-base-dev && \


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated: [SPARK-40799][BUILD][CONNECT][FOLLOW-UP] Keep the console output consistent of lint-scala script as was

2022-10-21 Thread gurwls223
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 140c99c69dd [SPARK-40799][BUILD][CONNECT][FOLLOW-UP] Keep the console 
output consistent of lint-scala script as was
140c99c69dd is described below

commit 140c99c69dd2516726552019afb945f9b06a8c1b
Author: Hyukjin Kwon 
AuthorDate: Fri Oct 21 17:38:37 2022 +0900

[SPARK-40799][BUILD][CONNECT][FOLLOW-UP] Keep the console output consistent 
of lint-scala script as was

### What changes were proposed in this pull request?

This PR proposes to keep `dev/lint-scala` quiet as was.

### Why are the changes needed?

To remove noisy output from the `dev/lint-scala` script.

**Before**

Success

```
Scalastyle checks passed.
Using `mvn` from path: /.../spark/build/apache-maven-3.8.6/bin/mvn
[INFO] Scanning for projects...
[INFO] 

[INFO] Detecting the operating system and CPU architecture
[INFO] 

[INFO] os.detected.name: osx
[INFO] os.detected.arch: x86_64
[INFO] os.detected.version: 10.16
[INFO] os.detected.version.major: 10
[INFO] os.detected.version.minor: 16
[INFO] os.detected.classifier: osx-x86_64
[INFO]
[INFO] < org.apache.spark:spark-connect_2.12 
>-
[INFO] Building Spark Project Connect 3.4.0-SNAPSHOT
[INFO] [ jar 
]-
[INFO]
[INFO] --- mvn-scalafmt_2.12:1.1.1640084764.9f463a9:format (default-cli)  
spark-connect_2.12 ---
[INFO] parsed config (v3.5.9): dev/.scalafmt.conf
[INFO] Scalafmt results: 0 of 11 were unformatted
Details:
Formatted: Connect.scala
Formatted: DataTypeProtoConverter.scala
Formatted: SparkConnectPlanner.scala
Formatted: SparkConnectPlugin.scala
Formatted: SparkConnectCommandPlanner.scala
Formatted: SparkConnectStreamHandler.scala
Formatted: SparkConnectService.scala
Formatted: package.scala
Formatted: SparkConnectProtoSuite.scala
Formatted: SparkConnectPlannerSuite.scala
Formatted: SparkConnectCommandPlannerSuite.scala

[INFO] 

[INFO] BUILD SUCCESS
[INFO] 

[INFO] Total time:  5.257 s
[INFO] Finished at: 2022-10-21T11:18:19+09:00
[INFO] 


```

Failure

```
Scalastyle checks passed.
Using `mvn` from path: 
/Users/hyukjin.kwon/workspace/forked/spark/build/apache-maven-3.8.6/bin/mvn
[INFO] Scanning for projects...
[INFO] 

[INFO] Detecting the operating system and CPU architecture
[INFO] 

[INFO] os.detected.name: osx
[INFO] os.detected.arch: x86_64
[INFO] os.detected.version: 10.16
[INFO] os.detected.version.major: 10
[INFO] os.detected.version.minor: 16
[INFO] os.detected.classifier: osx-x86_64
[INFO]
[INFO] < org.apache.spark:spark-connect_2.12 
>-
[INFO] Building Spark Project Connect 3.4.0-SNAPSHOT
[INFO] [ jar 
]-
[INFO]
[INFO] --- mvn-scalafmt_2.12:1.1.1640084764.9f463a9:format (default-cli)  
spark-connect_2.12 ---
[INFO] parsed config (v3.5.9): dev/.scalafmt.conf
[INFO] Scalafmt results: 0 of 11 were unformatted
Details:
Formatted: Connect.scala
Formatted: DataTypeProtoConverter.scala
Formatted: SparkConnectPlanner.scala
Formatted: SparkConnectPlugin.scala
Formatted: SparkConnectCommandPlanner.scala
Formatted: SparkConnectStreamHandler.scala
Formatted: SparkConnectService.scala
Formatted: package.scala
Formatted: SparkConnectProtoSuite.scala
Formatted: SparkConnectPlannerSuite.scala
Formatted: SparkConnectCommandPlannerSuite.scala

[INFO] 

[INFO] BUILD SUCCESS
[INFO] 

[INFO] Total time:  5.257 s
[INFO] Finished at: 2022-10-21T11:18:19+09:00
[INFO] 

(python3.9) ➜  spark git:(master) ./dev/lint-scala
Scalastyle checks passed.
Using `mvn` from path: 
/Users/hyukjin.kwon/workspace/forked/spark/build/apache-maven-3.8.6/bin/mvn
[INFO] 

[spark] branch master updated: [SPARK-40839][CONNECT][PYTHON] Implement `DataFrame.sample`

2022-10-21 Thread gurwls223
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 7934f00d124 [SPARK-40839][CONNECT][PYTHON] Implement `DataFrame.sample`
7934f00d124 is described below

commit 7934f00d1241431dd59207650693aaad1a319a70
Author: Ruifeng Zheng 
AuthorDate: Fri Oct 21 17:18:34 2022 +0900

[SPARK-40839][CONNECT][PYTHON] Implement `DataFrame.sample`

### What changes were proposed in this pull request?
Implement `DataFrame.sample` in Connect

### Why are the changes needed?
for DataFrame API coverage

### Does this PR introduce _any_ user-facing change?
Yes, new API

```
def sample(
self,
fraction: float,
*,
withReplacement: bool = False,
seed: Optional[int] = None,
) -> "DataFrame":
```

### How was this patch tested?
added UT

Closes #38310 from zhengruifeng/connect_df_sample.

Authored-by: Ruifeng Zheng 
Signed-off-by: Hyukjin Kwon 
---
 .../main/protobuf/spark/connect/relations.proto|  6 ++-
 .../org/apache/spark/sql/connect/dsl/package.scala |  3 +-
 .../sql/connect/planner/SparkConnectPlanner.scala  |  5 ++-
 python/pyspark/sql/connect/dataframe.py| 27 
 python/pyspark/sql/connect/plan.py | 50 ++
 python/pyspark/sql/connect/proto/relations_pb2.py  |  6 ++-
 python/pyspark/sql/connect/proto/relations_pb2.pyi | 19 ++--
 .../sql/tests/connect/test_connect_plan_only.py| 18 
 8 files changed, 125 insertions(+), 9 deletions(-)
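For context on what the new API computes: without replacement, `sample` keeps each row independently with probability `fraction` (Bernoulli sampling). A plain-Python sketch of that semantic, with illustrative names only:

```python
import random


def bernoulli_sample(rows, fraction, seed=None):
    # Each row is kept independently with probability `fraction`,
    # analogous to DataFrame.sample without replacement.
    rng = random.Random(seed)
    return [r for r in rows if rng.random() < fraction]


kept = bernoulli_sample(range(1000), fraction=0.1, seed=42)
assert set(kept) <= set(range(1000))          # a subset of the input
assert len(set(kept)) == len(kept)            # no duplicates
```

A fixed `seed` makes the selection reproducible across runs, which is why the API exposes it.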

diff --git a/connector/connect/src/main/protobuf/spark/connect/relations.proto 
b/connector/connect/src/main/protobuf/spark/connect/relations.proto
index 6adf0831ea2..7dbde775ee8 100644
--- a/connector/connect/src/main/protobuf/spark/connect/relations.proto
+++ b/connector/connect/src/main/protobuf/spark/connect/relations.proto
@@ -201,5 +201,9 @@ message Sample {
   double lower_bound = 2;
   double upper_bound = 3;
   bool with_replacement = 4;
-  int64 seed = 5;
+  Seed seed = 5;
+
+  message Seed {
+int64 seed = 1;
+  }
 }
diff --git 
a/connector/connect/src/main/scala/org/apache/spark/sql/connect/dsl/package.scala
 
b/connector/connect/src/main/scala/org/apache/spark/sql/connect/dsl/package.scala
index 68bbc0487f9..4630c86049c 100644
--- 
a/connector/connect/src/main/scala/org/apache/spark/sql/connect/dsl/package.scala
+++ 
b/connector/connect/src/main/scala/org/apache/spark/sql/connect/dsl/package.scala
@@ -272,7 +272,8 @@ package object dsl {
   .setUpperBound(upperBound)
   .setLowerBound(lowerBound)
   .setWithReplacement(withReplacement)
-  .setSeed(seed))
+  .setSeed(proto.Sample.Seed.newBuilder().setSeed(seed).build())
+  .build())
   .build()
   }
 
diff --git 
a/connector/connect/src/main/scala/org/apache/spark/sql/connect/planner/SparkConnectPlanner.scala
 
b/connector/connect/src/main/scala/org/apache/spark/sql/connect/planner/SparkConnectPlanner.scala
index 92c8bf01cba..880618cc333 100644
--- 
a/connector/connect/src/main/scala/org/apache/spark/sql/connect/planner/SparkConnectPlanner.scala
+++ 
b/connector/connect/src/main/scala/org/apache/spark/sql/connect/planner/SparkConnectPlanner.scala
@@ -31,6 +31,7 @@ import 
org.apache.spark.sql.catalyst.plans.logical.{Deduplicate, LogicalPlan, Sa
 import org.apache.spark.sql.catalyst.util.CaseInsensitiveMap
 import org.apache.spark.sql.execution.QueryExecution
 import org.apache.spark.sql.types._
+import org.apache.spark.util.Utils
 
 final case class InvalidPlanInput(
 private val message: String = "",
@@ -80,7 +81,7 @@ class SparkConnectPlanner(plan: proto.Relation, session: 
SparkSession) {
 
   /**
* All fields of [[proto.Sample]] are optional. However, given those are 
proto primitive types,
-   * we cannot differentiate if the fied is not or set when the field's value 
equals to the type
+   * we cannot differentiate if the field is not or set when the field's value 
equals to the type
* default value. In the future if this ever become a problem, one solution 
could be that to
* wrap such fields into proto messages.
*/
@@ -89,7 +90,7 @@ class SparkConnectPlanner(plan: proto.Relation, session: 
SparkSession) {
   rel.getLowerBound,
   rel.getUpperBound,
   rel.getWithReplacement,
-  rel.getSeed,
+  if (rel.hasSeed) rel.getSeed.getSeed else Utils.random.nextLong,
   transformRelation(rel.getInput))
   }
 
diff --git a/python/pyspark/sql/connect/dataframe.py 
b/python/pyspark/sql/connect/dataframe.py
index 5ca747fdd6a..eabcf433ae9 100644
--- a/python/pyspark/sql/connect/dataframe.py
+++ b/python/pyspark/sql/connect/dataframe.py
@@ -206,6 +206,33 @@ class DataFrame(object):
  

[spark-docker] branch master updated: [SPARK-40866][INFRA] Rename Spark repository as Spark Docker repository in GA

2022-10-21 Thread yikun
This is an automated email from the ASF dual-hosted git repository.

yikun pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark-docker.git


The following commit(s) were added to refs/heads/master by this push:
 new 6f56ef1  [SPARK-40866][INFRA] Rename Spark repository as Spark Docker 
repository in GA
6f56ef1 is described below

commit 6f56ef1c8c8bccd05069d4590f7ae084d4c72b4d
Author: Qian.Sun 
AuthorDate: Fri Oct 21 16:02:50 2022 +0800

[SPARK-40866][INFRA] Rename Spark repository as Spark Docker repository in 
GA

### What changes were proposed in this pull request?

This PR aims to rename `Spark repository` to `Spark Docker repository` in 
GitHub Actions; see the discussion at 
https://github.com/apache/spark-docker/pull/15#discussion_r1001440707

### Why are the changes needed?

The actual repository is apache/spark-docker.

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

Pass the GA

Closes #18 from dcoliversun/SPARK-40866.

Authored-by: Qian.Sun 
Signed-off-by: Yikun Jiang 
---
 .github/workflows/main.yml | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/.github/workflows/main.yml b/.github/workflows/main.yml
index b47245b..08bba68 100644
--- a/.github/workflows/main.yml
+++ b/.github/workflows/main.yml
@@ -60,7 +60,7 @@ jobs:
   - ${{ inputs.java }}
 image_suffix: [python3-ubuntu, ubuntu, r-ubuntu, python3-r-ubuntu]
 steps:
-  - name: Checkout Spark repository
+  - name: Checkout Spark Docker repository
 uses: actions/checkout@v2
 
   - name: Set up QEMU


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated: [SPARK-40657] Add support for Java classes in Protobuf functions

2022-10-21 Thread kabhwan
This is an automated email from the ASF dual-hosted git repository.

kabhwan pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
     new fd9e5760bae [SPARK-40657] Add support for Java classes in Protobuf functions
fd9e5760bae is described below

commit fd9e5760bae847f47c9c108f0e58814748e0d9b1
Author: Raghu Angadi 
AuthorDate: Fri Oct 21 15:46:50 2022 +0900

[SPARK-40657] Add support for Java classes in Protobuf functions

### What changes were proposed in this pull request?

Adds support for compiled Java classes to Protobuf functions. This is tested with Protobuf v3 classes. V2 vs V3 issues will be handled in a separate PR. The main changes in this PR:

 - Changes to top level API:
    - Adds new version that takes just the class name.
    - Changes the order of arguments for existing API with descriptor files (`messageName` and `descFilePath` are swapped).
 - Protobuf utils methods to create descriptor from Java class name.
 - Many unit tests are updated to check both versions: (1) with descriptor file and (2) with Java class name.
 - Maven build updates to generate Java classes to use in tests.
 - Miscellaneous changes:
    - Adds `proto` to package name in `proto` files used in tests.
    - A few TODO comments about improvements

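The argument reordering described above can be sketched roughly as follows (a hypothetical Python illustration of the call shapes, not the actual Spark signatures — the return tuples are invented for demonstration):

```python
def from_protobuf(data, message_name, desc_file_path=None):
    """Sketch of the reordered API: the message name now comes first and
    the descriptor file is optional, so a compiled Java class name alone
    is enough to locate the message definition."""
    if desc_file_path is not None:
        # Existing path: resolve the message inside a descriptor file.
        return ("descriptor-file", desc_file_path, message_name)
    # New path: message_name is treated as a compiled Java class name.
    return ("java-class", message_name)
```

Callers that previously passed `(descFilePath, messageName)` now pass `(messageName, descFilePath)`, and the descriptor-file argument can be dropped entirely when a compiled class is on the classpath.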
### Why are the changes needed?
Compiled Java classes are a common way for users to provide Protobuf definitions.

### Does this PR introduce _any_ user-facing change?
No.
This updates the interface, but only for a new feature in active development.

### How was this patch tested?
 - Unit tests

Closes #38286 from rangadi/protobuf-java.

Authored-by: Raghu Angadi 
Signed-off-by: Jungtaek Lim 
---
 connector/protobuf/pom.xml |  23 +-
 .../sql/protobuf/CatalystDataToProtobuf.scala  |  10 +-
 .../sql/protobuf/ProtobufDataToCatalyst.scala  |  34 ++-
 .../org/apache/spark/sql/protobuf/functions.scala  |  58 +++-
 .../spark/sql/protobuf/utils/ProtobufUtils.scala   |  65 -
 .../sql/protobuf/utils/SchemaConverters.scala  |   4 +
 .../test/resources/protobuf/catalyst_types.proto   |   4 +-
 .../test/resources/protobuf/functions_suite.proto  |   4 +-
 .../src/test/resources/protobuf/serde_suite.proto  |   6 +-
 .../ProtobufCatalystDataConversionSuite.scala  |  97 +--
 .../sql/protobuf/ProtobufFunctionsSuite.scala  | 318 +
 .../spark/sql/protobuf/ProtobufSerdeSuite.scala|   9 +-
 project/SparkBuild.scala   |   6 +-
 python/pyspark/sql/protobuf/functions.py   |  22 +-
 14 files changed, 437 insertions(+), 223 deletions(-)

diff --git a/connector/protobuf/pom.xml b/connector/protobuf/pom.xml
index 0515f128b8d..b934c7f831a 100644
--- a/connector/protobuf/pom.xml
+++ b/connector/protobuf/pom.xml
@@ -83,7 +83,6 @@
       <version>${protobuf.version}</version>
       <scope>compile</scope>
     </dependency>
-
   </dependencies>
   <build>
     <outputDirectory>target/scala-${scala.binary.version}/classes</outputDirectory>
@@ -110,6 +109,28 @@
           </execution>
         </executions>
       </plugin>
+      <plugin>
+        <groupId>com.github.os72</groupId>
+        <artifactId>protoc-jar-maven-plugin</artifactId>
+        <version>3.11.4</version>
+        <executions>
+          <execution>
+            <phase>generate-test-sources</phase>
+            <goals>
+              <goal>run</goal>
+            </goals>
+            <configuration>
+              <protocArtifact>com.google.protobuf:protoc:${protobuf.version}</protocArtifact>
+              <protocVersion>${protobuf.version}</protocVersion>
+              <inputDirectories>
+                <include>src/test/resources/protobuf</include>
+              </inputDirectories>
+              <addSources>test</addSources>
+            </configuration>
+          </execution>
+        </executions>
+      </plugin>
     </plugins>
   </build>
diff --git a/connector/protobuf/src/main/scala/org/apache/spark/sql/protobuf/CatalystDataToProtobuf.scala b/connector/protobuf/src/main/scala/org/apache/spark/sql/protobuf/CatalystDataToProtobuf.scala
index 145100268c2..b9f7907ea8c 100644
--- a/connector/protobuf/src/main/scala/org/apache/spark/sql/protobuf/CatalystDataToProtobuf.scala
+++ b/connector/protobuf/src/main/scala/org/apache/spark/sql/protobuf/CatalystDataToProtobuf.scala
@@ -25,17 +25,17 @@ import org.apache.spark.sql.types.{BinaryType, DataType}
 
 private[protobuf] case class CatalystDataToProtobuf(
     child: Expression,
-    descFilePath: String,
-    messageName: String)
+    messageName: String,
+    descFilePath: Option[String] = None)
     extends UnaryExpression {
 
   override def dataType: DataType = BinaryType
 
-  @transient private lazy val protoType =
-    ProtobufUtils.buildDescriptor(descFilePath, messageName)
+  @transient private lazy val protoDescriptor =
+    ProtobufUtils.buildDescriptor(messageName, descFilePathOpt = descFilePath)
 
   @transient private lazy val serializer =
-    new ProtobufSerializer(child.dataType, protoType, child.nullable)
+    new ProtobufSerializer(child.dataType, protoDescriptor, child.nullable)
 
   override def nullSafeEval(input: Any): Any = {
 val dynamicMessage =