(spark) branch master updated: [SPARK-46900][BUILD] Upgrade slf4j to 2.0.11

2024-01-28 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new a368280708dd [SPARK-46900][BUILD] Upgrade slf4j to 2.0.11
a368280708dd is described below

commit a368280708dd3c6eb90bd3b09a36a68bdd096222
Author: yangjie01 
AuthorDate: Sun Jan 28 23:42:37 2024 -0800

[SPARK-46900][BUILD] Upgrade slf4j to 2.0.11

### What changes were proposed in this pull request?
This PR aims to upgrade slf4j from 2.0.10 to 2.0.11.

### Why are the changes needed?
This release reinstates the `renderLevel()` method in SimpleLogger which 
was removed by mistake.

The full release notes are as follows:
- https://www.slf4j.org/news.html#2.0.11

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Pass GitHub Actions

### Was this patch authored or co-authored using generative AI tooling?
No

Closes #44928 from LuciferYang/SPARK-46900.

Authored-by: yangjie01 
Signed-off-by: Dongjoon Hyun 
---
 dev/deps/spark-deps-hadoop-3-hive-2.3 | 6 +++---
 pom.xml   | 2 +-
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/dev/deps/spark-deps-hadoop-3-hive-2.3 
b/dev/deps/spark-deps-hadoop-3-hive-2.3
index 09291de50350..06fb4d879db2 100644
--- a/dev/deps/spark-deps-hadoop-3-hive-2.3
+++ b/dev/deps/spark-deps-hadoop-3-hive-2.3
@@ -123,7 +123,7 @@ javassist/3.29.2-GA//javassist-3.29.2-GA.jar
 javax.jdo/3.2.0-m3//javax.jdo-3.2.0-m3.jar
 javolution/5.5.1//javolution-5.5.1.jar
 jaxb-runtime/2.3.2//jaxb-runtime-2.3.2.jar
-jcl-over-slf4j/2.0.10//jcl-over-slf4j-2.0.10.jar
+jcl-over-slf4j/2.0.11//jcl-over-slf4j-2.0.11.jar
 jdo-api/3.0.1//jdo-api-3.0.1.jar
 jdom2/2.0.6//jdom2-2.0.6.jar
 jersey-client/2.41//jersey-client-2.41.jar
@@ -148,7 +148,7 @@ 
json4s-jackson_2.13/3.7.0-M11//json4s-jackson_2.13-3.7.0-M11.jar
 json4s-scalap_2.13/3.7.0-M11//json4s-scalap_2.13-3.7.0-M11.jar
 jsr305/3.0.0//jsr305-3.0.0.jar
 jta/1.1//jta-1.1.jar
-jul-to-slf4j/2.0.10//jul-to-slf4j-2.0.10.jar
+jul-to-slf4j/2.0.11//jul-to-slf4j-2.0.11.jar
 kryo-shaded/4.0.2//kryo-shaded-4.0.2.jar
 kubernetes-client-api/6.10.0//kubernetes-client-api-6.10.0.jar
 kubernetes-client/6.10.0//kubernetes-client-6.10.0.jar
@@ -247,7 +247,7 @@ 
scala-parallel-collections_2.13/1.0.4//scala-parallel-collections_2.13-1.0.4.jar
 scala-parser-combinators_2.13/2.3.0//scala-parser-combinators_2.13-2.3.0.jar
 scala-reflect/2.13.12//scala-reflect-2.13.12.jar
 scala-xml_2.13/2.2.0//scala-xml_2.13-2.2.0.jar
-slf4j-api/2.0.10//slf4j-api-2.0.10.jar
+slf4j-api/2.0.11//slf4j-api-2.0.11.jar
 snakeyaml-engine/2.7//snakeyaml-engine-2.7.jar
 snakeyaml/2.2//snakeyaml-2.2.jar
 snappy-java/1.1.10.5//snappy-java-1.1.10.5.jar
diff --git a/pom.xml b/pom.xml
index a5f2b6f74b7a..b78f49499feb 100644
--- a/pom.xml
+++ b/pom.xml
@@ -119,7 +119,7 @@
 3.1.0
 spark
 9.6
-2.0.10
+2.0.11
 2.22.1
 
 3.3.6





(spark) branch master updated: [SPARK-46901][PYTHON] Upgrade `pyarrow` to 15.0.0

2024-01-28 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 487cbc086a30 [SPARK-46901][PYTHON] Upgrade `pyarrow` to 15.0.0
487cbc086a30 is described below

commit 487cbc086a30ec4d58695336acbe8037a3d5ebe7
Author: Ruifeng Zheng 
AuthorDate: Sun Jan 28 23:41:49 2024 -0800

[SPARK-46901][PYTHON] Upgrade `pyarrow` to 15.0.0

### What changes were proposed in this pull request?
Upgrade `pyarrow` to 15.0.0

### Why are the changes needed?
To support the latest pyarrow.

### Does this PR introduce _any_ user-facing change?
no

### How was this patch tested?
ci

### Was this patch authored or co-authored using generative AI tooling?
no

Closes #44924 from zhengruifeng/py_arrow_15.

Authored-by: Ruifeng Zheng 
Signed-off-by: Dongjoon Hyun 
---
 dev/infra/Dockerfile | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/dev/infra/Dockerfile b/dev/infra/Dockerfile
index 976f94251d7a..fc515d4478ad 100644
--- a/dev/infra/Dockerfile
+++ b/dev/infra/Dockerfile
@@ -94,7 +94,7 @@ RUN curl -sS https://bootstrap.pypa.io/get-pip.py | pypy3
 RUN pypy3 -m pip install numpy 'six==1.16.0' 'pandas<=2.1.4' scipy coverage 
matplotlib lxml
 
 
-ARG BASIC_PIP_PKGS="numpy pyarrow>=14.0.0 six==1.16.0 pandas<=2.1.4 scipy 
plotly>=4.8 mlflow>=2.8.1 coverage matplotlib openpyxl memory-profiler>=0.61.0 
scikit-learn>=1.3.2"
+ARG BASIC_PIP_PKGS="numpy pyarrow>=15.0.0 six==1.16.0 pandas<=2.1.4 scipy 
plotly>=4.8 mlflow>=2.8.1 coverage matplotlib openpyxl memory-profiler>=0.61.0 
scikit-learn>=1.3.2"
 # Python deps for Spark Connect
 ARG CONNECT_PIP_PKGS="grpcio==1.59.3 grpcio-status==1.59.3 protobuf==4.25.1 
googleapis-common-protos==1.56.4"
 





(spark) branch master updated: [SPARK-46721][CORE][TESTS] Make gpu fraction tests more robust

2024-01-28 Thread wuyi
This is an automated email from the ASF dual-hosted git repository.

wuyi pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 90e6c0cf2ca1 [SPARK-46721][CORE][TESTS] Make gpu fraction tests more 
robust
90e6c0cf2ca1 is described below

commit 90e6c0cf2ca186d1a492af4dc995b8254aa77aae
Author: Bobby Wang 
AuthorDate: Mon Jan 29 14:59:52 2024 +0800

[SPARK-46721][CORE][TESTS] Make gpu fraction tests more robust

### What changes were proposed in this pull request?

When cherry-picking https://github.com/apache/spark/pull/43494 back to 
branch 3.5 https://github.com/apache/spark/pull/44690,
I ran into the issue that some tests for Scala 2.12 failed when comparing 
two maps. It turned out that the function 
[compareMaps](https://github.com/apache/spark/pull/43494/files#diff-f205431247dd9446f4ce941e5a4620af438c242b9bdff6e7faa7df0194db49acR129)
 is not robust across Scala 2.12 and Scala 2.13.

- scala 2.13

``` scala
Welcome to Scala 2.13.12 (OpenJDK 64-Bit Server VM, Java 17.0.9).
Type in expressions for evaluation. Or try :help.

scala> def compareMaps(lhs: Map[String, Double], rhs: Map[String, Double],
 |   eps: Double = 0.0001): Boolean = {
 | lhs.size == rhs.size &&
 |   lhs.zip(rhs).forall { case ((lName, lAmount), (rName, 
rAmount)) =>
 | lName == rName && (lAmount - rAmount).abs < eps
 |   }
 | }
 |
 | import scala.collection.mutable.HashMap
 | val resources = Map("gpu" -> Map("a" -> 1.0, "b" -> 2.0, "c" -> 3.0, 
"d"-> 4.0))
 | val mapped = resources.map { case (rName, addressAmounts) =>
 |  rName -> HashMap(addressAmounts.toSeq.sorted: _*)
 | }
 |
 | compareMaps(resources("gpu"), mapped("gpu").toMap)
def compareMaps(lhs: Map[String,Double], rhs: Map[String,Double], eps: 
Double): Boolean
import scala.collection.mutable.HashMap
val resources: 
scala.collection.immutable.Map[String,scala.collection.immutable.Map[String,Double]]
 = Map(gpu -> Map(a -> 1.0, b -> 2.0, c -> 3.0, d -> 4.0))
val mapped: 
scala.collection.immutable.Map[String,scala.collection.mutable.HashMap[String,Double]]
 = Map(gpu -> HashMap(a -> 1.0, b -> 2.0, c -> 3.0, d -> 4.0))
val res0: Boolean = true
```

- scala 2.12

``` scala
Welcome to Scala 2.12.14 (OpenJDK 64-Bit Server VM, Java 17.0.9).
Type in expressions for evaluation. Or try :help.

scala> def compareMaps(lhs: Map[String, Double], rhs: Map[String, Double],
 |   eps: Double = 0.0001): Boolean = {
 | lhs.size == rhs.size &&
 |   lhs.zip(rhs).forall { case ((lName, lAmount), (rName, 
rAmount)) =>
 | lName == rName && (lAmount - rAmount).abs < eps
 |   }
 | }
compareMaps: (lhs: Map[String,Double], rhs: Map[String,Double], eps: 
Double)Boolean

scala> import scala.collection.mutable.HashMap
import scala.collection.mutable.HashMap

scala> val resources = Map("gpu" -> Map("a" -> 1.0, "b" -> 2.0, "c" -> 3.0, 
"d"-> 4.0))
resources: 
scala.collection.immutable.Map[String,scala.collection.immutable.Map[String,Double]]
 = Map(gpu -> Map(a -> 1.0, b -> 2.0, c -> 3.0, d -> 4.0))

scala> val mapped = resources.map { case (rName, addressAmounts) =>
 |   rName -> HashMap(addressAmounts.toSeq.sorted: _*)
 | }
mapped: 
scala.collection.immutable.Map[String,scala.collection.mutable.HashMap[String,Double]]
 = Map(gpu -> Map(b -> 2.0, d -> 4.0, a -> 1.0, c -> 3.0))

scala> compareMaps(resources("gpu"), mapped("gpu").toMap)
res0: Boolean = false
```

The same comparison code produces different results on Scala 2.12 and Scala 2.13.
This PR reworks compareMaps so the tests pass on both Scala 2.12 and Scala 2.13.
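
A minimal sketch of an order-insensitive comparison, shown here only to illustrate the idea (the actual rework merged in #44735 may differ): look each key up in the other map instead of zipping the two maps, so the iteration order of the mutable HashMap no longer matters.

``` scala
// Hypothetical order-insensitive variant, for illustration only;
// the reworked helper in #44735 may differ in detail.
def compareMaps(
    lhs: Map[String, Double],
    rhs: Map[String, Double],
    eps: Double = 0.0001): Boolean = {
  lhs.size == rhs.size &&
    lhs.forall { case (name, lAmount) =>
      rhs.get(name).exists(rAmount => (lAmount - rAmount).abs < eps)
    }
}
```

With this formulation, `compareMaps(resources("gpu"), mapped("gpu").toMap)` from the transcripts above evaluates to `true` on both Scala 2.12 and Scala 2.13, because the result no longer depends on element order.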

### Why are the changes needed?

Some users may back-port https://github.com/apache/spark/pull/43494 to an 
older branch that uses Scala 2.12 and run into the same issue. It is trivial 
work to make the GPU fraction tests compatible with both Scala 2.12 and 
Scala 2.13.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Make sure all the CI pipelines pass

### Was this patch authored or co-authored using generative AI tooling?
No

Closes #44735 from wbo4958/gpu-fraction-tests.

Authored-by: Bobby Wang 
Signed-off-by: Yi Wu 
---
 .../scheduler/ExecutorResourceInfoSuite.scala  | 10 +---
 .../spark/scheduler/ExecutorResourceUtils.scala| 28 ++
 .../scheduler/ExecutorResourcesAmountsSuite.scala  | 10 +---
 .../spark/scheduler/TaskSchedulerImplSuite.scala   | 22 +
 4 files changed, 42 insertion

(spark) branch master updated: [MINOR][DOCS] Remove Canonicalize in docs

2024-01-28 Thread gurwls223
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 112f4acb6283 [MINOR][DOCS] Remove Canonicalize in docs
112f4acb6283 is described below

commit 112f4acb62834511c7a7fd56b4a3c14178c1ce02
Author: longfei.jiang <1251489...@qq.com>
AuthorDate: Mon Jan 29 15:48:41 2024 +0900

[MINOR][DOCS] Remove Canonicalize in docs

### What changes were proposed in this pull request?

Remove Canonicalize in docs

### Why are the changes needed?

SPARK-40362 removed Canonicalize.scala, so the docs need to be updated.

### Does this PR introduce _any_ user-facing change?

No
### How was this patch tested?

Just in the docs, no need to test.

### Was this patch authored or co-authored using generative AI tooling?

No

Closes #44897 from jlfsdtc/docs_fix.

Lead-authored-by: longfei.jiang <1251489...@qq.com>
Co-authored-by: Hyukjin Kwon 
Signed-off-by: Hyukjin Kwon 
---
 .../org/apache/spark/sql/catalyst/expressions/Expression.scala |  4 ++--
 .../apache/spark/sql/catalyst/expressions/ExpressionSet.scala  | 10 ++
 .../sql/catalyst/plans/logical/QueryPlanConstraints.scala  |  2 +-
 3 files changed, 9 insertions(+), 7 deletions(-)

diff --git 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Expression.scala
 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Expression.scala
index a3432716002a..817432879391 100644
--- 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Expression.scala
+++ 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Expression.scala
@@ -311,7 +311,7 @@ abstract class Expression extends TreeNode[Expression] {
* Returns true when two expressions will always compute the same result, 
even if they differ
* cosmetically (i.e. capitalization of names in attributes may be 
different).
*
-   * See [[Canonicalize]] for more details.
+   * See [[Expression#canonicalized]] for more details.
*/
   final def semanticEquals(other: Expression): Boolean =
 deterministic && other.deterministic && canonicalized == 
other.canonicalized
@@ -320,7 +320,7 @@ abstract class Expression extends TreeNode[Expression] {
* Returns a `hashCode` for the calculation performed by this expression. 
Unlike the standard
* `hashCode`, an attempt has been made to eliminate cosmetic differences.
*
-   * See [[Canonicalize]] for more details.
+   * See [[Expression#canonicalized]] for more details.
*/
   def semanticHash(): Int = canonicalized.hashCode()
 
diff --git 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/ExpressionSet.scala
 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/ExpressionSet.scala
index ba18b7a2b86c..1aa9f006463c 100644
--- 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/ExpressionSet.scala
+++ 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/ExpressionSet.scala
@@ -21,7 +21,9 @@ import scala.collection.mutable
 import scala.collection.mutable.ArrayBuffer
 
 object ExpressionSet {
-  /** Constructs a new [[ExpressionSet]] by applying [[Canonicalize]] to 
`expressions`. */
+  /**
+   * Constructs a new [[ExpressionSet]] by applying 
[[Expression#canonicalized]] to `expressions`.
+   */
   def apply(expressions: IterableOnce[Expression]): ExpressionSet = {
 val set = new ExpressionSet()
 expressions.iterator.foreach(set.add)
@@ -36,7 +38,7 @@ object ExpressionSet {
 /**
  * A [[Set]] where membership is determined based on determinacy and a 
canonical representation of
  * an [[Expression]] (i.e. one that attempts to ignore cosmetic differences).
- * See [[Canonicalize]] for more details.
+ * See [[Expression#canonicalized]] for more details.
  *
  * Internally this set uses the canonical representation, but keeps also track 
of the original
  * expressions to ease debugging.  Since different expressions can share the 
same canonical
@@ -168,8 +170,8 @@ class ExpressionSet protected(
   override def clone(): ExpressionSet = new ExpressionSet(baseSet.clone(), 
originals.clone())
 
   /**
-   * Returns a string containing both the post [[Canonicalize]] expressions 
and the original
-   * expressions in this set.
+   * Returns a string containing both the post [[Expression#canonicalized]] 
expressions
+   * and the original expressions in this set.
*/
   def toDebugString: String =
 s"""
diff --git 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/QueryPlanConstraints.scala
 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/QueryPlanConstraints.scala
index 022fd7fff750..5769f006ccbc 100644
--- 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/ca

(spark) branch master updated: [SPARK-46899][CORE] Remove `POST` APIs from `MasterWebUI` when `spark.ui.killEnabled` is `false`

2024-01-28 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 95a4abd5b5bc [SPARK-46899][CORE] Remove `POST` APIs from `MasterWebUI` 
when `spark.ui.killEnabled` is `false`
95a4abd5b5bc is described below

commit 95a4abd5b5bcc36335be9af84b7bbddd7d0034ba
Author: Dongjoon Hyun 
AuthorDate: Sun Jan 28 22:38:32 2024 -0800

[SPARK-46899][CORE] Remove `POST` APIs from `MasterWebUI` when 
`spark.ui.killEnabled` is `false`

### What changes were proposed in this pull request?

This PR aims to remove `POST` APIs from `MasterWebUI` when 
`spark.ui.killEnabled` is false.

### Why are the changes needed?

If `spark.ui.killEnabled` is false, we don't need to attach the `POST`-related 
redirect or servlet handlers in the first place, because such requests are 
ignored in `MasterPage`.


https://github.com/apache/spark/blob/8cd0d1854da04334aff3188e4eca08a48f734579/core/src/main/scala/org/apache/spark/deploy/master/ui/MasterPage.scala#L64-L65

### Does this PR introduce _any_ user-facing change?

Previously, the user request was silently ignored after redirecting. Now, it 
responds with the correct HTTP error code, 405 `Method Not Allowed`.

### How was this patch tested?

Pass the CIs with newly added test suite, `ReadOnlyMasterWebUISuite`.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #44926 from dongjoon-hyun/SPARK-46899.

Authored-by: Dongjoon Hyun 
Signed-off-by: Dongjoon Hyun 
---
 .../spark/deploy/master/ui/MasterWebUI.scala   | 46 ++---
 .../spark/deploy/master/ui/MasterWebUISuite.scala  |  9 ++-
 .../master/ui/ReadOnlyMasterWebUISuite.scala   | 75 ++
 3 files changed, 105 insertions(+), 25 deletions(-)

diff --git 
a/core/src/main/scala/org/apache/spark/deploy/master/ui/MasterWebUI.scala 
b/core/src/main/scala/org/apache/spark/deploy/master/ui/MasterWebUI.scala
index 3025c0bf468b..14ea6dbb3d20 100644
--- a/core/src/main/scala/org/apache/spark/deploy/master/ui/MasterWebUI.scala
+++ b/core/src/main/scala/org/apache/spark/deploy/master/ui/MasterWebUI.scala
@@ -54,31 +54,33 @@ class MasterWebUI(
 attachPage(new LogPage(this))
 attachPage(masterPage)
 addStaticHandler(MasterWebUI.STATIC_RESOURCE_DIR)
-attachHandler(createRedirectHandler(
-  "/app/kill", "/", masterPage.handleAppKillRequest, httpMethods = 
Set("POST")))
-attachHandler(createRedirectHandler(
-  "/driver/kill", "/", masterPage.handleDriverKillRequest, httpMethods = 
Set("POST")))
-attachHandler(createServletHandler("/workers/kill", new HttpServlet {
-  override def doPost(req: HttpServletRequest, resp: HttpServletResponse): 
Unit = {
-val hostnames: Seq[String] = Option(req.getParameterValues("host"))
-  .getOrElse(Array[String]()).toImmutableArraySeq
-if (decommissionDisabled || !isDecommissioningRequestAllowed(req)) {
-  resp.sendError(HttpServletResponse.SC_METHOD_NOT_ALLOWED)
-} else {
-  val removedWorkers = masterEndpointRef.askSync[Integer](
-DecommissionWorkersOnHosts(hostnames))
-  logInfo(s"Decommissioning of hosts $hostnames decommissioned 
$removedWorkers workers")
-  if (removedWorkers > 0) {
-resp.setStatus(HttpServletResponse.SC_OK)
-  } else if (removedWorkers == 0) {
-resp.sendError(HttpServletResponse.SC_NOT_FOUND)
+if (killEnabled) {
+  attachHandler(createRedirectHandler(
+"/app/kill", "/", masterPage.handleAppKillRequest, httpMethods = 
Set("POST")))
+  attachHandler(createRedirectHandler(
+"/driver/kill", "/", masterPage.handleDriverKillRequest, httpMethods = 
Set("POST")))
+  attachHandler(createServletHandler("/workers/kill", new HttpServlet {
+override def doPost(req: HttpServletRequest, resp: 
HttpServletResponse): Unit = {
+  val hostnames: Seq[String] = Option(req.getParameterValues("host"))
+.getOrElse(Array[String]()).toImmutableArraySeq
+  if (decommissionDisabled || !isDecommissioningRequestAllowed(req)) {
+resp.sendError(HttpServletResponse.SC_METHOD_NOT_ALLOWED)
   } else {
-// We shouldn't even see this case.
-resp.setStatus(HttpServletResponse.SC_INTERNAL_SERVER_ERROR)
+val removedWorkers = masterEndpointRef.askSync[Integer](
+  DecommissionWorkersOnHosts(hostnames))
+logInfo(s"Decommissioning of hosts $hostnames decommissioned 
$removedWorkers workers")
+if (removedWorkers > 0) {
+  resp.setStatus(HttpServletResponse.SC_OK)
+} else if (removedWorkers == 0) {
+  resp.sendError(HttpServletResponse.SC_NOT_FOUND)
+}

(spark) branch master updated: [SPARK-46898][CONNECT] Simplify the protobuf function transformation in Planner

2024-01-28 Thread yangjie01
This is an automated email from the ASF dual-hosted git repository.

yangjie01 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 56633e697571 [SPARK-46898][CONNECT] Simplify the protobuf function 
transformation in Planner
56633e697571 is described below

commit 56633e69757174da8a7dd8f4ea5298fd0a00e656
Author: Ruifeng Zheng 
AuthorDate: Mon Jan 29 13:55:59 2024 +0800

[SPARK-46898][CONNECT] Simplify the protobuf function transformation in 
Planner

### What changes were proposed in this pull request?
Simplify the protobuf function transformation in Planner

### Why are the changes needed?
Make `transformUnregisteredFunction` simpler and reuse an existing helper 
function.

### Does this PR introduce _any_ user-facing change?
no

### How was this patch tested?
ci

### Was this patch authored or co-authored using generative AI tooling?
no

Closes #44925 from zhengruifeng/connect_proto_simple.

Authored-by: Ruifeng Zheng 
Signed-off-by: yangjie01 
---
 .../sql/connect/planner/SparkConnectPlanner.scala  | 80 +++---
 1 file changed, 25 insertions(+), 55 deletions(-)

diff --git 
a/connector/connect/server/src/main/scala/org/apache/spark/sql/connect/planner/SparkConnectPlanner.scala
 
b/connector/connect/server/src/main/scala/org/apache/spark/sql/connect/planner/SparkConnectPlanner.scala
index 3e59b2644755..977bff690bac 100644
--- 
a/connector/connect/server/src/main/scala/org/apache/spark/sql/connect/planner/SparkConnectPlanner.scala
+++ 
b/connector/connect/server/src/main/scala/org/apache/spark/sql/connect/planner/SparkConnectPlanner.scala
@@ -1710,53 +1710,6 @@ class SparkConnectPlanner(
*/
   private def transformUnregisteredFunction(
   fun: proto.Expression.UnresolvedFunction): Option[Expression] = {
-def extractArgsOfProtobufFunction(
-functionName: String,
-argumentsCount: Int,
-children: collection.Seq[Expression])
-: (String, Option[Array[Byte]], Map[String, String]) = {
-  val messageClassName = children(1) match {
-case Literal(s, StringType) if s != null => s.toString
-case other =>
-  throw InvalidPlanInput(
-s"MessageClassName in $functionName should be a literal string, 
but got $other")
-  }
-  val (binaryFileDescSetOpt, options) = if (argumentsCount == 2) {
-(None, Map.empty[String, String])
-  } else if (argumentsCount == 3) {
-children(2) match {
-  case Literal(b, BinaryType) if b != null =>
-(Some(b.asInstanceOf[Array[Byte]]), Map.empty[String, String])
-  case UnresolvedFunction(Seq("map"), arguments, _, _, _, _) =>
-(None, ExprUtils.convertToMapData(CreateMap(arguments)))
-  case other =>
-throw InvalidPlanInput(
-  s"The valid type for the 3rd arg in $functionName " +
-s"is binary or map, but got $other")
-}
-  } else if (argumentsCount == 4) {
-val fileDescSetOpt = children(2) match {
-  case Literal(b, BinaryType) if b != null =>
-Some(b.asInstanceOf[Array[Byte]])
-  case other =>
-throw InvalidPlanInput(
-  s"DescFilePath in $functionName should be a literal binary, but 
got $other")
-}
-val map = children(3) match {
-  case UnresolvedFunction(Seq("map"), arguments, _, _, _, _) =>
-ExprUtils.convertToMapData(CreateMap(arguments))
-  case other =>
-throw InvalidPlanInput(
-  s"Options in $functionName should be created by map, but got 
$other")
-}
-(fileDescSetOpt, map)
-  } else {
-throw InvalidPlanInput(
-  s"$functionName requires 2 ~ 4 arguments, but got $argumentsCount 
ones!")
-  }
-  (messageClassName, binaryFileDescSetOpt, options)
-}
-
 fun.getFunctionName match {
   case "product" if fun.getArgumentsCount == 1 =>
 Some(
@@ -1979,17 +1932,13 @@ class SparkConnectPlanner(
   // Protobuf-specific functions
   case "from_protobuf" if Seq(2, 3, 4).contains(fun.getArgumentsCount) =>
 val children = fun.getArgumentsList.asScala.map(transformExpression)
-val (messageClassName, binaryFileDescSetOpt, options) =
-  extractArgsOfProtobufFunction("from_protobuf", 
fun.getArgumentsCount, children)
-Some(
-  ProtobufDataToCatalyst(children.head, messageClassName, 
binaryFileDescSetOpt, options))
+val (msgName, desc, options) = extractProtobufArgs(children.toSeq)
+Some(ProtobufDataToCatalyst(children(0), msgName, desc, options))
 
   case "to_protobuf" if Seq(2, 3, 4).contains(fun.getArgumentsCount) =>
 val children = fun.getArgumentsList.asScala.map(transformExpression)
-val (messageClassName, bi

Re: [PR] Add instructions for running docker integration tests [spark-website]

2024-01-28 Thread via GitHub


yaooqinn commented on PR #499:
URL: https://github.com/apache/spark-website/pull/499#issuecomment-1913999403

   Thank you @dongjoon-hyun 





(spark) branch master updated: [SPARK-46897][PYTHON][DOCS] Refine docstring of `bit_and/bit_or/bit_xor`

2024-01-28 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 5056a17919ac [SPARK-46897][PYTHON][DOCS] Refine docstring of 
`bit_and/bit_or/bit_xor`
5056a17919ac is described below

commit 5056a17919ac88d35475dd13ae4167e783f9504a
Author: yangjie01 
AuthorDate: Sun Jan 28 21:33:39 2024 -0800

[SPARK-46897][PYTHON][DOCS] Refine docstring of `bit_and/bit_or/bit_xor`

### What changes were proposed in this pull request?
This PR refines the docstring of `bit_and/bit_or/bit_xor` and adds some new 
examples.

### Why are the changes needed?
To improve PySpark documentation

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Pass GitHub Actions

### Was this patch authored or co-authored using generative AI tooling?
No

Closes #44923 from LuciferYang/SPARK-46897.

Authored-by: yangjie01 
Signed-off-by: Dongjoon Hyun 
---
 python/pyspark/sql/functions/builtin.py | 138 ++--
 1 file changed, 132 insertions(+), 6 deletions(-)

diff --git a/python/pyspark/sql/functions/builtin.py 
b/python/pyspark/sql/functions/builtin.py
index d3a94fe4b9e9..0932ac1c2843 100644
--- a/python/pyspark/sql/functions/builtin.py
+++ b/python/pyspark/sql/functions/builtin.py
@@ -3790,9 +3790,51 @@ def bit_and(col: "ColumnOrName") -> Column:
 
 Examples
 
+Example 1: Bitwise AND with all non-null values
+
+>>> from pyspark.sql import functions as sf
 >>> df = spark.createDataFrame([[1],[1],[2]], ["c"])
->>> df.select(bit_and("c")).first()
-Row(bit_and(c)=0)
+>>> df.select(sf.bit_and("c")).show()
++--+
+|bit_and(c)|
++--+
+| 0|
++--+
+
+Example 2: Bitwise AND with null values
+
+>>> from pyspark.sql import functions as sf
+>>> df = spark.createDataFrame([[1],[None],[2]], ["c"])
+>>> df.select(sf.bit_and("c")).show()
++--+
+|bit_and(c)|
++--+
+| 0|
++--+
+
+Example 3: Bitwise AND with all null values
+
+>>> from pyspark.sql import functions as sf
+>>> from pyspark.sql.types import IntegerType, StructType, StructField
+>>> schema = StructType([StructField("c", IntegerType(), True)])
+>>> df = spark.createDataFrame([[None],[None],[None]], schema=schema)
+>>> df.select(sf.bit_and("c")).show()
++--+
+|bit_and(c)|
++--+
+|  NULL|
++--+
+
+Example 4: Bitwise AND with single input value
+
+>>> from pyspark.sql import functions as sf
+>>> df = spark.createDataFrame([[5]], ["c"])
+>>> df.select(sf.bit_and("c")).show()
++--+
+|bit_and(c)|
++--+
+| 5|
++--+
 """
 return _invoke_function_over_columns("bit_and", col)
 
@@ -3816,9 +3858,51 @@ def bit_or(col: "ColumnOrName") -> Column:
 
 Examples
 
+Example 1: Bitwise OR with all non-null values
+
+>>> from pyspark.sql import functions as sf
 >>> df = spark.createDataFrame([[1],[1],[2]], ["c"])
->>> df.select(bit_or("c")).first()
-Row(bit_or(c)=3)
+>>> df.select(sf.bit_or("c")).show()
++-+
+|bit_or(c)|
++-+
+|3|
++-+
+
+Example 2: Bitwise OR with some null values
+
+>>> from pyspark.sql import functions as sf
+>>> df = spark.createDataFrame([[1],[None],[2]], ["c"])
+>>> df.select(sf.bit_or("c")).show()
++-+
+|bit_or(c)|
++-+
+|3|
++-+
+
+Example 3: Bitwise OR with all null values
+
+>>> from pyspark.sql import functions as sf
+>>> from pyspark.sql.types import IntegerType, StructType, StructField
+>>> schema = StructType([StructField("c", IntegerType(), True)])
+>>> df = spark.createDataFrame([[None],[None],[None]], schema=schema)
+>>> df.select(sf.bit_or("c")).show()
++-+
+|bit_or(c)|
++-+
+| NULL|
++-+
+
+Example 4: Bitwise OR with single input value
+
+>>> from pyspark.sql import functions as sf
+>>> df = spark.createDataFrame([[5]], ["c"])
+>>> df.select(sf.bit_or("c")).show()
++-+
+|bit_or(c)|
++-+
+|5|
++-+
 """
 return _invoke_function_over_columns("bit_or", col)
 
@@ -3842,9 +3926,51 @@ def bit_xor(col: "ColumnOrName") -> Column:
 
 Examples
 
+Example 1: Bitwise XOR with all non-null values
+
+>>> from pyspark.sql import functions as sf
 >>> df = spark.createDataFrame([[1],[1],[2]], ["c"])
->>> df.select(bit_xor("c")).first()
-Row(bit_xor(c)=2)
+>>> df.select(sf.bit_xor("c")).show()
++--+
+|bit_xor(c)|
++-

(spark) branch master updated: [SPARK-46896][PS][TESTS] Clean up the imports in `pyspark.pandas.tests.{frame, series, groupby}.*`

2024-01-28 Thread ruifengz
This is an automated email from the ASF dual-hosted git repository.

ruifengz pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 8cd0d1854da0 [SPARK-46896][PS][TESTS] Clean up the imports in 
`pyspark.pandas.tests.{frame, series, groupby}.*`
8cd0d1854da0 is described below

commit 8cd0d1854da04334aff3188e4eca08a48f734579
Author: Ruifeng Zheng 
AuthorDate: Mon Jan 29 12:00:18 2024 +0800

[SPARK-46896][PS][TESTS] Clean up the imports in 
`pyspark.pandas.tests.{frame, series, groupby}.*`

### What changes were proposed in this pull request?
1. Remove unused imports;
2. Define the test datasets only once on the vanilla side, so that they don't 
need to be defined again in the parity tests.

### Why are the changes needed?
code clean up

### Does this PR introduce _any_ user-facing change?
no, test-only

### How was this patch tested?
ci

### Was this patch authored or co-authored using generative AI tooling?
no

Closes #44922 from zhengruifeng/ps_test_frame_ser_cleanup.

Authored-by: Ruifeng Zheng 
Signed-off-by: Ruifeng Zheng 
---
 .../pyspark/pandas/tests/connect/frame/test_parity_attrs.py  | 11 ++-
 .../pyspark/pandas/tests/connect/frame/test_parity_axis.py   |  6 +-
 .../pandas/tests/connect/frame/test_parity_constructor.py|  4 +++-
 .../pandas/tests/connect/frame/test_parity_conversion.py |  9 -
 .../pandas/tests/connect/frame/test_parity_reindexing.py |  9 -
 .../pandas/tests/connect/frame/test_parity_reshaping.py  |  6 +-
 .../pyspark/pandas/tests/connect/frame/test_parity_spark.py  | 11 ++-
 .../pandas/tests/connect/frame/test_parity_time_series.py|  9 -
 .../pandas/tests/connect/frame/test_parity_truncate.py   | 11 ++-
 .../pandas/tests/connect/groupby/test_parity_aggregate.py|  4 +++-
 .../pandas/tests/connect/groupby/test_parity_apply_func.py   |  4 +++-
 .../pandas/tests/connect/groupby/test_parity_cumulative.py   |  4 +++-
 .../pandas/tests/connect/groupby/test_parity_describe.py |  4 +++-
 .../pandas/tests/connect/groupby/test_parity_groupby.py  |  5 -
 .../pandas/tests/connect/groupby/test_parity_head_tail.py|  4 +++-
 .../pandas/tests/connect/groupby/test_parity_index.py|  6 +-
 .../pandas/tests/connect/groupby/test_parity_missing_data.py |  4 +++-
 .../pandas/tests/connect/series/test_parity_all_any.py   |  6 +-
 .../pandas/tests/connect/series/test_parity_arg_ops.py   |  6 +-
 .../pyspark/pandas/tests/connect/series/test_parity_as_of.py |  6 +-
 .../pandas/tests/connect/series/test_parity_as_type.py   |  6 +-
 .../pandas/tests/connect/series/test_parity_compute.py   |  6 +-
 .../pandas/tests/connect/series/test_parity_conversion.py|  4 +++-
 .../pandas/tests/connect/series/test_parity_cumulative.py|  4 +++-
 .../pyspark/pandas/tests/connect/series/test_parity_index.py |  6 +-
 .../pandas/tests/connect/series/test_parity_missing_data.py  |  4 +++-
 .../pandas/tests/connect/series/test_parity_series.py|  6 +-
 .../pyspark/pandas/tests/connect/series/test_parity_sort.py  |  6 +-
 .../pyspark/pandas/tests/connect/series/test_parity_stat.py  |  6 +-
 .../tests/connect/series/test_parity_string_ops_adv.py   |  4 +++-
 .../tests/connect/series/test_parity_string_ops_basic.py |  4 +++-
 python/pyspark/pandas/tests/frame/test_attrs.py  | 12 ++--
 python/pyspark/pandas/tests/frame/test_axis.py   |  8 ++--
 python/pyspark/pandas/tests/frame/test_constructor.py|  8 ++--
 python/pyspark/pandas/tests/frame/test_conversion.py | 12 ++--
 python/pyspark/pandas/tests/frame/test_interpolate.py|  6 +-
 python/pyspark/pandas/tests/frame/test_reindexing.py |  8 ++--
 python/pyspark/pandas/tests/frame/test_reshaping.py  |  8 ++--
 python/pyspark/pandas/tests/frame/test_spark.py  | 12 ++--
 python/pyspark/pandas/tests/frame/test_time_series.py|  8 ++--
 python/pyspark/pandas/tests/frame/test_truncate.py   |  4 ++--
 python/pyspark/pandas/tests/groupby/test_aggregate.py|  8 ++--
 python/pyspark/pandas/tests/groupby/test_apply_func.py   |  8 ++--
 python/pyspark/pandas/tests/groupby/test_cumulative.py   |  8 ++--
 python/pyspark/pandas/tests/groupby/test_describe.py |  8 ++--
 python/pyspark/pandas/tests/groupby/test_groupby.py  | 12 +---
 python/pyspark/pandas/tests/groupby/test_grouping.py |  5 -
 python/pyspark/pandas/tests/groupby/test_head_tail.py|  8 ++--
 python/pyspark/pandas/tests/groupby/test_index.py|  8 ++--
 python/pyspark/pandas/tests/groupby/test_missing.py  |  5 -
 python/p

Re: [PR] Add instructions for running docker integration tests [spark-website]

2024-01-28 Thread via GitHub


yaooqinn commented on PR #499:
URL: https://github.com/apache/spark-website/pull/499#issuecomment-1913886172

   Thank you @srowen, merged to asf-site





Re: [PR] Add instructions for running docker integration tests [spark-website]

2024-01-28 Thread via GitHub


yaooqinn merged PR #499:
URL: https://github.com/apache/spark-website/pull/499





(spark-website) branch asf-site updated: Add instructions for running docker integration tests (#499)

2024-01-28 Thread yao
This is an automated email from the ASF dual-hosted git repository.

yao pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/spark-website.git


The following commit(s) were added to refs/heads/asf-site by this push:
 new 9476ea428d Add instructions for running docker integration tests (#499)
9476ea428d is described below

commit 9476ea428d8aec2c8f1fdf2252a28fb22e208930
Author: Kent Yao 
AuthorDate: Mon Jan 29 11:09:44 2024 +0800

Add instructions for running docker integration tests (#499)
---
 developer-tools.md| 13 +
 site/developer-tools.html | 13 +
 2 files changed, 18 insertions(+), 8 deletions(-)

diff --git a/developer-tools.md b/developer-tools.md
index 34087a874c..bd0da296a7 100644
--- a/developer-tools.md
+++ b/developer-tools.md
@@ -11,9 +11,9 @@ navigation:
 
 Apache Spark community uses various resources to maintain the community test 
coverage.
 
-GitHub Action
+GitHub Actions
 
-[GitHub Action](https://github.com/apache/spark/actions) provides the 
following on Ubuntu 22.04.
+[GitHub Actions](https://github.com/apache/spark/actions) provides the 
following on Ubuntu 22.04.
 
 Apache Spark 4
 
@@ -204,11 +204,16 @@ Please check other available options via 
`python/run-tests[-with-coverage] --hel
 
 Testing K8S
 
-Although GitHub Action provide both K8s unit test and integration test 
coverage, you can run it locally. For example, Volcano batch scheduler 
integration test should be done manually. Please refer the integration test 
documentation for the detail.
+Although GitHub Actions provide both K8s unit test and integration test 
coverage, you can run it locally. For example, Volcano batch scheduler 
integration test should be done manually. Please refer the integration test 
documentation for the detail.
 
 
[https://github.com/apache/spark/blob/master/resource-managers/kubernetes/integration-tests/README.md](https://github.com/apache/spark/blob/master/resource-managers/kubernetes/integration-tests/README.md)
 
-Testing with GitHub actions workflow
+Running the Docker integration tests
+
+Docker integration tests are covered by GitHub Actions. However, you can run 
it locally to speed up development and testing.
+Please refer the [Docker integration test 
documentation](https://github.com/apache/spark/blob/master/connector/docker-integration-tests/README.md)
 for the detail.
+
+Testing with GitHub Actions workflow
 
 Apache Spark leverages GitHub Actions that enables continuous integration and 
a wide range of automation. Apache Spark repository provides several GitHub 
Actions workflows for developers to run before creating a pull request.
 
diff --git a/site/developer-tools.html b/site/developer-tools.html
index d4251cb4d1..4470efbc87 100644
--- a/site/developer-tools.html
+++ b/site/developer-tools.html
@@ -143,9 +143,9 @@
 
 Apache Spark community uses various resources to maintain the community 
test coverage.
 
-GitHub Action
+GitHub Actions
 
-https://github.com/apache/spark/actions";>GitHub Action 
provides the following on Ubuntu 22.04.
+https://github.com/apache/spark/actions";>GitHub Actions 
provides the following on Ubuntu 22.04.
 
 Apache Spark 4
 
@@ -329,11 +329,16 @@ Generating HTML files for PySpark coverage under 
/.../spark/python/test_coverage
 
 Testing K8S
 
-Although GitHub Action provide both K8s unit test and integration test 
coverage, you can run it locally. For example, Volcano batch scheduler 
integration test should be done manually. Please refer the integration test 
documentation for the detail.
+Although GitHub Actions provide both K8s unit test and integration test 
coverage, you can run it locally. For example, Volcano batch scheduler 
integration test should be done manually. Please refer the integration test 
documentation for the detail.
 
 https://github.com/apache/spark/blob/master/resource-managers/kubernetes/integration-tests/README.md";>https://github.com/apache/spark/blob/master/resource-managers/kubernetes/integration-tests/README.md
 
-Testing with GitHub actions workflow
+Running the Docker integration tests
+
+Docker integration tests are covered by GitHub Actions. However, you can 
run it locally to speed up development and testing.
+Please refer the https://github.com/apache/spark/blob/master/connector/docker-integration-tests/README.md";>Docker
 integration test documentation for the detail.
+
+Testing with GitHub Actions workflow
 
 Apache Spark leverages GitHub Actions that enables continuous integration 
and a wide range of automation. Apache Spark repository provides several GitHub 
Actions workflows for developers to run before creating a pull request.
 





(spark) branch master updated (f078998df2f3 -> bb2195554e6d)

2024-01-28 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


from f078998df2f3 [MINOR][DOCS] Miscellaneous documentation improvements
 add bb2195554e6d [SPARK-46874][PYTHON] Remove `pyspark.pandas` dependency 
from `assertDataFrameEqual`

No new revisions were added by this update.

Summary of changes:
 python/pyspark/testing/utils.py | 15 ---
 1 file changed, 12 insertions(+), 3 deletions(-)





(spark-website) branch asf-site updated: update (#498)

2024-01-28 Thread yao
This is an automated email from the ASF dual-hosted git repository.

yao pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/spark-website.git


The following commit(s) were added to refs/heads/asf-site by this push:
 new 6e03f8f78b update (#498)
6e03f8f78b is described below

commit 6e03f8f78ba753c6b2f42f4fe5e346dd2f1879ac
Author: Kent Yao 
AuthorDate: Mon Jan 29 10:46:33 2024 +0800

update (#498)
---
 Gemfile.lock | 30 +++---
 1 file changed, 15 insertions(+), 15 deletions(-)

diff --git a/Gemfile.lock b/Gemfile.lock
index f1e8cf7c6e..f4dedba223 100644
--- a/Gemfile.lock
+++ b/Gemfile.lock
@@ -1,18 +1,18 @@
 GEM
   remote: https://rubygems.org/
   specs:
-addressable (2.8.1)
+addressable (2.8.6)
   public_suffix (>= 2.0.2, < 6.0)
 colorator (1.1.0)
-concurrent-ruby (1.1.8)
-em-websocket (0.5.2)
+concurrent-ruby (1.2.3)
+em-websocket (0.5.3)
   eventmachine (>= 0.12.9)
-  http_parser.rb (~> 0.6.0)
+  http_parser.rb (~> 0)
 eventmachine (1.2.7)
-ffi (1.14.2)
+ffi (1.16.3)
 forwardable-extended (2.6.0)
-http_parser.rb (0.6.0)
-i18n (1.8.9)
+http_parser.rb (0.8.0)
+i18n (1.14.1)
   concurrent-ruby (~> 1.0)
 jekyll (4.2.0)
   addressable (~> 2.4)
@@ -29,7 +29,7 @@ GEM
   rouge (~> 3.0)
   safe_yaml (~> 1.0)
   terminal-table (~> 2.0)
-jekyll-sass-converter (2.1.0)
+jekyll-sass-converter (2.2.0)
   sassc (> 2.0.1, < 3.0)
 jekyll-watch (2.2.1)
   listen (~> 3.0)
@@ -38,25 +38,25 @@ GEM
 kramdown-parser-gfm (1.1.0)
   kramdown (~> 2.0)
 liquid (4.0.4)
-listen (3.4.1)
+listen (3.8.0)
   rb-fsevent (~> 0.10, >= 0.10.3)
   rb-inotify (~> 0.9, >= 0.9.10)
 mercenary (0.4.0)
 pathutil (0.16.2)
   forwardable-extended (~> 2.6)
-public_suffix (5.0.0)
-rb-fsevent (0.10.4)
+public_suffix (5.0.4)
+rb-fsevent (0.11.2)
 rb-inotify (0.10.1)
   ffi (~> 1.0)
-rexml (3.2.5)
+rexml (3.2.6)
 rouge (3.26.0)
 safe_yaml (1.0.5)
 sassc (2.4.0)
   ffi (~> 1.9)
 terminal-table (2.0.0)
   unicode-display_width (~> 1.1, >= 1.1.1)
-unicode-display_width (1.7.0)
-webrick (1.7.0)
+unicode-display_width (1.8.0)
+webrick (1.8.1)
 
 PLATFORMS
   ruby
@@ -67,4 +67,4 @@ DEPENDENCIES
   webrick (~> 1.7)
 
 BUNDLED WITH
-   2.3.7
+   2.4.19





Re: [PR] Upgrade bundle dependencies for doc build [spark-website]

2024-01-28 Thread via GitHub


yaooqinn merged PR #498:
URL: https://github.com/apache/spark-website/pull/498





Re: [PR] Upgrade bundle dependencies for doc build [spark-website]

2024-01-28 Thread via GitHub


yaooqinn commented on PR #498:
URL: https://github.com/apache/spark-website/pull/498#issuecomment-1913868673

   Thank you @srowen, merged to asf-site





[PR] Add instructions for running docker integration tests [spark-website]

2024-01-28 Thread via GitHub


yaooqinn opened a new pull request, #499:
URL: https://github.com/apache/spark-website/pull/499

   
   





[PR] Upgrade bundle dependencies for doc build [spark-website]

2024-01-28 Thread via GitHub


yaooqinn opened a new pull request, #498:
URL: https://github.com/apache/spark-website/pull/498

   On my Mac M2, I failed to gen the docs via `bundle exec jekyll build`
   
   
   ```
   
/Users/hzyaoqin/spark-website/.local_ruby_bundle/ruby/2.6.0/gems/ffi-1.14.2/lib/ffi/library.rb:275:
 [BUG] Bus Error at 0x0001025b4000
   ruby 2.6.10p210 (2022-04-12 revision 67958) [universal.arm64e-darwin23]
   
   -- Crash Report log information 
  See Crash Report log file under the one of following:
* ~/Library/Logs/DiagnosticReports
* /Library/Logs/DiagnosticReports
  for more details.
   Don't forget to include the above Crash Report log file in bug reports.
   
   -- Control frame information ---
   ```
   
   After `bundle update`, it's done
   
   ```
   Configuration file: /Users/hzyaoqin/spark-website/_config.yml
   Source: /Users/hzyaoqin/spark-website
  Destination: /Users/hzyaoqin/spark-website/site
Incremental build: disabled. Enable with --incremental
 Generating...
   done in 3.648 seconds.
   ```





(spark) branch master updated: [MINOR][DOCS] Miscellaneous documentation improvements

2024-01-28 Thread gurwls223
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new f078998df2f3 [MINOR][DOCS] Miscellaneous documentation improvements
f078998df2f3 is described below

commit f078998df2f3ad61a33b72b2dae18de4951cd15f
Author: Nicholas Chammas 
AuthorDate: Mon Jan 29 10:06:07 2024 +0900

[MINOR][DOCS] Miscellaneous documentation improvements

### What changes were proposed in this pull request?

- Improve the formatting of various code snippets.
- Fix some broken links in the documentation.
- Clarify the non-intuitive behavior of `displayValue` in 
`getAllDefinedConfs()`.

### Why are the changes needed?

These are minor quality of life improvements for users and developers alike.

### Does this PR introduce _any_ user-facing change?

Yes, it tweaks some of the links in user-facing documentation.

### How was this patch tested?

Not tested beyond CI.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #44919 from nchammas/misc-doc-fixes.

Authored-by: Nicholas Chammas 
Signed-off-by: Hyukjin Kwon 
---
 docs/configuration.md| 16 ++--
 docs/mllib-dimensionality-reduction.md   |  4 +++-
 docs/rdd-programming-guide.md|  6 --
 docs/sql-data-sources-avro.md|  5 +++--
 .../scala/org/apache/spark/sql/internal/SQLConf.scala|  7 ++-
 5 files changed, 26 insertions(+), 12 deletions(-)

diff --git a/docs/configuration.md b/docs/configuration.md
index e771c323d369..7fef09781a15 100644
--- a/docs/configuration.md
+++ b/docs/configuration.md
@@ -88,10 +88,14 @@ val sc = new SparkContext(new SparkConf())
 {% endhighlight %}
 
 Then, you can supply configuration values at runtime:
-{% highlight bash %}
-./bin/spark-submit --name "My app" --master local[4] --conf 
spark.eventLog.enabled=false
-  --conf "spark.executor.extraJavaOptions=-XX:+PrintGCDetails 
-XX:+PrintGCTimeStamps" myApp.jar
-{% endhighlight %}
+```sh
+./bin/spark-submit \
+  --name "My app" \
+  --master local[4] \
+  --conf spark.eventLog.enabled=false \
+  --conf "spark.executor.extraJavaOptions=-XX:+PrintGCDetails 
-XX:+PrintGCTimeStamps" \
+  myApp.jar
+```
 
 The Spark shell and [`spark-submit`](submitting-applications.html)
 tool support two ways to load configurations dynamically. The first is command 
line options,
@@ -3708,9 +3712,9 @@ Also, you can modify or add configurations at runtime:
 GPUs and other accelerators have been widely used for accelerating special 
workloads, e.g.,
 deep learning and signal processing. Spark now supports requesting and 
scheduling generic resources, such as GPUs, with a few caveats. The current 
implementation requires that the resource have addresses that can be allocated 
by the scheduler. It requires your cluster manager to support and be properly 
configured with the resources.
 
-There are configurations available to request resources for the driver: 
spark.driver.resource.{resourceName}.amount, request resources for 
the executor(s): spark.executor.resource.{resourceName}.amount and 
specify the requirements for each task: 
spark.task.resource.{resourceName}.amount. The 
spark.driver.resource.{resourceName}.discoveryScript config is 
required on YARN, Kubernetes and a client side Driver on Spark Standalone. 
spa [...]
+There are configurations available to request resources for the driver: 
`spark.driver.resource.{resourceName}.amount`, request resources for the 
executor(s): `spark.executor.resource.{resourceName}.amount` and specify the 
requirements for each task: `spark.task.resource.{resourceName}.amount`. The 
`spark.driver.resource.{resourceName}.discoveryScript` config is required on 
YARN, Kubernetes and a client side Driver on Spark Standalone. 
`spark.executor.resource.{resourceName}.discoveryScri [...]
 
-Spark will use the configurations specified to first request containers with 
the corresponding resources from the cluster manager. Once it gets the 
container, Spark launches an Executor in that container which will discover 
what resources the container has and the addresses associated with each 
resource. The Executor will register with the Driver and report back the 
resources available to that Executor. The Spark scheduler can then schedule 
tasks to each Executor and assign specific reso [...]
+Spark will use the configurations specified to first request containers with 
the corresponding resources from the cluster manager. Once it gets the 
container, Spark launches an Executor in that container which will discover 
what resources the container has and the addresses associated with each 
resource. The Executor will register with the Driver

(spark) branch master updated: [MINOR][DOCS] Remove unneeded comments from global.html

2024-01-28 Thread gurwls223
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 901850cab748 [MINOR][DOCS] Remove unneeded comments from global.html
901850cab748 is described below

commit 901850cab748fae6b9ebab88eda82f6314a2691c
Author: Nicholas Chammas 
AuthorDate: Mon Jan 29 10:05:21 2024 +0900

[MINOR][DOCS] Remove unneeded comments from global.html

### What changes were proposed in this pull request?

Remove some unneeded comments from global.html.

### Why are the changes needed?

They are just noise. They don't appear to do anything (they are not Jekyll 
directives).

For the record, Internet Explorer 8, 9, and 10 were [sunset in 2020][1]. 
Internet Explorer 7 was sunset [last year][2].

[1]: 
https://learn.microsoft.com/en-us/lifecycle/products/internet-explorer-10
[2]: 
https://learn.microsoft.com/en-us/lifecycle/products/internet-explorer-7

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

I built the docs with `SKIP_API=1` and confirmed nothing broke.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #44921 from nchammas/global-html-comments.

Authored-by: Nicholas Chammas 
Signed-off-by: Hyukjin Kwon 
---
 docs/_layouts/global.html | 11 +--
 1 file changed, 1 insertion(+), 10 deletions(-)

diff --git a/docs/_layouts/global.html b/docs/_layouts/global.html
index 6acffe8a405d..c61c9349a6d7 100755
--- a/docs/_layouts/global.html
+++ b/docs/_layouts/global.html
@@ -1,9 +1,5 @@
-
 
-
-
-
-  
+
 
 
 
@@ -53,12 +49,7 @@
 
 
 
-
-
 
-
 
 
 {{site.SPARK_VERSION_SHORT}}





(spark) branch master updated (89d86e617da2 -> 02c945d6ab61)

2024-01-28 Thread gurwls223
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


from 89d86e617da2 [SPARK-46873][SS] Do not recreate new 
StreamingQueryManager for the same Spark Session
 add 02c945d6ab61 [SPARK-46889][CORE] Validate 
`spark.master.ui.decommission.allow.mode` setting

No new revisions were added by this update.

Summary of changes:
 core/src/main/scala/org/apache/spark/internal/config/UI.scala | 1 +
 1 file changed, 1 insertion(+)





(spark) branch master updated: [SPARK-46873][SS] Do not recreate new StreamingQueryManager for the same Spark Session

2024-01-28 Thread gurwls223
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 89d86e617da2 [SPARK-46873][SS] Do not recreate new 
StreamingQueryManager for the same Spark Session
89d86e617da2 is described below

commit 89d86e617da2d0346cdf862d975a87c24c9a9f5c
Author: Wei Liu 
AuthorDate: Mon Jan 29 08:53:15 2024 +0900

[SPARK-46873][SS] Do not recreate new StreamingQueryManager for the same 
Spark Session

### What changes were proposed in this pull request?

In Scala, there is only one streaming query manager for one spark session:

```
scala> spark.streams
val res0: org.apache.spark.sql.streaming.StreamingQueryManager = 
org.apache.spark.sql.streaming.StreamingQueryManager46bb8cba

scala> spark.streams
val res1: org.apache.spark.sql.streaming.StreamingQueryManager = 
org.apache.spark.sql.streaming.StreamingQueryManager46bb8cba

scala> spark.streams
val res2: org.apache.spark.sql.streaming.StreamingQueryManager = 
org.apache.spark.sql.streaming.StreamingQueryManager46bb8cba

scala> spark.streams
val res3: org.apache.spark.sql.streaming.StreamingQueryManager = 
org.apache.spark.sql.streaming.StreamingQueryManager46bb8cba
```

In Python, this is currently false for both connect and vanilla spark:

```
>>> spark.streams

>>> spark.streams

>>> spark.streams

>>> spark.streams

```
This PR makes the Spark session reuse the existing streaming query manager.

### Why are the changes needed?

Python should align with Scala's behavior.

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

Added unit test

### Was this patch authored or co-authored using generative AI tooling?

No

Closes #44898 from WweiL/SPARK-46873-sqm-reuse.

Authored-by: Wei Liu 
Signed-off-by: Hyukjin Kwon 
---
 python/pyspark/sql/connect/session.py| 5 -
 python/pyspark/sql/session.py| 5 -
 python/pyspark/sql/tests/streaming/test_streaming.py | 6 ++
 3 files changed, 14 insertions(+), 2 deletions(-)

diff --git a/python/pyspark/sql/connect/session.py 
b/python/pyspark/sql/connect/session.py
index 9700f72cdcf1..19f66072133c 100644
--- a/python/pyspark/sql/connect/session.py
+++ b/python/pyspark/sql/connect/session.py
@@ -704,7 +704,10 @@ class SparkSession:
 
 @property
 def streams(self) -> "StreamingQueryManager":
-return StreamingQueryManager(self)
+if hasattr(self, "_sqm"):
+return self._sqm
+self._sqm: StreamingQueryManager = StreamingQueryManager(self)
+return self._sqm
 
 streams.__doc__ = PySparkSession.streams.__doc__
 
diff --git a/python/pyspark/sql/session.py b/python/pyspark/sql/session.py
index 6265f4fbe809..b813cf17ced3 100644
--- a/python/pyspark/sql/session.py
+++ b/python/pyspark/sql/session.py
@@ -1825,7 +1825,10 @@ class SparkSession(SparkConversionMixin):
 """
 from pyspark.sql.streaming import StreamingQueryManager
 
-return StreamingQueryManager(self._jsparkSession.streams())
+if hasattr(self, "_sqm"):
+return self._sqm
+self._sqm: StreamingQueryManager = 
StreamingQueryManager(self._jsparkSession.streams())
+return self._sqm
 
 def stop(self) -> None:
 """
diff --git a/python/pyspark/sql/tests/streaming/test_streaming.py 
b/python/pyspark/sql/tests/streaming/test_streaming.py
index a7c22897096b..31486feae156 100644
--- a/python/pyspark/sql/tests/streaming/test_streaming.py
+++ b/python/pyspark/sql/tests/streaming/test_streaming.py
@@ -294,6 +294,12 @@ class StreamingTestsMixin:
 self.assertIsInstance(exception, StreamingQueryException)
 self._assert_exception_tree_contains_msg(exception, 
"ZeroDivisionError")
 
+def test_query_manager_no_recreation(self):
+# SPARK-46873: There should not be a new StreamingQueryManager created 
every time
+# spark.streams is called.
+for i in range(5):
+self.assertTrue(self.spark.streams == self.spark.streams)
+
 def test_query_manager_get(self):
 df = self.spark.readStream.format("rate").load()
 for q in self.spark.streams.active:





(spark) branch master updated: [SPARK-46892][BUILD] Upgrade dropwizard metrics 4.2.25

2024-01-28 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new d74aecd11dcd [SPARK-46892][BUILD] Upgrade dropwizard metrics 4.2.25
d74aecd11dcd is described below

commit d74aecd11dcd1c8414b662457e49b6001395bb8d
Author: panbingkun 
AuthorDate: Sun Jan 28 12:12:02 2024 -0800

[SPARK-46892][BUILD] Upgrade dropwizard metrics 4.2.25

### What changes were proposed in this pull request?
The pr aims to upgrade dropwizard metrics from `4.2.21` to `4.2.25`.

### Why are the changes needed?
The last update occurred 3 months ago.

- The new version brings some bug fixes:
  Fix IndexOutOfBoundsException in Jetty 9, 10, 11, 12 InstrumentedHandler 
https://github.com/dropwizard/metrics/pull/3912

- The full version release notes:
  https://github.com/dropwizard/metrics/releases/tag/v4.2.25
  https://github.com/dropwizard/metrics/releases/tag/v4.2.24
  https://github.com/dropwizard/metrics/releases/tag/v4.2.23
  https://github.com/dropwizard/metrics/releases/tag/v4.2.22

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
Pass GA.

### Was this patch authored or co-authored using generative AI tooling?
No.

Closes #44918 from panbingkun/SPARK-46892.

Authored-by: panbingkun 
Signed-off-by: Dongjoon Hyun 
---
 dev/deps/spark-deps-hadoop-3-hive-2.3 | 10 +-
 pom.xml   |  2 +-
 2 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/dev/deps/spark-deps-hadoop-3-hive-2.3 
b/dev/deps/spark-deps-hadoop-3-hive-2.3
index 71f9ac8665b0..09291de50350 100644
--- a/dev/deps/spark-deps-hadoop-3-hive-2.3
+++ b/dev/deps/spark-deps-hadoop-3-hive-2.3
@@ -185,11 +185,11 @@ log4j-core/2.22.1//log4j-core-2.22.1.jar
 log4j-slf4j2-impl/2.22.1//log4j-slf4j2-impl-2.22.1.jar
 logging-interceptor/3.12.12//logging-interceptor-3.12.12.jar
 lz4-java/1.8.0//lz4-java-1.8.0.jar
-metrics-core/4.2.21//metrics-core-4.2.21.jar
-metrics-graphite/4.2.21//metrics-graphite-4.2.21.jar
-metrics-jmx/4.2.21//metrics-jmx-4.2.21.jar
-metrics-json/4.2.21//metrics-json-4.2.21.jar
-metrics-jvm/4.2.21//metrics-jvm-4.2.21.jar
+metrics-core/4.2.25//metrics-core-4.2.25.jar
+metrics-graphite/4.2.25//metrics-graphite-4.2.25.jar
+metrics-jmx/4.2.25//metrics-jmx-4.2.25.jar
+metrics-json/4.2.25//metrics-json-4.2.25.jar
+metrics-jvm/4.2.25//metrics-jvm-4.2.25.jar
 minlog/1.3.0//minlog-1.3.0.jar
 netty-all/4.1.106.Final//netty-all-4.1.106.Final.jar
 netty-buffer/4.1.106.Final//netty-buffer-4.1.106.Final.jar
diff --git a/pom.xml b/pom.xml
index d4e8a7db71de..a5f2b6f74b7a 100644
--- a/pom.xml
+++ b/pom.xml
@@ -156,7 +156,7 @@
 If you change codahale.metrics.version, you also need to change
 the link to metrics.dropwizard.io in docs/monitoring.md.
 -->
-4.2.21
+4.2.25
 
 1.11.3
 1.12.0

