(spark) branch master updated: [SPARK-46900][BUILD] Upgrade slf4j to 2.0.11
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new a368280708dd [SPARK-46900][BUILD] Upgrade slf4j to 2.0.11

a368280708dd is described below

commit a368280708dd3c6eb90bd3b09a36a68bdd096222
Author: yangjie01
AuthorDate: Sun Jan 28 23:42:37 2024 -0800

    [SPARK-46900][BUILD] Upgrade slf4j to 2.0.11

    ### What changes were proposed in this pull request?
    This PR aims to upgrade slf4j from 2.0.10 to 2.0.11.

    ### Why are the changes needed?
    This release reinstates the `renderLevel()` method in SimpleLogger, which was removed by mistake. The full release notes are as follows:
    - https://www.slf4j.org/news.html#2.0.11

    ### Does this PR introduce _any_ user-facing change?
    No.

    ### How was this patch tested?
    Pass GitHub Actions.

    ### Was this patch authored or co-authored using generative AI tooling?
    No.

    Closes #44928 from LuciferYang/SPARK-46900.
Authored-by: yangjie01 Signed-off-by: Dongjoon Hyun --- dev/deps/spark-deps-hadoop-3-hive-2.3 | 6 +++--- pom.xml | 2 +- 2 files changed, 4 insertions(+), 4 deletions(-) diff --git a/dev/deps/spark-deps-hadoop-3-hive-2.3 b/dev/deps/spark-deps-hadoop-3-hive-2.3 index 09291de50350..06fb4d879db2 100644 --- a/dev/deps/spark-deps-hadoop-3-hive-2.3 +++ b/dev/deps/spark-deps-hadoop-3-hive-2.3 @@ -123,7 +123,7 @@ javassist/3.29.2-GA//javassist-3.29.2-GA.jar javax.jdo/3.2.0-m3//javax.jdo-3.2.0-m3.jar javolution/5.5.1//javolution-5.5.1.jar jaxb-runtime/2.3.2//jaxb-runtime-2.3.2.jar -jcl-over-slf4j/2.0.10//jcl-over-slf4j-2.0.10.jar +jcl-over-slf4j/2.0.11//jcl-over-slf4j-2.0.11.jar jdo-api/3.0.1//jdo-api-3.0.1.jar jdom2/2.0.6//jdom2-2.0.6.jar jersey-client/2.41//jersey-client-2.41.jar @@ -148,7 +148,7 @@ json4s-jackson_2.13/3.7.0-M11//json4s-jackson_2.13-3.7.0-M11.jar json4s-scalap_2.13/3.7.0-M11//json4s-scalap_2.13-3.7.0-M11.jar jsr305/3.0.0//jsr305-3.0.0.jar jta/1.1//jta-1.1.jar -jul-to-slf4j/2.0.10//jul-to-slf4j-2.0.10.jar +jul-to-slf4j/2.0.11//jul-to-slf4j-2.0.11.jar kryo-shaded/4.0.2//kryo-shaded-4.0.2.jar kubernetes-client-api/6.10.0//kubernetes-client-api-6.10.0.jar kubernetes-client/6.10.0//kubernetes-client-6.10.0.jar @@ -247,7 +247,7 @@ scala-parallel-collections_2.13/1.0.4//scala-parallel-collections_2.13-1.0.4.jar scala-parser-combinators_2.13/2.3.0//scala-parser-combinators_2.13-2.3.0.jar scala-reflect/2.13.12//scala-reflect-2.13.12.jar scala-xml_2.13/2.2.0//scala-xml_2.13-2.2.0.jar -slf4j-api/2.0.10//slf4j-api-2.0.10.jar +slf4j-api/2.0.11//slf4j-api-2.0.11.jar snakeyaml-engine/2.7//snakeyaml-engine-2.7.jar snakeyaml/2.2//snakeyaml-2.2.jar snappy-java/1.1.10.5//snappy-java-1.1.10.5.jar diff --git a/pom.xml b/pom.xml index a5f2b6f74b7a..b78f49499feb 100644 --- a/pom.xml +++ b/pom.xml @@ -119,7 +119,7 @@ 3.1.0 spark 9.6 -2.0.10 +2.0.11 2.22.1 3.3.6 - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: 
commits-h...@spark.apache.org
(spark) branch master updated: [SPARK-46901][PYTHON] Upgrade `pyarrow` to 15.0.0
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new 487cbc086a30 [SPARK-46901][PYTHON] Upgrade `pyarrow` to 15.0.0

487cbc086a30 is described below

commit 487cbc086a30ec4d58695336acbe8037a3d5ebe7
Author: Ruifeng Zheng
AuthorDate: Sun Jan 28 23:41:49 2024 -0800

    [SPARK-46901][PYTHON] Upgrade `pyarrow` to 15.0.0

    ### What changes were proposed in this pull request?
    Upgrade `pyarrow` to 15.0.0.

    ### Why are the changes needed?
    To support the latest pyarrow.

    ### Does this PR introduce _any_ user-facing change?
    No.

    ### How was this patch tested?
    CI.

    ### Was this patch authored or co-authored using generative AI tooling?
    No.

    Closes #44924 from zhengruifeng/py_arrow_15.

Authored-by: Ruifeng Zheng
Signed-off-by: Dongjoon Hyun
---
 dev/infra/Dockerfile | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/dev/infra/Dockerfile b/dev/infra/Dockerfile
index 976f94251d7a..fc515d4478ad 100644
--- a/dev/infra/Dockerfile
+++ b/dev/infra/Dockerfile
@@ -94,7 +94,7 @@ RUN curl -sS https://bootstrap.pypa.io/get-pip.py | pypy3
 RUN pypy3 -m pip install numpy 'six==1.16.0' 'pandas<=2.1.4' scipy coverage matplotlib lxml

-ARG BASIC_PIP_PKGS="numpy pyarrow>=14.0.0 six==1.16.0 pandas<=2.1.4 scipy plotly>=4.8 mlflow>=2.8.1 coverage matplotlib openpyxl memory-profiler>=0.61.0 scikit-learn>=1.3.2"
+ARG BASIC_PIP_PKGS="numpy pyarrow>=15.0.0 six==1.16.0 pandas<=2.1.4 scipy plotly>=4.8 mlflow>=2.8.1 coverage matplotlib openpyxl memory-profiler>=0.61.0 scikit-learn>=1.3.2"

 # Python deps for Spark Connect
 ARG CONNECT_PIP_PKGS="grpcio==1.59.3 grpcio-status==1.59.3 protobuf==4.25.1 googleapis-common-protos==1.56.4"
(spark) branch master updated: [SPARK-46721][CORE][TESTS] Make gpu fraction tests more robust
This is an automated email from the ASF dual-hosted git repository. wuyi pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 90e6c0cf2ca1 [SPARK-46721][CORE][TESTS] Make gpu fraction tests more robust 90e6c0cf2ca1 is described below commit 90e6c0cf2ca186d1a492af4dc995b8254aa77aae Author: Bobby Wang AuthorDate: Mon Jan 29 14:59:52 2024 +0800 [SPARK-46721][CORE][TESTS] Make gpu fraction tests more robust ### What changes were proposed in this pull request? When cherry-picking https://github.com/apache/spark/pull/43494 back to branch 3.5 https://github.com/apache/spark/pull/44690, I ran into the issue that some tests for Scala 2.12 failed when comparing two maps. It turned out that the function [compareMaps](https://github.com/apache/spark/pull/43494/files#diff-f205431247dd9446f4ce941e5a4620af438c242b9bdff6e7faa7df0194db49acR129) is not so robust for scala 2.12 and scala 2.13. - scala 2.13 ``` scala Welcome to Scala 2.13.12 (OpenJDK 64-Bit Server VM, Java 17.0.9). Type in expressions for evaluation. Or try :help. 
scala> def compareMaps(lhs: Map[String, Double], rhs: Map[String, Double], | eps: Double = 0.0001): Boolean = { | lhs.size == rhs.size && | lhs.zip(rhs).forall { case ((lName, lAmount), (rName, rAmount)) => | lName == rName && (lAmount - rAmount).abs < eps | } | } | | import scala.collection.mutable.HashMap | val resources = Map("gpu" -> Map("a" -> 1.0, "b" -> 2.0, "c" -> 3.0, "d"-> 4.0)) | val mapped = resources.map { case (rName, addressAmounts) => | rName -> HashMap(addressAmounts.toSeq.sorted: _*) | } | | compareMaps(resources("gpu"), mapped("gpu").toMap) def compareMaps(lhs: Map[String,Double], rhs: Map[String,Double], eps: Double): Boolean import scala.collection.mutable.HashMap val resources: scala.collection.immutable.Map[String,scala.collection.immutable.Map[String,Double]] = Map(gpu -> Map(a -> 1.0, b -> 2.0, c -> 3.0, d -> 4.0)) val mapped: scala.collection.immutable.Map[String,scala.collection.mutable.HashMap[String,Double]] = Map(gpu -> HashMap(a -> 1.0, b -> 2.0, c -> 3.0, d -> 4.0)) val res0: Boolean = true ``` - scala 2.12 ``` scala Welcome to Scala 2.12.14 (OpenJDK 64-Bit Server VM, Java 17.0.9). Type in expressions for evaluation. Or try :help. 
scala> def compareMaps(lhs: Map[String, Double], rhs: Map[String, Double], | eps: Double = 0.0001): Boolean = { | lhs.size == rhs.size && | lhs.zip(rhs).forall { case ((lName, lAmount), (rName, rAmount)) => | lName == rName && (lAmount - rAmount).abs < eps | } | } compareMaps: (lhs: Map[String,Double], rhs: Map[String,Double], eps: Double)Boolean scala> import scala.collection.mutable.HashMap import scala.collection.mutable.HashMap scala> val resources = Map("gpu" -> Map("a" -> 1.0, "b" -> 2.0, "c" -> 3.0, "d"-> 4.0)) resources: scala.collection.immutable.Map[String,scala.collection.immutable.Map[String,Double]] = Map(gpu -> Map(a -> 1.0, b -> 2.0, c -> 3.0, d -> 4.0)) scala> val mapped = resources.map { case (rName, addressAmounts) => | rName -> HashMap(addressAmounts.toSeq.sorted: _*) | } mapped: scala.collection.immutable.Map[String,scala.collection.mutable.HashMap[String,Double]] = Map(gpu -> Map(b -> 2.0, d -> 4.0, a -> 1.0, c -> 3.0)) scala> compareMaps(resources("gpu"), mapped("gpu").toMap) res0: Boolean = false ``` The same code bug got different results for Scala 2.12 and Scala 2.13. This PR tried to rework compareMaps to make tests pass for both scala 2.12 and scala 2.13 ### Why are the changes needed? Some users may back-port https://github.com/apache/spark/pull/43494 to some older branch for scala 2.12 and will run into the same issue. It's just trivial work to make the GPU fraction tests compatible with Scala 2.12 and Scala 2.13 ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? Make sure all the CI pipelines pass ### Was this patch authored or co-authored using generative AI tooling? No Closes #44735 from wbo4958/gpu-fraction-tests. 
Authored-by: Bobby Wang Signed-off-by: Yi Wu --- .../scheduler/ExecutorResourceInfoSuite.scala | 10 +--- .../spark/scheduler/ExecutorResourceUtils.scala| 28 ++ .../scheduler/ExecutorResourcesAmountsSuite.scala | 10 +--- .../spark/scheduler/TaskSchedulerImplSuite.scala | 22 + 4 files changed, 42 insertion
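The failing check above boils down to `lhs.zip(rhs)`: zipping two maps pairs their entries in iteration order, and `mutable.HashMap` iteration order is not stable across Scala 2.12 and 2.13. An order-insensitive comparison looks each value up by key instead. Below is a minimal sketch of that idea in Python (illustrative only, with hypothetical names; the actual fix reworks the Scala test helper):

```python
def compare_maps(lhs, rhs, eps=1e-4):
    """Order-insensitive comparison of two {name: amount} maps.

    Looking values up by key, instead of zipping the two maps,
    removes the dependence on iteration order that made the
    original check pass on Scala 2.13 but fail on 2.12.
    """
    if lhs.keys() != rhs.keys():
        return False
    return all(abs(lhs[k] - rhs[k]) < eps for k in lhs)

resources = {"a": 1.0, "b": 2.0, "c": 3.0, "d": 4.0}
shuffled = {"b": 2.0, "d": 4.0, "a": 1.0, "c": 3.0}  # same entries, different order
print(compare_maps(resources, shuffled))  # True
```

Because the check is keyed rather than positional, it returns the same result regardless of which collection type (or Scala version) produced the map.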
(spark) branch master updated: [MINOR][DOCS] Remove Canonicalize in docs
This is an automated email from the ASF dual-hosted git repository. gurwls223 pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 112f4acb6283 [MINOR][DOCS] Remove Canonicalize in docs 112f4acb6283 is described below commit 112f4acb62834511c7a7fd56b4a3c14178c1ce02 Author: longfei.jiang <1251489...@qq.com> AuthorDate: Mon Jan 29 15:48:41 2024 +0900 [MINOR][DOCS] Remove Canonicalize in docs ### What changes were proposed in this pull request? Remove Canonicalize in docs ### Why are the changes needed? In SPARK-40362 remove Canonicalize.scala, need update docs ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? Just in the docs, no need to test. ### Was this patch authored or co-authored using generative AI tooling? No Closes #44897 from jlfsdtc/docs_fix. Lead-authored-by: longfei.jiang <1251489...@qq.com> Co-authored-by: Hyukjin Kwon Signed-off-by: Hyukjin Kwon --- .../org/apache/spark/sql/catalyst/expressions/Expression.scala | 4 ++-- .../apache/spark/sql/catalyst/expressions/ExpressionSet.scala | 10 ++ .../sql/catalyst/plans/logical/QueryPlanConstraints.scala | 2 +- 3 files changed, 9 insertions(+), 7 deletions(-) diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Expression.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Expression.scala index a3432716002a..817432879391 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Expression.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Expression.scala @@ -311,7 +311,7 @@ abstract class Expression extends TreeNode[Expression] { * Returns true when two expressions will always compute the same result, even if they differ * cosmetically (i.e. capitalization of names in attributes may be different). * - * See [[Canonicalize]] for more details. 
+ * See [[Expression#canonicalized]] for more details. */ final def semanticEquals(other: Expression): Boolean = deterministic && other.deterministic && canonicalized == other.canonicalized @@ -320,7 +320,7 @@ abstract class Expression extends TreeNode[Expression] { * Returns a `hashCode` for the calculation performed by this expression. Unlike the standard * `hashCode`, an attempt has been made to eliminate cosmetic differences. * - * See [[Canonicalize]] for more details. + * See [[Expression#canonicalized]] for more details. */ def semanticHash(): Int = canonicalized.hashCode() diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/ExpressionSet.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/ExpressionSet.scala index ba18b7a2b86c..1aa9f006463c 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/ExpressionSet.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/ExpressionSet.scala @@ -21,7 +21,9 @@ import scala.collection.mutable import scala.collection.mutable.ArrayBuffer object ExpressionSet { - /** Constructs a new [[ExpressionSet]] by applying [[Canonicalize]] to `expressions`. */ + /** + * Constructs a new [[ExpressionSet]] by applying [[Expression#canonicalized]] to `expressions`. + */ def apply(expressions: IterableOnce[Expression]): ExpressionSet = { val set = new ExpressionSet() expressions.iterator.foreach(set.add) @@ -36,7 +38,7 @@ object ExpressionSet { /** * A [[Set]] where membership is determined based on determinacy and a canonical representation of * an [[Expression]] (i.e. one that attempts to ignore cosmetic differences). - * See [[Canonicalize]] for more details. + * See [[Expression#canonicalized]] for more details. * * Internally this set uses the canonical representation, but keeps also track of the original * expressions to ease debugging. 
Since different expressions can share the same canonical @@ -168,8 +170,8 @@ class ExpressionSet protected( override def clone(): ExpressionSet = new ExpressionSet(baseSet.clone(), originals.clone()) /** - * Returns a string containing both the post [[Canonicalize]] expressions and the original - * expressions in this set. + * Returns a string containing both the post [[Expression#canonicalized]] expressions + * and the original expressions in this set. */ def toDebugString: String = s""" diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/QueryPlanConstraints.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/QueryPlanConstraints.scala index 022fd7fff750..5769f006ccbc 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/ca
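For readers following the doc change above: `semanticEquals` compares the `canonicalized` forms of two expressions, so cosmetic differences such as attribute capitalization are ignored. A toy Python sketch of that pattern (illustrative only; `Attr` and the lowercasing rule are made up here, not Spark's actual canonicalization):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Attr:
    name: str

    @property
    def canonicalized(self):
        # Toy canonical form: erase capitalization, one of the
        # "cosmetic differences" canonicalization is meant to ignore.
        return Attr(self.name.lower())

def semantic_equals(a, b):
    # Mirrors the idea of semanticEquals: equality of canonical forms.
    return a.canonicalized == b.canonicalized

def semantic_hash(a):
    # Mirrors the idea of semanticHash: hash of the canonical form.
    return hash(a.canonicalized)

a, b = Attr("userId"), Attr("USERID")
print(a == b)                 # False: cosmetically different
print(semantic_equals(a, b))  # True: same canonical form
```

This is also why `ExpressionSet` keys its membership on the canonical representation while keeping the originals around for debugging.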
(spark) branch master updated: [SPARK-46899][CORE] Remove `POST` APIs from `MasterWebUI` when `spark.ui.killEnabled` is `false`
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new 95a4abd5b5bc [SPARK-46899][CORE] Remove `POST` APIs from `MasterWebUI` when `spark.ui.killEnabled` is `false`

95a4abd5b5bc is described below

commit 95a4abd5b5bcc36335be9af84b7bbddd7d0034ba
Author: Dongjoon Hyun
AuthorDate: Sun Jan 28 22:38:32 2024 -0800

    [SPARK-46899][CORE] Remove `POST` APIs from `MasterWebUI` when `spark.ui.killEnabled` is `false`

    ### What changes were proposed in this pull request?
    This PR aims to remove `POST` APIs from `MasterWebUI` when `spark.ui.killEnabled` is false.

    ### Why are the changes needed?
    If `spark.ui.killEnabled` is false, we don't need to attach the `POST`-related redirect or servlet handlers in the first place, because such requests are ignored in `MasterPage` anyway.
    https://github.com/apache/spark/blob/8cd0d1854da04334aff3188e4eca08a48f734579/core/src/main/scala/org/apache/spark/deploy/master/ui/MasterPage.scala#L64-L65

    ### Does this PR introduce _any_ user-facing change?
    Previously, the user request was silently ignored after redirecting. Now, the server responds with the correct HTTP error code, 405 `Method Not Allowed`.

    ### How was this patch tested?
    Pass the CIs with the newly added test suite, `ReadOnlyMasterWebUISuite`.

    ### Was this patch authored or co-authored using generative AI tooling?
    No.

    Closes #44926 from dongjoon-hyun/SPARK-46899.
Authored-by: Dongjoon Hyun Signed-off-by: Dongjoon Hyun --- .../spark/deploy/master/ui/MasterWebUI.scala | 46 ++--- .../spark/deploy/master/ui/MasterWebUISuite.scala | 9 ++- .../master/ui/ReadOnlyMasterWebUISuite.scala | 75 ++ 3 files changed, 105 insertions(+), 25 deletions(-) diff --git a/core/src/main/scala/org/apache/spark/deploy/master/ui/MasterWebUI.scala b/core/src/main/scala/org/apache/spark/deploy/master/ui/MasterWebUI.scala index 3025c0bf468b..14ea6dbb3d20 100644 --- a/core/src/main/scala/org/apache/spark/deploy/master/ui/MasterWebUI.scala +++ b/core/src/main/scala/org/apache/spark/deploy/master/ui/MasterWebUI.scala @@ -54,31 +54,33 @@ class MasterWebUI( attachPage(new LogPage(this)) attachPage(masterPage) addStaticHandler(MasterWebUI.STATIC_RESOURCE_DIR) -attachHandler(createRedirectHandler( - "/app/kill", "/", masterPage.handleAppKillRequest, httpMethods = Set("POST"))) -attachHandler(createRedirectHandler( - "/driver/kill", "/", masterPage.handleDriverKillRequest, httpMethods = Set("POST"))) -attachHandler(createServletHandler("/workers/kill", new HttpServlet { - override def doPost(req: HttpServletRequest, resp: HttpServletResponse): Unit = { -val hostnames: Seq[String] = Option(req.getParameterValues("host")) - .getOrElse(Array[String]()).toImmutableArraySeq -if (decommissionDisabled || !isDecommissioningRequestAllowed(req)) { - resp.sendError(HttpServletResponse.SC_METHOD_NOT_ALLOWED) -} else { - val removedWorkers = masterEndpointRef.askSync[Integer]( -DecommissionWorkersOnHosts(hostnames)) - logInfo(s"Decommissioning of hosts $hostnames decommissioned $removedWorkers workers") - if (removedWorkers > 0) { -resp.setStatus(HttpServletResponse.SC_OK) - } else if (removedWorkers == 0) { -resp.sendError(HttpServletResponse.SC_NOT_FOUND) +if (killEnabled) { + attachHandler(createRedirectHandler( +"/app/kill", "/", masterPage.handleAppKillRequest, httpMethods = Set("POST"))) + attachHandler(createRedirectHandler( +"/driver/kill", "/", 
masterPage.handleDriverKillRequest, httpMethods = Set("POST"))) + attachHandler(createServletHandler("/workers/kill", new HttpServlet { +override def doPost(req: HttpServletRequest, resp: HttpServletResponse): Unit = { + val hostnames: Seq[String] = Option(req.getParameterValues("host")) +.getOrElse(Array[String]()).toImmutableArraySeq + if (decommissionDisabled || !isDecommissioningRequestAllowed(req)) { +resp.sendError(HttpServletResponse.SC_METHOD_NOT_ALLOWED) } else { -// We shouldn't even see this case. -resp.setStatus(HttpServletResponse.SC_INTERNAL_SERVER_ERROR) +val removedWorkers = masterEndpointRef.askSync[Integer]( + DecommissionWorkersOnHosts(hostnames)) +logInfo(s"Decommissioning of hosts $hostnames decommissioned $removedWorkers workers") +if (removedWorkers > 0) { + resp.setStatus(HttpServletResponse.SC_OK) +} else if (removedWorkers == 0) { + resp.sendError(HttpServletResponse.SC_NOT_FOUND) +}
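The behavioral change in the diff above is easiest to see as a routing table: when kill is disabled, the `POST` handlers are never attached, so a request fails fast with 405 instead of being accepted and silently ignored. A small Python sketch of that pattern (hypothetical routes and return values; the real UI wires Jetty handlers):

```python
def build_handlers(kill_enabled):
    """Attach POST kill endpoints only when killing is enabled,
    mirroring the `if (killEnabled)` guard added to MasterWebUI."""
    handlers = {("GET", "/"): lambda: (200, "master page")}
    if kill_enabled:
        handlers[("POST", "/app/kill")] = lambda: (302, "redirect to /")
        handlers[("POST", "/driver/kill")] = lambda: (302, "redirect to /")
    return handlers

def dispatch(handlers, method, path):
    # With no handler attached, answer 405 Method Not Allowed up front
    # rather than accepting the POST and dropping it later.
    handler = handlers.get((method, path))
    if handler is None:
        return (405, "Method Not Allowed")
    return handler()

print(dispatch(build_handlers(kill_enabled=False), "POST", "/app/kill")[0])  # 405
print(dispatch(build_handlers(kill_enabled=True), "POST", "/app/kill")[0])   # 302
```

Registering handlers conditionally, rather than checking the flag inside each handler, keeps the disabled endpoints entirely out of the attack/request surface.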
(spark) branch master updated: [SPARK-46898][CONNECT] Simplify the protobuf function transformation in Planner
This is an automated email from the ASF dual-hosted git repository.

yangjie01 pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new 56633e697571 [SPARK-46898][CONNECT] Simplify the protobuf function transformation in Planner

56633e697571 is described below

commit 56633e69757174da8a7dd8f4ea5298fd0a00e656
Author: Ruifeng Zheng
AuthorDate: Mon Jan 29 13:55:59 2024 +0800

    [SPARK-46898][CONNECT] Simplify the protobuf function transformation in Planner

    ### What changes were proposed in this pull request?
    Simplify the protobuf function transformation in Planner.

    ### Why are the changes needed?
    Make `transformUnregisteredFunction` simpler and reuse an existing helper function.

    ### Does this PR introduce _any_ user-facing change?
    No.

    ### How was this patch tested?
    CI.

    ### Was this patch authored or co-authored using generative AI tooling?
    No.

    Closes #44925 from zhengruifeng/connect_proto_simple.
Authored-by: Ruifeng Zheng Signed-off-by: yangjie01 --- .../sql/connect/planner/SparkConnectPlanner.scala | 80 +++--- 1 file changed, 25 insertions(+), 55 deletions(-) diff --git a/connector/connect/server/src/main/scala/org/apache/spark/sql/connect/planner/SparkConnectPlanner.scala b/connector/connect/server/src/main/scala/org/apache/spark/sql/connect/planner/SparkConnectPlanner.scala index 3e59b2644755..977bff690bac 100644 --- a/connector/connect/server/src/main/scala/org/apache/spark/sql/connect/planner/SparkConnectPlanner.scala +++ b/connector/connect/server/src/main/scala/org/apache/spark/sql/connect/planner/SparkConnectPlanner.scala @@ -1710,53 +1710,6 @@ class SparkConnectPlanner( */ private def transformUnregisteredFunction( fun: proto.Expression.UnresolvedFunction): Option[Expression] = { -def extractArgsOfProtobufFunction( -functionName: String, -argumentsCount: Int, -children: collection.Seq[Expression]) -: (String, Option[Array[Byte]], Map[String, String]) = { - val messageClassName = children(1) match { -case Literal(s, StringType) if s != null => s.toString -case other => - throw InvalidPlanInput( -s"MessageClassName in $functionName should be a literal string, but got $other") - } - val (binaryFileDescSetOpt, options) = if (argumentsCount == 2) { -(None, Map.empty[String, String]) - } else if (argumentsCount == 3) { -children(2) match { - case Literal(b, BinaryType) if b != null => -(Some(b.asInstanceOf[Array[Byte]]), Map.empty[String, String]) - case UnresolvedFunction(Seq("map"), arguments, _, _, _, _) => -(None, ExprUtils.convertToMapData(CreateMap(arguments))) - case other => -throw InvalidPlanInput( - s"The valid type for the 3rd arg in $functionName " + -s"is binary or map, but got $other") -} - } else if (argumentsCount == 4) { -val fileDescSetOpt = children(2) match { - case Literal(b, BinaryType) if b != null => -Some(b.asInstanceOf[Array[Byte]]) - case other => -throw InvalidPlanInput( - s"DescFilePath in $functionName should be a literal 
binary, but got $other") -} -val map = children(3) match { - case UnresolvedFunction(Seq("map"), arguments, _, _, _, _) => -ExprUtils.convertToMapData(CreateMap(arguments)) - case other => -throw InvalidPlanInput( - s"Options in $functionName should be created by map, but got $other") -} -(fileDescSetOpt, map) - } else { -throw InvalidPlanInput( - s"$functionName requires 2 ~ 4 arguments, but got $argumentsCount ones!") - } - (messageClassName, binaryFileDescSetOpt, options) -} - fun.getFunctionName match { case "product" if fun.getArgumentsCount == 1 => Some( @@ -1979,17 +1932,13 @@ class SparkConnectPlanner( // Protobuf-specific functions case "from_protobuf" if Seq(2, 3, 4).contains(fun.getArgumentsCount) => val children = fun.getArgumentsList.asScala.map(transformExpression) -val (messageClassName, binaryFileDescSetOpt, options) = - extractArgsOfProtobufFunction("from_protobuf", fun.getArgumentsCount, children) -Some( - ProtobufDataToCatalyst(children.head, messageClassName, binaryFileDescSetOpt, options)) +val (msgName, desc, options) = extractProtobufArgs(children.toSeq) +Some(ProtobufDataToCatalyst(children(0), msgName, desc, options)) case "to_protobuf" if Seq(2, 3, 4).contains(fun.getArgumentsCount) => val children = fun.getArgumentsList.asScala.map(transformExpression) -val (messageClassName, bi
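The shape of the extracted helper, one function that validates the 2-to-4 argument forms of `from_protobuf`/`to_protobuf` and returns `(messageClassName, descriptorBytes, options)`, can be sketched in Python. This is a stand-in, not the Scala code: plain types replace Catalyst literal expressions, and the names are illustrative.

```python
def extract_protobuf_args(children):
    """Dispatch on argument count: 2 = (data, msgName),
    3 = (+ descriptor bytes OR options map), 4 = (+ both)."""
    n = len(children)
    if n < 2 or n > 4:
        raise ValueError(f"from_protobuf/to_protobuf take 2 to 4 arguments, got {n}")
    msg_name = children[1]
    if not isinstance(msg_name, str):
        raise ValueError(f"message class name should be a literal string, got {msg_name!r}")
    if n == 2:
        return msg_name, None, {}
    if n == 3:
        third = children[2]
        if isinstance(third, (bytes, bytearray)):
            return msg_name, bytes(third), {}
        if isinstance(third, dict):
            return msg_name, None, third
        raise ValueError(f"3rd argument should be binary or map, got {third!r}")
    desc, options = children[2], children[3]
    if not isinstance(desc, (bytes, bytearray)) or not isinstance(options, dict):
        raise ValueError("4-argument form needs a binary descriptor and an options map")
    return msg_name, bytes(desc), options

print(extract_protobuf_args(["col", "my.Msg", b"\x01", {"mode": "PERMISSIVE"}]))
# ('my.Msg', b'\x01', {'mode': 'PERMISSIVE'})
```

Centralizing the validation this way is what lets both `from_protobuf` and `to_protobuf` branches shrink to a single call plus a constructor.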
Re: [PR] Add instructions for running docker integration tests [spark-website]
yaooqinn commented on PR #499:
URL: https://github.com/apache/spark-website/pull/499#issuecomment-1913999403

   Thank you @dongjoon-hyun

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For queries about this service, please contact Infrastructure at: us...@infra.apache.org
(spark) branch master updated: [SPARK-46897][PYTHON][DOCS] Refine docstring of `bit_and/bit_or/bit_xor`
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 5056a17919ac [SPARK-46897][PYTHON][DOCS] Refine docstring of `bit_and/bit_or/bit_xor` 5056a17919ac is described below commit 5056a17919ac88d35475dd13ae4167e783f9504a Author: yangjie01 AuthorDate: Sun Jan 28 21:33:39 2024 -0800 [SPARK-46897][PYTHON][DOCS] Refine docstring of `bit_and/bit_or/bit_xor` ### What changes were proposed in this pull request? This pr refine docstring of `bit_and/bit_or/bit_xor` and add some new examples. ### Why are the changes needed? To improve PySpark documentation ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? Pass Github Actions ### Was this patch authored or co-authored using generative AI tooling? No Closes #44923 from LuciferYang/SPARK-46897. Authored-by: yangjie01 Signed-off-by: Dongjoon Hyun --- python/pyspark/sql/functions/builtin.py | 138 ++-- 1 file changed, 132 insertions(+), 6 deletions(-) diff --git a/python/pyspark/sql/functions/builtin.py b/python/pyspark/sql/functions/builtin.py index d3a94fe4b9e9..0932ac1c2843 100644 --- a/python/pyspark/sql/functions/builtin.py +++ b/python/pyspark/sql/functions/builtin.py @@ -3790,9 +3790,51 @@ def bit_and(col: "ColumnOrName") -> Column: Examples +Example 1: Bitwise AND with all non-null values + +>>> from pyspark.sql import functions as sf >>> df = spark.createDataFrame([[1],[1],[2]], ["c"]) ->>> df.select(bit_and("c")).first() -Row(bit_and(c)=0) +>>> df.select(sf.bit_and("c")).show() ++--+ +|bit_and(c)| ++--+ +| 0| ++--+ + +Example 2: Bitwise AND with null values + +>>> from pyspark.sql import functions as sf +>>> df = spark.createDataFrame([[1],[None],[2]], ["c"]) +>>> df.select(sf.bit_and("c")).show() ++--+ +|bit_and(c)| ++--+ +| 0| ++--+ + +Example 3: Bitwise AND with all null values + +>>> from 
pyspark.sql import functions as sf +>>> from pyspark.sql.types import IntegerType, StructType, StructField +>>> schema = StructType([StructField("c", IntegerType(), True)]) +>>> df = spark.createDataFrame([[None],[None],[None]], schema=schema) +>>> df.select(sf.bit_and("c")).show() ++--+ +|bit_and(c)| ++--+ +| NULL| ++--+ + +Example 4: Bitwise AND with single input value + +>>> from pyspark.sql import functions as sf +>>> df = spark.createDataFrame([[5]], ["c"]) +>>> df.select(sf.bit_and("c")).show() ++--+ +|bit_and(c)| ++--+ +| 5| ++--+ """ return _invoke_function_over_columns("bit_and", col) @@ -3816,9 +3858,51 @@ def bit_or(col: "ColumnOrName") -> Column: Examples +Example 1: Bitwise OR with all non-null values + +>>> from pyspark.sql import functions as sf >>> df = spark.createDataFrame([[1],[1],[2]], ["c"]) ->>> df.select(bit_or("c")).first() -Row(bit_or(c)=3) +>>> df.select(sf.bit_or("c")).show() ++-+ +|bit_or(c)| ++-+ +|3| ++-+ + +Example 2: Bitwise OR with some null values + +>>> from pyspark.sql import functions as sf +>>> df = spark.createDataFrame([[1],[None],[2]], ["c"]) +>>> df.select(sf.bit_or("c")).show() ++-+ +|bit_or(c)| ++-+ +|3| ++-+ + +Example 3: Bitwise OR with all null values + +>>> from pyspark.sql import functions as sf +>>> from pyspark.sql.types import IntegerType, StructType, StructField +>>> schema = StructType([StructField("c", IntegerType(), True)]) +>>> df = spark.createDataFrame([[None],[None],[None]], schema=schema) +>>> df.select(sf.bit_or("c")).show() ++-+ +|bit_or(c)| ++-+ +| NULL| ++-+ + +Example 4: Bitwise OR with single input value + +>>> from pyspark.sql import functions as sf +>>> df = spark.createDataFrame([[5]], ["c"]) +>>> df.select(sf.bit_or("c")).show() ++-+ +|bit_or(c)| ++-+ +|5| ++-+ """ return _invoke_function_over_columns("bit_or", col) @@ -3842,9 +3926,51 @@ def bit_xor(col: "ColumnOrName") -> Column: Examples +Example 1: Bitwise XOR with all non-null values + +>>> from pyspark.sql import functions as sf >>> df = 
spark.createDataFrame([[1],[1],[2]], ["c"]) ->>> df.select(bit_xor("c")).first() -Row(bit_xor(c)=2) +>>> df.select(sf.bit_xor("c")).show() ++--+ +|bit_xor(c)| ++-
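The null-handling that the new docstring examples document, non-null inputs are folded with the bitwise operator while an all-null column yields NULL, can be mimicked in plain Python. This is a sketch of the semantics only; `bit_agg` is a made-up name, not a PySpark API.

```python
from functools import reduce
import operator

def bit_agg(op, values):
    """Fold the non-null values with a bitwise operator; an input
    containing only nulls yields None, matching the SQL NULL result."""
    non_null = [v for v in values if v is not None]
    if not non_null:
        return None
    return reduce(op, non_null)

print(bit_agg(operator.and_, [1, 1, 2]))     # 0
print(bit_agg(operator.or_,  [1, None, 2]))  # 3
print(bit_agg(operator.xor,  [1, 1, 2]))     # 2
print(bit_agg(operator.and_, [None, None]))  # None
```

Note that nulls are skipped rather than propagated, which is why Example 2 in the docstrings returns the same result as the all-non-null case.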
(spark) branch master updated: [SPARK-46896][PS][TESTS] Clean up the imports in `pyspark.pandas.tests.{frame, series, groupby}.*`
This is an automated email from the ASF dual-hosted git repository.

ruifengz pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new 8cd0d1854da0 [SPARK-46896][PS][TESTS] Clean up the imports in `pyspark.pandas.tests.{frame, series, groupby}.*`

8cd0d1854da0 is described below

commit 8cd0d1854da04334aff3188e4eca08a48f734579
Author: Ruifeng Zheng
AuthorDate: Mon Jan 29 12:00:18 2024 +0800

    [SPARK-46896][PS][TESTS] Clean up the imports in `pyspark.pandas.tests.{frame, series, groupby}.*`

    ### What changes were proposed in this pull request?
    1. Remove unused imports.
    2. Define the test datasets only once, on the vanilla side, so they don't need to be defined again in the parity tests.

    ### Why are the changes needed?
    Code cleanup.

    ### Does this PR introduce _any_ user-facing change?
    No, test-only.

    ### How was this patch tested?
    CI.

    ### Was this patch authored or co-authored using generative AI tooling?
    No.

    Closes #44922 from zhengruifeng/ps_test_frame_ser_cleanup.
Authored-by: Ruifeng Zheng Signed-off-by: Ruifeng Zheng --- .../pyspark/pandas/tests/connect/frame/test_parity_attrs.py | 11 ++- .../pyspark/pandas/tests/connect/frame/test_parity_axis.py | 6 +- .../pandas/tests/connect/frame/test_parity_constructor.py| 4 +++- .../pandas/tests/connect/frame/test_parity_conversion.py | 9 - .../pandas/tests/connect/frame/test_parity_reindexing.py | 9 - .../pandas/tests/connect/frame/test_parity_reshaping.py | 6 +- .../pyspark/pandas/tests/connect/frame/test_parity_spark.py | 11 ++- .../pandas/tests/connect/frame/test_parity_time_series.py| 9 - .../pandas/tests/connect/frame/test_parity_truncate.py | 11 ++- .../pandas/tests/connect/groupby/test_parity_aggregate.py| 4 +++- .../pandas/tests/connect/groupby/test_parity_apply_func.py | 4 +++- .../pandas/tests/connect/groupby/test_parity_cumulative.py | 4 +++- .../pandas/tests/connect/groupby/test_parity_describe.py | 4 +++- .../pandas/tests/connect/groupby/test_parity_groupby.py | 5 - .../pandas/tests/connect/groupby/test_parity_head_tail.py| 4 +++- .../pandas/tests/connect/groupby/test_parity_index.py| 6 +- .../pandas/tests/connect/groupby/test_parity_missing_data.py | 4 +++- .../pandas/tests/connect/series/test_parity_all_any.py | 6 +- .../pandas/tests/connect/series/test_parity_arg_ops.py | 6 +- .../pyspark/pandas/tests/connect/series/test_parity_as_of.py | 6 +- .../pandas/tests/connect/series/test_parity_as_type.py | 6 +- .../pandas/tests/connect/series/test_parity_compute.py | 6 +- .../pandas/tests/connect/series/test_parity_conversion.py| 4 +++- .../pandas/tests/connect/series/test_parity_cumulative.py| 4 +++- .../pyspark/pandas/tests/connect/series/test_parity_index.py | 6 +- .../pandas/tests/connect/series/test_parity_missing_data.py | 4 +++- .../pandas/tests/connect/series/test_parity_series.py| 6 +- .../pyspark/pandas/tests/connect/series/test_parity_sort.py | 6 +- .../pyspark/pandas/tests/connect/series/test_parity_stat.py | 6 +- 
.../tests/connect/series/test_parity_string_ops_adv.py | 4 +++- .../tests/connect/series/test_parity_string_ops_basic.py | 4 +++- python/pyspark/pandas/tests/frame/test_attrs.py | 12 ++-- python/pyspark/pandas/tests/frame/test_axis.py | 8 ++-- python/pyspark/pandas/tests/frame/test_constructor.py| 8 ++-- python/pyspark/pandas/tests/frame/test_conversion.py | 12 ++-- python/pyspark/pandas/tests/frame/test_interpolate.py| 6 +- python/pyspark/pandas/tests/frame/test_reindexing.py | 8 ++-- python/pyspark/pandas/tests/frame/test_reshaping.py | 8 ++-- python/pyspark/pandas/tests/frame/test_spark.py | 12 ++-- python/pyspark/pandas/tests/frame/test_time_series.py| 8 ++-- python/pyspark/pandas/tests/frame/test_truncate.py | 4 ++-- python/pyspark/pandas/tests/groupby/test_aggregate.py| 8 ++-- python/pyspark/pandas/tests/groupby/test_apply_func.py | 8 ++-- python/pyspark/pandas/tests/groupby/test_cumulative.py | 8 ++-- python/pyspark/pandas/tests/groupby/test_describe.py | 8 ++-- python/pyspark/pandas/tests/groupby/test_groupby.py | 12 +--- python/pyspark/pandas/tests/groupby/test_grouping.py | 5 - python/pyspark/pandas/tests/groupby/test_head_tail.py| 8 ++-- python/pyspark/pandas/tests/groupby/test_index.py| 8 ++-- python/pyspark/pandas/tests/groupby/test_missing.py | 5 - python/p
Re: [PR] Add instructions for running docker integration tests [spark-website]
yaooqinn commented on PR #499: URL: https://github.com/apache/spark-website/pull/499#issuecomment-1913886172 Thank you @srowen, merged to asf-site -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
Re: [PR] Add instructions for running docker integration tests [spark-website]
yaooqinn merged PR #499: URL: https://github.com/apache/spark-website/pull/499
(spark-website) branch asf-site updated: Add instructions for running docker integration tests (#499)
This is an automated email from the ASF dual-hosted git repository. yao pushed a commit to branch asf-site in repository https://gitbox.apache.org/repos/asf/spark-website.git The following commit(s) were added to refs/heads/asf-site by this push: new 9476ea428d Add instructions for running docker integration tests (#499) 9476ea428d is described below commit 9476ea428d8aec2c8f1fdf2252a28fb22e208930 Author: Kent Yao AuthorDate: Mon Jan 29 11:09:44 2024 +0800 Add instructions for running docker integration tests (#499) --- developer-tools.md| 13 + site/developer-tools.html | 13 + 2 files changed, 18 insertions(+), 8 deletions(-) diff --git a/developer-tools.md b/developer-tools.md index 34087a874c..bd0da296a7 100644 --- a/developer-tools.md +++ b/developer-tools.md @@ -11,9 +11,9 @@ navigation: Apache Spark community uses various resources to maintain the community test coverage. -GitHub Action +GitHub Actions -[GitHub Action](https://github.com/apache/spark/actions) provides the following on Ubuntu 22.04. +[GitHub Actions](https://github.com/apache/spark/actions) provides the following on Ubuntu 22.04. Apache Spark 4 @@ -204,11 +204,16 @@ Please check other available options via `python/run-tests[-with-coverage] --hel Testing K8S -Although GitHub Action provide both K8s unit test and integration test coverage, you can run it locally. For example, Volcano batch scheduler integration test should be done manually. Please refer the integration test documentation for the detail. +Although GitHub Actions provide both K8s unit test and integration test coverage, you can run it locally. For example, Volcano batch scheduler integration test should be done manually. Please refer the integration test documentation for the detail. 
[https://github.com/apache/spark/blob/master/resource-managers/kubernetes/integration-tests/README.md](https://github.com/apache/spark/blob/master/resource-managers/kubernetes/integration-tests/README.md) -Testing with GitHub actions workflow +Running the Docker integration tests + +Docker integration tests are covered by GitHub Actions. However, you can run it locally to speedup deveplopment and testing. +Please refer the [Docker integration test documentation](https://github.com/apache/spark/blob/master/connector/docker-integration-tests/README.md) for the detail. + +Testing with GitHub Actions workflow Apache Spark leverages GitHub Actions that enables continuous integration and a wide range of automation. Apache Spark repository provides several GitHub Actions workflows for developers to run before creating a pull request. diff --git a/site/developer-tools.html b/site/developer-tools.html index d4251cb4d1..4470efbc87 100644 --- a/site/developer-tools.html +++ b/site/developer-tools.html @@ -143,9 +143,9 @@ Apache Spark community uses various resources to maintain the community test coverage. -GitHub Action +GitHub Actions -https://github.com/apache/spark/actions";>GitHub Action provides the following on Ubuntu 22.04. +https://github.com/apache/spark/actions";>GitHub Actions provides the following on Ubuntu 22.04. Apache Spark 4 @@ -329,11 +329,16 @@ Generating HTML files for PySpark coverage under /.../spark/python/test_coverage Testing K8S -Although GitHub Action provide both K8s unit test and integration test coverage, you can run it locally. For example, Volcano batch scheduler integration test should be done manually. Please refer the integration test documentation for the detail. +Although GitHub Actions provide both K8s unit test and integration test coverage, you can run it locally. For example, Volcano batch scheduler integration test should be done manually. Please refer the integration test documentation for the detail. 
https://github.com/apache/spark/blob/master/resource-managers/kubernetes/integration-tests/README.md";>https://github.com/apache/spark/blob/master/resource-managers/kubernetes/integration-tests/README.md -Testing with GitHub actions workflow +Running the Docker integration tests + +Docker integration tests are covered by GitHub Actions. However, you can run it locally to speedup deveplopment and testing. +Please refer the https://github.com/apache/spark/blob/master/connector/docker-integration-tests/README.md";>Docker integration test documentation for the detail. + +Testing with GitHub Actions workflow Apache Spark leverages GitHub Actions that enables continuous integration and a wide range of automation. Apache Spark repository provides several GitHub Actions workflows for developers to run before creating a pull request.
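For readers who want to try this locally, a minimal sketch of the workflow the new section describes. The Maven profile and module names below are assumptions, not taken from the commit; the linked connector/docker-integration-tests README is the authoritative source.

```shell
# Assumed commands -- verify against connector/docker-integration-tests/README.md.
# Build Spark once, then run the Docker-backed integration tests locally.
./build/mvn -DskipTests clean install
./build/mvn -Pdocker-integration-tests \
  -pl :spark-docker-integration-tests_2.13 test
```

A local Docker daemon must be running; the tests pull the database images they need on first use.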
(spark) branch master updated (f078998df2f3 -> bb2195554e6d)
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git from f078998df2f3 [MINOR][DOCS] Miscellaneous documentation improvements add bb2195554e6d [SPARK-46874][PYTHON] Remove `pyspark.pandas` dependency from `assertDataFrameEqual` No new revisions were added by this update. Summary of changes: python/pyspark/testing/utils.py | 15 --- 1 file changed, 12 insertions(+), 3 deletions(-)
(spark-website) branch asf-site updated: update (#498)
This is an automated email from the ASF dual-hosted git repository. yao pushed a commit to branch asf-site in repository https://gitbox.apache.org/repos/asf/spark-website.git The following commit(s) were added to refs/heads/asf-site by this push: new 6e03f8f78b update (#498) 6e03f8f78b is described below commit 6e03f8f78ba753c6b2f42f4fe5e346dd2f1879ac Author: Kent Yao AuthorDate: Mon Jan 29 10:46:33 2024 +0800 update (#498) --- Gemfile.lock | 30 +++--- 1 file changed, 15 insertions(+), 15 deletions(-) diff --git a/Gemfile.lock b/Gemfile.lock index f1e8cf7c6e..f4dedba223 100644 --- a/Gemfile.lock +++ b/Gemfile.lock @@ -1,18 +1,18 @@ GEM remote: https://rubygems.org/ specs: -addressable (2.8.1) +addressable (2.8.6) public_suffix (>= 2.0.2, < 6.0) colorator (1.1.0) -concurrent-ruby (1.1.8) -em-websocket (0.5.2) +concurrent-ruby (1.2.3) +em-websocket (0.5.3) eventmachine (>= 0.12.9) - http_parser.rb (~> 0.6.0) + http_parser.rb (~> 0) eventmachine (1.2.7) -ffi (1.14.2) +ffi (1.16.3) forwardable-extended (2.6.0) -http_parser.rb (0.6.0) -i18n (1.8.9) +http_parser.rb (0.8.0) +i18n (1.14.1) concurrent-ruby (~> 1.0) jekyll (4.2.0) addressable (~> 2.4) @@ -29,7 +29,7 @@ GEM rouge (~> 3.0) safe_yaml (~> 1.0) terminal-table (~> 2.0) -jekyll-sass-converter (2.1.0) +jekyll-sass-converter (2.2.0) sassc (> 2.0.1, < 3.0) jekyll-watch (2.2.1) listen (~> 3.0) @@ -38,25 +38,25 @@ GEM kramdown-parser-gfm (1.1.0) kramdown (~> 2.0) liquid (4.0.4) -listen (3.4.1) +listen (3.8.0) rb-fsevent (~> 0.10, >= 0.10.3) rb-inotify (~> 0.9, >= 0.9.10) mercenary (0.4.0) pathutil (0.16.2) forwardable-extended (~> 2.6) -public_suffix (5.0.0) -rb-fsevent (0.10.4) +public_suffix (5.0.4) +rb-fsevent (0.11.2) rb-inotify (0.10.1) ffi (~> 1.0) -rexml (3.2.5) +rexml (3.2.6) rouge (3.26.0) safe_yaml (1.0.5) sassc (2.4.0) ffi (~> 1.9) terminal-table (2.0.0) unicode-display_width (~> 1.1, >= 1.1.1) -unicode-display_width (1.7.0) -webrick (1.7.0) +unicode-display_width (1.8.0) +webrick (1.8.1) PLATFORMS ruby @@ 
-67,4 +67,4 @@ DEPENDENCIES webrick (~> 1.7) BUNDLED WITH - 2.3.7 + 2.4.19
Re: [PR] Upgrade bundle dependencies for doc build [spark-website]
yaooqinn merged PR #498: URL: https://github.com/apache/spark-website/pull/498
Re: [PR] Upgrade bundle dependencies for doc build [spark-website]
yaooqinn commented on PR #498: URL: https://github.com/apache/spark-website/pull/498#issuecomment-1913868673 Thank you @srowen, merged to asf-site
[PR] Add instructions for running docker integration tests [spark-website]
yaooqinn opened a new pull request, #499: URL: https://github.com/apache/spark-website/pull/499
[PR] Upgrade bundle dependencies for doc build [spark-website]
yaooqinn opened a new pull request, #498: URL: https://github.com/apache/spark-website/pull/498 On my Mac M2, I failed to gen the docs via `bundle exec jekyll build` ``` /Users/hzyaoqin/spark-website/.local_ruby_bundle/ruby/2.6.0/gems/ffi-1.14.2/lib/ffi/library.rb:275: [BUG] Bus Error at 0x0001025b4000 ruby 2.6.10p210 (2022-04-12 revision 67958) [universal.arm64e-darwin23] -- Crash Report log information See Crash Report log file under the one of following: * ~/Library/Logs/DiagnosticReports * /Library/Logs/DiagnosticReports for more details. Don't forget to include the above Crash Report log file in bug reports. -- Control frame information --- ``` After `bundle update`, it's done ``` Configuration file: /Users/hzyaoqin/spark-website/_config.yml Source: /Users/hzyaoqin/spark-website Destination: /Users/hzyaoqin/spark-website/site Incremental build: disabled. Enable with --incremental Generating... done in 3.648 seconds. ```
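The fix boils down to refreshing the locked gem versions before rebuilding; a sketch of the sequence implied by the PR description above:

```shell
# Refresh Gemfile.lock (pulls a newer ffi release with arm64 support,
# which avoids the Bus Error shown above), then rebuild the site.
bundle update
bundle exec jekyll build
```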
(spark) branch master updated: [MINOR][DOCS] Miscellaneous documentation improvements
This is an automated email from the ASF dual-hosted git repository. gurwls223 pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new f078998df2f3 [MINOR][DOCS] Miscellaneous documentation improvements f078998df2f3 is described below commit f078998df2f3ad61a33b72b2dae18de4951cd15f Author: Nicholas Chammas AuthorDate: Mon Jan 29 10:06:07 2024 +0900 [MINOR][DOCS] Miscellaneous documentation improvements ### What changes were proposed in this pull request? - Improve the formatting of various code snippets. - Fix some broken links in the documentation. - Clarify the non-intuitive behavior of `displayValue` in `getAllDefinedConfs()`. ### Why are the changes needed? These are minor quality of life improvements for users and developers alike. ### Does this PR introduce _any_ user-facing change? Yes, it tweaks some of the links in user-facing documentation. ### How was this patch tested? Not tested beyond CI. ### Was this patch authored or co-authored using generative AI tooling? No. Closes #44919 from nchammas/misc-doc-fixes. 
Authored-by: Nicholas Chammas Signed-off-by: Hyukjin Kwon --- docs/configuration.md| 16 ++-- docs/mllib-dimensionality-reduction.md | 4 +++- docs/rdd-programming-guide.md| 6 -- docs/sql-data-sources-avro.md| 5 +++-- .../scala/org/apache/spark/sql/internal/SQLConf.scala| 7 ++- 5 files changed, 26 insertions(+), 12 deletions(-) diff --git a/docs/configuration.md b/docs/configuration.md index e771c323d369..7fef09781a15 100644 --- a/docs/configuration.md +++ b/docs/configuration.md @@ -88,10 +88,14 @@ val sc = new SparkContext(new SparkConf()) {% endhighlight %} Then, you can supply configuration values at runtime: -{% highlight bash %} -./bin/spark-submit --name "My app" --master local[4] --conf spark.eventLog.enabled=false - --conf "spark.executor.extraJavaOptions=-XX:+PrintGCDetails -XX:+PrintGCTimeStamps" myApp.jar -{% endhighlight %} +```sh +./bin/spark-submit \ + --name "My app" \ + --master local[4] \ + --conf spark.eventLog.enabled=false \ + --conf "spark.executor.extraJavaOptions=-XX:+PrintGCDetails -XX:+PrintGCTimeStamps" \ + myApp.jar +``` The Spark shell and [`spark-submit`](submitting-applications.html) tool support two ways to load configurations dynamically. The first is command line options, @@ -3708,9 +3712,9 @@ Also, you can modify or add configurations at runtime: GPUs and other accelerators have been widely used for accelerating special workloads, e.g., deep learning and signal processing. Spark now supports requesting and scheduling generic resources, such as GPUs, with a few caveats. The current implementation requires that the resource have addresses that can be allocated by the scheduler. It requires your cluster manager to support and be properly configured with the resources. 
-There are configurations available to request resources for the driver: spark.driver.resource.{resourceName}.amount, request resources for the executor(s): spark.executor.resource.{resourceName}.amount and specify the requirements for each task: spark.task.resource.{resourceName}.amount. The spark.driver.resource.{resourceName}.discoveryScript config is required on YARN, Kubernetes and a client side Driver on Spark Standalone. spa [...] +There are configurations available to request resources for the driver: `spark.driver.resource.{resourceName}.amount`, request resources for the executor(s): `spark.executor.resource.{resourceName}.amount` and specify the requirements for each task: `spark.task.resource.{resourceName}.amount`. The `spark.driver.resource.{resourceName}.discoveryScript` config is required on YARN, Kubernetes and a client side Driver on Spark Standalone. `spark.executor.resource.{resourceName}.discoveryScri [...] -Spark will use the configurations specified to first request containers with the corresponding resources from the cluster manager. Once it gets the container, Spark launches an Executor in that container which will discover what resources the container has and the addresses associated with each resource. The Executor will register with the Driver and report back the resources available to that Executor. The Spark scheduler can then schedule tasks to each Executor and assign specific reso [...] +Spark will use the configurations specified to first request containers with the corresponding resources from the cluster manager. Once it gets the container, Spark launches an Executor in that container which will discover what resources the container has and the addresses associated with each resource. The Executor will register with the Driver
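As a hedged illustration of the resource configurations discussed above (the resource name `gpu` and the discovery script path are placeholders, not values from the documentation diff):

```shell
# Placeholder example: request one GPU per executor and per task.
# /opt/spark/getGpusResources.sh is hypothetical; it must print the
# discovered GPU addresses in the JSON format Spark expects.
./bin/spark-submit \
  --master yarn \
  --conf spark.executor.resource.gpu.amount=1 \
  --conf spark.executor.resource.gpu.discoveryScript=/opt/spark/getGpusResources.sh \
  --conf spark.task.resource.gpu.amount=1 \
  myApp.jar
```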
(spark) branch master updated: [MINOR][DOCS] Remove unneeded comments from global.html
This is an automated email from the ASF dual-hosted git repository. gurwls223 pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 901850cab748 [MINOR][DOCS] Remove unneeded comments from global.html 901850cab748 is described below commit 901850cab748fae6b9ebab88eda82f6314a2691c Author: Nicholas Chammas AuthorDate: Mon Jan 29 10:05:21 2024 +0900 [MINOR][DOCS] Remove unneeded comments from global.html ### What changes were proposed in this pull request? Remove some unneeded comments from global.html. ### Why are the changes needed? They are just noise. They don't appear to do anything (they are not Jekyll directives). For the record, Internet Explorer 8, 9, and 10 were [sunset in 2020][1]. Internet Explorer 7 was sunset [last year][2]. [1]: https://learn.microsoft.com/en-us/lifecycle/products/internet-explorer-10 [2]: https://learn.microsoft.com/en-us/lifecycle/products/internet-explorer-7 ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? I built the docs with `SKIP_API=1` and confirmed nothing broke. ### Was this patch authored or co-authored using generative AI tooling? No. Closes #44921 from nchammas/global-html-comments. Authored-by: Nicholas Chammas Signed-off-by: Hyukjin Kwon --- docs/_layouts/global.html | 11 +-- 1 file changed, 1 insertion(+), 10 deletions(-) diff --git a/docs/_layouts/global.html b/docs/_layouts/global.html index 6acffe8a405d..c61c9349a6d7 100755 --- a/docs/_layouts/global.html +++ b/docs/_layouts/global.html @@ -1,9 +1,5 @@ - - - - - + @@ -53,12 +49,7 @@ - - - {{site.SPARK_VERSION_SHORT}}
(spark) branch master updated (89d86e617da2 -> 02c945d6ab61)
This is an automated email from the ASF dual-hosted git repository. gurwls223 pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git from 89d86e617da2 [SPARK-46873][SS] Do not recreate new StreamingQueryManager for the same Spark Session add 02c945d6ab61 [SPARK-46889][CORE] Validate `spark.master.ui.decommission.allow.mode` setting No new revisions were added by this update. Summary of changes: core/src/main/scala/org/apache/spark/internal/config/UI.scala | 1 + 1 file changed, 1 insertion(+)
(spark) branch master updated: [SPARK-46873][SS] Do not recreate new StreamingQueryManager for the same Spark Session
This is an automated email from the ASF dual-hosted git repository. gurwls223 pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 89d86e617da2 [SPARK-46873][SS] Do not recreate new StreamingQueryManager for the same Spark Session 89d86e617da2 is described below commit 89d86e617da2d0346cdf862d975a87c24c9a9f5c Author: Wei Liu AuthorDate: Mon Jan 29 08:53:15 2024 +0900 [SPARK-46873][SS] Do not recreate new StreamingQueryManager for the same Spark Session ### What changes were proposed in this pull request? In Scala, there is only one streaming query manager for one spark session: ``` scala> spark.streams val res0: org.apache.spark.sql.streaming.StreamingQueryManager = org.apache.spark.sql.streaming.StreamingQueryManager46bb8cba scala> spark.streams val res1: org.apache.spark.sql.streaming.StreamingQueryManager = org.apache.spark.sql.streaming.StreamingQueryManager46bb8cba scala> spark.streams val res2: org.apache.spark.sql.streaming.StreamingQueryManager = org.apache.spark.sql.streaming.StreamingQueryManager46bb8cba scala> spark.streams val res3: org.apache.spark.sql.streaming.StreamingQueryManager = org.apache.spark.sql.streaming.StreamingQueryManager46bb8cba ``` In Python, this is currently false for both connect and vanilla spark: ``` >>> spark.streams >>> spark.streams >>> spark.streams >>> spark.streams ``` This PR makes the spark session reuse existing streaming query manager ### Why are the changes needed? Python should align Scala behavior. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? Added unit test ### Was this patch authored or co-authored using generative AI tooling? No Closes #44898 from WweiL/SPARK-46873-sqm-reuse. 
Authored-by: Wei Liu Signed-off-by: Hyukjin Kwon --- python/pyspark/sql/connect/session.py| 5 - python/pyspark/sql/session.py| 5 - python/pyspark/sql/tests/streaming/test_streaming.py | 6 ++ 3 files changed, 14 insertions(+), 2 deletions(-) diff --git a/python/pyspark/sql/connect/session.py b/python/pyspark/sql/connect/session.py index 9700f72cdcf1..19f66072133c 100644 --- a/python/pyspark/sql/connect/session.py +++ b/python/pyspark/sql/connect/session.py @@ -704,7 +704,10 @@ class SparkSession: @property def streams(self) -> "StreamingQueryManager": -return StreamingQueryManager(self) +if hasattr(self, "_sqm"): +return self._sqm +self._sqm: StreamingQueryManager = StreamingQueryManager(self) +return self._sqm streams.__doc__ = PySparkSession.streams.__doc__ diff --git a/python/pyspark/sql/session.py b/python/pyspark/sql/session.py index 6265f4fbe809..b813cf17ced3 100644 --- a/python/pyspark/sql/session.py +++ b/python/pyspark/sql/session.py @@ -1825,7 +1825,10 @@ class SparkSession(SparkConversionMixin): """ from pyspark.sql.streaming import StreamingQueryManager -return StreamingQueryManager(self._jsparkSession.streams()) +if hasattr(self, "_sqm"): +return self._sqm +self._sqm: StreamingQueryManager = StreamingQueryManager(self._jsparkSession.streams()) +return self._sqm def stop(self) -> None: """ diff --git a/python/pyspark/sql/tests/streaming/test_streaming.py b/python/pyspark/sql/tests/streaming/test_streaming.py index a7c22897096b..31486feae156 100644 --- a/python/pyspark/sql/tests/streaming/test_streaming.py +++ b/python/pyspark/sql/tests/streaming/test_streaming.py @@ -294,6 +294,12 @@ class StreamingTestsMixin: self.assertIsInstance(exception, StreamingQueryException) self._assert_exception_tree_contains_msg(exception, "ZeroDivisionError") +def test_query_manager_no_recreation(self): +# SPARK-46873: There should not be a new StreamingQueryManager created every time +# spark.streams is called. 
+for i in range(5): +self.assertTrue(self.spark.streams == self.spark.streams) + def test_query_manager_get(self): df = self.spark.readStream.format("rate").load() for q in self.spark.streams.active:
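The patch above is an instance of a plain lazy-initialization (memoization) pattern. A self-contained sketch of the same idea, with a stand-in object in place of a real `StreamingQueryManager`:

```python
class Session:
    """Sketch of the lazy caching used in the PR above."""

    @property
    def streams(self):
        # Create the manager on first access and cache it on the
        # instance, mirroring the hasattr check in the patch.
        if hasattr(self, "_sqm"):
            return self._sqm
        self._sqm = object()  # stand-in for StreamingQueryManager
        return self._sqm


s = Session()
assert s.streams is s.streams  # the same object on every access
```

After this change, `spark.streams` in PySpark matches the Scala behavior quoted in the commit message: repeated accesses on one session return the identical manager instance.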
(spark) branch master updated: [SPARK-46892][BUILD] Upgrade dropwizard metrics 4.2.25
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new d74aecd11dcd [SPARK-46892][BUILD] Upgrade dropwizard metrics 4.2.25 d74aecd11dcd is described below commit d74aecd11dcd1c8414b662457e49b6001395bb8d Author: panbingkun AuthorDate: Sun Jan 28 12:12:02 2024 -0800 [SPARK-46892][BUILD] Upgrade dropwizard metrics 4.2.25 ### What changes were proposed in this pull request? The pr aims to upgrade dropwizard metrics from `4.2.21` to `4.2.25`. ### Why are the changes needed? The last update occurred 3 months ago. - The new version brings some bug fixes: Fix IndexOutOfBoundsException in Jetty 9, 10, 11, 12 InstrumentedHandler https://github.com/dropwizard/metrics/pull/3912 - The full version release notes: https://github.com/dropwizard/metrics/releases/tag/v4.2.25 https://github.com/dropwizard/metrics/releases/tag/v4.2.24 https://github.com/dropwizard/metrics/releases/tag/v4.2.23 https://github.com/dropwizard/metrics/releases/tag/v4.2.22 ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Pass GA. ### Was this patch authored or co-authored using generative AI tooling? No. Closes #44918 from panbingkun/SPARK-46892. 
Authored-by: panbingkun Signed-off-by: Dongjoon Hyun --- dev/deps/spark-deps-hadoop-3-hive-2.3 | 10 +- pom.xml | 2 +- 2 files changed, 6 insertions(+), 6 deletions(-) diff --git a/dev/deps/spark-deps-hadoop-3-hive-2.3 b/dev/deps/spark-deps-hadoop-3-hive-2.3 index 71f9ac8665b0..09291de50350 100644 --- a/dev/deps/spark-deps-hadoop-3-hive-2.3 +++ b/dev/deps/spark-deps-hadoop-3-hive-2.3 @@ -185,11 +185,11 @@ log4j-core/2.22.1//log4j-core-2.22.1.jar log4j-slf4j2-impl/2.22.1//log4j-slf4j2-impl-2.22.1.jar logging-interceptor/3.12.12//logging-interceptor-3.12.12.jar lz4-java/1.8.0//lz4-java-1.8.0.jar -metrics-core/4.2.21//metrics-core-4.2.21.jar -metrics-graphite/4.2.21//metrics-graphite-4.2.21.jar -metrics-jmx/4.2.21//metrics-jmx-4.2.21.jar -metrics-json/4.2.21//metrics-json-4.2.21.jar -metrics-jvm/4.2.21//metrics-jvm-4.2.21.jar +metrics-core/4.2.25//metrics-core-4.2.25.jar +metrics-graphite/4.2.25//metrics-graphite-4.2.25.jar +metrics-jmx/4.2.25//metrics-jmx-4.2.25.jar +metrics-json/4.2.25//metrics-json-4.2.25.jar +metrics-jvm/4.2.25//metrics-jvm-4.2.25.jar minlog/1.3.0//minlog-1.3.0.jar netty-all/4.1.106.Final//netty-all-4.1.106.Final.jar netty-buffer/4.1.106.Final//netty-buffer-4.1.106.Final.jar diff --git a/pom.xml b/pom.xml index d4e8a7db71de..a5f2b6f74b7a 100644 --- a/pom.xml +++ b/pom.xml @@ -156,7 +156,7 @@ If you change codahale.metrics.version, you also need to change the link to metrics.dropwizard.io in docs/monitoring.md. --> -4.2.21 +4.2.25 1.11.3 1.12.0