[spark] branch master updated (6c885a7cf57d -> 5fff2427ccb8)

2023-09-05 Thread ruifengz
This is an automated email from the ASF dual-hosted git repository.

ruifengz pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


from 6c885a7cf57d [SPARK-45074][PYTHON][CONNECT] `DataFrame.{sort, 
sortWithinPartitions}` support column ordinals
 add 5fff2427ccb8 [SPARK-43241][PS][FOLLOWUP] Add migration guide for 
behavior change

No new revisions were added by this update.

Summary of changes:
 python/docs/source/migration_guide/pyspark_upgrade.rst | 1 +
 1 file changed, 1 insertion(+)





[spark] branch master updated: [SPARK-45072][CONNECT] Fix outer scopes for ammonite classes

2023-09-05 Thread hvanhovell
This is an automated email from the ASF dual-hosted git repository.

hvanhovell pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 40943c2748fd [SPARK-45072][CONNECT] Fix outer scopes for ammonite 
classes
40943c2748fd is described below

commit 40943c2748fdd28d970d017cb8ee86c294ee62df
Author: Herman van Hovell 
AuthorDate: Tue Sep 5 15:35:12 2023 +0200

[SPARK-45072][CONNECT] Fix outer scopes for ammonite classes

### What changes were proposed in this pull request?
Ammonite places all user code inside Helper classes, which are nested inside the class it creates for each command. This PR adds a custom code class wrapper for the Ammonite REPL. It makes sure the Helper classes generated by Ammonite are always registered as an outer scope immediately. This way we can instantiate classes defined inside the Helper class, even when we execute Spark code as part of the Helper's constructor.
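
To make the outer-scope issue concrete, here is a small self-contained sketch (hypothetical names, not code from this patch) of why a nested class needs an enclosing instance:
```scala
// Illustration only: a case class nested in another class keeps a hidden
// reference to its enclosing instance, so it can only be created relative
// to one.
class Helper {
  case class Thing(value: String)
}

object OuterScopeDemo extends App {
  val helper = new Helper
  // Instantiation needs `helper` as the enclosing (outer) instance.
  val thing = new helper.Thing("hello")
  println(thing)
  // Catalyst encoders create such classes reflectively, so they must be told
  // which outer instance to use; registering the Helper eagerly via
  // OuterScopes.addOuterScope is what makes that lookup succeed.
}
```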

### Why are the changes needed?
Currently, defining a class and executing a Spark command that uses it inside the same cell/line fails with a NullPointerException, because we cannot resolve the outer scope needed to instantiate the class. This PR fixes that issue. The following code will now execute successfully (including the curly braces):
```scala
{
  case class Thing(val value: String)
  val r = (0 to 10).map( value => Thing(value.toString) )
  spark.createDataFrame(r)
}
```

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
I added more tests to the `ReplE2ESuite`.

### Was this patch authored or co-authored using generative AI tooling?
No.

Closes #42807 from hvanhovell/SPARK-45072.

Authored-by: Herman van Hovell 
Signed-off-by: Herman van Hovell 
---
 .../apache/spark/sql/application/ConnectRepl.scala | 29 +++--
 .../spark/sql/application/ReplE2ESuite.scala   | 48 ++
 .../CheckConnectJvmClientCompatibility.scala   |  6 +++
 3 files changed, 71 insertions(+), 12 deletions(-)

diff --git 
a/connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/application/ConnectRepl.scala
 
b/connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/application/ConnectRepl.scala
index e6ada566398c..0360a4057886 100644
--- 
a/connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/application/ConnectRepl.scala
+++ 
b/connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/application/ConnectRepl.scala
@@ -22,7 +22,8 @@ import java.util.concurrent.Semaphore
 import scala.util.control.NonFatal
 
 import ammonite.compiler.CodeClassWrapper
-import ammonite.util.Bind
+import ammonite.compiler.iface.CodeWrapper
+import ammonite.util.{Bind, Imports, Name, Util}
 
 import org.apache.spark.annotation.DeveloperApi
 import org.apache.spark.sql.SparkSession
@@ -94,8 +95,8 @@ object ConnectRepl {
 val main = ammonite.Main(
   welcomeBanner = Option(splash),
   predefCode = predefCode,
-  replCodeWrapper = CodeClassWrapper,
-  scriptCodeWrapper = CodeClassWrapper,
+  replCodeWrapper = ExtendedCodeClassWrapper,
+  scriptCodeWrapper = ExtendedCodeClassWrapper,
   inputStream = inputStream,
   outputStream = outputStream,
   errorStream = errorStream)
@@ -107,3 +108,25 @@ object ConnectRepl {
 }
   }
 }
+
+/**
+ * [[CodeWrapper]] that makes sure new Helper classes are always registered as an outer scope.
+ */
+@DeveloperApi
+object ExtendedCodeClassWrapper extends CodeWrapper {
+  override def wrapperPath: Seq[Name] = CodeClassWrapper.wrapperPath
+  override def apply(
+      code: String,
+      source: Util.CodeSource,
+      imports: Imports,
+      printCode: String,
+      indexedWrapper: Name,
+      extraCode: String): (String, String, Int) = {
+    val (top, bottom, level) =
+      CodeClassWrapper(code, source, imports, printCode, indexedWrapper, extraCode)
+    // Make sure we register the Helper before anything else, so outer scopes work as expected.
+    val augmentedTop = top +
+      "\norg.apache.spark.sql.catalyst.encoders.OuterScopes.addOuterScope(this)\n"
+    (augmentedTop, bottom, level)
+  }
+}
diff --git 
a/connector/connect/client/jvm/src/test/scala/org/apache/spark/sql/application/ReplE2ESuite.scala
 
b/connector/connect/client/jvm/src/test/scala/org/apache/spark/sql/application/ReplE2ESuite.scala
index 4106d298dbe2..5bb8cbf3543b 100644
--- 
a/connector/connect/client/jvm/src/test/scala/org/apache/spark/sql/application/ReplE2ESuite.scala
+++ 
b/connector/connect/client/jvm/src/test/scala/org/apache/spark/sql/application/ReplE2ESuite.scala
@@ -79,12 +79,10 @@ class ReplE2ESuite extends RemoteSparkSession with BeforeAndAfterEach {
 
   override def afterEach(): Unit = {
 semaphore.drainPermits()

[spark] branch branch-3.5 updated: [SPARK-45072][CONNECT] Fix outer scopes for ammonite classes

2023-09-05 Thread hvanhovell
This is an automated email from the ASF dual-hosted git repository.

hvanhovell pushed a commit to branch branch-3.5
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.5 by this push:
 new 9389a2ccacce [SPARK-45072][CONNECT] Fix outer scopes for ammonite 
classes
9389a2ccacce is described below

commit 9389a2ccacce61cbbbc9bbb1b19b2825d932ba11
Author: Herman van Hovell 
AuthorDate: Tue Sep 5 15:35:12 2023 +0200

[SPARK-45072][CONNECT] Fix outer scopes for ammonite classes

### What changes were proposed in this pull request?
Ammonite places all user code inside Helper classes, which are nested inside the class it creates for each command. This PR adds a custom code class wrapper for the Ammonite REPL. It makes sure the Helper classes generated by Ammonite are always registered as an outer scope immediately. This way we can instantiate classes defined inside the Helper class, even when we execute Spark code as part of the Helper's constructor.

### Why are the changes needed?
Currently, defining a class and executing a Spark command that uses it inside the same cell/line fails with a NullPointerException, because we cannot resolve the outer scope needed to instantiate the class. This PR fixes that issue. The following code will now execute successfully (including the curly braces):
```scala
{
  case class Thing(val value: String)
  val r = (0 to 10).map( value => Thing(value.toString) )
  spark.createDataFrame(r)
}
```

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
I added more tests to the `ReplE2ESuite`.

### Was this patch authored or co-authored using generative AI tooling?
No.

Closes #42807 from hvanhovell/SPARK-45072.

Authored-by: Herman van Hovell 
Signed-off-by: Herman van Hovell 
(cherry picked from commit 40943c2748fdd28d970d017cb8ee86c294ee62df)
Signed-off-by: Herman van Hovell 
---
 .../apache/spark/sql/application/ConnectRepl.scala | 29 +++--
 .../spark/sql/application/ReplE2ESuite.scala   | 48 ++
 .../CheckConnectJvmClientCompatibility.scala   |  6 +++
 3 files changed, 71 insertions(+), 12 deletions(-)

diff --git 
a/connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/application/ConnectRepl.scala
 
b/connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/application/ConnectRepl.scala
index e6ada566398c..0360a4057886 100644
--- 
a/connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/application/ConnectRepl.scala
+++ 
b/connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/application/ConnectRepl.scala
@@ -22,7 +22,8 @@ import java.util.concurrent.Semaphore
 import scala.util.control.NonFatal
 
 import ammonite.compiler.CodeClassWrapper
-import ammonite.util.Bind
+import ammonite.compiler.iface.CodeWrapper
+import ammonite.util.{Bind, Imports, Name, Util}
 
 import org.apache.spark.annotation.DeveloperApi
 import org.apache.spark.sql.SparkSession
@@ -94,8 +95,8 @@ object ConnectRepl {
 val main = ammonite.Main(
   welcomeBanner = Option(splash),
   predefCode = predefCode,
-  replCodeWrapper = CodeClassWrapper,
-  scriptCodeWrapper = CodeClassWrapper,
+  replCodeWrapper = ExtendedCodeClassWrapper,
+  scriptCodeWrapper = ExtendedCodeClassWrapper,
   inputStream = inputStream,
   outputStream = outputStream,
   errorStream = errorStream)
@@ -107,3 +108,25 @@ object ConnectRepl {
 }
   }
 }
+
+/**
+ * [[CodeWrapper]] that makes sure new Helper classes are always registered as an outer scope.
+ */
+@DeveloperApi
+object ExtendedCodeClassWrapper extends CodeWrapper {
+  override def wrapperPath: Seq[Name] = CodeClassWrapper.wrapperPath
+  override def apply(
+      code: String,
+      source: Util.CodeSource,
+      imports: Imports,
+      printCode: String,
+      indexedWrapper: Name,
+      extraCode: String): (String, String, Int) = {
+    val (top, bottom, level) =
+      CodeClassWrapper(code, source, imports, printCode, indexedWrapper, extraCode)
+    // Make sure we register the Helper before anything else, so outer scopes work as expected.
+    val augmentedTop = top +
+      "\norg.apache.spark.sql.catalyst.encoders.OuterScopes.addOuterScope(this)\n"
+    (augmentedTop, bottom, level)
+  }
+}
diff --git 
a/connector/connect/client/jvm/src/test/scala/org/apache/spark/sql/application/ReplE2ESuite.scala
 
b/connector/connect/client/jvm/src/test/scala/org/apache/spark/sql/application/ReplE2ESuite.scala
index 4106d298dbe2..5bb8cbf3543b 100644
--- 
a/connector/connect/client/jvm/src/test/scala/org/apache/spark/sql/application/ReplE2ESuite.scala
+++ 
b/connector/connect/client/jvm/src/test/scala/org/apache/spark/sql/application/ReplE2ESuite.scala
@@ -79,12 +79,10 @@ class ReplE2ESuite extends RemoteSparkSession with BeforeAndAfterEach {

[spark] branch master updated: [SPARK-45082][DOC] Review and fix issues in API docs for 3.5.0

2023-09-05 Thread liyuanjian
This is an automated email from the ASF dual-hosted git repository.

liyuanjian pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new e0a6af988df3 [SPARK-45082][DOC] Review and fix issues in API docs for 
3.5.0
e0a6af988df3 is described below

commit e0a6af988df3f52e95d46ac4c333825d2940065f
Author: Yuanjian Li 
AuthorDate: Tue Sep 5 12:45:36 2023 -0700

[SPARK-45082][DOC] Review and fix issues in API docs for 3.5.0

### What changes were proposed in this pull request?

Compare the 3.4 API docs with the 3.5 RC3 cut and fix the following issue:

- Remove classes/objects that were leaking into the API docs

### Why are the changes needed?
Fix the issues in the Spark 3.5.0 release API docs.

### Does this PR introduce _any_ user-facing change?
No, API doc changes only.

### How was this patch tested?
Manually tested.

### Was this patch authored or co-authored using generative AI tooling?
no

Closes #42819 from xuanyuanking/SPARK-45082.

Authored-by: Yuanjian Li 
Signed-off-by: Yuanjian Li 
---
 .../scala/org/apache/spark/SparkBuildInfo.scala|  2 +-
 .../org/apache/spark/util/SparkClassUtils.scala|  4 +--
 .../apache/spark/util/SparkCollectionUtils.scala   |  4 +--
 .../org/apache/spark/util/SparkErrorUtils.scala|  2 +-
 .../org/apache/spark/util/SparkSerDeUtils.scala|  4 +--
 .../org/apache/spark/sql/avro/CustomDecimal.scala  |  4 +--
 .../org/apache/spark/util/StubClassLoader.scala|  4 +--
 .../spark/sql/errors/CompilationErrors.scala   |  2 +-
 .../spark/sql/types/DataTypeExpression.scala   | 30 +++---
 .../org/apache/spark/sql/jdbc/JdbcDialects.scala   |  2 +-
 10 files changed, 29 insertions(+), 29 deletions(-)

diff --git a/common/utils/src/main/scala/org/apache/spark/SparkBuildInfo.scala 
b/common/utils/src/main/scala/org/apache/spark/SparkBuildInfo.scala
index 23f671f9d764..ebc62460d231 100644
--- a/common/utils/src/main/scala/org/apache/spark/SparkBuildInfo.scala
+++ b/common/utils/src/main/scala/org/apache/spark/SparkBuildInfo.scala
@@ -18,7 +18,7 @@ package org.apache.spark
 
 import java.util.Properties
 
-object SparkBuildInfo {
+private[spark] object SparkBuildInfo {
 
   val (
 spark_version: String,
diff --git 
a/common/utils/src/main/scala/org/apache/spark/util/SparkClassUtils.scala 
b/common/utils/src/main/scala/org/apache/spark/util/SparkClassUtils.scala
index 679d546d04c9..5984eaee42e7 100644
--- a/common/utils/src/main/scala/org/apache/spark/util/SparkClassUtils.scala
+++ b/common/utils/src/main/scala/org/apache/spark/util/SparkClassUtils.scala
@@ -20,7 +20,7 @@ import java.util.Random
 
 import scala.util.Try
 
-trait SparkClassUtils {
+private[spark] trait SparkClassUtils {
   val random = new Random()
 
   def getSparkClassLoader: ClassLoader = getClass.getClassLoader
@@ -80,4 +80,4 @@ trait SparkClassUtils {
   }
 }
 
-object SparkClassUtils extends SparkClassUtils
+private[spark] object SparkClassUtils extends SparkClassUtils
diff --git 
a/common/utils/src/main/scala/org/apache/spark/util/SparkCollectionUtils.scala 
b/common/utils/src/main/scala/org/apache/spark/util/SparkCollectionUtils.scala
index 7fecc9ccb664..be8282db31be 100644
--- 
a/common/utils/src/main/scala/org/apache/spark/util/SparkCollectionUtils.scala
+++ 
b/common/utils/src/main/scala/org/apache/spark/util/SparkCollectionUtils.scala
@@ -18,7 +18,7 @@ package org.apache.spark.sql.catalyst.util
 
 import scala.collection.immutable
 
-trait SparkCollectionUtils {
+private[spark] trait SparkCollectionUtils {
   /**
* Same function as `keys.zipWithIndex.toMap`, but has perf gain.
*/
@@ -34,4 +34,4 @@ trait SparkCollectionUtils {
   }
 }
 
-object SparkCollectionUtils extends SparkCollectionUtils
+private[spark] object SparkCollectionUtils extends SparkCollectionUtils
diff --git 
a/common/utils/src/main/scala/org/apache/spark/util/SparkErrorUtils.scala 
b/common/utils/src/main/scala/org/apache/spark/util/SparkErrorUtils.scala
index 97a07984a228..8194d1e42417 100644
--- a/common/utils/src/main/scala/org/apache/spark/util/SparkErrorUtils.scala
+++ b/common/utils/src/main/scala/org/apache/spark/util/SparkErrorUtils.scala
@@ -90,4 +90,4 @@ private[spark] trait SparkErrorUtils extends Logging {
   }
 }
 
-object SparkErrorUtils extends SparkErrorUtils
+private[spark] object SparkErrorUtils extends SparkErrorUtils
diff --git 
a/common/utils/src/main/scala/org/apache/spark/util/SparkSerDeUtils.scala 
b/common/utils/src/main/scala/org/apache/spark/util/SparkSerDeUtils.scala
index 9b6174c47bde..2cc14fea5f30 100644
--- a/common/utils/src/main/scala/org/apache/spark/util/SparkSerDeUtils.scala
+++ b/common/utils/src/main/scala/org/apache/spark/util/SparkSerDeUtils.scala
@@ -18,7 +18,7 @@ package org.apache.spark.util
 
 import java.io.{ByteArrayInputStream, ByteArrayOutputStream, 
ObjectInpu

[spark] branch branch-3.5 updated: [SPARK-45082][DOC] Review and fix issues in API docs for 3.5.0

2023-09-05 Thread liyuanjian
This is an automated email from the ASF dual-hosted git repository.

liyuanjian pushed a commit to branch branch-3.5
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.5 by this push:
 new dc6af11082ee [SPARK-45082][DOC] Review and fix issues in API docs for 
3.5.0
dc6af11082ee is described below

commit dc6af11082ee39666aa6ae64df0f32dc142bf807
Author: Yuanjian Li 
AuthorDate: Tue Sep 5 12:45:36 2023 -0700

[SPARK-45082][DOC] Review and fix issues in API docs for 3.5.0

### What changes were proposed in this pull request?

Compare the 3.4 API docs with the 3.5 RC3 cut and fix the following issue:

- Remove classes/objects that were leaking into the API docs

### Why are the changes needed?
Fix the issues in the Spark 3.5.0 release API docs.

### Does this PR introduce _any_ user-facing change?
No, API doc changes only.

### How was this patch tested?
Manually tested.

### Was this patch authored or co-authored using generative AI tooling?
no

Closes #42819 from xuanyuanking/SPARK-45082.

Authored-by: Yuanjian Li 
Signed-off-by: Yuanjian Li 
(cherry picked from commit e0a6af988df3f52e95d46ac4c333825d2940065f)
Signed-off-by: Yuanjian Li 
---
 .../scala/org/apache/spark/SparkBuildInfo.scala|  2 +-
 .../org/apache/spark/util/SparkClassUtils.scala|  4 +--
 .../apache/spark/util/SparkCollectionUtils.scala   |  4 +--
 .../org/apache/spark/util/SparkErrorUtils.scala|  2 +-
 .../org/apache/spark/util/SparkSerDeUtils.scala|  4 +--
 .../org/apache/spark/sql/avro/CustomDecimal.scala  |  4 +--
 .../org/apache/spark/util/StubClassLoader.scala|  4 +--
 .../spark/sql/errors/CompilationErrors.scala   |  2 +-
 .../spark/sql/types/DataTypeExpression.scala   | 30 +++---
 .../org/apache/spark/sql/jdbc/JdbcDialects.scala   |  2 +-
 10 files changed, 29 insertions(+), 29 deletions(-)

diff --git a/common/utils/src/main/scala/org/apache/spark/SparkBuildInfo.scala 
b/common/utils/src/main/scala/org/apache/spark/SparkBuildInfo.scala
index 23f671f9d764..ebc62460d231 100644
--- a/common/utils/src/main/scala/org/apache/spark/SparkBuildInfo.scala
+++ b/common/utils/src/main/scala/org/apache/spark/SparkBuildInfo.scala
@@ -18,7 +18,7 @@ package org.apache.spark
 
 import java.util.Properties
 
-object SparkBuildInfo {
+private[spark] object SparkBuildInfo {
 
   val (
 spark_version: String,
diff --git 
a/common/utils/src/main/scala/org/apache/spark/util/SparkClassUtils.scala 
b/common/utils/src/main/scala/org/apache/spark/util/SparkClassUtils.scala
index 679d546d04c9..5984eaee42e7 100644
--- a/common/utils/src/main/scala/org/apache/spark/util/SparkClassUtils.scala
+++ b/common/utils/src/main/scala/org/apache/spark/util/SparkClassUtils.scala
@@ -20,7 +20,7 @@ import java.util.Random
 
 import scala.util.Try
 
-trait SparkClassUtils {
+private[spark] trait SparkClassUtils {
   val random = new Random()
 
   def getSparkClassLoader: ClassLoader = getClass.getClassLoader
@@ -80,4 +80,4 @@ trait SparkClassUtils {
   }
 }
 
-object SparkClassUtils extends SparkClassUtils
+private[spark] object SparkClassUtils extends SparkClassUtils
diff --git 
a/common/utils/src/main/scala/org/apache/spark/util/SparkCollectionUtils.scala 
b/common/utils/src/main/scala/org/apache/spark/util/SparkCollectionUtils.scala
index 7fecc9ccb664..be8282db31be 100644
--- 
a/common/utils/src/main/scala/org/apache/spark/util/SparkCollectionUtils.scala
+++ 
b/common/utils/src/main/scala/org/apache/spark/util/SparkCollectionUtils.scala
@@ -18,7 +18,7 @@ package org.apache.spark.sql.catalyst.util
 
 import scala.collection.immutable
 
-trait SparkCollectionUtils {
+private[spark] trait SparkCollectionUtils {
   /**
* Same function as `keys.zipWithIndex.toMap`, but has perf gain.
*/
@@ -34,4 +34,4 @@ trait SparkCollectionUtils {
   }
 }
 
-object SparkCollectionUtils extends SparkCollectionUtils
+private[spark] object SparkCollectionUtils extends SparkCollectionUtils
diff --git 
a/common/utils/src/main/scala/org/apache/spark/util/SparkErrorUtils.scala 
b/common/utils/src/main/scala/org/apache/spark/util/SparkErrorUtils.scala
index 97a07984a228..8194d1e42417 100644
--- a/common/utils/src/main/scala/org/apache/spark/util/SparkErrorUtils.scala
+++ b/common/utils/src/main/scala/org/apache/spark/util/SparkErrorUtils.scala
@@ -90,4 +90,4 @@ private[spark] trait SparkErrorUtils extends Logging {
   }
 }
 
-object SparkErrorUtils extends SparkErrorUtils
+private[spark] object SparkErrorUtils extends SparkErrorUtils
diff --git 
a/common/utils/src/main/scala/org/apache/spark/util/SparkSerDeUtils.scala 
b/common/utils/src/main/scala/org/apache/spark/util/SparkSerDeUtils.scala
index 9b6174c47bde..2cc14fea5f30 100644
--- a/common/utils/src/main/scala/org/apache/spark/util/SparkSerDeUtils.scala
+++ b/common/utils/src/main/scala/org/apache/spark/util/SparkSerDeUtils.scala
@@ -18,7 +

[spark] 01/01: Preparing Spark release v3.5.0-rc4

2023-09-05 Thread liyuanjian
This is an automated email from the ASF dual-hosted git repository.

liyuanjian pushed a commit to tag v3.5.0-rc4
in repository https://gitbox.apache.org/repos/asf/spark.git

commit c2939589a29dd0d6a2d3d31a8d833877a37ee02a
Author: Yuanjian Li 
AuthorDate: Tue Sep 5 20:15:44 2023 +

Preparing Spark release v3.5.0-rc4
---
 R/pkg/DESCRIPTION  | 2 +-
 assembly/pom.xml   | 2 +-
 common/kvstore/pom.xml | 2 +-
 common/network-common/pom.xml  | 2 +-
 common/network-shuffle/pom.xml | 2 +-
 common/network-yarn/pom.xml| 2 +-
 common/sketch/pom.xml  | 2 +-
 common/tags/pom.xml| 2 +-
 common/unsafe/pom.xml  | 2 +-
 common/utils/pom.xml   | 2 +-
 connector/avro/pom.xml | 2 +-
 connector/connect/client/jvm/pom.xml   | 2 +-
 connector/connect/common/pom.xml   | 2 +-
 connector/connect/server/pom.xml   | 2 +-
 connector/docker-integration-tests/pom.xml | 2 +-
 connector/kafka-0-10-assembly/pom.xml  | 2 +-
 connector/kafka-0-10-sql/pom.xml   | 2 +-
 connector/kafka-0-10-token-provider/pom.xml| 2 +-
 connector/kafka-0-10/pom.xml   | 2 +-
 connector/kinesis-asl-assembly/pom.xml | 2 +-
 connector/kinesis-asl/pom.xml  | 2 +-
 connector/protobuf/pom.xml | 2 +-
 connector/spark-ganglia-lgpl/pom.xml   | 2 +-
 core/pom.xml   | 2 +-
 docs/_config.yml   | 6 +++---
 examples/pom.xml   | 2 +-
 graphx/pom.xml | 2 +-
 hadoop-cloud/pom.xml   | 2 +-
 launcher/pom.xml   | 2 +-
 mllib-local/pom.xml| 2 +-
 mllib/pom.xml  | 2 +-
 pom.xml| 2 +-
 python/pyspark/version.py  | 2 +-
 repl/pom.xml   | 2 +-
 resource-managers/kubernetes/core/pom.xml  | 2 +-
 resource-managers/kubernetes/integration-tests/pom.xml | 2 +-
 resource-managers/mesos/pom.xml| 2 +-
 resource-managers/yarn/pom.xml | 2 +-
 sql/api/pom.xml| 2 +-
 sql/catalyst/pom.xml   | 2 +-
 sql/core/pom.xml   | 2 +-
 sql/hive-thriftserver/pom.xml  | 2 +-
 sql/hive/pom.xml   | 2 +-
 streaming/pom.xml  | 2 +-
 tools/pom.xml  | 2 +-
 45 files changed, 47 insertions(+), 47 deletions(-)

diff --git a/R/pkg/DESCRIPTION b/R/pkg/DESCRIPTION
index 66faa8031c45..1c093a4a9804 100644
--- a/R/pkg/DESCRIPTION
+++ b/R/pkg/DESCRIPTION
@@ -1,6 +1,6 @@
 Package: SparkR
 Type: Package
-Version: 3.5.1
+Version: 3.5.0
 Title: R Front End for 'Apache Spark'
 Description: Provides an R Front end for 'Apache Spark' 
.
 Authors@R:
diff --git a/assembly/pom.xml b/assembly/pom.xml
index 45b68dd81cb9..a0aca22eab91 100644
--- a/assembly/pom.xml
+++ b/assembly/pom.xml
@@ -21,7 +21,7 @@
   
 org.apache.spark
 spark-parent_2.12
-3.5.1-SNAPSHOT
+3.5.0
 ../pom.xml
   
 
diff --git a/common/kvstore/pom.xml b/common/kvstore/pom.xml
index 1b1a8d0066f8..ce180f49ff12 100644
--- a/common/kvstore/pom.xml
+++ b/common/kvstore/pom.xml
@@ -22,7 +22,7 @@
   
 org.apache.spark
 spark-parent_2.12
-3.5.1-SNAPSHOT
+3.5.0
 ../../pom.xml
   
 
diff --git a/common/network-common/pom.xml b/common/network-common/pom.xml
index 54c10a05eed2..8da48076a43a 100644
--- a/common/network-common/pom.xml
+++ b/common/network-common/pom.xml
@@ -22,7 +22,7 @@
   
 org.apache.spark
 spark-parent_2.12
-3.5.1-SNAPSHOT
+3.5.0
 ../../pom.xml
   
 
diff --git a/common/network-shuffle/pom.xml b/common/network-shuffle/pom.xml
index 92bf5bc07854..48e64d21a58b 100644
--- a/common/network-shuffle/pom.xml
+++ b/common/network-shuffle/pom.xml
@@ -22,7 +22,7 @@
   
 org.apache.spark
 spark-parent_2.12
-3.5.1-SNAPSHOT
+3.5.0
 ../../pom.xml
   
 
diff --git a/common/network-yarn/pom.xml b/common/network-yarn/pom.xml
index 3003927e713c..2bbacbe71a43 100644
--- a/common/network-yarn/pom.xml
+++ b/common/network-yarn/pom.xml
@@ -22,7 +22,7 @@
   
 org.apache.spark
 spark-parent_2.12
-3.5.1-SN

[spark] tag v3.5.0-rc4 created (now c2939589a29d)

2023-09-05 Thread liyuanjian
This is an automated email from the ASF dual-hosted git repository.

liyuanjian pushed a change to tag v3.5.0-rc4
in repository https://gitbox.apache.org/repos/asf/spark.git


  at c2939589a29d (commit)
This tag includes the following new commits:

 new c2939589a29d Preparing Spark release v3.5.0-rc4

The 1 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails.  The revisions
listed as "add" were already present in the repository and have only
been added to this reference.






[spark] branch branch-3.5 updated (dc6af11082ee -> 66e07c97b7a5)

2023-09-05 Thread liyuanjian
This is an automated email from the ASF dual-hosted git repository.

liyuanjian pushed a change to branch branch-3.5
in repository https://gitbox.apache.org/repos/asf/spark.git


from dc6af11082ee [SPARK-45082][DOC] Review and fix issues in API docs for 
3.5.0
 add c2939589a29d Preparing Spark release v3.5.0-rc4
 new 66e07c97b7a5 Preparing development version 3.5.1-SNAPSHOT

The 1 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails.  The revisions
listed as "add" were already present in the repository and have only
been added to this reference.


Summary of changes:





[spark] 01/01: Preparing development version 3.5.1-SNAPSHOT

2023-09-05 Thread liyuanjian
This is an automated email from the ASF dual-hosted git repository.

liyuanjian pushed a commit to branch branch-3.5
in repository https://gitbox.apache.org/repos/asf/spark.git

commit 66e07c97b7a5ac454e96a7b0b7ed83d942302991
Author: Yuanjian Li 
AuthorDate: Tue Sep 5 20:15:48 2023 +

Preparing development version 3.5.1-SNAPSHOT
---
 R/pkg/DESCRIPTION  | 2 +-
 assembly/pom.xml   | 2 +-
 common/kvstore/pom.xml | 2 +-
 common/network-common/pom.xml  | 2 +-
 common/network-shuffle/pom.xml | 2 +-
 common/network-yarn/pom.xml| 2 +-
 common/sketch/pom.xml  | 2 +-
 common/tags/pom.xml| 2 +-
 common/unsafe/pom.xml  | 2 +-
 common/utils/pom.xml   | 2 +-
 connector/avro/pom.xml | 2 +-
 connector/connect/client/jvm/pom.xml   | 2 +-
 connector/connect/common/pom.xml   | 2 +-
 connector/connect/server/pom.xml   | 2 +-
 connector/docker-integration-tests/pom.xml | 2 +-
 connector/kafka-0-10-assembly/pom.xml  | 2 +-
 connector/kafka-0-10-sql/pom.xml   | 2 +-
 connector/kafka-0-10-token-provider/pom.xml| 2 +-
 connector/kafka-0-10/pom.xml   | 2 +-
 connector/kinesis-asl-assembly/pom.xml | 2 +-
 connector/kinesis-asl/pom.xml  | 2 +-
 connector/protobuf/pom.xml | 2 +-
 connector/spark-ganglia-lgpl/pom.xml   | 2 +-
 core/pom.xml   | 2 +-
 docs/_config.yml   | 6 +++---
 examples/pom.xml   | 2 +-
 graphx/pom.xml | 2 +-
 hadoop-cloud/pom.xml   | 2 +-
 launcher/pom.xml   | 2 +-
 mllib-local/pom.xml| 2 +-
 mllib/pom.xml  | 2 +-
 pom.xml| 2 +-
 python/pyspark/version.py  | 2 +-
 repl/pom.xml   | 2 +-
 resource-managers/kubernetes/core/pom.xml  | 2 +-
 resource-managers/kubernetes/integration-tests/pom.xml | 2 +-
 resource-managers/mesos/pom.xml| 2 +-
 resource-managers/yarn/pom.xml | 2 +-
 sql/api/pom.xml| 2 +-
 sql/catalyst/pom.xml   | 2 +-
 sql/core/pom.xml   | 2 +-
 sql/hive-thriftserver/pom.xml  | 2 +-
 sql/hive/pom.xml   | 2 +-
 streaming/pom.xml  | 2 +-
 tools/pom.xml  | 2 +-
 45 files changed, 47 insertions(+), 47 deletions(-)

diff --git a/R/pkg/DESCRIPTION b/R/pkg/DESCRIPTION
index 1c093a4a9804..66faa8031c45 100644
--- a/R/pkg/DESCRIPTION
+++ b/R/pkg/DESCRIPTION
@@ -1,6 +1,6 @@
 Package: SparkR
 Type: Package
-Version: 3.5.0
+Version: 3.5.1
 Title: R Front End for 'Apache Spark'
 Description: Provides an R Front end for 'Apache Spark' 
.
 Authors@R:
diff --git a/assembly/pom.xml b/assembly/pom.xml
index a0aca22eab91..45b68dd81cb9 100644
--- a/assembly/pom.xml
+++ b/assembly/pom.xml
@@ -21,7 +21,7 @@
   
 org.apache.spark
 spark-parent_2.12
-3.5.0
+3.5.1-SNAPSHOT
 ../pom.xml
   
 
diff --git a/common/kvstore/pom.xml b/common/kvstore/pom.xml
index ce180f49ff12..1b1a8d0066f8 100644
--- a/common/kvstore/pom.xml
+++ b/common/kvstore/pom.xml
@@ -22,7 +22,7 @@
   
 org.apache.spark
 spark-parent_2.12
-3.5.0
+3.5.1-SNAPSHOT
 ../../pom.xml
   
 
diff --git a/common/network-common/pom.xml b/common/network-common/pom.xml
index 8da48076a43a..54c10a05eed2 100644
--- a/common/network-common/pom.xml
+++ b/common/network-common/pom.xml
@@ -22,7 +22,7 @@
   
 org.apache.spark
 spark-parent_2.12
-3.5.0
+3.5.1-SNAPSHOT
 ../../pom.xml
   
 
diff --git a/common/network-shuffle/pom.xml b/common/network-shuffle/pom.xml
index 48e64d21a58b..92bf5bc07854 100644
--- a/common/network-shuffle/pom.xml
+++ b/common/network-shuffle/pom.xml
@@ -22,7 +22,7 @@
   
 org.apache.spark
 spark-parent_2.12
-3.5.0
+3.5.1-SNAPSHOT
 ../../pom.xml
   
 
diff --git a/common/network-yarn/pom.xml b/common/network-yarn/pom.xml
index 2bbacbe71a43..3003927e713c 100644
--- a/common/network-yarn/pom.xml
+++ b/common/network-yarn/pom.xml
@@ -22,7 +22,7 @@
   
 org.apache.spark
 spark-parent_2.12

svn commit: r63805 - /dev/spark/v3.5.0-rc4-bin/

2023-09-05 Thread liyuanjian
Author: liyuanjian
Date: Tue Sep  5 21:17:23 2023
New Revision: 63805

Log:
Apache Spark v3.5.0-rc4

Added:
dev/spark/v3.5.0-rc4-bin/
dev/spark/v3.5.0-rc4-bin/SparkR_3.5.0.tar.gz   (with props)
dev/spark/v3.5.0-rc4-bin/SparkR_3.5.0.tar.gz.asc
dev/spark/v3.5.0-rc4-bin/SparkR_3.5.0.tar.gz.sha512
dev/spark/v3.5.0-rc4-bin/pyspark-3.5.0.tar.gz   (with props)
dev/spark/v3.5.0-rc4-bin/pyspark-3.5.0.tar.gz.asc
dev/spark/v3.5.0-rc4-bin/pyspark-3.5.0.tar.gz.sha512
dev/spark/v3.5.0-rc4-bin/spark-3.5.0-bin-hadoop3-scala2.13.tgz   (with 
props)
dev/spark/v3.5.0-rc4-bin/spark-3.5.0-bin-hadoop3-scala2.13.tgz.asc
dev/spark/v3.5.0-rc4-bin/spark-3.5.0-bin-hadoop3-scala2.13.tgz.sha512
dev/spark/v3.5.0-rc4-bin/spark-3.5.0-bin-hadoop3.tgz   (with props)
dev/spark/v3.5.0-rc4-bin/spark-3.5.0-bin-hadoop3.tgz.asc
dev/spark/v3.5.0-rc4-bin/spark-3.5.0-bin-hadoop3.tgz.sha512
dev/spark/v3.5.0-rc4-bin/spark-3.5.0-bin-without-hadoop.tgz   (with props)
dev/spark/v3.5.0-rc4-bin/spark-3.5.0-bin-without-hadoop.tgz.asc
dev/spark/v3.5.0-rc4-bin/spark-3.5.0-bin-without-hadoop.tgz.sha512
dev/spark/v3.5.0-rc4-bin/spark-3.5.0.tgz   (with props)
dev/spark/v3.5.0-rc4-bin/spark-3.5.0.tgz.asc
dev/spark/v3.5.0-rc4-bin/spark-3.5.0.tgz.sha512

Added: dev/spark/v3.5.0-rc4-bin/SparkR_3.5.0.tar.gz
==
Binary file - no diff available.

Propchange: dev/spark/v3.5.0-rc4-bin/SparkR_3.5.0.tar.gz
--
svn:mime-type = application/octet-stream

Added: dev/spark/v3.5.0-rc4-bin/SparkR_3.5.0.tar.gz.asc
==
--- dev/spark/v3.5.0-rc4-bin/SparkR_3.5.0.tar.gz.asc (added)
+++ dev/spark/v3.5.0-rc4-bin/SparkR_3.5.0.tar.gz.asc Tue Sep  5 21:17:23 2023
@@ -0,0 +1,17 @@
+-BEGIN PGP SIGNATURE-
+
+iQJKBAABCgA0FiEE/Drjp+qhusmHcIQOfhq8xTqqIhYFAmT3moAWHGxpeXVhbmpp
+YW5AYXBhY2hlLm9yZwAKCRB+GrzFOqoiFrCeD/9BPI36Ut/Sc1RUzAXbzP7bn7H0
+Pcuu9Prm5uz1x+9Dqv5ybYsr1PsgINlWa33J6aK1i/J30k6kUybMA0b8dQ+BmDbL
+xwIUcrfqX8+DKlBV5oc0HlLFpBpfqHXk5aVe616Cm1XuOToDxU+30LZEaL9/w6kz
++TnLKv+oS7kzGd1UJVf4qJQftPrekuNuxgk9mx0q0/fMdOtZBBE3RrWzdu1w2soN
+ro082r7SkY1alOfMN8kgsGguhOCBmiLQRUKw8PXmZoAdo92wxpdh8+gWiv+JfRhD
+sol78cmuRvT8pOg1bOEpCMl0J6ITmDyg6a+DcsmkZ94VQblpvKXhGixIIAbiaTc0
+njF8HuZWBC0EaO4EvV6oV6k+T7L/ysj/Vz1dUuFYVGUtrDK/92qzLYQBzmpS6va3
+0Il5h9JHOwEMFOkhkSSi129XDpdO+SDrWNpA9kR7/uqJmo8VYa43Gc0WWiHu75ni
+e1YxxtD8aOnHhUY1LU2LO58w7KcgOmuSHdbDjHSBsRe+cu7LH2zv0dSgYpwCP42o
+whOSaNJPW49WILZVWRC0T1sP5kMSNJjxPVr+mgawia6ImvoWQvXloo48T0QpKmhf
+rDv1JDFlBuXPw55Sq69BZhHAa5+vMZfwW/kSgA7hVu3vKEVttrAB3w0OyKlSxenP
+Fepv5E7YkD6/cr7ALQ==
+=kzYJ
+-END PGP SIGNATURE-

Added: dev/spark/v3.5.0-rc4-bin/SparkR_3.5.0.tar.gz.sha512
==
--- dev/spark/v3.5.0-rc4-bin/SparkR_3.5.0.tar.gz.sha512 (added)
+++ dev/spark/v3.5.0-rc4-bin/SparkR_3.5.0.tar.gz.sha512 Tue Sep  5 21:17:23 2023
@@ -0,0 +1 @@
+16b93fd63aa30221d5edcdcf0a56c4d40c99d65cebcd818cdc2a4c1dc5ead043511012e3f58bfc3352281ff39557d29e507cef5a574e06c350364debe3f33583
  SparkR_3.5.0.tar.gz

Added: dev/spark/v3.5.0-rc4-bin/pyspark-3.5.0.tar.gz
==
Binary file - no diff available.

Propchange: dev/spark/v3.5.0-rc4-bin/pyspark-3.5.0.tar.gz
--
svn:mime-type = application/octet-stream

Added: dev/spark/v3.5.0-rc4-bin/pyspark-3.5.0.tar.gz.asc
==
--- dev/spark/v3.5.0-rc4-bin/pyspark-3.5.0.tar.gz.asc (added)
+++ dev/spark/v3.5.0-rc4-bin/pyspark-3.5.0.tar.gz.asc Tue Sep  5 21:17:23 2023
@@ -0,0 +1,17 @@
+-BEGIN PGP SIGNATURE-
+
+iQJKBAABCgA0FiEE/Drjp+qhusmHcIQOfhq8xTqqIhYFAmT3moIWHGxpeXVhbmpp
+YW5AYXBhY2hlLm9yZwAKCRB+GrzFOqoiFltND/9r1O0WaGcRnqzjJZkU0clFR5Fv
+Z8A3XugA01fC6Z/Asq2/LkW8vCmTDB+1Daj4w5/f3rQO8r7+KlO92whx3rWiyEpQ
+7xgZ2WIbkidtXfmu8XmYaXrSdN2D2MsyfJDMhB7sgI3upz/auwCtPRdyxb99GfAq
+n/SglIQW7icuWo4XAHlQ1QjCFJMY4a1Fj8KfAfahoLswj3jTNxGMrVBMrNdHKPJg
+9ldh3rdd8IQKRqCaixLUKnpWerQhQK/IjUESNHCC2nEQAsg3KbQjvzDeUWaMFF/o
+fHJxMGZcOx0lEQLwI787NDxHCtUWDABImYJ7QgcrVtmIHEQOEdvyB61mrqfbXi8k
+DW+5ZVfz3m9KoMg+tczNnEJgsr/ddVFYgL2wHjerEcV0klHEeYyizkjzLzpX2tTs
+vNqB/01SvVXoH9Td02+I3wXTBs/zcS/aP6U9VrI2rOWrxbZJkoN+0pelIQi8JXTR
+knd2qDdFKnyee3vSPxxGj+rZztgKki3RN2DCyEsR7kZKseOq1Zj/+klBE0DphjnL
+LRUPH+0wAV7JeovSsrVeSNM3COPVzDTRgJZnA37y8acnnXG3JmIYNraN+IGAD+Hq
+0mOIBKjCjzgeHOhR1s1rQrUJ3HhY9Hns0R3vpT//AlKlhCZrDdJ/OO3J3c3twY9l
+dqsqae7TFAFUa5qaAw==
+=Ziwk
+-END PGP SIGNATURE-

Added: dev/spark/v3.5.0-rc4-bin/pyspark-3.5.0.tar.gz.sha512
==
--- dev/spark/v3.5.0-rc4-bin/pyspark-3.5.0.tar.gz.sha512 (added)
+++ 

svn commit: r63807 - in /dev/spark/v3.5.0-rc4-docs: ./ _site/ _site/api/ _site/api/R/ _site/api/R/articles/ _site/api/R/deps/ _site/api/R/deps/bootstrap-5.2.2/ _site/api/R/deps/jquery-3.6.0/ _site/api

2023-09-05 Thread liyuanjian
Author: liyuanjian
Date: Tue Sep  5 23:50:07 2023
New Revision: 63807

Log:
Apache Spark v3.5.0-rc4 docs


[This commit notification would consist of 4517 parts, 
which exceeds the limit of 50 ones, so it was shortened to the summary.]




[spark] branch master updated (e0a6af988df3 -> ba35140ad166)

2023-09-05 Thread ruifengz
This is an automated email from the ASF dual-hosted git repository.

ruifengz pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


from e0a6af988df3 [SPARK-45082][DOC] Review and fix issues in API docs for 
3.5.0
 add ba35140ad166 [SPARK-45083][PYTHON][DOCS] Refine the docstring of 
function `min`

No new revisions were added by this update.

Summary of changes:
 python/pyspark/sql/functions.py | 62 +++--
 1 file changed, 59 insertions(+), 3 deletions(-)





[spark] branch master updated: [SPARK-45071][SQL] Optimize the processing speed of `BinaryArithmetic#dataType` when processing multi-column data

2023-09-05 Thread yumwang
This is an automated email from the ASF dual-hosted git repository.

yumwang pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 16e813cecd5 [SPARK-45071][SQL] Optimize the processing speed of 
`BinaryArithmetic#dataType` when processing multi-column data
16e813cecd5 is described below

commit 16e813cecd55490a71ef6c05fca2209fbdae078f
Author: ming95 <505306...@qq.com>
AuthorDate: Wed Sep 6 11:38:30 2023 +0800

[SPARK-45071][SQL] Optimize the processing speed of 
`BinaryArithmetic#dataType` when processing multi-column data

### What changes were proposed in this pull request?

Since `BinaryArithmetic#dataType` will recursively process the datatype of 
each node, the driver will be very slow when multiple columns are processed.

For example, the following code:
```scala
import spark.implicits._
import scala.util.Random
import org.apache.spark.sql.Row
import org.apache.spark.sql.functions.{expr, sum}
import org.apache.spark.sql.types.{StructType, StructField, IntegerType}

val N = 30
val M = 100

val columns = Seq.fill(N)(Random.alphanumeric.take(8).mkString)
val data = Seq.fill(M)(Seq.fill(N)(Random.nextInt(16) - 5))

val schema = StructType(columns.map(StructField(_, IntegerType)))
val rdd = spark.sparkContext.parallelize(data.map(Row.fromSeq(_)))
val df = spark.createDataFrame(rdd, schema)
val colExprs = columns.map(sum(_))

// generate a new column by adding up the other 30 columns
df.withColumn("new_col_sum", expr(columns.mkString(" + ")))
```

With Spark 3.4, the driver takes a few minutes to execute this code, while with Spark 3.2 it takes only a few seconds. Related issue: [SPARK-39316](https://github.com/apache/spark/pull/36698)
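
The fix, shown in the diff below, caches the computed result type in a lazy val so it is evaluated once per node. A standalone sketch of that memoization pattern, using a hypothetical expression tree rather than Spark's classes:
```scala
// Hypothetical Expr tree; only illustrates the lazy-val caching idea.
sealed trait Expr { def depth: Int }
case class Leaf(value: Int) extends Expr { val depth = 1 }
case class Add(left: Expr, right: Expr) extends Expr {
  // Without `lazy val`, every access re-walks both subtrees, which is what
  // makes wide expressions such as `c1 + c2 + ... + c30` slow to analyze.
  lazy val depth: Int = 1 + math.max(left.depth, right.depth)
}

object MemoDemo extends App {
  val wide = (1 to 30).map(Leaf(_)).reduce[Expr](Add(_, _))
  println(wide.depth) // each node computes its depth once and caches it
}
```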

### Why are the changes needed?

Optimize the processing speed of `BinaryArithmetic#dataType` when 
processing multi-column data

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

manual testing

### Was this patch authored or co-authored using generative AI tooling?

no

Closes #42804 from ming95/SPARK-45071.

Authored-by: ming95 <505306...@qq.com>
Signed-off-by: Yuming Wang 
---
 .../apache/spark/sql/catalyst/expressions/arithmetic.scala   | 12 +++-
 1 file changed, 7 insertions(+), 5 deletions(-)

diff --git 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/arithmetic.scala
 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/arithmetic.scala
index 2d9bccc0854..a556ac9f129 100644
--- 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/arithmetic.scala
+++ 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/arithmetic.scala
@@ -219,6 +219,12 @@ abstract class BinaryArithmetic extends BinaryOperator
 
   protected val evalMode: EvalMode.Value
 
+  private lazy val internalDataType: DataType = (left.dataType, right.dataType) match {
+    case (DecimalType.Fixed(p1, s1), DecimalType.Fixed(p2, s2)) =>
+      resultDecimalType(p1, s1, p2, s2)
+    case _ => left.dataType
+  }
+
   protected def failOnError: Boolean = evalMode match {
 // The TRY mode executes as if it would fail on errors, except that it 
would capture the errors
 // and return null results.
@@ -234,11 +240,7 @@ abstract class BinaryArithmetic extends BinaryOperator
 case _ => super.checkInputDataTypes()
   }
 
-  override def dataType: DataType = (left.dataType, right.dataType) match {
-case (DecimalType.Fixed(p1, s1), DecimalType.Fixed(p2, s2)) =>
-  resultDecimalType(p1, s1, p2, s2)
-case _ => left.dataType
-  }
+  override def dataType: DataType = internalDataType
 
   // When `spark.sql.decimalOperations.allowPrecisionLoss` is set to true, if 
the precision / scale
   // needed are out of the range of available values, the scale is reduced up 
to 6, in order to





[spark] branch branch-3.5 updated: [SPARK-45071][SQL] Optimize the processing speed of `BinaryArithmetic#dataType` when processing multi-column data

2023-09-05 Thread yumwang
This is an automated email from the ASF dual-hosted git repository.

yumwang pushed a commit to branch branch-3.5
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.5 by this push:
 new 699e4e59515 [SPARK-45071][SQL] Optimize the processing speed of 
`BinaryArithmetic#dataType` when processing multi-column data
699e4e59515 is described below

commit 699e4e59515864862bd86734b2eb51cc674e38be
Author: ming95 <505306...@qq.com>
AuthorDate: Wed Sep 6 11:38:30 2023 +0800

[SPARK-45071][SQL] Optimize the processing speed of 
`BinaryArithmetic#dataType` when processing multi-column data

### What changes were proposed in this pull request?

Since `BinaryArithmetic#dataType` will recursively process the datatype of 
each node, the driver will be very slow when multiple columns are processed.

For example, the following code:
```scala
import spark.implicits._
import scala.util.Random
import org.apache.spark.sql.Row
import org.apache.spark.sql.functions.{expr, sum}
import org.apache.spark.sql.types.{StructType, StructField, IntegerType}

val N = 30
val M = 100

val columns = Seq.fill(N)(Random.alphanumeric.take(8).mkString)
val data = Seq.fill(M)(Seq.fill(N)(Random.nextInt(16) - 5))

val schema = StructType(columns.map(StructField(_, IntegerType)))
val rdd = spark.sparkContext.parallelize(data.map(Row.fromSeq(_)))
val df = spark.createDataFrame(rdd, schema)
val colExprs = columns.map(sum(_))

// generate a new column by adding up the other 30 columns
df.withColumn("new_col_sum", expr(columns.mkString(" + ")))
```

With Spark 3.4, the driver takes a few minutes to execute this code, while with Spark 3.2 it takes only a few seconds. Related issue: [SPARK-39316](https://github.com/apache/spark/pull/36698)

### Why are the changes needed?

Optimize the processing speed of `BinaryArithmetic#dataType` when 
processing multi-column data

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

manual testing

### Was this patch authored or co-authored using generative AI tooling?

no

Closes #42804 from ming95/SPARK-45071.

Authored-by: ming95 <505306...@qq.com>
Signed-off-by: Yuming Wang 
(cherry picked from commit 16e813cecd55490a71ef6c05fca2209fbdae078f)
Signed-off-by: Yuming Wang 
---
 .../apache/spark/sql/catalyst/expressions/arithmetic.scala   | 12 +++-
 1 file changed, 7 insertions(+), 5 deletions(-)

diff --git 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/arithmetic.scala
 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/arithmetic.scala
index 31d4d71cd40..06742ef917b 100644
--- 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/arithmetic.scala
+++ 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/arithmetic.scala
@@ -219,6 +219,12 @@ abstract class BinaryArithmetic extends BinaryOperator
 
   protected val evalMode: EvalMode.Value
 
+  private lazy val internalDataType: DataType = (left.dataType, right.dataType) match {
+    case (DecimalType.Fixed(p1, s1), DecimalType.Fixed(p2, s2)) =>
+      resultDecimalType(p1, s1, p2, s2)
+    case _ => left.dataType
+  }
+
   protected def failOnError: Boolean = evalMode match {
 // The TRY mode executes as if it would fail on errors, except that it 
would capture the errors
 // and return null results.
@@ -234,11 +240,7 @@ abstract class BinaryArithmetic extends BinaryOperator
 case _ => super.checkInputDataTypes()
   }
 
-  override def dataType: DataType = (left.dataType, right.dataType) match {
-case (DecimalType.Fixed(p1, s1), DecimalType.Fixed(p2, s2)) =>
-  resultDecimalType(p1, s1, p2, s2)
-case _ => left.dataType
-  }
+  override def dataType: DataType = internalDataType
 
   // When `spark.sql.decimalOperations.allowPrecisionLoss` is set to true, if 
the precision / scale
   // needed are out of the range of available values, the scale is reduced up 
to 6, in order to





[spark] branch branch-3.4 updated: [SPARK-45071][SQL] Optimize the processing speed of `BinaryArithmetic#dataType` when processing multi-column data

2023-09-05 Thread yumwang
This is an automated email from the ASF dual-hosted git repository.

yumwang pushed a commit to branch branch-3.4
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.4 by this push:
 new a96804b7542 [SPARK-45071][SQL] Optimize the processing speed of 
`BinaryArithmetic#dataType` when processing multi-column data
a96804b7542 is described below

commit a96804b7542c1cbac8e274dba2c322b84f47fa2d
Author: ming95 <505306...@qq.com>
AuthorDate: Wed Sep 6 11:38:30 2023 +0800

[SPARK-45071][SQL] Optimize the processing speed of 
`BinaryArithmetic#dataType` when processing multi-column data

### What changes were proposed in this pull request?

Since `BinaryArithmetic#dataType` will recursively process the datatype of 
each node, the driver will be very slow when multiple columns are processed.

For example, the following code:
```scala
import spark.implicits._
import scala.util.Random
import org.apache.spark.sql.Row
import org.apache.spark.sql.functions.{expr, sum}
import org.apache.spark.sql.types.{StructType, StructField, IntegerType}

val N = 30
val M = 100

val columns = Seq.fill(N)(Random.alphanumeric.take(8).mkString)
val data = Seq.fill(M)(Seq.fill(N)(Random.nextInt(16) - 5))

val schema = StructType(columns.map(StructField(_, IntegerType)))
val rdd = spark.sparkContext.parallelize(data.map(Row.fromSeq(_)))
val df = spark.createDataFrame(rdd, schema)
val colExprs = columns.map(sum(_))

// generate a new column by adding up the other 30 columns
df.withColumn("new_col_sum", expr(columns.mkString(" + ")))
```

With Spark 3.4, the driver takes a few minutes to execute this code, while with Spark 3.2 it takes only a few seconds. Related issue: [SPARK-39316](https://github.com/apache/spark/pull/36698)

### Why are the changes needed?

Optimize the processing speed of `BinaryArithmetic#dataType` when 
processing multi-column data

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

manual testing

### Was this patch authored or co-authored using generative AI tooling?

no

Closes #42804 from ming95/SPARK-45071.

Authored-by: ming95 <505306...@qq.com>
Signed-off-by: Yuming Wang 
(cherry picked from commit 16e813cecd55490a71ef6c05fca2209fbdae078f)
Signed-off-by: Yuming Wang 
---
 .../apache/spark/sql/catalyst/expressions/arithmetic.scala   | 12 +++-
 1 file changed, 7 insertions(+), 5 deletions(-)

diff --git 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/arithmetic.scala
 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/arithmetic.scala
index 88f7fabf121..1396e4d1259 100644
--- 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/arithmetic.scala
+++ 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/arithmetic.scala
@@ -218,6 +218,12 @@ abstract class BinaryArithmetic extends BinaryOperator
 
   protected val evalMode: EvalMode.Value
 
+  private lazy val internalDataType: DataType = (left.dataType, right.dataType) match {
+    case (DecimalType.Fixed(p1, s1), DecimalType.Fixed(p2, s2)) =>
+      resultDecimalType(p1, s1, p2, s2)
+    case _ => left.dataType
+  }
+
   protected def failOnError: Boolean = evalMode match {
 // The TRY mode executes as if it would fail on errors, except that it 
would capture the errors
 // and return null results.
@@ -233,11 +239,7 @@ abstract class BinaryArithmetic extends BinaryOperator
 case _ => super.checkInputDataTypes()
   }
 
-  override def dataType: DataType = (left.dataType, right.dataType) match {
-case (DecimalType.Fixed(p1, s1), DecimalType.Fixed(p2, s2)) =>
-  resultDecimalType(p1, s1, p2, s2)
-case _ => left.dataType
-  }
+  override def dataType: DataType = internalDataType
 
   // When `spark.sql.decimalOperations.allowPrecisionLoss` is set to true, if 
the precision / scale
   // needed are out of the range of available values, the scale is reduced up 
to 6, in order to





[spark] branch master updated: [SPARK-45046][BUILD] Set `shadeTestJar` of `core` module to `false`

2023-09-05 Thread yangjie01
This is an automated email from the ASF dual-hosted git repository.

yangjie01 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 7e1dc10e19d [SPARK-45046][BUILD] Set `shadeTestJar` of `core` module 
to `false`
7e1dc10e19d is described below

commit 7e1dc10e19d7a28d23dc6748fed2edb227d35781
Author: yangjie01 
AuthorDate: Wed Sep 6 11:40:45 2023 +0800

[SPARK-45046][BUILD] Set `shadeTestJar` of `core` module to `false`

### What changes were proposed in this pull request?
This PR sets `shadeTestJar` of the `core` module to `false`, skipping the shading of `spark-core**-tests.jar`.

### Why are the changes needed?
Before SPARK-41244, the core module relied on the `maven-shade-plugin` configuration in the parent `pom.xml` for shading, where the `shadeTestJar` configuration item was left at its default of `false`.

In SPARK-41244 (https://github.com/apache/spark/pull/38779), the core module started using a separate `maven-shade-plugin` configuration and set the `shadeTestJar` configuration item to `true`.

And when `shadeTestJar` is `true`, `maven-shade-plugin` always tries to find (and sometimes download) non-existent test jars:

```
[INFO] --- maven-shade-plugin:3.5.0:shade (default)  spark-core_2.12 ---
[INFO] Including org.eclipse.jetty:jetty-plus:jar:9.4.51.v20230217 in the 
shaded jar.
Downloading from gcs-maven-central-mirror: 
https://maven-central.storage-download.googleapis.com/maven2/org/eclipse/jetty/jetty-plus/9.4.51.v20230217/jetty-plus-9.4.51.v20230217-tests.jar
Downloading from central: 
https://repo.maven.apache.org/maven2/org/eclipse/jetty/jetty-plus/9.4.51.v20230217/jetty-plus-9.4.51.v20230217-tests.jar
[WARNING] Could not get tests for 
org.eclipse.jetty:jetty-plus:jar:9.4.51.v20230217:compile
[INFO] Including org.eclipse.jetty:jetty-security:jar:9.4.51.v20230217 in 
the shaded jar.
Downloading from gcs-maven-central-mirror: 
https://maven-central.storage-download.googleapis.com/maven2/org/eclipse/jetty/jetty-security/9.4.51.v20230217/jetty-security-9.4.51.v20230217-tests.jar
Downloading from central: 
https://repo.maven.apache.org/maven2/org/eclipse/jetty/jetty-security/9.4.51.v20230217/jetty-security-9.4.51.v20230217-tests.jar
[WARNING] Could not get tests for 
org.eclipse.jetty:jetty-security:jar:9.4.51.v20230217:compile
[INFO] Including org.eclipse.jetty:jetty-util:jar:9.4.51.v20230217 in the 
shaded jar.
Downloading from gcs-maven-central-mirror: 
https://maven-central.storage-download.googleapis.com/maven2/org/eclipse/jetty/jetty-util/9.4.51.v20230217/jetty-util-9.4.51.v20230217-tests.jar
Downloading from central: 
https://repo.maven.apache.org/maven2/org/eclipse/jetty/jetty-util/9.4.51.v20230217/jetty-util-9.4.51.v20230217-tests.jar
[WARNING] Could not get tests for 
org.eclipse.jetty:jetty-util:jar:9.4.51.v20230217:compile
[INFO] Including org.eclipse.jetty:jetty-server:jar:9.4.51.v20230217 in the 
shaded jar.
[INFO] Including org.eclipse.jetty:jetty-io:jar:9.4.51.v20230217 in the 
shaded jar.
Downloading from gcs-maven-central-mirror: 
https://maven-central.storage-download.googleapis.com/maven2/org/eclipse/jetty/jetty-io/9.4.51.v20230217/jetty-io-9.4.51.v20230217-tests.jar
Downloading from central: 
https://repo.maven.apache.org/maven2/org/eclipse/jetty/jetty-io/9.4.51.v20230217/jetty-io-9.4.51.v20230217-tests.jar
[WARNING] Could not get tests for 
org.eclipse.jetty:jetty-io:jar:9.4.51.v20230217:compile
[INFO] Including org.eclipse.jetty:jetty-http:jar:9.4.51.v20230217 in the 
shaded jar.
[INFO] Including org.eclipse.jetty:jetty-continuation:jar:9.4.51.v20230217 
in the shaded jar.
Downloading from gcs-maven-central-mirror: 
https://maven-central.storage-download.googleapis.com/maven2/org/eclipse/jetty/jetty-continuation/9.4.51.v20230217/jetty-continuation-9.4.51.v20230217-tests.jar
Downloading from central: 
https://repo.maven.apache.org/maven2/org/eclipse/jetty/jetty-continuation/9.4.51.v20230217/jetty-continuation-9.4.51.v20230217-tests.jar
[WARNING] Could not get tests for 
org.eclipse.jetty:jetty-continuation:jar:9.4.51.v20230217:compile
[INFO] Including org.eclipse.jetty:jetty-servlet:jar:9.4.51.v20230217 in 
the shaded jar.
[INFO] Including org.eclipse.jetty:jetty-proxy:jar:9.4.51.v20230217 in the 
shaded jar.
Downloading from gcs-maven-central-mirror: 
https://maven-central.storage-download.googleapis.com/maven2/org/eclipse/jetty/jetty-proxy/9.4.51.v20230217/jetty-proxy-9.4.51.v20230217-tests.jar
Downloading from central: 
https://repo.maven.apache.org/maven2/org/eclipse/jetty/jetty-proxy/9.4.51.v20230217/jetty-proxy-9.4.51.v20230217-tests.jar
[WARNING] Could not get tests for 
org.eclipse.jetty:jetty-proxy:jar:9.4.51.v20230217:compile
   

[spark] branch master updated: [SPARK-44833][CONNECT] Fix sending Reattach too fast after Execute

2023-09-05 Thread gurwls223
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new e4d17e9a1fb [SPARK-44833][CONNECT] Fix sending Reattach too fast after 
Execute
e4d17e9a1fb is described below

commit e4d17e9a1fb64454a6a007171837d159633e91fb
Author: Juliusz Sompolski 
AuthorDate: Wed Sep 6 14:21:47 2023 +0900

[SPARK-44833][CONNECT] Fix sending Reattach too fast after Execute

### What changes were proposed in this pull request?

Redo the retry logic so that getting a new iterator via ReattachExecute no longer depends on a "firstTry" flag; instead, "callIter" unsets the iterator when a new one is needed and recreates it on demand.
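
A minimal self-contained sketch of that shape (simplified names, not the actual client code): the current iterator is held in an Option, a failed call clears it, and callIter only opens a new reattach stream when it is empty:
```scala
// Toy model of the reattach-on-demand pattern; the stub calls are stand-ins.
object ReattachSketch {
  private var iter: Option[Iterator[String]] = Some(Iterator("a", "b"))

  // Stand-in for rawBlockingStub.reattachExecute(...).
  private def reattach(): Iterator[String] = Iterator("c", "d")

  private def callIter[T](f: Iterator[String] => T): T = {
    // Re-create the iterator lazily, only after a previous call invalidated it.
    if (iter.isEmpty) iter = Some(reattach())
    try f(iter.get)
    catch {
      case e: Throwable =>
        iter = None // the stream is broken; the next call will reattach
        throw e
    }
  }

  def main(args: Array[String]): Unit = {
    println(callIter(_.next())) // "a" from the initial stream
    iter = None                 // simulate a stream error
    println(callIter(_.next())) // "c" from the re-created stream
  }
}
```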

### Why are the changes needed?

After an "INVALID_HANDLE.OPERATION_NOT_FOUND" error, client would realize 
that the failure in ReattachExecute was because the initial ExecutePlan didn't 
reach the server. It would then call another ExecutePlan, and it will throw a 
RetryException to let the retry logic handle retrying. However, the retry logic 
would then immediately send a ReattachExecute, and the client will want to use 
the iterator of the reattach.

However, on the server the ExecutePlan and ReattachExecute could race with 
each other:
* ExecutePlan didn't reach 
executeHolder.runGrpcResponseSender(responseSender) in 
SparkConnectExecutePlanHandler yet.
* ReattachExecute races around and reaches 
executeHolder.runGrpcResponseSender(responseSender) in 
SparkConnectReattachExecuteHandler first.
* When ExecutePlan reaches 
executeHolder.runGrpcResponseSender(responseSender), and 
executionObserver.attachConsumer(this) is called in ExecuteGrpcResponseSender 
of ExecutePlan, it will kick out the ExecuteGrpcResponseSender of 
ReattachExecute.

So even though ReattachExecute came later, it will get interrupted by the earlier ExecutePlan and finish with an INVALID_CURSOR.DISCONNECTED error.

After this change, such a race between ExecutePlan / ReattachExecute can still happen, but the client should no longer send these requests in such quick succession.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Integration testing.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #42806 from juliuszsompolski/SPARK-44833.

Authored-by: Juliusz Sompolski 
Signed-off-by: Hyukjin Kwon 
---
 .../ExecutePlanResponseReattachableIterator.scala  | 33 --
 python/pyspark/sql/connect/client/reattach.py  | 31 ++--
 2 files changed, 28 insertions(+), 36 deletions(-)

diff --git 
a/connector/connect/common/src/main/scala/org/apache/spark/sql/connect/client/ExecutePlanResponseReattachableIterator.scala
 
b/connector/connect/common/src/main/scala/org/apache/spark/sql/connect/client/ExecutePlanResponseReattachableIterator.scala
index aeb452faecf..9bf7de33da8 100644
--- 
a/connector/connect/common/src/main/scala/org/apache/spark/sql/connect/client/ExecutePlanResponseReattachableIterator.scala
+++ 
b/connector/connect/common/src/main/scala/org/apache/spark/sql/connect/client/ExecutePlanResponseReattachableIterator.scala
@@ -91,8 +91,8 @@ class ExecutePlanResponseReattachableIterator(
   // Initial iterator comes from ExecutePlan request.
   // Note: This is not retried, because no error would ever be thrown here, 
and GRPC will only
   // throw error on first iter.hasNext() or iter.next()
-  private var iter: java.util.Iterator[proto.ExecutePlanResponse] =
-rawBlockingStub.executePlan(initialRequest)
+  private var iter: Option[java.util.Iterator[proto.ExecutePlanResponse]] =
+Some(rawBlockingStub.executePlan(initialRequest))
 
   override def next(): proto.ExecutePlanResponse = synchronized {
 // hasNext will trigger reattach in case the stream completed without 
resultComplete
@@ -102,15 +102,7 @@ class ExecutePlanResponseReattachableIterator(
 
 try {
   // Get next response, possibly triggering reattach in case of stream 
error.
-  var firstTry = true
   val ret = retry {
-if (firstTry) {
-  // on first try, we use the existing iter.
-  firstTry = false
-} else {
-  // on retry, the iter is borked, so we need a new one
-  iter = 
rawBlockingStub.reattachExecute(createReattachExecuteRequest())
-}
 callIter(_.next())
   }
 
@@ -134,23 +126,15 @@ class ExecutePlanResponseReattachableIterator(
   // After response complete response
   return false
 }
-var firstTry = true
 try {
   retry {
-if (firstTry) {
-  // on first try, we use the existing iter.
-  firstTry = false
-} else {
-  // on retry, the iter is borked, so we need a new one
-   

[spark] branch branch-3.5 updated: [SPARK-44833][CONNECT] Fix sending Reattach too fast after Execute

2023-09-05 Thread gurwls223
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch branch-3.5
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.5 by this push:
 new 04558fc90fe [SPARK-44833][CONNECT] Fix sending Reattach too fast after 
Execute
04558fc90fe is described below

commit 04558fc90fe2df2d2791c98c5f7b15ee26e250eb
Author: Juliusz Sompolski 
AuthorDate: Wed Sep 6 14:21:47 2023 +0900

[SPARK-44833][CONNECT] Fix sending Reattach too fast after Execute

### What changes were proposed in this pull request?

Redo the retry logic so that getting a new iterator via ReattachExecute 
no longer depends on a "firstTry" flag; instead, "callIter" unsets 
the iterator when a new one is needed.

### Why are the changes needed?

After an "INVALID_HANDLE.OPERATION_NOT_FOUND" error, client would realize 
that the failure in ReattachExecute was because the initial ExecutePlan didn't 
reach the server. It would then call another ExecutePlan, and it will throw a 
RetryException to let the retry logic handle retrying. However, the retry logic 
would then immediately send a ReattachExecute, and the client will want to use 
the iterator of the reattach.

However, on the server the ExecutePlan and ReattachExecute could race with 
each other:
* ExecutePlan didn't reach 
executeHolder.runGrpcResponseSender(responseSender) in 
SparkConnectExecutePlanHandler yet.
* ReattachExecute races around and reaches 
executeHolder.runGrpcResponseSender(responseSender) in 
SparkConnectReattachExecuteHandler first.
* When ExecutePlan reaches 
executeHolder.runGrpcResponseSender(responseSender) and 
executionObserver.attachConsumer(this) is called in ExecutePlan's 
ExecuteGrpcResponseSender, it kicks out ReattachExecute's 
ExecuteGrpcResponseSender.

So even though ReattachExecute came later, it gets interrupted by the 
earlier ExecutePlan and finishes with an INVALID_CURSOR.DISCONNECTED error.

After this change, such a race between ExecutePlan / ReattachExecute can 
still happen, but the client should no longer send these requests in such 
quick succession.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Integration testing.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #42806 from juliuszsompolski/SPARK-44833.

Authored-by: Juliusz Sompolski 
Signed-off-by: Hyukjin Kwon 
(cherry picked from commit e4d17e9a1fb64454a6a007171837d159633e91fb)
Signed-off-by: Hyukjin Kwon 
---
 .../ExecutePlanResponseReattachableIterator.scala  | 33 --
 python/pyspark/sql/connect/client/reattach.py  | 31 ++--
 2 files changed, 28 insertions(+), 36 deletions(-)

diff --git 
a/connector/connect/common/src/main/scala/org/apache/spark/sql/connect/client/ExecutePlanResponseReattachableIterator.scala
 
b/connector/connect/common/src/main/scala/org/apache/spark/sql/connect/client/ExecutePlanResponseReattachableIterator.scala
index aeb452faecf..9bf7de33da8 100644
--- 
a/connector/connect/common/src/main/scala/org/apache/spark/sql/connect/client/ExecutePlanResponseReattachableIterator.scala
+++ 
b/connector/connect/common/src/main/scala/org/apache/spark/sql/connect/client/ExecutePlanResponseReattachableIterator.scala
@@ -91,8 +91,8 @@ class ExecutePlanResponseReattachableIterator(
   // Initial iterator comes from ExecutePlan request.
   // Note: This is not retried, because no error would ever be thrown here, 
and GRPC will only
   // throw error on first iter.hasNext() or iter.next()
-  private var iter: java.util.Iterator[proto.ExecutePlanResponse] =
-rawBlockingStub.executePlan(initialRequest)
+  private var iter: Option[java.util.Iterator[proto.ExecutePlanResponse]] =
+Some(rawBlockingStub.executePlan(initialRequest))
 
   override def next(): proto.ExecutePlanResponse = synchronized {
 // hasNext will trigger reattach in case the stream completed without 
resultComplete
@@ -102,15 +102,7 @@ class ExecutePlanResponseReattachableIterator(
 
 try {
   // Get next response, possibly triggering reattach in case of stream 
error.
-  var firstTry = true
   val ret = retry {
-if (firstTry) {
-  // on first try, we use the existing iter.
-  firstTry = false
-} else {
-  // on retry, the iter is borked, so we need a new one
-  iter = 
rawBlockingStub.reattachExecute(createReattachExecuteRequest())
-}
 callIter(_.next())
   }
 
@@ -134,23 +126,15 @@ class ExecutePlanResponseReattachableIterator(
   // After response complete response
   return false
 }
-var firstTry = true
 try {
   retry {
-if (firstTry) {
-  // on first try, we use the existing iter.
-

[spark] branch master updated: [SPARK-45070][SQL][DOCS] Describe the binary and datetime formats of `to_char`/`to_varchar`

2023-09-05 Thread maxgekk
This is an automated email from the ASF dual-hosted git repository.

maxgekk pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 637f16e7ff8 [SPARK-45070][SQL][DOCS] Describe the binary and datetime 
formats of `to_char`/`to_varchar`
637f16e7ff8 is described below

commit 637f16e7ff88c2aef0e7f29163e13138ff472c1d
Author: Max Gekk 
AuthorDate: Wed Sep 6 08:25:41 2023 +0300

[SPARK-45070][SQL][DOCS] Describe the binary and datetime formats of 
`to_char`/`to_varchar`

### What changes were proposed in this pull request?
In the PR, I propose to document the recent changes related to the `format` 
argument of the `to_char`/`to_varchar` functions (a usage sketch follows the 
list below):
1. binary formats added by https://github.com/apache/spark/pull/42632
2. datetime formats introduced by https://github.com/apache/spark/pull/42534
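
For illustration, a hedged usage sketch of the documented formats (the input 
DataFrame and column names are made up for this example, and it assumes a 
build that contains the two changes listed above, e.g. run in spark-shell):

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, lit, to_char}

val spark = SparkSession.builder().master("local[*]").appName("to_char formats").getOrCreate()

// Hypothetical input: one binary column and one timestamp column.
val df = spark.sql(
  "SELECT X'537061726B' AS bin_col, TIMESTAMP'2023-09-06 08:25:41' AS ts_col")

// Binary input: the format picks the output encoding ('base64', 'hex', or 'utf-8').
df.select(to_char(col("bin_col"), lit("hex"))).show()    // hex representation of the bytes
df.select(to_char(col("bin_col"), lit("utf-8"))).show()  // bytes decoded as a UTF-8 string

// Datetime input: the format is interpreted as a datetime pattern.
df.select(to_char(col("ts_col"), lit("yyyy-MM-dd HH:mm"))).show()

spark.stop()
```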

### Why are the changes needed?
To inform users about recent changes.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
By CI.

### Was this patch authored or co-authored using generative AI tooling?
No.

Closes #42801 from MaxGekk/doc-to_char-api.

Authored-by: Max Gekk 
Signed-off-by: Max Gekk 
---
 .../main/scala/org/apache/spark/sql/functions.scala| 18 --
 python/pyspark/sql/functions.py| 12 
 .../main/scala/org/apache/spark/sql/functions.scala| 18 ++
 3 files changed, 46 insertions(+), 2 deletions(-)

diff --git 
a/connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/functions.scala
 
b/connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/functions.scala
index 527848e95e6..54bf0106956 100644
--- 
a/connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/functions.scala
+++ 
b/connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/functions.scala
@@ -4280,6 +4280,7 @@ object functions {
*/
   def to_binary(e: Column): Column = Column.fn("to_binary", e)
 
+  // scalastyle:off line.size.limit
   /**
* Convert `e` to a string based on the `format`. Throws an exception if the 
conversion fails.
*
@@ -4300,13 +4301,20 @@ object functions {
*   (optional, only allowed once at the beginning or end of the format 
string). Note that 'S'
*   prints '+' for positive values but 'MI' prints a space. 'PR': 
Only allowed at the
*   end of the format string; specifies that the result string will be 
wrapped by angle
-   *   brackets if the input value is negative. 
+   *   brackets if the input value is negative.  If `e` is a 
datetime, `format` shall be
+   *   a valid datetime pattern, see https://spark.apache.org/docs/latest/sql-ref-datetime-pattern.html";>Datetime
+   *   Patterns. If `e` is a binary, it is converted to a string in one of 
the formats: 
+   *   'base64': a base 64 string. 'hex': a string in the 
hexadecimal format.
+   *   'utf-8': the input binary is decoded to UTF-8 string. 
*
* @group string_funcs
* @since 3.5.0
*/
+  // scalastyle:on line.size.limit
   def to_char(e: Column, format: Column): Column = Column.fn("to_char", e, 
format)
 
+  // scalastyle:off line.size.limit
   /**
* Convert `e` to a string based on the `format`. Throws an exception if the 
conversion fails.
*
@@ -4327,11 +4335,17 @@ object functions {
*   (optional, only allowed once at the beginning or end of the format 
string). Note that 'S'
*   prints '+' for positive values but 'MI' prints a space. 'PR': 
Only allowed at the
*   end of the format string; specifies that the result string will be 
wrapped by angle
-   *   brackets if the input value is negative. 
+   *   brackets if the input value is negative.  If `e` is a 
datetime, `format` shall be
+   *   a valid datetime pattern, see https://spark.apache.org/docs/latest/sql-ref-datetime-pattern.html";>Datetime
+   *   Patterns. If `e` is a binary, it is converted to a string in one of 
the formats: 
+   *   'base64': a base 64 string. 'hex': a string in the 
hexadecimal format.
+   *   'utf-8': the input binary is decoded to UTF-8 string. 
*
* @group string_funcs
* @since 3.5.0
*/
+  // scalastyle:on line.size.limit
   def to_varchar(e: Column, format: Column): Column = Column.fn("to_varchar", 
e, format)
 
   /**
diff --git a/python/pyspark/sql/functions.py b/python/pyspark/sql/functions.py
index 56b436421af..de91cced206 100644
--- a/python/pyspark/sql/functions.py
+++ b/python/pyspark/sql/functions.py
@@ -10902,6 +10902,12 @@ def to_char(col: "ColumnOrName", format: 
"ColumnOrName") -> Column:
 values but 'MI' prints a space.
 'PR': Only allowed at the end of the format string; specifies that the 
result string
 will be wrapped by angle brackets if the input value is negative.
+If `col` is a datetime, `format` shall be a valid datetime patt