[spark] branch branch-3.5 updated: [SPARK-43744][CONNECT] Fix class loading problem caused by stub user classes not found on the server classpath

2023-07-28 Thread hvanhovell
This is an automated email from the ASF dual-hosted git repository.

hvanhovell pushed a commit to branch branch-3.5
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.5 by this push:
 new 408c3df01d2 [SPARK-43744][CONNECT] Fix class loading problem caused by 
stub user classes not found on the server classpath
408c3df01d2 is described below

commit 408c3df01d20bb539929bdfdb236f899c428a12e
Author: Zhen Li 
AuthorDate: Fri Jul 28 22:59:07 2023 -0400

[SPARK-43744][CONNECT] Fix class loading problem caused by stub user 
classes not found on the server classpath

### What changes were proposed in this pull request?
This PR introduces a stub class loader for unpacking Scala UDFs in the 
driver and the executor. When encountering user classes that are not found on 
the server session classpath, the stub class loader would try to stub the class.

This solves the problem that, when serializing UDFs, the Java serializer may 
include unneeded user code, e.g. user classes that appear in lambda signatures 
within the same class where the UDF is defined.

If the user code is actually needed to execute the UDF, we return an error 
message suggesting that the user add the missing classes with the 
`addArtifact` method.
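As a hedged usage sketch of that suggestion (the Connect endpoint, jar path, and UDF below are placeholders, not taken from this patch), the missing user classes can be shipped with `addArtifact` before the UDF is used:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.udf

// Hypothetical Spark Connect endpoint and jar path, shown only to illustrate the workflow.
val spark = SparkSession.builder().remote("sc://localhost:15002").getOrCreate()
import spark.implicits._

// Make the user classes referenced by the UDF's enclosing class available
// on the session classpath.
spark.addArtifact("/path/to/my-user-classes.jar")

val plusOne = udf((i: Long) => i + 1)
spark.range(5).select(plusOne($"id")).show()
```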

### Why are the changes needed?
To enhance the user experience of UDFs. This PR should be merged to master 
and 3.5.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
Added tests for both Scala 2.12 and 2.13.

Four tests in SparkSessionE2ESuite still fail to run with Maven after the fix 
because the client test jar is installed on the system classpath (added using 
--jar at server start), while the stub classloader can only stub classes that 
are missing from the session classpath (added using `session.addArtifact`).

Moving the test jar to the session classpath causes failures in the 
`flatMapGroupsWithState` tests (SPARK-44576). The test jar will be moved to the 
session classpath once those failures are fixed.

Closes #42069 from zhenlineo/ref-spark-result.

Authored-by: Zhen Li 
Signed-off-by: Herman van Hovell 
(cherry picked from commit 6d0fed9a18ff87e73fdf1ee46b6b0d2df8dd5a1b)
Signed-off-by: Herman van Hovell 
---
 .../scala/org/apache/spark/sql/SparkSession.scala  |   2 +-
 .../sql/expressions/UserDefinedFunction.scala  |   2 +-
 .../jvm/src/test/resources/StubClassDummyUdf.scala |  56 +
 .../connect/client/jvm/src/test/resources/udf2.12  | Bin 0 -> 1520 bytes
 .../client/jvm/src/test/resources/udf2.12.jar  | Bin 0 -> 5332 bytes
 .../connect/client/jvm/src/test/resources/udf2.13  | Bin 0 -> 1630 bytes
 .../client/jvm/src/test/resources/udf2.13.jar  | Bin 0 -> 5674 bytes
 .../connect/client/UDFClassLoadingE2ESuite.scala   |  83 +
 .../connect/client/util/IntegrationTestUtils.scala |   2 +-
 .../connect/client/util/RemoteSparkSession.scala   |   2 +-
 .../artifact/SparkConnectArtifactManager.scala |  17 ++-
 .../sql/connect/planner/SparkConnectPlanner.scala  |  23 +++-
 connector/connect/server/src/test/resources/udf| Bin 0 -> 973 bytes
 .../connect/server/src/test/resources/udf_noA.jar  | Bin 0 -> 5545 bytes
 .../connect/artifact/StubClassLoaderSuite.scala| 132 +
 .../spark/util/ChildFirstURLClassLoader.java   |   9 ++
 .../scala/org/apache/spark/executor/Executor.scala |  86 +++---
 .../org/apache/spark/internal/config/package.scala |  14 +++
 .../org/apache/spark/util/StubClassLoader.scala|  79 
 19 files changed, 480 insertions(+), 27 deletions(-)

diff --git 
a/connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/SparkSession.scala
 
b/connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/SparkSession.scala
index d1832e65f3e..4b3de91b56f 100644
--- 
a/connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/SparkSession.scala
+++ 
b/connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/SparkSession.scala
@@ -554,7 +554,7 @@ class SparkSession private[sql] (
 val command = proto.Command.newBuilder().setRegisterFunction(udf).build()
 val plan = proto.Plan.newBuilder().setCommand(command).build()
 
-client.execute(plan)
+client.execute(plan).asScala.foreach(_ => ())
   }
 
   @DeveloperApi
diff --git 
a/connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/expressions/UserDefinedFunction.scala
 
b/connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/expressions/UserDefinedFunction.scala
index 18aef8a2e4c..e5c89d90c19 100644
--- 
a/connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/expressions/UserDefinedFunction.scala
+++ 
b/connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/expressions/UserDefinedFunction.scala
@@ -92,7 +92,7 @@ sealed abstract class UserDefinedFunction {
 

[spark] branch master updated: [SPARK-43744][CONNECT] Fix class loading problem caused by stub user classes not found on the server classpath

2023-07-28 Thread hvanhovell
This is an automated email from the ASF dual-hosted git repository.

hvanhovell pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 6d0fed9a18f [SPARK-43744][CONNECT] Fix class loading problem caused by 
stub user classes not found on the server classpath
6d0fed9a18f is described below

commit 6d0fed9a18ff87e73fdf1ee46b6b0d2df8dd5a1b
Author: Zhen Li 
AuthorDate: Fri Jul 28 22:59:07 2023 -0400

[SPARK-43744][CONNECT] Fix class loading problem caused by stub user 
classes not found on the server classpath

### What changes were proposed in this pull request?
This PR introduces a stub class loader for unpacking Scala UDFs in the 
driver and the executor. When encountering user classes that are not found on 
the server session classpath, the stub class loader would try to stub the class.
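As a rough sketch of the stubbing idea (hand-written here and assuming the ASM bytecode library; this is not the `StubClassLoader` added by the patch), a class loader can fall back to generating a placeholder class whose constructor throws, so deserialization succeeds as long as the stubbed class is never actually instantiated:

```scala
import org.objectweb.asm.{ClassWriter, Opcodes}

// Sketch only: generate a stub when the parent loader cannot find a class.
class StubbingClassLoader(parent: ClassLoader, shouldStub: String => Boolean)
    extends ClassLoader(parent) {

  override def findClass(name: String): Class[_] = {
    if (!shouldStub(name)) throw new ClassNotFoundException(name)
    val bytes = stubBytes(name)
    defineClass(name, bytes, 0, bytes.length)
  }

  // Emit a public class whose no-arg constructor calls super() and then throws,
  // so the stub only fails if user code actually tries to instantiate it.
  private def stubBytes(name: String): Array[Byte] = {
    val internalName = name.replace('.', '/')
    val cw = new ClassWriter(ClassWriter.COMPUTE_MAXS)
    cw.visit(Opcodes.V1_8, Opcodes.ACC_PUBLIC, internalName, null, "java/lang/Object", null)
    val ctor = cw.visitMethod(Opcodes.ACC_PUBLIC, "<init>", "()V", null, null)
    ctor.visitCode()
    ctor.visitVarInsn(Opcodes.ALOAD, 0)
    ctor.visitMethodInsn(Opcodes.INVOKESPECIAL, "java/lang/Object", "<init>", "()V", false)
    ctor.visitTypeInsn(Opcodes.NEW, "java/lang/ClassNotFoundException")
    ctor.visitInsn(Opcodes.DUP)
    ctor.visitLdcInsn(s"$name was stubbed; add the real class via addArtifact")
    ctor.visitMethodInsn(Opcodes.INVOKESPECIAL, "java/lang/ClassNotFoundException",
      "<init>", "(Ljava/lang/String;)V", false)
    ctor.visitInsn(Opcodes.ATHROW)
    ctor.visitMaxs(0, 0)
    ctor.visitEnd()
    cw.visitEnd()
    cw.toByteArray
  }
}
```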

This solves the problem that, when serializing UDFs, the Java serializer may 
include unneeded user code, e.g. user classes that appear in lambda signatures 
within the same class where the UDF is defined.

If the user code is actually needed to execute the UDF, we return an error 
message suggesting that the user add the missing classes with the 
`addArtifact` method.

### Why are the changes needed?
To enhance the user experience of UDFs. This PR should be merged to master 
and 3.5.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
Added tests for both Scala 2.12 and 2.13.

Four tests in SparkSessionE2ESuite still fail to run with Maven after the fix 
because the client test jar is installed on the system classpath (added using 
--jar at server start), while the stub classloader can only stub classes that 
are missing from the session classpath (added using `session.addArtifact`).

Moving the test jar to the session classpath causes failures in the 
`flatMapGroupsWithState` tests (SPARK-44576). The test jar will be moved to the 
session classpath once those failures are fixed.

Closes #42069 from zhenlineo/ref-spark-result.

Authored-by: Zhen Li 
Signed-off-by: Herman van Hovell 
---
 .../scala/org/apache/spark/sql/SparkSession.scala  |   2 +-
 .../sql/expressions/UserDefinedFunction.scala  |   2 +-
 .../jvm/src/test/resources/StubClassDummyUdf.scala |  56 +
 .../connect/client/jvm/src/test/resources/udf2.12  | Bin 0 -> 1520 bytes
 .../client/jvm/src/test/resources/udf2.12.jar  | Bin 0 -> 5332 bytes
 .../connect/client/jvm/src/test/resources/udf2.13  | Bin 0 -> 1630 bytes
 .../client/jvm/src/test/resources/udf2.13.jar  | Bin 0 -> 5674 bytes
 .../connect/client/UDFClassLoadingE2ESuite.scala   |  83 +
 .../connect/client/util/IntegrationTestUtils.scala |   2 +-
 .../connect/client/util/RemoteSparkSession.scala   |   2 +-
 .../artifact/SparkConnectArtifactManager.scala |  17 ++-
 .../sql/connect/planner/SparkConnectPlanner.scala  |  23 +++-
 connector/connect/server/src/test/resources/udf| Bin 0 -> 973 bytes
 .../connect/server/src/test/resources/udf_noA.jar  | Bin 0 -> 5545 bytes
 .../connect/artifact/StubClassLoaderSuite.scala| 132 +
 .../spark/util/ChildFirstURLClassLoader.java   |   9 ++
 .../scala/org/apache/spark/executor/Executor.scala |  86 +++---
 .../org/apache/spark/internal/config/package.scala |  14 +++
 .../org/apache/spark/util/StubClassLoader.scala|  79 
 19 files changed, 480 insertions(+), 27 deletions(-)

diff --git 
a/connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/SparkSession.scala
 
b/connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/SparkSession.scala
index d1832e65f3e..4b3de91b56f 100644
--- 
a/connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/SparkSession.scala
+++ 
b/connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/SparkSession.scala
@@ -554,7 +554,7 @@ class SparkSession private[sql] (
 val command = proto.Command.newBuilder().setRegisterFunction(udf).build()
 val plan = proto.Plan.newBuilder().setCommand(command).build()
 
-client.execute(plan)
+client.execute(plan).asScala.foreach(_ => ())
   }
 
   @DeveloperApi
diff --git 
a/connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/expressions/UserDefinedFunction.scala
 
b/connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/expressions/UserDefinedFunction.scala
index 18aef8a2e4c..e5c89d90c19 100644
--- 
a/connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/expressions/UserDefinedFunction.scala
+++ 
b/connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/expressions/UserDefinedFunction.scala
@@ -92,7 +92,7 @@ sealed abstract class UserDefinedFunction {
 /**
  * Holder class for a scalar user-defined function and it's input/output 
encoder(s).
  */
-case class 

[spark] branch branch-3.5 updated: [SPARK-44574][SQL][CONNECT] Errors that moved into sql/api should also use AnalysisException

2023-07-28 Thread hvanhovell
This is an automated email from the ASF dual-hosted git repository.

hvanhovell pushed a commit to branch branch-3.5
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.5 by this push:
 new d01821e0a99 [SPARK-44574][SQL][CONNECT] Errors that moved into sql/api 
should also use AnalysisException
d01821e0a99 is described below

commit d01821e0a99f527b1c151c19b0f44c26121ad57d
Author: Rui Wang 
AuthorDate: Fri Jul 28 22:47:00 2023 -0400

[SPARK-44574][SQL][CONNECT] Errors that moved into sql/api should also use 
AnalysisException

### What changes were proposed in this pull request?

https://github.com/apache/spark/pull/42130 moved AnalysisException to sql/api, 
so we can again use AnalysisException for those errors that were moved into 
sql/api but had been changed to SparkException.

### Why are the changes needed?

Restores the previous behavior.

### Does this PR introduce _any_ user-facing change?

No. This PR restores the previous behavior, in which users see 
AnalysisException in many of these cases.
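As a hedged illustration of why the exception type matters to callers (the helper below is hypothetical and not part of this patch), code that previously had to match on `SparkException` for these schema and type errors can again match on `AnalysisException`:

```scala
import org.apache.spark.SparkException
import org.apache.spark.sql.AnalysisException

// Hypothetical caller-side helper: distinguish analysis-time failures
// (restored by this change) from generic Spark failures.
def describeFailure(t: Throwable): String = t match {
  case a: AnalysisException => s"analysis error ${a.getErrorClass}: ${a.getMessage}"
  case s: SparkException    => s"spark error: ${s.getMessage}"
  case other                => s"unexpected: ${other.getMessage}"
}
```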

### How was this patch tested?

Existing UT.

Closes #42190 from amaliujia/convert_back_errors.

Authored-by: Rui Wang 
Signed-off-by: Herman van Hovell 
(cherry picked from commit 3761b7d65bd1e21b5a7c5966c2d03a0fe4a0b982)
Signed-off-by: Herman van Hovell 
---
 .../apache/spark/sql/errors/DataTypeErrors.scala   | 53 ++
 .../spark/sql/errors/QueryCompilationErrors.scala  |  7 +--
 .../apache/spark/sql/types/StructTypeSuite.scala   | 26 +--
 .../apache/spark/sql/CharVarcharTestSuite.scala| 28 +++-
 .../org/apache/spark/sql/DataFrameSuite.scala  |  2 +-
 .../org/apache/spark/sql/JsonFunctionsSuite.scala  |  6 +--
 .../spark/sql/connector/AlterTableTests.scala  |  2 +-
 .../sql/errors/QueryCompilationErrorsSuite.scala   |  4 +-
 .../org/apache/spark/sql/jdbc/JDBCSuite.scala  |  4 +-
 .../sql/test/DataFrameReaderWriterSuite.scala  | 12 ++---
 10 files changed, 68 insertions(+), 76 deletions(-)

diff --git 
a/sql/api/src/main/scala/org/apache/spark/sql/errors/DataTypeErrors.scala 
b/sql/api/src/main/scala/org/apache/spark/sql/errors/DataTypeErrors.scala
index fcc3086b573..7a34a386cd8 100644
--- a/sql/api/src/main/scala/org/apache/spark/sql/errors/DataTypeErrors.scala
+++ b/sql/api/src/main/scala/org/apache/spark/sql/errors/DataTypeErrors.scala
@@ -17,8 +17,10 @@
 package org.apache.spark.sql.errors
 
 import org.apache.spark.{SparkArithmeticException, SparkException, 
SparkIllegalArgumentException, SparkNumberFormatException, 
SparkRuntimeException, SparkUnsupportedOperationException}
+import org.apache.spark.sql.AnalysisException
 import org.apache.spark.sql.catalyst.trees.{Origin, SQLQueryContext}
 import org.apache.spark.sql.catalyst.util.QuotingUtils
+import org.apache.spark.sql.catalyst.util.QuotingUtils.toSQLSchema
 import org.apache.spark.sql.types.{DataType, Decimal, StringType}
 import org.apache.spark.unsafe.types.UTF8String
 
@@ -97,49 +99,45 @@ private[sql] object DataTypeErrors extends DataTypeErrorsBase {
   }
 
   def schemaFailToParseError(schema: String, e: Throwable): Throwable = {
-    new SparkException(
+    new AnalysisException(
       errorClass = "INVALID_SCHEMA.PARSE_ERROR",
       messageParameters = Map(
-        "inputSchema" -> QuotingUtils.toSQLSchema(schema),
+        "inputSchema" -> toSQLSchema(schema),
         "reason" -> e.getMessage
       ),
-      cause = e)
+      cause = Some(e))
   }
 
   def invalidDayTimeIntervalType(startFieldName: String, endFieldName: String): Throwable = {
-    new SparkException(
+    new AnalysisException(
       errorClass = "_LEGACY_ERROR_TEMP_1224",
       messageParameters = Map(
         "startFieldName" -> startFieldName,
-        "endFieldName" -> endFieldName),
-      cause = null)
+        "endFieldName" -> endFieldName))
   }
 
   def invalidDayTimeField(field: Byte, supportedIds: Seq[String]): Throwable = {
-    new SparkException(
+    new AnalysisException(
       errorClass = "_LEGACY_ERROR_TEMP_1223",
       messageParameters = Map(
         "field" -> field.toString,
-        "supportedIds" -> supportedIds.mkString(", ")),
-      cause = null)
+        "supportedIds" -> supportedIds.mkString(", ")))
   }
 
   def invalidYearMonthField(field: Byte, supportedIds: Seq[String]): Throwable = {
-    new SparkException(
+    new AnalysisException(
       errorClass = "_LEGACY_ERROR_TEMP_1225",
       messageParameters = Map(
         "field" -> field.toString,
-        "supportedIds" -> supportedIds.mkString(", ")),
-      cause = null)
+        "supportedIds" -> supportedIds.mkString(", ")))
   }
 
   def decimalCannotGreaterThanPrecisionError(scale: Int, precision: Int): Throwable = {
-    new SparkException(
+    new AnalysisException(
       errorClass = "_LEGACY_ERROR_TEMP_1228",
       messageParameters = Map(
 

[spark] branch master updated (85a4d1e56e8 -> 3761b7d65bd)

2023-07-28 Thread hvanhovell
This is an automated email from the ASF dual-hosted git repository.

hvanhovell pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


from 85a4d1e56e8 [SPARK-41400][CONNECT] Remove Connect Client Catalyst 
Dependency
 add 3761b7d65bd [SPARK-44574][SQL][CONNECT] Errors that moved into sql/api 
should also use AnalysisException

No new revisions were added by this update.

Summary of changes:
 .../apache/spark/sql/errors/DataTypeErrors.scala   | 53 ++
 .../spark/sql/errors/QueryCompilationErrors.scala  |  7 +--
 .../apache/spark/sql/types/StructTypeSuite.scala   | 26 +--
 .../apache/spark/sql/CharVarcharTestSuite.scala| 28 +++-
 .../org/apache/spark/sql/DataFrameSuite.scala  |  2 +-
 .../org/apache/spark/sql/JsonFunctionsSuite.scala  |  6 +--
 .../spark/sql/connector/AlterTableTests.scala  |  2 +-
 .../sql/errors/QueryCompilationErrorsSuite.scala   |  4 +-
 .../org/apache/spark/sql/jdbc/JDBCSuite.scala  |  4 +-
 .../sql/test/DataFrameReaderWriterSuite.scala  | 12 ++---
 10 files changed, 68 insertions(+), 76 deletions(-)





[spark] branch master updated (72af2c0fbc6 -> 85a4d1e56e8)

2023-07-28 Thread hvanhovell
This is an automated email from the ASF dual-hosted git repository.

hvanhovell pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


from 72af2c0fbc6 [SPARK-44585][MLLIB] Fix warning condition in MLLib 
RankingMetrics ndcgAk
 add 85a4d1e56e8 [SPARK-41400][CONNECT] Remove Connect Client Catalyst 
Dependency

No new revisions were added by this update.

Summary of changes:
 common/network-common/pom.xml  |   3 +-
 .../apache/spark/network/sasl/SparkSaslSuite.java  |   5 +-
 .../org/apache/spark/network/util/ByteUnit.java|   0
 .../org/apache/spark/network/util/JavaUtils.java   |  66 ++---
 .../org/apache/spark/util/SparkErrorUtils.scala|  53 +++-
 .../org/apache/spark/util/SparkFileUtils.scala |  80 +-
 .../org/apache/spark/util/SparkSerDeUtils.scala|   9 +-
 connector/connect/client/jvm/pom.xml   |  14 +-
 .../scala/org/apache/spark/sql/SparkSession.scala  |   6 +-
 .../connect/client/arrow/ArrowDeserializer.scala   |   6 +-
 .../sql/connect/client/arrow/ArrowSerializer.scala |  19 +-
 .../connect/client/arrow/ArrowVectorReader.scala   |  15 +-
 .../org/apache/spark/sql/protobuf/functions.scala  |  15 +-
 .../org/apache/spark/sql/ClientE2ETestSuite.scala  |   7 +-
 .../spark/sql/DataFrameNaFunctionSuite.scala   |   6 +-
 .../apache/spark/sql/PlanGenerationTestSuite.scala |   6 +-
 .../scala/org/apache/spark/sql/SQLHelper.scala |  11 +-
 .../apache/spark/sql/SQLImplicitsTestSuite.scala   |  31 ++-
 .../apache/spark/sql/SparkSessionE2ESuite.scala|  16 +-
 .../sql/UserDefinedFunctionE2ETestSuite.scala  |  15 +-
 .../spark/sql/UserDefinedFunctionSuite.scala   |   4 +-
 .../sql/connect/client/ClassFinderSuite.scala  |   6 +-
 .../connect/client/arrow/ArrowEncoderSuite.scala   |   6 +-
 .../connect/client/util/IntegrationTestUtils.scala |  11 +-
 .../spark/sql/connect/client/util/QueryTest.scala  |   4 +-
 .../spark/sql/streaming/StreamingQuerySuite.scala  |   6 +-
 connector/connect/common/pom.xml   |   8 +-
 .../common/LiteralValueProtoConverter.scala|  14 +-
 .../main/scala/org/apache/spark/util/Utils.scala   | 132 +-
 .../catalyst/plans/logical/LogicalGroupState.scala |   7 +-
 .../sql/catalyst/util/SparkIntervalUtils.scala | 287 -
 .../spark/sql/catalyst/util/StringUtils.scala  |  20 ++
 .../spark/sql/errors/CompilationErrors.scala   |  54 
 .../org/apache/spark/sql/internal/SqlApiConf.scala |   3 +
 .../org/apache/spark/sql/types/UpCastRule.scala|  86 ++
 .../sql/catalyst/analysis/AnsiTypeCoercion.scala   |   2 +-
 .../spark/sql/catalyst/analysis/TypeCoercion.scala |  12 +-
 .../spark/sql/catalyst/expressions/Cast.scala  |  47 +---
 .../sql/catalyst/expressions/ToStringBase.scala|   6 +-
 .../spark/sql/catalyst/plans/logical/object.scala  |   3 -
 .../spark/sql/catalyst/util/IntervalUtils.scala| 260 ---
 .../apache/spark/sql/catalyst/util/package.scala   |  21 +-
 .../spark/sql/errors/QueryCompilationErrors.scala  |  33 +--
 .../org/apache/spark/sql/internal/SQLConf.scala|   6 +-
 .../sql/catalyst/expressions/CastSuiteBase.scala   |   2 +-
 45 files changed, 751 insertions(+), 672 deletions(-)
 rename common/{network-common => 
utils}/src/main/java/org/apache/spark/network/util/ByteUnit.java (100%)
 rename common/{network-common => 
utils}/src/main/java/org/apache/spark/network/util/JavaUtils.java (89%)
 copy 
connector/kafka-0-10/src/main/scala/org/apache/spark/streaming/kafka010/package-info.java
 => 
sql/api/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/LogicalGroupState.scala
 (86%)
 create mode 100644 
sql/api/src/main/scala/org/apache/spark/sql/errors/CompilationErrors.scala
 create mode 100644 
sql/api/src/main/scala/org/apache/spark/sql/types/UpCastRule.scala





[spark] branch branch-3.5 updated: [SPARK-41400][CONNECT] Remove Connect Client Catalyst Dependency

2023-07-28 Thread hvanhovell
This is an automated email from the ASF dual-hosted git repository.

hvanhovell pushed a commit to branch branch-3.5
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.5 by this push:
 new 5412fb0590e [SPARK-41400][CONNECT] Remove Connect Client Catalyst 
Dependency
5412fb0590e is described below

commit 5412fb0590e55d635e9e31887ec5c72d10011899
Author: Herman van Hovell 
AuthorDate: Fri Jul 28 21:30:51 2023 -0400

[SPARK-41400][CONNECT] Remove Connect Client Catalyst Dependency

### What changes were proposed in this pull request?
This PR decouples the Spark Connect Scala Client from Catalyst; it now uses 
the SQL API module instead.

There were quite a few changes we still needed to make:
- For testing we needed a bunch of utilities. I have moved these to 
common-utils.
- I have moved bits and pieces of IntervalUtils to SparkIntervalUtils.
- A lot of small fixes.

### Why are the changes needed?
This reduces the client's dependency tree from ~300 MB of dependencies to 
~30 MB, which makes it easier to use the client when developing Connect 
applications. On top of this, the reduced dependency graph means users are 
less affected by the client's classpath.
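As a hedged build sketch (the `spark-connect-client-jvm` coordinates are the existing module name; the version is a placeholder), an application that only talks to a Connect server can depend on the client module alone, and after this change that dependency no longer drags in the Catalyst tree:

```scala
// build.sbt fragment, illustrative only.
libraryDependencies += "org.apache.spark" %% "spark-connect-client-jvm" % "3.5.0"
```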

### Does this PR introduce _any_ user-facing change?
Yes. It changes the classpath exposed by the Spark Connect Scala Client.

### How was this patch tested?
Existing tests.

Closes #42184 from hvanhovell/SPARK-41400-v1.

Authored-by: Herman van Hovell 
Signed-off-by: Herman van Hovell 
(cherry picked from commit 85a4d1e56e80d85dad9b8945c67287927eb379f6)
Signed-off-by: Herman van Hovell 
---
 common/network-common/pom.xml  |   3 +-
 .../apache/spark/network/sasl/SparkSaslSuite.java  |   5 +-
 .../org/apache/spark/network/util/ByteUnit.java|   0
 .../org/apache/spark/network/util/JavaUtils.java   |  66 ++---
 .../org/apache/spark/util/SparkErrorUtils.scala|  53 +++-
 .../org/apache/spark/util/SparkFileUtils.scala |  80 +-
 .../org/apache/spark/util/SparkSerDeUtils.scala|   9 +-
 connector/connect/client/jvm/pom.xml   |  14 +-
 .../scala/org/apache/spark/sql/SparkSession.scala  |   6 +-
 .../connect/client/arrow/ArrowDeserializer.scala   |   6 +-
 .../sql/connect/client/arrow/ArrowSerializer.scala |  19 +-
 .../connect/client/arrow/ArrowVectorReader.scala   |  15 +-
 .../org/apache/spark/sql/protobuf/functions.scala  |  15 +-
 .../org/apache/spark/sql/ClientE2ETestSuite.scala  |   7 +-
 .../spark/sql/DataFrameNaFunctionSuite.scala   |   6 +-
 .../apache/spark/sql/PlanGenerationTestSuite.scala |   6 +-
 .../scala/org/apache/spark/sql/SQLHelper.scala |  11 +-
 .../apache/spark/sql/SQLImplicitsTestSuite.scala   |  31 ++-
 .../apache/spark/sql/SparkSessionE2ESuite.scala|  16 +-
 .../sql/UserDefinedFunctionE2ETestSuite.scala  |  15 +-
 .../spark/sql/UserDefinedFunctionSuite.scala   |   4 +-
 .../sql/connect/client/ClassFinderSuite.scala  |   6 +-
 .../connect/client/arrow/ArrowEncoderSuite.scala   |   6 +-
 .../connect/client/util/IntegrationTestUtils.scala |  11 +-
 .../spark/sql/connect/client/util/QueryTest.scala  |   4 +-
 .../spark/sql/streaming/StreamingQuerySuite.scala  |   6 +-
 connector/connect/common/pom.xml   |   8 +-
 .../common/LiteralValueProtoConverter.scala|  14 +-
 .../main/scala/org/apache/spark/util/Utils.scala   | 132 +-
 .../catalyst/plans/logical/LogicalGroupState.scala |  16 +-
 .../sql/catalyst/util/SparkIntervalUtils.scala | 287 -
 .../spark/sql/catalyst/util/StringUtils.scala  |  20 ++
 .../spark/sql/errors/CompilationErrors.scala   |  54 
 .../org/apache/spark/sql/internal/SqlApiConf.scala |   3 +
 .../org/apache/spark/sql/types/UpCastRule.scala|  86 ++
 .../sql/catalyst/analysis/AnsiTypeCoercion.scala   |   2 +-
 .../spark/sql/catalyst/analysis/TypeCoercion.scala |  12 +-
 .../spark/sql/catalyst/expressions/Cast.scala  |  47 +---
 .../sql/catalyst/expressions/ToStringBase.scala|   6 +-
 .../spark/sql/catalyst/plans/logical/object.scala  |   3 -
 .../spark/sql/catalyst/util/IntervalUtils.scala| 260 ---
 .../apache/spark/sql/catalyst/util/package.scala   |  21 +-
 .../spark/sql/errors/QueryCompilationErrors.scala  |  33 +--
 .../org/apache/spark/sql/internal/SQLConf.scala|   6 +-
 .../sql/catalyst/expressions/CastSuiteBase.scala   |   2 +-
 45 files changed, 751 insertions(+), 681 deletions(-)

diff --git a/common/network-common/pom.xml b/common/network-common/pom.xml
index 8a63e999c53..2b43f9ce98a 100644
--- a/common/network-common/pom.xml
+++ b/common/network-common/pom.xml
@@ -150,7 +150,8 @@
 
 
   org.apache.spark
-  spark-tags_${scala.binary.version}
+  spark-common-utils_${scala.binary.version}
+  ${project.version}

 
 
 
   

[spark] branch branch-3.4 updated: [SPARK-44585][MLLIB] Fix warning condition in MLLib RankingMetrics ndcgAk

2023-07-28 Thread srowen
This is an automated email from the ASF dual-hosted git repository.

srowen pushed a commit to branch branch-3.4
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.4 by this push:
 new f19a953b647 [SPARK-44585][MLLIB] Fix warning condition in MLLib 
RankingMetrics ndcgAk
f19a953b647 is described below

commit f19a953b6471673f89d689bea20e0d53026f7b5b
Author: Guilhem Vuillier <101632595+guilhem-de...@users.noreply.github.com>
AuthorDate: Fri Jul 28 17:29:47 2023 -0500

[SPARK-44585][MLLIB] Fix warning condition in MLLib RankingMetrics ndcgAk

### What changes were proposed in this pull request?

This PR fixes the condition to raise the following warning in MLLib's 
RankingMetrics ndcgAk function: "# of ground truth set and # of relevance value 
set should be equal, check input data"

The logic for raising warnings is faulty at the moment: it raises a warning 
if the `rel` input is empty and `lab.size` and `rel.size` are not equal.

The logic should be to raise a warning if `rel` input is **not empty** and 
`lab.size` and `rel.size` are not equal.
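A hedged usage sketch of the "non-binary" mode described above (the data and names are made up; `sc` is assumed to be an existing SparkContext): each record carries predictions, ground-truth labels, and relevance values, and with this fix a size mismatch between the labels and relevance values now triggers the warning:

```scala
import org.apache.spark.mllib.evaluation.RankingMetrics

// One query: 4 predicted ids, 3 relevant ids, but only 2 relevance values -> warning.
val predictionLabelsAndRelevance = sc.parallelize(Seq(
  (Array(1, 2, 3, 4), Array(1, 2, 3), Array(3.0, 2.0))
))
val metrics = new RankingMetrics[Int](predictionLabelsAndRelevance)
println(metrics.ndcgAt(3))
```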

This warning was added in the following PR: 
https://github.com/apache/spark/pull/36843

### Why are the changes needed?

With the current logic, RankingMetrics will:
- raise an incorrect warning when it is used in the "binary" mode 
(i.e. no relevance values in the input)
- not raise a potentially necessary warning when it is used in 
the "non-binary" mode (i.e. with relevance values in the input)

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
No changes were made to the test suite for RankingMetrics: 
https://github.com/uch/spark/blob/a172172329cc78b50f716924f2a344517deb71fc/mllib/src/test/scala/org/apache/spark/mllib/evaluation/RankingMetricsSuite.scala

Closes #42207 from guilhem-depop/patch-1.

Authored-by: Guilhem Vuillier 
<101632595+guilhem-de...@users.noreply.github.com>
Signed-off-by: Sean Owen 
(cherry picked from commit 72af2c0fbc6673a5e49f1fd6693fe2c90141a84f)
Signed-off-by: Sean Owen 
---
 .../scala/org/apache/spark/mllib/evaluation/RankingMetrics.scala | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git 
a/mllib/src/main/scala/org/apache/spark/mllib/evaluation/RankingMetrics.scala 
b/mllib/src/main/scala/org/apache/spark/mllib/evaluation/RankingMetrics.scala
index 37e57736574..a3316d8a8fa 100644
--- 
a/mllib/src/main/scala/org/apache/spark/mllib/evaluation/RankingMetrics.scala
+++ 
b/mllib/src/main/scala/org/apache/spark/mllib/evaluation/RankingMetrics.scala
@@ -140,6 +140,9 @@ class RankingMetrics[T: ClassTag] @Since("1.2.0") 
(predictionAndLabels: RDD[_ <:
* and the NDCG is obtained by dividing the DCG value on the ground truth 
set. In the current
* implementation, the relevance value is binary if the relevance value is 
empty.
 
+   * If the relevance value is not empty but its size doesn't match the ground 
truth set size,
+   * a log warning is generated.
+   *
* If a query has an empty ground truth set, zero will be used as ndcg 
together with
* a log warning.
*
@@ -157,7 +160,7 @@ class RankingMetrics[T: ClassTag] @Since("1.2.0") 
(predictionAndLabels: RDD[_ <:
   val useBinary = rel.isEmpty
   val labSet = lab.toSet
   val relMap = Utils.toMap(lab, rel)
-  if (useBinary && lab.size != rel.size) {
+  if (!useBinary && lab.size != rel.size) {
 logWarning(
   "# of ground truth set and # of relevance value set should be equal, 
" +
 "check input data")





[spark] branch branch-3.5 updated: [SPARK-44585][MLLIB] Fix warning condition in MLLib RankingMetrics ndcgAk

2023-07-28 Thread srowen
This is an automated email from the ASF dual-hosted git repository.

srowen pushed a commit to branch branch-3.5
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.5 by this push:
 new d0fa5a75d17 [SPARK-44585][MLLIB] Fix warning condition in MLLib 
RankingMetrics ndcgAk
d0fa5a75d17 is described below

commit d0fa5a75d17335e60aefbb554adb9b3fce1f97ff
Author: Guilhem Vuillier <101632595+guilhem-de...@users.noreply.github.com>
AuthorDate: Fri Jul 28 17:29:47 2023 -0500

[SPARK-44585][MLLIB] Fix warning condition in MLLib RankingMetrics ndcgAk

### What changes were proposed in this pull request?

This PR fixes the condition to raise the following warning in MLLib's 
RankingMetrics ndcgAk function: "# of ground truth set and # of relevance value 
set should be equal, check input data"

The logic for raising warnings is faulty at the moment: it raises a warning 
if the `rel` input is empty and `lab.size` and `rel.size` are not equal.

The logic should be to raise a warning if `rel` input is **not empty** and 
`lab.size` and `rel.size` are not equal.

This warning was added in the following PR: 
https://github.com/apache/spark/pull/36843

### Why are the changes needed?

With the current logic, RankingMetrics will:
- raise an incorrect warning when it is used in the "binary" mode 
(i.e. no relevance values in the input)
- not raise a potentially necessary warning when it is used in 
the "non-binary" mode (i.e. with relevance values in the input)

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
No changes were made to the test suite for RankingMetrics: 
https://github.com/uch/spark/blob/a172172329cc78b50f716924f2a344517deb71fc/mllib/src/test/scala/org/apache/spark/mllib/evaluation/RankingMetricsSuite.scala

Closes #42207 from guilhem-depop/patch-1.

Authored-by: Guilhem Vuillier 
<101632595+guilhem-de...@users.noreply.github.com>
Signed-off-by: Sean Owen 
(cherry picked from commit 72af2c0fbc6673a5e49f1fd6693fe2c90141a84f)
Signed-off-by: Sean Owen 
---
 .../scala/org/apache/spark/mllib/evaluation/RankingMetrics.scala | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git 
a/mllib/src/main/scala/org/apache/spark/mllib/evaluation/RankingMetrics.scala 
b/mllib/src/main/scala/org/apache/spark/mllib/evaluation/RankingMetrics.scala
index 37e57736574..a3316d8a8fa 100644
--- 
a/mllib/src/main/scala/org/apache/spark/mllib/evaluation/RankingMetrics.scala
+++ 
b/mllib/src/main/scala/org/apache/spark/mllib/evaluation/RankingMetrics.scala
@@ -140,6 +140,9 @@ class RankingMetrics[T: ClassTag] @Since("1.2.0") 
(predictionAndLabels: RDD[_ <:
* and the NDCG is obtained by dividing the DCG value on the ground truth 
set. In the current
* implementation, the relevance value is binary if the relevance value is 
empty.
 
+   * If the relevance value is not empty but its size doesn't match the ground 
truth set size,
+   * a log warning is generated.
+   *
* If a query has an empty ground truth set, zero will be used as ndcg 
together with
* a log warning.
*
@@ -157,7 +160,7 @@ class RankingMetrics[T: ClassTag] @Since("1.2.0") 
(predictionAndLabels: RDD[_ <:
   val useBinary = rel.isEmpty
   val labSet = lab.toSet
   val relMap = Utils.toMap(lab, rel)
-  if (useBinary && lab.size != rel.size) {
+  if (!useBinary && lab.size != rel.size) {
 logWarning(
   "# of ground truth set and # of relevance value set should be equal, 
" +
 "check input data")





[spark] branch master updated: [SPARK-44585][MLLIB] Fix warning condition in MLLib RankingMetrics ndcgAk

2023-07-28 Thread srowen
This is an automated email from the ASF dual-hosted git repository.

srowen pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 72af2c0fbc6 [SPARK-44585][MLLIB] Fix warning condition in MLLib 
RankingMetrics ndcgAk
72af2c0fbc6 is described below

commit 72af2c0fbc6673a5e49f1fd6693fe2c90141a84f
Author: Guilhem Vuillier <101632595+guilhem-de...@users.noreply.github.com>
AuthorDate: Fri Jul 28 17:29:47 2023 -0500

[SPARK-44585][MLLIB] Fix warning condition in MLLib RankingMetrics ndcgAk

### What changes were proposed in this pull request?

This PR fixes the condition to raise the following warning in MLLib's 
RankingMetrics ndcgAk function: "# of ground truth set and # of relevance value 
set should be equal, check input data"

The logic for raising warnings is faulty at the moment: it raises a warning 
if the `rel` input is empty and `lab.size` and `rel.size` are not equal.

The logic should be to raise a warning if `rel` input is **not empty** and 
`lab.size` and `rel.size` are not equal.

This warning was added in the following PR: 
https://github.com/apache/spark/pull/36843

### Why are the changes needed?

With the current logic, RankingMetrics will:
- raise an incorrect warning when it is used in the "binary" mode 
(i.e. no relevance values in the input)
- not raise a potentially necessary warning when it is used in 
the "non-binary" mode (i.e. with relevance values in the input)

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
No changes were made to the test suite for RankingMetrics: 
https://github.com/uch/spark/blob/a172172329cc78b50f716924f2a344517deb71fc/mllib/src/test/scala/org/apache/spark/mllib/evaluation/RankingMetricsSuite.scala

Closes #42207 from guilhem-depop/patch-1.

Authored-by: Guilhem Vuillier 
<101632595+guilhem-de...@users.noreply.github.com>
Signed-off-by: Sean Owen 
---
 .../scala/org/apache/spark/mllib/evaluation/RankingMetrics.scala | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git 
a/mllib/src/main/scala/org/apache/spark/mllib/evaluation/RankingMetrics.scala 
b/mllib/src/main/scala/org/apache/spark/mllib/evaluation/RankingMetrics.scala
index 37e57736574..a3316d8a8fa 100644
--- 
a/mllib/src/main/scala/org/apache/spark/mllib/evaluation/RankingMetrics.scala
+++ 
b/mllib/src/main/scala/org/apache/spark/mllib/evaluation/RankingMetrics.scala
@@ -140,6 +140,9 @@ class RankingMetrics[T: ClassTag] @Since("1.2.0") 
(predictionAndLabels: RDD[_ <:
* and the NDCG is obtained by dividing the DCG value on the ground truth 
set. In the current
* implementation, the relevance value is binary if the relevance value is 
empty.
 
+   * If the relevance value is not empty but its size doesn't match the ground 
truth set size,
+   * a log warning is generated.
+   *
* If a query has an empty ground truth set, zero will be used as ndcg 
together with
* a log warning.
*
@@ -157,7 +160,7 @@ class RankingMetrics[T: ClassTag] @Since("1.2.0") 
(predictionAndLabels: RDD[_ <:
   val useBinary = rel.isEmpty
   val labSet = lab.toSet
   val relMap = Utils.toMap(lab, rel)
-  if (useBinary && lab.size != rel.size) {
+  if (!useBinary && lab.size != rel.size) {
 logWarning(
   "# of ground truth set and # of relevance value set should be equal, 
" +
 "check input data")





[spark] branch branch-3.5 updated: [SPARK-37377][SQL][FOLLOWUP] Fix the partitioned join of one side test case not match

2023-07-28 Thread wenchen
This is an automated email from the ASF dual-hosted git repository.

wenchen pushed a commit to branch branch-3.5
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.5 by this push:
 new 6879519cafa [SPARK-37377][SQL][FOLLOWUP] Fix the partitioned join of 
one side test case not match
6879519cafa is described below

commit 6879519cafa913f61a5f6029125e41cf79e80168
Author: Jia Fan 
AuthorDate: Fri Jul 28 19:14:56 2023 +0800

[SPARK-37377][SQL][FOLLOWUP] Fix the partitioned join of one side test case 
not match

### What changes were proposed in this pull request?
The "partitioned join: only one side reports partitioning" test case did not 
match its description: currently both sides report partitioning, not just one.

### Why are the changes needed?
Fix test case

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
add new test.

Closes #42144 from Hisoka-X/SPARK-37377_one_side_test_case.

Authored-by: Jia Fan 
Signed-off-by: Wenchen Fan 
(cherry picked from commit 20bb6c0c5a84345bb09ac3cab6267a5747b6be05)
Signed-off-by: Wenchen Fan 
---
 .../org/apache/spark/sql/connector/KeyGroupedPartitioningSuite.scala   | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git 
a/sql/core/src/test/scala/org/apache/spark/sql/connector/KeyGroupedPartitioningSuite.scala
 
b/sql/core/src/test/scala/org/apache/spark/sql/connector/KeyGroupedPartitioningSuite.scala
index 8be3c6d9e13..880c30ba9f9 100644
--- 
a/sql/core/src/test/scala/org/apache/spark/sql/connector/KeyGroupedPartitioningSuite.scala
+++ 
b/sql/core/src/test/scala/org/apache/spark/sql/connector/KeyGroupedPartitioningSuite.scala
@@ -313,9 +313,8 @@ class KeyGroupedPartitioningSuite extends 
DistributionAndOrderingSuiteBase {
 
   test("partitioned join: only one side reports partitioning") {
 val customers_partitions = Array(bucket(4, "customer_id"))
-val orders_partitions = Array(bucket(2, "customer_id"))
 
-testWithCustomersAndOrders(customers_partitions, orders_partitions, 2)
+testWithCustomersAndOrders(customers_partitions, Array.empty, 2)
   }
 
   private val items: String = "items"





[spark] branch master updated: [SPARK-37377][SQL][FOLLOWUP] Fix the partitioned join of one side test case not match

2023-07-28 Thread wenchen
This is an automated email from the ASF dual-hosted git repository.

wenchen pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 20bb6c0c5a8 [SPARK-37377][SQL][FOLLOWUP] Fix the partitioned join of 
one side test case not match
20bb6c0c5a8 is described below

commit 20bb6c0c5a84345bb09ac3cab6267a5747b6be05
Author: Jia Fan 
AuthorDate: Fri Jul 28 19:14:56 2023 +0800

[SPARK-37377][SQL][FOLLOWUP] Fix the partitioned join of one side test case 
not match

### What changes were proposed in this pull request?
The "partitioned join: only one side reports partitioning" test case did not 
match its description: currently both sides report partitioning, not just one.

### Why are the changes needed?
Fix test case

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
add new test.

Closes #42144 from Hisoka-X/SPARK-37377_one_side_test_case.

Authored-by: Jia Fan 
Signed-off-by: Wenchen Fan 
---
 .../org/apache/spark/sql/connector/KeyGroupedPartitioningSuite.scala   | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git 
a/sql/core/src/test/scala/org/apache/spark/sql/connector/KeyGroupedPartitioningSuite.scala
 
b/sql/core/src/test/scala/org/apache/spark/sql/connector/KeyGroupedPartitioningSuite.scala
index 8be3c6d9e13..880c30ba9f9 100644
--- 
a/sql/core/src/test/scala/org/apache/spark/sql/connector/KeyGroupedPartitioningSuite.scala
+++ 
b/sql/core/src/test/scala/org/apache/spark/sql/connector/KeyGroupedPartitioningSuite.scala
@@ -313,9 +313,8 @@ class KeyGroupedPartitioningSuite extends 
DistributionAndOrderingSuiteBase {
 
   test("partitioned join: only one side reports partitioning") {
 val customers_partitions = Array(bucket(4, "customer_id"))
-val orders_partitions = Array(bucket(2, "customer_id"))
 
-testWithCustomersAndOrders(customers_partitions, orders_partitions, 2)
+testWithCustomersAndOrders(customers_partitions, Array.empty, 2)
   }
 
   private val items: String = "items"





[spark] branch 3.5 created (now 3dcee0ccf7e)

2023-07-28 Thread hvanhovell
This is an automated email from the ASF dual-hosted git repository.

hvanhovell pushed a change to branch 3.5
in repository https://gitbox.apache.org/repos/asf/spark.git


  at 3dcee0ccf7e [SPARK-44538][CONNECT][SQL] Reinstate Row.jsonValue and 
friends

No new revisions were added by this update.





[spark] branch branch-3.5 updated: [SPARK-44552][SQL] Remove `private object ParseState` definition from `IntervalUtils`

2023-07-28 Thread yangjie01
This is an automated email from the ASF dual-hosted git repository.

yangjie01 pushed a commit to branch branch-3.5
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.5 by this push:
 new 81a939d375b [SPARK-44552][SQL] Remove `private object ParseState` 
definition from `IntervalUtils`
81a939d375b is described below

commit 81a939d375bad482333a8e87f00d21dd64048b0e
Author: yangjie01 
AuthorDate: Fri Jul 28 18:09:48 2023 +0800

[SPARK-44552][SQL] Remove `private object ParseState` definition from 
`IntervalUtils`

### What changes were proposed in this pull request?
SPARK-44326 (https://github.com/apache/spark/pull/41885) moved the relevant 
code from `IntervalUtils` to `SparkIntervalUtils` (including the 
`private object ParseState` definition):

https://github.com/apache/spark/blob/6ca45c52b7416e7b3520dc902cb24f060c7c72dd/sql/api/src/main/scala/org/apache/spark/sql/catalyst/util/SparkIntervalUtils.scala#L247-L260

However, it did not delete the `private object ParseState` definition in 
`IntervalUtils`, so this PR cleans it up.

### Why are the changes needed?
Code cleanup.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Passes GitHub Actions.

Closes #42165 from LuciferYang/SPARK-44552.

Lead-authored-by: yangjie01 
Co-authored-by: YangJie 
Signed-off-by: yangjie01 
(cherry picked from commit 7ee9e43ec6891f26b36a090a83536d3cae682861)
Signed-off-by: yangjie01 
---
 .../apache/spark/sql/catalyst/util/IntervalUtils.scala| 15 ---
 1 file changed, 15 deletions(-)

diff --git 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/IntervalUtils.scala
 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/IntervalUtils.scala
index 6ba59b4e730..24620e692a0 100644
--- 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/IntervalUtils.scala
+++ 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/IntervalUtils.scala
@@ -739,21 +739,6 @@ object IntervalUtils extends SparkIntervalUtils {
 fromDoubles(interval.months / num, interval.days / num, 
interval.microseconds / num)
   }
 
-  private object ParseState extends Enumeration {
-type ParseState = Value
-
-val PREFIX,
-TRIM_BEFORE_SIGN,
-SIGN,
-TRIM_BEFORE_VALUE,
-VALUE,
-VALUE_FRACTIONAL_PART,
-TRIM_BEFORE_UNIT,
-UNIT_BEGIN,
-UNIT_SUFFIX,
-UNIT_END = Value
-  }
-
   /**
* A safe version of `stringToInterval`. It returns null for invalid input 
string.
*/





[spark] branch master updated: [SPARK-44552][SQL] Remove `private object ParseState` definition from `IntervalUtils`

2023-07-28 Thread yangjie01
This is an automated email from the ASF dual-hosted git repository.

yangjie01 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 7ee9e43ec68 [SPARK-44552][SQL] Remove `private object ParseState` 
definition from `IntervalUtils`
7ee9e43ec68 is described below

commit 7ee9e43ec6891f26b36a090a83536d3cae682861
Author: yangjie01 
AuthorDate: Fri Jul 28 18:09:48 2023 +0800

[SPARK-44552][SQL] Remove `private object ParseState` definition from 
`IntervalUtils`

### What changes were proposed in this pull request?
SPARK-44326 (https://github.com/apache/spark/pull/41885) moved the relevant 
code from `IntervalUtils` to `SparkIntervalUtils` (including the 
`private object ParseState` definition):

https://github.com/apache/spark/blob/6ca45c52b7416e7b3520dc902cb24f060c7c72dd/sql/api/src/main/scala/org/apache/spark/sql/catalyst/util/SparkIntervalUtils.scala#L247-L260

However, it did not delete the `private object ParseState` definition in 
`IntervalUtils`, so this PR cleans it up.

### Why are the changes needed?
Code cleanup.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Passes GitHub Actions.

Closes #42165 from LuciferYang/SPARK-44552.

Lead-authored-by: yangjie01 
Co-authored-by: YangJie 
Signed-off-by: yangjie01 
---
 .../apache/spark/sql/catalyst/util/IntervalUtils.scala| 15 ---
 1 file changed, 15 deletions(-)

diff --git 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/IntervalUtils.scala
 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/IntervalUtils.scala
index 6ba59b4e730..24620e692a0 100644
--- 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/IntervalUtils.scala
+++ 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/IntervalUtils.scala
@@ -739,21 +739,6 @@ object IntervalUtils extends SparkIntervalUtils {
 fromDoubles(interval.months / num, interval.days / num, 
interval.microseconds / num)
   }
 
-  private object ParseState extends Enumeration {
-type ParseState = Value
-
-val PREFIX,
-TRIM_BEFORE_SIGN,
-SIGN,
-TRIM_BEFORE_VALUE,
-VALUE,
-VALUE_FRACTIONAL_PART,
-TRIM_BEFORE_UNIT,
-UNIT_BEGIN,
-UNIT_SUFFIX,
-UNIT_END = Value
-  }
-
   /**
* A safe version of `stringToInterval`. It returns null for invalid input 
string.
*/





[spark] branch branch-3.5 updated: [SPARK-43968][PYTHON][3.5] Improve error messages for Python UDTFs with wrong number of outputs

2023-07-28 Thread gurwls223
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch branch-3.5
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.5 by this push:
 new 7059b69e67d [SPARK-43968][PYTHON][3.5] Improve error messages for 
Python UDTFs with wrong number of outputs
7059b69e67d is described below

commit 7059b69e67db8126dafc3d4b1f3b39e947c4c3ca
Author: allisonwang-db 
AuthorDate: Fri Jul 28 15:30:17 2023 +0900

[SPARK-43968][PYTHON][3.5] Improve error messages for Python UDTFs with 
wrong number of outputs

### What changes were proposed in this pull request?

This PR cherry-picks 7194ce9263fe1683c039a1aaf9462657b1672a99. It improves 
the error messages for Python UDTFs when the number of outputs does not match 
the number of columns specified in the UDTF's return type.

### Why are the changes needed?

To make Python UDTFs more user-friendly.

### Does this PR introduce _any_ user-facing change?

Yes. This PR improves the error messages.
Before this change, the error thrown by Spark was a Java 
IllegalStateException:
```
java.lang.IllegalStateException: Input row doesn't have expected number of 
values required by the schema
```
After this PR, it will throw a clearer error message with an error class:
```
[UDTF_RETURN_SCHEMA_MISMATCH] The number of columns in the result does not 
match the specified schema
```

### How was this patch tested?

Existing tests and new unit tests.

Closes #42192 from allisonwang-db/spark-43968-3.5.

Authored-by: allisonwang-db 
Signed-off-by: Hyukjin Kwon 
---
 python/pyspark/errors/error_classes.py |   5 +
 python/pyspark/sql/connect/udtf.py |   4 +-
 .../pyspark/sql/tests/connect/test_parity_udtf.py  |  50 
 python/pyspark/sql/tests/test_udtf.py  | 133 +++--
 python/pyspark/sql/udtf.py |   9 +-
 python/pyspark/worker.py   |  22 +++-
 6 files changed, 99 insertions(+), 124 deletions(-)

diff --git a/python/pyspark/errors/error_classes.py 
b/python/pyspark/errors/error_classes.py
index b1bf6b47af9..f6411fac1da 100644
--- a/python/pyspark/errors/error_classes.py
+++ b/python/pyspark/errors/error_classes.py
@@ -320,6 +320,11 @@ ERROR_CLASSES_JSON = """
   "The eval type for the UDTF '' is invalid. It must be one of 
."
 ]
   },
+  "INVALID_UDTF_HANDLER_TYPE" : {
+"message" : [
+  "The UDTF is invalid. The function handler must be a class, but got 
''. Please provide a class as the function handler."
+]
+  },
   "INVALID_UDTF_NO_EVAL" : {
 "message" : [
   "The UDTF '' is invalid. It does not implement the required 'eval' 
method. Please implement the 'eval' method in '' and try again."
diff --git a/python/pyspark/sql/connect/udtf.py 
b/python/pyspark/sql/connect/udtf.py
index 1fe8e1024ee..3747e37459e 100644
--- a/python/pyspark/sql/connect/udtf.py
+++ b/python/pyspark/sql/connect/udtf.py
@@ -124,6 +124,8 @@ class UserDefinedTableFunction:
 evalType: int = PythonEvalType.SQL_TABLE_UDF,
 deterministic: bool = True,
 ) -> None:
+_validate_udtf_handler(func)
+
 self.func = func
 self.returnType: DataType = (
 UnparsedDataType(returnType) if isinstance(returnType, str) else 
returnType
@@ -132,8 +134,6 @@ class UserDefinedTableFunction:
 self.evalType = evalType
 self.deterministic = deterministic
 
-_validate_udtf_handler(func)
-
 def _build_common_inline_user_defined_table_function(
 self, *cols: "ColumnOrName"
 ) -> CommonInlineUserDefinedTableFunction:
diff --git a/python/pyspark/sql/tests/connect/test_parity_udtf.py 
b/python/pyspark/sql/tests/connect/test_parity_udtf.py
index e18e116e003..355f5288d2c 100644
--- a/python/pyspark/sql/tests/connect/test_parity_udtf.py
+++ b/python/pyspark/sql/tests/connect/test_parity_udtf.py
@@ -54,56 +54,6 @@ class UDTFParityTests(BaseUDTFTestsMixin, 
ReusedConnectTestCase):
 ):
 TestUDTF(lit(1)).collect()
 
-def test_udtf_with_wrong_num_output(self):
-err_msg = (
-"java.lang.IllegalStateException: Input row doesn't have expected 
number of "
-+ "values required by the schema."
-)
-
-@udtf(returnType="a: int, b: int")
-class TestUDTF:
-def eval(self, a: int):
-yield a,
-
-with self.assertRaisesRegex(SparkConnectGrpcException, err_msg):
-TestUDTF(lit(1)).collect()
-
-@udtf(returnType="a: int")
-class TestUDTF:
-def eval(self, a: int):
-yield a, a + 1
-
-with self.assertRaisesRegex(SparkConnectGrpcException, err_msg):
-TestUDTF(lit(1)).collect()
-
-def