[spark] branch master updated: [SPARK-44497][WEBUI] Show task partition id in Task table

2023-08-26 Thread yao
This is an automated email from the ASF dual-hosted git repository.

yao pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 7866da9519a [SPARK-44497][WEBUI] Show task partition id in Task table
7866da9519a is described below

commit 7866da9519a586e617a55702c39f19d0e16b7279
Author: sychen 
AuthorDate: Sun Aug 27 13:28:49 2023 +0800

[SPARK-44497][WEBUI] Show task partition id in Task table

### What changes were proposed in this pull request?
Show the task partition ID in the task table on the stage page by rendering `partitionId` in the Index column.

### Why are the changes needed?
In [SPARK-37831](https://issues.apache.org/jira/browse/SPARK-37831), the partition ID was added to `TaskInfo`, but it still cannot be seen directly in the UI.
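
To illustrate the distinction (a toy model of the scheduler behavior, not Spark's actual code): a task's index is its position within the stage's task set, while the partition ID is the RDD partition it computes, so the two diverge when a stage retry recomputes only a subset of partitions.

```scala
// Hypothetical sketch: why task index and partition ID can differ on a stage
// retry that only recomputes the missing partitions.
object IndexVsPartitionId {
  final case class Task(index: Int, partitionId: Int)

  // Indices are dense positions in the new task set; partition IDs keep
  // their original values.
  def tasksFor(missingPartitions: Seq[Int]): Seq[Task] =
    missingPartitions.zipWithIndex.map { case (pid, idx) => Task(idx, pid) }

  def main(args: Array[String]): Unit = {
    // A retry that recomputes only partitions 5 and 9:
    tasksFor(Seq(5, 9)).foreach(println) // Task(0,5), Task(1,9)
  }
}
```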

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
local test

Closes #42093 from cxzl25/SPARK-44497.

Authored-by: sychen 
Signed-off-by: Kent Yao 
---
 core/src/main/resources/org/apache/spark/ui/static/stagepage.js   | 7 +--
 .../resources/org/apache/spark/ui/static/stagespage-template.html | 2 +-
 core/src/main/scala/org/apache/spark/ui/jobs/StagePage.scala      | 2 +-
 3 files changed, 3 insertions(+), 8 deletions(-)

diff --git a/core/src/main/resources/org/apache/spark/ui/static/stagepage.js b/core/src/main/resources/org/apache/spark/ui/static/stagepage.js
index a8792593bf2..3c9c38bd092 100644
--- a/core/src/main/resources/org/apache/spark/ui/static/stagepage.js
+++ b/core/src/main/resources/org/apache/spark/ui/static/stagepage.js
@@ -846,12 +846,7 @@ $(document).ready(function () {
 }
   },
   "columns": [
-{
-  data: function (row, type) {
-    return type !== 'display' ? (isNaN(row.index) ? 0 : row.index) : row.index;
-  },
-  name: "Index"
-},
+{data: "partitionId", name: "Index"},
 {data : "taskId", name: "ID"},
 {data : "attempt", name: "Attempt"},
 {data : "status", name: "Status"},
diff --git a/core/src/main/resources/org/apache/spark/ui/static/stagespage-template.html b/core/src/main/resources/org/apache/spark/ui/static/stagespage-template.html
index 699278e0c16..86ec73e91a7 100644
--- a/core/src/main/resources/org/apache/spark/ui/static/stagespage-template.html
+++ b/core/src/main/resources/org/apache/spark/ui/static/stagespage-template.html
@@ -54,7 +54,7 @@ limitations under the License.
 
 
 
-   
+
 Executor ID
 Logs
 Address
diff --git a/core/src/main/scala/org/apache/spark/ui/jobs/StagePage.scala b/core/src/main/scala/org/apache/spark/ui/jobs/StagePage.scala
index d50ccdadff5..7b71c476982 100644
--- a/core/src/main/scala/org/apache/spark/ui/jobs/StagePage.scala
+++ b/core/src/main/scala/org/apache/spark/ui/jobs/StagePage.scala
@@ -469,7 +469,7 @@ private[spark] object ApiHelper {
 
   private[ui] val COLUMN_TO_INDEX = Map(
 HEADER_ID -> null.asInstanceOf[String],
-HEADER_TASK_INDEX -> TaskIndexNames.TASK_INDEX,
+HEADER_TASK_INDEX -> TaskIndexNames.TASK_PARTITION_ID,
 HEADER_ATTEMPT -> TaskIndexNames.ATTEMPT,
 HEADER_STATUS -> TaskIndexNames.STATUS,
 HEADER_LOCALITY -> TaskIndexNames.LOCALITY,





[spark] branch master updated: [SPARK-42944][PYTHON][FOLLOW-UP] Rename tests from foreachBatch to foreach_batch

2023-08-26 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 84a65bd1cf9 [SPARK-42944][PYTHON][FOLLOW-UP] Rename tests from foreachBatch to foreach_batch
84a65bd1cf9 is described below

commit 84a65bd1cf95775c4210c6dac8026551fd9d150f
Author: Hyukjin Kwon 
AuthorDate: Sat Aug 26 21:51:29 2023 -0700

[SPARK-42944][PYTHON][FOLLOW-UP] Rename tests from foreachBatch to foreach_batch

### What changes were proposed in this pull request?

This PR proposes to rename tests from foreachBatch to foreach_batch.

### Why are the changes needed?

Non-API code should follow PEP 8's snake_case naming convention.

### Does this PR introduce _any_ user-facing change?

No, dev-only.

### How was this patch tested?

CI in this PR should test it out.

### Was this patch authored or co-authored using generative AI tooling?
No.

Closes #42675 from HyukjinKwon/pyspark-connect.

Authored-by: Hyukjin Kwon 
Signed-off-by: Dongjoon Hyun 
---
 .../planner/StreamingForeachBatchHelper.scala  |  2 +-
 dev/sparktestsupport/modules.py|  4 ++--
 ...eachBatch_worker.py => foreach_batch_worker.py} |  0
 ...oreachBatch.py => test_parity_foreach_batch.py} | 12 +-
 ...achBatch.py => test_streaming_foreach_batch.py} | 28 +++---
 5 files changed, 23 insertions(+), 23 deletions(-)

diff --git a/connector/connect/server/src/main/scala/org/apache/spark/sql/connect/planner/StreamingForeachBatchHelper.scala b/connector/connect/server/src/main/scala/org/apache/spark/sql/connect/planner/StreamingForeachBatchHelper.scala
index ef7195439f9..c30e08bc39d 100644
--- a/connector/connect/server/src/main/scala/org/apache/spark/sql/connect/planner/StreamingForeachBatchHelper.scala
+++ b/connector/connect/server/src/main/scala/org/apache/spark/sql/connect/planner/StreamingForeachBatchHelper.scala
@@ -108,7 +108,7 @@ object StreamingForeachBatchHelper extends Logging {
   pythonFn,
   connectUrl,
   sessionHolder.sessionId,
-  "pyspark.sql.connect.streaming.worker.foreachBatch_worker")
+  "pyspark.sql.connect.streaming.worker.foreach_batch_worker")
 val (dataOut, dataIn) = runner.init()
 
 val foreachBatchRunnerFn: FnArgsWithId => Unit = (args: FnArgsWithId) => {
diff --git a/dev/sparktestsupport/modules.py b/dev/sparktestsupport/modules.py
index 3c018ac7c83..741b89466be 100644
--- a/dev/sparktestsupport/modules.py
+++ b/dev/sparktestsupport/modules.py
@@ -497,7 +497,7 @@ pyspark_sql = Module(
 "pyspark.sql.tests.test_session",
 "pyspark.sql.tests.streaming.test_streaming",
 "pyspark.sql.tests.streaming.test_streaming_foreach",
-"pyspark.sql.tests.streaming.test_streaming_foreachBatch",
+"pyspark.sql.tests.streaming.test_streaming_foreach_batch",
 "pyspark.sql.tests.streaming.test_streaming_listener",
 "pyspark.sql.tests.test_types",
 "pyspark.sql.tests.test_udf",
@@ -866,7 +866,7 @@ pyspark_connect = Module(
 "pyspark.sql.tests.connect.streaming.test_parity_streaming",
 "pyspark.sql.tests.connect.streaming.test_parity_listener",
 "pyspark.sql.tests.connect.streaming.test_parity_foreach",
-"pyspark.sql.tests.connect.streaming.test_parity_foreachBatch",
+"pyspark.sql.tests.connect.streaming.test_parity_foreach_batch",
 "pyspark.sql.tests.connect.test_parity_pandas_grouped_map_with_state",
 "pyspark.sql.tests.connect.test_parity_pandas_udf_scalar",
 "pyspark.sql.tests.connect.test_parity_pandas_udf_grouped_agg",
diff --git a/python/pyspark/sql/connect/streaming/worker/foreachBatch_worker.py b/python/pyspark/sql/connect/streaming/worker/foreach_batch_worker.py
similarity index 100%
rename from python/pyspark/sql/connect/streaming/worker/foreachBatch_worker.py
rename to python/pyspark/sql/connect/streaming/worker/foreach_batch_worker.py
diff --git a/python/pyspark/sql/tests/connect/streaming/test_parity_foreachBatch.py b/python/pyspark/sql/tests/connect/streaming/test_parity_foreach_batch.py
similarity index 87%
rename from python/pyspark/sql/tests/connect/streaming/test_parity_foreachBatch.py
rename to python/pyspark/sql/tests/connect/streaming/test_parity_foreach_batch.py
index 0718c6a88b0..e4577173687 100644
--- a/python/pyspark/sql/tests/connect/streaming/test_parity_foreachBatch.py
+++ b/python/pyspark/sql/tests/connect/streaming/test_parity_foreach_batch.py
@@ -17,19 +17,19 @@
 
 import unittest
 
-from pyspark.sql.tests.streaming.test_streaming_foreachBatch import StreamingTestsForeachBatchMixin
+from pyspark.sql.tests.streaming.test_streaming_foreach_batch import StreamingTestsForeachBatchMixin
 from pyspark.testing.connectutils import ReusedConnectTestCase
 from 

[spark] branch branch-3.5 updated: [SPARK-44784][CONNECT] Make SBT testing hermetic

2023-08-26 Thread yangjie01
This is an automated email from the ASF dual-hosted git repository.

yangjie01 pushed a commit to branch branch-3.5
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.5 by this push:
 new fa2e53f6105 [SPARK-44784][CONNECT] Make SBT testing hermetic
fa2e53f6105 is described below

commit fa2e53f6105c60effdf210cab4c8d77f13fee6b6
Author: Herman van Hovell 
AuthorDate: Sun Aug 27 11:28:58 2023 +0800

[SPARK-44784][CONNECT] Make SBT testing hermetic

### What changes were proposed in this pull request?
This PR makes a bunch of changes to connect testing for the scala client:
- We no longer start the connect server with the `SPARK_DIST_CLASSPATH` environment variable. This is set by the build system, but its value differs between SBT and Maven; for SBT it also contained the client code.
- We use dependency upload to add the dependencies needed for the tests. Currently this entails the compiled test classes (class files) and the scalatest and scalactic jars.
- The use of classfile sync unearthed an issue with stubbing and the `ExecutorClassLoader`. If they load classes in the same namespace, stubbing generates stubs for classes that the `ExecutorClassLoader` could have loaded. Since this is mostly a testing issue, I decided to move the test code to a different namespace; a toy sketch follows this list. We should definitely fix this later on.
- A bunch of tiny fixes.
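
A minimal sketch of the namespace clash described above: a toy lookup model with hypothetical names, not Spark's actual `StubClassLoader`/`ExecutorClassLoader` code.

```scala
// Toy model: if the stub generator "owns" a namespace prefix, every class under
// that prefix is answered with a generated stub, so the second loader (standing
// in for ExecutorClassLoader) is never consulted for the real class.
object NamespaceClashSketch {
  sealed trait Result
  case object Stub extends Result
  case object RealClass extends Result

  def load(name: String, stubbedPrefix: String, replClasses: Set[String]): Result =
    if (name.startsWith(stubbedPrefix)) Stub         // stubbing answers first
    else if (replClasses.contains(name)) RealClass   // second loader gets a chance
    else throw new ClassNotFoundException(name)

  def main(args: Array[String]): Unit = {
    val helper = "org.apache.spark.sql.SomeTestHelper" // hypothetical test class
    // Test code living in the stubbed namespace is shadowed by a stub:
    println(load(helper, "org.apache.spark.sql", Set(helper)))                     // Stub
    // Moving the test code to another namespace restores the real lookup:
    println(load("test." + helper, "org.apache.spark.sql", Set("test." + helper))) // RealClass
  }
}
```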

### Why are the changes needed?
SBT testing for connect leaked client-side code into the server. This is a problem because tests pass and we sign off on features that do not actually work in a normal environment. Stubbing was an example of this. Maven did not have this problem and was therefore more correct.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
These changes are mostly tests.

### Was this patch authored or co-authored using generative AI tooling?
No. I write my own code, thank you...

Closes #42591 from hvanhovell/investigate-stubbing.

Authored-by: Herman van Hovell 
Signed-off-by: yangjie01 
(cherry picked from commit 9326615592eac14c7cab3dd126b3c21222b7778f)
Signed-off-by: yangjie01 
---
 connector/connect/bin/spark-connect-build  |   6 +-
 connector/connect/bin/spark-connect-scala-client   |   2 +-
 .../bin/spark-connect-scala-client-classpath   |   3 +-
 connector/connect/client/jvm/pom.xml   |   5 -
 .../apache/spark/sql/connect/client/ToStub.scala}  |  11 +-
 .../org/apache/spark/sql/JavaEncoderSuite.java |  12 +-
 .../scala/org/apache/spark/sql/CatalogSuite.scala  |   2 +-
 .../spark/sql/ClientDataFrameStatSuite.scala   |   2 +-
 .../org/apache/spark/sql/ClientDatasetSuite.scala  |   2 +-
 .../org/apache/spark/sql/ClientE2ETestSuite.scala  |   4 +-
 .../org/apache/spark/sql/ColumnTestSuite.scala |   2 +-
 .../spark/sql/DataFrameNaFunctionSuite.scala   |   2 +-
 .../org/apache/spark/sql/FunctionTestSuite.scala   |   2 +-
 .../sql/KeyValueGroupedDatasetE2ETestSuite.scala   |   2 +-
 .../apache/spark/sql/PlanGenerationTestSuite.scala |   3 +-
 .../apache/spark/sql/SQLImplicitsTestSuite.scala   |   2 +-
 .../apache/spark/sql/SparkSessionE2ESuite.scala|   2 +-
 .../org/apache/spark/sql/SparkSessionSuite.scala   |   2 +-
 .../org/apache/spark/sql/StubbingTestSuite.scala}  |  23 ++-
 .../client => }/UDFClassLoadingE2ESuite.scala  |   5 +-
 .../sql/UserDefinedFunctionE2ETestSuite.scala  |  16 +-
 .../spark/sql/UserDefinedFunctionSuite.scala   |   2 +-
 .../spark/sql/application/ReplE2ESuite.scala   |   2 +-
 .../spark/sql/connect/client/ArtifactSuite.scala   |   2 +-
 .../CheckConnectJvmClientCompatibility.scala   |   2 +-
 .../sql/connect/client/ClassFinderSuite.scala  |   2 +-
 .../SparkConnectClientBuilderParseTestSuite.scala  |   2 +-
 .../connect/client/SparkConnectClientSuite.scala   |   2 +-
 .../connect/client/arrow/ArrowEncoderSuite.scala   |   2 +-
 .../connect/client/util/RemoteSparkSession.scala   | 228 -
 .../sql/streaming/ClientStreamingQuerySuite.scala  |   4 +-
 .../FlatMapGroupsWithStateStreamingSuite.scala |   4 +-
 .../streaming/StreamingQueryProgressSuite.scala|   2 +-
 .../client/util => test}/ConnectFunSuite.scala |   2 +-
 .../util => test}/IntegrationTestUtils.scala   |  22 +-
 .../{connect/client/util => test}/QueryTest.scala  |   2 +-
 .../apache/spark/sql/test/RemoteSparkSession.scala | 224 
 .../apache/spark/sql/{ => test}/SQLHelper.scala|   3 +-
 .../spark/sql/connect/client/ArtifactManager.scala |  20 +-
 .../sql/connect/client/GrpcRetryHandler.scala  |   4 +-
 .../sql/connect/client/SparkConnectClient.scala|   2 +-
 .../spark/sql/connect/common/ProtoDataTypes.scala  |   2 +-
 .../sql/connect/common/config/ConnectCommon.scala  |   2 +-
 .../artifact/SparkConnectArtifactManager.scala 

[spark] branch master updated: [SPARK-44784][CONNECT] Make SBT testing hermetic

2023-08-26 Thread yangjie01
This is an automated email from the ASF dual-hosted git repository.

yangjie01 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 9326615592e [SPARK-44784][CONNECT] Make SBT testing hermetic
9326615592e is described below

commit 9326615592eac14c7cab3dd126b3c21222b7778f
Author: Herman van Hovell 
AuthorDate: Sun Aug 27 11:28:58 2023 +0800

[SPARK-44784][CONNECT] Make SBT testing hermetic

### What changes were proposed in this pull request?
This PR makes a bunch of changes to connect testing for the scala client:
- We no longer start the connect server with the `SPARK_DIST_CLASSPATH` environment variable. This is set by the build system, but its value differs between SBT and Maven; for SBT it also contained the client code.
- We use dependency upload to add the dependencies needed for the tests. Currently this entails the compiled test classes (class files) and the scalatest and scalactic jars.
- The use of classfile sync unearthed an issue with stubbing and the `ExecutorClassLoader`. If they load classes in the same namespace, stubbing generates stubs for classes that the `ExecutorClassLoader` could have loaded. Since this is mostly a testing issue, I decided to move the test code to a different namespace. We should definitely fix this later on.
- A bunch of tiny fixes.

### Why are the changes needed?
SBT testing for connect leaked client-side code into the server. This is a problem because tests pass and we sign off on features that do not actually work in a normal environment. Stubbing was an example of this. Maven did not have this problem and was therefore more correct.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
These changes are mostly tests.

### Was this patch authored or co-authored using generative AI tooling?
No. I write my own code, thank you...

Closes #42591 from hvanhovell/investigate-stubbing.

Authored-by: Herman van Hovell 
Signed-off-by: yangjie01 
---
 connector/connect/bin/spark-connect-build  |   6 +-
 connector/connect/bin/spark-connect-scala-client   |   2 +-
 .../bin/spark-connect-scala-client-classpath   |   3 +-
 connector/connect/client/jvm/pom.xml   |   5 -
 .../apache/spark/sql/connect/client/ToStub.scala}  |  11 +-
 .../org/apache/spark/sql/JavaEncoderSuite.java |  12 +-
 .../scala/org/apache/spark/sql/CatalogSuite.scala  |   2 +-
 .../spark/sql/ClientDataFrameStatSuite.scala   |   2 +-
 .../org/apache/spark/sql/ClientDatasetSuite.scala  |   2 +-
 .../org/apache/spark/sql/ClientE2ETestSuite.scala  |   4 +-
 .../org/apache/spark/sql/ColumnTestSuite.scala |   2 +-
 .../spark/sql/DataFrameNaFunctionSuite.scala   |   2 +-
 .../org/apache/spark/sql/FunctionTestSuite.scala   |   2 +-
 .../sql/KeyValueGroupedDatasetE2ETestSuite.scala   |   2 +-
 .../apache/spark/sql/PlanGenerationTestSuite.scala |   3 +-
 .../apache/spark/sql/SQLImplicitsTestSuite.scala   |   2 +-
 .../apache/spark/sql/SparkSessionE2ESuite.scala|   2 +-
 .../org/apache/spark/sql/SparkSessionSuite.scala   |   2 +-
 .../org/apache/spark/sql/StubbingTestSuite.scala}  |  23 ++-
 .../client => }/UDFClassLoadingE2ESuite.scala  |   5 +-
 .../sql/UserDefinedFunctionE2ETestSuite.scala  |  16 +-
 .../spark/sql/UserDefinedFunctionSuite.scala   |   2 +-
 .../spark/sql/application/ReplE2ESuite.scala   |   2 +-
 .../spark/sql/connect/client/ArtifactSuite.scala   |   2 +-
 .../CheckConnectJvmClientCompatibility.scala   |   2 +-
 .../sql/connect/client/ClassFinderSuite.scala  |   2 +-
 .../SparkConnectClientBuilderParseTestSuite.scala  |   2 +-
 .../connect/client/SparkConnectClientSuite.scala   |   2 +-
 .../connect/client/arrow/ArrowEncoderSuite.scala   |   2 +-
 .../connect/client/util/RemoteSparkSession.scala   | 228 -
 .../sql/streaming/ClientStreamingQuerySuite.scala  |   4 +-
 .../FlatMapGroupsWithStateStreamingSuite.scala |   4 +-
 .../streaming/StreamingQueryProgressSuite.scala|   2 +-
 .../client/util => test}/ConnectFunSuite.scala |   2 +-
 .../util => test}/IntegrationTestUtils.scala   |  22 +-
 .../{connect/client/util => test}/QueryTest.scala  |   2 +-
 .../apache/spark/sql/test/RemoteSparkSession.scala | 224 
 .../apache/spark/sql/{ => test}/SQLHelper.scala|   3 +-
 .../spark/sql/connect/client/ArtifactManager.scala |  20 +-
 .../sql/connect/client/GrpcRetryHandler.scala  |   4 +-
 .../sql/connect/client/SparkConnectClient.scala|   2 +-
 .../spark/sql/connect/common/ProtoDataTypes.scala  |   2 +-
 .../sql/connect/common/config/ConnectCommon.scala  |   2 +-
 .../artifact/SparkConnectArtifactManager.scala |  40 +++-
 .../org/apache/spark/util/StubClassLoader.scala|   5 +-
 project/SparkBuild.scala  

[spark] branch master updated: [SPARK-44964][ML][CONNECT][TESTS] Clean up pyspark.ml.connect.functions doctest

2023-08-26 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 23ce9c46fa8 [SPARK-44964][ML][CONNECT][TESTS] Clean up pyspark.ml.connect.functions doctest
23ce9c46fa8 is described below

commit 23ce9c46fa80a2256ebe06932bf2963a611d1a4d
Author: Hyukjin Kwon 
AuthorDate: Sat Aug 26 20:26:46 2023 -0700

[SPARK-44964][ML][CONNECT][TESTS] Clean up pyspark.ml.connect.functions doctest

### What changes were proposed in this pull request?

This PR proposes to clean up the `pyspark.ml.connect.functions` doctests, all of which are currently being skipped.

### Why are the changes needed?

To remove unused test code.

### Does this PR introduce _any_ user-facing change?

No, test-only.

### How was this patch tested?

Manually ran the tests via:

```bash
./python/run-tests --python-executables=python3 --modules=pyspark-ml-connect
```

### Was this patch authored or co-authored using generative AI tooling?

No

Closes #42679 from HyukjinKwon/SPARK-44964.

Authored-by: Hyukjin Kwon 
Signed-off-by: Dongjoon Hyun 
---
 dev/sparktestsupport/modules.py|  2 --
 python/pyspark/ml/connect/__init__.py  |  3 +++
 python/pyspark/ml/connect/functions.py | 43 --
 3 files changed, 3 insertions(+), 45 deletions(-)

diff --git a/dev/sparktestsupport/modules.py b/dev/sparktestsupport/modules.py
index 64ccf600ef0..3c018ac7c83 100644
--- a/dev/sparktestsupport/modules.py
+++ b/dev/sparktestsupport/modules.py
@@ -886,8 +886,6 @@ pyspark_ml_connect = Module(
 "python/pyspark/ml/connect",
 ],
 python_test_goals=[
-# ml doctests
-"pyspark.ml.connect.functions",
 # ml unittests
 "pyspark.ml.tests.connect.test_connect_function",
 "pyspark.ml.tests.connect.test_parity_torch_distributor",
diff --git a/python/pyspark/ml/connect/__init__.py b/python/pyspark/ml/connect/__init__.py
index 2ee152f6a38..fb92b4d81bf 100644
--- a/python/pyspark/ml/connect/__init__.py
+++ b/python/pyspark/ml/connect/__init__.py
@@ -16,6 +16,9 @@
 #
 
 """Spark Connect Python Client - ML module"""
+from pyspark.sql.connect.utils import check_dependencies
+
+check_dependencies(__name__)
 
 from pyspark.ml.connect.base import (
     Estimator,
diff --git a/python/pyspark/ml/connect/functions.py b/python/pyspark/ml/connect/functions.py
index ab7e3ab3c9a..c681bf5926b 100644
--- a/python/pyspark/ml/connect/functions.py
+++ b/python/pyspark/ml/connect/functions.py
@@ -14,12 +14,7 @@
 # See the License for the specific language governing permissions and
 # limitations under the License.
 #
-from pyspark.sql.connect.utils import check_dependencies
-
-check_dependencies(__name__)
-
 from pyspark.ml import functions as PyMLFunctions
-
 from pyspark.sql.connect.column import Column
 from pyspark.sql.connect.functions import _invoke_function, _to_col, lit
 
@@ -36,41 +31,3 @@ def array_to_vector(col: Column) -> Column:
 
 
 array_to_vector.__doc__ = PyMLFunctions.array_to_vector.__doc__
-
-
-def _test() -> None:
-    import sys
-    import doctest
-    from pyspark.sql import SparkSession as PySparkSession
-    import pyspark.ml.connect.functions
-
-    globs = pyspark.ml.connect.functions.__dict__.copy()
-
-    # TODO: split vector_to_array doctest since it includes .mllib vectors
-    del pyspark.ml.connect.functions.vector_to_array.__doc__
-
-    # TODO: spark.createDataFrame should support UDT
-    del pyspark.ml.connect.functions.array_to_vector.__doc__
-
-    globs["spark"] = (
-        PySparkSession.builder.appName("ml.connect.functions tests")
-        .remote("local[4]")
-        .getOrCreate()
-    )
-
-    (failure_count, test_count) = doctest.testmod(
-        pyspark.ml.connect.functions,
-        globs=globs,
-        optionflags=doctest.ELLIPSIS
-        | doctest.NORMALIZE_WHITESPACE
-        | doctest.IGNORE_EXCEPTION_DETAIL,
-    )
-
-    globs["spark"].stop()
-
-    if failure_count:
-        sys.exit(-1)
-
-
-if __name__ == "__main__":
-    _test()





[spark] branch master updated: [SPARK-44975][SQL] Remove BinaryArithmetic useless override resolved

2023-08-26 Thread maxgekk
This is an automated email from the ASF dual-hosted git repository.

maxgekk pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 04339e30dbd [SPARK-44975][SQL] Remove BinaryArithmetic useless override resolved
04339e30dbd is described below

commit 04339e30dbdda2805edbac7e1e3cd8dfb5c3c608
Author: Jia Fan 
AuthorDate: Sat Aug 26 21:11:20 2023 +0300

[SPARK-44975][SQL] Remove BinaryArithmetic useless override resolved

### What changes were proposed in this pull request?
Remove the useless `resolved` override in `BinaryArithmetic`; it is exactly the same as the definition inherited from the abstract class `Expression`.
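
A sketch of why the override is redundant, using simplified stand-in types rather than the actual Catalyst classes:

```scala
// Simplified stand-ins for Catalyst's Expression/BinaryArithmetic; the point is
// only that the subclass override repeated the inherited definition verbatim.
abstract class Expr {
  def children: Seq[Expr]
  def checkInputDataTypes(): Boolean // stands in for TypeCheckResult.isSuccess
  def childrenResolved: Boolean = children.forall(_.resolved)
  lazy val resolved: Boolean = childrenResolved && checkInputDataTypes()
}

abstract class BinaryArith extends Expr {
  // The removed line duplicated the inherited definition:
  //   override lazy val resolved: Boolean = childrenResolved && checkInputDataTypes()
}
```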

### Why are the changes needed?
remove useless logic

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Existing tests.

### Was this patch authored or co-authored using generative AI tooling?

Closes #42689 from Hisoka-X/SPARK-44975_remove_resolved_override.

Authored-by: Jia Fan 
Signed-off-by: Max Gekk 
---
 .../scala/org/apache/spark/sql/catalyst/expressions/arithmetic.scala| 2 --
 1 file changed, 2 deletions(-)

diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/arithmetic.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/arithmetic.scala
index 31d4d71cd40..2d9bccc0854 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/arithmetic.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/arithmetic.scala
@@ -264,8 +264,6 @@ abstract class BinaryArithmetic extends BinaryOperator
 
   final override val nodePatterns: Seq[TreePattern] = Seq(BINARY_ARITHMETIC)
 
-  override lazy val resolved: Boolean = childrenResolved && checkInputDataTypes().isSuccess
-
   override def initQueryContext(): Option[SQLQueryContext] = {
     if (failOnError) {
       Some(origin.context)





[spark] branch branch-3.5 updated: [SPARK-44968][BUILD] Downgrade ivy from 2.5.2 to 2.5.1

2023-08-26 Thread yangjie01
This is an automated email from the ASF dual-hosted git repository.

yangjie01 pushed a commit to branch branch-3.5
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.5 by this push:
 new 2a66771b45a [SPARK-44968][BUILD] Downgrade ivy from 2.5.2 to 2.5.1
2a66771b45a is described below

commit 2a66771b45a7729143ddc45da5dcf095820fd80d
Author: yangjie01 
AuthorDate: Sat Aug 26 17:31:09 2023 +0800

[SPARK-44968][BUILD] Downgrade ivy from 2.5.2 to 2.5.1

### What changes were proposed in this pull request?
After upgrading Ivy from 2.5.1 to 2.5.2 in SPARK-44914, the daily tests for Java 11 and Java 17 began to hit ABORTED runs in the `HiveExternalCatalogVersionsSuite` test.

Java 11

- https://github.com/apache/spark/actions/runs/5953716283/job/16148657660
- https://github.com/apache/spark/actions/runs/5966131923/job/16185159550

Java 17

- https://github.com/apache/spark/actions/runs/5956925790/job/16158714165
- https://github.com/apache/spark/actions/runs/5969348559/job/16195073478
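
For background, a toy model of how an Ivy-style retrieve pattern maps artifacts to destination files; the pattern strings here are illustrative, not Spark's actual configuration. A pattern with no token distinguishing a module's artifacts sends them all to one path, which is what the error below complains about:

```scala
// Sketch: expand an Ivy-style retrieve pattern for two artifacts of one module.
object RetrievePatternSketch {
  final case class Artifact(org: String, name: String, rev: String,
                            classifier: Option[String], ext: String)

  def destPath(pattern: String, a: Artifact): String =
    pattern
      .replace("(-[classifier])", a.classifier.map("-" + _).getOrElse(""))
      .replace("[organisation]", a.org)
      .replace("[artifact]", a.name)
      .replace("[revision]", a.rev)
      .replace("[ext]", a.ext)

  def main(args: Array[String]): Unit = {
    val jar   = Artifact("log4j", "log4j", "1.2.17", None, "jar")
    val other = Artifact("log4j", "log4j", "1.2.17", Some("sources"), "jar")
    val ambiguous = "[organisation]_[artifact]-[revision].[ext]"             // no classifier token
    val distinct  = "[organisation]_[artifact]-[revision](-[classifier]).[ext]"
    // Under the ambiguous pattern both artifacts collapse to one destination:
    println(destPath(ambiguous, jar) == destPath(ambiguous, other)) // true -> "same file!" error
    println(destPath(distinct, jar) == destPath(distinct, other))   // false
  }
}
```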

```
2023-08-23T23:00:49.6547573Z [info]   2023-08-23 16:00:48.209 - stdout> : java.lang.RuntimeException: problem during retrieve of org.apache.spark#spark-submit-parent-4c061f04-b951-4d06-8909-cde5452988d9: java.lang.RuntimeException: Multiple artifacts of the module log4j#log4j;1.2.17 are retrieved to the same file! Update the retrieve pattern to fix this error.
2023-08-23T23:00:49.6548745Z [info]   2023-08-23 16:00:48.209 - stdout> at org.apache.ivy.core.retrieve.RetrieveEngine.retrieve(RetrieveEngine.java:238)
2023-08-23T23:00:49.6549572Z [info]   2023-08-23 16:00:48.209 - stdout> at org.apache.ivy.core.retrieve.RetrieveEngine.retrieve(RetrieveEngine.java:89)
2023-08-23T23:00:49.6550334Z [info]   2023-08-23 16:00:48.209 - stdout> at org.apache.ivy.Ivy.retrieve(Ivy.java:551)
2023-08-23T23:00:49.6551079Z [info]   2023-08-23 16:00:48.209 - stdout> at org.apache.spark.deploy.SparkSubmitUtils$.resolveMavenCoordinates(SparkSubmit.scala:1464)
2023-08-23T23:00:49.6552024Z [info]   2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.hive.client.IsolatedClientLoader$.$anonfun$downloadVersion$2(IsolatedClientLoader.scala:138)
2023-08-23T23:00:49.6552884Z [info]   2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.catalyst.util.package$.quietly(package.scala:42)
2023-08-23T23:00:49.6553755Z [info]   2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.hive.client.IsolatedClientLoader$.downloadVersion(IsolatedClientLoader.scala:138)
2023-08-23T23:00:49.6554705Z [info]   2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.hive.client.IsolatedClientLoader$.liftedTree1$1(IsolatedClientLoader.scala:65)
2023-08-23T23:00:49.6555637Z [info]   2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.hive.client.IsolatedClientLoader$.forVersion(IsolatedClientLoader.scala:64)
2023-08-23T23:00:49.6556554Z [info]   2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.hive.HiveUtils$.newClientForMetadata(HiveUtils.scala:443)
2023-08-23T23:00:49.6557340Z [info]   2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.hive.HiveUtils$.newClientForMetadata(HiveUtils.scala:356)
2023-08-23T23:00:49.6558187Z [info]   2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.hive.HiveExternalCatalog.client$lzycompute(HiveExternalCatalog.scala:71)
2023-08-23T23:00:49.6559061Z [info]   2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.hive.HiveExternalCatalog.client(HiveExternalCatalog.scala:70)
2023-08-23T23:00:49.6559962Z [info]   2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.hive.HiveExternalCatalog.$anonfun$databaseExists$1(HiveExternalCatalog.scala:224)
2023-08-23T23:00:49.6560766Z [info]   2023-08-23 16:00:48.209 - stdout> at scala.runtime.java8.JFunction0$mcZ$sp.apply(JFunction0$mcZ$sp.java:23)
2023-08-23T23:00:49.6561584Z [info]   2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:102)
2023-08-23T23:00:49.6562510Z [info]   2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.hive.HiveExternalCatalog.databaseExists(HiveExternalCatalog.scala:224)
2023-08-23T23:00:49.6563435Z [info]   2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.internal.SharedState.externalCatalog$lzycompute(SharedState.scala:150)
2023-08-23T23:00:49.6564323Z [info]   2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.internal.SharedState.externalCatalog(SharedState.scala:140)
2023-08-23T23:00:49.6565340Z [info]   2023-08-23 16:00:48.209 - stdout> at
```

[spark] branch master updated: [SPARK-44968][BUILD] Downgrade ivy from 2.5.2 to 2.5.1

2023-08-26 Thread yangjie01
This is an automated email from the ASF dual-hosted git repository.

yangjie01 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 4f8a1991e79 [SPARK-44968][BUILD] Downgrade ivy from 2.5.2 to 2.5.1
4f8a1991e79 is described below

commit 4f8a1991e793bba2a6620760b6ee2cdc8f3ff21d
Author: yangjie01 
AuthorDate: Sat Aug 26 17:31:09 2023 +0800

[SPARK-44968][BUILD] Downgrade ivy from 2.5.2 to 2.5.1

### What changes were proposed in this pull request?
After upgrading Ivy from 2.5.1 to 2.5.2 in SPARK-44914, the daily tests for Java 11 and Java 17 began to hit ABORTED runs in the `HiveExternalCatalogVersionsSuite` test.

Java 11

- https://github.com/apache/spark/actions/runs/5953716283/job/16148657660
- https://github.com/apache/spark/actions/runs/5966131923/job/16185159550

Java 17

- https://github.com/apache/spark/actions/runs/5956925790/job/16158714165
- https://github.com/apache/spark/actions/runs/5969348559/job/16195073478

```
2023-08-23T23:00:49.6547573Z [info]   2023-08-23 16:00:48.209 - stdout> : java.lang.RuntimeException: problem during retrieve of org.apache.spark#spark-submit-parent-4c061f04-b951-4d06-8909-cde5452988d9: java.lang.RuntimeException: Multiple artifacts of the module log4j#log4j;1.2.17 are retrieved to the same file! Update the retrieve pattern to fix this error.
2023-08-23T23:00:49.6548745Z [info]   2023-08-23 16:00:48.209 - stdout> at org.apache.ivy.core.retrieve.RetrieveEngine.retrieve(RetrieveEngine.java:238)
2023-08-23T23:00:49.6549572Z [info]   2023-08-23 16:00:48.209 - stdout> at org.apache.ivy.core.retrieve.RetrieveEngine.retrieve(RetrieveEngine.java:89)
2023-08-23T23:00:49.6550334Z [info]   2023-08-23 16:00:48.209 - stdout> at org.apache.ivy.Ivy.retrieve(Ivy.java:551)
2023-08-23T23:00:49.6551079Z [info]   2023-08-23 16:00:48.209 - stdout> at org.apache.spark.deploy.SparkSubmitUtils$.resolveMavenCoordinates(SparkSubmit.scala:1464)
2023-08-23T23:00:49.6552024Z [info]   2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.hive.client.IsolatedClientLoader$.$anonfun$downloadVersion$2(IsolatedClientLoader.scala:138)
2023-08-23T23:00:49.6552884Z [info]   2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.catalyst.util.package$.quietly(package.scala:42)
2023-08-23T23:00:49.6553755Z [info]   2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.hive.client.IsolatedClientLoader$.downloadVersion(IsolatedClientLoader.scala:138)
2023-08-23T23:00:49.6554705Z [info]   2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.hive.client.IsolatedClientLoader$.liftedTree1$1(IsolatedClientLoader.scala:65)
2023-08-23T23:00:49.6555637Z [info]   2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.hive.client.IsolatedClientLoader$.forVersion(IsolatedClientLoader.scala:64)
2023-08-23T23:00:49.6556554Z [info]   2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.hive.HiveUtils$.newClientForMetadata(HiveUtils.scala:443)
2023-08-23T23:00:49.6557340Z [info]   2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.hive.HiveUtils$.newClientForMetadata(HiveUtils.scala:356)
2023-08-23T23:00:49.6558187Z [info]   2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.hive.HiveExternalCatalog.client$lzycompute(HiveExternalCatalog.scala:71)
2023-08-23T23:00:49.6559061Z [info]   2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.hive.HiveExternalCatalog.client(HiveExternalCatalog.scala:70)
2023-08-23T23:00:49.6559962Z [info]   2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.hive.HiveExternalCatalog.$anonfun$databaseExists$1(HiveExternalCatalog.scala:224)
2023-08-23T23:00:49.6560766Z [info]   2023-08-23 16:00:48.209 - stdout> at scala.runtime.java8.JFunction0$mcZ$sp.apply(JFunction0$mcZ$sp.java:23)
2023-08-23T23:00:49.6561584Z [info]   2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:102)
2023-08-23T23:00:49.6562510Z [info]   2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.hive.HiveExternalCatalog.databaseExists(HiveExternalCatalog.scala:224)
2023-08-23T23:00:49.6563435Z [info]   2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.internal.SharedState.externalCatalog$lzycompute(SharedState.scala:150)
2023-08-23T23:00:49.6564323Z [info]   2023-08-23 16:00:48.209 - stdout> at org.apache.spark.sql.internal.SharedState.externalCatalog(SharedState.scala:140)
2023-08-23T23:00:49.6565340Z [info]   2023-08-23 16:00:48.209 - stdout> at