[spark] branch master updated: [SPARK-44782][INFRA] Adjust PR template to Generative Tooling Guidance recommendations

2023-08-18 Thread srowen
This is an automated email from the ASF dual-hosted git repository.

srowen pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 2e2f5e9c28b [SPARK-44782][INFRA] Adjust PR template to Generative 
Tooling Guidance recommendations
2e2f5e9c28b is described below

commit 2e2f5e9c28b4e88171949006937c094304581738
Author: zero323 
AuthorDate: Fri Aug 18 21:13:36 2023 -0500

[SPARK-44782][INFRA] Adjust PR template to Generative Tooling Guidance 
recommendations

### What changes were proposed in this pull request?

This PR adds a _Was this patch authored or co-authored using generative AI 
tooling?_ section to the PR template.

### Why are the changes needed?

To reflect the recommendations of the [ASF Generative Tooling 
Guidance](https://www.apache.org/legal/generative-tooling.html).

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Manual inspection.

Closes #42469 from zero323/SPARK-44782.

Authored-by: zero323 
Signed-off-by: Sean Owen 
---
 .github/PULL_REQUEST_TEMPLATE | 9 +
 1 file changed, 9 insertions(+)

diff --git a/.github/PULL_REQUEST_TEMPLATE b/.github/PULL_REQUEST_TEMPLATE
index 1548696a3ca..a80bf21312a 100644
--- a/.github/PULL_REQUEST_TEMPLATE
+++ b/.github/PULL_REQUEST_TEMPLATE
@@ -47,3 +47,12 @@ If it was tested in a way different from regular unit tests, 
please clarify how
 If tests were not added, please describe why they were not added and/or why it 
was difficult to add.
 If benchmark tests were added, please run the benchmarks in GitHub Actions for 
the consistent environment, and the instructions could accord to: 
https://spark.apache.org/developer-tools.html#github-workflow-benchmarks.
 -->
+
+
+### Was this patch authored or co-authored using generative AI tooling?
+





svn commit: r63506 - /dev/spark/v3.5.0-rc2-bin/

2023-08-18 Thread liyuanjian
Author: liyuanjian
Date: Fri Aug 18 21:12:20 2023
New Revision: 63506

Log:
Apache Spark v3.5.0-rc2

Added:
dev/spark/v3.5.0-rc2-bin/
dev/spark/v3.5.0-rc2-bin/SparkR_3.5.0.tar.gz   (with props)
dev/spark/v3.5.0-rc2-bin/SparkR_3.5.0.tar.gz.asc
dev/spark/v3.5.0-rc2-bin/SparkR_3.5.0.tar.gz.sha512
dev/spark/v3.5.0-rc2-bin/pyspark-3.5.0.tar.gz   (with props)
dev/spark/v3.5.0-rc2-bin/pyspark-3.5.0.tar.gz.asc
dev/spark/v3.5.0-rc2-bin/pyspark-3.5.0.tar.gz.sha512
dev/spark/v3.5.0-rc2-bin/spark-3.5.0-bin-hadoop3-scala2.13.tgz   (with 
props)
dev/spark/v3.5.0-rc2-bin/spark-3.5.0-bin-hadoop3-scala2.13.tgz.asc
dev/spark/v3.5.0-rc2-bin/spark-3.5.0-bin-hadoop3-scala2.13.tgz.sha512
dev/spark/v3.5.0-rc2-bin/spark-3.5.0-bin-hadoop3.tgz   (with props)
dev/spark/v3.5.0-rc2-bin/spark-3.5.0-bin-hadoop3.tgz.asc
dev/spark/v3.5.0-rc2-bin/spark-3.5.0-bin-hadoop3.tgz.sha512
dev/spark/v3.5.0-rc2-bin/spark-3.5.0-bin-without-hadoop.tgz   (with props)
dev/spark/v3.5.0-rc2-bin/spark-3.5.0-bin-without-hadoop.tgz.asc
dev/spark/v3.5.0-rc2-bin/spark-3.5.0-bin-without-hadoop.tgz.sha512
dev/spark/v3.5.0-rc2-bin/spark-3.5.0.tgz   (with props)
dev/spark/v3.5.0-rc2-bin/spark-3.5.0.tgz.asc
dev/spark/v3.5.0-rc2-bin/spark-3.5.0.tgz.sha512
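
For readers fetching these artifacts, a minimal sketch of checking a download against its published `.sha512` companion (file names match the listing above; assumes both files sit in the current directory):

```python
import hashlib

# Compute the SHA-512 digest of the downloaded archive.
with open("pyspark-3.5.0.tar.gz", "rb") as f:
    actual = hashlib.sha512(f.read()).hexdigest()

# Each .sha512 file above holds the hex digest followed by the file name.
with open("pyspark-3.5.0.tar.gz.sha512") as f:
    expected = f.read().split()[0]

print("OK" if actual == expected else "checksum mismatch")
```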

Added: dev/spark/v3.5.0-rc2-bin/SparkR_3.5.0.tar.gz
==
Binary file - no diff available.

Propchange: dev/spark/v3.5.0-rc2-bin/SparkR_3.5.0.tar.gz
--
svn:mime-type = application/octet-stream

Added: dev/spark/v3.5.0-rc2-bin/SparkR_3.5.0.tar.gz.asc
==
--- dev/spark/v3.5.0-rc2-bin/SparkR_3.5.0.tar.gz.asc (added)
+++ dev/spark/v3.5.0-rc2-bin/SparkR_3.5.0.tar.gz.asc Fri Aug 18 21:12:20 2023
@@ -0,0 +1,17 @@
+-BEGIN PGP SIGNATURE-
+
+iQJKBAABCgA0FiEE/Drjp+qhusmHcIQOfhq8xTqqIhYFAmTf3lIWHGxpeXVhbmpp
+YW5AYXBhY2hlLm9yZwAKCRB+GrzFOqoiFptrD/4+D/3YCZuXw2tb4GUq4CpCR3uO
+uJogYTjVlDxxuuCD5J8g9CKXTNQEghGwddHJxkGW42B6R8KVoOylxyWDj7ZlUkbK
+gae1c3srPFbNqlCS9wuI9Bxtb1CFr9qvIaNSYwTRWkfBWJvz6nmkLpqccz6QIBFo
+yGcdfNg2+ZaYe3uG6DsSqWWTsUFXqsnaG02QTkxRBE6fxjVZFb7W7PWFMb98d94U
+TnoNpnZSttA/g7cQUJkyuE9e7vEKzf7+Q1DWVyByZB+EjYWrrDkElF1NWQtOUAxw
+NxkVud22JmQMmrbQfD25lPD/rD7DxkVpKHkz6v3Mif9ZUkAvk2BBK9Zud9GnUdrh
+5AnZ3A/YxO68Beqp0mQnFYRXMzLPRB3VuyI5yifUafZUsihly4k7i5B6iSNLiFMU
+Ub1xXVMLPEmrn8ZGAsVc6HcxSmI+GJhf92weOKEs7HgOULcy7IRaUGSefNXXmILB
+g2LC1zS3uwO5i19hyG+j8HoVLuy61yaeeRzEFCfs/9fpIBS9R22PJxFkVuU/2OwS
+Px7vYWxGTUR0xXCHB26Ep/n6413lIfoiKm7bDRFkcHQbpuZHV5GiuSC+0L2WKBGd
+yvRTXknJb0nmfONzjmqXzd6ClkHZbAPlgZg+nY0oaAHA2JXdVhjfcYRIiiMcUU7i
+JLzXAMuXQJrTntoD2w==
+=enEM
+-END PGP SIGNATURE-

Added: dev/spark/v3.5.0-rc2-bin/SparkR_3.5.0.tar.gz.sha512
==
--- dev/spark/v3.5.0-rc2-bin/SparkR_3.5.0.tar.gz.sha512 (added)
+++ dev/spark/v3.5.0-rc2-bin/SparkR_3.5.0.tar.gz.sha512 Fri Aug 18 21:12:20 2023
@@ -0,0 +1 @@
+92a36033042014dea66914f4d0be9a1d243a9ec736fb4efb9fdcbd6ad7c1c54a54183455936f250839826c9cb24cd79084afb033675c3073c2d318fb77c8f64d
  SparkR_3.5.0.tar.gz

Added: dev/spark/v3.5.0-rc2-bin/pyspark-3.5.0.tar.gz
==
Binary file - no diff available.

Propchange: dev/spark/v3.5.0-rc2-bin/pyspark-3.5.0.tar.gz
--
svn:mime-type = application/octet-stream

Added: dev/spark/v3.5.0-rc2-bin/pyspark-3.5.0.tar.gz.asc
==
--- dev/spark/v3.5.0-rc2-bin/pyspark-3.5.0.tar.gz.asc (added)
+++ dev/spark/v3.5.0-rc2-bin/pyspark-3.5.0.tar.gz.asc Fri Aug 18 21:12:20 2023
@@ -0,0 +1,17 @@
+-BEGIN PGP SIGNATURE-
+
+iQJKBAABCgA0FiEE/Drjp+qhusmHcIQOfhq8xTqqIhYFAmTf3lMWHGxpeXVhbmpp
+YW5AYXBhY2hlLm9yZwAKCRB+GrzFOqoiFrzQEACqQmjnoQ+LmZfmbsJxhs1prxX3
+b4FUWk5HVnD0BKebELThLx9Fsu2V8CU6WmOz0RuPcObS0mJrmb7ekh/vRAnzstJq
+W8hYyfGXUMgDWvJcliNgILaN0i+HW5raN3xsMqCe+YVpvX4pEeni0FSUXLDM7Brh
+J23WRoJvT60buDtyjuUTCT/cdSmzuKg+MVSOp2u+gA7qwcXvejNFDFJY9eFndrVa
+gi6BzM4qYWpY4sJuAADy/zA6KBawvMgML4dk0gefswKR1MBz9WWVjy5nvx5ki/Od
+NUV2vYKkRjucqpMrQJ4fY/knkN89bTpua0M4X2Bbbfn0uoj1IuXYKvCt9z+AY4To
+yRyAxO8pMfT6b7tq5qogh0Jc1y3EQsyBjhyuE65NF+RrAGFvASWNikEdXQNjJVlc
+fwYL016esso6KjHCAb7Vwnoca6L4UinB1DfKuWAKzwphQOUK7HQHyOpPBmDn4Ceo
+X4AlO0I3uIxuiGvCKdlhkEJyVleSYzhO2ZLb837A9S4U388VTAgX4N5eXqQPLoOB
+4RHZJ4SFhpl/bks44lFGLD91izxGUs63RwV5f8xABW9qPwBu8eJgYCvB9NjADpGZ
+3zNua3pZ0YK3Mlb0PZGbzWXgwECUQthtZC/xpCNQPXvX82bOQsr+T9zGOC3YGWwg
+V4HhhBHsd5xm/Ng5rg==
+=NtnC
+-END PGP SIGNATURE-

Added: dev/spark/v3.5.0-rc2-bin/pyspark-3.5.0.tar.gz.sha512
==
--- dev/spark/v3.5.0-rc2-bin/pyspark-3.5.0.tar.gz.sha512 (added)
+++ 

[spark] branch branch-3.5 updated: [SPARK-44873] Support alter view with nested columns in Hive client

2023-08-18 Thread gengliang
This is an automated email from the ASF dual-hosted git repository.

gengliang pushed a commit to branch branch-3.5
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.5 by this push:
 new cb8de35b749 [SPARK-44873] Support alter view with nested columns in 
Hive client
cb8de35b749 is described below

commit cb8de35b749e8e2ff3d0667be3e933c682c528ee
Author: kylerong-db 
AuthorDate: Fri Aug 18 11:07:21 2023 -0700

[SPARK-44873] Support alter view with nested columns in Hive client

### What changes were proposed in this pull request?
Previously, if a view's schema contained a nested struct, alterTable using 
the Hive client would fail. This change supports views with nested structs. The 
mechanism is to store an empty schema when we call the Hive client, since we 
already store the actual schema in the table properties. This fix is similar to 
https://github.com/apache/spark/pull/37364
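
As a user-facing illustration of the failure mode this addresses, a minimal PySpark sketch, assuming a Hive-enabled session (the view definition mirrors the test added below):

```python
from pyspark.sql import SparkSession

# Assumes a session backed by a Hive metastore.
spark = SparkSession.builder.enableHiveSupport().getOrCreate()

# A view whose schema contains a nested struct with Hive-incompatible
# field names such as `$col`.
spark.sql(
    "CREATE OR REPLACE VIEW t AS SELECT "
    "struct(id AS `$col2`, struct(id AS `$col`) AS s1) AS s2 FROM RANGE(5)"
)

# Before this fix, altering such a view through the Hive client failed
# (e.g. with CANNOT_RECOGNIZE_HIVE_TYPE); with it, the call succeeds.
spark.sql("ALTER VIEW t SET TBLPROPERTIES ('x' = 'y')")
```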

### Why are the changes needed?
This supports using views with nested structs in the Hive metastore.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Added unit test.

Closes #42532 from kylerong-db/hive_view.

Authored-by: kylerong-db 
Signed-off-by: Gengliang Wang 
(cherry picked from commit c7d63eac48d3a81099456360f1c30e6049824749)
Signed-off-by: Gengliang Wang 
---
 .../scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala | 11 ++-
 .../org/apache/spark/sql/hive/HiveParquetSourceSuite.scala| 10 ++
 .../org/apache/spark/sql/hive/execution/HiveDDLSuite.scala|  8 
 3 files changed, 20 insertions(+), 9 deletions(-)

diff --git 
a/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala 
b/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala
index b1eadea42e0..67b780f13c4 100644
--- 
a/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala
+++ 
b/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala
@@ -595,7 +595,16 @@ private[spark] class HiveExternalCatalog(conf: SparkConf, 
hadoopConf: Configurat
 
 if (tableDefinition.tableType == VIEW) {
   val newTableProps = tableDefinition.properties ++ 
tableMetaToTableProps(tableDefinition).toMap
-  client.alterTable(tableDefinition.copy(properties = newTableProps))
+  val newTable = tableDefinition.copy(properties = newTableProps)
+  try {
+client.alterTable(newTable)
+  } catch {
+case NonFatal(e) =>
+  // If for some reason we fail to store the schema we store it as 
empty there
+  // since we already store the real schema in the table properties. 
This try-catch
+  // should only be necessary for Spark views which are incompatible 
with Hive
+  client.alterTable(newTable.copy(schema = EMPTY_DATA_SCHEMA))
+  }
 } else {
   val oldTableDef = getRawTable(db, tableDefinition.identifier.table)
 
diff --git 
a/sql/hive/src/test/scala/org/apache/spark/sql/hive/HiveParquetSourceSuite.scala
 
b/sql/hive/src/test/scala/org/apache/spark/sql/hive/HiveParquetSourceSuite.scala
index 7c67f34560e..45668fc683d 100644
--- 
a/sql/hive/src/test/scala/org/apache/spark/sql/hive/HiveParquetSourceSuite.scala
+++ 
b/sql/hive/src/test/scala/org/apache/spark/sql/hive/HiveParquetSourceSuite.scala
@@ -385,4 +385,14 @@ class HiveParquetSourceSuite extends 
ParquetPartitioningTest with ParquetTest {
   checkAnswer(spark.table("t"), Row(Row("a", 1)))
 }
   }
+
+  test("Alter view with nested struct") {
+withView("t", "t2") {
+  sql("CREATE OR REPLACE VIEW t AS SELECT " +
+"struct(id AS `$col2`, struct(id AS `$col`) AS s1) AS s2 FROM 
RANGE(5)")
+  sql("ALTER VIEW t SET TBLPROPERTIES ('x' = 'y')")
+  sql("ALTER VIEW t RENAME TO t2")
+  checkAnswer(sql("show TBLPROPERTIES t2 (x)"), Row("x", "y"))
+}
+  }
 }
diff --git 
a/sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveDDLSuite.scala
 
b/sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveDDLSuite.scala
index 006caa02d55..2c5e2956f5f 100644
--- 
a/sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveDDLSuite.scala
+++ 
b/sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveDDLSuite.scala
@@ -175,14 +175,6 @@ class HiveCatalogedDDLSuite extends DDLSuite with 
TestHiveSingleton with BeforeA
 withView("v") {
   spark.sql("CREATE VIEW v AS SELECT STRUCT('a' AS `a`, 1 AS b) q")
   checkAnswer(sql("SELECT q.`a`, q.b FROM v"), Row("a", 1) :: Nil)
-
-  checkError(
-exception = intercept[SparkException] {
-  spark.sql("ALTER VIEW v AS SELECT STRUCT('a' AS `$a`, 1 AS b) q")
-},
-errorClass = "CANNOT_RECOGNIZE_HIVE_TYPE",
-parameters = Map("fieldType" -> "\"STRUCT<$A:STRING,B:INT>\"", 
"fieldName" -> "`q`")
-  )
 }
   }
 



[spark] branch master updated: [SPARK-44873] Support alter view with nested columns in Hive client

2023-08-18 Thread gengliang
This is an automated email from the ASF dual-hosted git repository.

gengliang pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new c7d63eac48d [SPARK-44873] Support alter view with nested columns in 
Hive client
c7d63eac48d is described below

commit c7d63eac48d3a81099456360f1c30e6049824749
Author: kylerong-db 
AuthorDate: Fri Aug 18 11:07:21 2023 -0700

[SPARK-44873] Support alter view with nested columns in Hive client

### What changes were proposed in this pull request?
Previously, if a view's schema contained a nested struct, alterTable using 
the Hive client would fail. This change supports views with nested structs. The 
mechanism is to store an empty schema when we call the Hive client, since we 
already store the actual schema in the table properties. This fix is similar to 
https://github.com/apache/spark/pull/37364

### Why are the changes needed?
This supports using views with nested structs in the Hive metastore.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Added unit test.

Closes #42532 from kylerong-db/hive_view.

Authored-by: kylerong-db 
Signed-off-by: Gengliang Wang 
---
 .../scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala | 11 ++-
 .../org/apache/spark/sql/hive/HiveParquetSourceSuite.scala| 10 ++
 .../org/apache/spark/sql/hive/execution/HiveDDLSuite.scala|  8 
 3 files changed, 20 insertions(+), 9 deletions(-)

diff --git 
a/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala 
b/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala
index b1eadea42e0..67b780f13c4 100644
--- 
a/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala
+++ 
b/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala
@@ -595,7 +595,16 @@ private[spark] class HiveExternalCatalog(conf: SparkConf, 
hadoopConf: Configurat
 
 if (tableDefinition.tableType == VIEW) {
   val newTableProps = tableDefinition.properties ++ 
tableMetaToTableProps(tableDefinition).toMap
-  client.alterTable(tableDefinition.copy(properties = newTableProps))
+  val newTable = tableDefinition.copy(properties = newTableProps)
+  try {
+client.alterTable(newTable)
+  } catch {
+case NonFatal(e) =>
+  // If for some reason we fail to store the schema we store it as 
empty there
+  // since we already store the real schema in the table properties. 
This try-catch
+  // should only be necessary for Spark views which are incompatible 
with Hive
+  client.alterTable(newTable.copy(schema = EMPTY_DATA_SCHEMA))
+  }
 } else {
   val oldTableDef = getRawTable(db, tableDefinition.identifier.table)
 
diff --git 
a/sql/hive/src/test/scala/org/apache/spark/sql/hive/HiveParquetSourceSuite.scala
 
b/sql/hive/src/test/scala/org/apache/spark/sql/hive/HiveParquetSourceSuite.scala
index 7c67f34560e..45668fc683d 100644
--- 
a/sql/hive/src/test/scala/org/apache/spark/sql/hive/HiveParquetSourceSuite.scala
+++ 
b/sql/hive/src/test/scala/org/apache/spark/sql/hive/HiveParquetSourceSuite.scala
@@ -385,4 +385,14 @@ class HiveParquetSourceSuite extends 
ParquetPartitioningTest with ParquetTest {
   checkAnswer(spark.table("t"), Row(Row("a", 1)))
 }
   }
+
+  test("Alter view with nested struct") {
+withView("t", "t2") {
+  sql("CREATE OR REPLACE VIEW t AS SELECT " +
+"struct(id AS `$col2`, struct(id AS `$col`) AS s1) AS s2 FROM 
RANGE(5)")
+  sql("ALTER VIEW t SET TBLPROPERTIES ('x' = 'y')")
+  sql("ALTER VIEW t RENAME TO t2")
+  checkAnswer(sql("show TBLPROPERTIES t2 (x)"), Row("x", "y"))
+}
+  }
 }
diff --git 
a/sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveDDLSuite.scala
 
b/sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveDDLSuite.scala
index 1c46b558708..201ba5ea6a1 100644
--- 
a/sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveDDLSuite.scala
+++ 
b/sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveDDLSuite.scala
@@ -175,14 +175,6 @@ class HiveCatalogedDDLSuite extends DDLSuite with 
TestHiveSingleton with BeforeA
 withView("v") {
   spark.sql("CREATE VIEW v AS SELECT STRUCT('a' AS `a`, 1 AS b) q")
   checkAnswer(sql("SELECT q.`a`, q.b FROM v"), Row("a", 1) :: Nil)
-
-  checkError(
-exception = intercept[SparkException] {
-  spark.sql("ALTER VIEW v AS SELECT STRUCT('a' AS `$a`, 1 AS b) q")
-},
-errorClass = "CANNOT_RECOGNIZE_HIVE_TYPE",
-parameters = Map("fieldType" -> "\"STRUCT<$A:STRING,B:INT>\"", 
"fieldName" -> "`q`")
-  )
 }
   }
 



[spark] branch branch-3.3 updated: [SPARK-44813][INFRA] The Jira Python misses our assignee when it searches users again

2023-08-18 Thread yao
This is an automated email from the ASF dual-hosted git repository.

yao pushed a commit to branch branch-3.3
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.3 by this push:
 new 69ca57c7d1c [SPARK-44813][INFRA] The Jira Python misses our assignee 
when it searches users again
69ca57c7d1c is described below

commit 69ca57c7d1c40a7b25a05df863ced8afff3a2c8f
Author: Kent Yao 
AuthorDate: Sat Aug 19 02:03:29 2023 +0800

[SPARK-44813][INFRA] The Jira Python misses our assignee when it searches 
users again

### What changes were proposed in this pull request?

This PR creates an alternative to the assign_issue function in 
jira.client.JIRA.

The original one has an issue: it searches users again and chooses the 
assignee from only 20 candidates. If there is no match, it blindly picks the 
first candidate.

For example,

```python
>>> assignee = asf_jira.user("yao")
>>> "SPARK-44801"
'SPARK-44801'
>>> asf_jira.assign_issue(issue.key, assignee.name)
Traceback (most recent call last):
  File "", line 1, in 
NameError: name 'issue' is not defined
>>> asf_jira.assign_issue("SPARK-44801", assignee.name)
Traceback (most recent call last):
  File "", line 1, in 
  File 
"/Users/hzyaoqin/python/lib/python3.11/site-packages/jira/client.py", line 123, 
in wrapper
result = func(*arg_list, **kwargs)
 ^
  File 
"/Users/hzyaoqin/python/lib/python3.11/site-packages/jira/client.py", line 
1891, in assign_issue
self._session.put(url, data=json.dumps(payload))
  File 
"/Users/hzyaoqin/python/lib/python3.11/site-packages/requests/sessions.py", 
line 649, in put
return self.request("PUT", url, data=data, **kwargs)
   ^
  File 
"/Users/hzyaoqin/python/lib/python3.11/site-packages/jira/resilientsession.py", 
line 246, in request
elif raise_on_error(response, **processed_kwargs):
 
  File 
"/Users/hzyaoqin/python/lib/python3.11/site-packages/jira/resilientsession.py", 
line 71, in raise_on_error
raise JIRAError(
jira.exceptions.JIRAError: JiraError HTTP 400 url: 
https://issues.apache.org/jira/rest/api/latest/issue/SPARK-44801/assignee
response text = {"errorMessages":[],"errors":{"assignee":"User 
'airhot' cannot be assigned issues."}}
```

The Jira user ID 'yao' fails to return my JIRA profile among the 20 
candidates, so 'airhot', at the head of the list, is assigned instead of me.

### Why are the changes needed?

bugfix for merge_spark_pr

### Does this PR introduce _any_ user-facing change?

no

### How was this patch tested?

test locally

```python
>>> def assign_issue(client: jira.client.JIRA, issue: int, assignee: str) 
-> bool:
... """Assign an issue to a user.
...
... Args:
... issue (Union[int, str]): the issue ID or key to assign
... assignee (str): the user to assign the issue to. None will set 
it to unassigned. -1 will set it to Automatic.
...
... Returns:
... bool
... """
... url = getattr(client, "_get_latest_url")(f"issue/{issue}/assignee")
... payload = {"name": assignee}
... getattr(client, "_session").put(url, data=json.dumps(payload))
... return True
...

>>>
>>> assign_issue(asf_jira, "SPARK-44801", "yao")
True
```

Closes #42496 from yaooqinn/SPARK-44813.

Authored-by: Kent Yao 
Signed-off-by: Kent Yao 
(cherry picked from commit 8fb799d47bbd5d5ce9db35283d08ab1a31dc37b9)
Signed-off-by: Kent Yao 
---
 dev/merge_spark_pr.py | 15 ++-
 1 file changed, 14 insertions(+), 1 deletion(-)

diff --git a/dev/merge_spark_pr.py b/dev/merge_spark_pr.py
index e21a39a6881..1982549707f 100755
--- a/dev/merge_spark_pr.py
+++ b/dev/merge_spark_pr.py
@@ -372,7 +372,7 @@ def choose_jira_assignee(issue, asf_jira):
 except BaseException:
 # assume it's a user id, and try to assign (might fail, we 
just prompt again)
 assignee = asf_jira.user(raw_assignee)
-asf_jira.assign_issue(issue.key, assignee.name)
+assign_issue(asf_jira, issue.key, assignee.name)
 return assignee
 except KeyboardInterrupt:
 raise
@@ -381,6 +381,19 @@ def choose_jira_assignee(issue, asf_jira):
 print("Error assigning JIRA, try again (or leave blank and fix 
manually)")
 
 
+def assign_issue(client: jira.client.JIRA, issue: int, assignee: str) -> bool:
+"""
+Assign an issue to a user, which is a shorthand for 
jira.client.JIRA.assign_issue.
+The original one has an issue that it will search users again 

[spark] branch branch-3.4 updated: [SPARK-44813][INFRA] The Jira Python misses our assignee when it searches users again

2023-08-18 Thread yao
This is an automated email from the ASF dual-hosted git repository.

yao pushed a commit to branch branch-3.4
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.4 by this push:
 new faba1ff5ef6 [SPARK-44813][INFRA] The Jira Python misses our assignee 
when it searches users again
faba1ff5ef6 is described below

commit faba1ff5ef6a3ff6eb296870c915efd7f63d1e54
Author: Kent Yao 
AuthorDate: Sat Aug 19 02:03:29 2023 +0800

[SPARK-44813][INFRA] The Jira Python misses our assignee when it searches 
users again

### What changes were proposed in this pull request?

This PR creates an alternative to the assign_issue function in 
jira.client.JIRA.

The original one has an issue: it searches users again and chooses the 
assignee from only 20 candidates. If there is no match, it blindly picks the 
first candidate.

For example,

```python
>>> assignee = asf_jira.user("yao")
>>> "SPARK-44801"
'SPARK-44801'
>>> asf_jira.assign_issue(issue.key, assignee.name)
Traceback (most recent call last):
  File "", line 1, in 
NameError: name 'issue' is not defined
>>> asf_jira.assign_issue("SPARK-44801", assignee.name)
Traceback (most recent call last):
  File "", line 1, in 
  File 
"/Users/hzyaoqin/python/lib/python3.11/site-packages/jira/client.py", line 123, 
in wrapper
result = func(*arg_list, **kwargs)
 ^
  File 
"/Users/hzyaoqin/python/lib/python3.11/site-packages/jira/client.py", line 
1891, in assign_issue
self._session.put(url, data=json.dumps(payload))
  File 
"/Users/hzyaoqin/python/lib/python3.11/site-packages/requests/sessions.py", 
line 649, in put
return self.request("PUT", url, data=data, **kwargs)
   ^
  File 
"/Users/hzyaoqin/python/lib/python3.11/site-packages/jira/resilientsession.py", 
line 246, in request
elif raise_on_error(response, **processed_kwargs):
 
  File 
"/Users/hzyaoqin/python/lib/python3.11/site-packages/jira/resilientsession.py", 
line 71, in raise_on_error
raise JIRAError(
jira.exceptions.JIRAError: JiraError HTTP 400 url: 
https://issues.apache.org/jira/rest/api/latest/issue/SPARK-44801/assignee
response text = {"errorMessages":[],"errors":{"assignee":"User 
'airhot' cannot be assigned issues."}}
```

The Jira user ID 'yao' fails to return my JIRA profile among the 20 
candidates, so 'airhot', at the head of the list, is assigned instead of me.

### Why are the changes needed?

bugfix for merge_spark_pr

### Does this PR introduce _any_ user-facing change?

no

### How was this patch tested?

test locally

```python
>>> def assign_issue(client: jira.client.JIRA, issue: int, assignee: str) 
-> bool:
... """Assign an issue to a user.
...
... Args:
... issue (Union[int, str]): the issue ID or key to assign
... assignee (str): the user to assign the issue to. None will set 
it to unassigned. -1 will set it to Automatic.
...
... Returns:
... bool
... """
... url = getattr(client, "_get_latest_url")(f"issue/{issue}/assignee")
... payload = {"name": assignee}
... getattr(client, "_session").put(url, data=json.dumps(payload))
... return True
...

>>>
>>> assign_issue(asf_jira, "SPARK-44801", "yao")
True
```

Closes #42496 from yaooqinn/SPARK-44813.

Authored-by: Kent Yao 
Signed-off-by: Kent Yao 
(cherry picked from commit 8fb799d47bbd5d5ce9db35283d08ab1a31dc37b9)
Signed-off-by: Kent Yao 
---
 dev/merge_spark_pr.py | 15 ++-
 1 file changed, 14 insertions(+), 1 deletion(-)

diff --git a/dev/merge_spark_pr.py b/dev/merge_spark_pr.py
index 250348cf761..bc6f47603b4 100755
--- a/dev/merge_spark_pr.py
+++ b/dev/merge_spark_pr.py
@@ -372,7 +372,7 @@ def choose_jira_assignee(issue, asf_jira):
 except BaseException:
 # assume it's a user id, and try to assign (might fail, we 
just prompt again)
 assignee = asf_jira.user(raw_assignee)
-asf_jira.assign_issue(issue.key, assignee.name)
+assign_issue(asf_jira, issue.key, assignee.name)
 return assignee
 except KeyboardInterrupt:
 raise
@@ -381,6 +381,19 @@ def choose_jira_assignee(issue, asf_jira):
 print("Error assigning JIRA, try again (or leave blank and fix 
manually)")
 
 
+def assign_issue(client: jira.client.JIRA, issue: int, assignee: str) -> bool:
+"""
+Assign an issue to a user, which is a shorthand for 
jira.client.JIRA.assign_issue.
+The original one has an issue that it will search users again 

[spark] branch branch-3.5 updated: [SPARK-44813][INFRA] The Jira Python misses our assignee when it searches users again

2023-08-18 Thread yao
This is an automated email from the ASF dual-hosted git repository.

yao pushed a commit to branch branch-3.5
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.5 by this push:
 new b24dbc995b7 [SPARK-44813][INFRA] The Jira Python misses our assignee 
when it searches users again
b24dbc995b7 is described below

commit b24dbc995b713974754b5429136ba6527d6b
Author: Kent Yao 
AuthorDate: Sat Aug 19 02:03:29 2023 +0800

[SPARK-44813][INFRA] The Jira Python misses our assignee when it searches 
users again

### What changes were proposed in this pull request?

This PR creates an alternative to the assign_issue function in 
jira.client.JIRA.

The original one has an issue: it searches users again and chooses the 
assignee from only 20 candidates. If there is no match, it blindly picks the 
first candidate.

For example,

```python
>>> assignee = asf_jira.user("yao")
>>> "SPARK-44801"
'SPARK-44801'
>>> asf_jira.assign_issue(issue.key, assignee.name)
Traceback (most recent call last):
  File "", line 1, in 
NameError: name 'issue' is not defined
>>> asf_jira.assign_issue("SPARK-44801", assignee.name)
Traceback (most recent call last):
  File "", line 1, in 
  File 
"/Users/hzyaoqin/python/lib/python3.11/site-packages/jira/client.py", line 123, 
in wrapper
result = func(*arg_list, **kwargs)
 ^
  File 
"/Users/hzyaoqin/python/lib/python3.11/site-packages/jira/client.py", line 
1891, in assign_issue
self._session.put(url, data=json.dumps(payload))
  File 
"/Users/hzyaoqin/python/lib/python3.11/site-packages/requests/sessions.py", 
line 649, in put
return self.request("PUT", url, data=data, **kwargs)
   ^
  File 
"/Users/hzyaoqin/python/lib/python3.11/site-packages/jira/resilientsession.py", 
line 246, in request
elif raise_on_error(response, **processed_kwargs):
 
  File 
"/Users/hzyaoqin/python/lib/python3.11/site-packages/jira/resilientsession.py", 
line 71, in raise_on_error
raise JIRAError(
jira.exceptions.JIRAError: JiraError HTTP 400 url: 
https://issues.apache.org/jira/rest/api/latest/issue/SPARK-44801/assignee
response text = {"errorMessages":[],"errors":{"assignee":"User 
'airhot' cannot be assigned issues."}}
```

The Jira user ID 'yao' fails to return my JIRA profile among the 20 
candidates, so 'airhot', at the head of the list, is assigned instead of me.

### Why are the changes needed?

bugfix for merge_spark_pr

### Does this PR introduce _any_ user-facing change?

no

### How was this patch tested?

test locally

```python
>>> def assign_issue(client: jira.client.JIRA, issue: int, assignee: str) 
-> bool:
... """Assign an issue to a user.
...
... Args:
... issue (Union[int, str]): the issue ID or key to assign
... assignee (str): the user to assign the issue to. None will set 
it to unassigned. -1 will set it to Automatic.
...
... Returns:
... bool
... """
... url = getattr(client, "_get_latest_url")(f"issue/{issue}/assignee")
... payload = {"name": assignee}
... getattr(client, "_session").put(url, data=json.dumps(payload))
... return True
...

>>>
>>> assign_issue(asf_jira, "SPARK-44801", "yao")
True
```

Closes #42496 from yaooqinn/SPARK-44813.

Authored-by: Kent Yao 
Signed-off-by: Kent Yao 
(cherry picked from commit 8fb799d47bbd5d5ce9db35283d08ab1a31dc37b9)
Signed-off-by: Kent Yao 
---
 dev/merge_spark_pr.py | 15 ++-
 1 file changed, 14 insertions(+), 1 deletion(-)

diff --git a/dev/merge_spark_pr.py b/dev/merge_spark_pr.py
index 6d86e918310..94cf3ac262c 100755
--- a/dev/merge_spark_pr.py
+++ b/dev/merge_spark_pr.py
@@ -373,7 +373,7 @@ def choose_jira_assignee(issue, asf_jira):
 except BaseException:
 # assume it's a user id, and try to assign (might fail, we 
just prompt again)
 assignee = asf_jira.user(raw_assignee)
-asf_jira.assign_issue(issue.key, assignee.name)
+assign_issue(asf_jira, issue.key, assignee.name)
 return assignee
 except KeyboardInterrupt:
 raise
@@ -382,6 +382,19 @@ def choose_jira_assignee(issue, asf_jira):
 print("Error assigning JIRA, try again (or leave blank and fix 
manually)")
 
 
+def assign_issue(client: jira.client.JIRA, issue: int, assignee: str) -> bool:
+"""
+Assign an issue to a user, which is a shorthand for 
jira.client.JIRA.assign_issue.
+The original one has an issue that it will search users again 

[spark] branch master updated: [SPARK-44813][INFRA] The Jira Python misses our assignee when it searches users again

2023-08-18 Thread yao
This is an automated email from the ASF dual-hosted git repository.

yao pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 8fb799d47bb [SPARK-44813][INFRA] The Jira Python misses our assignee 
when it searches users again
8fb799d47bb is described below

commit 8fb799d47bbd5d5ce9db35283d08ab1a31dc37b9
Author: Kent Yao 
AuthorDate: Sat Aug 19 02:03:29 2023 +0800

[SPARK-44813][INFRA] The Jira Python misses our assignee when it searches 
users again

### What changes were proposed in this pull request?

This PR creates an alternative to the assign_issue function in 
jira.client.JIRA.

The original one has an issue: it searches users again and chooses the 
assignee from only 20 candidates. If there is no match, it blindly picks the 
first candidate.

For example,

```python
>>> assignee = asf_jira.user("yao")
>>> "SPARK-44801"
'SPARK-44801'
>>> asf_jira.assign_issue(issue.key, assignee.name)
Traceback (most recent call last):
  File "", line 1, in 
NameError: name 'issue' is not defined
>>> asf_jira.assign_issue("SPARK-44801", assignee.name)
Traceback (most recent call last):
  File "", line 1, in 
  File 
"/Users/hzyaoqin/python/lib/python3.11/site-packages/jira/client.py", line 123, 
in wrapper
result = func(*arg_list, **kwargs)
 ^
  File 
"/Users/hzyaoqin/python/lib/python3.11/site-packages/jira/client.py", line 
1891, in assign_issue
self._session.put(url, data=json.dumps(payload))
  File 
"/Users/hzyaoqin/python/lib/python3.11/site-packages/requests/sessions.py", 
line 649, in put
return self.request("PUT", url, data=data, **kwargs)
   ^
  File 
"/Users/hzyaoqin/python/lib/python3.11/site-packages/jira/resilientsession.py", 
line 246, in request
elif raise_on_error(response, **processed_kwargs):
 
  File 
"/Users/hzyaoqin/python/lib/python3.11/site-packages/jira/resilientsession.py", 
line 71, in raise_on_error
raise JIRAError(
jira.exceptions.JIRAError: JiraError HTTP 400 url: 
https://issues.apache.org/jira/rest/api/latest/issue/SPARK-44801/assignee
response text = {"errorMessages":[],"errors":{"assignee":"User 
'airhot' cannot be assigned issues."}}
```

The Jira user ID 'yao' fails to return my JIRA profile among the 20 
candidates, so 'airhot', at the head of the list, is assigned instead of me.

### Why are the changes needed?

bugfix for merge_spark_pr

### Does this PR introduce _any_ user-facing change?

no

### How was this patch tested?

test locally

```python
>>> def assign_issue(client: jira.client.JIRA, issue: int, assignee: str) 
-> bool:
... """Assign an issue to a user.
...
... Args:
... issue (Union[int, str]): the issue ID or key to assign
... assignee (str): the user to assign the issue to. None will set 
it to unassigned. -1 will set it to Automatic.
...
... Returns:
... bool
... """
... url = getattr(client, "_get_latest_url")(f"issue/{issue}/assignee")
... payload = {"name": assignee}
... getattr(client, "_session").put(url, data=json.dumps(payload))
... return True
...

>>>
>>> assign_issue(asf_jira, "SPARK-44801", "yao")
True
```

Closes #42496 from yaooqinn/SPARK-44813.

Authored-by: Kent Yao 
Signed-off-by: Kent Yao 
---
 dev/merge_spark_pr.py | 15 ++-
 1 file changed, 14 insertions(+), 1 deletion(-)

diff --git a/dev/merge_spark_pr.py b/dev/merge_spark_pr.py
index b75b848f4d2..fd4c34d2fa7 100755
--- a/dev/merge_spark_pr.py
+++ b/dev/merge_spark_pr.py
@@ -394,7 +394,7 @@ def choose_jira_assignee(issue, asf_jira):
 except BaseException:
 # assume it's a user id, and try to assign (might fail, we 
just prompt again)
 assignee = asf_jira.user(raw_assignee)
-asf_jira.assign_issue(issue.key, assignee.name)
+assign_issue(asf_jira, issue.key, assignee.name)
 return assignee
 except KeyboardInterrupt:
 raise
@@ -403,6 +403,19 @@ def choose_jira_assignee(issue, asf_jira):
 print("Error assigning JIRA, try again (or leave blank and fix 
manually)")
 
 
+def assign_issue(client: jira.client.JIRA, issue: int, assignee: str) -> bool:
+"""
+Assign an issue to a user, which is a shorthand for 
jira.client.JIRA.assign_issue.
+The original one has an issue that it will search users again and only 
choose the assignee
+from 20 candidates. If it's unmatched, it picks the head blindly. In our 

[spark] branch SPARK-44813 deleted (was d980909e74d)

2023-08-18 Thread yao
This is an automated email from the ASF dual-hosted git repository.

yao pushed a change to branch SPARK-44813
in repository https://gitbox.apache.org/repos/asf/spark.git


 was d980909e74d fix

The revisions that were on this branch are still contained in
other references; therefore, this change does not discard any commits
from the repository.





[spark] 02/02: fix

2023-08-18 Thread yao
This is an automated email from the ASF dual-hosted git repository.

yao pushed a commit to branch SPARK-44813
in repository https://gitbox.apache.org/repos/asf/spark.git

commit d980909e74da572584ab12930ea250b045ad609a
Author: Kent Yao 
AuthorDate: Sat Aug 19 01:59:31 2023 +0800

fix
---
 dev/merge_spark_pr.py | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/dev/merge_spark_pr.py b/dev/merge_spark_pr.py
index b5978a49a95..fd4c34d2fa7 100755
--- a/dev/merge_spark_pr.py
+++ b/dev/merge_spark_pr.py
@@ -394,7 +394,7 @@ def choose_jira_assignee(issue, asf_jira):
 except BaseException:
 # assume it's a user id, and try to assign (might fail, we 
just prompt again)
 assignee = asf_jira.user(raw_assignee)
-assign_issue(issue.key, assignee.name)
+assign_issue(asf_jira, issue.key, assignee.name)
 return assignee
 except KeyboardInterrupt:
 raise





[spark] branch SPARK-44813 created (now d980909e74d)

2023-08-18 Thread yao
This is an automated email from the ASF dual-hosted git repository.

yao pushed a change to branch SPARK-44813
in repository https://gitbox.apache.org/repos/asf/spark.git


  at d980909e74d fix

This branch includes the following new commits:

 new 995e2c691c9 Merge branch 'master' into SPARK-44813
 new d980909e74d fix

The 2 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails.  The revisions
listed as "add" were already present in the repository and have only
been added to this reference.






[spark] 01/02: Merge branch 'master' into SPARK-44813

2023-08-18 Thread yao
This is an automated email from the ASF dual-hosted git repository.

yao pushed a commit to branch SPARK-44813
in repository https://gitbox.apache.org/repos/asf/spark.git

commit 995e2c691c9623577d0c32ee68371cf776d019c5
Merge: 5b6855bbccf 5d267fe79e0
Author: Kent Yao 
AuthorDate: Sat Aug 19 01:58:54 2023 +0800

Merge branch 'master' into SPARK-44813

 .../sql/connect/planner/SparkConnectPlanner.scala  |  37 +++-
 .../planner/StreamingForeachBatchHelper.scala  | 109 +--
 .../spark/sql/connect/service/SessionHolder.scala  |   9 +-
 .../service/SparkConnectStreamingQueryCache.scala  |  21 ++-
 .../planner/StreamingForeachBatchHelperSuite.scala |  80 
 .../spark/api/python/PythonWorkerFactory.scala |   2 +-
 .../spark/api/python/StreamingPythonRunner.scala   |   9 +-
 .../apache/spark/deploy/worker/ui/LogPage.scala|   1 +
 dev/merge_spark_pr.py  |  10 +-
 docs/sql-ref-identifier-clause.md  | 106 +++
 docs/sql-ref-syntax-dml-insert-table.md|  42 -
 docs/sql-ref.md|   1 +
 docs/structured-streaming-programming-guide.md |   4 +-
 pom.xml|   7 +
 python/docs/source/conf.py |   2 +
 python/pyspark/pandas/groupby.py   |   6 +-
 python/pyspark/pandas/indexes/datetimes.py |  20 +-
 .../pyspark/pandas/tests/computation/test_cov.py   |  51 +-
 .../pandas/tests/data_type_ops/test_date_ops.py|  20 +-
 .../pyspark/pandas/tests/groupby/test_aggregate.py |  12 +-
 .../pandas/tests/groupby/test_apply_func.py|  11 +-
 .../pyspark/pandas/tests/groupby/test_groupby.py   |   7 +-
 python/pyspark/pandas/tests/indexes/test_base.py   |  24 ---
 .../pyspark/pandas/tests/indexes/test_category.py  |  26 +--
 .../pyspark/pandas/tests/indexes/test_datetime.py  |   5 -
 .../pyspark/pandas/tests/indexes/test_reindex.py   |   9 +-
 python/pyspark/pandas/tests/series/test_as_type.py |  34 ++--
 python/pyspark/pandas/tests/test_categorical.py|  17 +-
 python/pyspark/sql/connect/client/artifact.py  |  17 +-
 python/pyspark/sql/connect/client/core.py  |   4 +-
 python/pyspark/sql/dataframe.py| 202 +
 .../sql-tests/analyzer-results/udtf/udtf.sql.out   |  61 +++
 .../test/resources/sql-tests/inputs/udtf/udtf.sql  |  18 ++
 .../resources/sql-tests/results/udtf/udtf.sql.out  |  85 +
 .../apache/spark/sql/IntegratedUDFTestUtils.scala  |  40 
 .../org/apache/spark/sql/SQLQueryTestSuite.scala   |  28 +++
 .../sql/execution/python/PythonUDTFSuite.scala |  17 +-
 .../thriftserver/ThriftServerQueryTestSuite.scala  |   2 +
 38 files changed, 876 insertions(+), 280 deletions(-)






[spark] branch branch-3.3 updated: Revert "[SPARK-44813][INFRA] The Jira Python misses our assignee when it searches users again"

2023-08-18 Thread yao
This is an automated email from the ASF dual-hosted git repository.

yao pushed a commit to branch branch-3.3
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.3 by this push:
 new 0ef14199fd3 Revert "[SPARK-44813][INFRA] The Jira Python misses our 
assignee when it searches users again"
0ef14199fd3 is described below

commit 0ef14199fd32df09fe183746ee9e69a92f7d1944
Author: Kent Yao 
AuthorDate: Sat Aug 19 01:57:03 2023 +0800

Revert "[SPARK-44813][INFRA] The Jira Python misses our assignee when it 
searches users again"

This reverts commit 7e7c41bf1007ca05ffc3d818d34d75570d234a6d.
---
 dev/merge_spark_pr.py | 15 +--
 1 file changed, 1 insertion(+), 14 deletions(-)

diff --git a/dev/merge_spark_pr.py b/dev/merge_spark_pr.py
index 8555abe9bd0..e21a39a6881 100755
--- a/dev/merge_spark_pr.py
+++ b/dev/merge_spark_pr.py
@@ -372,7 +372,7 @@ def choose_jira_assignee(issue, asf_jira):
 except BaseException:
 # assume it's a user id, and try to assign (might fail, we 
just prompt again)
 assignee = asf_jira.user(raw_assignee)
-assign_issue(issue.key, assignee.name)
+asf_jira.assign_issue(issue.key, assignee.name)
 return assignee
 except KeyboardInterrupt:
 raise
@@ -381,19 +381,6 @@ def choose_jira_assignee(issue, asf_jira):
 print("Error assigning JIRA, try again (or leave blank and fix 
manually)")
 
 
-def assign_issue(client: jira.client.JIRA, issue: int, assignee: str) -> bool:
-"""
-Assign an issue to a user, which is a shorthand for 
jira.client.JIRA.assign_issue.
-The original one has an issue that it will search users again and only 
choose the assignee
-from 20 candidates. If it's unmatched, it picks the head blindly. In our 
case, the assignee
-is already resolved.
-"""
-url = getattr(client, "_get_latest_url")(f"issue/{issue}/assignee")
-payload = {"name": assignee}
-getattr(client, "_session").put(url, data=json.dumps(payload))
-return True
-
-
 def resolve_jira_issues(title, merge_branches, comment):
 jira_ids = re.findall("SPARK-[0-9]{4,5}", title)
 





[spark] branch branch-3.4 updated: Revert "[SPARK-44813][INFRA] The Jira Python misses our assignee when it searches users again"

2023-08-18 Thread yao
This is an automated email from the ASF dual-hosted git repository.

yao pushed a commit to branch branch-3.4
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.4 by this push:
 new 1cfa0806080 Revert "[SPARK-44813][INFRA] The Jira Python misses our 
assignee when it searches users again"
1cfa0806080 is described below

commit 1cfa080608058b3743472c4b145124512d84cd43
Author: Kent Yao 
AuthorDate: Sat Aug 19 01:56:07 2023 +0800

Revert "[SPARK-44813][INFRA] The Jira Python misses our assignee when it 
searches users again"

This reverts commit 3c5e57d886b81808370353781bfce2b2ce20a473.
---
 dev/merge_spark_pr.py | 15 +--
 1 file changed, 1 insertion(+), 14 deletions(-)

diff --git a/dev/merge_spark_pr.py b/dev/merge_spark_pr.py
index 65d24ea5b78..250348cf761 100755
--- a/dev/merge_spark_pr.py
+++ b/dev/merge_spark_pr.py
@@ -372,7 +372,7 @@ def choose_jira_assignee(issue, asf_jira):
 except BaseException:
 # assume it's a user id, and try to assign (might fail, we 
just prompt again)
 assignee = asf_jira.user(raw_assignee)
-assign_issue(issue.key, assignee.name)
+asf_jira.assign_issue(issue.key, assignee.name)
 return assignee
 except KeyboardInterrupt:
 raise
@@ -381,19 +381,6 @@ def choose_jira_assignee(issue, asf_jira):
 print("Error assigning JIRA, try again (or leave blank and fix 
manually)")
 
 
-def assign_issue(client: jira.client.JIRA, issue: int, assignee: str) -> bool:
-"""
-Assign an issue to a user, which is a shorthand for 
jira.client.JIRA.assign_issue.
-The original one has an issue that it will search users again and only 
choose the assignee
-from 20 candidates. If it's unmatched, it picks the head blindly. In our 
case, the assignee
-is already resolved.
-"""
-url = getattr(client, "_get_latest_url")(f"issue/{issue}/assignee")
-payload = {"name": assignee}
-getattr(client, "_session").put(url, data=json.dumps(payload))
-return True
-
-
 def resolve_jira_issues(title, merge_branches, comment):
 jira_ids = re.findall("SPARK-[0-9]{4,5}", title)
 





[spark] branch branch-3.5 updated: Revert "[SPARK-44813][INFRA] The Jira Python misses our assignee when it searches users again"

2023-08-18 Thread yao
This is an automated email from the ASF dual-hosted git repository.

yao pushed a commit to branch branch-3.5
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.5 by this push:
 new 7864697a7b1 Revert "[SPARK-44813][INFRA] The Jira Python misses our 
assignee when it searches users again"
7864697a7b1 is described below

commit 7864697a7b1c7acb892d3107703750aa772b3298
Author: Kent Yao 
AuthorDate: Sat Aug 19 01:54:42 2023 +0800

Revert "[SPARK-44813][INFRA] The Jira Python misses our assignee when it 
searches users again"

This reverts commit f7dd0a95727259ff4b7a2f849798f8a93cf78b69.
---
 dev/merge_spark_pr.py | 15 +--
 1 file changed, 1 insertion(+), 14 deletions(-)

diff --git a/dev/merge_spark_pr.py b/dev/merge_spark_pr.py
index 6af3945bc57..6d86e918310 100755
--- a/dev/merge_spark_pr.py
+++ b/dev/merge_spark_pr.py
@@ -373,7 +373,7 @@ def choose_jira_assignee(issue, asf_jira):
 except BaseException:
 # assume it's a user id, and try to assign (might fail, we 
just prompt again)
 assignee = asf_jira.user(raw_assignee)
-assign_issue(issue.key, assignee.name)
+asf_jira.assign_issue(issue.key, assignee.name)
 return assignee
 except KeyboardInterrupt:
 raise
@@ -382,19 +382,6 @@ def choose_jira_assignee(issue, asf_jira):
 print("Error assigning JIRA, try again (or leave blank and fix 
manually)")
 
 
-def assign_issue(client: jira.client.JIRA, issue: int, assignee: str) -> bool:
-"""
-Assign an issue to a user, which is a shorthand for 
jira.client.JIRA.assign_issue.
-The original one has an issue that it will search users again and only 
choose the assignee
-from 20 candidates. If it's unmatched, it picks the head blindly. In our 
case, the assignee
-is already resolved.
-"""
-url = getattr(client, "_get_latest_url")(f"issue/{issue}/assignee")
-payload = {"name": assignee}
-getattr(client, "_session").put(url, data=json.dumps(payload))
-return True
-
-
 def resolve_jira_issues(title, merge_branches, comment):
 jira_ids = re.findall("SPARK-[0-9]{4,5}", title)
 





[spark] branch master updated (54dd18b5e09 -> 5d267fe79e0)

2023-08-18 Thread yao
This is an automated email from the ASF dual-hosted git repository.

yao pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


from 54dd18b5e09 [SPARK-44875][INFRA] Fix spelling for commentator to test 
SPARK-44813
 add 5d267fe79e0 Revert "[SPARK-44813][INFRA] The Jira Python misses our 
assignee when it searches users again"

No new revisions were added by this update.

Summary of changes:
 dev/merge_spark_pr.py | 15 +--
 1 file changed, 1 insertion(+), 14 deletions(-)





[spark] branch branch-3.4 updated: [SPARK-44875][INFRA] Fix spelling for commentator to test SPARK-44813

2023-08-18 Thread yao
This is an automated email from the ASF dual-hosted git repository.

yao pushed a commit to branch branch-3.4
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.4 by this push:
 new 62b4846bb77 [SPARK-44875][INFRA] Fix spelling for commentator to test 
SPARK-44813
62b4846bb77 is described below

commit 62b4846bb779201cb12d17b9385b97602ec137c8
Author: Kent Yao 
AuthorDate: Sat Aug 19 01:48:37 2023 +0800

[SPARK-44875][INFRA] Fix spelling for commentator to test SPARK-44813

### What changes were proposed in this pull request?

Fix a typo to verify SPARK-44813

### Why are the changes needed?

Fix a typo and verify SPARK-44813

### Does this PR introduce _any_ user-facing change?

no

### How was this patch tested?

existing tests

Closes #42561 from yaooqinn/SPARK-44875.

Authored-by: Kent Yao 
Signed-off-by: Kent Yao 
(cherry picked from commit 54dd18b5e0953df37e5f0937f1f79e65db70b787)
Signed-off-by: Kent Yao 
---
 dev/merge_spark_pr.py | 10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/dev/merge_spark_pr.py b/dev/merge_spark_pr.py
index 8a5b6ebe8ef..65d24ea5b78 100755
--- a/dev/merge_spark_pr.py
+++ b/dev/merge_spark_pr.py
@@ -343,13 +343,13 @@ def resolve_jira_issue(merge_branches, comment, 
default_jira_id=""):
 def choose_jira_assignee(issue, asf_jira):
 """
 Prompt the user to choose who to assign the issue to in jira, given a list 
of candidates,
-including the original reporter and all commentors
+including the original reporter and all commentators
 """
 while True:
 try:
 reporter = issue.fields.reporter
-commentors = list(map(lambda x: x.author, 
issue.fields.comment.comments))
-candidates = set(commentors)
+commentators = list(map(lambda x: x.author, 
issue.fields.comment.comments))
+candidates = set(commentators)
 candidates.add(reporter)
 candidates = list(candidates)
 print("JIRA is unassigned, choose assignee")
@@ -357,8 +357,8 @@ def choose_jira_assignee(issue, asf_jira):
 if author.key == "apachespark":
 continue
 annotations = ["Reporter"] if author == reporter else []
-if author in commentors:
-annotations.append("Commentor")
+if author in commentators:
+annotations.append("Commentator")
 print("[%d] %s (%s)" % (idx, author.displayName, 
",".join(annotations)))
 raw_assignee = input(
 "Enter number of user, or userid, to assign to (blank to leave 
unassigned):"





[spark] branch branch-3.5 updated: [SPARK-44875][INFRA] Fix spelling for commentator to test SPARK-44813

2023-08-18 Thread yao
This is an automated email from the ASF dual-hosted git repository.

yao pushed a commit to branch branch-3.5
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.5 by this push:
 new 7ffb4a11597 [SPARK-44875][INFRA] Fix spelling for commentator to test 
SPARK-44813
7ffb4a11597 is described below

commit 7ffb4a11597e0d2624830dc5ac044ce6c41835f8
Author: Kent Yao 
AuthorDate: Sat Aug 19 01:48:37 2023 +0800

[SPARK-44875][INFRA] Fix spelling for commentator to test SPARK-44813

### What changes were proposed in this pull request?

Fix a typo to verify SPARK-44813

### Why are the changes needed?

Fix a typo and verify SPARK-44813

### Does this PR introduce _any_ user-facing change?

no

### How was this patch tested?

existing tests

Closes #42561 from yaooqinn/SPARK-44875.

Authored-by: Kent Yao 
Signed-off-by: Kent Yao 
(cherry picked from commit 54dd18b5e0953df37e5f0937f1f79e65db70b787)
Signed-off-by: Kent Yao 
---
 dev/merge_spark_pr.py | 10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/dev/merge_spark_pr.py b/dev/merge_spark_pr.py
index 37488557fea..6af3945bc57 100755
--- a/dev/merge_spark_pr.py
+++ b/dev/merge_spark_pr.py
@@ -344,13 +344,13 @@ def resolve_jira_issue(merge_branches, comment, 
default_jira_id=""):
 def choose_jira_assignee(issue, asf_jira):
 """
 Prompt the user to choose who to assign the issue to in jira, given a list 
of candidates,
-including the original reporter and all commentors
+including the original reporter and all commentators
 """
 while True:
 try:
 reporter = issue.fields.reporter
-commentors = list(map(lambda x: x.author, 
issue.fields.comment.comments))
-candidates = set(commentors)
+commentators = list(map(lambda x: x.author, 
issue.fields.comment.comments))
+candidates = set(commentators)
 candidates.add(reporter)
 candidates = list(candidates)
 print("JIRA is unassigned, choose assignee")
@@ -358,8 +358,8 @@ def choose_jira_assignee(issue, asf_jira):
 if author.key == "apachespark":
 continue
 annotations = ["Reporter"] if author == reporter else []
-if author in commentors:
-annotations.append("Commentor")
+if author in commentators:
+annotations.append("Commentator")
 print("[%d] %s (%s)" % (idx, author.displayName, 
",".join(annotations)))
 raw_assignee = input(
 "Enter number of user, or userid, to assign to (blank to leave 
unassigned):"





[spark] branch master updated: [SPARK-44875][INFRA] Fix spelling for commentator to test SPARK-44813

2023-08-18 Thread yao
This is an automated email from the ASF dual-hosted git repository.

yao pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 54dd18b5e09 [SPARK-44875][INFRA] Fix spelling for commentator to test 
SPARK-44813
54dd18b5e09 is described below

commit 54dd18b5e0953df37e5f0937f1f79e65db70b787
Author: Kent Yao 
AuthorDate: Sat Aug 19 01:48:37 2023 +0800

[SPARK-44875][INFRA] Fix spelling for commentator to test SPARK-44813

### What changes were proposed in this pull request?

Fix a typo to verify SPARK-44813

### Why are the changes needed?

Fix a typo and verify SPARK-44813

### Does this PR introduce _any_ user-facing change?

no

### How was this patch tested?

existing tests

Closes #42561 from yaooqinn/SPARK-44875.

Authored-by: Kent Yao 
Signed-off-by: Kent Yao 
---
 dev/merge_spark_pr.py | 10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/dev/merge_spark_pr.py b/dev/merge_spark_pr.py
index 213798e5a1a..b5978a49a95 100755
--- a/dev/merge_spark_pr.py
+++ b/dev/merge_spark_pr.py
@@ -365,13 +365,13 @@ def resolve_jira_issue(merge_branches, comment, 
default_jira_id=""):
 def choose_jira_assignee(issue, asf_jira):
 """
 Prompt the user to choose who to assign the issue to in jira, given a list 
of candidates,
-including the original reporter and all commentors
+including the original reporter and all commentators
 """
 while True:
 try:
 reporter = issue.fields.reporter
-commentors = list(map(lambda x: x.author, 
issue.fields.comment.comments))
-candidates = set(commentors)
+commentators = list(map(lambda x: x.author, 
issue.fields.comment.comments))
+candidates = set(commentators)
 candidates.add(reporter)
 candidates = list(candidates)
 print("JIRA is unassigned, choose assignee")
@@ -379,8 +379,8 @@ def choose_jira_assignee(issue, asf_jira):
 if author.key == "apachespark":
 continue
 annotations = ["Reporter"] if author == reporter else []
-if author in commentors:
-annotations.append("Commentor")
+if author in commentators:
+annotations.append("Commentator")
 print("[%d] %s (%s)" % (idx, author.displayName, 
",".join(annotations)))
 raw_assignee = input(
 "Enter number of user, or userid, to assign to (blank to leave 
unassigned):"





[spark] branch branch-3.5 updated: [SPARK-44433][3.5X] Terminate foreach batch runner when streaming query terminates

2023-08-18 Thread gengliang
This is an automated email from the ASF dual-hosted git repository.

gengliang pushed a commit to branch branch-3.5
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.5 by this push:
 new c380da1357b [SPARK-44433][3.5X] Terminate foreach batch runner when 
streaming query terminates
c380da1357b is described below

commit c380da1357b20f55c8e80a515fc024e1b3b380cc
Author: Raghu Angadi 
AuthorDate: Fri Aug 18 10:39:42 2023 -0700

[SPARK-44433][3.5X] Terminate foreach batch runner when streaming query 
terminates

[This is the 3.5x port of #42460 in master. It resolves a couple of conflicts.]

This terminates the Python worker created for `foreachBatch` when the streaming 
query terminates. All of the tracking is done inside the connect server (inside 
`StreamingForeachBatchHelper`). How this works:

* (A) The helper class returns a cleaner (an `AutoCloseable`) to the connect 
server when the foreachBatch function is set up (this happens before starting the query).
* (B) If the query fails to start, the server directly invokes the cleaner.
* (C) If the query starts up, the server registers the cleaner with 
`streamingRunnerCleanerCache` in the `SessionHolder`.
* (D) The cache keeps a mapping of query to cleaner.
* It registers a streaming listener (only once per session), which invokes 
the cleaner when a query terminates.
* There is also a final cleanup when the SessionHolder expires.

This ensures the Python process created for a streaming query is properly 
terminated when the query terminates.
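
For context, a minimal PySpark sketch of the user-facing API whose Python worker this change tracks, assuming a Spark Connect session (the remote URL and table name are illustrative):

```python
from pyspark.sql import SparkSession

# Assumes a running Spark Connect server; the URL is illustrative.
spark = SparkSession.builder.remote("sc://localhost").getOrCreate()

def process_batch(df, batch_id):
    # Runs inside a Python worker that the connect server starts
    # specifically for this foreachBatch function.
    df.write.mode("append").saveAsTable("rate_sink")

query = (
    spark.readStream.format("rate").load()
    .writeStream.foreachBatch(process_batch)
    .start()
)

# With this change, terminating the query also terminates the Python
# worker created for process_batch instead of leaving it running.
query.stop()
```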

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

- Unit tests are added for `CleanerCache`.
- Existing unit tests for foreachBatch.
- Manual tests to verify the Python process is terminated in different cases.
- Unit tests don't really verify that the process is terminated; there will 
be a follow-up PR to verify this.

Closes #42555 from rangadi/pr-terminate-3.5x.

Authored-by: Raghu Angadi 
Signed-off-by: Gengliang Wang 
---
 .../sql/connect/planner/SparkConnectPlanner.scala  |  37 +--
 .../planner/StreamingForeachBatchHelper.scala  | 109 ++---
 .../spark/sql/connect/service/SessionHolder.scala  |   9 +-
 .../service/SparkConnectStreamingQueryCache.scala  |  21 ++--
 .../planner/StreamingForeachBatchHelperSuite.scala |  80 +++
 .../spark/api/python/StreamingPythonRunner.scala   |   9 +-
 6 files changed, 230 insertions(+), 35 deletions(-)

diff --git 
a/connector/connect/server/src/main/scala/org/apache/spark/sql/connect/planner/SparkConnectPlanner.scala
 
b/connector/connect/server/src/main/scala/org/apache/spark/sql/connect/planner/SparkConnectPlanner.scala
index f3e87b7067d..5120073e2f0 100644
--- 
a/connector/connect/server/src/main/scala/org/apache/spark/sql/connect/planner/SparkConnectPlanner.scala
+++ 
b/connector/connect/server/src/main/scala/org/apache/spark/sql/connect/planner/SparkConnectPlanner.scala
@@ -20,6 +20,7 @@ package org.apache.spark.sql.connect.planner
 import scala.collection.JavaConverters._
 import scala.collection.mutable
 import scala.util.Try
+import scala.util.control.NonFatal
 
 import com.google.common.base.Throwables
 import com.google.common.collect.{Lists, Maps}
@@ -2853,11 +2854,17 @@ class SparkConnectPlanner(val sessionHolder: 
SessionHolder) extends Logging {
   }
 }
 
+// This is filled when a foreach batch runner started for Python.
+var foreachBatchRunnerCleaner: 
Option[StreamingForeachBatchHelper.RunnerCleaner] = None
+
 if (writeOp.hasForeachBatch) {
   val foreachBatchFn = writeOp.getForeachBatch.getFunctionCase match {
 case StreamingForeachFunction.FunctionCase.PYTHON_FUNCTION =>
   val pythonFn = 
transformPythonFunction(writeOp.getForeachBatch.getPythonFunction)
-  StreamingForeachBatchHelper.pythonForeachBatchWrapper(pythonFn, 
sessionHolder)
+  val (fn, cleaner) =
+StreamingForeachBatchHelper.pythonForeachBatchWrapper(pythonFn, 
sessionHolder)
+  foreachBatchRunnerCleaner = Some(cleaner)
+  fn
 
 case StreamingForeachFunction.FunctionCase.SCALA_FUNCTION =>
   val scalaFn = 
Utils.deserialize[StreamingForeachBatchHelper.ForeachBatchFnType](
@@ -2872,16 +2879,26 @@ class SparkConnectPlanner(val sessionHolder: 
SessionHolder) extends Logging {
   writer.foreachBatch(foreachBatchFn)
 }
 
-val query = writeOp.getPath match {
-  case "" if writeOp.hasTableName => writer.toTable(writeOp.getTableName)
-  case "" => writer.start()
-  case path => writer.start(path)
-}
+val query =
+  try {
+writeOp.getPath match {
+  case "" if writeOp.hasTableName => 
writer.toTable(writeOp.getTableName)
+  case "" => writer.start()
+  case path => writer.start(path)
+}
+  } catch {
+

[spark] branch master updated: [SPARK-44869][DOC] Add doc for insert by name statement

2023-08-18 Thread yao
This is an automated email from the ASF dual-hosted git repository.

yao pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new fd8436ae785 [SPARK-44869][DOC] Add doc for insert by name statement
fd8436ae785 is described below

commit fd8436ae785ac91373624d0ef46d94b85dcc094f
Author: Jia Fan 
AuthorDate: Sat Aug 19 01:14:34 2023 +0800

[SPARK-44869][DOC] Add doc for insert by name statement

### What changes were proposed in this pull request?
Add `INSERT BY NAME` statement to the document.

### Why are the changes needed?
Add `INSERT BY NAME` to the doc, so users can easily discover this feature.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Unnecessary; this is a documentation-only change.

Closes #42558 from Hisoka-X/SPARK-44869_insert_by_name_doc.

Authored-by: Jia Fan 
Signed-off-by: Kent Yao 
---
 docs/sql-ref-syntax-dml-insert-table.md | 42 -
 1 file changed, 41 insertions(+), 1 deletion(-)

diff --git a/docs/sql-ref-syntax-dml-insert-table.md 
b/docs/sql-ref-syntax-dml-insert-table.md
index ea8a2789cae..6ca062e0817 100644
--- a/docs/sql-ref-syntax-dml-insert-table.md
+++ b/docs/sql-ref-syntax-dml-insert-table.md
@@ -26,7 +26,7 @@ The `INSERT` statement inserts new rows into a table or 
overwrites the existing
 ### Syntax
 
 ```sql
-INSERT [ INTO | OVERWRITE ] [ TABLE ] table_identifier [ partition_spec ] [ ( 
column_list ) ]
+INSERT [ INTO | OVERWRITE ] [ TABLE ] table_identifier [ partition_spec ] [ ( 
column_list ) | [BY NAME] ]
 { VALUES ( { value | NULL } [ , ... ] ) [ , ( ... ) ] | query }
 
 INSERT INTO [ TABLE ] table_identifier REPLACE WHERE boolean_expression query
@@ -318,6 +318,46 @@ SELECT * FROM students;
 +-+--+--+
 ```
 
+# Insert By Name Using a SELECT Statement
+
+```sql
+-- Assuming the persons table has already been created and populated.
+SELECT * FROM persons;
++-+--+-+
+| name|   address|  ssn|
++-+--+-+
+|Dora Williams|134 Forest Ave, Menlo Park|123456789|
++-+--+-+
+|  Eddie Davis|   245 Market St, Milpitas|345678901|
++-+--+-+
+
+-- Spark will reorder the fields of the query according to the order of the 
fields in the table,
+-- so you don't need to worry about a field-order mismatch
+INSERT INTO students PARTITION (student_id = 22) BY NAME
+SELECT address, name FROM persons WHERE name = "Dora Williams";
+
+SELECT * FROM students;
++-+--+--+
+| name|   address|student_id|
++-+--+--+
+|   Ashua Hill|   456 Erica Ct, Cupertino|11|
++-+--+--+
+|Dora Williams|134 Forest Ave, Menlo Park|22|
++-+--+--+
+
+INSERT OVERWRITE students PARTITION (student_id = 22) BY NAME
+SELECT 'Unknown' as address, name FROM persons WHERE name = "Dora 
Williams";
+
+SELECT * FROM students;
++-+--+--+
+| name|   address|student_id|
++-+--+--+
+|   Ashua Hill|   456 Erica Ct, Cupertino|11|
++-+--+--+
+|Dora Williams|   Unknown|22|
++-+--+--+
+```
+
 # Insert Using a REPLACE WHERE Statement
 
 ```sql


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated: [SPARK-44852][BUILD] Exclude `junit-jupiter-api` from `curator-test`

2023-08-18 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 1292b566550 [SPARK-44852][BUILD] Exclude `junit-jupiter-api` from 
`curator-test`
1292b566550 is described below

commit 1292b566550235e29b7a81f7c0dc022b33c92c2f
Author: yangjie01 
AuthorDate: Fri Aug 18 09:04:47 2023 -0700

[SPARK-44852][BUILD] Exclude `junit-jupiter-api` from `curator-test`

### What changes were proposed in this pull request?
This PR excludes `junit-jupiter-api` from `curator-test` to prevent Maven tests from wrongly using JUnit 5.

### Why are the changes needed?
[SPARK-44792](https://issues.apache.org/jira/browse/SPARK-44792) upgraded Curator to 5.2.0, and `curator-test` depends on `junit-jupiter-api`; since Apache Spark currently does not support testing with JUnit 5, it reports the following error when executing

```
build/mvn clean install -pl core -am 
-Dtest.exclude.tags=org.apache.spark.tags.ExtendedLevelDBTest
```

```
[INFO] ---
[INFO]  T E S T S
[INFO] ---
[ERROR] TestEngine with ID 'junit-vintage' failed to discover tests
[INFO]
[INFO] Results:
[INFO]
[INFO] Tests run: 0, Failures: 0, Errors: 0, Skipped: 0
[INFO]
[INFO] 

[INFO] Reactor Summary for Spark Project Parent POM 4.0.0-SNAPSHOT:
[INFO]
[INFO] Spark Project Parent POM ... SUCCESS [  
4.631 s]
[INFO] Spark Project Tags . SUCCESS [  
9.044 s]
[INFO] Spark Project Local DB . SUCCESS [ 
12.686 s]
[INFO] Spark Project Common Utils . SUCCESS [ 
12.216 s]
[INFO] Spark Project Networking ... SUCCESS [ 
54.368 s]
[INFO] Spark Project Shuffle Streaming Service  SUCCESS [ 
14.355 s]
[INFO] Spark Project Unsafe ... SUCCESS [ 
12.321 s]
[INFO] Spark Project Launcher . SUCCESS [ 
10.019 s]
[INFO] Spark Project Core . FAILURE [01:06 
min]
[INFO] 

[INFO] BUILD FAILURE
[INFO] 

[INFO] Total time:  03:16 min
[INFO] Finished at: 2023-08-17T23:30:36+08:00
[INFO] 

[ERROR] Failed to execute goal 
org.apache.maven.plugins:maven-surefire-plugin:3.1.2:test (default-test) on 
project spark-core_2.12:
[ERROR]
[ERROR] Please refer to 
/Users/yangjie01/SourceCode/git/spark-mine-12/core/target/surefire-reports for 
the individual test results.
[ERROR] Please refer to dump files (if any exist) [date].dump, 
[date]-jvmRun[N].dump and [date].dumpstream.
[ERROR] There was an error in the forked process
[ERROR] TestEngine with ID 'junit-vintage' failed to discover tests
[ERROR] org.apache.maven.surefire.booter.SurefireBooterForkException: There 
was an error in the forked process
[ERROR] TestEngine with ID 'junit-vintage' failed to discover tests
[ERROR] at 
org.apache.maven.plugin.surefire.booterclient.ForkStarter.fork(ForkStarter.java:628)
[ERROR] at 
org.apache.maven.plugin.surefire.booterclient.ForkStarter.run(ForkStarter.java:285)
[ERROR] at 
org.apache.maven.plugin.surefire.booterclient.ForkStarter.run(ForkStarter.java:250)
[ERROR] at 
org.apache.maven.plugin.surefire.AbstractSurefireMojo.executeProvider(AbstractSurefireMojo.java:1203)
[ERROR] at 
org.apache.maven.plugin.surefire.AbstractSurefireMojo.executeAfterPreconditionsChecked(AbstractSurefireMojo.java:1055)
[ERROR] at 
org.apache.maven.plugin.surefire.AbstractSurefireMojo.execute(AbstractSurefireMojo.java:871)
[ERROR] at 
org.apache.maven.plugin.DefaultBuildPluginManager.executeMojo(DefaultBuildPluginManager.java:137)
[ERROR] at 
org.apache.maven.lifecycle.internal.MojoExecutor.doExecute2(MojoExecutor.java:370)
[ERROR] at 
org.apache.maven.lifecycle.internal.MojoExecutor.doExecute(MojoExecutor.java:351)
[ERROR] at 
org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:215)
[ERROR] at 
org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:171)
[ERROR] at 
org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:163)
[ERROR] at 
org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject(LifecycleModuleBuilder.java:117)
[ERROR] at 

[spark] branch branch-3.3 updated: [SPARK-44813][INFRA] The Jira Python misses our assignee when it searches users again

2023-08-18 Thread srowen
This is an automated email from the ASF dual-hosted git repository.

srowen pushed a commit to branch branch-3.3
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.3 by this push:
 new 7e7c41bf100 [SPARK-44813][INFRA] The Jira Python misses our assignee 
when it searches users again
7e7c41bf100 is described below

commit 7e7c41bf1007ca05ffc3d818d34d75570d234a6d
Author: Kent Yao 
AuthorDate: Fri Aug 18 10:02:43 2023 -0500

[SPARK-44813][INFRA] The Jira Python misses our assignee when it searches 
users again

### What changes were proposed in this pull request?

This PR creates an alternative to the assign_issue function in 
jira.client.JIRA.

The original one has an issue: it searches users again and chooses the assignee from only 20 candidates. If none of them matches, it blindly picks the first one.

For example,

```python
>>> assignee = asf_jira.user("yao")
>>> "SPARK-44801"
'SPARK-44801'
>>> asf_jira.assign_issue(issue.key, assignee.name)
Traceback (most recent call last):
  File "", line 1, in 
NameError: name 'issue' is not defined
>>> asf_jira.assign_issue("SPARK-44801", assignee.name)
Traceback (most recent call last):
  File "", line 1, in 
  File 
"/Users/hzyaoqin/python/lib/python3.11/site-packages/jira/client.py", line 123, 
in wrapper
result = func(*arg_list, **kwargs)
 ^
  File 
"/Users/hzyaoqin/python/lib/python3.11/site-packages/jira/client.py", line 
1891, in assign_issue
self._session.put(url, data=json.dumps(payload))
  File 
"/Users/hzyaoqin/python/lib/python3.11/site-packages/requests/sessions.py", 
line 649, in put
return self.request("PUT", url, data=data, **kwargs)
   ^
  File 
"/Users/hzyaoqin/python/lib/python3.11/site-packages/jira/resilientsession.py", 
line 246, in request
elif raise_on_error(response, **processed_kwargs):
 
  File 
"/Users/hzyaoqin/python/lib/python3.11/site-packages/jira/resilientsession.py", 
line 71, in raise_on_error
raise JIRAError(
jira.exceptions.JIRAError: JiraError HTTP 400 url: 
https://issues.apache.org/jira/rest/api/latest/issue/SPARK-44801/assignee
response text = {"errorMessages":[],"errors":{"assignee":"User 
'airhot' cannot be assigned issues."}}
```

The Jira user id 'yao' fails to return my JIRA profile among the candidates (20 in total), so 'airhot', at the head of the list, replaces me as the assignee.

### Why are the changes needed?

bugfix for merge_spark_pr

### Does this PR introduce _any_ user-facing change?

no

### How was this patch tested?

test locally

```python
>>> def assign_issue(client: jira.client.JIRA, issue: int, assignee: str) 
-> bool:
... """Assign an issue to a user.
...
... Args:
... issue (Union[int, str]): the issue ID or key to assign
... assignee (str): the user to assign the issue to. None will set 
it to unassigned. -1 will set it to Automatic.
...
... Returns:
... bool
... """
... url = getattr(client, "_get_latest_url")(f"issue/{issue}/assignee")
... payload = {"name": assignee}
... getattr(client, "_session").put(url, data=json.dumps(payload))
... return True
...

>>>
>>> assign_issue(asf_jira, "SPARK-44801", "yao")
True
```

Closes #42496 from yaooqinn/SPARK-44813.

Authored-by: Kent Yao 
Signed-off-by: Sean Owen 
(cherry picked from commit 00255bc63b1a3bbe80bedc639b88d4a8e3f88f72)
Signed-off-by: Sean Owen 
---
 dev/merge_spark_pr.py | 15 ++-
 1 file changed, 14 insertions(+), 1 deletion(-)

diff --git a/dev/merge_spark_pr.py b/dev/merge_spark_pr.py
index e21a39a6881..8555abe9bd0 100755
--- a/dev/merge_spark_pr.py
+++ b/dev/merge_spark_pr.py
@@ -372,7 +372,7 @@ def choose_jira_assignee(issue, asf_jira):
 except BaseException:
 # assume it's a user id, and try to assign (might fail, we 
just prompt again)
 assignee = asf_jira.user(raw_assignee)
-asf_jira.assign_issue(issue.key, assignee.name)
+assign_issue(asf_jira, issue.key, assignee.name)
 return assignee
 except KeyboardInterrupt:
 raise
@@ -381,6 +381,19 @@ def choose_jira_assignee(issue, asf_jira):
 print("Error assigning JIRA, try again (or leave blank and fix 
manually)")
 
 
+def assign_issue(client: jira.client.JIRA, issue: int, assignee: str) -> bool:
+"""
+Assign an issue to a user, which is a shorthand for 
jira.client.JIRA.assign_issue.
+The original one has an issue that it will search users again and 

[spark] branch branch-3.4 updated: [SPARK-44813][INFRA] The Jira Python misses our assignee when it searches users again

2023-08-18 Thread srowen
This is an automated email from the ASF dual-hosted git repository.

srowen pushed a commit to branch branch-3.4
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.4 by this push:
 new 3c5e57d886b [SPARK-44813][INFRA] The Jira Python misses our assignee 
when it searches users again
3c5e57d886b is described below

commit 3c5e57d886b81808370353781bfce2b2ce20a473
Author: Kent Yao 
AuthorDate: Fri Aug 18 10:02:43 2023 -0500

[SPARK-44813][INFRA] The Jira Python misses our assignee when it searches 
users again

### What changes were proposed in this pull request?

This PR creates an alternative to the assign_issue function in 
jira.client.JIRA.

The original one has an issue: it searches users again and chooses the assignee from only 20 candidates. If none of them matches, it blindly picks the first one.

For example,

```python
>>> assignee = asf_jira.user("yao")
>>> "SPARK-44801"
'SPARK-44801'
>>> asf_jira.assign_issue(issue.key, assignee.name)
Traceback (most recent call last):
  File "", line 1, in 
NameError: name 'issue' is not defined
>>> asf_jira.assign_issue("SPARK-44801", assignee.name)
Traceback (most recent call last):
  File "", line 1, in 
  File 
"/Users/hzyaoqin/python/lib/python3.11/site-packages/jira/client.py", line 123, 
in wrapper
result = func(*arg_list, **kwargs)
 ^
  File 
"/Users/hzyaoqin/python/lib/python3.11/site-packages/jira/client.py", line 
1891, in assign_issue
self._session.put(url, data=json.dumps(payload))
  File 
"/Users/hzyaoqin/python/lib/python3.11/site-packages/requests/sessions.py", 
line 649, in put
return self.request("PUT", url, data=data, **kwargs)
   ^
  File 
"/Users/hzyaoqin/python/lib/python3.11/site-packages/jira/resilientsession.py", 
line 246, in request
elif raise_on_error(response, **processed_kwargs):
 
  File 
"/Users/hzyaoqin/python/lib/python3.11/site-packages/jira/resilientsession.py", 
line 71, in raise_on_error
raise JIRAError(
jira.exceptions.JIRAError: JiraError HTTP 400 url: 
https://issues.apache.org/jira/rest/api/latest/issue/SPARK-44801/assignee
response text = {"errorMessages":[],"errors":{"assignee":"User 
'airhot' cannot be assigned issues."}}
```

The Jira user id 'yao' fails to return my JIRA profile among the candidates (20 in total), so 'airhot', at the head of the list, replaces me as the assignee.

### Why are the changes needed?

bugfix for merge_spark_pr

### Does this PR introduce _any_ user-facing change?

no

### How was this patch tested?

test locally

```python
>>> def assign_issue(client: jira.client.JIRA, issue: int, assignee: str) 
-> bool:
... """Assign an issue to a user.
...
... Args:
... issue (Union[int, str]): the issue ID or key to assign
... assignee (str): the user to assign the issue to. None will set 
it to unassigned. -1 will set it to Automatic.
...
... Returns:
... bool
... """
... url = getattr(client, "_get_latest_url")(f"issue/{issue}/assignee")
... payload = {"name": assignee}
... getattr(client, "_session").put(url, data=json.dumps(payload))
... return True
...

>>>
>>> assign_issue(asf_jira, "SPARK-44801", "yao")
True
```

Closes #42496 from yaooqinn/SPARK-44813.

Authored-by: Kent Yao 
Signed-off-by: Sean Owen 
(cherry picked from commit 00255bc63b1a3bbe80bedc639b88d4a8e3f88f72)
Signed-off-by: Sean Owen 
---
 dev/merge_spark_pr.py | 15 ++-
 1 file changed, 14 insertions(+), 1 deletion(-)

diff --git a/dev/merge_spark_pr.py b/dev/merge_spark_pr.py
index 1621432c01c..8a5b6ebe8ef 100755
--- a/dev/merge_spark_pr.py
+++ b/dev/merge_spark_pr.py
@@ -372,7 +372,7 @@ def choose_jira_assignee(issue, asf_jira):
 except BaseException:
 # assume it's a user id, and try to assign (might fail, we 
just prompt again)
 assignee = asf_jira.user(raw_assignee)
-asf_jira.assign_issue(issue.key, assignee.name)
+assign_issue(asf_jira, issue.key, assignee.name)
 return assignee
 except KeyboardInterrupt:
 raise
@@ -381,6 +381,19 @@ def choose_jira_assignee(issue, asf_jira):
 print("Error assigning JIRA, try again (or leave blank and fix 
manually)")
 
 
+def assign_issue(client: jira.client.JIRA, issue: int, assignee: str) -> bool:
+"""
+Assign an issue to a user, which is a shorthand for 
jira.client.JIRA.assign_issue.
+The original one has an issue that it will search users again and 

[spark] branch branch-3.5 updated: [SPARK-44813][INFRA] The Jira Python misses our assignee when it searches users again

2023-08-18 Thread srowen
This is an automated email from the ASF dual-hosted git repository.

srowen pushed a commit to branch branch-3.5
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.5 by this push:
 new f7dd0a95727 [SPARK-44813][INFRA] The Jira Python misses our assignee 
when it searches users again
f7dd0a95727 is described below

commit f7dd0a95727259ff4b7a2f849798f8a93cf78b69
Author: Kent Yao 
AuthorDate: Fri Aug 18 10:02:43 2023 -0500

[SPARK-44813][INFRA] The Jira Python misses our assignee when it searches 
users again

### What changes were proposed in this pull request?

This PR creates an alternative to the assign_issue function in 
jira.client.JIRA.

The original one has an issue: it searches users again and chooses the assignee from only 20 candidates. If none of them matches, it blindly picks the first one.

For example,

```python
>>> assignee = asf_jira.user("yao")
>>> "SPARK-44801"
'SPARK-44801'
>>> asf_jira.assign_issue(issue.key, assignee.name)
Traceback (most recent call last):
  File "", line 1, in 
NameError: name 'issue' is not defined
>>> asf_jira.assign_issue("SPARK-44801", assignee.name)
Traceback (most recent call last):
  File "", line 1, in 
  File 
"/Users/hzyaoqin/python/lib/python3.11/site-packages/jira/client.py", line 123, 
in wrapper
result = func(*arg_list, **kwargs)
 ^
  File 
"/Users/hzyaoqin/python/lib/python3.11/site-packages/jira/client.py", line 
1891, in assign_issue
self._session.put(url, data=json.dumps(payload))
  File 
"/Users/hzyaoqin/python/lib/python3.11/site-packages/requests/sessions.py", 
line 649, in put
return self.request("PUT", url, data=data, **kwargs)
   ^
  File 
"/Users/hzyaoqin/python/lib/python3.11/site-packages/jira/resilientsession.py", 
line 246, in request
elif raise_on_error(response, **processed_kwargs):
 
  File 
"/Users/hzyaoqin/python/lib/python3.11/site-packages/jira/resilientsession.py", 
line 71, in raise_on_error
raise JIRAError(
jira.exceptions.JIRAError: JiraError HTTP 400 url: 
https://issues.apache.org/jira/rest/api/latest/issue/SPARK-44801/assignee
response text = {"errorMessages":[],"errors":{"assignee":"User 
'airhot' cannot be assigned issues."}}
```

The Jira user id 'yao' fails to return my JIRA profile among the candidates (20 in total), so 'airhot', at the head of the list, replaces me as the assignee.

### Why are the changes needed?

bugfix for merge_spark_pr

### Does this PR introduce _any_ user-facing change?

no

### How was this patch tested?

test locally

```python
>>> def assign_issue(client: jira.client.JIRA, issue: int, assignee: str) 
-> bool:
... """Assign an issue to a user.
...
... Args:
... issue (Union[int, str]): the issue ID or key to assign
... assignee (str): the user to assign the issue to. None will set 
it to unassigned. -1 will set it to Automatic.
...
... Returns:
... bool
... """
... url = getattr(client, "_get_latest_url")(f"issue/{issue}/assignee")
... payload = {"name": assignee}
... getattr(client, "_session").put(url, data=json.dumps(payload))
... return True
...

>>>
>>> assign_issue(asf_jira, "SPARK-44801", "yao")
True
```

Closes #42496 from yaooqinn/SPARK-44813.

Authored-by: Kent Yao 
Signed-off-by: Sean Owen 
(cherry picked from commit 00255bc63b1a3bbe80bedc639b88d4a8e3f88f72)
Signed-off-by: Sean Owen 
---
 dev/merge_spark_pr.py | 15 ++-
 1 file changed, 14 insertions(+), 1 deletion(-)

diff --git a/dev/merge_spark_pr.py b/dev/merge_spark_pr.py
index bc51b8af2eb..37488557fea 100755
--- a/dev/merge_spark_pr.py
+++ b/dev/merge_spark_pr.py
@@ -373,7 +373,7 @@ def choose_jira_assignee(issue, asf_jira):
 except BaseException:
 # assume it's a user id, and try to assign (might fail, we 
just prompt again)
 assignee = asf_jira.user(raw_assignee)
-asf_jira.assign_issue(issue.key, assignee.name)
+assign_issue(asf_jira, issue.key, assignee.name)
 return assignee
 except KeyboardInterrupt:
 raise
@@ -382,6 +382,19 @@ def choose_jira_assignee(issue, asf_jira):
 print("Error assigning JIRA, try again (or leave blank and fix 
manually)")
 
 
+def assign_issue(client: jira.client.JIRA, issue: int, assignee: str) -> bool:
+"""
+Assign an issue to a user, which is a shorthand for 
jira.client.JIRA.assign_issue.
+The original one has an issue that it will search users again and 

[spark] branch master updated: [SPARK-44813][INFRA] The Jira Python misses our assignee when it searches users again

2023-08-18 Thread srowen
This is an automated email from the ASF dual-hosted git repository.

srowen pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 00255bc63b1 [SPARK-44813][INFRA] The Jira Python misses our assignee 
when it searches users again
00255bc63b1 is described below

commit 00255bc63b1a3bbe80bedc639b88d4a8e3f88f72
Author: Kent Yao 
AuthorDate: Fri Aug 18 10:02:43 2023 -0500

[SPARK-44813][INFRA] The Jira Python misses our assignee when it searches 
users again

### What changes were proposed in this pull request?

This PR creates an alternative to the assign_issue function in 
jira.client.JIRA.

The original one has an issue: it searches users again and chooses the assignee from only 20 candidates. If none of them matches, it blindly picks the first one.

For example,

```python
>>> assignee = asf_jira.user("yao")
>>> "SPARK-44801"
'SPARK-44801'
>>> asf_jira.assign_issue(issue.key, assignee.name)
Traceback (most recent call last):
  File "", line 1, in 
NameError: name 'issue' is not defined
>>> asf_jira.assign_issue("SPARK-44801", assignee.name)
Traceback (most recent call last):
  File "", line 1, in 
  File 
"/Users/hzyaoqin/python/lib/python3.11/site-packages/jira/client.py", line 123, 
in wrapper
result = func(*arg_list, **kwargs)
 ^
  File 
"/Users/hzyaoqin/python/lib/python3.11/site-packages/jira/client.py", line 
1891, in assign_issue
self._session.put(url, data=json.dumps(payload))
  File 
"/Users/hzyaoqin/python/lib/python3.11/site-packages/requests/sessions.py", 
line 649, in put
return self.request("PUT", url, data=data, **kwargs)
   ^
  File 
"/Users/hzyaoqin/python/lib/python3.11/site-packages/jira/resilientsession.py", 
line 246, in request
elif raise_on_error(response, **processed_kwargs):
 
  File 
"/Users/hzyaoqin/python/lib/python3.11/site-packages/jira/resilientsession.py", 
line 71, in raise_on_error
raise JIRAError(
jira.exceptions.JIRAError: JiraError HTTP 400 url: 
https://issues.apache.org/jira/rest/api/latest/issue/SPARK-44801/assignee
response text = {"errorMessages":[],"errors":{"assignee":"User 
'airhot' cannot be assigned issues."}}
```

The Jira user id 'yao' fails to return my JIRA profile among the candidates (20 in total), so 'airhot', at the head of the list, replaces me as the assignee.

### Why are the changes needed?

bugfix for merge_spark_pr

### Does this PR introduce _any_ user-facing change?

no

### How was this patch tested?

test locally

```python
>>> def assign_issue(client: jira.client.JIRA, issue: int, assignee: str) 
-> bool:
... """Assign an issue to a user.
...
... Args:
... issue (Union[int, str]): the issue ID or key to assign
... assignee (str): the user to assign the issue to. None will set 
it to unassigned. -1 will set it to Automatic.
...
... Returns:
... bool
... """
... url = getattr(client, "_get_latest_url")(f"issue/{issue}/assignee")
... payload = {"name": assignee}
... getattr(client, "_session").put(url, data=json.dumps(payload))
... return True
...

>>>
>>> assign_issue(asf_jira, "SPARK-44801", "yao")
True
```

Closes #42496 from yaooqinn/SPARK-44813.

Authored-by: Kent Yao 
Signed-off-by: Sean Owen 
---
 dev/merge_spark_pr.py | 15 ++-
 1 file changed, 14 insertions(+), 1 deletion(-)

diff --git a/dev/merge_spark_pr.py b/dev/merge_spark_pr.py
index 27d0afe80ed..213798e5a1a 100755
--- a/dev/merge_spark_pr.py
+++ b/dev/merge_spark_pr.py
@@ -394,7 +394,7 @@ def choose_jira_assignee(issue, asf_jira):
 except BaseException:
 # assume it's a user id, and try to assign (might fail, we 
just prompt again)
 assignee = asf_jira.user(raw_assignee)
-asf_jira.assign_issue(issue.key, assignee.name)
+assign_issue(asf_jira, issue.key, assignee.name)
 return assignee
 except KeyboardInterrupt:
 raise
@@ -403,6 +403,19 @@ def choose_jira_assignee(issue, asf_jira):
 print("Error assigning JIRA, try again (or leave blank and fix 
manually)")
 
 
+def assign_issue(client: jira.client.JIRA, issue: int, assignee: str) -> bool:
+"""
+Assign an issue to a user, which is a shorthand for 
jira.client.JIRA.assign_issue.
+The original one has an issue that it will search users again and only 
choose the assignee
+from 20 candidates. If it's unmatched, it picks the head blindly. In our 
case, 

[spark] branch master updated: [SPARK-44289][FOLLOWUP] Cleanup doctest

2023-08-18 Thread ruifengz
This is an automated email from the ASF dual-hosted git repository.

ruifengz pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 68345e7f5d9 [SPARK-44289][FOLLOWUP] Cleanup doctest
68345e7f5d9 is described below

commit 68345e7f5d9be121d02f8b23e66eeecaeabc0778
Author: itholic 
AuthorDate: Fri Aug 18 20:38:58 2023 +0800

[SPARK-44289][FOLLOWUP] Cleanup doctest

### What changes were proposed in this pull request?

This is a follow-up for https://github.com/apache/spark/pull/42533 to remove a meaningless import.

### Why are the changes needed?

The `import numpy` statement is not needed for the doctest.

### Does this PR introduce _any_ user-facing change?

This impacts user-facing documents, so it is a form of user-facing cleanup.

### How was this patch tested?

The existing CI should pass

Closes #42552 from itholic/test-followup.

Authored-by: itholic 
Signed-off-by: Ruifeng Zheng 
---
 python/pyspark/pandas/groupby.py | 1 -
 1 file changed, 1 deletion(-)

diff --git a/python/pyspark/pandas/groupby.py b/python/pyspark/pandas/groupby.py
index f9d93299e3e..df671d71eec 100644
--- a/python/pyspark/pandas/groupby.py
+++ b/python/pyspark/pandas/groupby.py
@@ -4165,7 +4165,6 @@ class SeriesGroupBy(GroupBy[Series]):
 
 Examples
 
->>> import numpy as np
 >>> df = ps.DataFrame({'A': [1, 2, 2, 3, 3, 3],
 ...'B': [1, 1, 2, 3, 3, np.nan]},
 ...   columns=['A', 'B'])


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch branch-3.5 updated: [SPARK-44729][PYTHON][DOCS] Add canonical links to the PySpark docs page

2023-08-18 Thread ruifengz
This is an automated email from the ASF dual-hosted git repository.

ruifengz pushed a commit to branch branch-3.5
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.5 by this push:
 new 366e7412746 [SPARK-44729][PYTHON][DOCS] Add canonical links to the 
PySpark docs page
366e7412746 is described below

commit 366e74127460e0310981b56584658e1a99d0167e
Author: panbingkun 
AuthorDate: Fri Aug 18 20:36:48 2023 +0800

[SPARK-44729][PYTHON][DOCS] Add canonical links to the PySpark docs page

### What changes were proposed in this pull request?
This PR aims to add canonical links to the PySpark docs page.

### Why are the changes needed?
We should add the canonical link to the PySpark docs page https://spark.apache.org/docs/latest/api/python/index.html so that search engines can return the latest PySpark docs.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
Manual testing.
```
cd python/docs
make html
```
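
As a quick sanity check of the build output (a sketch under assumptions: the Sphinx build directory path below is a guess, not part of this patch), the generated pages should carry the canonical tag:

```python
# Hedged sketch: confirm the canonical <link> tag landed in the built HTML.
# The build path below is an assumption about the local Sphinx output layout.
from pathlib import Path

html = Path("python/docs/build/html/index.html").read_text()
assert 'rel="canonical"' in html
assert "https://spark.apache.org/docs/latest/api/python" in html
```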

Closes #42425 from panbingkun/SPARK-44729.

Authored-by: panbingkun 
Signed-off-by: Ruifeng Zheng 
(cherry picked from commit c88ced88af9a502a9e5352e31bb2963506ecb172)
Signed-off-by: Ruifeng Zheng 
---
 python/docs/source/conf.py | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/python/docs/source/conf.py b/python/docs/source/conf.py
index 8203a802053..38c331048e7 100644
--- a/python/docs/source/conf.py
+++ b/python/docs/source/conf.py
@@ -259,6 +259,8 @@ html_use_index = False
 # Output file base name for HTML help builder.
 htmlhelp_basename = 'pysparkdoc'
 
+# The base URL which points to the root of the HTML documentation.
+html_baseurl = 'https://spark.apache.org/docs/latest/api/python'
 
 # -- Options for LaTeX output -
 


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated: [SPARK-44729][PYTHON][DOCS] Add canonical links to the PySpark docs page

2023-08-18 Thread ruifengz
This is an automated email from the ASF dual-hosted git repository.

ruifengz pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new c88ced88af9 [SPARK-44729][PYTHON][DOCS] Add canonical links to the 
PySpark docs page
c88ced88af9 is described below

commit c88ced88af9a502a9e5352e31bb2963506ecb172
Author: panbingkun 
AuthorDate: Fri Aug 18 20:36:48 2023 +0800

[SPARK-44729][PYTHON][DOCS] Add canonical links to the PySpark docs page

### What changes were proposed in this pull request?
This PR aims to add canonical links to the PySpark docs page.

### Why are the changes needed?
We should add the canonical link to the PySpark docs page https://spark.apache.org/docs/latest/api/python/index.html so that search engines can return the latest PySpark docs.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
Manual testing.
```
cd python/docs
make html
```

Closes #42425 from panbingkun/SPARK-44729.

Authored-by: panbingkun 
Signed-off-by: Ruifeng Zheng 
---
 python/docs/source/conf.py | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/python/docs/source/conf.py b/python/docs/source/conf.py
index 8203a802053..38c331048e7 100644
--- a/python/docs/source/conf.py
+++ b/python/docs/source/conf.py
@@ -259,6 +259,8 @@ html_use_index = False
 # Output file base name for HTML help builder.
 htmlhelp_basename = 'pysparkdoc'
 
+# The base URL which points to the root of the HTML documentation.
+html_baseurl = 'https://spark.apache.org/docs/latest/api/python'
 
 # -- Options for LaTeX output -
 


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch branch-3.5 updated: [SPARK-44740][CONNECT][FOLLOW] Fix metadata values for Artifacts

2023-08-18 Thread ruifengz
This is an automated email from the ASF dual-hosted git repository.

ruifengz pushed a commit to branch branch-3.5
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.5 by this push:
 new 94ccbf2fbd8 [SPARK-44740][CONNECT][FOLLOW] Fix metadata values for 
Artifacts
94ccbf2fbd8 is described below

commit 94ccbf2fbd898fa0b6ada231032c1d78178317b7
Author: Martin Grund 
AuthorDate: Fri Aug 18 20:30:57 2023 +0800

[SPARK-44740][CONNECT][FOLLOW] Fix metadata values for Artifacts

### What changes were proposed in this pull request?
This is a follow-up for a previous fix where we did not properly propagate the metadata from the main client into the dependent stubs.

### Why are the changes needed?
Compatibility.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Existing UT

Closes #42537 from grundprinzip/spark-44740-follow.

Authored-by: Martin Grund 
Signed-off-by: Ruifeng Zheng 
(cherry picked from commit b37daf5695e59ef2f29c6e084230ac89153cca26)
Signed-off-by: Ruifeng Zheng 
---
 python/pyspark/sql/connect/client/artifact.py | 17 +
 python/pyspark/sql/connect/client/core.py |  4 +++-
 2 files changed, 16 insertions(+), 5 deletions(-)

diff --git a/python/pyspark/sql/connect/client/artifact.py 
b/python/pyspark/sql/connect/client/artifact.py
index cad030e0d5b..c858768ccbf 100644
--- a/python/pyspark/sql/connect/client/artifact.py
+++ b/python/pyspark/sql/connect/client/artifact.py
@@ -25,7 +25,7 @@ import sys
 import os
 import zlib
 from itertools import chain
-from typing import List, Iterable, BinaryIO, Iterator, Optional
+from typing import List, Iterable, BinaryIO, Iterator, Optional, Tuple
 import abc
 from pathlib import Path
 from urllib.parse import urlparse
@@ -162,12 +162,19 @@ class ArtifactManager:
 # https://github.com/grpc/grpc.github.io/issues/371.
 CHUNK_SIZE: int = 32 * 1024
 
-def __init__(self, user_id: Optional[str], session_id: str, channel: 
grpc.Channel):
+def __init__(
+self,
+user_id: Optional[str],
+session_id: str,
+channel: grpc.Channel,
+metadata: Iterable[Tuple[str, str]],
+):
 self._user_context = proto.UserContext()
 if user_id is not None:
 self._user_context.user_id = user_id
 self._stub = grpc_lib.SparkConnectServiceStub(channel)
 self._session_id = session_id
+self._metadata = metadata
 
 def _parse_artifacts(
 self, path_or_uri: str, pyfile: bool, archive: bool, file: bool
@@ -246,7 +253,7 @@ class ArtifactManager:
 self, requests: Iterator[proto.AddArtifactsRequest]
 ) -> proto.AddArtifactsResponse:
 """Separated for the testing purpose."""
-return self._stub.AddArtifacts(requests)
+return self._stub.AddArtifacts(requests, metadata=self._metadata)
 
 def _request_add_artifacts(self, requests: 
Iterator[proto.AddArtifactsRequest]) -> None:
 response: proto.AddArtifactsResponse = 
self._retrieve_responses(requests)
@@ -382,7 +389,9 @@ class ArtifactManager:
 request = proto.ArtifactStatusesRequest(
 user_context=self._user_context, session_id=self._session_id, 
names=[artifactName]
 )
-resp: proto.ArtifactStatusesResponse = 
self._stub.ArtifactStatus(request)
+resp: proto.ArtifactStatusesResponse = self._stub.ArtifactStatus(
+request, metadata=self._metadata
+)
 status = resp.statuses.get(artifactName)
 return status.exists if status is not None else False
 
diff --git a/python/pyspark/sql/connect/client/core.py 
b/python/pyspark/sql/connect/client/core.py
index c2889c10e41..4b8a2348adc 100644
--- a/python/pyspark/sql/connect/client/core.py
+++ b/python/pyspark/sql/connect/client/core.py
@@ -672,7 +672,9 @@ class SparkConnectClient(object):
 self._channel = self._builder.toChannel()
 self._closed = False
 self._stub = grpc_lib.SparkConnectServiceStub(self._channel)
-self._artifact_manager = ArtifactManager(self._user_id, 
self._session_id, self._channel)
+self._artifact_manager = ArtifactManager(
+self._user_id, self._session_id, self._channel, 
self._builder.metadata()
+)
 self._use_reattachable_execute = use_reattachable_execute
 # Configure logging for the SparkConnect client.
 


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated: [SPARK-44740][CONNECT][FOLLOW] Fix metadata values for Artifacts

2023-08-18 Thread ruifengz
This is an automated email from the ASF dual-hosted git repository.

ruifengz pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new b37daf5695e [SPARK-44740][CONNECT][FOLLOW] Fix metadata values for 
Artifacts
b37daf5695e is described below

commit b37daf5695e59ef2f29c6e084230ac89153cca26
Author: Martin Grund 
AuthorDate: Fri Aug 18 20:30:57 2023 +0800

[SPARK-44740][CONNECT][FOLLOW] Fix metadata values for Artifacts

### What changes were proposed in this pull request?
This is a follow-up for a previous fix where we did not properly propagate the metadata from the main client into the dependent stubs.

### Why are the changes needed?
Compatibility.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Existing UT

Closes #42537 from grundprinzip/spark-44740-follow.

Authored-by: Martin Grund 
Signed-off-by: Ruifeng Zheng 
---
 python/pyspark/sql/connect/client/artifact.py | 17 +
 python/pyspark/sql/connect/client/core.py |  4 +++-
 2 files changed, 16 insertions(+), 5 deletions(-)

diff --git a/python/pyspark/sql/connect/client/artifact.py 
b/python/pyspark/sql/connect/client/artifact.py
index cad030e0d5b..c858768ccbf 100644
--- a/python/pyspark/sql/connect/client/artifact.py
+++ b/python/pyspark/sql/connect/client/artifact.py
@@ -25,7 +25,7 @@ import sys
 import os
 import zlib
 from itertools import chain
-from typing import List, Iterable, BinaryIO, Iterator, Optional
+from typing import List, Iterable, BinaryIO, Iterator, Optional, Tuple
 import abc
 from pathlib import Path
 from urllib.parse import urlparse
@@ -162,12 +162,19 @@ class ArtifactManager:
 # https://github.com/grpc/grpc.github.io/issues/371.
 CHUNK_SIZE: int = 32 * 1024
 
-def __init__(self, user_id: Optional[str], session_id: str, channel: 
grpc.Channel):
+def __init__(
+self,
+user_id: Optional[str],
+session_id: str,
+channel: grpc.Channel,
+metadata: Iterable[Tuple[str, str]],
+):
 self._user_context = proto.UserContext()
 if user_id is not None:
 self._user_context.user_id = user_id
 self._stub = grpc_lib.SparkConnectServiceStub(channel)
 self._session_id = session_id
+self._metadata = metadata
 
 def _parse_artifacts(
 self, path_or_uri: str, pyfile: bool, archive: bool, file: bool
@@ -246,7 +253,7 @@ class ArtifactManager:
 self, requests: Iterator[proto.AddArtifactsRequest]
 ) -> proto.AddArtifactsResponse:
 """Separated for the testing purpose."""
-return self._stub.AddArtifacts(requests)
+return self._stub.AddArtifacts(requests, metadata=self._metadata)
 
 def _request_add_artifacts(self, requests: 
Iterator[proto.AddArtifactsRequest]) -> None:
 response: proto.AddArtifactsResponse = 
self._retrieve_responses(requests)
@@ -382,7 +389,9 @@ class ArtifactManager:
 request = proto.ArtifactStatusesRequest(
 user_context=self._user_context, session_id=self._session_id, 
names=[artifactName]
 )
-resp: proto.ArtifactStatusesResponse = 
self._stub.ArtifactStatus(request)
+resp: proto.ArtifactStatusesResponse = self._stub.ArtifactStatus(
+request, metadata=self._metadata
+)
 status = resp.statuses.get(artifactName)
 return status.exists if status is not None else False
 
diff --git a/python/pyspark/sql/connect/client/core.py 
b/python/pyspark/sql/connect/client/core.py
index 02afe2c50e7..1e439b8c0f6 100644
--- a/python/pyspark/sql/connect/client/core.py
+++ b/python/pyspark/sql/connect/client/core.py
@@ -672,7 +672,9 @@ class SparkConnectClient(object):
 self._channel = self._builder.toChannel()
 self._closed = False
 self._stub = grpc_lib.SparkConnectServiceStub(self._channel)
-self._artifact_manager = ArtifactManager(self._user_id, 
self._session_id, self._channel)
+self._artifact_manager = ArtifactManager(
+self._user_id, self._session_id, self._channel, 
self._builder.metadata()
+)
 self._use_reattachable_execute = use_reattachable_execute
 # Configure logging for the SparkConnect client.
 


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[GitHub] [spark-website] yaooqinn commented on pull request #472: Add note on generative tooling to developer tools

2023-08-18 Thread via GitHub


yaooqinn commented on PR #472:
URL: https://github.com/apache/spark-website/pull/472#issuecomment-1683705067

LGTM


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch branch-3.5 updated: [SPARK-44853][PYTHON][DOCS] Refine docstring of DataFrame.columns property

2023-08-18 Thread ruifengz
This is an automated email from the ASF dual-hosted git repository.

ruifengz pushed a commit to branch branch-3.5
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.5 by this push:
 new 3f1a0a50452 [SPARK-44853][PYTHON][DOCS] Refine docstring of 
DataFrame.columns property
3f1a0a50452 is described below

commit 3f1a0a504524b52d499e4b428617b43ff49f9d3b
Author: allisonwang-db 
AuthorDate: Fri Aug 18 17:31:20 2023 +0800

[SPARK-44853][PYTHON][DOCS] Refine docstring of DataFrame.columns property

### What changes were proposed in this pull request?

This PR refines the docstring of `df.columns` and adds more examples.

### Why are the changes needed?

To make PySpark documentation better.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

doctest

Closes #42540 from allisonwang-db/spark-44853-refine-df-columns.

Authored-by: allisonwang-db 
Signed-off-by: Ruifeng Zheng 
(cherry picked from commit fc0be7ebace3aaf22954f1311532db5c33f4d8fa)
Signed-off-by: Ruifeng Zheng 
---
 python/pyspark/sql/dataframe.py | 62 ++---
 1 file changed, 58 insertions(+), 4 deletions(-)

diff --git a/python/pyspark/sql/dataframe.py b/python/pyspark/sql/dataframe.py
index 932c29910bb..03aaee8f2ec 100644
--- a/python/pyspark/sql/dataframe.py
+++ b/python/pyspark/sql/dataframe.py
@@ -2084,7 +2084,10 @@ class DataFrame(PandasMapOpsMixin, 
PandasConversionMixin):
 
 @property
 def columns(self) -> List[str]:
-"""Returns all column names as a list.
+"""
+Retrieves the names of all columns in the :class:`DataFrame` as a list.
+
+The order of the column names in the list reflects their order in the 
DataFrame.
 
 .. versionadded:: 1.3.0
 
@@ -2094,14 +2097,65 @@ class DataFrame(PandasMapOpsMixin, 
PandasConversionMixin):
 Returns
 ---
 list
-List of column names.
+List of column names in the DataFrame.
 
 Examples
 
+Example 1: Retrieve column names of a DataFrame
+
 >>> df = spark.createDataFrame(
-... [(14, "Tom"), (23, "Alice"), (16, "Bob")], ["age", "name"])
+... [(14, "Tom", "CA"), (23, "Alice", "NY"), (16, "Bob", "TX")],
+... ["age", "name", "state"]
+... )
 >>> df.columns
-['age', 'name']
+['age', 'name', 'state']
+
+Example 2: Using column names to project specific columns
+
+>>> selected_cols = [col for col in df.columns if col != "age"]
+>>> df.select(selected_cols).show()
++-+-+
+| name|state|
++-+-+
+|  Tom|   CA|
+|Alice|   NY|
+|  Bob|   TX|
++-+-+
+
+Example 3: Checking if a specific column exists in a DataFrame
+
+>>> "state" in df.columns
+True
+>>> "salary" in df.columns
+False
+
+Example 4: Iterating over columns to apply a transformation
+
+>>> import pyspark.sql.functions as f
+>>> for col_name in df.columns:
+... df = df.withColumn(col_name, f.upper(f.col(col_name)))
+>>> df.show()
++---+-+-+
+|age| name|state|
++---+-+-+
+| 14|  TOM|   CA|
+| 23|ALICE|   NY|
+| 16|  BOB|   TX|
++---+-+-+
+
+Example 5: Renaming columns and checking the updated column names
+
+>>> df = df.withColumnRenamed("name", "first_name")
+>>> df.columns
+['age', 'first_name', 'state']
+
+Example 6: Using the `columns` property to ensure two DataFrames have 
the
+same columns before a union
+
+>>> df2 = spark.createDataFrame(
+... [(30, "Eve", "FL"), (40, "Sam", "WA")], ["age", "name", 
"location"])
+>>> df.columns == df2.columns
+False
 """
 return [f.name for f in self.schema.fields]
 


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated: [SPARK-44853][PYTHON][DOCS] Refine docstring of DataFrame.columns property

2023-08-18 Thread ruifengz
This is an automated email from the ASF dual-hosted git repository.

ruifengz pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new fc0be7ebace [SPARK-44853][PYTHON][DOCS] Refine docstring of 
DataFrame.columns property
fc0be7ebace is described below

commit fc0be7ebace3aaf22954f1311532db5c33f4d8fa
Author: allisonwang-db 
AuthorDate: Fri Aug 18 17:31:20 2023 +0800

[SPARK-44853][PYTHON][DOCS] Refine docstring of DataFrame.columns property

### What changes were proposed in this pull request?

This PR refines the docstring of `df.columns` and adds more examples.

### Why are the changes needed?

To make PySpark documentation better.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

doctest

Closes #42540 from allisonwang-db/spark-44853-refine-df-columns.

Authored-by: allisonwang-db 
Signed-off-by: Ruifeng Zheng 
---
 python/pyspark/sql/dataframe.py | 62 ++---
 1 file changed, 58 insertions(+), 4 deletions(-)

diff --git a/python/pyspark/sql/dataframe.py b/python/pyspark/sql/dataframe.py
index 932c29910bb..03aaee8f2ec 100644
--- a/python/pyspark/sql/dataframe.py
+++ b/python/pyspark/sql/dataframe.py
@@ -2084,7 +2084,10 @@ class DataFrame(PandasMapOpsMixin, 
PandasConversionMixin):
 
 @property
 def columns(self) -> List[str]:
-"""Returns all column names as a list.
+"""
+Retrieves the names of all columns in the :class:`DataFrame` as a list.
+
+The order of the column names in the list reflects their order in the 
DataFrame.
 
 .. versionadded:: 1.3.0
 
@@ -2094,14 +2097,65 @@ class DataFrame(PandasMapOpsMixin, 
PandasConversionMixin):
 Returns
 ---
 list
-List of column names.
+List of column names in the DataFrame.
 
 Examples
 
+Example 1: Retrieve column names of a DataFrame
+
 >>> df = spark.createDataFrame(
-... [(14, "Tom"), (23, "Alice"), (16, "Bob")], ["age", "name"])
+... [(14, "Tom", "CA"), (23, "Alice", "NY"), (16, "Bob", "TX")],
+... ["age", "name", "state"]
+... )
 >>> df.columns
-['age', 'name']
+['age', 'name', 'state']
+
+Example 2: Using column names to project specific columns
+
+>>> selected_cols = [col for col in df.columns if col != "age"]
+>>> df.select(selected_cols).show()
++-+-+
+| name|state|
++-+-+
+|  Tom|   CA|
+|Alice|   NY|
+|  Bob|   TX|
++-+-+
+
+Example 3: Checking if a specific column exists in a DataFrame
+
+>>> "state" in df.columns
+True
+>>> "salary" in df.columns
+False
+
+Example 4: Iterating over columns to apply a transformation
+
+>>> import pyspark.sql.functions as f
+>>> for col_name in df.columns:
+... df = df.withColumn(col_name, f.upper(f.col(col_name)))
+>>> df.show()
++---+-+-+
+|age| name|state|
++---+-+-+
+| 14|  TOM|   CA|
+| 23|ALICE|   NY|
+| 16|  BOB|   TX|
++---+-+-+
+
+Example 5: Renaming columns and checking the updated column names
+
+>>> df = df.withColumnRenamed("name", "first_name")
+>>> df.columns
+['age', 'first_name', 'state']
+
+Example 6: Using the `columns` property to ensure two DataFrames have 
the
+same columns before a union
+
+>>> df2 = spark.createDataFrame(
+... [(30, "Eve", "FL"), (40, "Sam", "WA")], ["age", "name", 
"location"])
+>>> df.columns == df2.columns
+False
 """
 return [f.name for f in self.schema.fields]
 


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] tag v3.5.0-rc2 created (now 010c4a6a05f)

2023-08-18 Thread liyuanjian
This is an automated email from the ASF dual-hosted git repository.

liyuanjian pushed a change to tag v3.5.0-rc2
in repository https://gitbox.apache.org/repos/asf/spark.git


  at 010c4a6a05f (commit)
No new revisions were added by this update.


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] 01/02: Preparing Spark release v3.5.0-rc2

2023-08-18 Thread liyuanjian
This is an automated email from the ASF dual-hosted git repository.

liyuanjian pushed a commit to branch branch-3.5
in repository https://gitbox.apache.org/repos/asf/spark.git

commit 010c4a6a05ff290bec80c12a00cd1bdaed849242
Author: Yuanjian Li 
AuthorDate: Fri Aug 18 08:37:59 2023 +

Preparing Spark release v3.5.0-rc2
---
 R/pkg/DESCRIPTION  | 2 +-
 assembly/pom.xml   | 2 +-
 common/kvstore/pom.xml | 2 +-
 common/network-common/pom.xml  | 2 +-
 common/network-shuffle/pom.xml | 2 +-
 common/network-yarn/pom.xml| 2 +-
 common/sketch/pom.xml  | 2 +-
 common/tags/pom.xml| 2 +-
 common/unsafe/pom.xml  | 2 +-
 common/utils/pom.xml   | 2 +-
 connector/avro/pom.xml | 2 +-
 connector/connect/client/jvm/pom.xml   | 2 +-
 connector/connect/common/pom.xml   | 2 +-
 connector/connect/server/pom.xml   | 2 +-
 connector/docker-integration-tests/pom.xml | 2 +-
 connector/kafka-0-10-assembly/pom.xml  | 2 +-
 connector/kafka-0-10-sql/pom.xml   | 2 +-
 connector/kafka-0-10-token-provider/pom.xml| 2 +-
 connector/kafka-0-10/pom.xml   | 2 +-
 connector/kinesis-asl-assembly/pom.xml | 2 +-
 connector/kinesis-asl/pom.xml  | 2 +-
 connector/protobuf/pom.xml | 2 +-
 connector/spark-ganglia-lgpl/pom.xml   | 2 +-
 core/pom.xml   | 2 +-
 docs/_config.yml   | 6 +++---
 examples/pom.xml   | 2 +-
 graphx/pom.xml | 2 +-
 hadoop-cloud/pom.xml   | 2 +-
 launcher/pom.xml   | 2 +-
 mllib-local/pom.xml| 2 +-
 mllib/pom.xml  | 2 +-
 pom.xml| 2 +-
 python/pyspark/version.py  | 2 +-
 repl/pom.xml   | 2 +-
 resource-managers/kubernetes/core/pom.xml  | 2 +-
 resource-managers/kubernetes/integration-tests/pom.xml | 2 +-
 resource-managers/mesos/pom.xml| 2 +-
 resource-managers/yarn/pom.xml | 2 +-
 sql/api/pom.xml| 2 +-
 sql/catalyst/pom.xml   | 2 +-
 sql/core/pom.xml   | 2 +-
 sql/hive-thriftserver/pom.xml  | 2 +-
 sql/hive/pom.xml   | 2 +-
 streaming/pom.xml  | 2 +-
 tools/pom.xml  | 2 +-
 45 files changed, 47 insertions(+), 47 deletions(-)

diff --git a/R/pkg/DESCRIPTION b/R/pkg/DESCRIPTION
index 66faa8031c4..1c093a4a980 100644
--- a/R/pkg/DESCRIPTION
+++ b/R/pkg/DESCRIPTION
@@ -1,6 +1,6 @@
 Package: SparkR
 Type: Package
-Version: 3.5.1
+Version: 3.5.0
 Title: R Front End for 'Apache Spark'
 Description: Provides an R Front end for 'Apache Spark' 
.
 Authors@R:
diff --git a/assembly/pom.xml b/assembly/pom.xml
index d97f724f0b5..a389e3fe9a5 100644
--- a/assembly/pom.xml
+++ b/assembly/pom.xml
@@ -21,7 +21,7 @@
   
 org.apache.spark
 spark-parent_2.12
-3.5.1-SNAPSHOT
+3.5.0
 ../pom.xml
   
 
diff --git a/common/kvstore/pom.xml b/common/kvstore/pom.xml
index 1b1a8d0066f..ce180f49ff1 100644
--- a/common/kvstore/pom.xml
+++ b/common/kvstore/pom.xml
@@ -22,7 +22,7 @@
   
 org.apache.spark
 spark-parent_2.12
-3.5.1-SNAPSHOT
+3.5.0
 ../../pom.xml
   
 
diff --git a/common/network-common/pom.xml b/common/network-common/pom.xml
index 54c10a05eed..8da48076a43 100644
--- a/common/network-common/pom.xml
+++ b/common/network-common/pom.xml
@@ -22,7 +22,7 @@
   
 org.apache.spark
 spark-parent_2.12
-3.5.1-SNAPSHOT
+3.5.0
 ../../pom.xml
   
 
diff --git a/common/network-shuffle/pom.xml b/common/network-shuffle/pom.xml
index 92bf5bc0785..48e64d21a58 100644
--- a/common/network-shuffle/pom.xml
+++ b/common/network-shuffle/pom.xml
@@ -22,7 +22,7 @@
   
 org.apache.spark
 spark-parent_2.12
-3.5.1-SNAPSHOT
+3.5.0
 ../../pom.xml
   
 
diff --git a/common/network-yarn/pom.xml b/common/network-yarn/pom.xml
index 3003927e713..2bbacbe71a4 100644
--- a/common/network-yarn/pom.xml
+++ b/common/network-yarn/pom.xml
@@ -22,7 +22,7 @@
   
 org.apache.spark
 spark-parent_2.12
-3.5.1-SNAPSHOT

[spark] 02/02: Preparing development version 3.5.1-SNAPSHOT

2023-08-18 Thread liyuanjian
This is an automated email from the ASF dual-hosted git repository.

liyuanjian pushed a commit to branch branch-3.5
in repository https://gitbox.apache.org/repos/asf/spark.git

commit 2682cfbf36e04d25a99d1b5b3db0426eb66955d0
Author: Yuanjian Li 
AuthorDate: Fri Aug 18 08:38:03 2023 +

Preparing development version 3.5.1-SNAPSHOT
---
 R/pkg/DESCRIPTION  | 2 +-
 assembly/pom.xml   | 2 +-
 common/kvstore/pom.xml | 2 +-
 common/network-common/pom.xml  | 2 +-
 common/network-shuffle/pom.xml | 2 +-
 common/network-yarn/pom.xml| 2 +-
 common/sketch/pom.xml  | 2 +-
 common/tags/pom.xml| 2 +-
 common/unsafe/pom.xml  | 2 +-
 common/utils/pom.xml   | 2 +-
 connector/avro/pom.xml | 2 +-
 connector/connect/client/jvm/pom.xml   | 2 +-
 connector/connect/common/pom.xml   | 2 +-
 connector/connect/server/pom.xml   | 2 +-
 connector/docker-integration-tests/pom.xml | 2 +-
 connector/kafka-0-10-assembly/pom.xml  | 2 +-
 connector/kafka-0-10-sql/pom.xml   | 2 +-
 connector/kafka-0-10-token-provider/pom.xml| 2 +-
 connector/kafka-0-10/pom.xml   | 2 +-
 connector/kinesis-asl-assembly/pom.xml | 2 +-
 connector/kinesis-asl/pom.xml  | 2 +-
 connector/protobuf/pom.xml | 2 +-
 connector/spark-ganglia-lgpl/pom.xml   | 2 +-
 core/pom.xml   | 2 +-
 docs/_config.yml   | 6 +++---
 examples/pom.xml   | 2 +-
 graphx/pom.xml | 2 +-
 hadoop-cloud/pom.xml   | 2 +-
 launcher/pom.xml   | 2 +-
 mllib-local/pom.xml| 2 +-
 mllib/pom.xml  | 2 +-
 pom.xml| 2 +-
 python/pyspark/version.py  | 2 +-
 repl/pom.xml   | 2 +-
 resource-managers/kubernetes/core/pom.xml  | 2 +-
 resource-managers/kubernetes/integration-tests/pom.xml | 2 +-
 resource-managers/mesos/pom.xml| 2 +-
 resource-managers/yarn/pom.xml | 2 +-
 sql/api/pom.xml| 2 +-
 sql/catalyst/pom.xml   | 2 +-
 sql/core/pom.xml   | 2 +-
 sql/hive-thriftserver/pom.xml  | 2 +-
 sql/hive/pom.xml   | 2 +-
 streaming/pom.xml  | 2 +-
 tools/pom.xml  | 2 +-
 45 files changed, 47 insertions(+), 47 deletions(-)

diff --git a/R/pkg/DESCRIPTION b/R/pkg/DESCRIPTION
index 1c093a4a980..66faa8031c4 100644
--- a/R/pkg/DESCRIPTION
+++ b/R/pkg/DESCRIPTION
@@ -1,6 +1,6 @@
 Package: SparkR
 Type: Package
-Version: 3.5.0
+Version: 3.5.1
 Title: R Front End for 'Apache Spark'
 Description: Provides an R Front end for 'Apache Spark' 
.
 Authors@R:
diff --git a/assembly/pom.xml b/assembly/pom.xml
index a389e3fe9a5..d97f724f0b5 100644
--- a/assembly/pom.xml
+++ b/assembly/pom.xml
@@ -21,7 +21,7 @@
   
 org.apache.spark
 spark-parent_2.12
-3.5.0
+3.5.1-SNAPSHOT
 ../pom.xml
   
 
diff --git a/common/kvstore/pom.xml b/common/kvstore/pom.xml
index ce180f49ff1..1b1a8d0066f 100644
--- a/common/kvstore/pom.xml
+++ b/common/kvstore/pom.xml
@@ -22,7 +22,7 @@
   
 org.apache.spark
 spark-parent_2.12
-3.5.0
+3.5.1-SNAPSHOT
 ../../pom.xml
   
 
diff --git a/common/network-common/pom.xml b/common/network-common/pom.xml
index 8da48076a43..54c10a05eed 100644
--- a/common/network-common/pom.xml
+++ b/common/network-common/pom.xml
@@ -22,7 +22,7 @@
   <parent>
     <groupId>org.apache.spark</groupId>
     <artifactId>spark-parent_2.12</artifactId>
-    <version>3.5.0</version>
+    <version>3.5.1-SNAPSHOT</version>
     <relativePath>../../pom.xml</relativePath>
   </parent>

diff --git a/common/network-shuffle/pom.xml b/common/network-shuffle/pom.xml
index 48e64d21a58..92bf5bc0785 100644
--- a/common/network-shuffle/pom.xml
+++ b/common/network-shuffle/pom.xml
@@ -22,7 +22,7 @@
   <parent>
     <groupId>org.apache.spark</groupId>
     <artifactId>spark-parent_2.12</artifactId>
-    <version>3.5.0</version>
+    <version>3.5.1-SNAPSHOT</version>
     <relativePath>../../pom.xml</relativePath>
   </parent>

diff --git a/common/network-yarn/pom.xml b/common/network-yarn/pom.xml
index 2bbacbe71a4..3003927e713 100644
--- a/common/network-yarn/pom.xml
+++ b/common/network-yarn/pom.xml
@@ -22,7 +22,7 @@
   <parent>
     <groupId>org.apache.spark</groupId>
     <artifactId>spark-parent_2.12</artifactId>
-    <version>3.5.0</version>

[spark] branch branch-3.5 updated (41e7234b848 -> 2682cfbf36e)

2023-08-18 Thread liyuanjian
This is an automated email from the ASF dual-hosted git repository.

liyuanjian pushed a change to branch branch-3.5
in repository https://gitbox.apache.org/repos/asf/spark.git


from 41e7234b848 [SPARK-44834][PYTHON][SQL][TESTS][FOLLOW-UP] Update the 
analyzer results of the udtf tests
 new 010c4a6a05f Preparing Spark release v3.5.0-rc2
 new 2682cfbf36e Preparing development version 3.5.1-SNAPSHOT

The 2 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails.  The revisions
listed as "add" were already present in the repository and have only
been added to this reference.
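
The two commits follow the usual tag-and-roll-back pattern for a release candidate: pin the POMs to 3.5.0, tag the RC, then immediately restore a SNAPSHOT version on the branch. A loose, illustrative sketch of that sequence in Python (Spark's actual automation is dev/create-release/release-tag.sh):

# Illustrative only: the two-commit RC sequence behind this push.
import subprocess

def git(*args: str) -> None:
    subprocess.run(["git", *args], check=True)

# 1. Commit the POM pin to 3.5.0 (see the bump sketch above) and tag the RC.
git("commit", "-am", "Preparing Spark release v3.5.0-rc2")
git("tag", "v3.5.0-rc2")
# 2. Immediately move the branch back to a development version.
git("commit", "-am", "Preparing development version 3.5.1-SNAPSHOT")
git("push", "origin", "branch-3.5", "v3.5.0-rc2")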


Summary of changes:


[spark] branch branch-3.5 updated: [SPARK-44834][PYTHON][SQL][TESTS][FOLLOW-UP] Update the analyzer results of the udtf tests

2023-08-18 Thread yumwang
This is an automated email from the ASF dual-hosted git repository.

yumwang pushed a commit to branch branch-3.5
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.5 by this push:
 new 41e7234b848 [SPARK-44834][PYTHON][SQL][TESTS][FOLLOW-UP] Update the 
analyzer results of the udtf tests
41e7234b848 is described below

commit 41e7234b848908afa12a8cc4a319b214a461c12d
Author: allisonwang-db 
AuthorDate: Fri Aug 18 16:31:42 2023 +0800

[SPARK-44834][PYTHON][SQL][TESTS][FOLLOW-UP] Update the analyzer results of 
the udtf tests

### What changes were proposed in this pull request?

This is a follow-up to https://github.com/apache/spark/pull/42517. The
analyzer results for the UDTF tests need to be regenerated now that
https://github.com/apache/spark/pull/42519 has been merged. This change also
updates PythonUDTFSuite after https://github.com/apache/spark/pull/42520.

### Why are the changes needed?

To fix test failures

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

Test-only change

Closes #42543 from allisonwang-db/spark-44834-fix.

Authored-by: allisonwang-db 
Signed-off-by: Yuming Wang 
(cherry picked from commit bb41cd889efdd0602385e70b4c8f1c93740db332)
Signed-off-by: Yuming Wang 
---
 .../sql-tests/analyzer-results/udtf/udtf.sql.out   | 51 --
 .../sql/execution/python/PythonUDTFSuite.scala | 17 +---
 2 files changed, 10 insertions(+), 58 deletions(-)
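
For context on what these golden files exercise: the udtf(...) being queried is a Python user-defined table function registered under that name by the test harness. A minimal sketch of such a function, mirroring the shape (not the exact body) of the suite's TestUDTF:

from pyspark.sql import SparkSession
from pyspark.sql.functions import udtf

spark = SparkSession.builder.getOrCreate()

@udtf(returnType="x: int, y: int")
class TestUDTF:
    def eval(self, a: int, b: int):
        # Emit one row echoing both arguments; real test UDTFs may yield many.
        yield a, b

# Register under the name the golden-file queries use, then call it from SQL.
spark.udtf.register("udtf", TestUDTF)
spark.sql("SELECT * FROM udtf(1, 2)").show()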

diff --git a/sql/core/src/test/resources/sql-tests/analyzer-results/udtf/udtf.sql.out b/sql/core/src/test/resources/sql-tests/analyzer-results/udtf/udtf.sql.out
index acf96794378..b46a1f230a8 100644
--- a/sql/core/src/test/resources/sql-tests/analyzer-results/udtf/udtf.sql.out
+++ b/sql/core/src/test/resources/sql-tests/analyzer-results/udtf/udtf.sql.out
@@ -10,84 +10,49 @@ CreateViewCommand `t1`, VALUES (0, 1), (1, 2) t(c1, c2), 
false, true, LocalTempV
 -- !query
 SELECT * FROM udtf(1, 2)
 -- !query analysis
-Project [x#x, y#x]
-+- Generate TestUDTF(1, 2)#x, false, [x#x, y#x]
-   +- OneRowRelation
+[Analyzer test output redacted due to nondeterminism]
 
 
 -- !query
 SELECT * FROM udtf(-1, 0)
 -- !query analysis
-Project [x#x, y#x]
-+- Generate TestUDTF(-1, 0)#x, false, [x#x, y#x]
-   +- OneRowRelation
+[Analyzer test output redacted due to nondeterminism]
 
 
 -- !query
 SELECT * FROM udtf(0, -1)
 -- !query analysis
-Project [x#x, y#x]
-+- Generate TestUDTF(0, -1)#x, false, [x#x, y#x]
-   +- OneRowRelation
+[Analyzer test output redacted due to nondeterminism]
 
 
 -- !query
 SELECT * FROM udtf(0, 0)
 -- !query analysis
-Project [x#x, y#x]
-+- Generate TestUDTF(0, 0)#x, false, [x#x, y#x]
-   +- OneRowRelation
+[Analyzer test output redacted due to nondeterminism]
 
 
 -- !query
 SELECT a, b FROM udtf(1, 2) t(a, b)
 -- !query analysis
-Project [a#x, b#x]
-+- SubqueryAlias t
-   +- Project [x#x AS a#x, y#x AS b#x]
-  +- Generate TestUDTF(1, 2)#x, false, [x#x, y#x]
- +- OneRowRelation
+[Analyzer test output redacted due to nondeterminism]
 
 
 -- !query
 SELECT * FROM t1, LATERAL udtf(c1, c2)
 -- !query analysis
-Project [c1#x, c2#x, x#x, y#x]
-+- LateralJoin lateral-subquery#x [c1#x && c2#x], Inner
-   :  +- Generate TestUDTF(outer(c1#x), outer(c2#x))#x, false, [x#x, y#x]
-   : +- OneRowRelation
-   +- SubqueryAlias t1
-  +- View (`t1`, [c1#x,c2#x])
- +- Project [cast(c1#x as int) AS c1#x, cast(c2#x as int) AS c2#x]
-+- SubqueryAlias t
-   +- LocalRelation [c1#x, c2#x]
+[Analyzer test output redacted due to nondeterminism]
 
 
 -- !query
 SELECT * FROM t1 LEFT JOIN LATERAL udtf(c1, c2)
 -- !query analysis
-Project [c1#x, c2#x, x#x, y#x]
-+- LateralJoin lateral-subquery#x [c1#x && c2#x], LeftOuter
-   :  +- Generate TestUDTF(outer(c1#x), outer(c2#x))#x, false, [x#x, y#x]
-   : +- OneRowRelation
-   +- SubqueryAlias t1
-  +- View (`t1`, [c1#x,c2#x])
- +- Project [cast(c1#x as int) AS c1#x, cast(c2#x as int) AS c2#x]
-+- SubqueryAlias t
-   +- LocalRelation [c1#x, c2#x]
+[Analyzer test output redacted due to nondeterminism]
 
 
 -- !query
 SELECT * FROM udtf(1, 2) t(c1, c2), LATERAL udtf(c1, c2)
 -- !query analysis
-Project [c1#x, c2#x, x#x, y#x]
-+- LateralJoin lateral-subquery#x [c1#x && c2#x], Inner
-   :  +- Generate TestUDTF(outer(c1#x), outer(c2#x))#x, false, [x#x, y#x]
-   : +- OneRowRelation
-   +- SubqueryAlias t
-  +- Project [x#x AS c1#x, y#x AS c2#x]
- +- Generate TestUDTF(1, 2)#x, false, [x#x, y#x]
-+- OneRowRelation
+[Analyzer test output redacted due to nondeterminism]
 
 
 -- !query
diff --git a/sql/core/src/test/scala/org/apache/spark/sql/execution/python/PythonUDTFSuite.scala b/sql/core/src/test/scala/org/apache/spark/sql/execution/python/PythonUDTFSuite.scala
index 

[spark] branch master updated: [SPARK-44834][PYTHON][SQL][TESTS][FOLLOW-UP] Update the analyzer results of the udtf tests

2023-08-18 Thread yumwang
This is an automated email from the ASF dual-hosted git repository.

yumwang pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new bb41cd889ef [SPARK-44834][PYTHON][SQL][TESTS][FOLLOW-UP] Update the 
analyzer results of the udtf tests
bb41cd889ef is described below

commit bb41cd889efdd0602385e70b4c8f1c93740db332
Author: allisonwang-db 
AuthorDate: Fri Aug 18 16:31:42 2023 +0800

[SPARK-44834][PYTHON][SQL][TESTS][FOLLOW-UP] Update the analyzer results of 
the udtf tests

### What changes were proposed in this pull request?

This is a follow-up to https://github.com/apache/spark/pull/42517. The
analyzer results for the UDTF tests need to be regenerated now that
https://github.com/apache/spark/pull/42519 has been merged. This change also
updates PythonUDTFSuite after https://github.com/apache/spark/pull/42520.

### Why are the changes needed?

To fix test failures

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

Test-only change

Closes #42543 from allisonwang-db/spark-44834-fix.

Authored-by: allisonwang-db 
Signed-off-by: Yuming Wang 
---
 .../sql-tests/analyzer-results/udtf/udtf.sql.out   | 51 --
 .../sql/execution/python/PythonUDTFSuite.scala | 17 +---
 2 files changed, 10 insertions(+), 58 deletions(-)
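
The LATERAL queries in the regenerated golden file feed columns of the outer table into the table function row by row. Building on the TestUDTF registration sketched under the branch-3.5 copy of this commit above, a small usage example of the form under test:

# Assumes the TestUDTF registration from the sketch above has already run.
spark.sql("CREATE OR REPLACE TEMPORARY VIEW t1 AS VALUES (0, 1), (1, 2) t(c1, c2)")
# Each row of t1 is passed to the UDTF, and its output rows are joined back.
spark.sql("SELECT * FROM t1, LATERAL udtf(c1, c2)").show()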

diff --git a/sql/core/src/test/resources/sql-tests/analyzer-results/udtf/udtf.sql.out b/sql/core/src/test/resources/sql-tests/analyzer-results/udtf/udtf.sql.out
index acf96794378..b46a1f230a8 100644
--- a/sql/core/src/test/resources/sql-tests/analyzer-results/udtf/udtf.sql.out
+++ b/sql/core/src/test/resources/sql-tests/analyzer-results/udtf/udtf.sql.out
@@ -10,84 +10,49 @@ CreateViewCommand `t1`, VALUES (0, 1), (1, 2) t(c1, c2), 
false, true, LocalTempV
 -- !query
 SELECT * FROM udtf(1, 2)
 -- !query analysis
-Project [x#x, y#x]
-+- Generate TestUDTF(1, 2)#x, false, [x#x, y#x]
-   +- OneRowRelation
+[Analyzer test output redacted due to nondeterminism]
 
 
 -- !query
 SELECT * FROM udtf(-1, 0)
 -- !query analysis
-Project [x#x, y#x]
-+- Generate TestUDTF(-1, 0)#x, false, [x#x, y#x]
-   +- OneRowRelation
+[Analyzer test output redacted due to nondeterminism]
 
 
 -- !query
 SELECT * FROM udtf(0, -1)
 -- !query analysis
-Project [x#x, y#x]
-+- Generate TestUDTF(0, -1)#x, false, [x#x, y#x]
-   +- OneRowRelation
+[Analyzer test output redacted due to nondeterminism]
 
 
 -- !query
 SELECT * FROM udtf(0, 0)
 -- !query analysis
-Project [x#x, y#x]
-+- Generate TestUDTF(0, 0)#x, false, [x#x, y#x]
-   +- OneRowRelation
+[Analyzer test output redacted due to nondeterminism]
 
 
 -- !query
 SELECT a, b FROM udtf(1, 2) t(a, b)
 -- !query analysis
-Project [a#x, b#x]
-+- SubqueryAlias t
-   +- Project [x#x AS a#x, y#x AS b#x]
-  +- Generate TestUDTF(1, 2)#x, false, [x#x, y#x]
- +- OneRowRelation
+[Analyzer test output redacted due to nondeterminism]
 
 
 -- !query
 SELECT * FROM t1, LATERAL udtf(c1, c2)
 -- !query analysis
-Project [c1#x, c2#x, x#x, y#x]
-+- LateralJoin lateral-subquery#x [c1#x && c2#x], Inner
-   :  +- Generate TestUDTF(outer(c1#x), outer(c2#x))#x, false, [x#x, y#x]
-   : +- OneRowRelation
-   +- SubqueryAlias t1
-  +- View (`t1`, [c1#x,c2#x])
- +- Project [cast(c1#x as int) AS c1#x, cast(c2#x as int) AS c2#x]
-+- SubqueryAlias t
-   +- LocalRelation [c1#x, c2#x]
+[Analyzer test output redacted due to nondeterminism]
 
 
 -- !query
 SELECT * FROM t1 LEFT JOIN LATERAL udtf(c1, c2)
 -- !query analysis
-Project [c1#x, c2#x, x#x, y#x]
-+- LateralJoin lateral-subquery#x [c1#x && c2#x], LeftOuter
-   :  +- Generate TestUDTF(outer(c1#x), outer(c2#x))#x, false, [x#x, y#x]
-   : +- OneRowRelation
-   +- SubqueryAlias t1
-  +- View (`t1`, [c1#x,c2#x])
- +- Project [cast(c1#x as int) AS c1#x, cast(c2#x as int) AS c2#x]
-+- SubqueryAlias t
-   +- LocalRelation [c1#x, c2#x]
+[Analyzer test output redacted due to nondeterminism]
 
 
 -- !query
 SELECT * FROM udtf(1, 2) t(c1, c2), LATERAL udtf(c1, c2)
 -- !query analysis
-Project [c1#x, c2#x, x#x, y#x]
-+- LateralJoin lateral-subquery#x [c1#x && c2#x], Inner
-   :  +- Generate TestUDTF(outer(c1#x), outer(c2#x))#x, false, [x#x, y#x]
-   : +- OneRowRelation
-   +- SubqueryAlias t
-  +- Project [x#x AS c1#x, y#x AS c2#x]
- +- Generate TestUDTF(1, 2)#x, false, [x#x, y#x]
-+- OneRowRelation
+[Analyzer test output redacted due to nondeterminism]
 
 
 -- !query
diff --git a/sql/core/src/test/scala/org/apache/spark/sql/execution/python/PythonUDTFSuite.scala b/sql/core/src/test/scala/org/apache/spark/sql/execution/python/PythonUDTFSuite.scala
index 8abcb0a6ce1..4c17e3f5392 100644
--- a/sql/core/src/test/scala/org/apache/spark/sql/execution/python/PythonUDTFSuite.scala
+++