(spark) branch master updated: [SPARK-45704][BUILD] Fix compile warning - using symbols inherited from a superclass shadow symbols defined in an outer scope

2023-10-31 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new e1bc48b729e [SPARK-45704][BUILD] Fix compile warning - using symbols 
inherited from a superclass shadow symbols defined in an outer scope
e1bc48b729e is described below

commit e1bc48b729e40390a4b0f977eec4a9050c7cac77
Author: panbingkun 
AuthorDate: Tue Oct 31 22:02:39 2023 -0700

[SPARK-45704][BUILD] Fix compile warning - using symbols inherited from a 
superclass shadow symbols defined in an outer scope

### What changes were proposed in this pull request?
After upgrading to Scala 2.13, when symbols inherited from a superclass shadow symbols defined in an outer scope, the following warning appears:
```
[error] 
/Users/panbingkun/Developer/spark/spark-community/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/predicates.scala:1315:39:
 reference to child is ambiguous;
[error] it is both defined in the enclosing method apply and inherited in 
the enclosing anonymous class as value child (defined in class IsNull)
[error] In Scala 2, symbols inherited from a superclass shadow symbols 
defined in an outer scope.
[error] Such references are ambiguous in Scala 3. To continue using the 
inherited symbol, write `this.child`.
[error] Or use `-Wconf:msg=legacy-binding:s` to silence this warning. 
[quickfixable]
[error] Applicable -Wconf / nowarn filters for this fatal warning: 
msg=, cat=other, 
site=org.apache.spark.sql.catalyst.expressions.IsUnknown.apply
[error]   override def sql: String = s"(${child.sql} IS UNKNOWN)"
[error]   ^
```
This PR aims to fix it.
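
For illustration, here is a minimal standalone sketch of the ambiguity and of the `this.`-qualified fix (hypothetical classes, not the actual Spark code):

```scala
class IsNull(val child: String) {
  def sql: String = s"(${child} IS NULL)"
}

object IsUnknown {
  def apply(child: String): IsNull =
    new IsNull(child) {
      // Inside this anonymous subclass, a bare `child` is ambiguous: it is both
      // the parameter of the enclosing `apply` and the member inherited from
      // IsNull. Scala 2.13.7+ warns about this legacy binding; Scala 3 rejects it.
      // Qualifying the reference as `this.child` selects the inherited member.
      override def sql: String = s"(${this.child} IS UNKNOWN)"
    }
}
```

Alternatively, the warning can be silenced with `-Wconf:msg=legacy-binding:s`, as the compiler message above suggests.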

### Why are the changes needed?
Prepare for upgrading to Scala 3.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
- Pass GA
- Manually test:
   ```
   build/sbt -Phadoop-3 -Pdocker-integration-tests -Pspark-ganglia-lgpl 
-Pkinesis-asl -Pkubernetes -Phive-thriftserver -Pconnect -Pyarn -Phive 
-Phadoop-cloud -Pvolcano -Pkubernetes-integration-tests Test/package 
streaming-kinesis-asl-assembly/assembly connect/assembly
   ```

### Was this patch authored or co-authored using generative AI tooling?
No.

Closes #43593 from panbingkun/SPARK-45704.

Authored-by: panbingkun 
Signed-off-by: Dongjoon Hyun 
---
 .../spark/deploy/client/StandaloneAppClient.scala|  6 +++---
 .../cluster/CoarseGrainedSchedulerBackend.scala  |  2 +-
 .../apache/spark/storage/DiskBlockObjectWriter.scala |  2 +-
 .../executor/CoarseGrainedExecutorBackendSuite.scala | 20 ++--
 pom.xml  |  7 ---
 project/SparkBuild.scala |  5 -
 .../spark/sql/catalyst/expressions/predicates.scala  |  4 ++--
 .../sql/connector/catalog/InMemoryBaseTable.scala|  2 +-
 .../datasources/parquet/ParquetRowConverter.scala| 18 +-
 .../apache/spark/sql/execution/python/RowQueue.scala |  2 +-
 .../spark/sql/internal/BaseSessionStateBuilder.scala |  2 +-
 .../command/AlignAssignmentsSuiteBase.scala  |  2 +-
 .../sql/execution/command/PlanResolutionSuite.scala  |  2 +-
 .../spark/sql/hive/HiveSessionStateBuilder.scala |  2 +-
 14 files changed, 32 insertions(+), 44 deletions(-)

diff --git 
a/core/src/main/scala/org/apache/spark/deploy/client/StandaloneAppClient.scala 
b/core/src/main/scala/org/apache/spark/deploy/client/StandaloneAppClient.scala
index a7e4c1fbab2..b0ee6018970 100644
--- 
a/core/src/main/scala/org/apache/spark/deploy/client/StandaloneAppClient.scala
+++ 
b/core/src/main/scala/org/apache/spark/deploy/client/StandaloneAppClient.scala
@@ -90,7 +90,7 @@ private[spark] class StandaloneAppClient(
 case e: Exception =>
   logWarning("Failed to connect to master", e)
   markDisconnected()
-  stop()
+  this.stop()
   }
 }
 
@@ -168,7 +168,7 @@ private[spark] class StandaloneAppClient(
 
   case ApplicationRemoved(message) =>
 markDead("Master removed our application: %s".format(message))
-stop()
+this.stop()
 
   case ExecutorAdded(id: Int, workerId: String, hostPort: String, cores: 
Int, memory: Int) =>
 val fullId = s"$appId/$id"
@@ -203,7 +203,7 @@ private[spark] class StandaloneAppClient(
 markDead("Application has been stopped.")
 sendToMaster(UnregisterApplication(appId.get))
 context.reply(true)
-stop()
+this.stop()
 
   case r: RequestExecutors =>
 master match {
diff --git 
a/core/src/main/scala/org/apache/spark/scheduler/cluster/CoarseGrainedSchedulerBackend.scala
 

(spark) branch master updated: [SPARK-45734][BUILD] Upgrade commons-io to 2.15.0

2023-10-31 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 0a7023f55dd [SPARK-45734][BUILD] Upgrade commons-io to 2.15.0
0a7023f55dd is described below

commit 0a7023f55ddc43ab9f0802b84b248fbc646a91aa
Author: yangjie01 
AuthorDate: Tue Oct 31 22:01:38 2023 -0700

[SPARK-45734][BUILD] Upgrade commons-io to 2.15.0

### What changes were proposed in this pull request?
This PR upgrades commons-io from 2.14.0 to 2.15.0.

### Why are the changes needed?
The `commons-io` 2.15.0 release mainly focuses on fixing bugs in file and stream handling, adding new file and stream handling features, and optimizing the performance of file content comparison:

1. Bug fixes: This version fixes multiple bugs, mainly in file and stream 
handling. For example, it fixes the encoding matching issue of 
`XmlStreamReader` (IO-810), the issue that `FileUtils.listFiles` and 
`FileUtils.iterateFiles` methods failed to close their internal streams 
(IO-811), and the issue that `StreamIterator` failed to close its internal 
stream (IO-811). In addition, it also fixes the null pointer exception 
information of `RandomAccessFileMode.create(Path)`, and the issue [...]

2. New features: This version adds some new classes and methods, such as 
`org.apache.commons.io.channels.FileChannels`, 
`RandomAccessFiles#contentEquals(RandomAccessFile, RandomAccessFile)`, 
`RandomAccessFiles#reset(RandomAccessFile)`, and 
`org.apache.commons.io.StreamIterator`. In addition, it also added 
`MessageDigestInputStream` and deprecated `MessageDigestCalculatingInputStream`.

3. Performance optimization: This version optimizes the performance of `PathUtils.fileContentEquals(Path, Path, LinkOption[], OpenOption[])`, `PathUtils.fileContentEquals(Path, Path)`, and `FileUtils.contentEquals(File, File)`. According to the release notes, the related performance improved by about 60% (a usage sketch follows below).

The full release notes are as follows:

- https://commons.apache.org/proper/commons-io/changes-report.html#a2.15.0
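
A hedged sketch of calling the optimized comparison APIs from Scala (placeholder file names, error handling omitted):

```scala
import java.io.File
import java.nio.file.Paths

import org.apache.commons.io.FileUtils
import org.apache.commons.io.file.PathUtils

object ContentEqualsExample {
  def main(args: Array[String]): Unit = {
    // Both the File-based and the Path-based comparison are covered by the
    // performance work called out in the 2.15.0 release notes.
    val fileEqual = FileUtils.contentEquals(new File("a.bin"), new File("b.bin"))
    val pathEqual = PathUtils.fileContentEquals(Paths.get("a.bin"), Paths.get("b.bin"))
    println(s"contentEquals=$fileEqual, fileContentEquals=$pathEqual")
  }
}
```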

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Pass GitHub Actions

### Was this patch authored or co-authored using generative AI tooling?
No

Closes #43592 from LuciferYang/commons-io-215.

Lead-authored-by: yangjie01 
Co-authored-by: YangJie 
Signed-off-by: Dongjoon Hyun 
---
 dev/deps/spark-deps-hadoop-3-hive-2.3 | 2 +-
 pom.xml   | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/dev/deps/spark-deps-hadoop-3-hive-2.3 
b/dev/deps/spark-deps-hadoop-3-hive-2.3
index b5e345ecf0e..6fa0f738cf1 100644
--- a/dev/deps/spark-deps-hadoop-3-hive-2.3
+++ b/dev/deps/spark-deps-hadoop-3-hive-2.3
@@ -43,7 +43,7 @@ commons-compiler/3.1.9//commons-compiler-3.1.9.jar
 commons-compress/1.24.0//commons-compress-1.24.0.jar
 commons-crypto/1.1.0//commons-crypto-1.1.0.jar
 commons-dbcp/1.4//commons-dbcp-1.4.jar
-commons-io/2.14.0//commons-io-2.14.0.jar
+commons-io/2.15.0//commons-io-2.15.0.jar
 commons-lang/2.6//commons-lang-2.6.jar
 commons-lang3/3.13.0//commons-lang3-3.13.0.jar
 commons-logging/1.1.3//commons-logging-1.1.3.jar
diff --git a/pom.xml b/pom.xml
index 9f550456cbc..d545c743928 100644
--- a/pom.xml
+++ b/pom.xml
@@ -192,7 +192,7 @@
 3.0.3
 1.16.0
 1.24.0
-2.14.0
+2.15.0
 
 2.6
 





(spark) branch branch-3.4 updated: [SPARK-45749][CORE][WEBUI] Fix `Spark History Server` to sort `Duration` column properly

2023-10-31 Thread yao
This is an automated email from the ASF dual-hosted git repository.

yao pushed a commit to branch branch-3.4
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.4 by this push:
 new ed3a74dbed3 [SPARK-45749][CORE][WEBUI] Fix `Spark History Server` to 
sort `Duration` column properly
ed3a74dbed3 is described below

commit ed3a74dbed36ee1f180e836e88b84947c3e4198a
Author: Dongjoon Hyun 
AuthorDate: Wed Nov 1 09:55:45 2023 +0800

[SPARK-45749][CORE][WEBUI] Fix `Spark History Server` to sort `Duration` 
column properly

### What changes were proposed in this pull request?

This PR aims to fix a UI regression introduced in Apache Spark 3.2.0 by SPARK-34123.

From Apache Spark **3.2.0** to **3.5.0**, `Spark History Server` cannot sort the `Duration` column.

After this PR, Spark History Server sorts the `Duration` column properly again, as in Apache Spark 3.1.3 and earlier.

### Why are the changes needed?

Before SPARK-34123, Apache Spark had the `title` attribute for sorting.
- https://github.com/apache/spark/pull/31191
```
{{duration}}
```

Without `title`, `title-numeric` doesn't work.

### Does this PR introduce _any_ user-facing change?

No. This is a bug fix.

### How was this patch tested?

Manual test. Please use `Safari Private Browsing` or `Chrome Incognito` mode.

https://github.com/apache/spark/assets/9700541/8c8464d2-c58b-465c-8f98-edab1ec2317d

https://github.com/apache/spark/assets/9700541/03e8373d-bda3-4835-90ad-9a45670e853a

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #43613 from dongjoon-hyun/SPARK-45749.

Authored-by: Dongjoon Hyun 
Signed-off-by: Kent Yao 
(cherry picked from commit f72510ca9e04ae88660346de440b231fc8225698)
Signed-off-by: Kent Yao 
---
 core/src/main/resources/org/apache/spark/ui/static/historypage.js | 7 ++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/core/src/main/resources/org/apache/spark/ui/static/historypage.js 
b/core/src/main/resources/org/apache/spark/ui/static/historypage.js
index b334bceb5a0..68dc8ba316d 100644
--- a/core/src/main/resources/org/apache/spark/ui/static/historypage.js
+++ b/core/src/main/resources/org/apache/spark/ui/static/historypage.js
@@ -192,7 +192,12 @@ $(document).ready(function() {
   },
   {name: startedColumnName, data: 'startTime' },
   {name: completedColumnName, data: 'endTime' },
-  {name: durationColumnName, type: "title-numeric", data: 'duration' },
+  {
+name: durationColumnName,
+type: "title-numeric",
+data: 'duration',
+render:  (id, type, row) => `${row.duration}`
+  },
   {name: 'user', data: 'sparkUser' },
   {name: 'lastUpdated', data: 'lastUpdated' },
   {





(spark) branch branch-3.3 updated: [SPARK-45749][CORE][WEBUI] Fix `Spark History Server` to sort `Duration` column properly

2023-10-31 Thread yao
This is an automated email from the ASF dual-hosted git repository.

yao pushed a commit to branch branch-3.3
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.3 by this push:
 new b77e7ef2f19 [SPARK-45749][CORE][WEBUI] Fix `Spark History Server` to 
sort `Duration` column properly
b77e7ef2f19 is described below

commit b77e7ef2f19b113afa8417ed4caa51e41667fabd
Author: Dongjoon Hyun 
AuthorDate: Wed Nov 1 09:55:45 2023 +0800

[SPARK-45749][CORE][WEBUI] Fix `Spark History Server` to sort `Duration` 
column properly

### What changes were proposed in this pull request?

This PR aims to fix a UI regression introduced in Apache Spark 3.2.0 by SPARK-34123.

From Apache Spark **3.2.0** to **3.5.0**, `Spark History Server` cannot sort the `Duration` column.

After this PR, Spark History Server sorts the `Duration` column properly again, as in Apache Spark 3.1.3 and earlier.

### Why are the changes needed?

Before SPARK-34123, Apache Spark had the `title` attribute for sorting.
- https://github.com/apache/spark/pull/31191
```
{{duration}}
```

Without `title`, `title-numeric` doesn't work.

### Does this PR introduce _any_ user-facing change?

No. This is a bug fix.

### How was this patch tested?

Manual test. Please use `Safari Private Browsing` or `Chrome Incognito` mode.

https://github.com/apache/spark/assets/9700541/8c8464d2-c58b-465c-8f98-edab1ec2317d

https://github.com/apache/spark/assets/9700541/03e8373d-bda3-4835-90ad-9a45670e853a

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #43613 from dongjoon-hyun/SPARK-45749.

Authored-by: Dongjoon Hyun 
Signed-off-by: Kent Yao 
(cherry picked from commit f72510ca9e04ae88660346de440b231fc8225698)
Signed-off-by: Kent Yao 
---
 core/src/main/resources/org/apache/spark/ui/static/historypage.js | 7 ++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/core/src/main/resources/org/apache/spark/ui/static/historypage.js 
b/core/src/main/resources/org/apache/spark/ui/static/historypage.js
index b334bceb5a0..68dc8ba316d 100644
--- a/core/src/main/resources/org/apache/spark/ui/static/historypage.js
+++ b/core/src/main/resources/org/apache/spark/ui/static/historypage.js
@@ -192,7 +192,12 @@ $(document).ready(function() {
   },
   {name: startedColumnName, data: 'startTime' },
   {name: completedColumnName, data: 'endTime' },
-  {name: durationColumnName, type: "title-numeric", data: 'duration' },
+  {
+name: durationColumnName,
+type: "title-numeric",
+data: 'duration',
+render:  (id, type, row) => `${row.duration}`
+  },
   {name: 'user', data: 'sparkUser' },
   {name: 'lastUpdated', data: 'lastUpdated' },
   {





(spark) branch branch-3.5 updated: [SPARK-45749][CORE][WEBUI] Fix `Spark History Server` to sort `Duration` column properly

2023-10-31 Thread yao
This is an automated email from the ASF dual-hosted git repository.

yao pushed a commit to branch branch-3.5
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.5 by this push:
 new 1cf1c6a3a8a [SPARK-45749][CORE][WEBUI] Fix `Spark History Server` to 
sort `Duration` column properly
1cf1c6a3a8a is described below

commit 1cf1c6a3a8a8cffc5048c584ca1cdae149843d42
Author: Dongjoon Hyun 
AuthorDate: Wed Nov 1 09:55:45 2023 +0800

[SPARK-45749][CORE][WEBUI] Fix `Spark History Server` to sort `Duration` 
column properly

### What changes were proposed in this pull request?

This PR aims to fix a UI regression introduced in Apache Spark 3.2.0 by SPARK-34123.

From Apache Spark **3.2.0** to **3.5.0**, `Spark History Server` cannot sort the `Duration` column.

After this PR, Spark History Server sorts the `Duration` column properly again, as in Apache Spark 3.1.3 and earlier.

### Why are the changes needed?

Before SPARK-34123, Apache Spark had the `title` attribute for sorting.
- https://github.com/apache/spark/pull/31191
```
{{duration}}
```

Without `title`, `title-numeric` doesn't work.

### Does this PR introduce _any_ user-facing change?

No. This is a bug fix.

### How was this patch tested?

Manual test. Please use `Safari Private Browsing` or `Chrome Incognito` mode.

https://github.com/apache/spark/assets/9700541/8c8464d2-c58b-465c-8f98-edab1ec2317d

https://github.com/apache/spark/assets/9700541/03e8373d-bda3-4835-90ad-9a45670e853a

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #43613 from dongjoon-hyun/SPARK-45749.

Authored-by: Dongjoon Hyun 
Signed-off-by: Kent Yao 
(cherry picked from commit f72510ca9e04ae88660346de440b231fc8225698)
Signed-off-by: Kent Yao 
---
 core/src/main/resources/org/apache/spark/ui/static/historypage.js | 7 ++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/core/src/main/resources/org/apache/spark/ui/static/historypage.js 
b/core/src/main/resources/org/apache/spark/ui/static/historypage.js
index b334bceb5a0..68dc8ba316d 100644
--- a/core/src/main/resources/org/apache/spark/ui/static/historypage.js
+++ b/core/src/main/resources/org/apache/spark/ui/static/historypage.js
@@ -192,7 +192,12 @@ $(document).ready(function() {
   },
   {name: startedColumnName, data: 'startTime' },
   {name: completedColumnName, data: 'endTime' },
-  {name: durationColumnName, type: "title-numeric", data: 'duration' },
+  {
+name: durationColumnName,
+type: "title-numeric",
+data: 'duration',
+render:  (id, type, row) => `${row.duration}`
+  },
   {name: 'user', data: 'sparkUser' },
   {name: 'lastUpdated', data: 'lastUpdated' },
   {





(spark) branch master updated: [SPARK-45749][CORE][WEBUI] Fix `Spark History Server` to sort `Duration` column properly

2023-10-31 Thread yao
This is an automated email from the ASF dual-hosted git repository.

yao pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new f72510ca9e0 [SPARK-45749][CORE][WEBUI] Fix `Spark History Server` to 
sort `Duration` column properly
f72510ca9e0 is described below

commit f72510ca9e04ae88660346de440b231fc8225698
Author: Dongjoon Hyun 
AuthorDate: Wed Nov 1 09:55:45 2023 +0800

[SPARK-45749][CORE][WEBUI] Fix `Spark History Server` to sort `Duration` 
column properly

### What changes were proposed in this pull request?

This PR aims to fix a UI regression introduced in Apache Spark 3.2.0 by SPARK-34123.

From Apache Spark **3.2.0** to **3.5.0**, `Spark History Server` cannot sort the `Duration` column.

After this PR, Spark History Server sorts the `Duration` column properly again, as in Apache Spark 3.1.3 and earlier.

### Why are the changes needed?

Before SPARK-34123, Apache Spark had the `title` attribute for sorting.
- https://github.com/apache/spark/pull/31191
```
{{duration}}
```

Without `title`, `title-numeric` doesn't work.

### Does this PR introduce _any_ user-facing change?

No. This is a bug fix.

### How was this patch tested?

Manual test. Please use `Safari Private Browsing` or `Chrome Incognito` mode.

https://github.com/apache/spark/assets/9700541/8c8464d2-c58b-465c-8f98-edab1ec2317d

https://github.com/apache/spark/assets/9700541/03e8373d-bda3-4835-90ad-9a45670e853a

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #43613 from dongjoon-hyun/SPARK-45749.

Authored-by: Dongjoon Hyun 
Signed-off-by: Kent Yao 
---
 core/src/main/resources/org/apache/spark/ui/static/historypage.js | 7 ++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/core/src/main/resources/org/apache/spark/ui/static/historypage.js 
b/core/src/main/resources/org/apache/spark/ui/static/historypage.js
index b334bceb5a0..68dc8ba316d 100644
--- a/core/src/main/resources/org/apache/spark/ui/static/historypage.js
+++ b/core/src/main/resources/org/apache/spark/ui/static/historypage.js
@@ -192,7 +192,12 @@ $(document).ready(function() {
   },
   {name: startedColumnName, data: 'startTime' },
   {name: completedColumnName, data: 'endTime' },
-  {name: durationColumnName, type: "title-numeric", data: 'duration' },
+  {
+name: durationColumnName,
+type: "title-numeric",
+data: 'duration',
+render:  (id, type, row) => `${row.duration}`
+  },
   {name: 'user', data: 'sparkUser' },
   {name: 'lastUpdated', data: 'lastUpdated' },
   {





(spark) branch master updated: [SPARK-45654][PYTHON] Add Python data source write API

2023-10-31 Thread gurwls223
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 14d347c6c85 [SPARK-45654][PYTHON] Add Python data source write API
14d347c6c85 is described below

commit 14d347c6c85172cee904bd3131392839594d5f2f
Author: allisonwang-db 
AuthorDate: Wed Nov 1 09:14:23 2023 +0900

[SPARK-45654][PYTHON] Add Python data source write API

### What changes were proposed in this pull request?

This PR adds the Python data source write API and the `DataSourceWriter` class to `datasource.py`.

Here is an overview of the writer class:

```python
class DataSourceWriter(ABC):
    @abstractmethod
    def write(self, iterator: Iterator[Row]) -> Any:
        ...

    def commit(self, messages: List[Any]) -> None:
        ...

    def abort(self, messages: List[Any]) -> None:
        ...
```

### Why are the changes needed?

To support Python data source writes.

### Does this PR introduce _any_ user-facing change?

No. This PR alone does not introduce any user-facing change.

### How was this patch tested?

Unit test

### Was this patch authored or co-authored using generative AI tooling?

No

Closes #43516 from allisonwang-db/spark-45654-write-api.

Authored-by: allisonwang-db 
Signed-off-by: Hyukjin Kwon 
---
 python/pyspark/sql/datasource.py   | 99 +-
 python/pyspark/sql/tests/test_python_datasource.py |  2 +
 2 files changed, 100 insertions(+), 1 deletion(-)

diff --git a/python/pyspark/sql/datasource.py b/python/pyspark/sql/datasource.py
index 5cda6596b3f..c30a2c8689d 100644
--- a/python/pyspark/sql/datasource.py
+++ b/python/pyspark/sql/datasource.py
@@ -26,9 +26,10 @@ if TYPE_CHECKING:
 from pyspark.sql.session import SparkSession
 
 
-__all__ = ["DataSource", "DataSourceReader", "DataSourceRegistration"]
+__all__ = ["DataSource", "DataSourceReader", "DataSourceWriter", 
"DataSourceRegistration"]
 
 
+@since(4.0)
 class DataSource(ABC):
 """
 A base class for data sources.
@@ -133,7 +134,29 @@ class DataSource(ABC):
 """
 raise NotImplementedError
 
+def writer(self, schema: StructType, saveMode: str) -> "DataSourceWriter":
+"""
+Returns a ``DataSourceWriter`` instance for writing data.
+
+The implementation is required for writable data sources.
+
+Parameters
+--
+schema : StructType
+The schema of the data to be written.
+saveMode : str
+A string identifies the save mode. It can be one of the following:
+`append`, `overwrite`, `error`, `ignore`.
+
+Returns
+---
+writer : DataSourceWriter
+A writer instance for this data source.
+"""
+raise NotImplementedError
+
 
+@since(4.0)
 class DataSourceReader(ABC):
 """
 A base class for data source readers. Data source readers are responsible 
for
@@ -229,6 +252,80 @@ class DataSourceReader(ABC):
 ...
 
 
+@since(4.0)
+class DataSourceWriter(ABC):
+"""
+A base class for data source writers. Data source writers are responsible 
for saving
+the data to the data source.
+"""
+
+@abstractmethod
+def write(self, iterator: Iterator[Row]) -> "WriterCommitMessage":
+"""
+Writes data into the data source.
+
+This method is called once on each executor to write data to the data 
source.
+It accepts an iterator of input data and returns a single row 
representing a
+commit message, or None if there is no commit message.
+
+The driver collects commit messages, if any, from all executors and 
passes them
+to the ``commit`` method if all tasks run successfully. If any task 
fails, the
+``abort`` method will be called with the collected commit messages.
+
+Parameters
+--
+iterator : Iterator[Row]
+An iterator of input data.
+
+Returns
+---
+WriterCommitMessage : a serializable commit message
+"""
+...
+
+def commit(self, messages: List["WriterCommitMessage"]) -> None:
+"""
+Commits this writing job with a list of commit messages.
+
+This method is invoked on the driver when all tasks run successfully. 
The
+commit messages are collected from the ``write`` method call from each 
task,
+and are passed to this method. The implementation should use the 
commit messages
+to commit the writing job to the data source.
+
+Parameters
+--
+messages : List[WriterCommitMessage]
+A list of commit messages.
+"""
+...
+
+def abort(self, messages: 

(spark) branch branch-3.5 updated: [SPARK-43380][SQL][FOLLOW-UP] Fix slowdown in Avro read

2023-10-31 Thread gengliang
This is an automated email from the ASF dual-hosted git repository.

gengliang pushed a commit to branch branch-3.5
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.5 by this push:
 new 64242bf6a64 [SPARK-43380][SQL][FOLLOW-UP] Fix slowdown in Avro read
64242bf6a64 is described below

commit 64242bf6a6425274b83bc1191230437c2d3fbc71
Author: zeruibao 
AuthorDate: Tue Oct 31 16:46:40 2023 -0700

[SPARK-43380][SQL][FOLLOW-UP] Fix slowdown in Avro read

### What changes were proposed in this pull request?
Fix a slowdown in Avro reads. https://github.com/apache/spark/pull/42503 introduced the performance regression: `SQLConf.get.getConf(confKey)` is very costly, so move the lookup out of the `newWriter` function.
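
The pattern behind the fix, as a minimal hedged sketch (simplified names, not the actual `AvroDeserializer` code): hoist the expensive lookup into a `lazy val` so it runs once per instance instead of on every `newWriter` call.

```scala
class RecordConverter(expensiveLookup: () => Boolean) {
  // Evaluated at most once, on first use, rather than inside every newWriter call.
  private lazy val preventReadingIncorrectType: Boolean = !expensiveLookup()

  def newWriter(fieldName: String): String =
    if (preventReadingIncorrectType) s"strict writer for $fieldName"
    else s"lenient writer for $fieldName"
}
```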

### Why are the changes needed?
We need to fix the performance regression in Avro reads.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Existing unit tests

### Was this patch authored or co-authored using generative AI tooling?
No

Closes #43606 from zeruibao/SPARK-43380-FIX-SLOWDOWN.

Authored-by: zeruibao 
Signed-off-by: Gengliang Wang 
(cherry picked from commit 45f73bc69655a236323be1bcb2988341d2aa5203)
Signed-off-by: Gengliang Wang 
---
 .../src/main/scala/org/apache/spark/sql/avro/AvroDeserializer.scala  | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git 
a/connector/avro/src/main/scala/org/apache/spark/sql/avro/AvroDeserializer.scala
 
b/connector/avro/src/main/scala/org/apache/spark/sql/avro/AvroDeserializer.scala
index fe0bd7392b6..ec34d10a5ff 100644
--- 
a/connector/avro/src/main/scala/org/apache/spark/sql/avro/AvroDeserializer.scala
+++ 
b/connector/avro/src/main/scala/org/apache/spark/sql/avro/AvroDeserializer.scala
@@ -105,6 +105,9 @@ private[sql] class AvroDeserializer(
   s"Cannot convert Avro type $rootAvroType to SQL type 
${rootCatalystType.sql}.", ise)
   }
 
+  private lazy val preventReadingIncorrectType = !SQLConf.get
+.getConf(SQLConf.LEGACY_AVRO_ALLOW_INCOMPATIBLE_SCHEMA)
+
   def deserialize(data: Any): Option[Any] = converter(data)
 
   /**
@@ -122,8 +125,6 @@ private[sql] class AvroDeserializer(
 s"schema is incompatible (avroType = $avroType, sqlType = 
${catalystType.sql})"
 
 val realDataType = SchemaConverters.toSqlType(avroType, 
useStableIdForUnionType).dataType
-val confKey = SQLConf.LEGACY_AVRO_ALLOW_INCOMPATIBLE_SCHEMA
-val preventReadingIncorrectType = !SQLConf.get.getConf(confKey)
 
 (avroType.getType, catalystType) match {
   case (NULL, NullType) => (updater, ordinal, _) =>





(spark) branch master updated: [SPARK-43380][SQL][FOLLOW-UP] Fix slowdown in Avro read

2023-10-31 Thread gengliang
This is an automated email from the ASF dual-hosted git repository.

gengliang pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 45f73bc6965 [SPARK-43380][SQL][FOLLOW-UP] Fix slowdown in Avro read
45f73bc6965 is described below

commit 45f73bc69655a236323be1bcb2988341d2aa5203
Author: zeruibao 
AuthorDate: Tue Oct 31 16:46:40 2023 -0700

[SPARK-43380][SQL][FOLLOW-UP] Fix slowdown in Avro read

### What changes were proposed in this pull request?
Fix a slowdown in Avro reads. https://github.com/apache/spark/pull/42503 introduced the performance regression: `SQLConf.get.getConf(confKey)` is very costly, so move the lookup out of the `newWriter` function.

### Why are the changes needed?
We need to fix the performance regression in Avro reads.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Existing unit tests

### Was this patch authored or co-authored using generative AI tooling?
No

Closes #43606 from zeruibao/SPARK-43380-FIX-SLOWDOWN.

Authored-by: zeruibao 
Signed-off-by: Gengliang Wang 
---
 .../src/main/scala/org/apache/spark/sql/avro/AvroDeserializer.scala  | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git 
a/connector/avro/src/main/scala/org/apache/spark/sql/avro/AvroDeserializer.scala
 
b/connector/avro/src/main/scala/org/apache/spark/sql/avro/AvroDeserializer.scala
index c04fe820f0b..29b9fdf9dfb 100644
--- 
a/connector/avro/src/main/scala/org/apache/spark/sql/avro/AvroDeserializer.scala
+++ 
b/connector/avro/src/main/scala/org/apache/spark/sql/avro/AvroDeserializer.scala
@@ -105,6 +105,9 @@ private[sql] class AvroDeserializer(
   s"Cannot convert Avro type $rootAvroType to SQL type 
${rootCatalystType.sql}.", ise)
   }
 
+  private lazy val preventReadingIncorrectType = !SQLConf.get
+.getConf(SQLConf.LEGACY_AVRO_ALLOW_INCOMPATIBLE_SCHEMA)
+
   def deserialize(data: Any): Option[Any] = converter(data)
 
   /**
@@ -122,8 +125,6 @@ private[sql] class AvroDeserializer(
 s"schema is incompatible (avroType = $avroType, sqlType = 
${catalystType.sql})"
 
 val realDataType = SchemaConverters.toSqlType(avroType, 
useStableIdForUnionType).dataType
-val confKey = SQLConf.LEGACY_AVRO_ALLOW_INCOMPATIBLE_SCHEMA
-val preventReadingIncorrectType = !SQLConf.get.getConf(confKey)
 
 (avroType.getType, catalystType) match {
   case (NULL, NullType) => (updater, ordinal, _) =>





(spark) branch master updated: [MINOR][DOCS] Fix the variable name in the docs - testing_pyspark.ipynb

2023-10-31 Thread gurwls223
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 285e97465a5 [MINOR][DOCS] Fix the variable name in the docs - 
testing_pyspark.ipynb
285e97465a5 is described below

commit 285e97465a5365c2cc227c9ece32d51d7ebb0483
Author: huciaa 
AuthorDate: Wed Nov 1 08:37:15 2023 +0900

[MINOR][DOCS] Fix the variable name in the docs - testing_pyspark.ipynb

### What changes were proposed in this pull request?

I'm changing a variable name in one of the cells in the documentation 
regarding pyspark testing.

### Why are the changes needed?

Currently the code won't run. This simple fix makes the code in the documentation usable.

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?
I tested this code locally with pytest and PySpark 3.5.0

![image](https://github.com/apache/spark/assets/51126637/3a29a495-38d3-492c-b192-3e26af6fc72c)

### Was this patch authored or co-authored using generative AI tooling?
No

Closes #43610 from huciaa/fix_typo_in_pystest_doc.

Authored-by: huciaa 
Signed-off-by: Hyukjin Kwon 
---
 python/docs/source/getting_started/testing_pyspark.ipynb | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/python/docs/source/getting_started/testing_pyspark.ipynb 
b/python/docs/source/getting_started/testing_pyspark.ipynb
index afccbe4004c..3934c116763 100644
--- a/python/docs/source/getting_started/testing_pyspark.ipynb
+++ b/python/docs/source/getting_started/testing_pyspark.ipynb
@@ -318,7 +318,7 @@
 "   {\"name\": \"Eve   A.\", \"age\": 28}] \n",
 "\n",
 "# Create a Spark DataFrame\n",
-"original_df = spark.createDataFrame(sample_data)\n",
+"original_df = spark_fixture.createDataFrame(sample_data)\n",
 "\n",
 "# Apply the transformation function from before\n",
 "transformed_df = remove_extra_spaces(original_df, \"name\")\n",
@@ -328,7 +328,7 @@
 "{\"name\": \"Bob T.\", \"age\": 35}, \n",
 "{\"name\": \"Eve A.\", \"age\": 28}]\n",
 "\n",
-"expected_df = spark.createDataFrame(expected_data)\n",
+"expected_df = spark_fixture.createDataFrame(expected_data)\n",
 "\n",
 "assertDataFrameEqual(transformed_df, expected_df)"
]





(spark) branch master updated: [SPARK-45741][BUILD] Upgrade Netty to 4.1.100.Final

2023-10-31 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new f7e70e047a0 [SPARK-45741][BUILD] Upgrade Netty to 4.1.100.Final
f7e70e047a0 is described below

commit f7e70e047a0a9985226d9d6278cc588e799afae6
Author: Dongjoon Hyun 
AuthorDate: Tue Oct 31 16:27:27 2023 -0700

[SPARK-45741][BUILD] Upgrade Netty to 4.1.100.Final

### What changes were proposed in this pull request?

This PR aims to upgrade `Netty` to 4.1.100.Final.

### Why are the changes needed?

To bring the latest bug fixes
- https://github.com/netty/netty/milestone/280?closed=1
- https://github.com/netty/netty/releases/tag/netty-4.1.100.Final

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Pass the CIs.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #43605 from dongjoon-hyun/SPARK-45741.

Authored-by: Dongjoon Hyun 
Signed-off-by: Dongjoon Hyun 
---
 dev/deps/spark-deps-hadoop-3-hive-2.3 | 36 +--
 pom.xml   |  2 +-
 2 files changed, 19 insertions(+), 19 deletions(-)

diff --git a/dev/deps/spark-deps-hadoop-3-hive-2.3 
b/dev/deps/spark-deps-hadoop-3-hive-2.3
index a973d4e851d..b5e345ecf0e 100644
--- a/dev/deps/spark-deps-hadoop-3-hive-2.3
+++ b/dev/deps/spark-deps-hadoop-3-hive-2.3
@@ -187,30 +187,30 @@ metrics-jmx/4.2.19//metrics-jmx-4.2.19.jar
 metrics-json/4.2.19//metrics-json-4.2.19.jar
 metrics-jvm/4.2.19//metrics-jvm-4.2.19.jar
 minlog/1.3.0//minlog-1.3.0.jar
-netty-all/4.1.99.Final//netty-all-4.1.99.Final.jar
-netty-buffer/4.1.99.Final//netty-buffer-4.1.99.Final.jar
-netty-codec-http/4.1.99.Final//netty-codec-http-4.1.99.Final.jar
-netty-codec-http2/4.1.99.Final//netty-codec-http2-4.1.99.Final.jar
-netty-codec-socks/4.1.99.Final//netty-codec-socks-4.1.99.Final.jar
-netty-codec/4.1.99.Final//netty-codec-4.1.99.Final.jar
-netty-common/4.1.99.Final//netty-common-4.1.99.Final.jar
-netty-handler-proxy/4.1.99.Final//netty-handler-proxy-4.1.99.Final.jar
-netty-handler/4.1.99.Final//netty-handler-4.1.99.Final.jar
-netty-resolver/4.1.99.Final//netty-resolver-4.1.99.Final.jar
+netty-all/4.1.100.Final//netty-all-4.1.100.Final.jar
+netty-buffer/4.1.100.Final//netty-buffer-4.1.100.Final.jar
+netty-codec-http/4.1.100.Final//netty-codec-http-4.1.100.Final.jar
+netty-codec-http2/4.1.100.Final//netty-codec-http2-4.1.100.Final.jar
+netty-codec-socks/4.1.100.Final//netty-codec-socks-4.1.100.Final.jar
+netty-codec/4.1.100.Final//netty-codec-4.1.100.Final.jar
+netty-common/4.1.100.Final//netty-common-4.1.100.Final.jar
+netty-handler-proxy/4.1.100.Final//netty-handler-proxy-4.1.100.Final.jar
+netty-handler/4.1.100.Final//netty-handler-4.1.100.Final.jar
+netty-resolver/4.1.100.Final//netty-resolver-4.1.100.Final.jar
 
netty-tcnative-boringssl-static/2.0.61.Final/linux-aarch_64/netty-tcnative-boringssl-static-2.0.61.Final-linux-aarch_64.jar
 
netty-tcnative-boringssl-static/2.0.61.Final/linux-x86_64/netty-tcnative-boringssl-static-2.0.61.Final-linux-x86_64.jar
 
netty-tcnative-boringssl-static/2.0.61.Final/osx-aarch_64/netty-tcnative-boringssl-static-2.0.61.Final-osx-aarch_64.jar
 
netty-tcnative-boringssl-static/2.0.61.Final/osx-x86_64/netty-tcnative-boringssl-static-2.0.61.Final-osx-x86_64.jar
 
netty-tcnative-boringssl-static/2.0.61.Final/windows-x86_64/netty-tcnative-boringssl-static-2.0.61.Final-windows-x86_64.jar
 netty-tcnative-classes/2.0.61.Final//netty-tcnative-classes-2.0.61.Final.jar
-netty-transport-classes-epoll/4.1.99.Final//netty-transport-classes-epoll-4.1.99.Final.jar
-netty-transport-classes-kqueue/4.1.99.Final//netty-transport-classes-kqueue-4.1.99.Final.jar
-netty-transport-native-epoll/4.1.99.Final/linux-aarch_64/netty-transport-native-epoll-4.1.99.Final-linux-aarch_64.jar
-netty-transport-native-epoll/4.1.99.Final/linux-x86_64/netty-transport-native-epoll-4.1.99.Final-linux-x86_64.jar
-netty-transport-native-kqueue/4.1.99.Final/osx-aarch_64/netty-transport-native-kqueue-4.1.99.Final-osx-aarch_64.jar
-netty-transport-native-kqueue/4.1.99.Final/osx-x86_64/netty-transport-native-kqueue-4.1.99.Final-osx-x86_64.jar
-netty-transport-native-unix-common/4.1.99.Final//netty-transport-native-unix-common-4.1.99.Final.jar
-netty-transport/4.1.99.Final//netty-transport-4.1.99.Final.jar
+netty-transport-classes-epoll/4.1.100.Final//netty-transport-classes-epoll-4.1.100.Final.jar
+netty-transport-classes-kqueue/4.1.100.Final//netty-transport-classes-kqueue-4.1.100.Final.jar
+netty-transport-native-epoll/4.1.100.Final/linux-aarch_64/netty-transport-native-epoll-4.1.100.Final-linux-aarch_64.jar
+netty-transport-native-epoll/4.1.100.Final/linux-x86_64/netty-transport-native-epoll-4.1.100.Final-linux-x86_64.jar

Re: [PR] Add Matomo analytics to the published documents [spark-website]

2023-10-31 Thread via GitHub


HyukjinKwon closed pull request #485: Add Matomo analytics to the published 
documents
URL: https://github.com/apache/spark-website/pull/485





Re: [PR] Add Matomo analytics to the published documents [spark-website]

2023-10-31 Thread via GitHub


HyukjinKwon commented on PR #485:
URL: https://github.com/apache/spark-website/pull/485#issuecomment-1788145222

   Merged to asf-site.





(spark) branch master updated: [SPARK-45739][PYTHON] Catch IOException instead of EOFException alone for faulthandler

2023-10-31 Thread gurwls223
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 18e07956e94 [SPARK-45739][PYTHON] Catch IOException instead of 
EOFException alone for faulthandler
18e07956e94 is described below

commit 18e07956e9476bf3e9264eb878a25b838feff4a6
Author: Hyukjin Kwon 
AuthorDate: Wed Nov 1 07:33:35 2023 +0900

[SPARK-45739][PYTHON] Catch IOException instead of EOFException alone for 
faulthandler

### What changes were proposed in this pull request?

This PR improves the `spark.python.worker.faulthandler.enabled` feature by catching `IOException` instead of only the narrower `EOFException`.

### Why are the changes needed?

Exceptions such as `java.net.SocketException: Connection reset` can happen because the worker dies unexpectedly. We should catch all IO exceptions there.
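
As a rough illustration (not the actual Spark code), catching the broader `IOException` also covers `java.net.SocketException: Connection reset`, which an `EOFException` handler alone would miss:

```scala
import java.io.{DataInputStream, IOException}

object WorkerReadExample {
  // EOFException and java.net.SocketException are both subclasses of IOException,
  // so a single IOException handler covers the worker dying at any point.
  def readWorkerStatus(in: DataInputStream): Option[Int] =
    try Some(in.readInt())
    catch { case _: IOException => None }
}
```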

### Does this PR introduce _any_ user-facing change?

Yes, but only in special cases: this can happen when the worker dies unexpectedly during its initialization.

### How was this patch tested?

I tested this with Spark Connect:

```bash
$ cat <<EOT >> malformed_daemon.py
import ctypes

from pyspark import daemon
from pyspark import TaskContext

def raise_segfault():
ctypes.string_at(0)

# Throw a segmentation fault during init.
TaskContext._getOrCreate = raise_segfault

if __name__ == '__main__':
daemon.manager()
EOT
```

```bash
$ ./sbin/stop-connect-server.sh
$ ./sbin/start-connect-server.sh --conf 
spark.python.daemon.module=malformed_daemon --conf 
spark.python.worker.faulthandler.enabled=true --jars `ls 
connector/connect/server/target/**/spark-connect*SNAPSHOT.jar`
```
```bash
./bin/pyspark --remote "sc://localhost:15002"
```

```python
from pyspark.sql.functions import udf
spark.addArtifact("malformed_daemon.py", pyfile=True)
spark.range(1).select(udf(lambda x: x)("id")).collect()
```

**Before**

```
Traceback (most recent call last):
  File "", line 1, in 
  File "/.../spark/python/pyspark/sql/connect/dataframe.py", line 1710, in 
collect
table, schema = self._to_table()
...
  File "/.../spark/python/pyspark/sql/connect/client/core.py", line 1575, 
in _handle_rpc_error
raise convert_exception(
pyspark.errors.exceptions.connect.SparkConnectGrpcException: 
(org.apache.spark.SparkException) Job aborted due to stage failure: Task 8 in 
stage 0.0 failed 1 times, most recent failure: Lost task 8.0 in stage 0.0 (TID 
8) (192.168.123.102 executor driver): java.net.SocketException: Connection reset
at
  ...

java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
at java.base/java.lang.Thread.run(Thread.java:833)

Driver stacktrace:

JVM stacktrace:
org.apache.spark.SparkException: Job aborted due to stage failure: Task 8 
in stage 0.0 failed 1 times, most recent failure: Lost task 8.0 in stage 0.0 
(TID 8) (192.168.123.102 executor driver): java.net.SocketException: Connection 
reset
at 
java.base/sun.nio.ch.SocketChannelImpl.throwConnectionReset(SocketChannelImpl.java:394)
at
...

java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
at java.lang.Thread.run(Thread.java:833)
```

**After**

```
Traceback (most recent call last):
  File "", line 1, in 
  File "/.../spark/python/pyspark/sql/connect/dataframe.py", line 1710, in 
collect
table, schema = self._to_table()
...
"/.../spark/python/pyspark/sql/connect/client/core.py", line 1575, in 
_handle_rpc_error
raise convert_exception(
pyspark.errors.exceptions.connect.SparkConnectGrpcException: 
(org.apache.spark.SparkException) Job aborted due to stage failure: Task 4 in 
stage 0.0 failed 1 times, most recent failure: Lost task 4.0 in stage 0.0 (TID 
4) (192.168.123.102 executor driver): org.apache.spark.SparkException: Python 
worker exited unexpectedly (crashed): Fatal Python error: Segmentation fault

Current thread 0x7ff85d338700 (most recent call first):
  File "/.../miniconda3/envs/python3.9/lib/python3.9/ctypes/__init__.py", 
line 525 in string_at
  File 
"/private/var/folders/0c/q8y15ybd3tn7sr2_jmbmftr8gp/T/spark-397ac42b-c05b-4f50-a6b8-ede30254edc9/userFiles-fd70c41e-46b9-44ed-b781-f8dea10bcb4a/5ce3da24-912a-4207-af82-5dfc8a845714/malformed_daemon.py",
 line 8 in raise_segfault
  File "/.../spark/python/lib/pyspark.zip/pyspark/worker.py", line 1450 in 
main
  ...
"/.../miniconda3/envs/python3.9/lib/python3.9/runpy.py", line 197 in 
_run_module_as_main

at 

(spark) branch master updated (49f9e74973f -> af8907a0873)

2023-10-31 Thread gengliang
This is an automated email from the ASF dual-hosted git repository.

gengliang pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


from 49f9e74973f [SPARK-45481][SPARK-45664][SPARK-45711][SQL][FOLLOWUP] 
Avoid magic strings copy from parquet|orc|avro compression codes
 add af8907a0873 [SPARK-45242][SQL][FOLLOWUP] Canonicalize DataFrame ID in 
CollectMetrics

No new revisions were added by this update.

Summary of changes:
 .../spark/sql/catalyst/plans/logical/basicLogicalOperators.scala | 4 
 .../scala/org/apache/spark/sql/catalyst/analysis/AnalysisSuite.scala | 5 +
 2 files changed, 9 insertions(+)





(spark) branch master updated: [SPARK-45481][SPARK-45664][SPARK-45711][SQL][FOLLOWUP] Avoid magic strings copy from parquet|orc|avro compression codes

2023-10-31 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 49f9e74973f [SPARK-45481][SPARK-45664][SPARK-45711][SQL][FOLLOWUP] 
Avoid magic strings copy from parquet|orc|avro compression codes
49f9e74973f is described below

commit 49f9e74973faadeddfab944d822dd3bcd6365c5b
Author: Jiaan Geng 
AuthorDate: Tue Oct 31 11:44:59 2023 -0700

[SPARK-45481][SPARK-45664][SPARK-45711][SQL][FOLLOWUP] Avoid magic strings 
copy from parquet|orc|avro compression codes

### What changes were proposed in this pull request?
This PR follows up https://github.com/apache/spark/pull/43562, 
https://github.com/apache/spark/pull/43528 and 
https://github.com/apache/spark/pull/43308.
The aim of this PR is to avoid copying magic strings for the `parquet|orc|avro` compression codecs.

This PR also simplifies some test cases.

### Why are the changes needed?
Avoid copying magic strings for the parquet|orc|avro compression codecs.

### Does this PR introduce _any_ user-facing change?
'No'.

### How was this patch tested?
Existing test cases.

### Was this patch authored or co-authored using generative AI tooling?
'No'.

Closes #43604 from beliefer/parquet_orc_avro.

Authored-by: Jiaan Geng 
Signed-off-by: Dongjoon Hyun 
---
 .../org/apache/spark/sql/avro/AvroSuite.scala  | 29 +++--
 .../execution/datasources/orc/OrcSourceSuite.scala | 36 +-
 .../apache/spark/sql/internal/SQLConfSuite.scala   | 13 
 .../spark/sql/hive/execution/HiveDDLSuite.scala|  2 +-
 4 files changed, 34 insertions(+), 46 deletions(-)

diff --git 
a/connector/avro/src/test/scala/org/apache/spark/sql/avro/AvroSuite.scala 
b/connector/avro/src/test/scala/org/apache/spark/sql/avro/AvroSuite.scala
index d618c0035fb..f4a88bd0db2 100644
--- a/connector/avro/src/test/scala/org/apache/spark/sql/avro/AvroSuite.scala
+++ b/connector/avro/src/test/scala/org/apache/spark/sql/avro/AvroSuite.scala
@@ -38,6 +38,7 @@ import org.apache.spark.{SPARK_VERSION_SHORT, SparkConf, 
SparkException, SparkUp
 import org.apache.spark.TestUtils.assertExceptionMsg
 import org.apache.spark.sql._
 import org.apache.spark.sql.TestingUDT.IntervalData
+import org.apache.spark.sql.avro.AvroCompressionCodec._
 import org.apache.spark.sql.catalyst.expressions.AttributeReference
 import org.apache.spark.sql.catalyst.plans.logical.Filter
 import org.apache.spark.sql.catalyst.util.DateTimeTestUtils
@@ -680,24 +681,18 @@ abstract class AvroSuite
   val zstandardDir = s"$dir/zstandard"
 
   val df = spark.read.format("avro").load(testAvro)
-  spark.conf.set(SQLConf.AVRO_COMPRESSION_CODEC.key,
-AvroCompressionCodec.UNCOMPRESSED.lowerCaseName())
+  spark.conf.set(SQLConf.AVRO_COMPRESSION_CODEC.key, 
UNCOMPRESSED.lowerCaseName())
   df.write.format("avro").save(uncompressDir)
-  spark.conf.set(SQLConf.AVRO_COMPRESSION_CODEC.key,
-AvroCompressionCodec.BZIP2.lowerCaseName())
+  spark.conf.set(SQLConf.AVRO_COMPRESSION_CODEC.key, BZIP2.lowerCaseName())
   df.write.format("avro").save(bzip2Dir)
-  spark.conf.set(SQLConf.AVRO_COMPRESSION_CODEC.key,
-AvroCompressionCodec.XZ.lowerCaseName())
+  spark.conf.set(SQLConf.AVRO_COMPRESSION_CODEC.key, XZ.lowerCaseName())
   df.write.format("avro").save(xzDir)
-  spark.conf.set(SQLConf.AVRO_COMPRESSION_CODEC.key,
-AvroCompressionCodec.DEFLATE.lowerCaseName())
+  spark.conf.set(SQLConf.AVRO_COMPRESSION_CODEC.key, 
DEFLATE.lowerCaseName())
   spark.conf.set(SQLConf.AVRO_DEFLATE_LEVEL.key, "9")
   df.write.format("avro").save(deflateDir)
-  spark.conf.set(SQLConf.AVRO_COMPRESSION_CODEC.key,
-AvroCompressionCodec.SNAPPY.lowerCaseName())
+  spark.conf.set(SQLConf.AVRO_COMPRESSION_CODEC.key, 
SNAPPY.lowerCaseName())
   df.write.format("avro").save(snappyDir)
-  spark.conf.set(SQLConf.AVRO_COMPRESSION_CODEC.key,
-AvroCompressionCodec.ZSTANDARD.lowerCaseName())
+  spark.conf.set(SQLConf.AVRO_COMPRESSION_CODEC.key, 
ZSTANDARD.lowerCaseName())
   df.write.format("avro").save(zstandardDir)
 
   val uncompressSize = FileUtils.sizeOfDirectory(new File(uncompressDir))
@@ -2132,7 +2127,7 @@ abstract class AvroSuite
 val reader = new DataFileReader(file, new GenericDatumReader[Any]())
 val r = reader.getMetaString("avro.codec")
 r
-  }.map(v => if (v == "null") "uncompressed" else v).headOption
+  }.map(v => if (v == "null") UNCOMPRESSED.lowerCaseName() else 
v).headOption
 }
 def checkCodec(df: DataFrame, dir: String, codec: String): Unit = {
   val subdir = s"$dir/$codec"
@@ -2143,11 +2138,9 @@ abstract class AvroSuite
   val path = dir.toString
   val df = 

(spark) branch master updated: [SPARK-45737][SQL] Remove unnecessary `.toArray[InternalRow]` in `SparkPlan#executeTake`

2023-10-31 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 936c1cd22b8 [SPARK-45737][SQL] Remove unnecessary 
`.toArray[InternalRow]` in `SparkPlan#executeTake`
936c1cd22b8 is described below

commit 936c1cd22b8c8a3b6c2050f3cfc37bce5807ba28
Author: yangjie01 
AuthorDate: Tue Oct 31 08:56:21 2023 -0700

[SPARK-45737][SQL] Remove unnecessary `.toArray[InternalRow]` in 
`SparkPlan#executeTake`

### What changes were proposed in this pull request?

https://github.com/apache/spark/blob/8dd3ec87e26969df6fe08f5fddc3f8d6efc2420d/sql/core/src/main/scala/org/apache/spark/sql/execution/SparkPlan.scala#L535-L559

In the above code, the input parameter type of the `mutable.Buffer#prependAll` and `mutable.Growable#++=` functions is `IterableOnce`

- `mutable.Buffer#prependAll`

```scala
  def prependAll(elems: IterableOnce[A]): this.type = { insertAll(0, 
elems); this }
```

- `mutable.Growable#++=`

```
  `inline` final def ++= (xs: IterableOnce[A]): this.type = addAll(xs)
```

and the type of `rows` is `Iterator[InternalRow]`, which inherits from 
`IterableOnce`

```
val rows = decodeUnsafeRows(res(i)._2)
private def decodeUnsafeRows(bytes: ChunkedByteBuffer): 
Iterator[InternalRow]
```

So there is no longer any need to convert to an `Array[InternalRow]`.
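
A small self-contained illustration (not the Spark code) that a `mutable.Buffer` accepts an `Iterator` directly, because `Iterator` is an `IterableOnce`:

```scala
import scala.collection.mutable.ArrayBuffer

object PrependAllExample {
  def main(args: Array[String]): Unit = {
    val buf = ArrayBuffer(4, 5)
    val rows: Iterator[Int] = Iterator(1, 2, 3) // an Iterator is an IterableOnce
    buf.prependAll(rows)   // no .toArray needed
    buf ++= Iterator(6, 7) // likewise for ++=
    println(buf)           // ArrayBuffer(1, 2, 3, 4, 5, 6, 7)
  }
}
```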

### Why are the changes needed?
Remove unnecessary `.toArray[InternalRow]`

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Pass GitHub Actions

### Was this patch authored or co-authored using generative AI tooling?
No

Closes #43599 from LuciferYang/sparkplan.

Authored-by: yangjie01 
Signed-off-by: Dongjoon Hyun 
---
 .../src/main/scala/org/apache/spark/sql/execution/SparkPlan.scala | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git 
a/sql/core/src/main/scala/org/apache/spark/sql/execution/SparkPlan.scala 
b/sql/core/src/main/scala/org/apache/spark/sql/execution/SparkPlan.scala
index d93a83dec44..c65d1931dd1 100644
--- a/sql/core/src/main/scala/org/apache/spark/sql/execution/SparkPlan.scala
+++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/SparkPlan.scala
@@ -536,13 +536,13 @@ abstract class SparkPlan extends QueryPlan[SparkPlan] 
with Logging with Serializ
 while (buf.length < n && i < res.length) {
   val rows = decodeUnsafeRows(res(i)._2)
   if (n - buf.length >= res(i)._1) {
-buf.prependAll(rows.toArray[InternalRow])
+buf.prependAll(rows)
   } else {
 val dropUntil = res(i)._1 - (n - buf.length)
 // Same as Iterator.drop but this only takes a long.
 var j: Long = 0L
 while (j < dropUntil) { rows.next(); j += 1L}
-buf.prependAll(rows.toArray[InternalRow])
+buf.prependAll(rows)
   }
   i += 1
 }
@@ -550,9 +550,9 @@ abstract class SparkPlan extends QueryPlan[SparkPlan] with 
Logging with Serializ
 while (buf.length < n && i < res.length) {
   val rows = decodeUnsafeRows(res(i)._2)
   if (n - buf.length >= res(i)._1) {
-buf ++= rows.toArray[InternalRow]
+buf ++= rows
   } else {
-buf ++= rows.take(n - buf.length).toArray[InternalRow]
+buf ++= rows.take(n - buf.length)
   }
   i += 1
 }





(spark) branch master updated: [SPARK-45700][SPARK-45702][SPARK-45703] Fix the compile warning suppression rule added in SPARK-35496

2023-10-31 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 9169d93ed5b [SPARK-45700][SPARK-45702][SPARK-45703] Fix the compile 
warning suppression rule added in SPARK-35496
9169d93ed5b is described below

commit 9169d93ed5b8cc53926a44ce497568d36cb275b0
Author: yangjie01 
AuthorDate: Tue Oct 31 08:53:54 2023 -0700

[SPARK-45700][SPARK-45702][SPARK-45703] Fix the compile warning suppression 
rule added in SPARK-35496

### What changes were proposed in this pull request?
This PR fixes the issues behind the compile-warning suppression rules added in SPARK-35496 for upgrading to Scala 2.13.7, and cleans up these rules.

```
// SPARK-35496 Upgrade Scala to 2.13.7 and suppress:
// 1. `The outer reference in this type test cannot be checked at run time`
// 2. `the type test for pattern TypeA cannot be checked at runtime because 
it
//has type parameters eliminated by erasure`
// 3. `abstract type TypeA in type pattern Seq[TypeA] (the underlying of
//Seq[TypeA]) is unchecked since it is eliminated by erasure`
// 4. `fruitless type test: a value of TypeA cannot also be a TypeB`
"-Wconf:cat=unchecked=outer reference:s",
"-Wconf:cat=unchecked=eliminated by erasure:s",
"-Wconf:msg=^(?=.*?a value of type)(?=.*?cannot also be).+$:s",
```

The specific fixes are as follows:
- Added the corresponding `outer reference` to fix issue 1
- Used fine-grained `unchecked` declarations to clean up issues 2 and 3 (a minimal sketch follows below)
- Since the case corresponding to issue 4 no longer exists, the 
corresponding suppression rule was directly deleted.
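
A minimal standalone sketch of the `@unchecked` fix (hypothetical code, not the Spark sources): type parameters are erased at run time, so pattern types either carry an `@unchecked` annotation or use wildcards.

```scala
object ErasureWarningExample {
  // Without @unchecked, `case xs: Seq[String]` warns that the type argument
  // cannot be checked at runtime (it is eliminated by erasure).
  def asStrings(value: Any): Option[Seq[String]] = value match {
    case xs: Seq[String] @unchecked => Some(xs)
    case _ => None
  }

  // Wildcard type parameters avoid the warning entirely when the parameters
  // are not needed, as done for `RangePartitioner[_, _]` in this PR.
  def isMap(value: Any): Boolean = value match {
    case _: Map[_, _] => true
    case _ => false
  }
}
```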

### Why are the changes needed?
Clean up the compile issue suppression rules added in SPARK-35496 for 
upgrading to Scala 2.13.7.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Pass GitHub Actions

### Was this patch authored or co-authored using generative AI tooling?
No

Closes #43582 from LuciferYang/SPARK-45702.

Lead-authored-by: yangjie01 
Co-authored-by: YangJie 
Signed-off-by: Dongjoon Hyun 
---
 .../org/apache/spark/rdd/OrderedRDDFunctions.scala |  2 +-
 .../spark/mllib/evaluation/RankingMetrics.scala|  6 --
 pom.xml| 12 ---
 project/SparkBuild.scala   | 10 -
 .../sql/catalyst/CatalystTypeConverters.scala  |  2 +-
 .../spark/sql/catalyst/plans/QueryPlan.scala   |  8 
 .../apache/spark/sql/catalyst/trees/TreeNode.scala |  2 +-
 .../org/apache/spark/sql/SQLQueryTestSuite.scala   | 24 +++---
 .../thriftserver/ThriftServerQueryTestSuite.scala  |  6 +++---
 9 files changed, 26 insertions(+), 46 deletions(-)

diff --git a/core/src/main/scala/org/apache/spark/rdd/OrderedRDDFunctions.scala 
b/core/src/main/scala/org/apache/spark/rdd/OrderedRDDFunctions.scala
index 2701f457ee8..965a57da8db 100644
--- a/core/src/main/scala/org/apache/spark/rdd/OrderedRDDFunctions.scala
+++ b/core/src/main/scala/org/apache/spark/rdd/OrderedRDDFunctions.scala
@@ -97,7 +97,7 @@ class OrderedRDDFunctions[K : Ordering : ClassTag,
 def inRange(k: K): Boolean = ordering.gteq(k, lower) && ordering.lteq(k, 
upper)
 
 val rddToFilter: RDD[P] = self.partitioner match {
-  case Some(rp: RangePartitioner[K, V]) =>
+  case Some(rp: RangePartitioner[_, _]) =>
 val partitionIndices = (rp.getPartition(lower), 
rp.getPartition(upper)) match {
   case (l, u) => Math.min(l, u) to Math.max(l, u)
 }
diff --git 
a/mllib/src/main/scala/org/apache/spark/mllib/evaluation/RankingMetrics.scala 
b/mllib/src/main/scala/org/apache/spark/mllib/evaluation/RankingMetrics.scala
index 641f55bb05f..888b07dd4e6 100644
--- 
a/mllib/src/main/scala/org/apache/spark/mllib/evaluation/RankingMetrics.scala
+++ 
b/mllib/src/main/scala/org/apache/spark/mllib/evaluation/RankingMetrics.scala
@@ -44,8 +44,10 @@ class RankingMetrics[T: ClassTag] @Since("1.2.0") 
(predictionAndLabels: RDD[_ <:
 with Serializable {
 
   private val rdd = predictionAndLabels.map {
-case (pred: Array[T], lab: Array[T]) => (pred, lab, Array.empty[Double])
-case (pred: Array[T], lab: Array[T], rel: Array[Double]) => (pred, lab, 
rel)
+case (pred: Array[T] @unchecked, lab: Array[T] @unchecked) =>
+  (pred, lab, Array.empty[Double])
+case (pred: Array[T] @unchecked, lab: Array[T] @unchecked, rel: 
Array[Double]) =>
+  (pred, lab, rel)
 case _ => throw new IllegalArgumentException(s"Expected RDD of tuples or 
triplets")
   }
 
diff --git a/pom.xml b/pom.xml
index a76d8eaddb5..da8e713bd55 100644
--- a/pom.xml
+++ b/pom.xml
@@ -2981,18 +2981,6 @@
 `procedure syntax is deprecated`
  

(spark) branch master updated (5ba708f8fff -> 0487507c8fe)

2023-10-31 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


from 5ba708f8fff [SPARK-45725][SQL] Remove the non-default IN subquery 
runtime filter
 add 0487507c8fe [SPARK-45683][CORE] Fix `method any2stringadd in object 
Predef is deprecated`

No new revisions were added by this update.
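
For context, `Predef.any2stringadd` is the implicit conversion that lets `someValue + "text"` compile for arbitrary types. A minimal sketch of the typical replacement, not code from this commit:

```scala
val ref: Any = 42

// Deprecated in Scala 2.13 (relies on Predef.any2stringadd):
//   val msg = ref + " items"
// Typical replacements make the string conversion explicit:
val viaInterpolation = s"$ref items"
val viaToString      = ref.toString + " items"
```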

Summary of changes:
 .../org/apache/spark/sql/avro/AvroSuite.scala  |  8 ++---
 .../spark/sql/connect/client/ArtifactManager.scala |  2 +-
 .../sql/connect/artifact/util/ArtifactUtils.scala  |  2 +-
 .../spark/deploy/client/StandaloneAppClient.scala  |  4 +--
 .../scala/org/apache/spark/rpc/RpcEndpoint.scala   |  4 +--
 .../test/scala/org/apache/spark/FileSuite.scala|  4 +--
 .../apache/spark/deploy/RPackageUtilsSuite.scala   |  2 +-
 .../org/apache/spark/deploy/SparkSubmitSuite.scala |  2 +-
 .../apache/spark/rdd/LocalCheckpointSuite.scala|  2 +-
 .../org/apache/spark/scheduler/PoolSuite.scala |  2 +-
 .../spark/scheduler/SparkListenerSuite.scala   |  4 +--
 .../scala/org/apache/spark/util/UtilsSuite.scala   |  2 +-
 .../org/apache/spark/repl/SingletonReplSuite.scala |  2 +-
 .../network/yarn/YarnShuffleServiceSuite.scala |  2 +-
 .../scala/org/apache/spark/sql/types/Decimal.scala |  2 +-
 .../sql/catalyst/expressions/Expression.scala  | 42 +++---
 .../codegen/GenerateSafeProjection.scala   |  2 +-
 .../expressions/collectionOperations.scala | 14 
 .../spark/sql/catalyst/expressions/hash.scala  |  2 +-
 .../catalyst/expressions/stringExpressions.scala   | 33 +
 .../plans/logical/basicLogicalOperators.scala  |  2 +-
 .../apache/spark/sql/catalyst/trees/TreeNode.scala |  2 +-
 .../apache/spark/sql/catalyst/util/package.scala   |  2 +-
 .../spark/sql/execution/datasources/ddl.scala  |  2 +-
 .../spark/sql/execution/command/DDLSuite.scala |  4 +--
 .../binaryfile/BinaryFileFormatSuite.scala |  2 +-
 .../execution/datasources/orc/OrcFilterSuite.scala |  4 +--
 .../datasources/parquet/ParquetFilterSuite.scala   |  2 +-
 ...cProgressTrackingMicroBatchExecutionSuite.scala | 20 +--
 .../streaming/MicroBatchExecutionSuite.scala   |  8 ++---
 .../state/RocksDBStateStoreIntegrationSuite.scala  |  4 +--
 .../sql/streaming/FileStreamSourceSuite.scala  |  2 +-
 .../spark/sql/hive/HiveSparkSubmitSuite.scala  |  2 +-
 .../apache/spark/sql/hive/StatisticsSuite.scala|  8 ++---
 .../spark/sql/hive/execution/HiveUDFSuite.scala|  4 +--
 .../apache/spark/streaming/dstream/DStream.scala   |  4 +--
 36 files changed, 106 insertions(+), 103 deletions(-)





(spark) branch master updated: [SPARK-45725][SQL] Remove the non-default IN subquery runtime filter

2023-10-31 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 5ba708f8fff [SPARK-45725][SQL] Remove the non-default IN subquery 
runtime filter
5ba708f8fff is described below

commit 5ba708f8fffd21b675d819b01e53d11d8166dc9f
Author: Wenchen Fan 
AuthorDate: Tue Oct 31 08:49:14 2023 -0700

[SPARK-45725][SQL] Remove the non-default IN subquery runtime filter

### What changes were proposed in this pull request?

The IN subquery runtime filter is useless:
1. For small data (the most common case, given the heuristic we use), the bloom filter is as selective as the IN subquery but more performant (hash + mod vs. value comparison).
2. For big data, the IN subquery will likely OOM, while the bloom filter remains much more efficient.

This PR removes the IN subquery runtime filter (the default is the bloom filter) to simplify the code.
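
To illustrate the performance argument with a standalone sketch (this is not Spark code): a bloom-filter probe costs a few hashes and bit lookups per row, regardless of how many creation-side values were collected, whereas an IN-style filter compares each row against the collected set of values.

```scala
// Toy bloom filter with two hash slots per element; real implementations size
// the bit array and number of hash functions for a target false-positive rate.
final class TinyBloom(bits: Int = 1 << 16) {
  private val set = new java.util.BitSet(bits)
  private def slot(h: Long): Int = (((h % bits) + bits) % bits).toInt
  private def slots(x: Long): Seq[Int] = Seq(slot(x), slot(x * 0x9E3779B97F4A7C15L))
  def add(x: Long): Unit = slots(x).foreach(i => set.set(i))
  // May return false positives, never false negatives.
  def mightContain(x: Long): Boolean = slots(x).forall(i => set.get(i))
}

val bloom = new TinyBloom()
(1L to 1000L).foreach(bloom.add)
assert(bloom.mightContain(42L))
```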

### Why are the changes needed?

Simplify the code and tests, and make Spark simpler by removing one knob.

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

existing tests

### Was this patch authored or co-authored using generative AI tooling?

No

Closes #43585 from cloud-fan/filter.

Authored-by: Wenchen Fan 
Signed-off-by: Dongjoon Hyun 
---
 .../catalyst/optimizer/InjectRuntimeFilter.scala   |  97 +++
 .../org/apache/spark/sql/internal/SQLConf.scala|  15 +--
 .../spark/sql/InjectRuntimeFilterSuite.scala   | 132 -
 3 files changed, 39 insertions(+), 205 deletions(-)

diff --git 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/InjectRuntimeFilter.scala
 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/InjectRuntimeFilter.scala
index 30526bd8106..5f5508d6b22 100644
--- 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/InjectRuntimeFilter.scala
+++ 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/InjectRuntimeFilter.scala
@@ -26,47 +26,27 @@ import org.apache.spark.sql.catalyst.plans.logical._
 import org.apache.spark.sql.catalyst.rules.Rule
 import org.apache.spark.sql.catalyst.trees.TreePattern.{INVOKE, 
JSON_TO_STRUCT, LIKE_FAMLIY, PYTHON_UDF, REGEXP_EXTRACT_FAMILY, REGEXP_REPLACE, 
SCALA_UDF}
 import org.apache.spark.sql.internal.SQLConf
-import org.apache.spark.sql.types._
 
 /**
  * Insert a runtime filter on one side of the join (we call this side the 
application side) if
  * we can extract a runtime filter from the other side (creation side). A 
simple case is that
  * the creation side is a table scan with a selective filter.
- * The runtime filter is logically an IN subquery with the join keys 
(converted to a semi join),
- * but can be something different physically, such as a bloom filter.
+ * The runtime filter is logically an IN subquery with the join keys. 
Currently it's always
+ * bloom filter but we may add other physical implementations in the future.
  */
 object InjectRuntimeFilter extends Rule[LogicalPlan] with PredicateHelper with 
JoinSelectionHelper {
 
-  // Wraps `joinKey` with a hash function if its byte size is larger than an 
integer.
-  private def mayWrapWithHash(joinKey: Expression): Expression = {
-if (joinKey.dataType.defaultSize > IntegerType.defaultSize) {
-  new Murmur3Hash(Seq(joinKey))
-} else {
-  joinKey
-}
-  }
-
   private def injectFilter(
   filterApplicationSideKey: Expression,
   filterApplicationSidePlan: LogicalPlan,
   filterCreationSideKey: Expression,
   filterCreationSidePlan: LogicalPlan): LogicalPlan = {
-require(conf.runtimeFilterBloomFilterEnabled || 
conf.runtimeFilterSemiJoinReductionEnabled)
-if (conf.runtimeFilterBloomFilterEnabled) {
-  injectBloomFilter(
-filterApplicationSideKey,
-filterApplicationSidePlan,
-filterCreationSideKey,
-filterCreationSidePlan
-  )
-} else {
-  injectInSubqueryFilter(
-filterApplicationSideKey,
-filterApplicationSidePlan,
-filterCreationSideKey,
-filterCreationSidePlan
-  )
-}
+injectBloomFilter(
+  filterApplicationSideKey,
+  filterApplicationSidePlan,
+  filterCreationSideKey,
+  filterCreationSidePlan
+)
   }
 
   private def injectBloomFilter(
@@ -95,26 +75,6 @@ object InjectRuntimeFilter extends Rule[LogicalPlan] with 
PredicateHelper with J
 Filter(filter, filterApplicationSidePlan)
   }
 
-  private def injectInSubqueryFilter(
-  filterApplicationSideKey: Expression,
-  filterApplicationSidePlan: LogicalPlan,
-  filterCreationSideKey: Expression,
-  filterCreationSidePlan: LogicalPlan): LogicalPlan = {
-require(filterApplicationSideKey.dataType == 

(spark) branch master updated: [SPARK-45368][SQL] Remove scala2.12 compatibility logic for DoubleType, FloatType, Decimal

2023-10-31 Thread srowen
This is an automated email from the ASF dual-hosted git repository.

srowen pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 102daf9d149 [SPARK-45368][SQL] Remove scala2.12 compatibility logic 
for DoubleType, FloatType, Decimal
102daf9d149 is described below

commit 102daf9d1490d12b812be4432c77ce102e82c3bb
Author: tangjiafu 
AuthorDate: Tue Oct 31 08:42:46 2023 -0500

[SPARK-45368][SQL] Remove scala2.12 compatibility logic for DoubleType, 
FloatType, Decimal

### What changes were proposed in this pull request?

Remove the Scala 2.12 compatibility logic for DoubleType, FloatType, and Decimal.
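
For context (a standalone illustration, not code from this PR): Scala 2.13's `scala.math.Numeric` declares `parseString`, so the definitions can now carry `override`; the non-`override` versions were a workaround to keep compiling against 2.12, where the method did not exist.

```scala
// parseString is part of Numeric in Scala 2.13+, so it can be used generically:
val doubleNum = implicitly[Numeric[Double]]
assert(doubleNum.parseString("1.5").contains(1.5))
assert(doubleNum.parseString("not a number").isEmpty)
```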

### Why are the changes needed?

Part of dropping Scala 2.12 and making Scala 2.13 the default: https://issues.apache.org/jira/browse/SPARK-45368

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

Tested by CI.

### Was this patch authored or co-authored using generative AI tooling?

No

Closes #43456 from laglangyue/f_SPARK-45368_scala12_dataType.

Lead-authored-by: tangjiafu 
Co-authored-by: laglangyue 
Signed-off-by: Sean Owen 
---
 sql/api/src/main/scala/org/apache/spark/sql/types/Decimal.scala| 4 +---
 sql/api/src/main/scala/org/apache/spark/sql/types/DoubleType.scala | 5 +
 sql/api/src/main/scala/org/apache/spark/sql/types/FloatType.scala  | 5 +
 3 files changed, 3 insertions(+), 11 deletions(-)

diff --git a/sql/api/src/main/scala/org/apache/spark/sql/types/Decimal.scala 
b/sql/api/src/main/scala/org/apache/spark/sql/types/Decimal.scala
index afe73635a68..3ce0508951f 100644
--- a/sql/api/src/main/scala/org/apache/spark/sql/types/Decimal.scala
+++ b/sql/api/src/main/scala/org/apache/spark/sql/types/Decimal.scala
@@ -681,9 +681,7 @@ object Decimal {
 override def toLong(x: Decimal): Long = x.toLong
 override def fromInt(x: Int): Decimal = new Decimal().set(x)
 override def compare(x: Decimal, y: Decimal): Int = x.compare(y)
-// Added from Scala 2.13; don't override to work in 2.12
-// TODO revisit once Scala 2.12 support is dropped
-def parseString(str: String): Option[Decimal] = Try(Decimal(str)).toOption
+override def parseString(str: String): Option[Decimal] = 
Try(Decimal(str)).toOption
   }
 
   /** A [[scala.math.Fractional]] evidence parameter for Decimals. */
diff --git a/sql/api/src/main/scala/org/apache/spark/sql/types/DoubleType.scala 
b/sql/api/src/main/scala/org/apache/spark/sql/types/DoubleType.scala
index d18c7b98af2..bc0ed725cf2 100644
--- a/sql/api/src/main/scala/org/apache/spark/sql/types/DoubleType.scala
+++ b/sql/api/src/main/scala/org/apache/spark/sql/types/DoubleType.scala
@@ -42,8 +42,6 @@ class DoubleType private() extends FractionalType {
 @Stable
 case object DoubleType extends DoubleType {
 
-  // Traits below copied from Scala 2.12; not present in 2.13
-  // TODO: SPARK-30011 revisit once Scala 2.12 support is dropped
   trait DoubleIsConflicted extends Numeric[Double] {
 def plus(x: Double, y: Double): Double = x + y
 def minus(x: Double, y: Double): Double = x - y
@@ -56,8 +54,7 @@ case object DoubleType extends DoubleType {
 def toDouble(x: Double): Double = x
 // logic in Numeric base trait mishandles abs(-0.0)
 override def abs(x: Double): Double = math.abs(x)
-// Added from Scala 2.13; don't override to work in 2.12
-def parseString(str: String): Option[Double] =
+override def parseString(str: String): Option[Double] =
   Try(java.lang.Double.parseDouble(str)).toOption
 
   }
diff --git a/sql/api/src/main/scala/org/apache/spark/sql/types/FloatType.scala 
b/sql/api/src/main/scala/org/apache/spark/sql/types/FloatType.scala
index 978384eebfe..8b54f830d48 100644
--- a/sql/api/src/main/scala/org/apache/spark/sql/types/FloatType.scala
+++ b/sql/api/src/main/scala/org/apache/spark/sql/types/FloatType.scala
@@ -43,8 +43,6 @@ class FloatType private() extends FractionalType {
 @Stable
 case object FloatType extends FloatType {
 
-  // Traits below copied from Scala 2.12; not present in 2.13
-  // TODO: SPARK-30011 revisit once Scala 2.12 support is dropped
   trait FloatIsConflicted extends Numeric[Float] {
 def plus(x: Float, y: Float): Float = x + y
 def minus(x: Float, y: Float): Float = x - y
@@ -57,8 +55,7 @@ case object FloatType extends FloatType {
 def toDouble(x: Float): Double = x.toDouble
 // logic in Numeric base trait mishandles abs(-0.0f)
 override def abs(x: Float): Float = math.abs(x)
-// Added from Scala 2.13; don't override to work in 2.12
-def parseString(str: String): Option[Float] =
+override def parseString(str: String): Option[Float] =
   Try(java.lang.Float.parseFloat(str)).toOption
   }
 



(spark) branch master updated: [SPARK-45732][BUILD] Upgrade commons-text to 1.11.0

2023-10-31 Thread gurwls223
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new d38f0745949 [SPARK-45732][BUILD] Upgrade commons-text to 1.11.0
d38f0745949 is described below

commit d38f07459494a99177b0436c1b4a784a8af8cbab
Author: panbingkun 
AuthorDate: Tue Oct 31 20:51:07 2023 +0900

[SPARK-45732][BUILD] Upgrade commons-text to 1.11.0

### What changes were proposed in this pull request?
This PR upgrades `commons-text` from `1.10.0` to `1.11.0`.

### Why are the changes needed?
The release notes (https://commons.apache.org/proper/commons-text/changes-report.html#a1.11.0) include several bug fixes, e.g.:
- Fix StringTokenizer.getTokenList to return an independent, modifiable list. Fixes [TEXT-219](https://issues.apache.org/jira/browse/TEXT-219).
- Fix TextStringBuilder over-allocating when ensuring capacity (#452). Fixes [TEXT-228](https://issues.apache.org/jira/browse/TEXT-228).
- TextStringBuilder#hashCode() allocates a String on each call (#387).

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
Pass GA.

### Was this patch authored or co-authored using generative AI tooling?
No.

Closes #43590 from panbingkun/SPARK-45732.

Authored-by: panbingkun 
Signed-off-by: Hyukjin Kwon 
---
 dev/deps/spark-deps-hadoop-3-hive-2.3 | 2 +-
 pom.xml   | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/dev/deps/spark-deps-hadoop-3-hive-2.3 
b/dev/deps/spark-deps-hadoop-3-hive-2.3
index 4f7bd8aeff8..a973d4e851d 100644
--- a/dev/deps/spark-deps-hadoop-3-hive-2.3
+++ b/dev/deps/spark-deps-hadoop-3-hive-2.3
@@ -49,7 +49,7 @@ commons-lang3/3.13.0//commons-lang3-3.13.0.jar
 commons-logging/1.1.3//commons-logging-1.1.3.jar
 commons-math3/3.6.1//commons-math3-3.6.1.jar
 commons-pool/1.5.4//commons-pool-1.5.4.jar
-commons-text/1.10.0//commons-text-1.10.0.jar
+commons-text/1.11.0//commons-text-1.11.0.jar
 compress-lzf/1.1.2//compress-lzf-1.1.2.jar
 curator-client/5.2.0//curator-client-5.2.0.jar
 curator-framework/5.2.0//curator-framework-5.2.0.jar
diff --git a/pom.xml b/pom.xml
index b714c3239b8..a76d8eaddb5 100644
--- a/pom.xml
+++ b/pom.xml
@@ -605,7 +605,7 @@
   
 org.apache.commons
 commons-text
-1.10.0
+1.11.0
   
   
 commons-lang





(spark) branch master updated: [SPARK-45713][PYTHON][FOLLOWUP] Fix SparkThrowableSuite for GA

2023-10-31 Thread gurwls223
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 8dd3ec87e26 [SPARK-45713][PYTHON][FOLLOWUP] Fix SparkThrowableSuite 
for GA
8dd3ec87e26 is described below

commit 8dd3ec87e26969df6fe08f5fddc3f8d6efc2420d
Author: panbingkun 
AuthorDate: Tue Oct 31 17:41:44 2023 +0900

[SPARK-45713][PYTHON][FOLLOWUP] Fix SparkThrowableSuite for GA

### What changes were proposed in this pull request?
This PR fixes SparkThrowableSuite for GA.
After PR https://github.com/apache/spark/pull/43566, 'SparkThrowableSuite' fails in GA, e.g.:
https://github.com/panbingkun/spark/actions/runs/6702704679/job/18212102847
(Screenshot of the failure: https://github.com/apache/spark/assets/15246973/9538d6c8-1d75-452c-b72c-567f90ae9f14)

### Why are the changes needed?
Make GA happy.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
Manually tested.
Pass GA.

### Was this patch authored or co-authored using generative AI tooling?
No.

Closes #43598 from panbingkun/fix_SparkThrowableSuite.

Authored-by: panbingkun 
Signed-off-by: Hyukjin Kwon 
---
 docs/sql-error-conditions.md | 8 +++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/docs/sql-error-conditions.md b/docs/sql-error-conditions.md
index 7c537f6fe20..1741e49f561 100644
--- a/docs/sql-error-conditions.md
+++ b/docs/sql-error-conditions.md
@@ -431,6 +431,12 @@ For more details see 
[DATATYPE_MISMATCH](sql-error-conditions-datatype-mismatch-
 
 DataType `` requires a length parameter, for example ``(10). 
Please specify the length.
 
+### DATA_SOURCE_ALREADY_EXISTS
+
+[SQLSTATE: 
42710](sql-error-conditions-sqlstates.html#class-42-syntax-error-or-access-rule-violation)
+
+Data source '``' already exists in the registry. Please use a 
different name for the new data source.
+
 ### DATA_SOURCE_NOT_FOUND
 
 [SQLSTATE: 
42K02](sql-error-conditions-sqlstates.html#class-42-syntax-error-or-access-rule-violation)
@@ -1685,7 +1691,7 @@ Protobuf type not yet supported: ``.
 
 [SQLSTATE: 
38000](sql-error-conditions-sqlstates.html#class-38-external-routine-exception)
 
-Failed to plan Python data source `` in Python: ``
+Failed to `` Python data source `` in Python: ``
 
 ### RECURSIVE_PROTOBUF_SCHEMA
 





(spark) branch branch-3.4 updated: [SPARK-45735][PYTHON][CONNECT][TESTS] Reenable CatalogTests without Spark Connect

2023-10-31 Thread gurwls223
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch branch-3.4
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.4 by this push:
 new 05f48d74f77 [SPARK-45735][PYTHON][CONNECT][TESTS] Reenable 
CatalogTests without Spark Connect
05f48d74f77 is described below

commit 05f48d74f77c4eaa20e4f6626649cb9201978bb5
Author: Hyukjin Kwon 
AuthorDate: Tue Oct 31 16:47:30 2023 +0900

[SPARK-45735][PYTHON][CONNECT][TESTS] Reenable CatalogTests without Spark 
Connect

### What changes were proposed in this pull request?

This PR is a follow-up of https://github.com/apache/spark/pull/39214; it restores the original Catalog tests in PySpark. That PR mistakenly disabled the tests run without Spark Connect:


https://github.com/apache/spark/blob/fc6a5cca06cf15c4a952cb56720f627efdba7cce/python/pyspark/sql/tests/test_catalog.py#L489

### Why are the changes needed?

To restore the test coverage.

### Does this PR introduce _any_ user-facing change?

No, test-only.

### How was this patch tested?

Reenabled unittests.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #43595 from HyukjinKwon/SPARK-45735.

Authored-by: Hyukjin Kwon 
Signed-off-by: Hyukjin Kwon 
(cherry picked from commit 76d9a70932df97d8ea4cc6e279933dee29a88571)
Signed-off-by: Hyukjin Kwon 
---
 python/pyspark/sql/tests/test_catalog.py | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/python/pyspark/sql/tests/test_catalog.py 
b/python/pyspark/sql/tests/test_catalog.py
index ae92ce57dc8..204491b6876 100644
--- a/python/pyspark/sql/tests/test_catalog.py
+++ b/python/pyspark/sql/tests/test_catalog.py
@@ -401,7 +401,7 @@ class CatalogTestsMixin:
 self.assertEqual(spark.table("my_tab").count(), 0)
 
 
-class CatalogTests(ReusedSQLTestCase):
+class CatalogTests(CatalogTestsMixin, ReusedSQLTestCase):
 pass
 
 





(spark) branch branch-3.5 updated: [SPARK-45735][PYTHON][CONNECT][TESTS] Reenable CatalogTests without Spark Connect

2023-10-31 Thread gurwls223
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch branch-3.5
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.5 by this push:
 new 2a7e3fec1c8 [SPARK-45735][PYTHON][CONNECT][TESTS] Reenable 
CatalogTests without Spark Connect
2a7e3fec1c8 is described below

commit 2a7e3fec1c8e3867afb9bdecf7a02d6ba7b36f90
Author: Hyukjin Kwon 
AuthorDate: Tue Oct 31 16:47:30 2023 +0900

[SPARK-45735][PYTHON][CONNECT][TESTS] Reenable CatalogTests without Spark 
Connect

### What changes were proposed in this pull request?

This PR is a follow-up of https://github.com/apache/spark/pull/39214; it restores the original Catalog tests in PySpark. That PR mistakenly disabled the tests run without Spark Connect:


https://github.com/apache/spark/blob/fc6a5cca06cf15c4a952cb56720f627efdba7cce/python/pyspark/sql/tests/test_catalog.py#L489

### Why are the changes needed?

To restore the test coverage.

### Does this PR introduce _any_ user-facing change?

No, test-only.

### How was this patch tested?

Reenabled unittests.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #43595 from HyukjinKwon/SPARK-45735.

Authored-by: Hyukjin Kwon 
Signed-off-by: Hyukjin Kwon 
(cherry picked from commit 76d9a70932df97d8ea4cc6e279933dee29a88571)
Signed-off-by: Hyukjin Kwon 
---
 python/pyspark/sql/tests/test_catalog.py | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/python/pyspark/sql/tests/test_catalog.py 
b/python/pyspark/sql/tests/test_catalog.py
index cafffdc9ae8..b72172a402b 100644
--- a/python/pyspark/sql/tests/test_catalog.py
+++ b/python/pyspark/sql/tests/test_catalog.py
@@ -486,7 +486,7 @@ class CatalogTestsMixin:
 self.assertEqual(spark.table("my_tab").count(), 0)
 
 
-class CatalogTests(ReusedSQLTestCase):
+class CatalogTests(CatalogTestsMixin, ReusedSQLTestCase):
 pass
 
 





(spark) branch master updated: [SPARK-45735][PYTHON][CONNECT][TESTS] Reenable CatalogTests without Spark Connect

2023-10-31 Thread gurwls223
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 76d9a70932d [SPARK-45735][PYTHON][CONNECT][TESTS] Reenable 
CatalogTests without Spark Connect
76d9a70932d is described below

commit 76d9a70932df97d8ea4cc6e279933dee29a88571
Author: Hyukjin Kwon 
AuthorDate: Tue Oct 31 16:47:30 2023 +0900

[SPARK-45735][PYTHON][CONNECT][TESTS] Reenable CatalogTests without Spark 
Connect

### What changes were proposed in this pull request?

This PR is a follow-up of https://github.com/apache/spark/pull/39214; it restores the original Catalog tests in PySpark. That PR mistakenly disabled the tests run without Spark Connect:


https://github.com/apache/spark/blob/fc6a5cca06cf15c4a952cb56720f627efdba7cce/python/pyspark/sql/tests/test_catalog.py#L489

### Why are the changes needed?

To restore the test coverage.

### Does this PR introduce _any_ user-facing change?

No, test-only.

### How was this patch tested?

Reenabled unittests.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #43595 from HyukjinKwon/SPARK-45735.

Authored-by: Hyukjin Kwon 
Signed-off-by: Hyukjin Kwon 
---
 python/pyspark/sql/tests/test_catalog.py | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/python/pyspark/sql/tests/test_catalog.py 
b/python/pyspark/sql/tests/test_catalog.py
index cafffdc9ae8..b72172a402b 100644
--- a/python/pyspark/sql/tests/test_catalog.py
+++ b/python/pyspark/sql/tests/test_catalog.py
@@ -486,7 +486,7 @@ class CatalogTestsMixin:
 self.assertEqual(spark.table("my_tab").count(), 0)
 
 
-class CatalogTests(ReusedSQLTestCase):
+class CatalogTests(CatalogTestsMixin, ReusedSQLTestCase):
 pass
 
 





(spark) branch master updated: [SPARK-45701][SPARK-45684][SPARK-45692][CORE][SQL][SS][ML][K8S] Clean up the deprecated API usage related to `mutable.SetOps/c.SeqOps/Iterator/Iterable/IterableOps`

2023-10-31 Thread yangjie01
This is an automated email from the ASF dual-hosted git repository.

yangjie01 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 29eb577cefa 
[SPARK-45701][SPARK-45684][SPARK-45692][CORE][SQL][SS][ML][K8S] Clean up the 
deprecated API usage related to 
`mutable.SetOps/c.SeqOps/Iterator/Iterable/IterableOps`
29eb577cefa is described below

commit 29eb577cefabd954f3de2c284a692790d621d0ba
Author: yangjie01 
AuthorDate: Tue Oct 31 15:07:47 2023 +0800

[SPARK-45701][SPARK-45684][SPARK-45692][CORE][SQL][SS][ML][K8S] Clean up 
the deprecated API usage related to 
`mutable.SetOps/c.SeqOps/Iterator/Iterable/IterableOps`

### What changes were proposed in this pull request?
This PR cleans up the deprecated API usage related to `SetOps`:

- `--` -> `diff`
- `-` -> `diff`
- `+` -> `union`
- `retain` -> `filterInPlace`

The changes refer to:

```scala
  deprecated("Consider requiring an immutable Set", "2.13.0")
  def -- (that: IterableOnce[A]): C = {
val toRemove = that.iterator.to(immutable.Set)
fromSpecific(view.filterNot(toRemove))
  }

 deprecated("Consider requiring an immutable Set or fall back to Set.diff", 
"2.13.0")
  def - (elem: A): C = diff(Set(elem))

  deprecated("Consider requiring an immutable Set or fall back to 
Set.union", "2.13.0")
  def + (elem: A): C = fromSpecific(new View.Appended(this, elem))

  deprecated("Use filterInPlace instead", "2.13.0")
  inline final def retain(p: A => Boolean): Unit = filterInPlace(p)
```

This PR also cleans up deprecated API usage related to `SeqOps`:

- `transform` -> `mapInPlace`
- `reverseMap` -> `.reverseIterator.map(f).to(...)`
- `union` -> `concat`

The changes refer to:

```scala
  deprecated("Use `mapInPlace` on an `IndexedSeq` instead", "2.13.0")
  `inline`final def transform(f: A => A): this.type = {
var i = 0
val siz = size
while (i < siz) { this(i) = f(this(i)); i += 1 }
this
  }

  deprecated("Use .reverseIterator.map(f).to(...) instead of 
.reverseMap(f)", "2.13.0")
  def reverseMap[B](f: A => B): CC[B] = iterableFactory.from(new 
View.Map(View.fromIteratorProvider(() => reverseIterator), f))

  deprecated("Use `concat` instead", "2.13.0")
  inline final def union[B >: A](that: Seq[B]): CC[B] = concat(that)
```

This PR also cleans up deprecated API usage related to `Iterator/Iterable/IterableOps`, referring to:

- s.c.Iterable

- `toIterable` -> immutable.ArraySeq.unsafeWrapArray

```scala
  deprecated("toIterable is internal and will be made protected; its name 
is similar to `toList` or `toSeq`, but it doesn't copy non-immutable 
collections", "2.13.7")
  final def toIterable: this.type = this
```

- s.c.Iterator

- `.seq` -> removed

```scala
  deprecated("Iterator.seq always returns the iterator itself", "2.13.0")
  def seq: this.type = this
```

- s.c.IterableOps

- `toTraversable` -> removed

```
  deprecated("toTraversable is internal and will be made protected; its 
name is similar to `toList` or `toSeq`, but it doesn't copy non-immutable 
collections", "2.13.0")
  final def toTraversable: Traversable[A] = toIterable
```
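
A compact before/after sketch of the replacements listed above (illustrative only, not code from this PR):

```scala
import scala.collection.mutable

val s = mutable.Set(1, 2, 3)
val without2 = s.diff(Set(2))        // instead of deprecated `s -- Set(2)` / `s - 2`
val with4    = s.union(Set(4))       // instead of deprecated `s + 4`
s.filterInPlace(_ % 2 == 1)          // instead of deprecated `s.retain(_ % 2 == 1)`

val buf = mutable.ArrayBuffer(1, 2, 3)
buf.mapInPlace(_ * 10)               // instead of deprecated `buf.transform(_ * 10)`
val reversed = buf.reverseIterator.map(_ + 1).to(Seq)  // instead of `buf.reverseMap(_ + 1)`
val appended = buf.concat(Seq(4, 5)) // instead of deprecated `buf.union(Seq(4, 5))`
```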

### Why are the changes needed?
Clean up deprecated Scala API usage

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Pass GitHub Actions

### Was this patch authored or co-authored using generative AI tooling?
No

Closes #43575 from LuciferYang/SPARK-45701.

Lead-authored-by: yangjie01 
Co-authored-by: YangJie 
Signed-off-by: yangjie01 
---
 .../org/apache/spark/util/ClosureCleaner.scala |  4 +--
 .../scala/org/apache/spark/sql/SparkSession.scala  |  3 ++-
 .../sql/KeyValueGroupedDatasetE2ETestSuite.scala   |  2 +-
 .../connect/client/GrpcExceptionConverter.scala|  3 ++-
 .../org/apache/spark/sql/connect/dsl/package.scala |  4 +--
 .../sql/connect/planner/SparkConnectPlanner.scala  | 26 +-
 .../spark/sql/connect/utils/ErrorUtils.scala   | 31 +++---
 .../spark/storage/BlockReplicationPolicy.scala |  2 +-
 .../collection/ExternalAppendOnlyMapSuite.scala|  6 ++---
 .../ml/classification/LogisticRegression.scala |  2 +-
 .../apache/spark/ml/feature/NormalizerSuite.scala  |  4 +--
 .../spark/ml/feature/StringIndexerSuite.scala  |  4 +--
 .../cluster/k8s/ExecutorPodsAllocator.scala|  2 +-
 .../cluster/k8s/ExecutorPodsLifecycleManager.scala |  2 +-
 .../sql/catalyst/expressions/AttributeSet.scala|  4 +--
 .../catalyst/expressions/stringExpressions.scala   |  2 +-
 

(spark) branch master updated: [SPARK-38723][SS][TEST][FOLLOWUP] Deflake the newly added test in QueryExecutionErrorsSuite

2023-10-31 Thread maxgekk
This is an automated email from the ASF dual-hosted git repository.

maxgekk pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 97ba9d29cd1 [SPARK-38723][SS][TEST][FOLLOWUP] Deflake the newly added 
test in QueryExecutionErrorsSuite
97ba9d29cd1 is described below

commit 97ba9d29cd1ff83755b0d02251d249a625caace5
Author: Wei Liu 
AuthorDate: Tue Oct 31 09:38:50 2023 +0300

[SPARK-38723][SS][TEST][FOLLOWUP] Deflake the newly added test in 
QueryExecutionErrorsSuite

### What changes were proposed in this pull request?

The newly added test in https://github.com/apache/spark/commit/7d7afb06f682c10f3900eb8adeab9fad6d49cb24 could be flaky; this change deflakes it. See the inline comments for details.
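
An illustrative sketch of the deflaking pattern (not the suite code): instead of requiring that at least one of the concurrent runs hit the error, validate only the errors that actually occurred, so an error-free run no longer fails the test.

```scala
// Hypothetical results of 50 concurrent query attempts; in the real test these
// come from starting streaming queries concurrently and catching exceptions.
val exceptions: Seq[Option[Exception]] = Seq.fill(50)(None)
exceptions.flatten.foreach { e =>
  assert(e.getMessage.contains("CONCURRENT_QUERY"))
}
```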

### Why are the changes needed?

Deflake the flaky test.

### Does this PR introduce _any_ user-facing change?

Test only change

### How was this patch tested?

Test only change

### Was this patch authored or co-authored using generative AI tooling?

No

Closes #43565 from WweiL/SPARK-38723-followup.

Authored-by: Wei Liu 
Signed-off-by: Max Gekk 
---
 .../spark/sql/errors/QueryExecutionErrorsSuite.scala | 20 +++-
 1 file changed, 11 insertions(+), 9 deletions(-)

diff --git 
a/sql/core/src/test/scala/org/apache/spark/sql/errors/QueryExecutionErrorsSuite.scala
 
b/sql/core/src/test/scala/org/apache/spark/sql/errors/QueryExecutionErrorsSuite.scala
index 945dd782da0..dd3f3dc6004 100644
--- 
a/sql/core/src/test/scala/org/apache/spark/sql/errors/QueryExecutionErrorsSuite.scala
+++ 
b/sql/core/src/test/scala/org/apache/spark/sql/errors/QueryExecutionErrorsSuite.scala
@@ -22,6 +22,8 @@ import java.net.{URI, URL}
 import java.sql.{Connection, DatabaseMetaData, Driver, DriverManager, 
PreparedStatement, ResultSet, ResultSetMetaData}
 import java.util.{Locale, Properties, ServiceConfigurationError}
 
+import scala.jdk.CollectionConverters._
+
 import org.apache.hadoop.fs.{LocalFileSystem, Path}
 import org.apache.hadoop.fs.permission.FsPermission
 import org.mockito.Mockito.{mock, spy, when}
@@ -910,15 +912,15 @@ class QueryExecutionErrorsSuite
   }
   exception
 }
-  assert(exceptions.map(e => e.isDefined).reduceLeft(_ || _))
-  exceptions.map { e =>
-if (e.isDefined) {
-  checkError(
-e.get,
-errorClass = "CONCURRENT_QUERY",
-sqlState = Some("0A000")
-  )
-}
+  // Only check if errors exist to deflake. We couldn't guarantee that
+  // the above 50 runs must hit this error.
+  exceptions.flatten.map { e =>
+checkError(
+  e,
+  errorClass = "CONCURRENT_QUERY",
+  sqlState = Some("0A000"),
+  parameters = e.getMessageParameters.asScala.toMap
+)
   }
   spark.streams.active.foreach(_.stop())
 }

