[spark] branch master updated (ef89b278f8e -> c119b8aec87)
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

    from ef89b278f8e [SPARK-45096][INFRA] Optimize apt-get install in Dockerfile
     add c119b8aec87 [SPARK-45133][CONNECT] Make Spark Connect queries be FINISHED when last result task is finished

No new revisions were added by this update.

Summary of changes:
 .../sql/connect/execution/SparkConnectPlanExecution.scala | 15 ++-
 1 file changed, 10 insertions(+), 5 deletions(-)

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated: [SPARK-45096][INFRA] Optimize apt-get install in Dockerfile
This is an automated email from the ASF dual-hosted git repository.

ruifengz pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new ef89b278f8e [SPARK-45096][INFRA] Optimize apt-get install in Dockerfile
ef89b278f8e is described below

commit ef89b278f8e04b459cd7539fd16754d6cdc77a2d
Author: Ruifeng Zheng
AuthorDate: Wed Sep 13 10:19:12 2023 +0800

    [SPARK-45096][INFRA] Optimize apt-get install in Dockerfile

    ### What changes were proposed in this pull request?

    Follow the [Best practices for writing Dockerfiles](https://docs.docker.com/develop/develop-images/dockerfile_best-practices/#apt-get):

    > Always combine RUN apt-get update with apt-get install in the same RUN statement.

    ### Why are the changes needed?

    1. To address https://github.com/apache/spark/pull/42253#discussion_r1280479837
    2. When I attempted to change the apt-get install in https://github.com/apache/spark/pull/41918, the behavior was confusing. By following the best practices, further changes should work immediately.

    ### Does this PR introduce _any_ user-facing change?

    No, dev-only.

    ### How was this patch tested?

    CI

    ### Was this patch authored or co-authored using generative AI tooling?

    No

    Closes #42842 from zhengruifeng/infra_docker_file_opt.
Authored-by: Ruifeng Zheng
Signed-off-by: Ruifeng Zheng
---
 dev/infra/Dockerfile | 50 +++---
 1 file changed, 35 insertions(+), 15 deletions(-)

diff --git a/dev/infra/Dockerfile b/dev/infra/Dockerfile
index b69e682f239..60204dcc49e 100644
--- a/dev/infra/Dockerfile
+++ b/dev/infra/Dockerfile
@@ -24,19 +24,44 @@ ENV FULL_REFRESH_DATE 20221118
 ENV DEBIAN_FRONTEND noninteractive
 ENV DEBCONF_NONINTERACTIVE_SEEN true

-ARG APT_INSTALL="apt-get install --no-install-recommends -y"
-
-RUN apt-get clean
-RUN apt-get update
-RUN $APT_INSTALL software-properties-common git libxml2-dev pkg-config curl wget openjdk-8-jdk libpython3-dev python3-pip python3-setuptools python3.8 python3.9
-RUN update-alternatives --set java /usr/lib/jvm/java-8-openjdk-amd64/jre/bin/java
+RUN apt-get update && apt-get install -y \
+    software-properties-common \
+    git \
+    pkg-config \
+    curl \
+    wget \
+    openjdk-8-jdk \
+    gfortran \
+    libopenblas-dev \
+    liblapack-dev \
+    build-essential \
+    gnupg \
+    ca-certificates \
+    pandoc \
+    libpython3-dev \
+    python3-pip \
+    python3-setuptools \
+    python3.8 \
+    python3.9 \
+    r-base \
+    libcurl4-openssl-dev \
+    qpdf \
+    zlib1g-dev \
+    libssl-dev \
+    libpng-dev \
+    libharfbuzz-dev \
+    libfribidi-dev \
+    libtiff5-dev \
+    libgit2-dev \
+    libxml2-dev \
+    libjpeg-dev \
+    libfontconfig1-dev \
+    libfreetype6-dev \
+    && rm -rf /var/lib/apt/lists/*

 RUN curl -sS https://bootstrap.pypa.io/get-pip.py | python3.9
 RUN add-apt-repository ppa:pypy/ppa
-RUN apt update
-RUN $APT_INSTALL gfortran libopenblas-dev liblapack-dev
-RUN $APT_INSTALL build-essential

 RUN mkdir -p /usr/local/pypy/pypy3.8 && \
     curl -sqL https://downloads.python.org/pypy/pypy3.8-v7.3.11-linux64.tar.bz2 | tar xjf - -C /usr/local/pypy/pypy3.8 --strip-components=1 && \
@@ -45,19 +70,14 @@ RUN mkdir -p /usr/local/pypy/pypy3.8 && \

 RUN curl -sS https://bootstrap.pypa.io/get-pip.py | pypy3

-RUN $APT_INSTALL gnupg ca-certificates pandoc
 RUN echo 'deb https://cloud.r-project.org/bin/linux/ubuntu focal-cran40/' >> /etc/apt/sources.list
 RUN gpg --keyserver hkps://keyserver.ubuntu.com --recv-key E298A3A825C0D65DFD57CBB651716619E084DAB9
 RUN gpg -a --export E084DAB9 | apt-key add -
 RUN add-apt-repository 'deb https://cloud.r-project.org/bin/linux/ubuntu focal-cran40/'
-RUN apt update
-RUN $APT_INSTALL r-base libcurl4-openssl-dev qpdf libssl-dev zlib1g-dev
+
 RUN Rscript -e "install.packages(c('knitr', 'markdown', 'rmarkdown', 'testthat', 'devtools', 'e1071', 'survival', 'arrow', 'roxygen2', 'xml2'), repos='https://cloud.r-project.org/')"

 # See more in SPARK-39959, roxygen2 < 7.2.1
-RUN apt-get install -y libcurl4-openssl-dev libgit2-dev libssl-dev libxml2-dev \
-    libfontconfig1-dev libharfbuzz-dev libfribidi-dev libfreetype6-dev libpng-dev \
-    libtiff5-dev libjpeg-dev
 RUN Rscript -e "install.packages(c('devtools'), repos='https://cloud.r-project.org/')"
 RUN Rscript -e "devtools::install_version('roxygen2', version='7.2.0', repos='https://cloud.r-project.org')"
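For context, the best practice the commit follows can be shown with a minimal Dockerfile sketch (a hypothetical base image and package list for illustration, not taken from Spark's `dev/infra/Dockerfile`):

```dockerfile
# Sketch only: running `apt-get update` and `apt-get install` in the SAME
# RUN instruction ensures the package index is refreshed in the very layer
# that consumes it, so a stale cached index can never feed a later install.
FROM ubuntu:20.04

RUN apt-get update && apt-get install -y --no-install-recommends \
        curl \
        git \
    && rm -rf /var/lib/apt/lists/*   # drop the index to keep the layer small
```

If `update` and `install` sit in separate `RUN` instructions, Docker may reuse a cached (possibly months-old) `update` layer when only the install line changes, producing exactly the confusing "further changes don't take effect" behavior the commit message describes.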
[spark] branch master updated: [SPARK-45131][PYTHON][DOCS] Refine docstring of `ceil/ceiling/floor/round/bround`
This is an automated email from the ASF dual-hosted git repository.

ruifengz pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new 74c78970cd9 [SPARK-45131][PYTHON][DOCS] Refine docstring of `ceil/ceiling/floor/round/bround`
74c78970cd9 is described below

commit 74c78970cd9e99aa750713574bf175fd1efac7c3
Author: panbingkun
AuthorDate: Wed Sep 13 10:17:42 2023 +0800

    [SPARK-45131][PYTHON][DOCS] Refine docstring of `ceil/ceiling/floor/round/bround`

    ### What changes were proposed in this pull request?

    This PR aims to refine the docstring of `ceil/ceiling/floor/round/bround`.

    ### Why are the changes needed?

    To improve PySpark documentation.

    ### Does this PR introduce _any_ user-facing change?

    No.

    ### How was this patch tested?

    - Pass GA.

    ### Was this patch authored or co-authored using generative AI tooling?

    No.

    Closes #42892 from panbingkun/SPARK-45131.

Authored-by: panbingkun
Signed-off-by: Ruifeng Zheng
---
 python/pyspark/sql/functions.py | 48 +
 1 file changed, 34 insertions(+), 14 deletions(-)

diff --git a/python/pyspark/sql/functions.py b/python/pyspark/sql/functions.py
index d3ad7cfc84e..2d4194c98e9 100644
--- a/python/pyspark/sql/functions.py
+++ b/python/pyspark/sql/functions.py
@@ -1631,19 +1631,21 @@ def ceil(col: "ColumnOrName", scale: Optional[Union[Column, int]] = None) -> Col
     Parameters
     ----------
     col : :class:`~pyspark.sql.Column` or str
-        target column to compute on.
+        The target column or column name to compute the ceiling on.
     scale : :class:`~pyspark.sql.Column` or int
-        an optional parameter to control the rounding behavior.
+        An optional parameter to control the rounding behavior.

         .. versionadded:: 4.0.0

     Returns
     -------
     :class:`~pyspark.sql.Column`
-        the column for computed results.
+        A column for the computed results.

     Examples
     --------
+    Example 1: Compute the ceiling of a column value
+
     >>> from pyspark.sql import functions as sf
     >>> spark.range(1).select(sf.ceil(sf.lit(-0.1))).show()
     +--+
@@ -1652,6 +1654,8 @@ def ceil(col: "ColumnOrName", scale: Optional[Union[Column, int]] = None) -> Col
     | 0|
     +--+

+    Example 2: Compute the ceiling of a column value with a specified scale
+
     >>> from pyspark.sql import functions as sf
     >>> spark.range(1).select(sf.ceil(sf.lit(-0.1), 1)).show()
     +-+
@@ -1680,19 +1684,21 @@ def ceiling(col: "ColumnOrName", scale: Optional[Union[Column, int]] = None) ->
     Parameters
     ----------
     col : :class:`~pyspark.sql.Column` or str
-        target column to compute on.
+        The target column or column name to compute the ceiling on.
     scale : :class:`~pyspark.sql.Column` or int
-        an optional parameter to control the rounding behavior.
+        An optional parameter to control the rounding behavior.

         .. versionadded:: 4.0.0

     Returns
     -------
     :class:`~pyspark.sql.Column`
-        the column for computed results.
+        A column for the computed results.

     Examples
     --------
+    Example 1: Compute the ceiling of a column value
+
     >>> from pyspark.sql import functions as sf
     >>> spark.range(1).select(sf.ceiling(sf.lit(-0.1))).show()
     +-+
@@ -1701,6 +1707,8 @@ def ceiling(col: "ColumnOrName", scale: Optional[Union[Column, int]] = None) ->
     |0|
     +-+

+    Example 2: Compute the ceiling of a column value with a specified scale
+
     >>> from pyspark.sql import functions as sf
     >>> spark.range(1).select(sf.ceiling(sf.lit(-0.1), 1)).show()
     ++
@@ -1928,9 +1936,9 @@ def floor(col: "ColumnOrName", scale: Optional[Union[Column, int]] = None) -> Co
     Parameters
     ----------
     col : :class:`~pyspark.sql.Column` or str
-        column to find floor for.
+        The target column or column name to compute the floor on.
     scale : :class:`~pyspark.sql.Column` or int
-        an optional parameter to control the rounding behavior.
+        An optional parameter to control the rounding behavior.

         .. versionadded:: 4.0.0
@@ -1942,6 +1950,8 @@ def floor(col: "ColumnOrName", scale: Optional[Union[Column, int]] = None) -> Co
     Examples
     --------
+    Example 1: Compute the floor of a column value
+
     >>> import pyspark.sql.functions as sf
     >>> spark.range(1).select(sf.floor(sf.lit(2.5))).show()
     +--+
@@ -1950,6 +1960,8 @@ def floor(col: "ColumnOrName", scale: Optional[Union[Column, int]] = None) -> Co
     | 2|
     +--+

+    Example 2: Compute the floor of a column value with a specified scale
+
     >>> import
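For readers unfamiliar with the `scale` parameter in the docstrings above, its rounding semantics can be sketched in plain Python. This is an illustrative analogy only, not the PySpark implementation (which operates on Spark Columns and Decimal types):

```python
import math

def ceil_scaled(x: float, scale: int = 0) -> float:
    """Round x up at the given decimal position (scale=0 -> integer ceiling)."""
    factor = 10 ** scale
    return math.ceil(x * factor) / factor

def floor_scaled(x: float, scale: int = 0) -> float:
    """Round x down at the given decimal position."""
    factor = 10 ** scale
    return math.floor(x * factor) / factor

# ceil(-0.1) rounds up to 0, but with scale=1 the value -0.1 is already
# a multiple of 0.1, so it is returned unchanged -- matching the two
# docstring examples above.
print(ceil_scaled(-0.1))     # 0.0
print(ceil_scaled(-0.1, 1))  # -0.1
print(floor_scaled(2.5))     # 2.0
```

The same pattern explains `floor` with a scale: the value is shifted left by `scale` decimal digits, truncated toward the chosen direction, then shifted back.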
svn commit: r63958 - /dev/spark/v3.5.0-rc5-bin/ /release/spark/spark-3.5.0/
Author: gengliang
Date: Wed Sep 13 02:07:27 2023
New Revision: 63958

Log:
Apache Spark 3.5.0

Added:
    release/spark/spark-3.5.0/
      - copied from r63957, dev/spark/v3.5.0-rc5-bin/
Removed:
    dev/spark/v3.5.0-rc5-bin/
[spark] branch master updated: [SPARK-45120][UI] Upgrade d3 from v3 to v7(v7.8.5) and apply api changes in UI
This is an automated email from the ASF dual-hosted git repository.

yao pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new 4221235a6b2 [SPARK-45120][UI] Upgrade d3 from v3 to v7(v7.8.5) and apply api changes in UI
4221235a6b2 is described below

commit 4221235a6b2742a42e63f143a62191f94ae05ed8
Author: Kent Yao
AuthorDate: Wed Sep 13 09:50:54 2023 +0800

    [SPARK-45120][UI] Upgrade d3 from v3 to v7(v7.8.5) and apply api changes in UI

    ### What changes were proposed in this pull request?

    This PR upgrades d3 from v3 to v7 (v7.8.5) and applies the resulting API changes to our UI drawing code. The related changes are listed below:

    - d3.svg.line -> d3.line
    - ~~d3.transfrom~~ (removed)
    - d3.layout.histogram -> d3.histogram() -> d3.bin()
      - d.x -> d.x0
      - d.dx -> d.x1 - d.x0
      - d.y -> d.length
    - d3.scale.linear() -> d3.scaleLinear
    - d3.svg.axis and axis.orient -> d3.axisTop, d3.axisRight, d3.axisBottom, d3.axisLeft
      - d3.svg.axis().scale(x).orient("bottom") -> d3.axisBottom
    - d3.time.format -> d3.timeFormat
    - d3.layout.stack ↦ d3.stack
    - d3.scale.ordinal ↦ d3.scaleOrdinal
    - d3.mouse -> d3.pointer
    - selection.on("mousemove", function(d) { … do something with d3.event and d … })
      becomes:
      selection.on("mousemove", function(event, d) { … do something with event and d … })
    - svg.selectAll("g rect")[0] -> svg.selectAll("g rect").nodes()

    Most of these changes come from v4 and v6; a full list of changes can be found at https://github.com/d3/d3/blob/main/CHANGES.md

    ### Why are the changes needed?

    d3 v3 is very old (Feb 11, 2015) and has been deprecated by some of the downstream libraries that we depend on.

    ### Does this PR introduce _any_ user-facing change?

    No.

    ### How was this patch tested?
    This PR was locally tested with `org.apache.spark.examples.streaming.SqlNetworkWordCount`.

    Streaming page loading
    ![image](https://github.com/apache/spark/assets/8326978/d688b9a7-4e14-42de-a28c-9d81a96bc4d5)

    SQL page loading
    ![image](https://github.com/apache/spark/assets/8326978/a1cac4e8-8f1f-480e-a0ac-359cbcfefaee)

    ### Was this patch authored or co-authored using generative AI tooling?

    No.

    Closes #42879 from yaooqinn/SPARK-45120.

Authored-by: Kent Yao
Signed-off-by: Kent Yao
---
 LICENSE                                            |  6 ++-
 LICENSE-binary                                     |  6 ++-
 .../resources/org/apache/spark/ui/static/d3.min.js |  7 +---
 .../org/apache/spark/ui/static/spark-dag-viz.js    |  6 +-
 .../org/apache/spark/ui/static/streaming-page.js   | 48 --
 .../spark/ui/static/structured-streaming-page.js   | 31 ++
 licenses-binary/LICENSE-d3.min.js.txt              | 39 ++
 licenses/LICENSE-d3.min.js.txt                     | 39 ++
 .../spark/sql/execution/ui/static/spark-sql-viz.js |  4 +-
 9 files changed, 81 insertions(+), 105 deletions(-)

diff --git a/LICENSE b/LICENSE
index 1735d3208f2..3fee963db74 100644
--- a/LICENSE
+++ b/LICENSE
@@ -229,7 +229,6 @@ BSD 3-Clause
 python/lib/py4j-*-src.zip
 python/pyspark/cloudpickle/*.py
 python/pyspark/join.py
-core/src/main/resources/org/apache/spark/ui/static/d3.min.js

 The CSS style for the navigation sidebar of the documentation was originally
 submitted by Óscar Nájera for the scikit-learn project. The scikit-learn project
@@ -248,6 +247,11 @@ docs/js/vendor/anchor.min.js
 docs/js/vendor/jquery*
 docs/js/vendor/modernizer*

+ISC License
+---
+
+core/src/main/resources/org/apache/spark/ui/static/d3.min.js
+
 Creative Commons CC0 1.0 Universal Public Domain Dedication
 ---
diff --git a/LICENSE-binary b/LICENSE-binary
index 05645977a0b..900b6461106 100644
--- a/LICENSE-binary
+++ b/LICENSE-binary
@@ -461,7 +461,6 @@ org.jdom:jdom2
 python/lib/py4j-*-src.zip
 python/pyspark/cloudpickle.py
 python/pyspark/join.py
-core/src/main/resources/org/apache/spark/ui/static/d3.min.js

 The CSS style for the navigation sidebar of the documentation was originally
 submitted by Óscar Nájera for the scikit-learn project. The scikit-learn project
@@ -498,6 +497,11 @@ docs/js/vendor/anchor.min.js
 docs/js/vendor/jquery*
 docs/js/vendor/modernizer*

+ISC License
+---
+
+core/src/main/resources/org/apache/spark/ui/static/d3.min.js
+
 Common Development and Distribution License (CDDL) 1.0
 --
diff --git a/core/src/main/resources/org/apache/spark/ui/static/d3.min.js b/core/src/main/resources/org/apache/spark/ui/static/d3.min.js
index 30cd292198b..8d56002d90f 100644
---
svn commit: r63956 - /dev/spark/v3.5.0-rc5-docs/
Author: liyuanjian
Date: Tue Sep 12 20:53:39 2023
New Revision: 63956

Log:
Remove RC artifacts

Removed:
    dev/spark/v3.5.0-rc5-docs/
[spark] branch branch-3.5 updated: [SPARK-44872][CONNECT] Server testing infra and ReattachableExecuteSuite
This is an automated email from the ASF dual-hosted git repository.

hvanhovell pushed a commit to branch branch-3.5
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/branch-3.5 by this push:
     new af8c0b999be [SPARK-44872][CONNECT] Server testing infra and ReattachableExecuteSuite
af8c0b999be is described below

commit af8c0b999be746b661efe2439ac015a0c7d12c00
Author: Juliusz Sompolski
AuthorDate: Tue Sep 12 16:48:26 2023 +0200

    [SPARK-44872][CONNECT] Server testing infra and ReattachableExecuteSuite

    ### What changes were proposed in this pull request?

    Add `SparkConnectServerTest` with infra to test a real server with a real client in the same process, but communicating over RPC. Add `ReattachableExecuteSuite` with some tests for reattachable execute.

    Two bugs were found by the tests:
    * Fix a bug in `SparkConnectExecutionManager.createExecuteHolder` when attempting to resubmit an operation that was deemed abandoned. This bug is benign in reattachable execute, because reattachable execute would first send a ReattachExecute, which would be handled correctly in SparkConnectReattachExecuteHandler. For non-reattachable execute (disabled or old client), this is also a very unlikely scenario, because the retrying mechanism should be able to resubmit before the query is decl [...]
    * In `ExecuteGrpcResponseSender` there was an assertion that assumed that if `sendResponse` did not send, it was because the deadline was reached. But it can also be because of an interrupt. This would have resulted in an interrupt returning an assertion error instead of CURSOR_DISCONNECTED in testing. Outside of testing, assertions are not enabled, so this was not a problem there.

    ### Why are the changes needed?

    Testing of reattachable execute.

    ### Does this PR introduce _any_ user-facing change?

    No.

    ### How was this patch tested?

    Tests added.

    Closes #42560 from juliuszsompolski/sc-reattachable-tests.
Authored-by: Juliusz Sompolski
Signed-off-by: Herman van Hovell
(cherry picked from commit 4b96add471d292ed5c63ccc625489ff78cfb9b25)
Signed-off-by: Herman van Hovell
---
 .../sql/connect/client/CloseableIterator.scala     |  22 +-
 .../client/CustomSparkConnectBlockingStub.scala    |   2 +-
 .../ExecutePlanResponseReattachableIterator.scala  |  18 +-
 .../connect/client/GrpcExceptionConverter.scala    |   5 +-
 .../sql/connect/client/GrpcRetryHandler.scala      |   4 +-
 .../execution/ExecuteGrpcResponseSender.scala      |  17 +-
 .../execution/ExecuteResponseObserver.scala        |   8 +-
 .../spark/sql/connect/service/ExecuteHolder.scala  |  10 +
 .../service/SparkConnectExecutionManager.scala     |  40 ++-
 .../spark/sql/connect/SparkConnectServerTest.scala | 261 +++
 .../execution/ReattachableExecuteSuite.scala       | 352 +
 .../scala/org/apache/spark/SparkFunSuite.scala     |  24 ++
 12 files changed, 735 insertions(+), 28 deletions(-)

diff --git a/connector/connect/common/src/main/scala/org/apache/spark/sql/connect/client/CloseableIterator.scala b/connector/connect/common/src/main/scala/org/apache/spark/sql/connect/client/CloseableIterator.scala
index 891e50ed6e7..d3fc9963edc 100644
--- a/connector/connect/common/src/main/scala/org/apache/spark/sql/connect/client/CloseableIterator.scala
+++ b/connector/connect/common/src/main/scala/org/apache/spark/sql/connect/client/CloseableIterator.scala
@@ -27,6 +27,20 @@ private[sql] trait CloseableIterator[E] extends Iterator[E] with AutoCloseable {
   }
 }

+private[sql] abstract class WrappedCloseableIterator[E] extends CloseableIterator[E] {
+
+  def innerIterator: Iterator[E]
+
+  override def next(): E = innerIterator.next()
+
+  override def hasNext(): Boolean = innerIterator.hasNext
+
+  override def close(): Unit = innerIterator match {
+    case it: CloseableIterator[E] => it.close()
+    case _ => // nothing
+  }
+}
+
 private[sql] object CloseableIterator {

   /**
@@ -35,12 +49,8 @@ private[sql] object CloseableIterator {
   def apply[T](iterator: Iterator[T]): CloseableIterator[T] = iterator match {
     case closeable: CloseableIterator[T] => closeable
     case _ =>
-      new CloseableIterator[T] {
-        override def next(): T = iterator.next()
-
-        override def hasNext(): Boolean = iterator.hasNext
-
-        override def close() = { /* empty */ }
+      new WrappedCloseableIterator[T] {
+        override def innerIterator = iterator
       }
   }
 }
diff --git a/connector/connect/common/src/main/scala/org/apache/spark/sql/connect/client/CustomSparkConnectBlockingStub.scala b/connector/connect/common/src/main/scala/org/apache/spark/sql/connect/client/CustomSparkConnectBlockingStub.scala
index 73ff01e223f..80edcfa8be1 100644
---
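The `WrappedCloseableIterator` delegation pattern in the diff above can be sketched in Python. This is an illustrative analogy only (hypothetical names, not Spark Connect's actual client code): iteration is forwarded to an inner iterator, and `close()` is forwarded only when the inner iterator is itself closeable.

```python
class CloseableIterator:
    """An iterator that also exposes close(), like Scala's CloseableIterator."""

    def __init__(self, inner):
        self._inner = iter(inner)

    def __iter__(self):
        return self

    def __next__(self):
        # Delegate iteration to the wrapped iterator.
        return next(self._inner)

    def close(self):
        # Forward close() only if the inner iterator is itself closeable,
        # mirroring the Scala pattern match in WrappedCloseableIterator.
        if isinstance(self._inner, CloseableIterator):
            self._inner.close()


def as_closeable(iterator):
    """Wrap a plain iterator; return already-closeable iterators unchanged."""
    if isinstance(iterator, CloseableIterator):
        return iterator
    return CloseableIterator(iterator)
```

The factory mirrors `CloseableIterator.apply`: callers always get something they can safely `close()`, whether or not the underlying iterator owns resources.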
[spark] branch branch-3.5 updated: [SPARK-45117][SQL] Implement missing otherCopyArgs for the MultiCommutativeOp expression
This is an automated email from the ASF dual-hosted git repository.

wenchen pushed a commit to branch branch-3.5
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/branch-3.5 by this push:
     new 4e44d929005 [SPARK-45117][SQL] Implement missing otherCopyArgs for the MultiCommutativeOp expression
4e44d929005 is described below

commit 4e44d929005ac457fc853b256c02fd93f35fcceb
Author: Supun Nakandala
AuthorDate: Tue Sep 12 23:52:22 2023 +0800

    [SPARK-45117][SQL] Implement missing otherCopyArgs for the MultiCommutativeOp expression

    ### What changes were proposed in this pull request?

    - This PR implements the missing otherCopyArgs in the MultiCommutativeOp expression.

    ### Why are the changes needed?

    - Without this method implementation, calling toJSON will throw an exception from the TreeNode::jsonFields method.
    - This is because the jsonFields method has an assertion that the number of fields defined in the constructor is equal to the number of field values (productIterator.toSeq ++ otherCopyArgs).
    - The originalRoot field of the MultiCommutativeOp is not part of the productIterator. Hence, it has to be explicitly set in the otherCopyArgs field.

    ### Does this PR introduce _any_ user-facing change?

    - No

    ### How was this patch tested?

    - Added unit test

    ### Was this patch authored or co-authored using generative AI tooling?

    - No

    Closes #42873 from db-scnakandala/multi-commutative-op.
Authored-by: Supun Nakandala
Signed-off-by: Wenchen Fan
(cherry picked from commit d999f622dc68b4fb2734e2ac7cbe203b062c257f)
Signed-off-by: Wenchen Fan
---
 .../apache/spark/sql/catalyst/expressions/Expression.scala |  2 ++
 .../spark/sql/catalyst/expressions/CanonicalizeSuite.scala | 13 +
 2 files changed, 15 insertions(+)

diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Expression.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Expression.scala
index c2330cdb59d..bd7369e57b0 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Expression.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Expression.scala
@@ -1410,4 +1410,6 @@ case class MultiCommutativeOp(
   override protected def withNewChildrenInternal(newChildren: IndexedSeq[Expression]): Expression =
     this.copy(operands = newChildren)(originalRoot)
+
+  override protected final def otherCopyArgs: Seq[AnyRef] = originalRoot :: Nil
 }
diff --git a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/CanonicalizeSuite.scala b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/CanonicalizeSuite.scala
index 0e22b0d2876..89175ea1970 100644
--- a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/CanonicalizeSuite.scala
+++ b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/CanonicalizeSuite.scala
@@ -338,4 +338,17 @@ class CanonicalizeSuite extends SparkFunSuite {
     SQLConf.get.setConfString(MULTI_COMMUTATIVE_OP_OPT_THRESHOLD.key, default.toString)
   }
+
+  test("toJSON works properly with MultiCommutativeOp") {
+    val default = SQLConf.get.getConf(MULTI_COMMUTATIVE_OP_OPT_THRESHOLD)
+    SQLConf.get.setConfString(MULTI_COMMUTATIVE_OP_OPT_THRESHOLD.key, "1")
+
+    val d = Decimal(1.2)
+    val literal1 = Literal.create(d, DecimalType(2, 1))
+    val literal2 = Literal.create(d, DecimalType(2, 1))
+    val literal3 = Literal.create(d, DecimalType(3, 2))
+    val op = Add(literal1, Add(literal2, literal3))
+    assert(op.canonicalized.toJSON.nonEmpty)
+    SQLConf.get.setConfString(MULTI_COMMUTATIVE_OP_OPT_THRESHOLD.key, default.toString)
+  }
 }
[spark] branch master updated: [SPARK-45117][SQL] Implement missing otherCopyArgs for the MultiCommutativeOp expression
This is an automated email from the ASF dual-hosted git repository.

wenchen pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new d999f622dc6 [SPARK-45117][SQL] Implement missing otherCopyArgs for the MultiCommutativeOp expression
d999f622dc6 is described below

commit d999f622dc68b4fb2734e2ac7cbe203b062c257f
Author: Supun Nakandala
AuthorDate: Tue Sep 12 23:52:22 2023 +0800

    [SPARK-45117][SQL] Implement missing otherCopyArgs for the MultiCommutativeOp expression

    ### What changes were proposed in this pull request?

    - This PR implements the missing otherCopyArgs in the MultiCommutativeOp expression.

    ### Why are the changes needed?

    - Without this method implementation, calling toJSON will throw an exception from the TreeNode::jsonFields method.
    - This is because the jsonFields method has an assertion that the number of fields defined in the constructor is equal to the number of field values (productIterator.toSeq ++ otherCopyArgs).
    - The originalRoot field of the MultiCommutativeOp is not part of the productIterator. Hence, it has to be explicitly set in the otherCopyArgs field.

    ### Does this PR introduce _any_ user-facing change?

    - No

    ### How was this patch tested?

    - Added unit test

    ### Was this patch authored or co-authored using generative AI tooling?

    - No

    Closes #42873 from db-scnakandala/multi-commutative-op.
Authored-by: Supun Nakandala
Signed-off-by: Wenchen Fan
---
 .../apache/spark/sql/catalyst/expressions/Expression.scala |  2 ++
 .../spark/sql/catalyst/expressions/CanonicalizeSuite.scala | 13 +
 2 files changed, 15 insertions(+)

diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Expression.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Expression.scala
index c2330cdb59d..bd7369e57b0 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Expression.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Expression.scala
@@ -1410,4 +1410,6 @@ case class MultiCommutativeOp(
   override protected def withNewChildrenInternal(newChildren: IndexedSeq[Expression]): Expression =
     this.copy(operands = newChildren)(originalRoot)
+
+  override protected final def otherCopyArgs: Seq[AnyRef] = originalRoot :: Nil
 }
diff --git a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/CanonicalizeSuite.scala b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/CanonicalizeSuite.scala
index 0e22b0d2876..89175ea1970 100644
--- a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/CanonicalizeSuite.scala
+++ b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/CanonicalizeSuite.scala
@@ -338,4 +338,17 @@ class CanonicalizeSuite extends SparkFunSuite {
     SQLConf.get.setConfString(MULTI_COMMUTATIVE_OP_OPT_THRESHOLD.key, default.toString)
   }
+
+  test("toJSON works properly with MultiCommutativeOp") {
+    val default = SQLConf.get.getConf(MULTI_COMMUTATIVE_OP_OPT_THRESHOLD)
+    SQLConf.get.setConfString(MULTI_COMMUTATIVE_OP_OPT_THRESHOLD.key, "1")
+
+    val d = Decimal(1.2)
+    val literal1 = Literal.create(d, DecimalType(2, 1))
+    val literal2 = Literal.create(d, DecimalType(2, 1))
+    val literal3 = Literal.create(d, DecimalType(3, 2))
+    val op = Add(literal1, Add(literal2, literal3))
+    assert(op.canonicalized.toJSON.nonEmpty)
+    SQLConf.get.setConfString(MULTI_COMMUTATIVE_OP_OPT_THRESHOLD.key, default.toString)
+  }
 }
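The invariant behind this fix can be illustrated with a small Python sketch. This is a loose analogy to Catalyst's `TreeNode` with hypothetical names — the real code is Scala — but it shows why a curried constructor argument must be surfaced through `otherCopyArgs`:

```python
class MultiCommutativeOpSketch:
    """Scala shape: case class MultiCommutativeOp(operands)(originalRoot).

    The curried parameter originalRoot is NOT part of productIterator,
    so jsonFields' arity assertion fails unless otherCopyArgs exposes it.
    """

    CONSTRUCTOR_ARITY = 2  # operands + originalRoot

    def __init__(self, operands, original_root):
        self.operands = operands
        self.original_root = original_root

    def product_iterator(self):
        # Curried (second-parameter-list) args are excluded, as in Scala.
        return [self.operands]

    def other_copy_args(self):
        # The fix: explicitly expose the curried argument here.
        return [self.original_root]

    def json_fields(self):
        values = self.product_iterator() + self.other_copy_args()
        # Analogue of TreeNode::jsonFields' assertion: the number of
        # constructor parameters must equal the number of field values.
        assert self.CONSTRUCTOR_ARITY == len(values), "toJSON would throw"
        return values


op = MultiCommutativeOpSketch(operands=["a", "b"], original_root="a + b")
print(len(op.json_fields()))  # 2
```

With `other_copy_args` returning an empty list (the pre-fix behavior), the assertion trips — the sketch of the exception the commit message describes for `toJSON`.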
[spark] branch branch-3.5 updated: Revert "[SPARK-45110][BUILD] Upgrade rocksdbjni to 8.5.3"
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch branch-3.5
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/branch-3.5 by this push:
     new 6a2284feaac Revert "[SPARK-45110][BUILD] Upgrade rocksdbjni to 8.5.3"
6a2284feaac is described below

commit 6a2284feaac4f632d645a93361d29e693eeb9d32
Author: Dongjoon Hyun
AuthorDate: Tue Sep 12 08:49:40 2023 -0700

    Revert "[SPARK-45110][BUILD] Upgrade rocksdbjni to 8.5.3"

    This reverts commit 6a2aa1d48c304095dcdf2816a46ec1f5a8af41a2.
---
 dev/deps/spark-deps-hadoop-3-hive-2.3              |   2 +-
 pom.xml                                            |   2 +-
 ...StoreBasicOperationsBenchmark-jdk11-results.txt | 120 ++---
 ...StoreBasicOperationsBenchmark-jdk17-results.txt | 120 ++---
 .../StateStoreBasicOperationsBenchmark-results.txt | 120 ++---
 5 files changed, 182 insertions(+), 182 deletions(-)

diff --git a/dev/deps/spark-deps-hadoop-3-hive-2.3 b/dev/deps/spark-deps-hadoop-3-hive-2.3
index 3d3f710e74c..1d02f8dba56 100644
--- a/dev/deps/spark-deps-hadoop-3-hive-2.3
+++ b/dev/deps/spark-deps-hadoop-3-hive-2.3
@@ -227,7 +227,7 @@ parquet-jackson/1.13.1//parquet-jackson-1.13.1.jar
 pickle/1.3//pickle-1.3.jar
 py4j/0.10.9.7//py4j-0.10.9.7.jar
 remotetea-oncrpc/1.1.2//remotetea-oncrpc-1.1.2.jar
-rocksdbjni/8.5.3//rocksdbjni-8.5.3.jar
+rocksdbjni/8.3.2//rocksdbjni-8.3.2.jar
 scala-collection-compat_2.12/2.7.0//scala-collection-compat_2.12-2.7.0.jar
 scala-compiler/2.12.18//scala-compiler-2.12.18.jar
 scala-library/2.12.18//scala-library-2.12.18.jar
diff --git a/pom.xml b/pom.xml
index 70e1ee71568..8fc4b89a78c 100644
--- a/pom.xml
+++ b/pom.xml
@@ -679,7 +679,7 @@
       org.rocksdb
       rocksdbjni
-      8.5.3
+      8.3.2
       ${leveldbjni.group}
diff --git a/sql/core/benchmarks/StateStoreBasicOperationsBenchmark-jdk11-results.txt b/sql/core/benchmarks/StateStoreBasicOperationsBenchmark-jdk11-results.txt
index 70e9849572c..d5c175a320d 100644
--- a/sql/core/benchmarks/StateStoreBasicOperationsBenchmark-jdk11-results.txt
+++ b/sql/core/benchmarks/StateStoreBasicOperationsBenchmark-jdk11-results.txt
@@ -2,110 +2,110 @@ put rows
-OpenJDK 64-Bit Server VM 11.0.20.1+1 on Linux 5.15.0-1045-azure
-Intel(R) Xeon(R) Platinum 8370C CPU @ 2.80GHz
+OpenJDK 64-Bit Server VM 11.0.19+7 on Linux 5.15.0-1040-azure
+Intel(R) Xeon(R) Platinum 8272CL CPU @ 2.60GHz
 putting 1 rows (1 rows to overwrite - rate 100):  Best Time(ms)  Avg Time(ms)  Stdev(ms)  Rate(M/s)  Per Row(ns)  Relative
 -------------------------------------------------------------------------------------------------------------------------
-In-memory                                  8   9  1  1.3   770.7  1.0X
-RocksDB (trackTotalNumberOfRows: true)    62  63  1  0.2  6174.3  0.1X
-RocksDB (trackTotalNumberOfRows: false)   22  23  1  0.5  2220.7  0.3X
+In-memory                                  9  11  2  1.1   872.7  1.0X
+RocksDB (trackTotalNumberOfRows: true)    61  63  1  0.2  6148.5  0.1X
+RocksDB (trackTotalNumberOfRows: false)   21  22  0  0.5  2108.9  0.4X

-OpenJDK 64-Bit Server VM 11.0.20.1+1 on Linux 5.15.0-1045-azure
-Intel(R) Xeon(R) Platinum 8370C CPU @ 2.80GHz
+OpenJDK 64-Bit Server VM 11.0.19+7 on Linux 5.15.0-1040-azure
+Intel(R) Xeon(R) Platinum 8272CL CPU @ 2.60GHz
 putting 1 rows (5000 rows to overwrite - rate 50):  Best Time(ms)  Avg Time(ms)  Stdev(ms)  Rate(M/s)  Per Row(ns)  Relative
 ---------------------------------------------------------------------------------------------------------------------------
-In-memory                                  8   9  1  1.3   781.2  1.0X
-RocksDB (trackTotalNumberOfRows: true)    52  53  1  0.2  5196.0  0.2X
-RocksDB (trackTotalNumberOfRows: false)   22  24  1  0.4  2230.3  0.4X
+In-memory                                  9  10  1  1.1   872.0  1.0X
+RocksDB (trackTotalNumberOfRows: true)    51  53  1  0.2  5134.7  0.2X
+RocksDB (trackTotalNumberOfRows: false)   21
[spark] branch master updated: Revert "[SPARK-45110][BUILD] Upgrade rocksdbjni to 8.5.3"
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new f33c2051c01 Revert "[SPARK-45110][BUILD] Upgrade rocksdbjni to 8.5.3"
f33c2051c01 is described below

commit f33c2051c01eb09f6bbb602ed3c7c637e1c6f421
Author: Dongjoon Hyun
AuthorDate: Tue Sep 12 08:48:59 2023 -0700

    Revert "[SPARK-45110][BUILD] Upgrade rocksdbjni to 8.5.3"

    This reverts commit fa2bc21ba1e6cbde31f33faa681f5a1c47219c69.
---
 dev/deps/spark-deps-hadoop-3-hive-2.3              |   2 +-
 pom.xml                                            |   2 +-
 ...StoreBasicOperationsBenchmark-jdk11-results.txt | 120 ++---
 ...StoreBasicOperationsBenchmark-jdk17-results.txt | 120 ++---
 .../StateStoreBasicOperationsBenchmark-results.txt | 120 ++---
 5 files changed, 182 insertions(+), 182 deletions(-)

diff --git a/dev/deps/spark-deps-hadoop-3-hive-2.3 b/dev/deps/spark-deps-hadoop-3-hive-2.3
index 91cad456a21..652127a9bb8 100644
--- a/dev/deps/spark-deps-hadoop-3-hive-2.3
+++ b/dev/deps/spark-deps-hadoop-3-hive-2.3
@@ -225,7 +225,7 @@ parquet-jackson/1.13.1//parquet-jackson-1.13.1.jar
 pickle/1.3//pickle-1.3.jar
 py4j/0.10.9.7//py4j-0.10.9.7.jar
 remotetea-oncrpc/1.1.2//remotetea-oncrpc-1.1.2.jar
-rocksdbjni/8.5.3//rocksdbjni-8.5.3.jar
+rocksdbjni/8.3.2//rocksdbjni-8.3.2.jar
 scala-collection-compat_2.12/2.7.0//scala-collection-compat_2.12-2.7.0.jar
 scala-compiler/2.12.18//scala-compiler-2.12.18.jar
 scala-library/2.12.18//scala-library-2.12.18.jar
diff --git a/pom.xml b/pom.xml
index c632186f6e5..02920c0ae74 100644
--- a/pom.xml
+++ b/pom.xml
@@ -681,7 +681,7 @@
       org.rocksdb
       rocksdbjni
-      8.5.3
+      8.3.2
       ${leveldbjni.group}
diff --git a/sql/core/benchmarks/StateStoreBasicOperationsBenchmark-jdk11-results.txt b/sql/core/benchmarks/StateStoreBasicOperationsBenchmark-jdk11-results.txt
index 70e9849572c..d5c175a320d 100644
--- a/sql/core/benchmarks/StateStoreBasicOperationsBenchmark-jdk11-results.txt
+++ b/sql/core/benchmarks/StateStoreBasicOperationsBenchmark-jdk11-results.txt
@@ -2,110 +2,110 @@ put rows
-OpenJDK 64-Bit Server VM 11.0.20.1+1 on Linux 5.15.0-1045-azure
-Intel(R) Xeon(R) Platinum 8370C CPU @ 2.80GHz
+OpenJDK 64-Bit Server VM 11.0.19+7 on Linux 5.15.0-1040-azure
+Intel(R) Xeon(R) Platinum 8272CL CPU @ 2.60GHz
 putting 1 rows (1 rows to overwrite - rate 100):  Best Time(ms)  Avg Time(ms)  Stdev(ms)  Rate(M/s)  Per Row(ns)  Relative
 -------------------------------------------------------------------------------------------------------------------------
-In-memory                                  8   9  1  1.3   770.7  1.0X
-RocksDB (trackTotalNumberOfRows: true)    62  63  1  0.2  6174.3  0.1X
-RocksDB (trackTotalNumberOfRows: false)   22  23  1  0.5  2220.7  0.3X
+In-memory                                  9  11  2  1.1   872.7  1.0X
+RocksDB (trackTotalNumberOfRows: true)    61  63  1  0.2  6148.5  0.1X
+RocksDB (trackTotalNumberOfRows: false)   21  22  0  0.5  2108.9  0.4X

-OpenJDK 64-Bit Server VM 11.0.20.1+1 on Linux 5.15.0-1045-azure
-Intel(R) Xeon(R) Platinum 8370C CPU @ 2.80GHz
+OpenJDK 64-Bit Server VM 11.0.19+7 on Linux 5.15.0-1040-azure
+Intel(R) Xeon(R) Platinum 8272CL CPU @ 2.60GHz
 putting 1 rows (5000 rows to overwrite - rate 50):  Best Time(ms)  Avg Time(ms)  Stdev(ms)  Rate(M/s)  Per Row(ns)  Relative
 ---------------------------------------------------------------------------------------------------------------------------
-In-memory                                  8   9  1  1.3   781.2  1.0X
-RocksDB (trackTotalNumberOfRows: true)    52  53  1  0.2  5196.0  0.2X
-RocksDB (trackTotalNumberOfRows: false)   22  24  1  0.4  2230.3  0.4X
+In-memory                                  9  10  1  1.1   872.0  1.0X
+RocksDB (trackTotalNumberOfRows: true)    51  53  1  0.2  5134.7  0.2X
+RocksDB (trackTotalNumberOfRows: false)   21
[spark] branch master updated (d8298bffd91 -> 4b96add471d)
This is an automated email from the ASF dual-hosted git repository.

hvanhovell pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git

from d8298bffd91 [SPARK-45081][SQL] Encoders.bean does no longer work with read-only properties
add 4b96add471d [SPARK-44872][CONNECT] Server testing infra and ReattachableExecuteSuite

No new revisions were added by this update.

Summary of changes:
 .../sql/connect/client/CloseableIterator.scala | 22 +-
 .../client/CustomSparkConnectBlockingStub.scala| 2 +-
 .../ExecutePlanResponseReattachableIterator.scala | 18 +-
 .../connect/client/GrpcExceptionConverter.scala| 5 +-
 .../sql/connect/client/GrpcRetryHandler.scala | 4 +-
 .../execution/ExecuteGrpcResponseSender.scala | 17 +-
 .../execution/ExecuteResponseObserver.scala| 8 +-
 .../spark/sql/connect/service/ExecuteHolder.scala | 10 +
 .../service/SparkConnectExecutionManager.scala | 40 ++-
 .../spark/sql/connect/SparkConnectServerTest.scala | 261 +++
 .../execution/ReattachableExecuteSuite.scala | 352 +
 .../scala/org/apache/spark/SparkFunSuite.scala | 24 ++
 12 files changed, 735 insertions(+), 28 deletions(-)
 create mode 100644 connector/connect/server/src/test/scala/org/apache/spark/sql/connect/SparkConnectServerTest.scala
 create mode 100644 connector/connect/server/src/test/scala/org/apache/spark/sql/connect/execution/ReattachableExecuteSuite.scala
[spark] branch branch-3.5 updated: [SPARK-45081][SQL] Encoders.bean does no longer work with read-only properties
This is an automated email from the ASF dual-hosted git repository.

hvanhovell pushed a commit to branch branch-3.5 in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/branch-3.5 by this push:
     new ffa4127c774 [SPARK-45081][SQL] Encoders.bean does no longer work with read-only properties
ffa4127c774 is described below

commit ffa4127c774ea13b4d6bbcc82bc5a9bee23d7156
Author: Giambattista Bloisi
AuthorDate: Tue Sep 12 16:16:04 2023 +0200

    [SPARK-45081][SQL] Encoders.bean does no longer work with read-only properties

    ### What changes were proposed in this pull request?
    This PR re-enables Encoders.bean to be called against beans having read-only properties, that is, properties that have only getters and no setter method. Beans with read-only properties are even used in internal tests.

    Setter methods of a Java bean encoder are stored within an Option wrapper because they are missing in the case of read-only properties. When a Java bean has to be initialized, setter methods for the bean properties have to be called: this PR filters read-only properties out of that process.

    ### Why are the changes needed?
    The changes are required to avoid an exception being thrown when getting the value of a None Option object.

    ### Does this PR introduce _any_ user-facing change?
    No

    ### How was this patch tested?
    An additional regression test has been added

    ### Was this patch authored or co-authored using generative AI tooling?
    No

    Closes #42829 from gbloisi-openaire/SPARK-45081.
Authored-by: Giambattista Bloisi Signed-off-by: Herman van Hovell (cherry picked from commit d8298bffd91de01299f9456b37e4454e8b4a6ae8) Signed-off-by: Herman van Hovell --- .../sql/connect/client/arrow/ArrowDeserializer.scala | 20 +++- .../spark/sql/catalyst/DeserializerBuildHelper.scala | 4 +++- .../test/org/apache/spark/sql/JavaDatasetSuite.java | 17 + 3 files changed, 31 insertions(+), 10 deletions(-) diff --git a/connector/connect/common/src/main/scala/org/apache/spark/sql/connect/client/arrow/ArrowDeserializer.scala b/connector/connect/common/src/main/scala/org/apache/spark/sql/connect/client/arrow/ArrowDeserializer.scala index cd54966ccf5..94295785987 100644 --- a/connector/connect/common/src/main/scala/org/apache/spark/sql/connect/client/arrow/ArrowDeserializer.scala +++ b/connector/connect/common/src/main/scala/org/apache/spark/sql/connect/client/arrow/ArrowDeserializer.scala @@ -332,15 +332,17 @@ object ArrowDeserializers { val constructor = methodLookup.findConstructor(tag.runtimeClass, MethodType.methodType(classOf[Unit])) val lookup = createFieldLookup(vectors) -val setters = fields.map { field => - val vector = lookup(field.name) - val deserializer = deserializerFor(field.enc, vector, timeZoneId) - val setter = methodLookup.findVirtual( -tag.runtimeClass, -field.writeMethod.get, -MethodType.methodType(classOf[Unit], field.enc.clsTag.runtimeClass)) - (bean: Any, i: Int) => setter.invoke(bean, deserializer.get(i)) -} +val setters = fields + .filter(_.writeMethod.isDefined) + .map { field => +val vector = lookup(field.name) +val deserializer = deserializerFor(field.enc, vector, timeZoneId) +val setter = methodLookup.findVirtual( + tag.runtimeClass, + field.writeMethod.get, + MethodType.methodType(classOf[Unit], field.enc.clsTag.runtimeClass)) +(bean: Any, i: Int) => setter.invoke(bean, deserializer.get(i)) + } new StructFieldSerializer[Any](struct) { def value(i: Int): Any = { val instance = constructor.invoke() diff --git 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/DeserializerBuildHelper.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/DeserializerBuildHelper.scala index 16a7d7ff065..0b88d5a4130 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/DeserializerBuildHelper.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/DeserializerBuildHelper.scala @@ -390,7 +390,9 @@ object DeserializerBuildHelper { CreateExternalRow(convertedFields, enc.schema)) case JavaBeanEncoder(tag, fields) => - val setters = fields.map { f => + val setters = fields +.filter(_.writeMethod.isDefined) +.map { f => val newTypePath = walkedTypePath.recordField( f.enc.clsTag.runtimeClass.getName, f.name) diff --git a/sql/core/src/test/java/test/org/apache/spark/sql/JavaDatasetSuite.java b/sql/core/src/test/java/test/org/apache/spark/sql/JavaDatasetSuite.java index 4f7cf8da787..f416d411322 100644 ---
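The failure mode fixed above is easy to reproduce outside Spark: `java.beans.Introspector` reports a `null` write method for a getter-only property, which is why the encoder stores setters in an `Option` and must skip read-only properties when initializing a bean. A minimal sketch (the bean class and property names here are illustrative, not from the Spark code base):

```java
import java.beans.BeanInfo;
import java.beans.IntrospectionException;
import java.beans.Introspector;
import java.beans.PropertyDescriptor;

public class ReadOnlyBeanDemo {
    // A bean with one read-write and one read-only property.
    public static class Item {
        private String name;
        public String getName() { return name; }
        public void setName(String name) { this.name = name; }
        // Read-only: a getter but no setter, so its write method is null.
        public int getVersion() { return 1; }
    }

    public static void main(String[] args) throws IntrospectionException {
        BeanInfo info = Introspector.getBeanInfo(Item.class, Object.class);
        for (PropertyDescriptor pd : info.getPropertyDescriptors()) {
            // Mirrors the fix: only properties with a write method get a setter;
            // calling getWriteMethod() blindly is the pre-fix "None.get" trap.
            boolean writable = pd.getWriteMethod() != null;
            System.out.println(pd.getName() + " writable=" + writable);
        }
    }
}
```

Filtering on a present write method, as the patch does with `.filter(_.writeMethod.isDefined)`, is the natural analogue of the `pd.getWriteMethod() != null` check here.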
[spark] branch master updated: [SPARK-45081][SQL] Encoders.bean does no longer work with read-only properties
This is an automated email from the ASF dual-hosted git repository.

hvanhovell pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new d8298bffd91 [SPARK-45081][SQL] Encoders.bean does no longer work with read-only properties
d8298bffd91 is described below

commit d8298bffd91de01299f9456b37e4454e8b4a6ae8
Author: Giambattista Bloisi
AuthorDate: Tue Sep 12 16:16:04 2023 +0200

    [SPARK-45081][SQL] Encoders.bean does no longer work with read-only properties

    ### What changes were proposed in this pull request?
    This PR re-enables Encoders.bean to be called against beans having read-only properties, that is, properties that have only getters and no setter method. Beans with read-only properties are even used in internal tests.

    Setter methods of a Java bean encoder are stored within an Option wrapper because they are missing in the case of read-only properties. When a Java bean has to be initialized, setter methods for the bean properties have to be called: this PR filters read-only properties out of that process.

    ### Why are the changes needed?
    The changes are required to avoid an exception being thrown when getting the value of a None Option object.

    ### Does this PR introduce _any_ user-facing change?
    No

    ### How was this patch tested?
    An additional regression test has been added

    ### Was this patch authored or co-authored using generative AI tooling?
    No

    Closes #42829 from gbloisi-openaire/SPARK-45081.
Authored-by: Giambattista Bloisi Signed-off-by: Herman van Hovell --- .../sql/connect/client/arrow/ArrowDeserializer.scala | 20 +++- .../spark/sql/catalyst/DeserializerBuildHelper.scala | 4 +++- .../test/org/apache/spark/sql/JavaDatasetSuite.java | 17 + 3 files changed, 31 insertions(+), 10 deletions(-) diff --git a/connector/connect/common/src/main/scala/org/apache/spark/sql/connect/client/arrow/ArrowDeserializer.scala b/connector/connect/common/src/main/scala/org/apache/spark/sql/connect/client/arrow/ArrowDeserializer.scala index cd54966ccf5..94295785987 100644 --- a/connector/connect/common/src/main/scala/org/apache/spark/sql/connect/client/arrow/ArrowDeserializer.scala +++ b/connector/connect/common/src/main/scala/org/apache/spark/sql/connect/client/arrow/ArrowDeserializer.scala @@ -332,15 +332,17 @@ object ArrowDeserializers { val constructor = methodLookup.findConstructor(tag.runtimeClass, MethodType.methodType(classOf[Unit])) val lookup = createFieldLookup(vectors) -val setters = fields.map { field => - val vector = lookup(field.name) - val deserializer = deserializerFor(field.enc, vector, timeZoneId) - val setter = methodLookup.findVirtual( -tag.runtimeClass, -field.writeMethod.get, -MethodType.methodType(classOf[Unit], field.enc.clsTag.runtimeClass)) - (bean: Any, i: Int) => setter.invoke(bean, deserializer.get(i)) -} +val setters = fields + .filter(_.writeMethod.isDefined) + .map { field => +val vector = lookup(field.name) +val deserializer = deserializerFor(field.enc, vector, timeZoneId) +val setter = methodLookup.findVirtual( + tag.runtimeClass, + field.writeMethod.get, + MethodType.methodType(classOf[Unit], field.enc.clsTag.runtimeClass)) +(bean: Any, i: Int) => setter.invoke(bean, deserializer.get(i)) + } new StructFieldSerializer[Any](struct) { def value(i: Int): Any = { val instance = constructor.invoke() diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/DeserializerBuildHelper.scala 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/DeserializerBuildHelper.scala index 16a7d7ff065..0b88d5a4130 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/DeserializerBuildHelper.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/DeserializerBuildHelper.scala @@ -390,7 +390,9 @@ object DeserializerBuildHelper { CreateExternalRow(convertedFields, enc.schema)) case JavaBeanEncoder(tag, fields) => - val setters = fields.map { f => + val setters = fields +.filter(_.writeMethod.isDefined) +.map { f => val newTypePath = walkedTypePath.recordField( f.enc.clsTag.runtimeClass.getName, f.name) diff --git a/sql/core/src/test/java/test/org/apache/spark/sql/JavaDatasetSuite.java b/sql/core/src/test/java/test/org/apache/spark/sql/JavaDatasetSuite.java index 4f7cf8da787..f416d411322 100644 --- a/sql/core/src/test/java/test/org/apache/spark/sql/JavaDatasetSuite.java +++ b/sql/core/src/test/java/test/org/apache/spark/sql/JavaDatasetSuite.java @@ -1783,6
[spark] branch master updated: [MINOR][DOCS] Add errors.rst to .gitignore
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new 2b731565941 [MINOR][DOCS] Add errors.rst to .gitignore
2b731565941 is described below

commit 2b731565941a9aedd52e51a10aabed4115be41cf
Author: panbingkun
AuthorDate: Tue Sep 12 19:01:34 2023 +0900

    [MINOR][DOCS] Add errors.rst to .gitignore

    ### What changes were proposed in this pull request?
    After PR [[SPARK-44945][DOCS][PYTHON] Automate PySpark error class documentation](https://github.com/apache/spark/pull/42658), the `errors.rst` file is automatically generated during the documentation build (screenshot: https://github.com/apache/spark/assets/15246973/f6048e2e-7fc8-4930-9c11-767ccbfa1c68).

    This PR adds the file `python/docs/source/development/errors.rst` to `.gitignore`.

    ### Why are the changes needed?
    - To prevent developers from accidentally committing generated files when working on docs.
    - Based on what we have seen, the file `python/docs/source/user_guide/pandas_on_spark/supported_pandas_api.rst`, which is also generated during the documentation build, has likewise been added to `.gitignore`:
      https://github.com/apache/spark/blob/994b6976b2a5a53323a83e70e0c6195cd74292a1/python/docs/source/conf.py#L26-L41
      https://github.com/apache/spark/blob/994b6976b2a5a53323a83e70e0c6195cd74292a1/.gitignore#L77

    ### Does this PR introduce _any_ user-facing change?
    No.

    ### How was this patch tested?
    Manually test.

    ### Was this patch authored or co-authored using generative AI tooling?
    No.

    Closes #42885 from panbingkun/minor_errors_rst.
Authored-by: panbingkun
Signed-off-by: Hyukjin Kwon
---
 .gitignore | 1 +
 1 file changed, 1 insertion(+)

diff --git a/.gitignore b/.gitignore
index 064b502175b..174f66c6064 100644
--- a/.gitignore
+++ b/.gitignore
@@ -73,6 +73,7 @@ python/.eggs/
 python/coverage.xml
 python/deps
 python/docs/_site/
+python/docs/source/development/errors.rst
 python/docs/source/reference/**/api/
 python/docs/source/user_guide/pandas_on_spark/supported_pandas_api.rst
 python/test_coverage/coverage_data
[spark] branch master updated: [SPARK-44911][SQL] Create hive table with invalid column should return error class
This is an automated email from the ASF dual-hosted git repository.

maxgekk pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new 1e03db36a93 [SPARK-44911][SQL] Create hive table with invalid column should return error class
1e03db36a93 is described below

commit 1e03db36a939aea5b4d55059967ccde96cb29564
Author: ming95 <505306...@qq.com>
AuthorDate: Tue Sep 12 11:55:08 2023 +0300

    [SPARK-44911][SQL] Create hive table with invalid column should return error class

    ### What changes were proposed in this pull request?
    Creating a Hive table with an invalid column name should return an error class. Run the SQL:

    ```
    create table test stored as parquet as select id, date'2018-01-01' + make_dt_interval(0, id) from range(0, 10)
    ```

    Before this change, the error was:

    ```
    org.apache.spark.sql.AnalysisException: Cannot create a table having a column whose name contains commas in Hive metastore. Table: `spark_catalog`.`default`.`test`; Column: DATE '2018-01-01' + make_dt_interval(0, id, 0, 0.00)
    at org.apache.spark.sql.hive.HiveExternalCatalog.$anonfun$verifyDataSchema$4(HiveExternalCatalog.scala:175)
    at org.apache.spark.sql.hive.HiveExternalCatalog.$anonfun$verifyDataSchema$4$adapted(HiveExternalCatalog.scala:171)
    at scala.collection.Iterator.foreach(Iterator.scala:943)
    ```

    After this change:

    ```
    Exception in thread "main" org.apache.spark.sql.AnalysisException: [INVALID_HIVE_COLUMN_NAME] Cannot create the table `spark_catalog`.`default`.`parquet_ds1` having the column `DATE '2018-01-01' + make_dt_interval(0, id, 0, 0`.`00)` whose name contains invalid characters ',' in Hive metastore.
at org.apache.spark.sql.hive.HiveExternalCatalog.$anonfun$verifyDataSchema$4(HiveExternalCatalog.scala:180) at org.apache.spark.sql.hive.HiveExternalCatalog.$anonfun$verifyDataSchema$4$adapted(HiveExternalCatalog.scala:171) at scala.collection.Iterator.foreach(Iterator.scala:943) ``` ### Why are the changes needed? as above ### Does this PR introduce _any_ user-facing change? no ### How was this patch tested? add UT ### Was this patch authored or co-authored using generative AI tooling? no Closes #42609 from ming95/SPARK-44911. Authored-by: ming95 <505306...@qq.com> Signed-off-by: Max Gekk --- .../src/main/resources/error/error-classes.json| 2 +- docs/sql-error-conditions.md | 2 +- .../spark/sql/hive/HiveExternalCatalog.scala | 11 --- .../spark/sql/hive/execution/HiveDDLSuite.scala| 21 .../spark/sql/hive/execution/SQLQuerySuite.scala | 23 +++--- 5 files changed, 47 insertions(+), 12 deletions(-) diff --git a/common/utils/src/main/resources/error/error-classes.json b/common/utils/src/main/resources/error/error-classes.json index 415bdbaf42a..4740ed72f89 100644 --- a/common/utils/src/main/resources/error/error-classes.json +++ b/common/utils/src/main/resources/error/error-classes.json @@ -1587,7 +1587,7 @@ }, "INVALID_HIVE_COLUMN_NAME" : { "message" : [ - "Cannot create the table having the nested column whose name contains invalid characters in Hive metastore." + "Cannot create the table having the column whose name contains invalid characters in Hive metastore." ] }, "INVALID_IDENTIFIER" : { diff --git a/docs/sql-error-conditions.md b/docs/sql-error-conditions.md index 0d54938593c..444c2b7c0d1 100644 --- a/docs/sql-error-conditions.md +++ b/docs/sql-error-conditions.md @@ -971,7 +971,7 @@ For more details see [INVALID_HANDLE](sql-error-conditions-invalid-handle-error- SQLSTATE: none assigned -Cannot create the table `` having the nested column `` whose name contains invalid characters `` in Hive metastore. 
+Cannot create the table `` having the column `` whose name contains invalid characters `` in Hive metastore. ### INVALID_IDENTIFIER diff --git a/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala b/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala index e4325989b70..67292460bbc 100644 --- a/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala +++ b/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala @@ -42,7 +42,7 @@ import org.apache.spark.sql.catalyst.catalog.ExternalCatalogUtils._ import org.apache.spark.sql.catalyst.expressions._ import org.apache.spark.sql.catalyst.types.DataTypeUtils import org.apache.spark.sql.catalyst.util.{CaseInsensitiveMap, CharVarcharUtils} -import org.apache.spark.sql.catalyst.util.TypeUtils.toSQLId +import org.apache.spark.sql.catalyst.util.TypeUtils.{toSQLId,
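The root cause in the report above is that an unaliased expression such as `date'2018-01-01' + make_dt_interval(0, id)` becomes the column name, and the generated name contains commas, which the Hive metastore rejects. The kind of check involved can be sketched as follows (the character set, class, and method names here are illustrative and do not reflect Hive's actual validation logic):

```java
public class ColumnNameCheck {
    // Characters rejected in column names (illustrative subset; the report concerns ',').
    private static final String INVALID_CHARS = ",;";

    /** Returns the index of the first invalid character in the name, or -1 if the name is clean. */
    static int firstInvalidChar(String columnName) {
        for (int i = 0; i < columnName.length(); i++) {
            if (INVALID_CHARS.indexOf(columnName.charAt(i)) >= 0) return i;
        }
        return -1;
    }

    public static void main(String[] args) {
        // The auto-generated column name from the report contains commas...
        String generated = "DATE '2018-01-01' + make_dt_interval(0, id, 0, 0.00)";
        System.out.println(firstInvalidChar(generated) >= 0);   // invalid name detected
        // ...whereas an explicit alias gives a clean name.
        System.out.println(firstInvalidChar("event_date") >= 0); // valid
    }
}
```

In practice the user-side fix is simply to alias the expression in the SELECT (e.g. `... + make_dt_interval(0, id) AS d`) so the stored column name contains no commas; the PR only improves the error reported when that is not done.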
[spark] branch master updated: [SPARK-45092][SQL][UI] Avoid analyzing twice for failed queries
This is an automated email from the ASF dual-hosted git repository.

yao pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new 4d213ff3dea [SPARK-45092][SQL][UI] Avoid analyzing twice for failed queries
4d213ff3dea is described below

commit 4d213ff3dea4d66e5dec7be3b35c5441d9187c30
Author: Kent Yao
AuthorDate: Tue Sep 12 16:35:39 2023 +0800

    [SPARK-45092][SQL][UI] Avoid analyzing twice for failed queries

    ### What changes were proposed in this pull request?
    Following the discussion starting at https://github.com/apache/spark/pull/42481#discussion_r1316776270: for failed queries, we need to avoid calling `SparkPlanInfo.fromSparkPlan`, which triggers another round of analysis. This patch uses `Either[Throwable, () => T]` to pass the throwable conditionally and bypass the plan explain functions on error.

    ### Why are the changes needed?
    Improvements on top of https://github.com/apache/spark/pull/42481.

    ### Does this PR introduce _any_ user-facing change?
    No.

    ### How was this patch tested?
    Existing tests.

    ### Was this patch authored or co-authored using generative AI tooling?
    No.

    Closes #42838 from yaooqinn/SPARK-45092.
Authored-by: Kent Yao Signed-off-by: Kent Yao --- .../spark/sql/execution/QueryExecution.scala | 2 +- .../apache/spark/sql/execution/SQLExecution.scala | 72 ++ .../spark/sql/execution/ui/UISeleniumSuite.scala | 2 +- 3 files changed, 49 insertions(+), 27 deletions(-) diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/QueryExecution.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/QueryExecution.scala index 8ddfde8acf8..b3c97a83970 100644 --- a/sql/core/src/main/scala/org/apache/spark/sql/execution/QueryExecution.scala +++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/QueryExecution.scala @@ -71,7 +71,7 @@ class QueryExecution( // Because we do eager analysis for Dataframe, there will be no execution created after // AnalysisException occurs. So we need to explicitly create a new execution to post // start/end events to notify the listener and UI components. -SQLExecution.withNewExecutionId(this, Some("analyze"))(throw e) +SQLExecution.withNewExecutionIdOnError(this, Some("analyze"))(e) } } diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/SQLExecution.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/SQLExecution.scala index 2a44a016d2d..b96b9c25dda 100644 --- a/sql/core/src/main/scala/org/apache/spark/sql/execution/SQLExecution.scala +++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/SQLExecution.scala @@ -66,9 +66,10 @@ object SQLExecution extends Logging { * Wrap an action that will execute "queryExecution" to track all Spark jobs in the body so that * we can connect them with an execution. 
*/ - def withNewExecutionId[T]( + private def withNewExecutionId0[T]( queryExecution: QueryExecution, - name: Option[String] = None)(body: => T): T = queryExecution.sparkSession.withActive { + name: Option[String] = None)( + body: Either[Throwable, () => T]): T = queryExecution.sparkSession.withActive { val sparkSession = queryExecution.sparkSession val sc = sparkSession.sparkContext val oldExecutionId = sc.getLocalProperty(EXECUTION_ID_KEY) @@ -103,9 +104,6 @@ object SQLExecution extends Logging { redactedStr.substring(0, Math.min(truncateLength, redactedStr.length)) }.getOrElse(callSite.shortForm) - val planDescriptionMode = -ExplainMode.fromString(sparkSession.sessionState.conf.uiExplainMode) - val globalConfigs = sparkSession.sharedState.conf.getAll.toMap val modifiedConfigs = sparkSession.sessionState.conf.getAllConfs .filterNot { case (key, value) => @@ -118,28 +116,39 @@ object SQLExecution extends Logging { withSQLConfPropagated(sparkSession) { var ex: Option[Throwable] = None val startTime = System.nanoTime() +val startEvent = SparkListenerSQLExecutionStart( + executionId = executionId, + rootExecutionId = Some(rootExecutionId), + description = desc, + details = callSite.longForm, + physicalPlanDescription = "", + sparkPlanInfo = SparkPlanInfo.EMPTY, + time = System.currentTimeMillis(), + modifiedConfigs = redactedConfigs, + jobTags = sc.getJobTags() +) try { - val planInfo = try { -SparkPlanInfo.fromSparkPlan(queryExecution.executedPlan) - } catch { -case NonFatal(e) => - logDebug("Failed to generate SparkPlanInfo", e) - // If the queryExecution already failed before this, we are not able
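The shape of the refactoring — carry either a failure or a deferred body, post the start event unconditionally with an empty plan description, and only then either rethrow or run the body — can be sketched in miniature. This is a hedged illustration of the control-flow pattern, not Spark's API; the `ErrorOrBody` type and event strings below are invented for the sketch:

```java
import java.util.function.Supplier;

public class ExecutionSketch {
    /** Minimal Either-like carrier: exactly one of error/body is non-null. */
    static final class ErrorOrBody<T> {
        final RuntimeException error;   // "Left": the failure from analysis
        final Supplier<T> body;         // "Right": the deferred query body
        private ErrorOrBody(RuntimeException e, Supplier<T> b) { error = e; body = b; }
        static <T> ErrorOrBody<T> left(RuntimeException e) { return new ErrorOrBody<>(e, null); }
        static <T> ErrorOrBody<T> right(Supplier<T> b) { return new ErrorOrBody<>(null, b); }
    }

    static <T> T withNewExecutionId(ErrorOrBody<T> action) {
        // Post the start event with an EMPTY plan description up front; on the
        // error path we never touch the (unanalyzable) plan, so the failed
        // query is not analyzed a second time just to describe it.
        System.out.println("SparkListenerSQLExecutionStart(planInfo=EMPTY)");
        try {
            if (action.error != null) throw action.error;  // error path: rethrow, skip explain
            return action.body.get();                      // normal path: run the body
        } finally {
            System.out.println("SparkListenerSQLExecutionEnd");
        }
    }

    public static void main(String[] args) {
        System.out.println(withNewExecutionId(ErrorOrBody.right(() -> 42)));
        try {
            withNewExecutionId(ErrorOrBody.<Object>left(new IllegalStateException("analysis failed")));
        } catch (IllegalStateException e) {
            System.out.println("caught: " + e.getMessage());
        }
    }
}
```

The key property mirrored from the patch: start/end listener events fire on both paths (so the UI sees the failed execution), while plan description work happens on neither the error path here nor before the events.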
[spark] branch master updated: [SPARK-44915][CORE] Validate checksum of remounted PVC's shuffle data before recovery
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new 994b6976b2a [SPARK-44915][CORE] Validate checksum of remounted PVC's shuffle data before recovery
994b6976b2a is described below

commit 994b6976b2a5a53323a83e70e0c6195cd74292a1
Author: Dongjoon Hyun
AuthorDate: Tue Sep 12 01:15:01 2023 -0700

    [SPARK-44915][CORE] Validate checksum of remounted PVC's shuffle data before recovery

    ### What changes were proposed in this pull request?
    This PR aims to validate the checksum of a remounted PVC's shuffle data before recovery.

    ### Why are the changes needed?
    In general, there are many reasons that cause executor terminations, and some of them corrupt data on disk. Since Apache Spark already has checksum files, we can take advantage of them to improve robustness by catching any potential issues with remounted disks before recovering their shuffle data.

    ### Does this PR introduce _any_ user-facing change?
    No.

    ### How was this patch tested?
    Pass the CIs with the newly added test suite.

    ### Was this patch authored or co-authored using generative AI tooling?
    No.

    Closes #42724 from dongjoon-hyun/SPARK-44915.
Lead-authored-by: Dongjoon Hyun Co-authored-by: Dongjoon Hyun Signed-off-by: Dongjoon Hyun --- .../spark/shuffle/ShuffleChecksumUtils.scala | 13 ++ ...ernetesLocalDiskShuffleExecutorComponents.scala | 69 +-- .../spark/shuffle/ShuffleChecksumUtilsSuite.scala | 134 + 3 files changed, 209 insertions(+), 7 deletions(-) diff --git a/core/src/main/scala/org/apache/spark/shuffle/ShuffleChecksumUtils.scala b/core/src/main/scala/org/apache/spark/shuffle/ShuffleChecksumUtils.scala index 75b0efcf5cd..b2a18d75387 100644 --- a/core/src/main/scala/org/apache/spark/shuffle/ShuffleChecksumUtils.scala +++ b/core/src/main/scala/org/apache/spark/shuffle/ShuffleChecksumUtils.scala @@ -22,9 +22,22 @@ import java.util.zip.CheckedInputStream import org.apache.spark.network.shuffle.checksum.ShuffleChecksumHelper import org.apache.spark.network.util.LimitedInputStream +import org.apache.spark.shuffle.IndexShuffleBlockResolver.NOOP_REDUCE_ID +import org.apache.spark.storage.{BlockId, ShuffleChecksumBlockId, ShuffleDataBlockId} object ShuffleChecksumUtils { + /** + * Return checksumFile for shuffle data block ID. Otherwise, null. + */ + def getChecksumFileName(blockId: BlockId, algorithm: String): String = blockId match { +case ShuffleDataBlockId(shuffleId, mapId, _) => + ShuffleChecksumHelper.getChecksumFileName( +ShuffleChecksumBlockId(shuffleId, mapId, NOOP_REDUCE_ID).name, algorithm) +case _ => + null + } + /** * Ensure that the checksum values are consistent with index file and data file. 
*/ diff --git a/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/shuffle/KubernetesLocalDiskShuffleExecutorComponents.scala b/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/shuffle/KubernetesLocalDiskShuffleExecutorComponents.scala index e553a56b7e1..a858db374df 100644 --- a/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/shuffle/KubernetesLocalDiskShuffleExecutorComponents.scala +++ b/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/shuffle/KubernetesLocalDiskShuffleExecutorComponents.scala @@ -20,6 +20,7 @@ package org.apache.spark.shuffle import java.io.File import java.util.Optional +import scala.collection.mutable import scala.reflect.ClassTag import org.apache.commons.io.FileExistsException @@ -27,9 +28,11 @@ import org.apache.commons.io.FileExistsException import org.apache.spark.{SparkConf, SparkEnv} import org.apache.spark.deploy.k8s.Config.KUBERNETES_DRIVER_REUSE_PVC import org.apache.spark.internal.Logging +import org.apache.spark.internal.config.{SHUFFLE_CHECKSUM_ALGORITHM, SHUFFLE_CHECKSUM_ENABLED} +import org.apache.spark.shuffle.ShuffleChecksumUtils.{compareChecksums, getChecksumFileName} import org.apache.spark.shuffle.api.{ShuffleExecutorComponents, ShuffleMapOutputWriter, SingleSpillShuffleMapOutputWriter} import org.apache.spark.shuffle.sort.io.LocalDiskShuffleExecutorComponents -import org.apache.spark.storage.{BlockId, BlockManager, StorageLevel, UnrecognizedBlockId} +import org.apache.spark.storage.{BlockId, BlockManager, ShuffleDataBlockId, StorageLevel, UnrecognizedBlockId} import org.apache.spark.util.Utils class KubernetesLocalDiskShuffleExecutorComponents(sparkConf: SparkConf) @@ -73,7 +76,7 @@ object KubernetesLocalDiskShuffleExecutorComponents extends Logging { */ def recoverDiskStore(conf: SparkConf, bm: BlockManager): Unit = { // Find All files -val files = Utils.getConfiguredLocalDirs(conf) +val
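The underlying idea — recompute a checksum over the on-disk bytes and compare it with the recorded value before trusting the data — is the standard integrity-check pattern. A self-contained sketch using CRC32 (the file layout, names, and use of CRC32 here are illustrative stand-ins, not Spark's actual shuffle checksum format):

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.zip.CRC32;

public class ChecksumCheck {
    /** Compute a CRC32 over the data, standing in for the configured shuffle checksum algorithm. */
    static long checksum(byte[] data) {
        CRC32 crc = new CRC32();
        crc.update(data, 0, data.length);
        return crc.getValue();
    }

    /** Recover the block only if the recorded checksum matches the bytes currently on disk. */
    static boolean safeToRecover(Path dataFile, long recordedChecksum) throws IOException {
        return checksum(Files.readAllBytes(dataFile)) == recordedChecksum;
    }

    public static void main(String[] args) throws IOException {
        Path f = Files.createTempFile("shuffle", ".data");
        Files.write(f, "payload".getBytes());
        long recorded = checksum(Files.readAllBytes(f)); // recorded at shuffle-write time
        System.out.println(safeToRecover(f, recorded));  // true: data intact, safe to recover
        Files.write(f, "corrupted".getBytes());          // simulate corruption on the remounted disk
        System.out.println(safeToRecover(f, recorded));  // false: skip recovery of this block
    }
}
```

Skipping recovery on a mismatch is cheap insurance: the worst case is recomputing a shuffle block that was actually fine, while the failure it prevents is silently serving corrupted shuffle data.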
[spark] tag v3.5.0 created (now ce5ddad9903)
This is an automated email from the ASF dual-hosted git repository.

liyuanjian pushed a change to tag v3.5.0 in repository https://gitbox.apache.org/repos/asf/spark.git

at ce5ddad9903 (commit)

No new revisions were added by this update.
[spark] branch master updated (6565ae47cae -> d8129f837c4)
This is an automated email from the ASF dual-hosted git repository.

maxgekk pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git

from 6565ae47cae [SPARK-43252][SQL] Replace the error class `_LEGACY_ERROR_TEMP_2016` with an internal error
add d8129f837c4 [SPARK-45085][SQL] Merge UNSUPPORTED_TEMP_VIEW_OPERATION into UNSUPPORTED_VIEW_OPERATION and refactor some logic

No new revisions were added by this update.

Summary of changes:
 R/pkg/tests/fulltests/test_sparkSQL.R | 2 +-
 .../src/main/resources/error/error-classes.json| 17 --
 docs/sql-error-conditions.md | 8 ---
 .../spark/sql/catalyst/analysis/Analyzer.scala | 19 +++
 .../sql/catalyst/analysis/v2ResolutionPlans.scala | 4 +-
 .../spark/sql/errors/QueryCompilationErrors.scala | 52 +++--
 .../analyzer-results/change-column.sql.out | 8 +--
 .../sql-tests/results/change-column.sql.out| 8 +--
 .../spark/sql/connector/DataSourceV2SQLSuite.scala | 4 +-
 .../apache/spark/sql/execution/SQLViewSuite.scala | 66 +++---
 .../spark/sql/execution/SQLViewTestSuite.scala | 4 +-
 .../spark/sql/execution/command/DDLSuite.scala | 6 +-
 .../execution/command/TruncateTableSuiteBase.scala | 10 ++--
 .../execution/command/v1/ShowPartitionsSuite.scala | 10 ++--
 .../apache/spark/sql/internal/CatalogSuite.scala | 4 +-
 15 files changed, 80 insertions(+), 142 deletions(-)
[spark] branch master updated (fa2bc21ba1e -> 6565ae47cae)
This is an automated email from the ASF dual-hosted git repository.

maxgekk pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git

from fa2bc21ba1e [SPARK-45110][BUILD] Upgrade rocksdbjni to 8.5.3
add 6565ae47cae [SPARK-43252][SQL] Replace the error class `_LEGACY_ERROR_TEMP_2016` with an internal error

No new revisions were added by this update.

Summary of changes:
 common/utils/src/main/resources/error/error-classes.json| 5 -
 .../org/apache/spark/sql/errors/QueryExecutionErrors.scala | 6 ++
 .../sql/catalyst/expressions/codegen/CodeBlockSuite.scala | 13 -
 3 files changed, 10 insertions(+), 14 deletions(-)
[spark] branch master updated (d7845da6ddf -> fa2bc21ba1e)
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git

from d7845da6ddf [SPARK-45125][INFRA] Remove dev/github_jira_sync.py in favor of ASF jira_options
add fa2bc21ba1e [SPARK-45110][BUILD] Upgrade rocksdbjni to 8.5.3

No new revisions were added by this update.

Summary of changes:
 dev/deps/spark-deps-hadoop-3-hive-2.3 | 2 +-
 pom.xml| 2 +-
 ...StoreBasicOperationsBenchmark-jdk11-results.txt | 120 ++---
 ...StoreBasicOperationsBenchmark-jdk17-results.txt | 120 ++---
 .../StateStoreBasicOperationsBenchmark-results.txt | 120 ++---
 5 files changed, 182 insertions(+), 182 deletions(-)
[spark] branch branch-3.5 updated: [SPARK-45110][BUILD] Upgrade rocksdbjni to 8.5.3
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch branch-3.5 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.5 by this push: new 6a2aa1d48c3 [SPARK-45110][BUILD] Upgrade rocksdbjni to 8.5.3 6a2aa1d48c3 is described below commit 6a2aa1d48c304095dcdf2816a46ec1f5a8af41a2 Author: panbingkun AuthorDate: Tue Sep 12 00:29:38 2023 -0700 [SPARK-45110][BUILD] Upgrade rocksdbjni to 8.5.3 ### What changes were proposed in this pull request? This PR aims to upgrade rocksdbjni from 8.3.2 to 8.5.3. ### Why are the changes needed? 1. The full release notes: - https://github.com/facebook/rocksdb/releases/tag/v8.5.3 - https://github.com/facebook/rocksdb/releases/tag/v8.4.4 - https://github.com/facebook/rocksdb/releases/tag/v8.3.3 2. Bug fixes (screenshot: https://github.com/apache/spark/assets/15246973/879224c3-6f29-40d7-9c07-0f656fa2ff76): - Fix a bug where, if there is an error reading from offset 0 of a file from L1+ and the file is not the first file in the sorted run, data can be lost in compaction and read/scan can return incorrect results. - Fix a bug where an iterator may return an incorrect result for DeleteRange() users if there was an error reading from a file. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? - Pass GA. - Manually test: ``` ./build/mvn clean install -pl core -am -Dtest.exclude.tags=org.apache.spark.tags.ExtendedLevelDBTest -fn ... [INFO] [INFO] Reactor Summary for Spark Project Parent POM 4.0.0-SNAPSHOT: [INFO] [INFO] Spark Project Parent POM ... SUCCESS [ 7.121 s] [INFO] Spark Project Tags . SUCCESS [ 10.181 s] [INFO] Spark Project Local DB . SUCCESS [ 21.153 s] [INFO] Spark Project Common Utils . SUCCESS [ 14.960 s] [INFO] Spark Project Networking ... SUCCESS [01:01 min] [INFO] Spark Project Shuffle Streaming Service SUCCESS [ 16.992 s] [INFO] Spark Project Unsafe ... 
SUCCESS [ 14.967 s] [INFO] Spark Project Launcher . SUCCESS [ 11.737 s] [INFO] Spark Project Core . SUCCESS [38:06 min] [INFO] [INFO] BUILD SUCCESS [INFO] [INFO] Total time: 40:45 min [INFO] Finished at: 2023-09-10T17:25:26+08:00 [INFO] ``` ### Was this patch authored or co-authored using generative AI tooling? No. Closes #42862 from panbingkun/SPARK-45110. Authored-by: panbingkun Signed-off-by: Dongjoon Hyun (cherry picked from commit fa2bc21ba1e6cbde31f33faa681f5a1c47219c69) Signed-off-by: Dongjoon Hyun --- dev/deps/spark-deps-hadoop-3-hive-2.3 | 2 +- pom.xml| 2 +- ...StoreBasicOperationsBenchmark-jdk11-results.txt | 120 ++--- ...StoreBasicOperationsBenchmark-jdk17-results.txt | 120 ++--- .../StateStoreBasicOperationsBenchmark-results.txt | 120 ++--- 5 files changed, 182 insertions(+), 182 deletions(-) diff --git a/dev/deps/spark-deps-hadoop-3-hive-2.3 b/dev/deps/spark-deps-hadoop-3-hive-2.3 index 1d02f8dba56..3d3f710e74c 100644 --- a/dev/deps/spark-deps-hadoop-3-hive-2.3 +++ b/dev/deps/spark-deps-hadoop-3-hive-2.3 @@ -227,7 +227,7 @@ parquet-jackson/1.13.1//parquet-jackson-1.13.1.jar pickle/1.3//pickle-1.3.jar py4j/0.10.9.7//py4j-0.10.9.7.jar remotetea-oncrpc/1.1.2//remotetea-oncrpc-1.1.2.jar -rocksdbjni/8.3.2//rocksdbjni-8.3.2.jar +rocksdbjni/8.5.3//rocksdbjni-8.5.3.jar scala-collection-compat_2.12/2.7.0//scala-collection-compat_2.12-2.7.0.jar scala-compiler/2.12.18//scala-compiler-2.12.18.jar scala-library/2.12.18//scala-library-2.12.18.jar diff --git a/pom.xml b/pom.xml index 8fc4b89a78c..70e1ee71568 100644 --- a/pom.xml +++ b/pom.xml @@ -679,7 +679,7 @@ org.rocksdb rocksdbjni -8.3.2 +8.5.3 ${leveldbjni.group} diff --git a/sql/core/benchmarks/StateStoreBasicOperationsBenchmark-jdk11-results.txt b/sql/core/benchmarks/StateStoreBasicOperationsBenchmark-jdk11-results.txt index d5c175a320d..70e9849572c 100644 --- a/sql/core/benchmarks/StateStoreBasicOperationsBenchmark-jdk11-results.txt +++ 
b/sql/core/benchmarks/StateStoreBasicOperationsBenchmark-jdk11-results.txt @@ -2,110 +2,110 @@ put rows
[spark] branch master updated: [SPARK-45125][INFRA] Remove dev/github_jira_sync.py in favor of ASF jira_options
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new d7845da6ddf [SPARK-45125][INFRA] Remove dev/github_jira_sync.py in favor of ASF jira_options d7845da6ddf is described below commit d7845da6ddf2f838b1d91606b8730d078fea11b4 Author: Kent Yao AuthorDate: Tue Sep 12 00:27:17 2023 -0700 [SPARK-45125][INFRA] Remove dev/github_jira_sync.py in favor of ASF jira_options ### What changes were proposed in this pull request? Since SPARK-44942 and https://issues.apache.org/jira/browse/INFRA-24962, we've enabled jira_options for GitHub and JIRA syncing, and it's been working properly. Thus, this PR removes dev/github_jira_sync.py in favor of ASF jira_options. ### Why are the changes needed? Code cleanup. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Confirmed with INFRA and watched the JIRA sync for several days. ### Was this patch authored or co-authored using generative AI tooling? No. Closes #42882 from yaooqinn/SPARK-45125. 
Authored-by: Kent Yao Signed-off-by: Dongjoon Hyun --- .github/labeler.yml | 4 +- dev/github_jira_sync.py | 202 2 files changed, 1 insertion(+), 205 deletions(-) diff --git a/.github/labeler.yml b/.github/labeler.yml index 4ae831f2131..b252edd8873 100644 --- a/.github/labeler.yml +++ b/.github/labeler.yml @@ -42,12 +42,11 @@ INFRA: - ".asf.yaml" - ".gitattributes" - ".gitignore" - - "dev/github_jira_sync.py" - "dev/merge_spark_pr.py" - "dev/run-tests-jenkins*" BUILD: # Can be supported when a stable release with correct all/any is released - #- any: ['dev/**/*', '!dev/github_jira_sync.py', '!dev/merge_spark_pr.py', '!dev/.rat-excludes'] + #- any: ['dev/**/*', '!dev/merge_spark_pr.py', '!dev/.rat-excludes'] - "dev/**/*" - "build/**/*" - "project/**/*" @@ -58,7 +57,6 @@ BUILD: - "scalastyle-config.xml" # These can be added in the above `any` clause (and the /dev/**/* glob removed) when # `any`/`all` support is released - # - "!dev/github_jira_sync.py" # - "!dev/merge_spark_pr.py" # - "!dev/run-tests-jenkins*" # - "!dev/.rat-excludes" diff --git a/dev/github_jira_sync.py b/dev/github_jira_sync.py deleted file mode 100755 index 45908518d82..000 --- a/dev/github_jira_sync.py +++ /dev/null @@ -1,202 +0,0 @@ -#!/usr/bin/env python3 - -# -# Licensed to the Apache Software Foundation (ASF) under one or more -# contributor license agreements. See the NOTICE file distributed with -# this work for additional information regarding copyright ownership. -# The ASF licenses this file to You under the Apache License, Version 2.0 -# (the "License"); you may not use this file except in compliance with -# the License. You may obtain a copy of the License at -# -#http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 
-# See the License for the specific language governing permissions and -# limitations under the License. -# -# Utility for updating JIRA's with information about GitHub pull requests - -import json -import os -import re -import sys -from urllib.request import urlopen -from urllib.request import Request -from urllib.error import HTTPError - -try: -import jira.client -except ImportError: -print("This tool requires the jira-python library") -print("Install using 'pip3 install jira'") -sys.exit(-1) - -# User facing configs -GITHUB_API_BASE = os.environ.get("GITHUB_API_BASE", "https://api.github.com/repos/apache/spark") -GITHUB_OAUTH_KEY = os.environ.get("GITHUB_OAUTH_KEY") -JIRA_PROJECT_NAME = os.environ.get("JIRA_PROJECT_NAME", "SPARK") -JIRA_API_BASE = os.environ.get("JIRA_API_BASE", "https://issues.apache.org/jira") -JIRA_USERNAME = os.environ.get("JIRA_USERNAME", "apachespark") -JIRA_PASSWORD = os.environ.get("JIRA_PASSWORD", "XXX") -# Maximum number of updates to perform in one run -MAX_UPDATES = int(os.environ.get("MAX_UPDATES", "10")) -# Cut-off for oldest PR on which to comment. Useful for avoiding -# "notification overload" when running for the first time. -MIN_COMMENT_PR = int(os.environ.get("MIN_COMMENT_PR", "1496")) - -# File used as an optimization to store maximum previously seen PR -# Used mostly because accessing ASF JIRA is slow, so we want to avoid checking -# the state of JIRA's that are tied to PR's we've already looked at. -MAX_FILE = ".github-jira-max" - - -def get_url(url): -try: -request = Request(url) -request.add_header("Authorization", "token %s" % GITHUB_OAUTH_KEY) -
[spark] branch branch-3.4 updated: [SPARK-45109][SQL][CONNECT][3.4] Fix log function in Connect
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch branch-3.4 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.4 by this push: new 544017854c7 [SPARK-45109][SQL][CONNECT][3.4] Fix log function in Connect 544017854c7 is described below commit 544017854c77be1c8b7fffc3f23a5fdee2fb798e Author: Peter Toth AuthorDate: Tue Sep 12 00:23:49 2023 -0700 [SPARK-45109][SQL][CONNECT][3.4] Fix log function in Connect ### What changes were proposed in this pull request? This is a backport PR of https://github.com/apache/spark/pull/42869 as the one-argument `log` function should point to `ln`. (Please note that the original https://github.com/apache/spark/pull/42863 doesn't need to be backported as `aes_decrypt` and `ln` are not implemented in Connect in Spark 3.4.) ### Why are the changes needed? Bugfix. ### Does this PR introduce _any_ user-facing change? No, these Spark Connect functions haven't been released. ### How was this patch tested? Existing UTs. ### Was this patch authored or co-authored using generative AI tooling? No. Closes #42872 from peter-toth/SPARK-45109-fix-log-3.4. 
Authored-by: Peter Toth Signed-off-by: Dongjoon Hyun --- .../src/main/scala/org/apache/spark/sql/functions.scala | 2 +- .../query-tests/explain-results/function_log.explain| 2 +- .../resources/query-tests/queries/function_log.json | 2 +- .../query-tests/queries/function_log.proto.bin | Bin 172 -> 171 bytes 4 files changed, 3 insertions(+), 3 deletions(-) diff --git a/connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/functions.scala b/connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/functions.scala index 29c2e89c537..7a59fa5a3a3 100644 --- a/connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/functions.scala +++ b/connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/functions.scala @@ -2032,7 +2032,7 @@ object functions { * @group math_funcs * @since 3.4.0 */ - def log(e: Column): Column = Column.fn("log", e) + def log(e: Column): Column = Column.fn("ln", e) /** * Computes the natural logarithm of the given column. diff --git a/connector/connect/common/src/test/resources/query-tests/explain-results/function_log.explain b/connector/connect/common/src/test/resources/query-tests/explain-results/function_log.explain index d3c3743b1ef..66b782ac817 100644 --- a/connector/connect/common/src/test/resources/query-tests/explain-results/function_log.explain +++ b/connector/connect/common/src/test/resources/query-tests/explain-results/function_log.explain @@ -1,2 +1,2 @@ -Project [LOG(E(), b#0) AS LOG(E(), b)#0] +Project [ln(b#0) AS ln(b)#0] +- LocalRelation , [id#0L, a#0, b#0, d#0, e#0, f#0, g#0] diff --git a/connector/connect/common/src/test/resources/query-tests/queries/function_log.json b/connector/connect/common/src/test/resources/query-tests/queries/function_log.json index 1b2d0ed0b14..ababbc52d08 100644 --- a/connector/connect/common/src/test/resources/query-tests/queries/function_log.json +++ b/connector/connect/common/src/test/resources/query-tests/queries/function_log.json @@ -13,7 +13,7 @@ }, "expressions": [{ 
"unresolvedFunction": { -"functionName": "log", +"functionName": "ln", "arguments": [{ "unresolvedAttribute": { "unparsedIdentifier": "b" diff --git a/connector/connect/common/src/test/resources/query-tests/queries/function_log.proto.bin b/connector/connect/common/src/test/resources/query-tests/queries/function_log.proto.bin index 548fb480dd2..ecb87a1fc41 100644 Binary files a/connector/connect/common/src/test/resources/query-tests/queries/function_log.proto.bin and b/connector/connect/common/src/test/resources/query-tests/queries/function_log.proto.bin differ - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated: [SPARK-45121][CONNECT][PS] Support `Series.empty` for Spark Connect
This is an automated email from the ASF dual-hosted git repository. ruifengz pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 5d2d9155d24 [SPARK-45121][CONNECT][PS] Support `Series.empty` for Spark Connect 5d2d9155d24 is described below commit 5d2d9155d24ea9a466e1868969dccd4ae1ac7278 Author: Haejoon Lee AuthorDate: Tue Sep 12 14:37:52 2023 +0800 [SPARK-45121][CONNECT][PS] Support `Series.empty` for Spark Connect ### What changes were proposed in this pull request? This PR proposes to support Series.empty for Spark Connect by removing JVM dependency. ### Why are the changes needed? Increase API coverage for Spark Connect. ### Does this PR introduce _any_ user-facing change? `Series.empty` is available on Spark Connect. ### How was this patch tested? Added UT. ### Was this patch authored or co-authored using generative AI tooling? No. Closes #42877 from itholic/SPARK-45121. 
Authored-by: Haejoon Lee Signed-off-by: Ruifeng Zheng --- python/pyspark/pandas/base.py | 2 +- python/pyspark/pandas/tests/series/test_series.py | 2 ++ 2 files changed, 3 insertions(+), 1 deletion(-) diff --git a/python/pyspark/pandas/base.py b/python/pyspark/pandas/base.py index 1cb17de89e8..ef0b51f757d 100644 --- a/python/pyspark/pandas/base.py +++ b/python/pyspark/pandas/base.py @@ -505,7 +505,7 @@ class IndexOpsMixin(object, metaclass=ABCMeta): >>> ps.DataFrame({}, index=list('abc')).index.empty False """ -return self._internal.resolved_copy.spark_frame.rdd.isEmpty() +return self._internal.resolved_copy.spark_frame.isEmpty() @property def hasnans(self) -> bool: diff --git a/python/pyspark/pandas/tests/series/test_series.py b/python/pyspark/pandas/tests/series/test_series.py index 136d905eb49..aa147aa75cf 100644 --- a/python/pyspark/pandas/tests/series/test_series.py +++ b/python/pyspark/pandas/tests/series/test_series.py @@ -113,6 +113,8 @@ class SeriesTestsMixin: self.assert_eq(ps.from_pandas(pser_a), pser_a) self.assert_eq(ps.from_pandas(pser_b), pser_b) +self.assertTrue(pser_a.empty) + def test_all_null_series(self): pser_a = pd.Series([None, None, None], dtype="float64") pser_b = pd.Series([None, None, None], dtype="str") - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
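The one-line change above swaps an RDD-based emptiness check (`spark_frame.rdd.isEmpty()`) for the DataFrame-level `isEmpty()`, which avoids the RDD/JVM path that Spark Connect does not expose. A toy stand-in in plain Python (hypothetical class names, not pyspark) showing the shape of the delegation:

```python
class StubFrame:
    """Minimal stand-in for a Spark DataFrame that, like the real one,
    can answer emptiness without materializing an RDD."""

    def __init__(self, rows):
        self._rows = list(rows)

    def isEmpty(self):
        # Spark implements this as a cheap limit-1 query under the hood;
        # here it is just a length check.
        return len(self._rows) == 0


class StubIndexOps:
    """Mimics the patched pyspark.pandas IndexOpsMixin.empty property."""

    def __init__(self, spark_frame):
        self._spark_frame = spark_frame

    @property
    def empty(self):
        # Post-fix: delegate straight to DataFrame.isEmpty() -- no .rdd access.
        return self._spark_frame.isEmpty()


assert StubIndexOps(StubFrame([])).empty is True
assert StubIndexOps(StubFrame([1, 2])).empty is False
```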
[spark] branch master updated: [SPARK-43123][PS] Raise `TypeError` for `DataFrame.interpolate` when all columns are object-dtype
This is an automated email from the ASF dual-hosted git repository. ruifengz pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 3148511a923 [SPARK-43123][PS] Raise `TypeError` for `DataFrame.interpolate` when all columns are object-dtype 3148511a923 is described below commit 3148511a923bf59ea37d8f44e7427cde66f9f167 Author: Haejoon Lee AuthorDate: Tue Sep 12 14:36:42 2023 +0800 [SPARK-43123][PS] Raise `TypeError` for `DataFrame.interpolate` when all columns are object-dtype ### What changes were proposed in this pull request? This PR proposes to raise `TypeError` for `DataFrame.interpolate` when all columns are object-dtype. ### Why are the changes needed? To match the behavior of pandas: ```python >>> pd.DataFrame({"A": ['a', 'b', 'c'], "B": ['a', 'b', 'c']}).interpolate() ... TypeError: Cannot interpolate with all object-dtype columns in the DataFrame. Try setting at least one column to a numeric dtype. ``` We currently return an empty DataFrame instead of raising a TypeError: ```python >>> ps.DataFrame({"A": ['a', 'b', 'c'], "B": ['a', 'b', 'c']}).interpolate() Empty DataFrame Columns: [] Index: [0, 1, 2] ``` ### Does this PR introduce _any_ user-facing change? Computing `DataFrame.interpolate` on a DataFrame that has all object-dtype columns will raise a TypeError instead of returning an empty DataFrame. ### How was this patch tested? Added UT. ### Was this patch authored or co-authored using generative AI tooling? No. Closes #42878 from itholic/SPARK-45123. 
Authored-by: Haejoon Lee Signed-off-by: Ruifeng Zheng --- python/pyspark/pandas/frame.py| 5 + python/pyspark/pandas/tests/test_frame_interpolate.py | 5 + 2 files changed, 10 insertions(+) diff --git a/python/pyspark/pandas/frame.py b/python/pyspark/pandas/frame.py index adbef607256..3aebbd65427 100644 --- a/python/pyspark/pandas/frame.py +++ b/python/pyspark/pandas/frame.py @@ -6097,6 +6097,11 @@ defaultdict(, {'col..., 'col...})] if isinstance(psser.spark.data_type, (NumericType, BooleanType)): numeric_col_names.append(psser.name) +if len(numeric_col_names) == 0: +raise TypeError( +"Cannot interpolate with all object-dtype columns in the DataFrame. " +"Try setting at least one column to a numeric dtype." +) psdf = self[numeric_col_names] return psdf._apply_series_op( lambda psser: psser._interpolate( diff --git a/python/pyspark/pandas/tests/test_frame_interpolate.py b/python/pyspark/pandas/tests/test_frame_interpolate.py index 5b5856f7ab8..17c73781f8e 100644 --- a/python/pyspark/pandas/tests/test_frame_interpolate.py +++ b/python/pyspark/pandas/tests/test_frame_interpolate.py @@ -53,6 +53,11 @@ class FrameInterpolateTestsMixin: with self.assertRaisesRegex(ValueError, "invalid limit_area"): psdf.id.interpolate(limit_area="jump") +with self.assertRaisesRegex( +TypeError, "Cannot interpolate with all object-dtype columns in the DataFrame." +): +ps.DataFrame({"A": ["a", "b", "c"], "B": ["a", "b", "c"]}).interpolate() + def _test_interpolate(self, pobj): psobj = ps.from_pandas(pobj) self.assert_eq(psobj.interpolate(), pobj.interpolate()) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
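The guard added in `frame.py` first collects the numeric/boolean columns and only then proceeds, raising the pandas-compatible error when none remain. A plain-Python mimic of that check (toy dtype strings instead of Spark types):

```python
# Stand-ins for Spark's NumericType and BooleanType checks.
NUMERIC_DTYPES = {"int", "float", "bool"}

def interpolate_columns(dtypes):
    """dtypes: mapping of column name -> dtype string.
    Returns the columns that interpolation would operate on, mirroring
    the numeric-column filter and the new guard in DataFrame.interpolate."""
    numeric = [name for name, dt in dtypes.items() if dt in NUMERIC_DTYPES]
    if not numeric:
        raise TypeError(
            "Cannot interpolate with all object-dtype columns in the DataFrame. "
            "Try setting at least one column to a numeric dtype."
        )
    return numeric

# Mixed dtypes: only the numeric column survives the filter.
assert interpolate_columns({"A": "object", "B": "float"}) == ["B"]

# All object-dtype: the new TypeError fires instead of returning nothing.
try:
    interpolate_columns({"A": "object", "B": "object"})
    raised = False
except TypeError as e:
    raised = "object-dtype" in str(e)
assert raised
```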
[spark] branch branch-3.4 updated: [SPARK-45075][SQL][3.4] Fix alter table with invalid default value will not report error
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch branch-3.4 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.4 by this push: new ed6c1b33a0a [SPARK-45075][SQL][3.4] Fix alter table with invalid default value will not report error ed6c1b33a0a is described below commit ed6c1b33a0a763680182b29baedebb241e0139a4 Author: Jia Fan AuthorDate: Mon Sep 11 23:28:20 2023 -0700 [SPARK-45075][SQL][3.4] Fix alter table with invalid default value will not report error ### What changes were proposed in this pull request? This is a backporting PR to branch-3.4 from https://github.com/apache/spark/pull/42810. Changed the way of asserting the error to adapt to 3.4. ### Why are the changes needed? Fix a bug on 3.4. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? Added a new test. ### Was this patch authored or co-authored using generative AI tooling? No Closes #42876 from Hisoka-X/SPARK-45075_followup_3.4_alter_column. Authored-by: Jia Fan Signed-off-by: Dongjoon Hyun --- .../spark/sql/connector/catalog/TableChange.java | 3 +-- .../plans/logical/v2AlterTableCommands.scala | 11 +-- .../spark/sql/connector/AlterTableTests.scala | 23 ++ 3 files changed, 33 insertions(+), 4 deletions(-) diff --git a/sql/catalyst/src/main/java/org/apache/spark/sql/connector/catalog/TableChange.java b/sql/catalyst/src/main/java/org/apache/spark/sql/connector/catalog/TableChange.java index 609cfab2d56..ebecb6f507e 100644 --- a/sql/catalyst/src/main/java/org/apache/spark/sql/connector/catalog/TableChange.java +++ b/sql/catalyst/src/main/java/org/apache/spark/sql/connector/catalog/TableChange.java @@ -696,9 +696,8 @@ public interface TableChange { /** * Returns the column default value SQL string (Spark SQL dialect). The default value literal * is not provided as updating column default values does not need to back-fill existing data. 
- * Null means dropping the column default value. + * Empty string means dropping the column default value. */ -@Nullable public String newDefaultValue() { return newDefaultValue; } @Override diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/v2AlterTableCommands.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/v2AlterTableCommands.scala index eb9d45f06ec..b02c4fac12d 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/v2AlterTableCommands.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/v2AlterTableCommands.scala @@ -17,9 +17,9 @@ package org.apache.spark.sql.catalyst.plans.logical -import org.apache.spark.sql.catalyst.analysis.{FieldName, FieldPosition} +import org.apache.spark.sql.catalyst.analysis.{FieldName, FieldPosition, ResolvedFieldName} import org.apache.spark.sql.catalyst.catalog.CatalogTypes.TablePartitionSpec -import org.apache.spark.sql.catalyst.util.TypeUtils +import org.apache.spark.sql.catalyst.util.{ResolveDefaultColumns, TypeUtils} import org.apache.spark.sql.connector.catalog.{TableCatalog, TableChange} import org.apache.spark.sql.errors.QueryCompilationErrors import org.apache.spark.sql.types.DataType @@ -228,6 +228,13 @@ case class AlterColumn( TableChange.updateColumnPosition(colName, newPosition.position) } val defaultValueChange = setDefaultExpression.map { newDefaultExpression => + if (newDefaultExpression.nonEmpty) { +// SPARK-45075: We call 'ResolveDefaultColumns.analyze' here to make sure that the default +// value parses successfully, and return an error otherwise +val newDataType = dataType.getOrElse(column.asInstanceOf[ResolvedFieldName].field.dataType) +ResolveDefaultColumns.analyze(column.name.last, newDataType, newDefaultExpression, + "ALTER TABLE ALTER COLUMN") + } TableChange.updateColumnDefaultValue(colName, newDefaultExpression) } typeChange.toSeq ++ nullabilityChange ++ commentChange ++ 
positionChange ++ defaultValueChange diff --git a/sql/core/src/test/scala/org/apache/spark/sql/connector/AlterTableTests.scala b/sql/core/src/test/scala/org/apache/spark/sql/connector/AlterTableTests.scala index 2047212a4ea..8f1faa18933 100644 --- a/sql/core/src/test/scala/org/apache/spark/sql/connector/AlterTableTests.scala +++ b/sql/core/src/test/scala/org/apache/spark/sql/connector/AlterTableTests.scala @@ -366,6 +366,29 @@ trait AlterTableTests extends SharedSparkSession with QueryErrorsBase { } } + test("SPARK-45075: ALTER COLUMN with invalid default value") { +withSQLConf(SQLConf.DEFAULT_COLUMN_ALLOWED_PROVIDERS.key -> s"$v2Format, ") { + withTable("t") { +
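The fix calls `ResolveDefaultColumns.analyze` eagerly, so a default value that cannot be parsed against the column's type fails at ALTER time instead of being silently accepted. A loose Python analogue of that eager check — `ast.literal_eval` stands in for Spark's default-value analysis, and all names here are hypothetical:

```python
import ast

# Toy mapping from a column's declared type to the Python type a
# parsed default literal must have.
PYTHON_TYPES = {"int": int, "double": float, "string": str}

def validate_default(col_name, data_type, default_sql):
    """Reject a new column default that does not parse as the column's type,
    mirroring the idea of validating before emitting the TableChange."""
    try:
        value = ast.literal_eval(default_sql)
    except (ValueError, SyntaxError):
        raise ValueError(f"invalid default for {col_name}: {default_sql!r}")
    if not isinstance(value, PYTHON_TYPES[data_type]):
        raise ValueError(f"default {default_sql!r} is not a {data_type}")
    return value

# A well-typed default is accepted...
assert validate_default("id", "int", "42") == 42

# ...while a string default on an int column is rejected up front.
try:
    validate_default("id", "int", "'abc'")
    raised = False
except ValueError:
    raised = True
assert raised
```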
[spark] branch master updated: [SPARK-45124][CONNECT] Do not use local user ID for Local Relations
This is an automated email from the ASF dual-hosted git repository. gurwls223 pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 47d801e5e9d [SPARK-45124][CONNECT] Do not use local user ID for Local Relations 47d801e5e9d is described below commit 47d801e5e9ded3fb50d274a720ee7874e0b37cc3 Author: Hyukjin Kwon AuthorDate: Tue Sep 12 14:59:44 2023 +0900 [SPARK-45124][CONNECT] Do not use local user ID for Local Relations ### What changes were proposed in this pull request? This PR removes the use of `userId` and `sessionId` in `CachedLocalRelation` messages and subsequently makes `SparkConnectPlanner` use the `userId`/`sessionId` of the active session rather than the user-provided information. ### Why are the changes needed? Allowing a fetch of a local relation using user-provided information is a potential security risk since this allows users to fetch arbitrary local relations. ### Does this PR introduce _any_ user-facing change? Virtually no. It will ignore the session ID or user ID that users set (and instead use internal ones that users cannot manipulate). ### How was this patch tested? Manually. ### Was this patch authored or co-authored using generative AI tooling? No. Closes #42880 from HyukjinKwon/no-local-user. 
Authored-by: Hyukjin Kwon Signed-off-by: Hyukjin Kwon --- .../scala/org/apache/spark/sql/SparkSession.scala | 2 - .../main/protobuf/spark/connect/relations.proto| 10 +- .../sql/connect/planner/SparkConnectPlanner.scala | 2 +- python/pyspark/sql/connect/plan.py | 3 - python/pyspark/sql/connect/proto/relations_pb2.py | 160 ++--- python/pyspark/sql/connect/proto/relations_pb2.pyi | 15 +- 6 files changed, 87 insertions(+), 105 deletions(-) diff --git a/connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/SparkSession.scala b/connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/SparkSession.scala index 7882ea64013..7bd8fa59aea 100644 --- a/connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/SparkSession.scala +++ b/connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/SparkSession.scala @@ -134,8 +134,6 @@ class SparkSession private[sql] ( } else { val hash = client.cacheLocalRelation(arrowData, encoder.schema.json) builder.getCachedLocalRelationBuilder -.setUserId(client.userId) -.setSessionId(client.sessionId) .setHash(hash) } } else { diff --git a/connector/connect/common/src/main/protobuf/spark/connect/relations.proto b/connector/connect/common/src/main/protobuf/spark/connect/relations.proto index 8001b3cbcfa..f7f1315ede0 100644 --- a/connector/connect/common/src/main/protobuf/spark/connect/relations.proto +++ b/connector/connect/common/src/main/protobuf/spark/connect/relations.proto @@ -400,11 +400,11 @@ message LocalRelation { // A local relation that has been cached already. message CachedLocalRelation { - // (Required) An identifier of the user which created the local relation - string userId = 1; - - // (Required) An identifier of the Spark SQL session in which the user created the local relation. - string sessionId = 2; + // `userId` and `sessionId` fields are deleted since the server must always use the active + // session/user rather than arbitrary values provided by the client. 
It is never valid to access + // a local relation from a different session/user. + reserved 1, 2; + reserved "userId", "sessionId"; // (Required) A sha-256 hash of the serialized local relation in proto, see LocalRelation. string hash = 3; diff --git a/connector/connect/server/src/main/scala/org/apache/spark/sql/connect/planner/SparkConnectPlanner.scala b/connector/connect/server/src/main/scala/org/apache/spark/sql/connect/planner/SparkConnectPlanner.scala index 1a63c9fc27c..b8ab5539b30 100644 --- a/connector/connect/server/src/main/scala/org/apache/spark/sql/connect/planner/SparkConnectPlanner.scala +++ b/connector/connect/server/src/main/scala/org/apache/spark/sql/connect/planner/SparkConnectPlanner.scala @@ -970,7 +970,7 @@ class SparkConnectPlanner(val sessionHolder: SessionHolder) extends Logging { private def transformCachedLocalRelation(rel: proto.CachedLocalRelation): LogicalPlan = { val blockManager = session.sparkContext.env.blockManager -val blockId = CacheId(rel.getUserId, rel.getSessionId, rel.getHash) +val blockId = CacheId(sessionHolder.userId, sessionHolder.sessionId, rel.getHash) val bytes = blockManager.getLocalBytes(blockId) bytes .map { blockData => diff --git a/python/pyspark/sql/connect/plan.py b/python/pyspark/sql/connect/plan.py index 5e9b4e53dbf..f641cb4b2fe 100644 ---
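The security fix makes the cache key depend only on the server's own notion of who is asking (the session holder) plus the client-supplied content hash; client-supplied user/session IDs are no longer consulted. A toy cache lookup in that shape (hypothetical names, not Spark Connect's API):

```python
from collections import namedtuple

# Mirrors the shape of Spark's CacheId(userId, sessionId, hash).
CacheId = namedtuple("CacheId", ["user_id", "session_id", "hash"])

class SessionHolder:
    """Server-side record of the authenticated user/session."""
    def __init__(self, user_id, session_id):
        self.user_id = user_id
        self.session_id = session_id

def lookup_local_relation(session_holder, store, rel_hash):
    # Post-fix: the key is built from the *server's* session identity,
    # so one session can never address another session's cached relations,
    # no matter what IDs the client puts in its request.
    key = CacheId(session_holder.user_id, session_holder.session_id, rel_hash)
    return store.get(key)

store = {CacheId("alice", "s1", "h"): b"alice-data"}

# Even knowing the hash, a different session cannot reach alice's entry.
assert lookup_local_relation(SessionHolder("mallory", "s2"), store, "h") is None
# The owning session still resolves its own relation.
assert lookup_local_relation(SessionHolder("alice", "s1"), store, "h") == b"alice-data"
```

On the wire, the now-unused proto fields are retired with `reserved 1, 2;` and `reserved "userId", "sessionId";`, which prevents those field numbers and names from ever being reused with a different meaning.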
[spark] branch branch-3.5 updated: [SPARK-45124][CONNECT] Do not use local user ID for Local Relations
This is an automated email from the ASF dual-hosted git repository. gurwls223 pushed a commit to branch branch-3.5 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.5 by this push: new 09b14f0968c [SPARK-45124][CONNECT] Do not use local user ID for Local Relations 09b14f0968c is described below commit 09b14f0968cebe0f2c5c9a369935f27d4ea228f6 Author: Hyukjin Kwon AuthorDate: Tue Sep 12 14:59:44 2023 +0900 [SPARK-45124][CONNECT] Do not use local user ID for Local Relations ### What changes were proposed in this pull request? This PR removes the use of `userId` and `sessionId` in `CachedLocalRelation` messages and subsequently makes `SparkConnectPlanner` use the `userId`/`sessionId` of the active session rather than the user-provided information. ### Why are the changes needed? Allowing a fetch of a local relation using user-provided information is a potential security risk since this allows users to fetch arbitrary local relations. ### Does this PR introduce _any_ user-facing change? Virtually no. It will ignore the session ID or user ID that users set (and instead use internal ones that users cannot manipulate). ### How was this patch tested? Manually. ### Was this patch authored or co-authored using generative AI tooling? No. Closes #42880 from HyukjinKwon/no-local-user. 
Authored-by: Hyukjin Kwon Signed-off-by: Hyukjin Kwon (cherry picked from commit 47d801e5e9ded3fb50d274a720ee7874e0b37cc3) Signed-off-by: Hyukjin Kwon --- .../scala/org/apache/spark/sql/SparkSession.scala | 2 - .../main/protobuf/spark/connect/relations.proto| 10 +- .../sql/connect/planner/SparkConnectPlanner.scala | 2 +- python/pyspark/sql/connect/plan.py | 3 - python/pyspark/sql/connect/proto/relations_pb2.py | 160 ++--- python/pyspark/sql/connect/proto/relations_pb2.pyi | 15 +- 6 files changed, 87 insertions(+), 105 deletions(-) diff --git a/connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/SparkSession.scala b/connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/SparkSession.scala index 7882ea64013..7bd8fa59aea 100644 --- a/connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/SparkSession.scala +++ b/connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/SparkSession.scala @@ -134,8 +134,6 @@ class SparkSession private[sql] ( } else { val hash = client.cacheLocalRelation(arrowData, encoder.schema.json) builder.getCachedLocalRelationBuilder -.setUserId(client.userId) -.setSessionId(client.sessionId) .setHash(hash) } } else { diff --git a/connector/connect/common/src/main/protobuf/spark/connect/relations.proto b/connector/connect/common/src/main/protobuf/spark/connect/relations.proto index 8001b3cbcfa..f7f1315ede0 100644 --- a/connector/connect/common/src/main/protobuf/spark/connect/relations.proto +++ b/connector/connect/common/src/main/protobuf/spark/connect/relations.proto @@ -400,11 +400,11 @@ message LocalRelation { // A local relation that has been cached already. message CachedLocalRelation { - // (Required) An identifier of the user which created the local relation - string userId = 1; - - // (Required) An identifier of the Spark SQL session in which the user created the local relation. 
- string sessionId = 2; + // `userId` and `sessionId` fields are deleted since the server must always use the active + // session/user rather than arbitrary values provided by the client. It is never valid to access + // a local relation from a different session/user. + reserved 1, 2; + reserved "userId", "sessionId"; // (Required) A sha-256 hash of the serialized local relation in proto, see LocalRelation. string hash = 3; diff --git a/connector/connect/server/src/main/scala/org/apache/spark/sql/connect/planner/SparkConnectPlanner.scala b/connector/connect/server/src/main/scala/org/apache/spark/sql/connect/planner/SparkConnectPlanner.scala index 2abbacc5a9b..641dfc5dcd3 100644 --- a/connector/connect/server/src/main/scala/org/apache/spark/sql/connect/planner/SparkConnectPlanner.scala +++ b/connector/connect/server/src/main/scala/org/apache/spark/sql/connect/planner/SparkConnectPlanner.scala @@ -970,7 +970,7 @@ class SparkConnectPlanner(val sessionHolder: SessionHolder) extends Logging { private def transformCachedLocalRelation(rel: proto.CachedLocalRelation): LogicalPlan = { val blockManager = session.sparkContext.env.blockManager -val blockId = CacheId(rel.getUserId, rel.getSessionId, rel.getHash) +val blockId = CacheId(sessionHolder.userId, sessionHolder.sessionId, rel.getHash) val bytes = blockManager.getLocalBytes(blockId) bytes .map { blockData => diff --git