[spark] branch master updated: [SPARK-42214][INFRA] Enable infra image build for scheduled job

2023-01-28 Thread yikun
This is an automated email from the ASF dual-hosted git repository.

yikun pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new f348d4fb9ff [SPARK-42214][INFRA] Enable infra image build for scheduled job
f348d4fb9ff is described below

commit f348d4fb9ffabc490b7c5294cd15eed2a74f2b60
Author: Yikun Jiang 
AuthorDate: Sat Jan 28 18:01:57 2023 +0800

[SPARK-42214][INFRA] Enable infra image build for scheduled job

### What changes were proposed in this pull request?
Enable infra image build for scheduled job.

The scheduled branch jobs are based on the master branch workflow, so we need to
enable the infra image for the master branch and for release branches 3.4 and later (that is, every branch except 3.2/3.3).

### Why are the changes needed?
Enable infra image build for scheduled job.

### Does this PR introduce _any_ user-facing change?
No, infra only

### How was this patch tested?
- CI passed (to make sure master branch job passed)
- Manually review and check the scheduled job after merge: https://github.com/apache/spark/actions/workflows/build_branch34.yml
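
A minimal Python sketch of the image-selection logic before and after this change (illustrative only; the actual logic is the GitHub Actions expression shown in the diff below):

```python
# Illustrative sketch only; the real selection happens in build_and_test.yml.
STATIC_IMAGE = "dongjoon/apache-spark-github-action-image:20220207"

def image_url_before(branch: str, infra_image_url: str) -> str:
    # Old behavior: only the master branch used the freshly built infra image.
    return infra_image_url if branch == "master" else STATIC_IMAGE

def image_url_after(branch: str, infra_image_url: str) -> str:
    # New behavior: only branch-3.2/branch-3.3 keep the pinned static image;
    # master and branch-3.4+ (including their scheduled jobs) use the infra image.
    if branch in ("branch-3.2", "branch-3.3"):
        return STATIC_IMAGE
    return infra_image_url
```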

Closes #39778 from Yikun/SPARK-42214.

Authored-by: Yikun Jiang 
Signed-off-by: Yikun Jiang 
---
 .github/workflows/build_and_test.yml | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/.github/workflows/build_and_test.yml 
b/.github/workflows/build_and_test.yml
index 54b3d1d19d4..021566a5b8e 100644
--- a/.github/workflows/build_and_test.yml
+++ b/.github/workflows/build_and_test.yml
@@ -58,8 +58,8 @@ jobs:
   required: ${{ steps.set-outputs.outputs.required }}
   image_url: >-
 ${{
-  (inputs.branch == 'master' && steps.infra-image-outputs.outputs.image_url)
-  || 'dongjoon/apache-spark-github-action-image:20220207'
+  ((inputs.branch == 'branch-3.2' || inputs.branch == 'branch-3.3') && 'dongjoon/apache-spark-github-action-image:20220207')
+  || steps.infra-image-outputs.outputs.image_url
 }}
 steps:
 - name: Checkout Spark repository
@@ -268,12 +268,12 @@ jobs:
   infra-image:
 name: "Base image build"
 needs: precondition
-# Currently, only enable docker build from cache for `master` branch jobs
+# Currently, enable docker build from cache for `master` and branch (since 3.4) jobs
 if: >-
   (fromJson(needs.precondition.outputs.required).pyspark == 'true' ||
   fromJson(needs.precondition.outputs.required).lint == 'true' ||
   fromJson(needs.precondition.outputs.required).sparkr == 'true') &&
-  inputs.branch == 'master'
+  (inputs.branch != 'branch-3.2' && inputs.branch != 'branch-3.3')
 runs-on: ubuntu-latest
 permissions:
   packages: write





[spark] branch branch-3.4 updated: [SPARK-42161][BUILD] Upgrade Apache Arrow to 11.0.0

2023-01-28 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch branch-3.4
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.4 by this push:
 new d53cb8eeb1c [SPARK-42161][BUILD] Upgrade Apache Arrow to 11.0.0
d53cb8eeb1c is described below

commit d53cb8eeb1c20e92df937805ceb64ef26a20506b
Author: yangjie01 
AuthorDate: Sat Jan 28 16:48:38 2023 -0800

[SPARK-42161][BUILD] Upgrade Apache Arrow to 11.0.0

### What changes were proposed in this pull request?
This PR aims to upgrade Apache Arrow to 11.0.0.

### Why are the changes needed?
The release notes are as follows:

- https://arrow.apache.org/release/11.0.0.html

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
Pass GA.
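
Not part of this patch, but a minimal hedged PySpark example of the Arrow-backed conversion path this dependency serves (standard public API only):

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
# When enabled, Spark uses Arrow to transfer data between the JVM and pandas.
spark.conf.set("spark.sql.execution.arrow.pyspark.enabled", "true")

df = spark.range(10).selectExpr("id", "id * 2 AS doubled")
pdf = df.toPandas()                          # Spark -> pandas via Arrow
round_tripped = spark.createDataFrame(pdf)   # pandas -> Spark via Arrow
print(pdf.head())
```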

Closes #39707 from LuciferYang/SPARK-42161.

Lead-authored-by: yangjie01 
Co-authored-by: YangJie 
Signed-off-by: Dongjoon Hyun 
(cherry picked from commit 43b81b7afd4750bc04299dc14be492700a8157aa)
Signed-off-by: Dongjoon Hyun 
---
 dev/deps/spark-deps-hadoop-2-hive-2.3 | 8 
 dev/deps/spark-deps-hadoop-3-hive-2.3 | 8 
 pom.xml   | 2 +-
 3 files changed, 9 insertions(+), 9 deletions(-)

diff --git a/dev/deps/spark-deps-hadoop-2-hive-2.3 
b/dev/deps/spark-deps-hadoop-2-hive-2.3
index 9051387bdf5..a614f07f8f5 100644
--- a/dev/deps/spark-deps-hadoop-2-hive-2.3
+++ b/dev/deps/spark-deps-hadoop-2-hive-2.3
@@ -17,10 +17,10 @@ api-asn1-api/1.0.0-M20//api-asn1-api-1.0.0-M20.jar
 api-util/1.0.0-M20//api-util-1.0.0-M20.jar
 arpack/3.0.3//arpack-3.0.3.jar
 arpack_combined_all/0.1//arpack_combined_all-0.1.jar
-arrow-format/10.0.1//arrow-format-10.0.1.jar
-arrow-memory-core/10.0.1//arrow-memory-core-10.0.1.jar
-arrow-memory-netty/10.0.1//arrow-memory-netty-10.0.1.jar
-arrow-vector/10.0.1//arrow-vector-10.0.1.jar
+arrow-format/11.0.0//arrow-format-11.0.0.jar
+arrow-memory-core/11.0.0//arrow-memory-core-11.0.0.jar
+arrow-memory-netty/11.0.0//arrow-memory-netty-11.0.0.jar
+arrow-vector/11.0.0//arrow-vector-11.0.0.jar
 audience-annotations/0.5.0//audience-annotations-0.5.0.jar
 avro-ipc/1.11.1//avro-ipc-1.11.1.jar
 avro-mapred/1.11.1//avro-mapred-1.11.1.jar
diff --git a/dev/deps/spark-deps-hadoop-3-hive-2.3 
b/dev/deps/spark-deps-hadoop-3-hive-2.3
index 70057821bab..e6af508a903 100644
--- a/dev/deps/spark-deps-hadoop-3-hive-2.3
+++ b/dev/deps/spark-deps-hadoop-3-hive-2.3
@@ -16,10 +16,10 @@ antlr4-runtime/4.9.3//antlr4-runtime-4.9.3.jar
 aopalliance-repackaged/2.6.1//aopalliance-repackaged-2.6.1.jar
 arpack/3.0.3//arpack-3.0.3.jar
 arpack_combined_all/0.1//arpack_combined_all-0.1.jar
-arrow-format/10.0.1//arrow-format-10.0.1.jar
-arrow-memory-core/10.0.1//arrow-memory-core-10.0.1.jar
-arrow-memory-netty/10.0.1//arrow-memory-netty-10.0.1.jar
-arrow-vector/10.0.1//arrow-vector-10.0.1.jar
+arrow-format/11.0.0//arrow-format-11.0.0.jar
+arrow-memory-core/11.0.0//arrow-memory-core-11.0.0.jar
+arrow-memory-netty/11.0.0//arrow-memory-netty-11.0.0.jar
+arrow-vector/11.0.0//arrow-vector-11.0.0.jar
 audience-annotations/0.5.0//audience-annotations-0.5.0.jar
 avro-ipc/1.11.1//avro-ipc-1.11.1.jar
 avro-mapred/1.11.1//avro-mapred-1.11.1.jar
diff --git a/pom.xml b/pom.xml
index 2d6cc8f543f..16c2c4200f5 100644
--- a/pom.xml
+++ b/pom.xml
@@ -220,7 +220,7 @@
 If you are changing Arrow version specification, please check
 ./python/pyspark/sql/pandas/utils.py, and ./python/setup.py too.
 -->
-    <arrow.version>10.0.1</arrow.version>
+    <arrow.version>11.0.0</arrow.version>
 
 org.fusesource.leveldbjni
 6.4.0





[spark] branch master updated: [SPARK-42161][BUILD] Upgrade Apache Arrow to 11.0.0

2023-01-28 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 43b81b7afd4 [SPARK-42161][BUILD] Upgrade Apache Arrow to 11.0.0
43b81b7afd4 is described below

commit 43b81b7afd4750bc04299dc14be492700a8157aa
Author: yangjie01 
AuthorDate: Sat Jan 28 16:48:38 2023 -0800

[SPARK-42161][BUILD] Upgrade Apache Arrow to 11.0.0

### What changes were proposed in this pull request?
This PR aims to upgrade Apache Arrow to 11.0.0.

### Why are the changes needed?
The release notes are as follows:

- https://arrow.apache.org/release/11.0.0.html

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
Pass GA.

Closes #39707 from LuciferYang/SPARK-42161.

Lead-authored-by: yangjie01 
Co-authored-by: YangJie 
Signed-off-by: Dongjoon Hyun 
---
 dev/deps/spark-deps-hadoop-2-hive-2.3 | 8 
 dev/deps/spark-deps-hadoop-3-hive-2.3 | 8 
 pom.xml   | 2 +-
 3 files changed, 9 insertions(+), 9 deletions(-)

diff --git a/dev/deps/spark-deps-hadoop-2-hive-2.3 
b/dev/deps/spark-deps-hadoop-2-hive-2.3
index 9051387bdf5..a614f07f8f5 100644
--- a/dev/deps/spark-deps-hadoop-2-hive-2.3
+++ b/dev/deps/spark-deps-hadoop-2-hive-2.3
@@ -17,10 +17,10 @@ api-asn1-api/1.0.0-M20//api-asn1-api-1.0.0-M20.jar
 api-util/1.0.0-M20//api-util-1.0.0-M20.jar
 arpack/3.0.3//arpack-3.0.3.jar
 arpack_combined_all/0.1//arpack_combined_all-0.1.jar
-arrow-format/10.0.1//arrow-format-10.0.1.jar
-arrow-memory-core/10.0.1//arrow-memory-core-10.0.1.jar
-arrow-memory-netty/10.0.1//arrow-memory-netty-10.0.1.jar
-arrow-vector/10.0.1//arrow-vector-10.0.1.jar
+arrow-format/11.0.0//arrow-format-11.0.0.jar
+arrow-memory-core/11.0.0//arrow-memory-core-11.0.0.jar
+arrow-memory-netty/11.0.0//arrow-memory-netty-11.0.0.jar
+arrow-vector/11.0.0//arrow-vector-11.0.0.jar
 audience-annotations/0.5.0//audience-annotations-0.5.0.jar
 avro-ipc/1.11.1//avro-ipc-1.11.1.jar
 avro-mapred/1.11.1//avro-mapred-1.11.1.jar
diff --git a/dev/deps/spark-deps-hadoop-3-hive-2.3 
b/dev/deps/spark-deps-hadoop-3-hive-2.3
index 70057821bab..e6af508a903 100644
--- a/dev/deps/spark-deps-hadoop-3-hive-2.3
+++ b/dev/deps/spark-deps-hadoop-3-hive-2.3
@@ -16,10 +16,10 @@ antlr4-runtime/4.9.3//antlr4-runtime-4.9.3.jar
 aopalliance-repackaged/2.6.1//aopalliance-repackaged-2.6.1.jar
 arpack/3.0.3//arpack-3.0.3.jar
 arpack_combined_all/0.1//arpack_combined_all-0.1.jar
-arrow-format/10.0.1//arrow-format-10.0.1.jar
-arrow-memory-core/10.0.1//arrow-memory-core-10.0.1.jar
-arrow-memory-netty/10.0.1//arrow-memory-netty-10.0.1.jar
-arrow-vector/10.0.1//arrow-vector-10.0.1.jar
+arrow-format/11.0.0//arrow-format-11.0.0.jar
+arrow-memory-core/11.0.0//arrow-memory-core-11.0.0.jar
+arrow-memory-netty/11.0.0//arrow-memory-netty-11.0.0.jar
+arrow-vector/11.0.0//arrow-vector-11.0.0.jar
 audience-annotations/0.5.0//audience-annotations-0.5.0.jar
 avro-ipc/1.11.1//avro-ipc-1.11.1.jar
 avro-mapred/1.11.1//avro-mapred-1.11.1.jar
diff --git a/pom.xml b/pom.xml
index 5428aed8ad3..1d9eea0e21c 100644
--- a/pom.xml
+++ b/pom.xml
@@ -220,7 +220,7 @@
 If you are changing Arrow version specification, please check
 ./python/pyspark/sql/pandas/utils.py, and ./python/setup.py too.
 -->
-    <arrow.version>10.0.1</arrow.version>
+    <arrow.version>11.0.0</arrow.version>
 
 org.fusesource.leveldbjni
 6.4.0





[spark] branch master updated: [SPARK-42220][CONNECT][BUILD] Upgrade buf from 1.12.0 to 1.13.1

2023-01-28 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 0e46106736b4 [SPARK-42220][CONNECT][BUILD] Upgrade buf from 1.12.0 to 1.13.1
0e46106736b4 is described below

commit 0e46106736b457fcf74dd4316ec8b1c50e41d8d7
Author: panbingkun 
AuthorDate: Sat Jan 28 16:49:42 2023 -0800

[SPARK-42220][CONNECT][BUILD] Upgrade buf from 1.12.0 to 1.13.1

### What changes were proposed in this pull request?
This PR aims to upgrade buf from 1.12.0 to 1.13.1.

### Why are the changes needed?
Screenshots:
- https://user-images.githubusercontent.com/15246973/215235014-6bbe2643-b04c-4d10-8d1e-2309969cc686.png
- https://user-images.githubusercontent.com/15246973/215235022-3e55b906-98fc-4a47-ba14-c3a2604c4e35.png
Release Notes: https://github.com/bufbuild/buf/releases
https://github.com/bufbuild/buf/compare/v1.12.0...v1.13.1

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
Pass GA.

Closes #39776 from panbingkun/SPARK-42220.

Authored-by: panbingkun 
Signed-off-by: Dongjoon Hyun 
---
 .github/workflows/build_and_test.yml| 2 +-
 python/docs/source/development/contributing.rst | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/.github/workflows/build_and_test.yml 
b/.github/workflows/build_and_test.yml
index 021566a5b8e7..b7c6b2b66bfc 100644
--- a/.github/workflows/build_and_test.yml
+++ b/.github/workflows/build_and_test.yml
@@ -575,7 +575,7 @@ jobs:
 - name: Install dependencies for Python code generation check
   run: |
 # See more in "Installation" https://docs.buf.build/installation#tarball
-curl -LO https://github.com/bufbuild/buf/releases/download/v1.12.0/buf-Linux-x86_64.tar.gz
+curl -LO https://github.com/bufbuild/buf/releases/download/v1.13.1/buf-Linux-x86_64.tar.gz
 mkdir -p $HOME/buf
 tar -xvzf buf-Linux-x86_64.tar.gz -C $HOME/buf --strip-components 1
 python3.9 -m pip install 'protobuf==3.19.5' 'mypy-protobuf==3.3.0'
diff --git a/python/docs/source/development/contributing.rst 
b/python/docs/source/development/contributing.rst
index 17c90abea68e..0fd6fff21545 100644
--- a/python/docs/source/development/contributing.rst
+++ b/python/docs/source/development/contributing.rst
@@ -120,7 +120,7 @@ Prerequisite
 
 PySpark development requires to build Spark that needs a proper JDK installed, 
etc. See `Building Spark 
`_ for more details.
 
-Note that if you intend to contribute to Spark Connect in Python, ``buf`` version ``1.12.0`` is required, see `Buf Installation `_ for more details.
+Note that if you intend to contribute to Spark Connect in Python, ``buf`` version ``1.13.1`` is required, see `Buf Installation `_ for more details.
 
 Conda
 ~





[spark] branch branch-3.4 updated: [SPARK-42220][CONNECT][BUILD] Upgrade buf from 1.12.0 to 1.13.1

2023-01-28 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch branch-3.4
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.4 by this push:
 new 1cdf9e4b3ff4 [SPARK-42220][CONNECT][BUILD] Upgrade buf from 1.12.0 to 1.13.1
1cdf9e4b3ff4 is described below

commit 1cdf9e4b3ff48ce5f1781b7ea5d6e4c00a92a5f0
Author: panbingkun 
AuthorDate: Sat Jan 28 16:49:42 2023 -0800

[SPARK-42220][CONNECT][BUILD] Upgrade buf from 1.12.0 to 1.13.1

### What changes were proposed in this pull request?
This PR aims to upgrade buf from 1.12.0 to 1.13.1.

### Why are the changes needed?
Screenshots:
- https://user-images.githubusercontent.com/15246973/215235014-6bbe2643-b04c-4d10-8d1e-2309969cc686.png
- https://user-images.githubusercontent.com/15246973/215235022-3e55b906-98fc-4a47-ba14-c3a2604c4e35.png
Release Notes: https://github.com/bufbuild/buf/releases
https://github.com/bufbuild/buf/compare/v1.12.0...v1.13.1

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
Pass GA.

Closes #39776 from panbingkun/SPARK-42220.

Authored-by: panbingkun 
Signed-off-by: Dongjoon Hyun 
(cherry picked from commit 0e46106736b457fcf74dd4316ec8b1c50e41d8d7)
Signed-off-by: Dongjoon Hyun 
---
 .github/workflows/build_and_test.yml| 2 +-
 python/docs/source/development/contributing.rst | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/.github/workflows/build_and_test.yml 
b/.github/workflows/build_and_test.yml
index 712443cdec4f..9b8e58131c51 100644
--- a/.github/workflows/build_and_test.yml
+++ b/.github/workflows/build_and_test.yml
@@ -576,7 +576,7 @@ jobs:
 - name: Install dependencies for Python code generation check
   run: |
 # See more in "Installation" https://docs.buf.build/installation#tarball
-curl -LO https://github.com/bufbuild/buf/releases/download/v1.12.0/buf-Linux-x86_64.tar.gz
+curl -LO https://github.com/bufbuild/buf/releases/download/v1.13.1/buf-Linux-x86_64.tar.gz
 mkdir -p $HOME/buf
 tar -xvzf buf-Linux-x86_64.tar.gz -C $HOME/buf --strip-components 1
 python3.9 -m pip install 'protobuf==3.19.5' 'mypy-protobuf==3.3.0'
diff --git a/python/docs/source/development/contributing.rst 
b/python/docs/source/development/contributing.rst
index 17c90abea68e..0fd6fff21545 100644
--- a/python/docs/source/development/contributing.rst
+++ b/python/docs/source/development/contributing.rst
@@ -120,7 +120,7 @@ Prerequisite
 
 PySpark development requires to build Spark that needs a proper JDK installed, 
etc. See `Building Spark 
`_ for more details.
 
-Note that if you intend to contribute to Spark Connect in Python, ``buf`` version ``1.12.0`` is required, see `Buf Installation `_ for more details.
+Note that if you intend to contribute to Spark Connect in Python, ``buf`` version ``1.13.1`` is required, see `Buf Installation `_ for more details.
 
 Conda
 ~





[spark] branch master updated: [SPARK-41830][CONNECT][PYTHON][TESTS][FOLLOWUP] Enable parity test `test_sample`

2023-01-28 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 2fa1d6be8f6 [SPARK-41830][CONNECT][PYTHON][TESTS][FOLLOWUP] Enable parity test `test_sample`
2fa1d6be8f6 is described below

commit 2fa1d6be8f6fe9e71f2def743484940b8c4b6dbf
Author: Ruifeng Zheng 
AuthorDate: Sat Jan 28 16:51:40 2023 -0800

[SPARK-41830][CONNECT][PYTHON][TESTS][FOLLOWUP] Enable parity test 
`test_sample`

### What changes were proposed in this pull request?
Enable parity test `test_sample`

### Why are the changes needed?
For test coverage

### Does this PR introduce _any_ user-facing change?
no, test-only

### How was this patch tested?
enabled test
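
A hedged sketch of the updated assertion pattern (mirroring the test diff below): the negative-fraction case now accepts either exception type and calls `.count()`, because Spark Connect is lazy and only raises the error when the plan is executed:

```python
# Sketch only; assumes a running `spark` session. Mirrors test_dataframe.py below.
from pyspark.errors import IllegalArgumentException, SparkConnectException

def check_negative_fraction(spark):
    try:
        # .count() forces execution; without an action, Spark Connect would not
        # surface the invalid sampling fraction at all.
        spark.range(1).sample(-1.0).count()
    except (IllegalArgumentException, SparkConnectException):
        print("negative sampling fraction rejected, as expected")
```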

Closes #39765 from zhengruifeng/connect_enable_41830.

Authored-by: Ruifeng Zheng 
Signed-off-by: Dongjoon Hyun 
---
 python/pyspark/sql/tests/connect/test_parity_dataframe.py | 5 -
 python/pyspark/sql/tests/test_dataframe.py| 6 +-
 2 files changed, 5 insertions(+), 6 deletions(-)

diff --git a/python/pyspark/sql/tests/connect/test_parity_dataframe.py 
b/python/pyspark/sql/tests/connect/test_parity_dataframe.py
index db0d727d330..e04119bea9f 100644
--- a/python/pyspark/sql/tests/connect/test_parity_dataframe.py
+++ b/python/pyspark/sql/tests/connect/test_parity_dataframe.py
@@ -95,11 +95,6 @@ class DataFrameParityTests(DataFrameTestsMixin, 
ReusedConnectTestCase):
 def test_same_semantics_error(self):
 super().test_same_semantics_error()
 
-# TODO(SPARK-41830): Fix DataFrame.sample parameters
-@unittest.skip("Fails in Spark Connect, should enable.")
-def test_sample(self):
-super().test_sample()
-
 @unittest.skip("Spark Connect does not support RDD but the tests depend on 
them.")
 def test_toDF_with_schema_string(self):
 super().test_toDF_with_schema_string()
diff --git a/python/pyspark/sql/tests/test_dataframe.py 
b/python/pyspark/sql/tests/test_dataframe.py
index 845cf0f1fbe..0ba0649245c 100644
--- a/python/pyspark/sql/tests/test_dataframe.py
+++ b/python/pyspark/sql/tests/test_dataframe.py
@@ -46,6 +46,7 @@ from pyspark.sql.types import (
 from pyspark.errors import (
 AnalysisException,
 IllegalArgumentException,
+SparkConnectException,
 SparkConnectAnalysisException,
 )
 from pyspark.testing.sqlutils import (
@@ -888,7 +889,10 @@ class DataFrameTestsMixin:
 
 self.assertRaises(TypeError, lambda: self.spark.range(1).sample(seed="abc"))
 
-self.assertRaises(IllegalArgumentException, lambda: self.spark.range(1).sample(-1.0))
+self.assertRaises(
+(IllegalArgumentException, SparkConnectException),
+lambda: self.spark.range(1).sample(-1.0).count(),
+)
 
 def test_toDF_with_schema_string(self):
 data = [Row(key=i, value=str(i)) for i in range(100)]





[spark] branch branch-3.4 updated: [SPARK-41830][CONNECT][PYTHON][TESTS][FOLLOWUP] Enable parity test `test_sample`

2023-01-28 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch branch-3.4
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.4 by this push:
 new 5c8d7f25362 [SPARK-41830][CONNECT][PYTHON][TESTS][FOLLOWUP] Enable parity test `test_sample`
5c8d7f25362 is described below

commit 5c8d7f25362b945c942f12a0136a314936ce3a51
Author: Ruifeng Zheng 
AuthorDate: Sat Jan 28 16:51:40 2023 -0800

[SPARK-41830][CONNECT][PYTHON][TESTS][FOLLOWUP] Enable parity test 
`test_sample`

### What changes were proposed in this pull request?
Enable parity test `test_sample`

### Why are the changes needed?
For test coverage

### Does this PR introduce _any_ user-facing change?
no, test-only

### How was this patch tested?
enabled test

Closes #39765 from zhengruifeng/connect_enable_41830.

Authored-by: Ruifeng Zheng 
Signed-off-by: Dongjoon Hyun 
(cherry picked from commit 2fa1d6be8f6fe9e71f2def743484940b8c4b6dbf)
Signed-off-by: Dongjoon Hyun 
---
 python/pyspark/sql/tests/connect/test_parity_dataframe.py | 5 -
 python/pyspark/sql/tests/test_dataframe.py| 6 +-
 2 files changed, 5 insertions(+), 6 deletions(-)

diff --git a/python/pyspark/sql/tests/connect/test_parity_dataframe.py 
b/python/pyspark/sql/tests/connect/test_parity_dataframe.py
index db0d727d330..e04119bea9f 100644
--- a/python/pyspark/sql/tests/connect/test_parity_dataframe.py
+++ b/python/pyspark/sql/tests/connect/test_parity_dataframe.py
@@ -95,11 +95,6 @@ class DataFrameParityTests(DataFrameTestsMixin, 
ReusedConnectTestCase):
 def test_same_semantics_error(self):
 super().test_same_semantics_error()
 
-# TODO(SPARK-41830): Fix DataFrame.sample parameters
-@unittest.skip("Fails in Spark Connect, should enable.")
-def test_sample(self):
-super().test_sample()
-
 @unittest.skip("Spark Connect does not support RDD but the tests depend on 
them.")
 def test_toDF_with_schema_string(self):
 super().test_toDF_with_schema_string()
diff --git a/python/pyspark/sql/tests/test_dataframe.py 
b/python/pyspark/sql/tests/test_dataframe.py
index 845cf0f1fbe..0ba0649245c 100644
--- a/python/pyspark/sql/tests/test_dataframe.py
+++ b/python/pyspark/sql/tests/test_dataframe.py
@@ -46,6 +46,7 @@ from pyspark.sql.types import (
 from pyspark.errors import (
 AnalysisException,
 IllegalArgumentException,
+SparkConnectException,
 SparkConnectAnalysisException,
 )
 from pyspark.testing.sqlutils import (
@@ -888,7 +889,10 @@ class DataFrameTestsMixin:
 
 self.assertRaises(TypeError, lambda: self.spark.range(1).sample(seed="abc"))
 
-self.assertRaises(IllegalArgumentException, lambda: self.spark.range(1).sample(-1.0))
+self.assertRaises(
+(IllegalArgumentException, SparkConnectException),
+lambda: self.spark.range(1).sample(-1.0).count(),
+)
 
 def test_toDF_with_schema_string(self):
 data = [Row(key=i, value=str(i)) for i in range(100)]





[spark] branch master updated: [SPARK-42226][BUILD] Upgrade `versions-maven-plugin` to 2.14.2

2023-01-28 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 5e9566ad0d5 [SPARK-42226][BUILD] Upgrade `versions-maven-plugin` to 2.14.2
5e9566ad0d5 is described below

commit 5e9566ad0d51d50f1e0f8b1f03a6a9c0d218e97b
Author: yangjie01 
AuthorDate: Sat Jan 28 16:53:52 2023 -0800

[SPARK-42226][BUILD] Upgrade `versions-maven-plugin` to 2.14.2

### What changes were proposed in this pull request?
This PR aims to upgrade `versions-maven-plugin` to 2.14.2.

### Why are the changes needed?
The new version brings some improvements, such as [Add a simple cache for ComparableVersions](https://github.com/mojohaus/versions/pull/870).
The full release notes are as follows:
- https://github.com/mojohaus/versions/releases/tag/2.14.2

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?

- The GA `Dependencies test` job should work normally
- Manually checked `./dev/test-dependencies.sh --replace-manifest`; it ran successfully

Closes #39784 from LuciferYang/SPARK-42226.

Authored-by: yangjie01 
Signed-off-by: Dongjoon Hyun 
---
 pom.xml | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/pom.xml b/pom.xml
index 1d9eea0e21c..17423a0a176 100644
--- a/pom.xml
+++ b/pom.xml
@@ -177,7 +177,7 @@
   See: SPARK-36547, SPARK-38394.
-->
 4.8.0
-2.14.1
+2.14.2
 
 true
 true





[spark] branch master updated: [SPARK-42224][CONNECT] Migrate `TypeError` into error framework for Spark Connect functions

2023-01-28 Thread gurwls223
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new dbdb06ccbcd [SPARK-42224][CONNECT] Migrate `TypeError` into error framework for Spark Connect functions
dbdb06ccbcd is described below

commit dbdb06ccbcd00d6e3254151e1e27f55cfc796aea
Author: itholic 
AuthorDate: Sun Jan 29 10:03:06 2023 +0900

[SPARK-42224][CONNECT] Migrate `TypeError` into error framework for Spark 
Connect functions

### What changes were proposed in this pull request?

This PR proposes to migrate `TypeError` into the error framework for Spark Connect functions.

### Why are the changes needed?

To improve errors by leveraging the PySpark error framework for Spark 
Connect functions.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Fixed & added UTs.
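
The migration pattern, as a hedged sketch (the real error classes and call sites are in the diff below; `require_dataframe` is a hypothetical helper for illustration, while `PySparkTypeError` is the public class from `pyspark.errors`):

```python
from pyspark.errors import PySparkTypeError
from pyspark.sql import DataFrame

def require_dataframe(df, arg_name: str = "df") -> DataFrame:
    # Before: raise TypeError(f"'{arg_name}' must be a DataFrame, ...").
    # After: raise a class-based error with structured message parameters.
    if not isinstance(df, DataFrame):
        raise PySparkTypeError(
            error_class="NOT_A_DATAFRAME",
            message_parameters={"arg_name": arg_name, "arg_type": type(df).__name__},
        )
    return df
```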

Closes #39782 from itholic/SPARK-42224.

Authored-by: itholic 
Signed-off-by: Hyukjin Kwon 
---
 python/pyspark/errors/error_classes.py | 27 ---
 python/pyspark/sql/connect/functions.py| 49 ++--
 .../sql/tests/connect/test_connect_function.py | 54 ++
 python/pyspark/sql/tests/test_functions.py | 26 +--
 4 files changed, 123 insertions(+), 33 deletions(-)

diff --git a/python/pyspark/errors/error_classes.py 
b/python/pyspark/errors/error_classes.py
index 77f74097880..dd86f06505b 100644
--- a/python/pyspark/errors/error_classes.py
+++ b/python/pyspark/errors/error_classes.py
@@ -21,7 +21,7 @@ ERROR_CLASSES_JSON = """
 {
   "COLUMN_IN_LIST": {
 "message": [
-  " does not allow a column in a list."
+  " does not allow a Column in a list."
 ]
   },
   "HIGHER_ORDER_FUNCTION_SHOULD_RETURN_COLUMN" : {
@@ -31,27 +31,42 @@ ERROR_CLASSES_JSON = """
   },
   "NOT_A_COLUMN" : {
 "message" : [
-  "Argument `` should be a column, got ."
+  "Argument `` should be a Column, got ."
+]
+  },
+  "NOT_A_DATAFRAME" : {
+"message" : [
+  "Argument `` must be a DataFrame, got ."
 ]
   },
   "NOT_A_STRING" : {
 "message" : [
-  "Argument `` should be a string, got ."
+  "Argument `` should be a str, got ."
+]
+  },
+  "NOT_COLUMN_OR_DATATYPE_OR_STRING" : {
+"message" : [
+  "Argument `` should be a Column or str or DataType, but got 
."
 ]
   },
   "NOT_COLUMN_OR_INTEGER" : {
 "message" : [
-  "Argument `` should be a column or integer, got ."
+  "Argument `` should be a Column or int, got ."
 ]
   },
   "NOT_COLUMN_OR_INTEGER_OR_STRING" : {
 "message" : [
-  "Argument `` should be a column or integer or string, got 
."
+  "Argument `` should be a Column, int or str, got ."
 ]
   },
   "NOT_COLUMN_OR_STRING" : {
 "message" : [
-  "Argument `` should be a column or string, got ."
+  "Argument `` should be a Column or str, got ."
+]
+  },
+  "UNSUPPORTED_NUMPY_ARRAY_SCALAR" : {
+"message" : [
+  "The type of array scalar '' is not supported."
 ]
   },
   "UNSUPPORTED_PARAM_TYPE_FOR_HIGHER_ORDER_FUNCTION" : {
diff --git a/python/pyspark/sql/connect/functions.py 
b/python/pyspark/sql/connect/functions.py
index 7c21f9280c2..3b3036f3be8 100644
--- a/python/pyspark/sql/connect/functions.py
+++ b/python/pyspark/sql/connect/functions.py
@@ -223,7 +223,10 @@ def lit(col: Any) -> Column:
 return array(*[lit(c) for c in col])
 elif isinstance(col, np.ndarray) and col.ndim == 1:
 if _from_numpy_type(col.dtype) is None:
-raise TypeError("The type of array scalar '%s' is not supported" % 
(col.dtype))
+raise PySparkTypeError(
+error_class="UNSUPPORTED_NUMPY_ARRAY_SCALAR",
+message_parameters={"dtype": col.dtype.name},
+)
 
 # NumpyArrayConverter for Py4J can not support ndarray with int8 
values.
 # Actually this is not a problem for Connect, but here still convert it
@@ -258,7 +261,10 @@ def broadcast(df: "DataFrame") -> "DataFrame":
 from pyspark.sql.connect.dataframe import DataFrame
 
 if not isinstance(df, DataFrame):
-raise TypeError(f"'df' must be a DataFrame, but got 
{type(df).__name__} {df}")
+raise PySparkTypeError(
+error_class="NOT_A_DATAFRAME",
+message_parameters={"arg_name": "df", "arg_type": 
type(df).__name__},
+)
 return df.hint("broadcast")
 
 
@@ -1376,8 +1382,9 @@ def from_json(
 elif isinstance(schema, str):
 _schema = lit(schema)
 else:
-raise TypeError(
-f"schema should be a Column or str or DataType, but got 
{type(schema).__name__}"
+raise PySparkTypeError(
+error_class="NOT_COLUMN_OR_DATATYPE_OR_STRING",
+message_parameters={"arg_na

[spark] branch branch-3.4 updated: [SPARK-42224][CONNECT] Migrate `TypeError` into error framework for Spark Connect functions

2023-01-28 Thread gurwls223
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch branch-3.4
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.4 by this push:
 new 1923dfb8a71 [SPARK-42224][CONNECT] Migrate `TypeError` into error framework for Spark Connect functions
1923dfb8a71 is described below

commit 1923dfb8a71aab50c246505481c54c58a4a57591
Author: itholic 
AuthorDate: Sun Jan 29 10:03:06 2023 +0900

[SPARK-42224][CONNECT] Migrate `TypeError` into error framework for Spark 
Connect functions

### What changes were proposed in this pull request?

This PR proposes to migrate `TypeError` into the error framework for Spark Connect functions.

### Why are the changes needed?

To improve errors by leveraging the PySpark error framework for Spark 
Connect functions.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Fixed & added UTs.

Closes #39782 from itholic/SPARK-42224.

Authored-by: itholic 
Signed-off-by: Hyukjin Kwon 
(cherry picked from commit dbdb06ccbcd00d6e3254151e1e27f55cfc796aea)
Signed-off-by: Hyukjin Kwon 
---
 python/pyspark/errors/error_classes.py | 27 ---
 python/pyspark/sql/connect/functions.py| 49 ++--
 .../sql/tests/connect/test_connect_function.py | 54 ++
 python/pyspark/sql/tests/test_functions.py | 26 +--
 4 files changed, 123 insertions(+), 33 deletions(-)

diff --git a/python/pyspark/errors/error_classes.py 
b/python/pyspark/errors/error_classes.py
index 77f74097880..dd86f06505b 100644
--- a/python/pyspark/errors/error_classes.py
+++ b/python/pyspark/errors/error_classes.py
@@ -21,7 +21,7 @@ ERROR_CLASSES_JSON = """
 {
   "COLUMN_IN_LIST": {
 "message": [
-  " does not allow a column in a list."
+  " does not allow a Column in a list."
 ]
   },
   "HIGHER_ORDER_FUNCTION_SHOULD_RETURN_COLUMN" : {
@@ -31,27 +31,42 @@ ERROR_CLASSES_JSON = """
   },
   "NOT_A_COLUMN" : {
 "message" : [
-  "Argument `` should be a column, got ."
+  "Argument `` should be a Column, got ."
+]
+  },
+  "NOT_A_DATAFRAME" : {
+"message" : [
+  "Argument `` must be a DataFrame, got ."
 ]
   },
   "NOT_A_STRING" : {
 "message" : [
-  "Argument `` should be a string, got ."
+  "Argument `` should be a str, got ."
+]
+  },
+  "NOT_COLUMN_OR_DATATYPE_OR_STRING" : {
+"message" : [
+  "Argument `` should be a Column or str or DataType, but got 
."
 ]
   },
   "NOT_COLUMN_OR_INTEGER" : {
 "message" : [
-  "Argument `` should be a column or integer, got ."
+  "Argument `` should be a Column or int, got ."
 ]
   },
   "NOT_COLUMN_OR_INTEGER_OR_STRING" : {
 "message" : [
-  "Argument `` should be a column or integer or string, got 
."
+  "Argument `` should be a Column, int or str, got ."
 ]
   },
   "NOT_COLUMN_OR_STRING" : {
 "message" : [
-  "Argument `` should be a column or string, got ."
+  "Argument `` should be a Column or str, got ."
+]
+  },
+  "UNSUPPORTED_NUMPY_ARRAY_SCALAR" : {
+"message" : [
+  "The type of array scalar '' is not supported."
 ]
   },
   "UNSUPPORTED_PARAM_TYPE_FOR_HIGHER_ORDER_FUNCTION" : {
diff --git a/python/pyspark/sql/connect/functions.py 
b/python/pyspark/sql/connect/functions.py
index 7c21f9280c2..3b3036f3be8 100644
--- a/python/pyspark/sql/connect/functions.py
+++ b/python/pyspark/sql/connect/functions.py
@@ -223,7 +223,10 @@ def lit(col: Any) -> Column:
 return array(*[lit(c) for c in col])
 elif isinstance(col, np.ndarray) and col.ndim == 1:
 if _from_numpy_type(col.dtype) is None:
-raise TypeError("The type of array scalar '%s' is not supported" % 
(col.dtype))
+raise PySparkTypeError(
+error_class="UNSUPPORTED_NUMPY_ARRAY_SCALAR",
+message_parameters={"dtype": col.dtype.name},
+)
 
 # NumpyArrayConverter for Py4J can not support ndarray with int8 
values.
 # Actually this is not a problem for Connect, but here still convert it
@@ -258,7 +261,10 @@ def broadcast(df: "DataFrame") -> "DataFrame":
 from pyspark.sql.connect.dataframe import DataFrame
 
 if not isinstance(df, DataFrame):
-raise TypeError(f"'df' must be a DataFrame, but got 
{type(df).__name__} {df}")
+raise PySparkTypeError(
+error_class="NOT_A_DATAFRAME",
+message_parameters={"arg_name": "df", "arg_type": 
type(df).__name__},
+)
 return df.hint("broadcast")
 
 
@@ -1376,8 +1382,9 @@ def from_json(
 elif isinstance(schema, str):
 _schema = lit(schema)
 else:
-raise TypeError(
-f"schema should be a Column or str or DataType, but got 
{type(schema).__name__}"
+raise PySpa

[spark] branch master updated: [SPARK-42194][PS] Allow `columns` parameter when creating DataFrame with Series

2023-01-28 Thread gurwls223
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 086c8d9d6ce [SPARK-42194][PS] Allow `columns` parameter when creating DataFrame with Series
086c8d9d6ce is described below

commit 086c8d9d6ce91974e97ab47aab1cf54974e12bbf
Author: itholic 
AuthorDate: Sun Jan 29 11:34:19 2023 +0900

[SPARK-42194][PS] Allow `columns` parameter when creating DataFrame with 
Series

### What changes were proposed in this pull request?

This PR proposes to allow the `columns` parameter when creating a `ps.DataFrame` from a `ps.Series`, under a limited condition.

### Why are the changes needed?

In pandas, new columns consisting of missing values are attached when `columns` contains two or more column names, including the valid one:

```python
>>> pser  # pandas Series
0.427027    1
0.904592    2
0.599768    3
Name: x, dtype: int64

>>> pd.DataFrame(pser, columns=["x", "y", "z"])
          x    y    z
0.427027  1  NaN  NaN
0.904592  2  NaN  NaN
0.599768  3  NaN  NaN
```

But this method is potentially pretty expensive in pandas API on Spark, so 
I guess that's why we currently don't support it.

However, I've seen examples of using the following:

```python
>>> ps.DataFrame(pser, columns=["x"])
  x
0.427027  1
0.904592  2
0.599768  3
```

As shown in the example above, this just works the same as 
`pd.DataFrame(pser)` (without `columns`).

But it fails with `ps.Series` as below:

```python
>>> ps.DataFrame(psser, columns=["x"])  # `psser` is pandas-on-Spark Series
Traceback (most recent call last):
  File "", line 1, in 
  File ".../spark/python/pyspark/pandas/frame.py", line 539, in __init__
assert columns is None
AssertionError
```

In this case, the user might simply want to state the column names explicitly in their code, so I believe we can allow this rather than raising an `AssertionError`.

### Does this PR introduce _any_ user-facing change?

**Before**
```python
>>> ps.DataFrame(psser, columns=["x"])  # `psser` is pandas-on-Spark Series
Traceback (most recent call last):
  File "", line 1, in 
  File ".../spark/python/pyspark/pandas/frame.py", line 539, in __init__
assert columns is None
AssertionError
```

**After**
```python
>>> ps.DataFrame(psser, columns=["x"])  # `psser` is pandas-on-Spark Series
  x
0.427027  1
0.904592  2
0.599768  3
```

### How was this patch tested?

Added UTs.
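
A quick usage sketch of the newly accepted forms, mirroring the tests added below (nothing here is new API beyond the relaxed assertion):

```python
import pandas as pd
import pyspark.pandas as ps

pser = pd.Series([1, 2, 3], name="x")
psser = ps.from_pandas(pser)

# All accepted now, as long as the single entry matches the Series' column name;
# previously each of these hit `assert columns is None` and failed.
ps.DataFrame(psser, columns=["x"])
ps.DataFrame(psser, columns=("x",))
ps.DataFrame(psser, columns={"x": None})
```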

Closes #39786 from itholic/SPARK-42194.

Authored-by: itholic 
Signed-off-by: Hyukjin Kwon 
---
 python/pyspark/pandas/frame.py| 7 ++-
 python/pyspark/pandas/tests/test_dataframe.py | 7 +++
 2 files changed, 13 insertions(+), 1 deletion(-)

diff --git a/python/pyspark/pandas/frame.py b/python/pyspark/pandas/frame.py
index a217066eff6..4a6c2119104 100644
--- a/python/pyspark/pandas/frame.py
+++ b/python/pyspark/pandas/frame.py
@@ -536,9 +536,14 @@ class DataFrame(Frame, Generic[T]):
 if index is None:
 internal = data._internal
 elif isinstance(data, ps.Series):
-assert columns is None
 assert dtype is None
 assert not copy
+# For pandas compatibility when `columns` contains only one valid column.
+if columns is not None:
+assert isinstance(columns, (dict, list, tuple))
+assert len(columns) == 1
+columns = list(columns.keys()) if isinstance(columns, dict) else columns
+assert columns[0] == data._internal.data_spark_column_names[0]
 if index is None:
 internal = data.to_frame()._internal
 else:
diff --git a/python/pyspark/pandas/tests/test_dataframe.py 
b/python/pyspark/pandas/tests/test_dataframe.py
index 1b06d321e13..d33c6584f7f 100644
--- a/python/pyspark/pandas/tests/test_dataframe.py
+++ b/python/pyspark/pandas/tests/test_dataframe.py
@@ -90,6 +90,13 @@ class DataFrameTest(ComparisonTestBase, SQLTestUtils):
 psser = ps.from_pandas(pser)
 self.assert_eq(pd.DataFrame(pser), ps.DataFrame(psser))
 
+# check ps.DataFrame(ps.Series) with `columns`
+self.assert_eq(pd.DataFrame(pser, columns=["x"]), ps.DataFrame(psser, columns=["x"]))
+self.assert_eq(pd.DataFrame(pser, columns=("x",)), ps.DataFrame(psser, columns=("x",)))
+self.assert_eq(
+pd.DataFrame(pser, columns={"x": None}), ps.DataFrame(psser, columns={"x": None})
+)
+
 # check psdf[pd.Index]
 pdf, psdf = self.df_pair
 column_mask = pdf.columns.isin(["a", "b"])

[spark] branch branch-3.4 updated: [SPARK-42194][PS] Allow `columns` parameter when creating DataFrame with Series

2023-01-28 Thread gurwls223
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch branch-3.4
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.4 by this push:
 new dbedb81ed39 [SPARK-42194][PS] Allow `columns` parameter when creating DataFrame with Series
dbedb81ed39 is described below

commit dbedb81ed39ca5561a1907260b84fa8dd96ea825
Author: itholic 
AuthorDate: Sun Jan 29 11:34:19 2023 +0900

[SPARK-42194][PS] Allow `columns` parameter when creating DataFrame with 
Series

### What changes were proposed in this pull request?

This PR proposes to allow the `columns` parameter when creating a `ps.DataFrame` from a `ps.Series`, under a limited condition.

### Why are the changes needed?

In pandas, new columns consisting of missing values are attached when `columns` contains two or more column names, including the valid one:

```python
>>> pser  # pandas Series
0.427027    1
0.904592    2
0.599768    3
Name: x, dtype: int64

>>> pd.DataFrame(pser, columns=["x", "y", "z"])
          x    y    z
0.427027  1  NaN  NaN
0.904592  2  NaN  NaN
0.599768  3  NaN  NaN
```

But this method is potentially pretty expensive in pandas API on Spark, so 
I guess that's why we currently don't support it.

However, I've seen examples of using the following:

```python
>>> ps.DataFrame(pser, columns=["x"])
  x
0.427027  1
0.904592  2
0.599768  3
```

As shown in the example above, this just works the same as 
`pd.DataFrame(pser)` (without `columns`).

But it fails with `ps.Series` as below:

```python
>>> ps.DataFrame(psser, columns=["x"])  # `psser` is pandas-on-Spark Series
Traceback (most recent call last):
  File "", line 1, in 
  File ".../spark/python/pyspark/pandas/frame.py", line 539, in __init__
assert columns is None
AssertionError
```

In this case, the user might simply want to state the column names explicitly in their code, so I believe we can allow this rather than raising an `AssertionError`.

### Does this PR introduce _any_ user-facing change?

**Before**
```python
>>> ps.DataFrame(psser, columns=["x"])  # `psser` is pandas-on-Spark Series
Traceback (most recent call last):
  File "", line 1, in 
  File ".../spark/python/pyspark/pandas/frame.py", line 539, in __init__
assert columns is None
AssertionError
```

**After**
```python
>>> ps.DataFrame(psser, columns=["x"])  # `psser` is pandas-on-Spark Series
  x
0.427027  1
0.904592  2
0.599768  3
```

### How was this patch tested?

Added UTs.

Closes #39786 from itholic/SPARK-42194.

Authored-by: itholic 
Signed-off-by: Hyukjin Kwon 
(cherry picked from commit 086c8d9d6ce91974e97ab47aab1cf54974e12bbf)
Signed-off-by: Hyukjin Kwon 
---
 python/pyspark/pandas/frame.py| 7 ++-
 python/pyspark/pandas/tests/test_dataframe.py | 7 +++
 2 files changed, 13 insertions(+), 1 deletion(-)

diff --git a/python/pyspark/pandas/frame.py b/python/pyspark/pandas/frame.py
index a217066eff6..4a6c2119104 100644
--- a/python/pyspark/pandas/frame.py
+++ b/python/pyspark/pandas/frame.py
@@ -536,9 +536,14 @@ class DataFrame(Frame, Generic[T]):
 if index is None:
 internal = data._internal
 elif isinstance(data, ps.Series):
-assert columns is None
 assert dtype is None
 assert not copy
+# For pandas compatibility when `columns` contains only one valid 
column.
+if columns is not None:
+assert isinstance(columns, (dict, list, tuple))
+assert len(columns) == 1
+columns = list(columns.keys()) if isinstance(columns, dict) 
else columns
+assert columns[0] == data._internal.data_spark_column_names[0]
 if index is None:
 internal = data.to_frame()._internal
 else:
diff --git a/python/pyspark/pandas/tests/test_dataframe.py 
b/python/pyspark/pandas/tests/test_dataframe.py
index 1b06d321e13..d33c6584f7f 100644
--- a/python/pyspark/pandas/tests/test_dataframe.py
+++ b/python/pyspark/pandas/tests/test_dataframe.py
@@ -90,6 +90,13 @@ class DataFrameTest(ComparisonTestBase, SQLTestUtils):
 psser = ps.from_pandas(pser)
 self.assert_eq(pd.DataFrame(pser), ps.DataFrame(psser))
 
+# check ps.DataFrame(ps.Series) with `columns`
+self.assert_eq(pd.DataFrame(pser, columns=["x"]), ps.DataFrame(psser, 
columns=["x"]))
+self.assert_eq(pd.DataFrame(pser, columns=("x",)), ps.DataFrame(psser, 
columns=("x",)))
+self.assert_eq(
+pd.DataFrame(pser, columns={"x": None}), ps.DataFrame(psser, 
columns={"x": None})
+)
+

[spark] branch master updated: [SPARK-41489][SQL] Assign name to _LEGACY_ERROR_TEMP_2415

2023-01-28 Thread maxgekk
This is an automated email from the ASF dual-hosted git repository.

maxgekk pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new d2fc1992058 [SPARK-41489][SQL] Assign name to _LEGACY_ERROR_TEMP_2415
d2fc1992058 is described below

commit d2fc19920588f2f6c83c31a9519702f9416190fe
Author: itholic 
AuthorDate: Sun Jan 29 08:45:14 2023 +0300

[SPARK-41489][SQL] Assign name to _LEGACY_ERROR_TEMP_2415

### What changes were proposed in this pull request?

This PR proposes to assign the name "DATATYPE_MISMATCH.FILTER_NOT_BOOLEAN" to _LEGACY_ERROR_TEMP_2415.

### Why are the changes needed?

We should assign proper names to the _LEGACY_ERROR_TEMP_* error classes.

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

`./build/sbt "sql/testOnly org.apache.spark.sql.SQLQueryTestSuite*"`
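
For illustration, a hedged PySpark snippet that trips this check (the filter condition is not a boolean); the exact message text comes from the new error class in the diff below:

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import lit
from pyspark.errors import AnalysisException

spark = SparkSession.builder.getOrCreate()
try:
    # An INT literal as the filter condition is not a boolean expression,
    # so analysis fails with DATATYPE_MISMATCH.FILTER_NOT_BOOLEAN.
    spark.range(3).where(lit(1)).collect()
except AnalysisException as e:
    print(e)
```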

Closes #39701 from itholic/LEGACY_2415.

Authored-by: itholic 
Signed-off-by: Max Gekk 
---
 core/src/main/resources/error/error-classes.json   | 10 +-
 .../apache/spark/sql/catalyst/analysis/CheckAnalysis.scala |  7 ---
 .../spark/sql/catalyst/analysis/AnalysisErrorSuite.scala   |  5 +++--
 .../apache/spark/sql/catalyst/analysis/AnalysisSuite.scala | 14 ++
 .../optimizer/ReplaceNullWithFalseInPredicateSuite.scala   | 11 +++
 5 files changed, 33 insertions(+), 14 deletions(-)

diff --git a/core/src/main/resources/error/error-classes.json 
b/core/src/main/resources/error/error-classes.json
index ae766de3e20..936f996f3a4 100644
--- a/core/src/main/resources/error/error-classes.json
+++ b/core/src/main/resources/error/error-classes.json
@@ -265,6 +265,11 @@
   "Input to  should all be the same type, but it's 
."
 ]
   },
+  "FILTER_NOT_BOOLEAN" : {
+"message" : [
+  "Filter expression  of type  is not a boolean."
+]
+  },
   "HASH_MAP_TYPE" : {
 "message" : [
   "Input to the function  cannot contain elements of the 
\"MAP\" type. In Spark, same maps may have different hashcode, thus hash 
expressions are prohibited on \"MAP\" elements. To restore previous behavior 
set \"spark.sql.legacy.allowHashOnMapType\" to \"true\"."
@@ -5175,11 +5180,6 @@
   "Event time must be defined on a window or a timestamp, but  is 
of type ."
 ]
   },
-  "_LEGACY_ERROR_TEMP_2415" : {
-"message" : [
-  "filter expression '' of type  is not a boolean."
-]
-  },
   "_LEGACY_ERROR_TEMP_2416" : {
 "message" : [
   "join condition '' of type  is not a boolean."
diff --git 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala
 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala
index d5ef71adc4f..276bf714a34 100644
--- 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala
+++ 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala
@@ -355,10 +355,11 @@ trait CheckAnalysis extends PredicateHelper with 
LookupCatalog with QueryErrorsB
 }
   case f: Filter if f.condition.dataType != BooleanType =>
 f.failAnalysis(
-  errorClass = "_LEGACY_ERROR_TEMP_2415",
+  errorClass = "DATATYPE_MISMATCH.FILTER_NOT_BOOLEAN",
   messageParameters = Map(
-"filter" -> f.condition.sql,
-"type" -> f.condition.dataType.catalogString))
+"sqlExpr" -> f.expressions.map(toSQLExpr).mkString(","),
+"filter" -> toSQLExpr(f.condition),
+"type" -> toSQLType(f.condition.dataType)))
 
   case j @ Join(_, _, _, Some(condition), _) if condition.dataType != 
BooleanType =>
 j.failAnalysis(
diff --git 
a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/analysis/AnalysisErrorSuite.scala
 
b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/analysis/AnalysisErrorSuite.scala
index faa8c1f4558..56bb8b0ccc2 100644
--- 
a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/analysis/AnalysisErrorSuite.scala
+++ 
b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/analysis/AnalysisErrorSuite.scala
@@ -349,10 +349,11 @@ class AnalysisErrorSuite extends AnalysisTest {
 "UNRESOLVED_COLUMN.WITH_SUGGESTION",
 Map("objectName" -> "`b`", "proposal" -> "`a`, `c`, `a3`"))
 
-  errorTest(
+  errorClassTest(
 "non-boolean filters",
 testRelation.where(Literal(1)),
-"filter" :: "'1'" :: "not a boolean" :: Literal(1).dataType.simpleString 
:: Nil)
+errorClass = "DATATYPE_MISMATCH.FILTER_NOT_BOOLEAN",
+messageParameters = Map("sqlExpr" -> "\"1\"", "filter" -> "\"1\"", "type" 
-> "\"INT\""))
 
   errorTest(
 "non-boolean join conditions",
diff --git 
a/sql/catalyst/src/test/scala/org/apache/spark/sq

[spark] branch master updated: [SPARK-42225][CONNECT] Add `SparkConnectIllegalArgumentException` to handle Spark Connect error precisely

2023-01-28 Thread gurwls223
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 2842d8aa25e [SPARK-42225][CONNECT] Add `SparkConnectIllegalArgumentException` to handle Spark Connect error precisely
2842d8aa25e is described below

commit 2842d8aa25ee16918421850f70fa74bc815ae6e8
Author: itholic 
AuthorDate: Sun Jan 29 16:18:12 2023 +0900

[SPARK-42225][CONNECT] Add `SparkConnectIllegalArgumentException` to handle 
Spark Connect error precisely

### What changes were proposed in this pull request?

This PR proposes to add `SparkConnectIllegalArgumentException`.

### Why are the changes needed?

To handle Spark Connect errors precisely by catching `IllegalArgumentException` before unexpectedly raising the generic `SparkConnectGrpcException`.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Manually tested.
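
A small hedged usage sketch of what this enables on the client side (assumes a Spark Connect session `spark`; the exception class is the one added in the diff below):

```python
from pyspark.errors import (
    SparkConnectGrpcException,
    SparkConnectIllegalArgumentException,
)

def run_with_handling(spark, sql_text: str):
    # A server-side java.lang.IllegalArgumentException now maps to the more
    # specific exception instead of only the generic gRPC one.
    try:
        return spark.sql(sql_text).collect()
    except SparkConnectIllegalArgumentException as e:
        print(f"illegal argument reported by the server: {e}")
    except SparkConnectGrpcException as e:
        print(f"other Spark Connect error: {e}")
```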

Closes #39783 from itholic/SPARK-42225.

Authored-by: itholic 
Signed-off-by: Hyukjin Kwon 
---
 python/docs/source/reference/pyspark.errors.rst | 1 +
 python/pyspark/errors/__init__.py   | 2 ++
 python/pyspark/errors/exceptions.py | 6 ++
 python/pyspark/sql/connect/client.py| 5 +
 4 files changed, 14 insertions(+)

diff --git a/python/docs/source/reference/pyspark.errors.rst 
b/python/docs/source/reference/pyspark.errors.rst
index a5e1d78fa62..1d54c6babe0 100644
--- a/python/docs/source/reference/pyspark.errors.rst
+++ b/python/docs/source/reference/pyspark.errors.rst
@@ -42,6 +42,7 @@ Classes
 SparkConnectGrpcException
 SparkConnectParseException
 SparkConnectTempTableAlreadyExistsException
+SparkConnectIllegalArgumentException
 
 
 Methods
diff --git a/python/pyspark/errors/__init__.py 
b/python/pyspark/errors/__init__.py
index e747ed30247..7faa0768a24 100644
--- a/python/pyspark/errors/__init__.py
+++ b/python/pyspark/errors/__init__.py
@@ -35,6 +35,7 @@ from pyspark.errors.exceptions import (  # noqa: F401
 SparkConnectAnalysisException,
 SparkConnectParseException,
 SparkConnectTempTableAlreadyExistsException,
+SparkConnectIllegalArgumentException,
 )
 
 
@@ -55,4 +56,5 @@ __all__ = [
 "SparkConnectAnalysisException",
 "SparkConnectParseException",
 "SparkConnectTempTableAlreadyExistsException",
+"SparkConnectIllegalArgumentException",
 ]
diff --git a/python/pyspark/errors/exceptions.py 
b/python/pyspark/errors/exceptions.py
index 723cd9540d6..a799f4522de 100644
--- a/python/pyspark/errors/exceptions.py
+++ b/python/pyspark/errors/exceptions.py
@@ -371,3 +371,9 @@ class PySparkTypeError(PySparkException, TypeError):
 """
 Wrapper class for TypeError to support error classes.
 """
+
+
+class SparkConnectIllegalArgumentException(SparkConnectGrpcException):
+"""
+Passed an illegal or inappropriate argument from Spark Connect server.
+"""
diff --git a/python/pyspark/sql/connect/client.py 
b/python/pyspark/sql/connect/client.py
index 7409a2a8231..efc970d6a4c 100644
--- a/python/pyspark/sql/connect/client.py
+++ b/python/pyspark/sql/connect/client.py
@@ -42,6 +42,7 @@ from pyspark.errors import (
 SparkConnectAnalysisException,
 SparkConnectParseException,
 SparkConnectTempTableAlreadyExistsException,
+SparkConnectIllegalArgumentException,
 )
 from pyspark.sql.types import (
 DataType,
@@ -626,6 +627,10 @@ class SparkConnectClient(object):
 raise SparkConnectTempTableAlreadyExistsException(
 info.metadata["message"], 
plan=info.metadata["plan"]
 ) from None
+elif reason == "java.lang.IllegalArgumentException":
+message = info.metadata["message"]
+message = message if message != "" else status.message
+raise SparkConnectIllegalArgumentException(message) 
from None
 else:
 raise SparkConnectGrpcException(
 status.message, reason=info.reason





[spark] branch branch-3.4 updated: [SPARK-42225][CONNECT] Add `SparkConnectIllegalArgumentException` to handle Spark Connect error precisely

2023-01-28 Thread gurwls223
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch branch-3.4
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.4 by this push:
 new 3bfb8116161 [SPARK-42225][CONNECT] Add `SparkConnectIllegalArgumentException` to handle Spark Connect error precisely
3bfb8116161 is described below

commit 3bfb8116161998dea762350380d2e3a2b67b74da
Author: itholic 
AuthorDate: Sun Jan 29 16:18:12 2023 +0900

[SPARK-42225][CONNECT] Add `SparkConnectIllegalArgumentException` to handle 
Spark Connect error precisely

This PR proposes to add `SparkConnectIllegalArgumentException`.

To handle Spark Connect errors precisely by catching `IllegalArgumentException` before unexpectedly raising the generic `SparkConnectGrpcException`.

No.

Manually tested.

Closes #39783 from itholic/SPARK-42225.

Authored-by: itholic 
Signed-off-by: Hyukjin Kwon 
(cherry picked from commit 2842d8aa25ee16918421850f70fa74bc815ae6e8)
Signed-off-by: Hyukjin Kwon 
---
 python/docs/source/reference/pyspark.errors.rst | 1 +
 python/pyspark/errors/__init__.py   | 2 ++
 python/pyspark/errors/exceptions.py | 6 ++
 python/pyspark/sql/connect/client.py| 5 +
 4 files changed, 14 insertions(+)

diff --git a/python/docs/source/reference/pyspark.errors.rst 
b/python/docs/source/reference/pyspark.errors.rst
index a5e1d78fa62..1d54c6babe0 100644
--- a/python/docs/source/reference/pyspark.errors.rst
+++ b/python/docs/source/reference/pyspark.errors.rst
@@ -42,6 +42,7 @@ Classes
 SparkConnectGrpcException
 SparkConnectParseException
 SparkConnectTempTableAlreadyExistsException
+SparkConnectIllegalArgumentException
 
 
 Methods
diff --git a/python/pyspark/errors/__init__.py 
b/python/pyspark/errors/__init__.py
index e747ed30247..7faa0768a24 100644
--- a/python/pyspark/errors/__init__.py
+++ b/python/pyspark/errors/__init__.py
@@ -35,6 +35,7 @@ from pyspark.errors.exceptions import (  # noqa: F401
 SparkConnectAnalysisException,
 SparkConnectParseException,
 SparkConnectTempTableAlreadyExistsException,
+SparkConnectIllegalArgumentException,
 )
 
 
@@ -55,4 +56,5 @@ __all__ = [
 "SparkConnectAnalysisException",
 "SparkConnectParseException",
 "SparkConnectTempTableAlreadyExistsException",
+"SparkConnectIllegalArgumentException",
 ]
diff --git a/python/pyspark/errors/exceptions.py 
b/python/pyspark/errors/exceptions.py
index 723cd9540d6..a799f4522de 100644
--- a/python/pyspark/errors/exceptions.py
+++ b/python/pyspark/errors/exceptions.py
@@ -371,3 +371,9 @@ class PySparkTypeError(PySparkException, TypeError):
 """
 Wrapper class for TypeError to support error classes.
 """
+
+
+class SparkConnectIllegalArgumentException(SparkConnectGrpcException):
+"""
+Passed an illegal or inappropriate argument from Spark Connect server.
+"""
diff --git a/python/pyspark/sql/connect/client.py 
b/python/pyspark/sql/connect/client.py
index 7409a2a8231..efc970d6a4c 100644
--- a/python/pyspark/sql/connect/client.py
+++ b/python/pyspark/sql/connect/client.py
@@ -42,6 +42,7 @@ from pyspark.errors import (
 SparkConnectAnalysisException,
 SparkConnectParseException,
 SparkConnectTempTableAlreadyExistsException,
+SparkConnectIllegalArgumentException,
 )
 from pyspark.sql.types import (
 DataType,
@@ -626,6 +627,10 @@ class SparkConnectClient(object):
 raise SparkConnectTempTableAlreadyExistsException(
 info.metadata["message"], 
plan=info.metadata["plan"]
 ) from None
+elif reason == "java.lang.IllegalArgumentException":
+message = info.metadata["message"]
+message = message if message != "" else status.message
+raise SparkConnectIllegalArgumentException(message) 
from None
 else:
 raise SparkConnectGrpcException(
 status.message, reason=info.reason





[spark] branch master updated: [SPARK-42224][FOLLOWUP] Raise `PySparkTypeError` instead of `TypeError`

2023-01-28 Thread gurwls223
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 2440b6b7317 [SPARK-42224][FOLLOWUP] Raise `PySparkTypeError` instead of `TypeError`
2440b6b7317 is described below

commit 2440b6b731758308d0db88c10edb5b9a71d81c92
Author: itholic 
AuthorDate: Sun Jan 29 16:19:26 2023 +0900

[SPARK-42224][FOLLOWUP] Raise `PySparkTypeError` instead of `TypeError`

### What changes were proposed in this pull request?

This is a follow-up to https://github.com/apache/spark/pull/39782.

### Why are the changes needed?

An incorrect fix was found in the original PR.

The test should assert `PySparkTypeError` rather than the plain `TypeError`.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Updated UT and manually tested.
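
A hedged, self-contained sketch of the behavior the updated test asserts (per the diff below, an `int` error message for `assert_true` now raises `PySparkTypeError` rather than a plain `TypeError`):

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import assert_true
from pyspark.errors import PySparkTypeError

spark = SparkSession.builder.getOrCreate()
df = spark.range(3)
try:
    # errMsg must be a Column or str; passing an int now raises the
    # class-based PySparkTypeError, which the updated test asserts.
    df.select(assert_true(df.id < 2, 5))
except PySparkTypeError as e:
    print(e)
```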

Closes #39787 from itholic/SPARK-42225-followup.

Authored-by: itholic 
Signed-off-by: Hyukjin Kwon 
---
 python/pyspark/sql/tests/test_functions.py | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/python/pyspark/sql/tests/test_functions.py 
b/python/pyspark/sql/tests/test_functions.py
index 6af45124db8..4cbef4b5387 100644
--- a/python/pyspark/sql/tests/test_functions.py
+++ b/python/pyspark/sql/tests/test_functions.py
@@ -1021,7 +1021,7 @@ class FunctionsTestsMixin:
 with self.assertRaisesRegex((Py4JJavaError, SparkConnectException), 
"200"):
 df.select(assert_true(df.id < 2, df.id * 
1e6)).toDF("val").collect()
 
-with self.assertRaises(TypeError) as pe:
+with self.assertRaises(PySparkTypeError) as pe:
 df.select(assert_true(df.id < 2, 5))
 
 self.check_error(





[spark] branch branch-3.4 updated: [SPARK-42224][FOLLOWUP] Raise `PySparkTypeError` instead of `TypeError`

2023-01-28 Thread gurwls223
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch branch-3.4
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.4 by this push:
 new 5e23ae8c925 [SPARK-42224][FOLLOWUP] Raise `PySparkTypeError` instead of `TypeError`
5e23ae8c925 is described below

commit 5e23ae8c9258110935537a46f483fbd52dbc7b79
Author: itholic 
AuthorDate: Sun Jan 29 16:19:26 2023 +0900

[SPARK-42224][FOLLOWUP] Raise `PySparkTypeError` instead of `TypeError`

### What changes were proposed in this pull request?

This is a follow-up to https://github.com/apache/spark/pull/39782.

### Why are the changes needed?

An incorrect fix was found in the original PR.

The test should assert `PySparkTypeError` rather than the plain `TypeError`.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Updated UT and manually tested.

Closes #39787 from itholic/SPARK-42225-followup.

Authored-by: itholic 
Signed-off-by: Hyukjin Kwon 
(cherry picked from commit 2440b6b731758308d0db88c10edb5b9a71d81c92)
Signed-off-by: Hyukjin Kwon 
---
 python/pyspark/sql/tests/test_functions.py | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/python/pyspark/sql/tests/test_functions.py 
b/python/pyspark/sql/tests/test_functions.py
index 6af45124db8..4cbef4b5387 100644
--- a/python/pyspark/sql/tests/test_functions.py
+++ b/python/pyspark/sql/tests/test_functions.py
@@ -1021,7 +1021,7 @@ class FunctionsTestsMixin:
 with self.assertRaisesRegex((Py4JJavaError, SparkConnectException), 
"200"):
 df.select(assert_true(df.id < 2, df.id * 
1e6)).toDF("val").collect()
 
-with self.assertRaises(TypeError) as pe:
+with self.assertRaises(PySparkTypeError) as pe:
 df.select(assert_true(df.id < 2, 5))
 
 self.check_error(





[spark] branch branch-3.3 updated (289e65061c1 -> 6cd4ff68527)

2023-01-28 Thread gurwls223
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a change to branch branch-3.3
in repository https://gitbox.apache.org/repos/asf/spark.git


from 289e65061c1 [SPARK-42201][BUILD] `build/sbt` should allow `SBT_OPTS` to override JVM memory setting
 add 6cd4ff68527 [SPARK-42168][3.3][SQL][PYTHON][FOLLOW-UP] Test FlatMapCoGroupsInPandas with Window function

No new revisions were added by this update.

Summary of changes:
 .../pyspark/sql/tests/test_pandas_cogrouped_map.py | 54 -
 .../exchange/EnsureRequirementsSuite.scala | 56 ++
 2 files changed, 109 insertions(+), 1 deletion(-)
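
For readers unfamiliar with the operator named in the new test, a hedged sketch of the cogrouped `applyInPandas` API (FlatMapCoGroupsInPandas); the data and schema here are made up for illustration:

```python
import pandas as pd
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
df1 = spark.createDataFrame([(1, 1.0), (2, 2.0)], ("id", "v1"))
df2 = spark.createDataFrame([(1, "x"), (2, "y")], ("id", "v2"))

def merge(left: pd.DataFrame, right: pd.DataFrame) -> pd.DataFrame:
    # Each co-grouped pair of pandas DataFrames is merged on the grouping key.
    return pd.merge(left, right, on="id")

result = (
    df1.groupby("id")
    .cogroup(df2.groupby("id"))
    .applyInPandas(merge, schema="id long, v1 double, v2 string")
)
result.show()
```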

