This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git
The following commit(s) were added to refs/heads/master by this push:
     new ded8cdf8d945 [SPARK-47367][PYTHON][CONNECT][TESTS][FOLLOW-UP] Recover the test case for the number of partitions

ded8cdf8d945 is described below

commit ded8cdf8d9459e0e5b73c01c8ee41ae54ccd7ac5
Author: Hyukjin Kwon <gurwls...@apache.org>
AuthorDate: Tue Mar 26 07:35:49 2024 -0700

    [SPARK-47367][PYTHON][CONNECT][TESTS][FOLLOW-UP] Recover the test case for the number of partitions

    ### What changes were proposed in this pull request?

    This PR is a followup of https://github.com/apache/spark/pull/45486 that addresses the review comment at https://github.com/apache/spark/pull/45486#discussion_r1538753052, recovering the test coverage for the number of partitions in Python Data Source.

    ### Why are the changes needed?

    To restore the test coverage.

    ### Does this PR introduce _any_ user-facing change?

    No, test-only.

    ### How was this patch tested?

    Unit test fixed; CI in this PR should verify it.

    ### Was this patch authored or co-authored using generative AI tooling?

    No.

    Closes #45720 from HyukjinKwon/SPARK-47367-folliwup.
    Authored-by: Hyukjin Kwon <gurwls...@apache.org>
    Signed-off-by: Dongjoon Hyun <dh...@apple.com>
---
 python/pyspark/sql/tests/test_python_datasource.py | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/python/pyspark/sql/tests/test_python_datasource.py b/python/pyspark/sql/tests/test_python_datasource.py
index f69e1dee1285..d028a210b007 100644
--- a/python/pyspark/sql/tests/test_python_datasource.py
+++ b/python/pyspark/sql/tests/test_python_datasource.py
@@ -28,6 +28,7 @@ from pyspark.sql.datasource import (
     WriterCommitMessage,
     CaseInsensitiveDict,
 )
+from pyspark.sql.functions import spark_partition_id
 from pyspark.sql.types import Row, StructType
 from pyspark.testing.sqlutils import (
     have_pyarrow,
@@ -236,10 +237,12 @@ class BasePythonDataSourceTestsMixin:
         self.spark.dataSource.register(InMemoryDataSource)
         df = self.spark.read.format("memory").load()
+        self.assertEqual(df.select(spark_partition_id()).distinct().count(), 3)
         assertDataFrameEqual(df, [Row(x=0, y="0"), Row(x=1, y="1"), Row(x=2, y="2")])

         df = self.spark.read.format("memory").option("num_partitions", 2).load()
         assertDataFrameEqual(df, [Row(x=0, y="0"), Row(x=1, y="1")])
+        self.assertEqual(df.select(spark_partition_id()).distinct().count(), 2)

     def _get_test_json_data_source(self):
         import json

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org
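For context on what the recovered assertions check: a Python Data Source controls its read parallelism by returning one input partition per split, and the test confirms the reader honors that count via `spark_partition_id()`. Below is a minimal, pyspark-free sketch of that contract. The `InMemoryDataSource` and `load` driver here are hypothetical simplifications for illustration only, not the actual `pyspark.sql.datasource` API.

```python
# Hypothetical sketch of the partitioning contract exercised by the test:
# the source decides its partition count, the engine calls read() once per
# partition, and each row remembers which partition produced it.

class InputPartition:
    def __init__(self, index):
        self.index = index


class InMemoryDataSource:
    def __init__(self, options):
        # "num_partitions" mirrors .option("num_partitions", 2) in the test;
        # the default of 3 mirrors the unconfigured case.
        self.num_partitions = int(options.get("num_partitions", 3))

    def partitions(self):
        return [InputPartition(i) for i in range(self.num_partitions)]

    def read(self, partition):
        # One row per partition: (x, y), like Row(x=0, y="0") in the test.
        yield (partition.index, str(partition.index))


def load(source):
    """Toy driver: collect (partition_id, row) pairs, as executors would."""
    return [(p.index, row) for p in source.partitions() for row in source.read(p)]


rows = load(InMemoryDataSource({}))
print(len({pid for pid, _ in rows}))      # distinct partition ids: 3 by default

rows = load(InMemoryDataSource({"num_partitions": "2"}))
print(sorted(row for _, row in rows))     # [(0, '0'), (1, '1')]
```

The real test asserts the same two facts against a live Spark session: the default load produces three distinct `spark_partition_id()` values, and setting `num_partitions` to 2 produces two.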