This is an automated email from the ASF dual-hosted git repository.

xinrong pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
     new e578d466d4e [SPARK-44446][PYTHON] Add checks for expected list type special cases
e578d466d4e is described below

commit e578d466d4eae808a8ad5e42681b9e3e87fe6ca7
Author: Amanda Liu <amanda....@databricks.com>
AuthorDate: Mon Jul 17 11:43:05 2023 -0700

    [SPARK-44446][PYTHON] Add checks for expected list type special cases
    
    ### What changes were proposed in this pull request?
    This PR adds handling for special cases when `expected` is of type list.
    
    ### Why are the changes needed?
    The change is needed to handle all cases in which `expected` is of type list.
    
    ### Does this PR introduce _any_ user-facing change?
    Yes, the PR makes modifications to the user-facing function `assertDataFrameEqual`.
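    
    As a rough illustration of the user-facing behavior (a minimal sketch assuming an active
    SparkSession bound to `spark`; the data and calls below mirror the doctests and tests in
    this patch rather than additional API):
    
        from pyspark.sql import Row
        from pyspark.testing.utils import assertDataFrameEqual
        
        # `expected` may be a plain list of Rows instead of a DataFrame
        df = spark.createDataFrame([(1, 1000), (2, 3000)], schema=["id", "amount"])
        assertDataFrameEqual(df, [Row(1, 1000), Row(2, 3000)])
        
        # special cases handled by this patch: an empty expected list matches an
        # actual DataFrame with no rows and/or no columns
        assertDataFrameEqual(spark.range(0, 10).limit(0), [])    # no rows
        assertDataFrameEqual(spark.range(0, 10).drop("id"), [])  # no columns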
    
    ### How was this patch tested?
    Added tests to `python/pyspark/sql/tests/test_utils.py` and `python/pyspark/sql/tests/connect/test_utils.py`.
    
    Closes #42023 from asl3/fix-list-support.
    
    Authored-by: Amanda Liu <amanda....@databricks.com>
    Signed-off-by: Xinrong Meng <xinr...@apache.org>
---
 python/pyspark/sql/tests/test_utils.py | 24 ++++++++++++++++++++++++
 python/pyspark/testing/utils.py        | 15 +++++++++++++--
 2 files changed, 37 insertions(+), 2 deletions(-)

diff --git a/python/pyspark/sql/tests/test_utils.py b/python/pyspark/sql/tests/test_utils.py
index 5b859ad15a5..eae3f528504 100644
--- a/python/pyspark/sql/tests/test_utils.py
+++ b/python/pyspark/sql/tests/test_utils.py
@@ -1119,6 +1119,30 @@ class UtilsTestsMixin:
         assertDataFrameEqual(df1, df2, checkRowOrder=False)
         assertDataFrameEqual(df1, df2, checkRowOrder=True)
 
+    def test_no_column_expected_list(self):
+        df1 = self.spark.range(0, 10).drop("id")
+
+        df2 = []
+
+        assertDataFrameEqual(df1, df2, checkRowOrder=False)
+        assertDataFrameEqual(df1, df2, checkRowOrder=True)
+
+    def test_empty_expected_list(self):
+        df1 = self.spark.range(0, 10).limit(0)
+
+        df2 = []
+
+        assertDataFrameEqual(df1, df2, checkRowOrder=False)
+        assertDataFrameEqual(df1, df2, checkRowOrder=True)
+
+    def test_empty_no_column_expected_list(self):
+        df1 = self.spark.range(0, 10).drop("id").limit(0)
+
+        df2 = []
+
+        assertDataFrameEqual(df1, df2, checkRowOrder=False)
+        assertDataFrameEqual(df1, df2, checkRowOrder=True)
+
     def test_special_vals(self):
         df1 = self.spark.createDataFrame(
             data=[
diff --git a/python/pyspark/testing/utils.py b/python/pyspark/testing/utils.py
index 21c7b7e4dcd..14db9264209 100644
--- a/python/pyspark/testing/utils.py
+++ b/python/pyspark/testing/utils.py
@@ -349,6 +349,8 @@ def assertDataFrameEqual(
     For checkRowOrder, note that PySpark DataFrame ordering is non-deterministic, unless
     explicitly sorted.
 
+    Note that schema equality is checked only when `expected` is a DataFrame (not a list of Rows).
+
     For DataFrames with float values, assertDataFrame asserts approximate equality.
     Two float values a and b are approximately equal if the following equation is True:
 
@@ -362,6 +364,9 @@ def assertDataFrameEqual(
     >>> df1 = spark.createDataFrame(data=[("1", 0.1), ("2", 3.23)], schema=["id", "amount"])
     >>> df2 = spark.createDataFrame(data=[("1", 0.109), ("2", 3.23)], schema=["id", "amount"])
     >>> assertDataFrameEqual(df1, df2, rtol=1e-1)  # pass, DataFrames are approx equal by rtol
+    >>> df1 = spark.createDataFrame(data=[(1, 1000), (2, 3000)], schema=["id", "amount"])
+    >>> list_of_rows = [Row(1, 1000), Row(2, 3000)]
+    >>> assertDataFrameEqual(df1, list_of_rows)  # pass, actual and expected are equal
     >>> df1 = spark.createDataFrame(
     ...     data=[("1", 1000.00), ("2", 3000.00), ("3", 2000.00)], schema=["id", "amount"])
     >>> df2 = spark.createDataFrame(
@@ -415,8 +420,14 @@ def assertDataFrameEqual(
             )
 
     # special cases: empty datasets, datasets with 0 columns
-    if (actual.first() is None and expected.first() is None) or (
-        len(actual.columns) == 0 and len(expected.columns) == 0
+    if (
+        isinstance(expected, DataFrame)
+        and (
+            (actual.first() is None and expected.first() is None)
+            or (len(actual.columns) == 0 and len(expected.columns) == 0)
+        )
+        or isinstance(expected, list)
+        and ((actual.first() is None or len(actual.columns) == 0) and len(expected) == 0)
     ):
         return True
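
For readers parsing the new condition above: Python's `and` binds tighter than `or`, so the
check groups as sketched below. Explicit parentheses are added for clarity; this is only a
rewording of the committed logic, the helper name `_is_special_case` is hypothetical, and it
assumes `expected` has already been narrowed to either a DataFrame or a list of Rows:

    from typing import List, Union
    from pyspark.sql import DataFrame, Row

    def _is_special_case(actual: DataFrame, expected: Union[DataFrame, List[Row]]) -> bool:
        # When `expected` is a DataFrame: both inputs empty, or both without columns.
        if isinstance(expected, DataFrame):
            return (actual.first() is None and expected.first() is None) or (
                len(actual.columns) == 0 and len(expected.columns) == 0
            )
        # When `expected` is a list: an empty list matches an actual DataFrame
        # that has no rows or no columns.
        return (actual.first() is None or len(actual.columns) == 0) and len(expected) == 0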
 


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org
