This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
     new b2140d0f25d8 [SPARK-48248][PYTHON] Fix nested array to respect legacy conf of inferArrayTypeFromFirstElement
b2140d0f25d8 is described below

commit b2140d0f25d81e64a968df83c5da5089051acaac
Author: Hyukjin Kwon <gurwls...@apache.org>
AuthorDate: Mon May 13 17:15:28 2024 +0900

    [SPARK-48248][PYTHON] Fix nested array to respect legacy conf of inferArrayTypeFromFirstElement
    
    ### What changes were proposed in this pull request?
    
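    This PR fixes a bug, introduced by https://github.com/apache/spark/pull/36545, where `spark.sql.pyspark.legacy.inferArrayTypeFromFirstElement.enabled` is not respected for nested arrays. The recursive `_infer_type` calls for array elements did not pass `infer_array_from_first_element` through, so only the outermost array honoured the conf.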
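To make the bug concrete, here is a toy model of the two array-inference strategies: inferring the element type from the first element, or merging the types of all elements. The names `infer`, `merge`, `ToyType` and the `forward_flag` switch are hypothetical. This is a plain-Python stand-in, not PySpark's `_infer_type` or `_merge_type`. `from_first=True` plays the role of the legacy conf. Setting `forward_flag=False` mimics the pre-fix recursion, where nested calls did not receive the caller's flag. Merging incompatible atomic types raises an error in this toy. That is a simplification and says nothing about how PySpark itself resolves such conflicts.

```python
from functools import reduce
from typing import Any, Tuple, Union

# Toy type model (hypothetical, not PySpark's DataType classes):
# atomic types are strings, arrays are ("array", element_type).
ToyType = Union[str, Tuple[str, Any]]


def merge(a: ToyType, b: ToyType) -> ToyType:
    """Merge two toy types; a simplified stand-in for a type-merging step."""
    if a == "null":
        return b
    if b == "null":
        return a
    if isinstance(a, tuple) and isinstance(b, tuple):
        return ("array", merge(a[1], b[1]))
    if a == b:
        return a
    raise TypeError(f"cannot merge {a} and {b}")


def infer(obj: Any, from_first: bool = False, forward_flag: bool = True) -> ToyType:
    """Infer a toy type; forward_flag=False models the pre-fix recursion."""
    if obj is None:
        return "null"
    if isinstance(obj, bool):  # check bool before int: bool subclasses int
        return "boolean"
    if isinstance(obj, int):
        return "long"
    if isinstance(obj, str):
        return "string"
    if isinstance(obj, list):
        if not obj:
            return ("array", "null")
        # Pre-fix model: the nested call does not get the caller's flag
        # (the toy simply uses False there).
        child_flag = from_first if forward_flag else False
        if from_first:
            # Legacy strategy: element type comes from the first element only.
            return ("array", infer(obj[0], child_flag, forward_flag))
        # Default strategy: merge the types of all elements.
        return ("array", reduce(merge, (infer(v, child_flag, forward_flag) for v in obj)))
    raise TypeError(f"unsupported value: {obj!r}")


# Column value from the added unit test: [[1, "a"]].
print(infer([[1, "a"]], from_first=True))  # ('array', ('array', 'long'))
try:
    infer([[1, "a"]], from_first=True, forward_flag=False)
except TypeError as exc:
    print(exc)  # cannot merge long and string
```

With the flag forwarded, the inner array takes its element type from `1` alone. This matches the `[[1, None]]` expectation in the added test, where `"a"` does not fit the inferred long type. When the flag is dropped, the inner array falls back to merging and hits the long/string conflict.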
    
    ### Why are the changes needed?
    
    So that the legacy conf actually restores the original behaviour, including for nested arrays.
    
    ### Does this PR introduce _any_ user-facing change?
    
    Yes, it fixes the regression when `spark.sql.pyspark.legacy.inferArrayTypeFromFirstElement.enabled` is set to `True`: nested arrays now also infer their element type from the first element.
    
    ### How was this patch tested?
    
    Unit test added.
    
    ### Was this patch authored or co-authored using generative AI tooling?
    
    No.
    
    Closes #46548 from HyukjinKwon/SPARK-48248.
    
    Authored-by: Hyukjin Kwon <gurwls...@apache.org>
    Signed-off-by: Hyukjin Kwon <gurwls...@apache.org>
---
 python/pyspark/sql/tests/test_types.py |  7 +++++++
 python/pyspark/sql/types.py            | 18 ++++++++++++++++--
 2 files changed, 23 insertions(+), 2 deletions(-)

diff --git a/python/pyspark/sql/tests/test_types.py b/python/pyspark/sql/tests/test_types.py
index 9c931d861c18..bd99804ec565 100644
--- a/python/pyspark/sql/tests/test_types.py
+++ b/python/pyspark/sql/tests/test_types.py
@@ -1621,6 +1621,13 @@ class TypesTestsMixin:
                 StringType("UTF8_BINARY_LCASE"),
             )
 
+    def test_infer_array_element_type_with_struct(self):
+        # SPARK-48248: Nested array to respect legacy conf of inferArrayTypeFromFirstElement
+        with self.sql_conf(
+            {"spark.sql.pyspark.legacy.inferArrayTypeFromFirstElement.enabled": True}
+        ):
+            self.assertEqual([[1, None]], self.spark.createDataFrame([[[[1, "a"]]]]).first()[0])
+
 
 class DataTypeTests(unittest.TestCase):
     # regression test for SPARK-6055
diff --git a/python/pyspark/sql/types.py b/python/pyspark/sql/types.py
index 41be12620fd5..fbd4987713e2 100644
--- a/python/pyspark/sql/types.py
+++ b/python/pyspark/sql/types.py
@@ -1951,13 +1951,27 @@ def _infer_type(
         if len(obj) > 0:
             if infer_array_from_first_element:
                 return ArrayType(
-                    _infer_type(obj[0], infer_dict_as_struct, prefer_timestamp_ntz), True
+                    _infer_type(
+                        obj[0],
+                        infer_dict_as_struct,
+                        infer_array_from_first_element,
+                        prefer_timestamp_ntz,
+                    ),
+                    True,
                 )
             else:
                 return ArrayType(
                     reduce(
                         _merge_type,
-                        (_infer_type(v, infer_dict_as_struct, prefer_timestamp_ntz) for v in obj),
+                        (
+                            _infer_type(
+                                v,
+                                infer_dict_as_struct,
+                                infer_array_from_first_element,
+                                prefer_timestamp_ntz,
+                            )
+                            for v in obj
+                        ),
                     ),
                     True,
                 )
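The sketch below shows why the removed calls lost the flag. It assumes that `_infer_type` declares its parameters in the order the fixed calls use: `obj`, `infer_dict_as_struct`, `infer_array_from_first_element`, `prefer_timestamp_ntz`. Under that assumption, the old three-argument call binds `prefer_timestamp_ntz` positionally into the `infer_array_from_first_element` slot. `infer_type_stub` and `bind` are hypothetical helpers. They use only the standard `inspect.Signature.bind` API to show which parameter each argument lands in.

```python
import inspect
from typing import Any, Dict


def infer_type_stub(
    obj: Any,
    infer_dict_as_struct: bool = False,
    infer_array_from_first_element: bool = False,
    prefer_timestamp_ntz: bool = False,
) -> None:
    """Hypothetical stand-in with the parameter order assumed for _infer_type."""


def bind(*args: Any, **kwargs: Any) -> Dict[str, Any]:
    """Return the parameter -> value mapping for a call, defaults included."""
    bound = inspect.signature(infer_type_stub).bind(*args, **kwargs)
    bound.apply_defaults()
    return dict(bound.arguments)


# Caller state: legacy conf enabled, default timestamp handling.
infer_dict_as_struct = False
infer_array_from_first_element = True
prefer_timestamp_ntz = False

# Pre-fix call shape: three positional arguments.
old = bind([1], infer_dict_as_struct, prefer_timestamp_ntz)
# Fixed call shape: all four values in declaration order.
new = bind([1], infer_dict_as_struct, infer_array_from_first_element, prefer_timestamp_ntz)
# Keyword arguments make the binding independent of parameter order.
safe = bind(
    [1],
    infer_dict_as_struct=infer_dict_as_struct,
    infer_array_from_first_element=infer_array_from_first_element,
    prefer_timestamp_ntz=prefer_timestamp_ntz,
)

print(old["infer_array_from_first_element"], new["infer_array_from_first_element"])  # False True
```

Under the same assumption, the reverse leak also applied. With `prefer_timestamp_ntz` set to `True`, nested calls would have received `True` for the array flag. Passing keyword arguments is an alternative that would rule out this class of mismatch. The commit itself keeps positional arguments and fixes their order.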

