[ https://issues.apache.org/jira/browse/SPARK-34982?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Kumaresh AK updated SPARK-34982:
--------------------------------
    Priority: Minor  (was: Major)

> Pyspark asDict() returns wrong child field for nested dataframe
> ---------------------------------------------------------------
>
>                 Key: SPARK-34982
>                 URL: https://issues.apache.org/jira/browse/SPARK-34982
>             Project: Spark
>          Issue Type: Bug
>          Components: PySpark
>    Affects Versions: 3.0.1, 3.0.2
>         Environment: Tested with EMR 6.2.0, python: 3.8.5.
>                      Also tested with local pyspark on Windows, v: 3.0.1, python: 3.8.5.
>            Reporter: Kumaresh AK
>            Priority: Minor
>         Attachments: SPARK-34982.py
>
>
> Hello! I upgraded a job to Spark 3.0.1 (from 2.4.4) and encountered this
> issue. The job uses asDict(True) in PySpark. I reproduced the issue with a
> concise schema and code. Consider this example schema:
> {code:java}
> root
>  |-- id: integer (nullable = false)
>  |-- struct_1: struct (nullable = true)
>  |    |-- array_1_1: array (nullable = true)
>  |    |    |-- element: string (containsNull = false)
>  |-- struct_2: struct (nullable = true)
>  |    |-- array_2_1: array (nullable = true)
>  |    |    |-- element: string (containsNull = false){code}
> I created 100 rows with the above schema, filled them with some numbers, and
> checked row.asDict(True) against the input. For some rows
> {code:java}
> struct_1.array_1_1{code}
> is missing; instead I get
> {code:java}
> struct_1.array_2_1{code}
> I also observe that this happens when array_1_1 is null. Example assert
> failure:
> {code:java}
> AssertionError: {'id': 7, 'struct_1': {'array_2_1': None}, 'struct_2':
> {'array_2_1': None}} != {'id': 7, 'struct_1': {'array_1_1': None},
> 'struct_2': {'array_2_1': None}}
> {code}
> I have attached a minimal script that reproduces this issue.


--
This message was sent by Atlassian Jira
(v8.3.4#803005)
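For readers without a Spark environment handy, the behaviour row.asDict(True) is expected to have can be sketched with plain namedtuples. This is an illustrative stand-in, not Spark's implementation; the names Struct1, Struct2, Outer, and as_dict_recursive are invented for the sketch:

```python
from collections import namedtuple

# Illustrative stand-ins for nested Spark Row objects (not Spark's classes).
Struct1 = namedtuple("Struct1", ["array_1_1"])
Struct2 = namedtuple("Struct2", ["array_2_1"])
Outer = namedtuple("Outer", ["id", "struct_1", "struct_2"])

def as_dict_recursive(row):
    """Convert a namedtuple tree to nested dicts, keyed by each level's
    OWN field names -- the behaviour asDict(True) is expected to have."""
    out = {}
    for name, value in zip(row._fields, row):
        if hasattr(value, "_fields"):  # descend into nested "structs"
            value = as_dict_recursive(value)
        out[name] = value
    return out

row = Outer(7, Struct1(None), Struct2(None))
expected = as_dict_recursive(row)
# expected == {'id': 7, 'struct_1': {'array_1_1': None},
#              'struct_2': {'array_2_1': None}}
```

The AssertionError quoted above shows the affected versions instead producing {'struct_1': {'array_2_1': None}, ...}, i.e. struct_2's child key leaking into struct_1's dictionary when array_1_1 is null.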