Eyck Troschke created SPARK-52382:
-------------------------------------

             Summary: MapType column with ArrayType key leads to TypeError in df.collect
                 Key: SPARK-52382
                 URL: https://issues.apache.org/jira/browse/SPARK-52382
             Project: Spark
          Issue Type: Bug
          Components: PySpark
    Affects Versions: 3.5.3
            Reporter: Eyck Troschke


According to the PySpark documentation, it should be possible to have a MapType 
column with ArrayType keys. MapType supports keys of type DataType and 
ArrayType inherits from DataType.
 
When I try that with PySpark 3.5.3, the show() method of the DataFrame works as 
expected, but the collect() method throws "TypeError: unhashable type: 'list'":
 
from pyspark.sql import SparkSession
from pyspark.sql.types import MapType, ArrayType, StringType

spark = SparkSession.builder.getOrCreate()

schema = MapType(ArrayType(StringType()), StringType())
data = [{("A", "B"): "foo", ("X", "Y", "Z"): "bar"}]
df = spark.createDataFrame(data, schema)
df.show()     # works
df.collect()  # throws TypeError: unhashable type: 'list'
 
So either the documentation or the behavior seems to be incorrect.
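The error message suggests a plain-Python root cause (this is an assumption from the traceback, not a confirmed diagnosis): collect() presumably materializes each MapType value as a Python dict and each ArrayType value as a Python list, and lists are mutable and therefore unhashable, so they cannot serve as dict keys. A minimal sketch of that limitation, independent of Spark:

```python
# Assumption: collect() builds Python dicts for MapType values and
# Python lists for ArrayType values. A list can never be a dict key:
d = {}
try:
    d[["A", "B"]] = "foo"   # lists are mutable, hence unhashable
except TypeError as e:
    print(e)                # unhashable type: 'list'

# A tuple, being immutable and hashable, works fine as a key:
d[("A", "B")] = "foo"
```

If this is indeed the cause, any fix would need collect() to convert ArrayType keys into an immutable representation such as tuples, or the documentation would need to restrict MapType keys to hashable types.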



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
