Eyck Troschke created SPARK-52382:
-------------------------------------
Summary: MapType column with ArrayType key leads to TypeError in df.collect
Key: SPARK-52382
URL: https://issues.apache.org/jira/browse/SPARK-52382
Project: Spark
Issue Type: Bug
Components: PySpark
Affects Versions: 3.5.3
Reporter: Eyck Troschke
According to the PySpark documentation, it should be possible to have a MapType
column with ArrayType keys. MapType supports keys of type DataType and
ArrayType inherits from DataType.
When I try that with PySpark 3.5.3, the show() method of the DataFrame works as
expected, but the collect() method throws "TypeError: unhashable type: 'list'":
from pyspark.sql import SparkSession
from pyspark.sql.types import MapType, ArrayType, StringType
spark = SparkSession.builder.getOrCreate()
schema = MapType(ArrayType(StringType()), StringType())
data = [{("A", "B"): "foo", ("X", "Y", "Z"): "bar"}]
df = spark.createDataFrame(data, schema)
df.show() # works
df.collect() # raises TypeError: unhashable type: 'list'
So either the documentation or the behavior seems to be incorrect.
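The error message itself can be reproduced in plain Python without Spark. The sketch below assumes (this is not confirmed in the report) that collect() materializes ArrayType keys as Python lists before building the dict for the map column; build_map is a hypothetical stand-in for that step, not a PySpark function. Lists are unhashable, so they cannot serve as dict keys, while tuples can.

```python
# Hypothetical stand-in for the step that turns map entries into a Python dict.
# This is NOT PySpark code; it only illustrates the hashing constraint.
def build_map(pairs):
    """Build a dict from an iterable of (key, value) pairs."""
    return dict(pairs)


# Assumption: array-typed keys come back from the JVM as Python lists.
pairs_with_list_keys = [(["A", "B"], "foo"), (["X", "Y", "Z"], "bar")]

try:
    build_map(pairs_with_list_keys)
    error_message = None
except TypeError as exc:
    # Lists define no __hash__, so using one as a dict key fails here.
    error_message = str(exc)

# Converting each key to a tuple makes it hashable, so the dict can be built.
tuple_keyed = build_map((tuple(key), value) for key, value in pairs_with_list_keys)

print(error_message)
print(tuple_keyed)
```

If that assumption holds, it would also explain why show() works: the rows are formatted on the JVM side and no Python dict with list keys is ever constructed.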