Eyck Troschke created SPARK-52382:
-------------------------------------

             Summary: MapType column with ArrayType key leads to TypeError in df.collect
                 Key: SPARK-52382
                 URL: https://issues.apache.org/jira/browse/SPARK-52382
             Project: Spark
          Issue Type: Bug
          Components: PySpark
    Affects Versions: 3.5.3
            Reporter: Eyck Troschke
According to the PySpark documentation, it should be possible to have a MapType column with ArrayType keys: MapType accepts any DataType as its key type, and ArrayType inherits from DataType. When I try that with PySpark 3.5.3, the show() method of the DataFrame works as expected, but the collect() method throws "TypeError: unhashable type: 'list'":

from pyspark.sql import SparkSession
from pyspark.sql.types import MapType, ArrayType, StringType

spark = SparkSession.builder.getOrCreate()

schema = MapType(ArrayType(StringType()), StringType())
data = [{("A", "B"): "foo", ("X", "Y", "Z"): "bar"}]
df = spark.createDataFrame(data, schema)

df.show()     # works
df.collect()  # throws TypeError: unhashable type: 'list'

So either the documentation or the behavior seems to be incorrect.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
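The TypeError most likely arises on the Python side rather than in Spark itself: collect() materializes each MapType value as a Python dict, and dict keys must be hashable, but an ArrayType key deserializes to a Python list, which is not. A minimal pure-Python sketch of the failure (no Spark required, and only an illustration of the suspected mechanism):

```python
# Python dict keys must be hashable. A list (what an ArrayType key
# deserializes to) is mutable and therefore unhashable, so building
# the result dict fails with the same error collect() reports.
try:
    bad = {["A", "B"]: "foo"}  # list as dict key
except TypeError as e:
    print(e)  # prints: unhashable type: 'list'

# Tuples are immutable and hashable, so a tuple key works fine -
# which is why the tuple-keyed input dict in the reproduction above
# is accepted by createDataFrame in the first place.
ok = {("A", "B"): "foo"}
print(ok[("A", "B")])  # prints: foo
```

If this is indeed the mechanism, one fix direction would be to deserialize ArrayType map keys as tuples rather than lists during collect().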