[ https://issues.apache.org/jira/browse/SPARK-52382?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Eyck Troschke updated SPARK-52382:
----------------------------------
    Description:
According to the PySpark documentation, it should be possible to have a {{MapType}} column with {{ArrayType}} keys. {{MapType}} supports keys of type {{DataType}}, and {{ArrayType}} inherits from {{DataType}}.

When I try that with PySpark 3.5.3, the {{show()}} method of the DataFrame works as expected, but the {{collect()}} method throws "{{TypeError: unhashable type: 'list'}}":

{code:python}
from pyspark.sql import SparkSession
from pyspark.sql.types import MapType, ArrayType, StringType

# Obtain a session; the original snippet used `spark` without creating it.
spark = SparkSession.builder.getOrCreate()

# A map column whose keys are arrays of strings.
schema = MapType(ArrayType(StringType()), StringType())
data = [{("A", "B"): "foo", ("X", "Y", "Z"): "bar"}]
df = spark.createDataFrame(data, schema)
df.show()     # works
df.collect()  # raises TypeError: unhashable type: 'list'
{code}

So either the documentation or the behavior seems to be incorrect.
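The error message is consistent with ordinary Python dictionary semantics: dict keys must be hashable, and a {{list}} is not. The sketch below does not use Spark at all. It assumes (this is not confirmed by the report) that {{collect()}} materializes array values as Python lists and map values as dicts, and the helper {{build_map}} is a hypothetical stand-in for that conversion, not a PySpark function.

```python
def build_map(pairs):
    """Hypothetical stand-in for turning map entries into a Python dict."""
    return dict(pairs)


# Tuple keys are hashable, so the same entries used in the report
# can be stored in a dict without any problem.
tuple_keyed = build_map([(("A", "B"), "foo"), (("X", "Y", "Z"), "bar")])

# List keys are not hashable, so building the dict fails with a TypeError.
try:
    build_map([(["A", "B"], "foo")])
    list_key_error = None
except TypeError as exc:
    list_key_error = str(exc)

print(tuple_keyed[("A", "B")])
print(list_key_error)
```

If that assumption holds, one possible workaround (not verified here) would be to avoid building a dict on the Python side, for example by applying {{pyspark.sql.functions.map_entries}} to the map column before calling {{collect()}}, so that the entries arrive as an array of key/value structs.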
> MapType column with ArrayType key leads to TypeError in df.collect
> ------------------------------------------------------------------
>
>                 Key: SPARK-52382
>                 URL: https://issues.apache.org/jira/browse/SPARK-52382
>             Project: Spark
>          Issue Type: Bug
>          Components: PySpark
>    Affects Versions: 3.5.3
>            Reporter: Eyck Troschke
>            Priority: Major
>
> According to the PySpark documentation, it should be possible to have a
> {{MapType}} column with {{ArrayType}} keys. {{MapType}} supports keys of type
> {{DataType}}, and {{ArrayType}} inherits from {{DataType}}.
>
> When I try that with PySpark 3.5.3, the {{show()}} method of the DataFrame
> works as expected, but the {{collect()}} method throws "{{TypeError:
> unhashable type: 'list'}}":
>
> {code:python}
> from pyspark.sql import SparkSession
> from pyspark.sql.types import MapType, ArrayType, StringType
>
> # Obtain a session; the original snippet used `spark` without creating it.
> spark = SparkSession.builder.getOrCreate()
>
> schema = MapType(ArrayType(StringType()), StringType())
> data = [{("A", "B"): "foo", ("X", "Y", "Z"): "bar"}]
> df = spark.createDataFrame(data, schema)
> df.show()     # works
> df.collect()  # raises TypeError: unhashable type: 'list'
> {code}
>
> So either the documentation or the behavior seems to be incorrect.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)