[ https://issues.apache.org/jira/browse/SPARK-27619?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Dongjoon Hyun updated SPARK-27619:
----------------------------------
    Target Version/s: 3.0.0
            Priority: Blocker  (was: Major)

> MapType should be prohibited in hash expressions
> ------------------------------------------------
>
>                 Key: SPARK-27619
>                 URL: https://issues.apache.org/jira/browse/SPARK-27619
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.3.4, 2.4.0, 2.4.1, 2.4.2, 2.4.3, 2.4.4, 3.0.0
>            Reporter: Josh Rosen
>            Priority: Blocker
>              Labels: correctness
>
> Spark currently allows MapType expressions to be used as input to hash
> expressions, but I think that this should be prohibited because Spark SQL
> does not support map equality.
> Currently, Spark SQL's map hashcodes are sensitive to the insertion order of
> map elements:
> {code:java}
> // spark-shell imports this automatically; add it in a standalone application.
> import spark.implicits._
> val a = spark.createDataset(Map(1->1, 2->2) :: Nil)
> val b = spark.createDataset(Map(2->2, 1->1) :: Nil)
> // Demonstration of how Scala Map equality is unaffected by insertion order:
> assert(Map(1->1, 2->2).hashCode() == Map(2->2, 1->1).hashCode())
> assert(Map(1->1, 2->2) == Map(2->2, 1->1))
> assert(a.first() == b.first())
> // In contrast, this will print two different hashcodes:
> println(Seq(a, b).map(_.selectExpr("hash(*)").first()))
> {code}
> This behavior might be surprising to Scala developers.
> I think there's precedent for banning the use of MapType here because we
> already prohibit MapType in aggregation / joins / equality comparisons
> (SPARK-9415) and set operations (SPARK-19893).
> If we decide that we want this to be an error, then it might also be a good
> idea to add a {{spark.sql.legacy}} flag as an escape hatch to re-enable the
> old and buggy behavior, in case applications were relying on it in cases
> where it just happens to be safe by accident (e.g. maps which only have
> one entry).
> Alternatively, we could support hashing here if we implemented support for
> comparable map types (SPARK-18134).
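The insertion-order sensitivity described in the issue can be reproduced without Spark. This is a minimal sketch, assuming a map is stored as parallel key and value arrays in insertion order (loosely modeled on Spark's internal `ArrayBasedMapData`) and that the hash folds over the entries in that storage order. `OrderedMap`, `orderedMapHash` and `unorderedMapHash` are hypothetical names, and the hashing uses Scala's `scala.util.hashing.MurmurHash3`, not Spark's own `Murmur3Hash` expression, so the numeric values will not match what `hash(*)` returns.

```scala
import scala.util.hashing.MurmurHash3

// Hypothetical stand-in for a map stored as parallel key/value arrays
// in insertion order (an assumption loosely modeled on Spark's ArrayBasedMapData).
final case class OrderedMap(keys: Vector[Int], values: Vector[Int]) {
  require(keys.length == values.length, "keys and values must align")
}

object MapHashDemo {
  // Arbitrary seed chosen for this sketch.
  private val Seed = 42

  // Order-sensitive: mixes (key, value) entries one after another in
  // storage order, so reordering the same entries changes the result.
  def orderedMapHash(m: OrderedMap): Int =
    MurmurHash3.orderedHash(m.keys.zip(m.values), Seed)

  // Order-insensitive: combines per-entry hashes with commutative
  // operations, so the result depends only on which entries are present.
  def unorderedMapHash(m: OrderedMap): Int =
    MurmurHash3.unorderedHash(m.keys.zip(m.values), Seed)

  def main(args: Array[String]): Unit = {
    val a = OrderedMap(Vector(1, 2), Vector(1, 2)) // like Map(1->1, 2->2)
    val b = OrderedMap(Vector(2, 1), Vector(2, 1)) // like Map(2->2, 1->1)
    println(s"ordered hashes equal:   ${orderedMapHash(a) == orderedMapHash(b)}")
    println(s"unordered hashes equal: ${unorderedMapHash(a) == unorderedMapHash(b)}")
  }
}
```

The unordered variant uses the same idea as Scala's `MurmurHash3.mapHash`, which backs `Map.hashCode()` and is why the `hashCode()` assertion in the issue passes. It only sketches one way map hashing could be made consistent with map equality; it is not how Spark resolved this issue.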
--
This message was sent by Atlassian Jira
(v8.3.4#803005)