[ https://issues.apache.org/jira/browse/SPARK-42032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17677142#comment-17677142 ]
Ruifeng Zheng commented on SPARK-42032:
---------------------------------------

Had an offline discussion with [~beliefer]: Spark Connect has the same behavior as the Scala Dataset API, which differs from PySpark in some cases.

Dataset API:
{code:java}
scala> spark.createDataFrame(Seq((1, Map("foo" -> -2.0, "bar" -> 2.0)))).show(100, 100)
+---+-------------------------+
| _1|                       _2|
+---+-------------------------+
|  1|{foo -> -2.0, bar -> 2.0}|
+---+-------------------------+
{code}

PySpark:
{code:java}
In [2]: spark.createDataFrame([(1, {"foo": -2.0, "bar": 2.0})]).show(100, 100)
+---+-------------------------+
| _1|                       _2|
+---+-------------------------+
|  1|{bar -> 2.0, foo -> -2.0}|
+---+-------------------------+
{code}

So this should not be a bug in Connect.

> Map data show in different order
> --------------------------------
>
>                 Key: SPARK-42032
>                 URL: https://issues.apache.org/jira/browse/SPARK-42032
>             Project: Spark
>          Issue Type: Sub-task
>          Components: Connect, PySpark
>    Affects Versions: 3.4.0
>            Reporter: Ruifeng Zheng
>            Priority: Major
>
> Not sure whether this needs to be fixed:
> {code:java}
> **********************************************************************
> File "/Users/ruifeng.zheng/Dev/spark/python/pyspark/sql/connect/functions.py", line 1623, in pyspark.sql.connect.functions.transform_keys
> Failed example:
>     df.select(transform_keys(
>         "data", lambda k, _: upper(k)).alias("data_upper")
>     ).show(truncate=False)
> Expected:
>     +-------------------------+
>     |data_upper               |
>     +-------------------------+
>     |{BAR -> 2.0, FOO -> -2.0}|
>     +-------------------------+
> Got:
>     +-------------------------+
>     |data_upper               |
>     +-------------------------+
>     |{FOO -> -2.0, BAR -> 2.0}|
>     +-------------------------+
> <BLANKLINE>
> **********************************************************************
> File "/Users/ruifeng.zheng/Dev/spark/python/pyspark/sql/connect/functions.py", line 1630, in pyspark.sql.connect.functions.transform_values
> Failed example:
>     df.select(transform_values(
>         "data", lambda k, v: when(k.isin("IT", "OPS"), v + 10.0).otherwise(v)
>     ).alias("new_data")).show(truncate=False)
> Expected:
>     +---------------------------------------+
>     |new_data                               |
>     +---------------------------------------+
>     |{OPS -> 34.0, IT -> 20.0, SALES -> 2.0}|
>     +---------------------------------------+
> Got:
>     +---------------------------------------+
>     |new_data                               |
>     +---------------------------------------+
>     |{IT -> 20.0, SALES -> 2.0, OPS -> 34.0}|
>     +---------------------------------------+
> <BLANKLINE>
> **********************************************************************
> 1 of 2 in pyspark.sql.connect.functions.transform_keys
> 1 of 2 in pyspark.sql.connect.functions.transform_values
> {code}

--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
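A side note on why these doctests flake: the rendered strings differ only in map entry order, which is not something Spark guarantees. If the doctests were made order-insensitive, the discrepancy would disappear. The sketch below is plain Python (no Spark needed) with a hypothetical helper name, `parse_map_cell`, that parses Spark's `{k -> v, ...}` rendering back into a dict so two outputs can be compared ignoring entry order; it assumes keys contain no `", "` or `" -> "` substrings.

```python
def parse_map_cell(cell: str) -> dict:
    """Parse a Spark-rendered map cell like '{FOO -> -2.0, BAR -> 2.0}'
    into a Python dict, so comparisons ignore entry order."""
    inner = cell.strip().strip("{}")
    if not inner:
        return {}
    entries = {}
    for pair in inner.split(", "):
        key, _, value = pair.partition(" -> ")
        entries[key] = float(value)
    return entries

# The two renderings from the failing doctest above:
connect_out = "{FOO -> -2.0, BAR -> 2.0}"
pyspark_out = "{BAR -> 2.0, FOO -> -2.0}"

# String comparison fails, but the maps they describe are equal.
assert connect_out != pyspark_out
assert parse_map_cell(connect_out) == parse_map_cell(pyspark_out)
```

This only illustrates the order-insensitive comparison idea; the actual doctests compare `show()` output verbatim, which is why the entry order matters there.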