[jira] [Commented] (SPARK-42032) Map data show in different order
[ https://issues.apache.org/jira/browse/SPARK-42032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17677235#comment-17677235 ]

jiaan.geng commented on SPARK-42032:

After investigating, I found that the result from Connect is the same as that of the Dataset API. This is a bug in PySpark.

cc [~podongfeng] [~gurwls223]

> Map data show in different order
>
> Key: SPARK-42032
> URL: https://issues.apache.org/jira/browse/SPARK-42032
> Project: Spark
> Issue Type: Sub-task
> Components: Connect, PySpark
> Affects Versions: 3.4.0
> Reporter: Ruifeng Zheng
> Priority: Major
>
> not sure whether this needs to be fixed:
> {code:java}
> **
> File "/Users/ruifeng.zheng/Dev/spark/python/pyspark/sql/connect/functions.py", line 1623, in pyspark.sql.connect.functions.transform_keys
> Failed example:
>     df.select(transform_keys(
>         "data", lambda k, _: upper(k)).alias("data_upper")
>     ).show(truncate=False)
> Expected:
>     +-------------------------+
>     |data_upper               |
>     +-------------------------+
>     |{BAR -> 2.0, FOO -> -2.0}|
>     +-------------------------+
> Got:
>     +-------------------------+
>     |data_upper               |
>     +-------------------------+
>     |{FOO -> -2.0, BAR -> 2.0}|
>     +-------------------------+
>
> **
> File "/Users/ruifeng.zheng/Dev/spark/python/pyspark/sql/connect/functions.py", line 1630, in pyspark.sql.connect.functions.transform_values
> Failed example:
>     df.select(transform_values(
>         "data", lambda k, v: when(k.isin("IT", "OPS"), v + 10.0).otherwise(v)
>     ).alias("new_data")).show(truncate=False)
> Expected:
>     +---------------------------------------+
>     |new_data                               |
>     +---------------------------------------+
>     |{OPS -> 34.0, IT -> 20.0, SALES -> 2.0}|
>     +---------------------------------------+
> Got:
>     +---------------------------------------+
>     |new_data                               |
>     +---------------------------------------+
>     |{IT -> 20.0, SALES -> 2.0, OPS -> 34.0}|
>     +---------------------------------------+
>
> **
>    1 of 2 in pyspark.sql.connect.functions.transform_keys
>    1 of 2 in pyspark.sql.connect.functions.transform_values
> {code}

--
This message was sent by Atlassian Jira (v8.20.10#820010)

To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org

[jira] [Commented] (SPARK-42032) Map data show in different order
[ https://issues.apache.org/jira/browse/SPARK-42032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17677189#comment-17677189 ]

Apache Spark commented on SPARK-42032:

User 'zhengruifeng' has created a pull request for this issue:
https://github.com/apache/spark/pull/39600

[jira] [Commented] (SPARK-42032) Map data show in different order
[ https://issues.apache.org/jira/browse/SPARK-42032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17677142#comment-17677142 ]

Ruifeng Zheng commented on SPARK-42032:

Had an offline discussion with [~beliefer]. Spark Connect has the same behavior as the Scala Dataset API, which differs from PySpark in some cases:

Dataset API:
{code:java}
scala> spark.createDataFrame(Seq((1, Map("foo" -> -2.0, "bar" -> 2.0)))).show(100, 100)
+---+-------------------------+
| _1|                       _2|
+---+-------------------------+
|  1|{foo -> -2.0, bar -> 2.0}|
+---+-------------------------+
{code}

PySpark:
{code:java}
In [2]: spark.createDataFrame([(1, {"foo": -2.0, "bar": 2.0})]).show(100, 100)
+---+-------------------------+
| _1|                       _2|
+---+-------------------------+
|  1|{bar -> 2.0, foo -> -2.0}|
+---+-------------------------+
{code}

So this should not be a bug in Connect.

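The Dataset API and PySpark outputs quoted in the comment above contain the same map entries and differ only in the order in which they are rendered. As a minimal sketch, the rendered cells can be compared without regard to entry order in plain Python. The helper names `parse_map_cell` and `same_map_ignoring_order` are hypothetical and not part of PySpark, and the parsing is deliberately naive: it assumes that no key or value contains ", " or " -> ".

```python
def parse_map_cell(cell: str) -> dict:
    """Parse a rendered map cell such as '{foo -> -2.0, bar -> 2.0}'.

    Hypothetical helper for illustration only; keys and values are kept
    as strings exactly as they appear in the rendered output.
    """
    text = cell.strip()
    if not (text.startswith("{") and text.endswith("}")):
        raise ValueError(f"not a rendered map cell: {cell!r}")
    inner = text[1:-1].strip()
    if not inner:
        return {}
    entries = {}
    # Assumption: entries are separated by ", " and keys from values by " -> ".
    for pair in inner.split(", "):
        key, value = pair.split(" -> ", 1)
        entries[key] = value
    return entries


def same_map_ignoring_order(left: str, right: str) -> bool:
    """Return True when both rendered cells hold the same key/value pairs."""
    # Python dict equality does not depend on insertion order.
    return parse_map_cell(left) == parse_map_cell(right)


dataset_api_cell = "{foo -> -2.0, bar -> 2.0}"
pyspark_cell = "{bar -> 2.0, foo -> -2.0}"
print(same_map_ignoring_order(dataset_api_cell, pyspark_cell))  # True
```

The same check applied to the `transform_values` doctest cells in the issue description also reports them as equal, which matches the observation that only the display order differs.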
[jira] [Commented] (SPARK-42032) Map data show in different order
[ https://issues.apache.org/jira/browse/SPARK-42032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17677106#comment-17677106 ]

jiaan.geng commented on SPARK-42032:

This issue duplicates https://issues.apache.org/jira/browse/SPARK-41988, which I am working on now.