[ https://issues.apache.org/jira/browse/SPARK-29664?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16963649#comment-16963649 ]
Hyukjin Kwon commented on SPARK-29664:
--------------------------------------

We will have to update the migration guide at https://github.com/apache/spark/blob/master/docs/pyspark-migration-guide.md and show the workaround ({{df[...]}}) in the docstring of {{getItem}} in PySpark.

> Column.getItem behavior is not consistent with Scala version
> ------------------------------------------------------------
>
>                 Key: SPARK-29664
>                 URL: https://issues.apache.org/jira/browse/SPARK-29664
>             Project: Spark
>          Issue Type: Bug
>          Components: PySpark
>    Affects Versions: 3.0.0
>            Reporter: Terry Kim
>            Priority: Major
>
> In PySpark, Column.getItem's behavior differs from the Scala version. For example, in PySpark:
> {code:python}
> from pyspark.sql.functions import create_map, lit, col
>
> df = spark.range(2)
> map_col = create_map(lit(0), lit(100), lit(1), lit(200))
> df.withColumn("mapped", map_col.getItem(col('id'))).show()
> # +---+------+
> # | id|mapped|
> # +---+------+
> # |  0|   100|
> # |  1|   200|
> # +---+------+
> {code}
> In Scala:
> {code:scala}
> val df = spark.range(2)
> val map_col = map(lit(0), lit(100), lit(1), lit(200))
>
> // The following getItem call throws the exception below, which is the right behavior:
> // java.lang.RuntimeException: Unsupported literal type class org.apache.spark.sql.Column id
> //   at org.apache.spark.sql.catalyst.expressions.Literal$.apply(literals.scala:78)
> //   at org.apache.spark.sql.Column.getItem(Column.scala:856)
> //   ... 49 elided
> df.withColumn("mapped", map_col.getItem(col("id"))).show
>
> // You have to use apply() to match PySpark's behavior:
> df.withColumn("mapped", map_col(col("id"))).show
> // +---+------+
> // | id|mapped|
> // +---+------+
> // |  0|   100|
> // |  1|   200|
> // +---+------+
> {code}
> Looking at the Scala implementation, PySpark's behavior is incorrect: getItem wraps its argument in a `Literal`, so it should not accept a Column.