[ https://issues.apache.org/jira/browse/SPARK-35079?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
koert kuipers updated SPARK-35079:
----------------------------------
    Description:
i think this is a correctness bug in spark 3.1.1
the behavior is correct in spark 3.0.1

in spark 3.0.1:
{code:java}
scala> import spark.implicits._
scala> import org.apache.spark.sql.functions._
scala> val x = Seq(Seq("aa", "bb", "cc")).toDF
x: org.apache.spark.sql.DataFrame = [value: array<string>]
scala> x.select(transform(col("value"), col => udf((_: String).drop(1)).apply(col))).show
+---------------------------------------------------+
|transform(value, lambdafunction(UDF(lambda 'x), x))|
+---------------------------------------------------+
|                                          [a, b, c]|
+---------------------------------------------------+
{code}

in spark 3.1.1:
{code:java}
scala> import spark.implicits._
scala> import org.apache.spark.sql.functions._
scala> val x = Seq(Seq("aa", "bb", "cc")).toDF
x: org.apache.spark.sql.DataFrame = [value: array<string>]
scala> x.select(transform(col("value"), col => udf((_: String).drop(1)).apply(col))).show
+---------------------------------------------------+
|transform(value, lambdafunction(UDF(lambda 'x), x))|
+---------------------------------------------------+
|                                          [c, c, c]|
+---------------------------------------------------+
{code}

  was:
i think this is a correctness bug in spark 3.1.1
the behavior is correct in spark 3.0.1

in spark 3.0.1:
{code:java}
scala> import spark.implicits._
scala> import org.apache.spark.sql.functions._
scala> val x = Seq(Seq("11", "22", "33")).toDF
x: org.apache.spark.sql.DataFrame = [value: array<string>]
scala> x.select(transform(col("value"), col => udf((_: String).drop(1)).apply(col))).show
+---------------------------------------------------+
|transform(value, lambdafunction(UDF(lambda 'x), x))|
+---------------------------------------------------+
|                                          [1, 2, 3]|
+---------------------------------------------------+
{code}

in spark 3.1.1:
{code:java}
scala> import spark.implicits._
scala> import org.apache.spark.sql.functions._
scala> val x = Seq(Seq("11", "22", "33")).toDF
x: org.apache.spark.sql.DataFrame = [value: array<string>]
scala> x.select(transform(col("value"), col => udf((_: String).drop(1)).apply(col))).show
+---------------------------------------------------+
|transform(value, lambdafunction(UDF(lambda 'x), x))|
+---------------------------------------------------+
|                                          [3, 3, 3]|
+---------------------------------------------------+
{code}

> Transform with udf gives incorrect result
> -----------------------------------------
>
>                 Key: SPARK-35079
>                 URL: https://issues.apache.org/jira/browse/SPARK-35079
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 3.1.1
>            Reporter: koert kuipers
>            Priority: Minor
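The spark-shell session in the description can also be packaged as a standalone check, which makes it easier to run against different Spark versions. The following is only a sketch under stated assumptions: the object name `TransformUdfCheck`, the `local[1]` master, and the app name are illustrative and not part of the report, and a Spark 3.x `spark-sql` dependency is assumed to be on the classpath. It compares the result of `transform` plus the udf with the same `drop(1)` applied to a plain Scala `Seq`.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, transform, udf}

// Hypothetical standalone version of the repro from the description.
object TransformUdfCheck {
  def main(args: Array[String]): Unit = {
    // Assumption: a local single-threaded session is enough for this check.
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("SPARK-35079-check")
      .getOrCreate()
    import spark.implicits._

    try {
      val input = Seq("aa", "bb", "cc")

      // Same udf as in the report: drop the first character of each string.
      val dropFirst = udf((s: String) => s.drop(1))

      // Apply the udf to every array element through the higher-order function.
      val actual: Seq[String] = Seq(input).toDF("value")
        .select(transform(col("value"), c => dropFirst(c)).as("out"))
        .as[Seq[String]]
        .head()

      // Reference result computed without Spark.
      val expected: Seq[String] = input.map(_.drop(1))

      println(s"expected=$expected actual=$actual")
      assert(actual == expected, s"transform with udf returned $actual, expected $expected")
    } finally {
      spark.stop()
    }
  }
}
```

On a version that shows the behavior reported above, the assertion would be expected to fail; on a version that behaves like 3.0.1 in the report, it should pass.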