[ https://issues.apache.org/jira/browse/SPARK-34830?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17306655#comment-17306655 ]
Pavel Chernikov commented on SPARK-34830:
-----------------------------------------

[~dsolow1], I've recently bumped into another issue with UDF calls and {{transform}} functionality. See https://issues.apache.org/jira/browse/SPARK-34829

> Some UDF calls inside transform are broken
> ------------------------------------------
>
>                 Key: SPARK-34830
>                 URL: https://issues.apache.org/jira/browse/SPARK-34830
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 3.1.1
>            Reporter: Daniel Solow
>            Priority: Major
>
> Let's say I want to create a UDF to do a simple lookup on a string:
> {code:java}
> import org.apache.spark.sql.{functions => f}
>
> val M = Map("a" -> "abc", "b" -> "defg")
> val BM = spark.sparkContext.broadcast(M)
> val LOOKUP = f.udf((s: String) => BM.value.get(s))
> {code}
> Now if I have the following dataframe:
> {code:java}
> val df = Seq(
>   Tuple1(Seq("a", "b"))
> ).toDF("arr")
> {code}
> and I want to run this UDF over each element in the array, I can do:
> {code:java}
> df.select(f.transform($"arr", i => LOOKUP(i)).as("arr")).show(false)
> {code}
> This should show:
> {code:java}
> +-----------+
> |arr        |
> +-----------+
> |[abc, defg]|
> +-----------+
> {code}
> However, it actually shows:
> {code:java}
> +-----------+
> |arr        |
> +-----------+
> |[def, defg]|
> +-----------+
> {code}
> It's also broken in SQL (even without the DSL). This gives the same result:
> {code:java}
> spark.udf.register("LOOKUP", (s: String) => BM.value.get(s))
> df.selectExpr("TRANSFORM(arr, a -> LOOKUP(a)) AS arr").show(false)
> {code}
> Note that "def" is not even in the map I'm using.
>
> This is a big problem because it breaks existing code/UDFs. I noticed this because the job I ported from 2.4.5 to 3.1.1 seemed to be working, but was actually producing broken data.
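For reference, the intended per-element semantics of {{transform}} + {{LOOKUP}} can be modeled in plain Scala with no Spark involved; this is a minimal sketch (the names {{lookup}} and {{result}} are illustrative, not from the report) showing the result the report says should appear, {{[abc, defg]}}:

```scala
// Plain-Scala model of the broadcast-map lookup the UDF performs.
// Map.get returns Option[String]; Spark renders Some(x) as x and None as null.
val m = Map("a" -> "abc", "b" -> "defg")

def lookup(s: String): Option[String] = m.get(s)

// The array column from the report, mapped element-by-element,
// as TRANSFORM(arr, a -> LOOKUP(a)) is meant to do.
val arr = Seq("a", "b")
val result = arr.map(lookup) // Seq(Some("abc"), Some("defg"))
```

Each element maps through the same map used to build the broadcast variable, so "def" can never be a legitimate output, which is what makes the observed {{[def, defg]}} clearly a correctness bug rather than a stale-data issue.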
--
This message was sent by Atlassian Jira (v8.3.4#803005)