[ https://issues.apache.org/jira/browse/SPARK-26869?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16772988#comment-16772988 ]
Valeria Vasylieva commented on SPARK-26869:
-------------------------------------------

[~anddonram] you are trying to treat a Struct as a Tuple in the UDF, but even if you use a Row or a case class instead, it will still fail, as that is not supported yet. Have a look at [SPARK-12823|https://issues.apache.org/jira/browse/SPARK-12823], which seems to be related. Hope it helps.

> UDF with struct requires to have _1 and _2 as struct field names
> ----------------------------------------------------------------
>
>                 Key: SPARK-26869
>                 URL: https://issues.apache.org/jira/browse/SPARK-26869
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.3.0, 2.4.0
>         Environment: Ubuntu 18.04.1 LTS
>            Reporter: Andrés Doncel Ramírez
>            Priority: Minor
>
> When using a UDF that takes a Seq of tuples as input, the struct field names
> must be "_1" and "_2". The following code illustrates this:
>
> {code:java}
> val df = sc.parallelize(Array(
>   ("1", 3.0),
>   ("2", 4.5),
>   ("5", 2.0)
> )).toDF("c1", "c2")
>
> val df1 = df.agg(collect_list(struct("c1", "c2")).as("c3"))
>
> // Changing column names to _1 and _2 when creating the struct
> val df2 = df.agg(collect_list(struct(col("c1").as("_1"), col("c2").as("_2"))).as("c3"))
>
> def takeUDF = udf({ (xs: Seq[(String, Double)]) =>
>   xs.take(2)
> })
>
> df1.printSchema
> df2.printSchema
>
> df1.withColumn("c4", takeUDF(col("c3"))).show() // this fails
> df2.withColumn("c4", takeUDF(col("c3"))).show() // this works
> {code}
> The first call throws the following exception:
>
> org.apache.spark.sql.AnalysisException: cannot resolve 'UDF(c3)' due to data
> type mismatch: argument 1 requires array<struct<_1:string,_2:double>> type,
> however, '`c3`' is of array<struct<c1:string,c2:double>> type.;;
>
> The second works as expected and prints the result.
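A minimal, self-contained sketch of the workaround shown in the report: alias each column to {{_1}}/{{_2}} inside the struct so the resulting array<struct<_1:string,_2:double>> matches the schema Spark derives for the UDF's Seq[(String, Double)] parameter. The local-mode SparkSession setup and the object name are illustration-only assumptions, not part of the original report:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, collect_list, struct, udf}

object StructUdfWorkaroundSketch {
  def main(args: Array[String]): Unit = {
    // Local session purely for illustration; the report uses an existing sc/spark.
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("SPARK-26869-sketch")
      .getOrCreate()
    import spark.implicits._

    val df = Seq(("1", 3.0), ("2", 4.5), ("5", 2.0)).toDF("c1", "c2")

    // Workaround: rename the struct fields to _1/_2 so they line up with the
    // tuple encoder used for the UDF's Seq[(String, Double)] argument.
    val df2 = df.agg(
      collect_list(struct(col("c1").as("_1"), col("c2").as("_2"))).as("c3"))

    val takeUDF = udf((xs: Seq[(String, Double)]) => xs.take(2))

    // Resolves cleanly because c3 is array<struct<_1:string,_2:double>>.
    df2.withColumn("c4", takeUDF(col("c3"))).show(truncate = false)

    spark.stop()
  }
}
```

Without the `.as("_1")`/`.as("_2")` aliases, the same call fails analysis with the type-mismatch error quoted in the issue, because `struct("c1", "c2")` keeps the original field names.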
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org