Harsh Motwani created SPARK-52033:
-------------------------------------

             Summary: Bug in Generate node when child node has multiple copies of the same attribute
                 Key: SPARK-52033
                 URL: https://issues.apache.org/jira/browse/SPARK-52033
             Project: Spark
          Issue Type: Bug
          Components: SQL
    Affects Versions: 3.5.0, 4.0.0
            Reporter: Harsh Motwani

In a Generate node, when the child output contains multiple copies of the same attribute but the node's output contains a different number of copies of that attribute, the Generate node fails in codegen and returns the wrong result in non-codegen mode. This can be reproduced with a SQL UDF and a `lateral view explode`:

{code:java}
// Define a SQL UDF whose body references beta1 twice and explodes its array argument.
sql("""create or replace temporary function spark_func (params array<struct<x int, y int>>)
      |  returns STRUCT<a: int, b: int> LANGUAGE SQL
      |  return (select ns from (
      |  SELECT try_divide(SUM(item.x * item.y), SUM(item.x * item.x)) AS beta1,
      |         NAMED_STRUCT('a', beta1,'b', beta1) ns
      |  FROM (SELECT params) LATERAL VIEW EXPLODE(params) AS item LIMIT 1));""".stripMargin)

// Call the UDF on a single-element array; the correct result is {1, 1}.
sql("""select spark_func(collect_list(NAMED_STRUCT('x', 1, 'y', 1))) as result;""").collect()
{code}

This code hits an assertion failure in codegen [here|https://github.com/harshmotw-db/spark/blob/921eba838bf1e88b5e455ee72e8edad94b71f00c/sql/core/src/main/scala/org/apache/spark/sql/execution/WholeStageCodegenExec.scala#L156]. When codegen is disabled, it returns {null, null} even though the correct output is {1, 1}.
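
To compare the codegen and non-codegen behavior described above, a small helper like the following might be useful. It is only a sketch: {{collectWithWholeStage}} is a hypothetical name, not part of Spark, and it assumes that "codegen disabled" in this report corresponds to setting {{spark.sql.codegen.wholeStage}} to false. The report does not say which setting was used.

```scala
import org.apache.spark.sql.{Row, SparkSession}

// In spark-shell, `spark` already exists; it is built here only to keep the sketch self-contained.
val spark = SparkSession.builder()
  .master("local[1]")
  .appName("SPARK-52033")
  .getOrCreate()

// Hypothetical helper (not a Spark API): run one query with whole-stage codegen
// switched on or off, then restore the previous session setting.
// Assumption: this is the switch the report means by "codegen disabled".
def collectWithWholeStage(enabled: Boolean, query: String): Array[Row] = {
  val key = "spark.sql.codegen.wholeStage"
  val previous = spark.conf.get(key, "true")
  spark.conf.set(key, enabled.toString)
  try spark.sql(query).collect()
  finally spark.conf.set(key, previous)
}
```

After defining {{spark_func}} as in the repro, calling {{collectWithWholeStage(enabled = false, "select spark_func(collect_list(NAMED_STRUCT('x', 1, 'y', 1))) as result")}} should exercise the non-codegen path, where the report observes {null, null} instead of {1, 1}.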