[ https://issues.apache.org/jira/browse/SPARK-52033?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
ASF GitHub Bot updated SPARK-52033:
-----------------------------------
    Labels: pull-request-available  (was: )

> Bug in Generate node when child node has multiple copies of the same attribute
> ------------------------------------------------------------------------------
>
>                 Key: SPARK-52033
>                 URL: https://issues.apache.org/jira/browse/SPARK-52033
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 3.5.0, 4.0.0
>            Reporter: Harsh Motwani
>            Priority: Major
>              Labels: pull-request-available
>
> In a Generate node, when the child output has multiple copies of the same
> attribute but the node output has a different number of copies of that
> attribute, the Generate node breaks in codegen and returns a wrong result
> in non-codegen mode. This can be reproduced using a SQL UDF and a
> `lateral view explode`:
> {code:java}
> sql("""create or replace temporary function spark_func (params array<struct<x int, y int>>)
>   | returns STRUCT<a: int, b: int> LANGUAGE SQL
>   | return (select ns from (
>   | SELECT try_divide(SUM(item.x * item.y), SUM(item.x * item.x)) AS beta1,
>   | NAMED_STRUCT('a', beta1,'b', beta1) ns
>   | FROM (SELECT params) LATERAL VIEW EXPLODE(params) AS item LIMIT 1));""".stripMargin)
> sql("""select spark_func(collect_list(NAMED_STRUCT('x', 1, 'y', 1))) as result;""").collect()
> {code}
> With codegen enabled, this code hits an assertion failure
> [here|https://github.com/harshmotw-db/spark/blob/921eba838bf1e88b5e455ee72e8edad94b71f00c/sql/core/src/main/scala/org/apache/spark/sql/execution/WholeStageCodegenExec.scala#L156].
> With codegen disabled, it returns {null, null} even though the correct
> output is {1, 1}.
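To make the "different number of copies" condition concrete, here is a minimal, Spark-free sketch of one plausible way such a mismatch can arise. It is an illustrative assumption, not Spark's actual `GenerateExec` code and not a confirmed root cause. It assumes attributes are identified by an expression ID, so two copies of the same attribute compare equal. A shortcut that skips column pruning whenever the child output and the required output contain the same *set* of attributes then treats `[a, a]` and `[a]` as equivalent, and it passes through a row with the wrong number of columns. `Attr`, `project`, `pruneBySet`, and `pruneBySeq` are hypothetical names introduced only for this example.

```scala
// Hypothetical model (NOT Spark's code): an attribute is identified by its
// exprId, so two copies of the same attribute are equal to each other.
final case class Attr(name: String, exprId: Long)

// Project a child row onto the required attributes by looking each one up by
// exprId in the child's output (first matching ordinal wins).
def project(required: Seq[Attr], childOutput: Seq[Attr], row: Vector[Any]): Vector[Any] =
  required.map { a =>
    val idx = childOutput.indexWhere(_.exprId == a.exprId)
    row(idx)
  }.toVector

// Risky shortcut: skip the projection when both sides hold the same *set* of
// attributes. Sets ignore duplicates, so [a, a] vs [a] looks "equal" and the
// child row is returned unchanged, with one column too many.
def pruneBySet(required: Seq[Attr], childOutput: Seq[Attr], row: Vector[Any]): Vector[Any] =
  if (childOutput.toSet == required.toSet) row
  else project(required, childOutput, row)

// Safer shortcut: skip only when the attribute *sequences* match exactly
// (same order and same number of copies of each attribute).
def pruneBySeq(required: Seq[Attr], childOutput: Seq[Attr], row: Vector[Any]): Vector[Any] =
  if (childOutput == required) row
  else project(required, childOutput, row)

val a           = Attr("beta1", 1L)
val childOutput = Seq(a, a)          // child emits two copies of the same attribute
val required    = Seq(a)             // the node only needs one copy
val childRow    = Vector[Any](1, 1)

println(pruneBySet(required, childOutput, childRow)) // Vector(1, 1): two columns where one is expected
println(pruneBySeq(required, childOutput, childRow)) // Vector(1): matches `required`
```

In this toy model the column count of the pruned row no longer matches the declared output, which is the same shape of inconsistency that a codegen column-count assertion or a misaligned non-codegen result would surface.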