Harsh Motwani created SPARK-52033:
-------------------------------------

             Summary: Bug in Generate node when child node has multiple copies of the same attribute
                 Key: SPARK-52033
                 URL: https://issues.apache.org/jira/browse/SPARK-52033
             Project: Spark
          Issue Type: Bug
          Components: SQL
    Affects Versions: 3.5.0, 4.0.0
            Reporter: Harsh Motwani


In a Generate node, when the child output contains multiple copies of the same 
attribute but the node's own output contains a different number of copies of 
that attribute, the Generate node fails in codegen and returns a wrong result 
in non-codegen mode. This can be reproduced with a SQL UDF and a `lateral view 
explode`:


{code:java}
sql("""create or replace temporary function spark_func (params array<struct<x int, y int>>)
            | returns STRUCT<a: int, b: int> LANGUAGE SQL
            | return (select ns from (
            | SELECT try_divide(SUM(item.x * item.y), SUM(item.x * item.x)) AS beta1,
            | NAMED_STRUCT('a', beta1,'b', beta1) ns
            | FROM (SELECT params) LATERAL VIEW EXPLODE(params) AS item LIMIT 1));""".stripMargin)

sql("""select spark_func(collect_list(NAMED_STRUCT('x', 1, 'y', 1))) as result;""").collect()
{code}

This code hits an assertion failure in codegen 
[here|https://github.com/harshmotw-db/spark/blob/921eba838bf1e88b5e455ee72e8edad94b71f00c/sql/core/src/main/scala/org/apache/spark/sql/execution/WholeStageCodegenExec.scala#L156].
When codegen is disabled, it returns {null, null} even though the correct 
output is {1, 1}.
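
The two behaviours can be compared side by side with a sketch like the one 
below. It is not part of the original report. It assumes a spark-shell session 
(where `spark` is predefined), that `spark_func` from the reproduction above 
has already been created, and that "codegen disabled" refers to turning off 
whole-stage codegen via `spark.sql.codegen.wholeStage`. The names `query`, 
`withCodegen` and `withoutCodegen` are illustrative only.

{code:scala}
import scala.util.Try

// Assumption: spark_func has been created as in the reproduction above.
val query = "select spark_func(collect_list(NAMED_STRUCT('x', 1, 'y', 1))) as result"

// Whole-stage codegen enabled (the default). Try captures the assertion
// failure described in the report instead of aborting the session.
spark.conf.set("spark.sql.codegen.wholeStage", "true")
val withCodegen = Try(spark.sql(query).collect().toSeq)
println(s"codegen on:  $withCodegen")

// Whole-stage codegen disabled. Per the report, the query completes on this
// path but yields {null, null} rather than the expected {1, 1}.
spark.conf.set("spark.sql.codegen.wholeStage", "false")
val withoutCodegen = Try(spark.sql(query).collect().toSeq)
println(s"codegen off: $withoutCodegen")
{code}

Wrapping each run in `Try` keeps both outcomes visible in one session, so the 
failure on one path does not prevent observing the wrong result on the other.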



