Tathagata Das created SPARK-22018:
-------------------------------------

             Summary: Catalyst Optimizer does not preserve top-level metadata 
while collapsing projects
                 Key: SPARK-22018
                 URL: https://issues.apache.org/jira/browse/SPARK-22018
             Project: Spark
          Issue Type: Bug
          Components: Optimizer, Structured Streaming
    Affects Versions: 2.2.0, 2.1.1
            Reporter: Tathagata Das
            Assignee: Tathagata Das


Consider two Project operators as follows:
{code}
Project [a_with_metadata#27 AS b#26]
+- Project [a#0 AS a_with_metadata#27]
   +- LocalRelation <empty>, [a#0, b#1]
{code}

The child Project has an output column that carries metadata, and the parent 
Project has an alias that implicitly forwards that metadata, so the metadata is 
visible to higher operators. After the CollapseProject optimizer rule is 
applied, the metadata is not preserved:

{code}
Project [a#0 AS b#26]
+- LocalRelation <empty>, [a#0, b#1]
{code}

This is incorrect: downstream operators that rely on such metadata to identify 
certain fields (e.g. the watermark column in Structured Streaming) will fail to 
do so.
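
The behavior described above can be illustrated with a small toy model. This is a sketch in plain Python, not Catalyst code: the `Attr` and `Alias` classes, the `collapse` function, its `preserve_metadata` flag, and the `watermark_delay_ms` key are all hypothetical names made up for illustration. It assumes that an alias without explicit metadata forwards its child's metadata, and shows one possible way to keep that metadata when two projections are merged.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Attr:
    """A column reference such as a#0, with the metadata its producer exposed."""
    name: str
    metadata: Dict[str, object] = field(default_factory=dict)


@dataclass
class Alias:
    """`child AS name`. Without explicit metadata, the child's metadata is forwarded."""
    child: Attr
    name: str
    explicit_metadata: Optional[Dict[str, object]] = None

    @property
    def metadata(self) -> Dict[str, object]:
        if self.explicit_metadata is not None:
            return self.explicit_metadata
        return self.child.metadata  # implicit forwarding

    def to_attr(self) -> Attr:
        """The output column that operators above this alias will reference."""
        return Attr(self.name, dict(self.metadata))


def collapse(upper: List[Alias], lower: List[Alias],
             preserve_metadata: bool) -> List[Alias]:
    """Merge two projections by substituting lower aliases into upper ones."""
    lower_by_name = {a.name: a for a in lower}
    merged = []
    for outer in upper:
        inner = lower_by_name.get(outer.child.name)
        if inner is None:
            merged.append(outer)  # nothing to substitute
            continue
        explicit = outer.explicit_metadata
        if preserve_metadata and explicit is None:
            # Pin the metadata the outer alias exposed before the rewrite.
            explicit = dict(outer.metadata)
        merged.append(Alias(inner.child, outer.name, explicit))
    return merged


# Hypothetical metadata key, standing in for e.g. a watermark marker.
WATERMARK = {"watermark_delay_ms": 10000}

inner = Alias(Attr("a"), "a_with_metadata", dict(WATERMARK))  # child Project
outer = Alias(inner.to_attr(), "b")                           # parent Project

naive = collapse([outer], [inner], preserve_metadata=False)
fixed = collapse([outer], [inner], preserve_metadata=True)

print(outer.metadata)     # {'watermark_delay_ms': 10000}
print(naive[0].metadata)  # {}
print(fixed[0].metadata)  # {'watermark_delay_ms': 10000}
```

In the naive case the merged alias points directly at `a`, which has no metadata, so the forwarded metadata disappears. Pinning the metadata on the outer alias before substitution keeps it visible to operators above the collapsed Project.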



