rehan243 commented on issue #55575:
URL: https://github.com/apache/spark/issues/55575#issuecomment-4333929475

   Oh interesting, yeah, we've hit something very similar before when working 
with UNION views and predicate pushdown in Spark. For us the issue boiled down 
to how Spark resolves attributes when the same column exists in multiple 
branches of the union: each branch carries its own `exprId`s, and they get 
tricky to line up, especially when they end up mismatched after a projection.
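A toy Scala model might make the `exprId` point more concrete. Everything here is hypothetical (plain case classes, not Catalyst's real `Attribute`/`Union` classes). It assumes, as I understand Spark's behaviour, that a union's output reuses the first branch's attributes, so a filter pushed into any other branch has to be remapped by position:

```scala
// Toy model only: hypothetical names, not Catalyst's real classes.
final case class Attr(name: String, exprId: Long)

sealed trait Expr
final case class Ref(attr: Attr) extends Expr
final case class Lit(value: Any) extends Expr
final case class EqualTo(left: Expr, right: Expr) extends Expr

// Each union branch produces its own attributes with its own exprIds.
final case class Branch(output: Seq[Attr])

// Assumption mirroring Spark: the union's output reuses the first branch's attributes.
def unionOutput(branches: Seq[Branch]): Seq[Attr] = branches.head.output

// Collect the exprIds a condition references.
def referencedIds(e: Expr): Set[Long] = e match {
  case Ref(a)        => Set(a.exprId)
  case EqualTo(l, r) => referencedIds(l) ++ referencedIds(r)
  case _             => Set.empty
}

// A condition is only valid in a branch if every exprId it uses is in that branch's output.
def resolvesIn(cond: Expr, branch: Branch): Boolean =
  referencedIds(cond).subsetOf(branch.output.map(_.exprId).toSet)

// Push a filter written against the union output into one branch by
// swapping each union attribute for the branch attribute at the same position.
def rewriteForBranch(cond: Expr, unionOut: Seq[Attr], branch: Branch): Expr = {
  val byPosition: Map[Long, Attr] = unionOut.map(_.exprId).zip(branch.output).toMap
  def go(e: Expr): Expr = e match {
    case Ref(a)        => Ref(byPosition.getOrElse(a.exprId, a))
    case EqualTo(l, r) => EqualTo(go(l), go(r))
    case other         => other
  }
  go(cond)
}

val b1 = Branch(Seq(Attr("id", 1L), Attr("name", 2L)))
val b2 = Branch(Seq(Attr("id", 3L), Attr("name", 4L)))
val out = unionOutput(Seq(b1, b2))

// WHERE name = 'alice', written against the union output (name#2).
val nameFilter = EqualTo(Ref(out(1)), Lit("alice"))

// Copied verbatim into branch 2 it references name#2, which branch 2 never produces.
println(resolvesIn(nameFilter, b2))                            // false
println(resolvesIn(rewriteForBranch(nameFilter, out, b2), b2)) // true
```

The positional remap is what keeps the pushed-down predicate bound to the right column in every branch. Skip it, or let an `exprId` drift, and the predicate refers to an attribute the branch doesn't produce.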
   
   The `replaceAlias` bit in your snippet makes sense as the culprit. If the 
`name` column isn't in `aliasMap` (e.g. it's a passthrough column rather than 
an alias), `replaceAlias` leaves the reference untouched when the filter is 
pushed down, so it can end up pointing at the wrong output. We had to patch 
our optimizer rule to explicitly check the passthrough columns in each union 
branch and make sure they line up before applying any pushdown.
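Roughly, the kind of check I mean looks like the sketch below. It is again a toy model with made-up names, not Catalyst's real `Project`/`Alias` classes and not our actual patch. The idea is to rewrite the condition through the alias map, then push it down only if every attribute it still references is really produced by the child:

```scala
// Toy model only: hypothetical names, not Catalyst's Project/Alias or our real patch.
final case class Attr(name: String, exprId: Long)

sealed trait Expr
final case class Ref(attr: Attr) extends Expr
final case class Lit(value: Any) extends Expr
final case class Upper(child: Expr) extends Expr
final case class EqualTo(left: Expr, right: Expr) extends Expr

// A projected column is either passed through from the child or computed by an alias.
sealed trait ProjectCol
final case class Passthrough(attr: Attr) extends ProjectCol
final case class Alias(child: Expr, out: Attr) extends ProjectCol

// Alias map: exprId of each alias's output -> the expression it stands for.
def aliasMap(cols: Seq[ProjectCol]): Map[Long, Expr] =
  cols.collect { case Alias(child, out) => out.exprId -> child }.toMap

// Substitute aliased attributes; anything not in the map (passthrough) is left as-is.
def replaceAlias(cond: Expr, aliases: Map[Long, Expr]): Expr = cond match {
  case Ref(a)        => aliases.getOrElse(a.exprId, Ref(a))
  case Upper(c)      => Upper(replaceAlias(c, aliases))
  case EqualTo(l, r) => EqualTo(replaceAlias(l, aliases), replaceAlias(r, aliases))
  case other         => other
}

def referencedIds(e: Expr): Set[Long] = e match {
  case Ref(a)        => Set(a.exprId)
  case Upper(c)      => referencedIds(c)
  case EqualTo(l, r) => referencedIds(l) ++ referencedIds(r)
  case _             => Set.empty
}

// Guard: push the rewritten condition only if the child produces every exprId it references.
def safePushdown(cond: Expr, cols: Seq[ProjectCol], childOutput: Seq[Attr]): Option[Expr] = {
  val rewritten = replaceAlias(cond, aliasMap(cols))
  val available = childOutput.map(_.exprId).toSet
  if (referencedIds(rewritten).subsetOf(available)) Some(rewritten) else None
}

// Child produces raw_name#10 and id#11; the projection's passthrough `id` carries a stale #99.
val childOut = Seq(Attr("raw_name", 10L), Attr("id", 11L))
val cols: Seq[ProjectCol] = Seq(
  Alias(Upper(Ref(Attr("raw_name", 10L))), Attr("name", 20L)),
  Passthrough(Attr("id", 99L))
)

val viaAlias = safePushdown(EqualTo(Ref(Attr("name", 20L)), Lit("ALICE")), cols, childOut)
val viaPassthrough = safePushdown(EqualTo(Ref(Attr("id", 99L)), Lit(1)), cols, childOut)
println(viaAlias)       // Some(EqualTo(Upper(Ref(Attr(raw_name,10))),Lit(ALICE)))
println(viaPassthrough) // None
```

The aliased `name` rewrites cleanly. The passthrough `id` isn't in the alias map, so `replaceAlias` leaves `id#99` alone. The guard then refuses to push a predicate the child can't resolve, so it never gets bound to the wrong output.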
   
   Also, fwiw, this kind of failure showed up more consistently for us after 
upgrading to Spark 3.1.x; earlier versions weren't as aggressive with these 
rules. Are you already on 3.x? If you're stuck, one quick workaround might be 
disabling `PushPredicateThroughNonJoin` in the optimizer for this query, but 
that's obviously not ideal long-term. A deeper fix would probably need some 
extra checks in `replaceAlias`.
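If you want to try that workaround, Spark's `spark.sql.optimizer.excludedRules` setting takes a comma-separated list of fully qualified optimizer rule names. One caveat, as far as I can tell from the 3.x sources: `PushPredicateThroughNonJoin` runs inside the combined `PushDownPredicates` rule rather than as its own batch entry. That means excluding it by name may do nothing, and `PushDownPredicates` is the likelier knob (though that also stops filter pushdown through joins for the session). Below is a spark-shell sketch, where `my_union_view` is a placeholder for your view. Worth confirming against your exact version before relying on it:

```scala
// spark-shell: `spark` is the SparkSession the shell predefines.
// Assumption: on this Spark 3.x build, filter pushdown runs via PushDownPredicates.
spark.conf.set(
  "spark.sql.optimizer.excludedRules",
  "org.apache.spark.sql.catalyst.optimizer.PushDownPredicates")

// Inspect where the Filter ends up in the optimized plan (my_union_view is hypothetical).
spark.sql("SELECT * FROM my_union_view WHERE name = 'alice'").explain(true)

// Put the optimizer back to its defaults afterwards.
spark.conf.unset("spark.sql.optimizer.excludedRules")
```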


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

