rehan243 commented on issue #55575: URL: https://github.com/apache/spark/issues/55575#issuecomment-4333929475
Oh interesting, yeah, we've hit something very similar before when working with UNION views and predicate pushdown in Spark. For us it boiled down to how Spark resolves attributes when the same column exists in multiple branches of the union: the `exprId`s get tricky, especially when they no longer match after a projection.

The `replaceAlias` bit in your snippet makes sense as the culprit. If the `name` column isn't in `aliasMap`, `replaceAlias` leaves that reference untouched when the filter is pushed down, so it ends up pointing to the wrong output attribute. We had to patch our optimization rule to explicitly check the passthrough columns in each union branch and make sure they line up before applying any pushdown.

Also, fwiw, this kind of failure showed up more consistently for us after upgrading to Spark 3.1.x; earlier versions weren't as aggressive with these rules. Are you already on 3.x? If you're stuck, one quick workaround might be disabling `PushPredicateThroughNonJoin` for this query (e.g. through `spark.sql.optimizer.excludedRules`), but that's obviously not ideal long-term. Sounds like a deeper fix would need some extra checks in `replaceAlias`.
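For reference, a minimal sketch of that workaround in PySpark terms. It only builds the value for `spark.sql.optimizer.excludedRules`, which takes a comma-separated list of fully qualified rule names. The helper `excluded_rules_value` and the session variable `spark` are hypothetical names for illustration. One assumption to verify on your version: on 3.x I believe the optimizer batch registers the combined `PushDownPredicates` rule rather than `PushPredicateThroughNonJoin` directly, so both are listed here. Excluding `PushDownPredicates` also turns off pushdown through joins, so keep it scoped to the one query.

```python
# Package that holds Catalyst's optimizer rules; excluded rules are
# matched by their fully qualified name.
RULE_PKG = "org.apache.spark.sql.catalyst.optimizer"


def excluded_rules_value(*rule_names: str) -> str:
    """Build the comma-separated value for spark.sql.optimizer.excludedRules."""
    return ",".join(f"{RULE_PKG}.{name}" for name in rule_names)


# Listing both names is an assumption, not something confirmed in this issue.
value = excluded_rules_value("PushPredicateThroughNonJoin", "PushDownPredicates")

# With an active SparkSession bound to `spark` (assumed), scope it to one query:
#   spark.conf.set("spark.sql.optimizer.excludedRules", value)
#   df = spark.sql("...")   # the affected query
#   df.explain(True)        # check whether the filter still moves below the union
#   spark.conf.unset("spark.sql.optimizer.excludedRules")
```

If the optimized plan from `explain(True)` still shows the filter pushed below the union, the exclusion didn't take effect and the rule name is worth double-checking against the optimizer source for your Spark version.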
