GitHub user rdblue commented on the issue:

https://github.com/apache/spark/pull/21143

@cloud-fan, union doesn't really help. I already have support for mixed formats working just fine. The format isn't the problem; filtering is (and projection has a similar problem). Parquet allows you to push down filters, while Avro doesn't. Right now, I'm running filters inside my data source to ensure that the result always matches the pushed filters, which is okay but doesn't use codegen.

Since we already need per-split filters for residuals, we could do something similar in Spark instead of in the data sources and allow each split to return a residual. Then Spark would add a codegen'ed filter before proceeding with the rest of the physical plan. You might think of it as a `ResidualFilter` node, where the filter expression changes per split, instead of a separate physical plan.
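To make the per-split residual idea concrete, here is a minimal sketch in plain Python. It is not Spark code and does not use any Spark API: `Split`, `supports_pushdown`, `residual`, and `residual_filter` are hypothetical names chosen for illustration, and the model is simplified so that a split either applies the whole pushed filter or none of it.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterator, List, Optional

Row = Dict[str, int]
Predicate = Callable[[Row], bool]


@dataclass
class Split:
    """Hypothetical input split; not a Spark class."""
    rows: List[Row]
    supports_pushdown: bool  # assumption: True models Parquet, False models Avro

    def scan(self, pushed: Predicate) -> Iterator[Row]:
        # A pushdown-capable split applies the filter while reading;
        # any other split returns every row unfiltered.
        if self.supports_pushdown:
            return (r for r in self.rows if pushed(r))
        return iter(self.rows)

    def residual(self, pushed: Predicate) -> Optional[Predicate]:
        # The residual is the part of the pushed filter this split could
        # not guarantee. Here it is all-or-nothing for simplicity.
        return None if self.supports_pushdown else pushed


def residual_filter(split: Split, pushed: Predicate) -> Iterator[Row]:
    # Engine-side stand-in for the proposed ResidualFilter node:
    # one operator whose filter expression is supplied by each split.
    rows = split.scan(pushed)
    residual = split.residual(pushed)
    if residual is None:
        return rows
    return (r for r in rows if residual(r))


def run(splits: List[Split], pushed: Predicate) -> List[Row]:
    # Concatenate the filtered output of every split, in order.
    return [r for s in splits for r in residual_filter(s, pushed)]


pushed = lambda r: r["id"] > 1
parquet_split = Split([{"id": 1}, {"id": 2}], supports_pushdown=True)
avro_split = Split([{"id": 1}, {"id": 3}], supports_pushdown=False)
result = run([parquet_split, avro_split], pushed)
```

In this sketch the data source no longer evaluates the filter itself for the non-pushdown split. It only reports what is left over, and the engine applies it, which is where Spark could substitute a codegen'ed filter.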