[GitHub] spark issue #21143: [SPARK-24072][SQL] clearly define pushed filters

rdblue Tue, 01 May 2018 10:40:30 -0700

Github user rdblue commented on the issue:

    https://github.com/apache/spark/pull/21143
  
    @cloud-fan, union doesn't really help. I already have support for 
mixed formats working just fine. The problem isn't the format; it is filtering 
(and a similar problem with projection). Parquet allows you to push down 
filters, while Avro doesn't.
    
    Right now, I'm running filters inside my data source to ensure that the 
result always matches the pushed filters, which works but doesn't use codegen. 
Since we already need per-split filters for residuals, we could do something 
similar in Spark instead of in the data sources: each split would return a 
residual, and Spark would add a codegen'ed filter for it before proceeding 
with the rest of the physical plan. You might think of it as a 
`ResidualFilter` node, where the filter expression changes from split to 
split, instead of a separate physical plan.
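
    As a rough illustration of the per-split residual idea, here is a minimal 
sketch in plain Python. It is not Spark's API: `Split`, `scan`, 
`residual_filter`, and `execute` are hypothetical names, and an ordinary 
Python predicate stands in for a codegen'ed filter expression. A split that can 
apply the pushed filter returns no residual; one that cannot returns the whole 
filter as its residual, and the engine applies it.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, Iterator, List, Optional, Tuple

Row = Dict[str, int]
Predicate = Callable[[Row], bool]


@dataclass
class Split:
    """Hypothetical input split; not a Spark class."""
    name: str
    rows: List[Row]
    supports_pushdown: bool  # e.g. True for Parquet, False for Avro

    def scan(self, pushed: Predicate) -> Tuple[Iterator[Row], Optional[Predicate]]:
        """Return the rows plus the part of the filter this split did NOT apply."""
        if self.supports_pushdown:
            # The filter is fully applied inside the source: no residual.
            return (r for r in self.rows if pushed(r)), None
        # The source cannot filter, so the whole filter is the residual.
        return iter(self.rows), pushed


def residual_filter(rows: Iterable[Row], residual: Optional[Predicate]) -> Iterator[Row]:
    """Stand-in for a ResidualFilter node: its expression changes per split."""
    if residual is None:
        return iter(rows)
    return (r for r in rows if residual(r))


def execute(splits: List[Split], pushed: Predicate) -> List[Row]:
    """Scan every split and apply that split's residual on the engine side."""
    out: List[Row] = []
    for split in splits:
        rows, residual = split.scan(pushed)
        out.extend(residual_filter(rows, residual))
    return out


splits = [
    Split("parquet-part", [{"id": 1}, {"id": 5}], supports_pushdown=True),
    Split("avro-part", [{"id": 2}, {"id": 7}], supports_pushdown=False),
]
result = execute(splits, lambda r: r["id"] > 3)
```

    Either way the rows that come out match the pushed filter; the only 
difference is whether the source or the engine did the filtering for a given 
split.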


