[GitHub] spark issue #21320: [SPARK-4502][SQL] Parquet nested column pruning - founda...

mallman Fri, 24 Aug 2018 10:26:28 -0700

Github user mallman commented on the issue:

    https://github.com/apache/spark/pull/21320
  
    Thanks everyone for your contributions, support and patience. It's been a 
journey and a half, and I'm excited for the future. I will open a follow-on PR 
to address the currently known failure scenario in this patch (see the ignored 
test), and we can discuss whether and how to get it into 2.4 as well.
    
    I know there are many early adopters of this patch and #16578. Bug reports 
will continue to be very helpful.
    
    Beyond this patch, there are many possibilities for widening the scope of 
schema pruning. As part of our review process, we pared this capability down to 
projections only. IMHO, the first limitation we should address after 2.4 is 
pruning for queries that filter on nested fields ("where" clauses). Support for 
joins, aggregations and window queries would be powerful enhancements as well, 
extending schema pruning to analytic queries.
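    
    As a rough illustration of what projection-only pruning means: for a query 
like `SELECT id, name.first FROM contacts`, only the `id` column and the `first` 
leaf of the `name` struct need to be read. The Python below is a hypothetical, 
self-contained sketch of that idea, assuming a schema modeled as nested dicts; 
`prune_schema` and the `contacts` table are made up for illustration and are 
not Spark's `ParquetSchemaPruning` implementation or API. Like this patch, it 
only considers projected columns; fields referenced in filters, joins or 
aggregations are out of scope.

```python
from typing import Dict, Iterable, List, Union

# Hypothetical schema model (not Spark's StructType): a leaf is a type name
# (str); a struct is a dict mapping field names to nested schemas.
Schema = Dict[str, Union[str, "Schema"]]


def prune_schema(schema: Schema, paths: Iterable[str]) -> Schema:
    """Return a copy of `schema` keeping only fields named by dotted `paths`.

    Requesting a struct by name (e.g. "name") keeps the whole struct;
    requesting a nested field (e.g. "name.first") keeps just that subfield.
    """
    # Group the requested paths by their top-level field name.
    requested: Dict[str, List[str]] = {}
    for path in paths:
        head, _, rest = path.partition(".")
        if head not in schema:
            raise KeyError(f"unknown field: {head!r}")
        requested.setdefault(head, []).append(rest)

    pruned: Schema = {}
    for field, node in schema.items():  # preserve the original field order
        if field not in requested:
            continue  # not projected, so it is pruned away
        rests = requested[field]
        if "" in rests:
            pruned[field] = node  # the whole field was requested
        elif isinstance(node, dict):
            pruned[field] = prune_schema(node, rests)  # recurse into struct
        else:
            raise KeyError(f"{field!r} is a leaf and has no subfields")
    return pruned


# Hypothetical table schema, used only for this example.
contacts = {
    "id": "int",
    "name": {"first": "string", "middle": "string", "last": "string"},
    "address": "string",
}

# Pruned read schema for: SELECT id, name.first FROM contacts
print(prune_schema(contacts, ["id", "name.first"]))
# {'id': 'int', 'name': {'first': 'string'}}
```

Here `address`, `name.middle` and `name.last` never need to be read, which is 
the saving projection-based pruning targets.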
    
    I believe all of the additional schema pruning features VideoAmp has 
implemented are independent of the underlying column store. Any column store 
that implements functionality analogous to `ParquetSchemaPruning.scala` should 
automagically inherit future enhancements. This should widen not just the 
audience we can reach, but also the developer community that can contribute and 
review.
    
    Thanks again.


