[ https://issues.apache.org/jira/browse/HAWQ-886?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Goden Yao updated HAWQ-886:
---------------------------
    Fix Version/s:     (was: 2.0.1.0-incubating)
                   backlog

> Support PXF filter push down for ORC
> ------------------------------------
>
>                 Key: HAWQ-886
>                 URL: https://issues.apache.org/jira/browse/HAWQ-886
>             Project: Apache HAWQ
>          Issue Type: New Feature
>          Components: PXF
>            Reporter: Shivram Mani
>            Assignee: Shivram Mani
>             Fix For: backlog
>
>
> Currently, when HAWQ reads ORC files via PXF (using the default Hive
> profile), it doesn't push any filter information down to the underlying
> ORC reader. The only filtering possible right now is at the partition
> level, and it is done generically for all Hive tables.
> ORC files internally contain file-level, stripe-level and row-level
> statistics, including information such as min and max values. For more
> information, refer to https://orc.apache.org/docs/indexes.html
> The proposal here is to introduce a new PXF profile optimized for ORC files
> that leverages these statistics to improve the performance of HAWQ queries
> with predicates. We will also read with the vectorized approach
> (VectorizedRowBatch) and use SearchArgument to build the filter, instead of
> using the existing row-based reader, which is expensive.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
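The idea of using min/max statistics to skip data can be sketched as follows. This is a minimal Python illustration under stated assumptions, not PXF or ORC code: `Truth`, `ColumnStats`, `Predicate`, `evaluate` and `stripes_to_read` are hypothetical names. The three-way result only loosely mirrors the kind of decision an ORC reader makes when it evaluates a SearchArgument against stripe or row-group statistics. Nulls are ignored for simplicity.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, List


class Truth(Enum):
    # Hypothetical three-way answer: every row matches, no row matches,
    # or the statistics alone cannot decide.
    YES = "yes"
    NO = "no"
    MAYBE = "maybe"


@dataclass
class ColumnStats:
    # Hypothetical min/max statistics for one column in one stripe
    # (or row group). Null counts are ignored in this sketch.
    min: int
    max: int


@dataclass
class Predicate:
    # Hypothetical leaf predicate of the form: column <op> literal.
    column: str
    op: str
    literal: int


def evaluate(pred: Predicate, stats: ColumnStats) -> Truth:
    """Decide, from min/max alone, whether rows in this unit can match."""
    lo, hi, v = stats.min, stats.max, pred.literal
    if pred.op == "<":
        if hi < v:
            return Truth.YES
        if lo >= v:
            return Truth.NO
        return Truth.MAYBE
    if pred.op == ">":
        if lo > v:
            return Truth.YES
        if hi <= v:
            return Truth.NO
        return Truth.MAYBE
    if pred.op == "=":
        if v < lo or v > hi:
            return Truth.NO
        if lo == hi == v:
            return Truth.YES
        return Truth.MAYBE
    # Unsupported operator: nothing can be proven, so the unit must be read.
    return Truth.MAYBE


def stripes_to_read(pred: Predicate,
                    stripe_stats: List[Dict[str, ColumnStats]]) -> List[int]:
    """Return indexes of stripes that cannot be skipped for this predicate."""
    return [i for i, stats in enumerate(stripe_stats)
            if evaluate(pred, stats[pred.column]) is not Truth.NO]
```

For example, with three stripes whose `id` ranges are 0-999, 1000-1999 and 2000-2999, `stripes_to_read(Predicate("id", ">", 1500), stats)` returns `[1, 2]`, so the first stripe is never read. Falling back to `MAYBE` for anything the statistics cannot decide keeps the pruning safe: it may read extra data but never drops a matching row. Rows from `MAYBE` units still have to be filtered after they are read.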