[ https://issues.apache.org/jira/browse/HAWQ-886?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Shivram Mani updated HAWQ-886:
------------------------------
    Description:
Currently, when HAWQ reads ORC files via PXF (using the default Hive profile), it doesn’t push any of the filter information down to the underlying ORC reader. The only filtering possible right now is at the partition level, and it is done generically for all Hive tables.

ORC internally contains file-level, stripe-level and row-level statistics, including information such as min and max values. For more information, refer to https://orc.apache.org/docs/indexes.html

The proposal here is to introduce a new PXF profile optimized for ORC files, which leverages these stats to improve the performance of HAWQ queries with predicates. We will also use the vectorized approach while reading, as opposed to the existing reader, which is row-based and more expensive.

  was:
Currently, when HAWQ reads ORC files via PXF (using the default Hive profile), it doesn’t pass any of the filter information down. The only filtering possible right now is at the partition level, and it is done generically for all Hive tables.

ORC internally contains file-level, stripe-level and row-level statistics, including information such as min and max values. For more information, refer to https://orc.apache.org/docs/indexes.html

The proposal here is to possibly introduce a new profile optimized for ORC files and to leverage these stats to improve the performance of HAWQ queries with predicates. We will also use the vectorized approach while reading, as opposed to the existing reader, which is row-based and more expensive.
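To make the idea of statistics-based filtering concrete, here is a minimal sketch of how min/max values let a reader skip whole stripes for a predicate. It is only an illustration: `StripeStats`, `may_contain`, `scan` and `STRIPES` are hypothetical names invented for this example and are not part of the PXF or ORC APIs, and real ORC statistics cover more types and cases (nulls, row groups, and so on).

```python
import operator
from dataclasses import dataclass
from typing import List, Tuple

# Comparison operators supported by this sketch (an assumption, not the PXF filter grammar).
OPS = {
    "=": operator.eq,
    "<": operator.lt,
    "<=": operator.le,
    ">": operator.gt,
    ">=": operator.ge,
}


@dataclass
class StripeStats:
    """Hypothetical stand-in for one stripe: its min/max statistics plus its rows."""
    min_value: int
    max_value: int
    rows: List[int]


def may_contain(stats: StripeStats, op: str, literal: int) -> bool:
    """Return False only when min/max prove that no row in the stripe can match."""
    if op == "=":
        return stats.min_value <= literal <= stats.max_value
    if op in ("<", "<="):
        # Some row can match only if the smallest value satisfies the predicate.
        return OPS[op](stats.min_value, literal)
    if op in (">", ">="):
        # Some row can match only if the largest value satisfies the predicate.
        return OPS[op](stats.max_value, literal)
    raise ValueError(f"unsupported operator: {op}")


def scan(stripes: List[StripeStats], op: str, literal: int) -> Tuple[List[int], int]:
    """Apply the predicate, skipping stripes that the statistics rule out.

    Returns the matching values and the number of stripes actually read.
    """
    matched: List[int] = []
    stripes_read = 0
    for stripe in stripes:
        if not may_contain(stripe, op, literal):
            continue  # pruned using statistics alone, no rows are touched
        stripes_read += 1
        matched.extend(v for v in stripe.rows if OPS[op](v, literal))
    return matched, stripes_read


# Three made-up stripes of a single integer column.
STRIPES = [
    StripeStats(1, 5, [1, 2, 3, 4, 5]),
    StripeStats(10, 20, [10, 12, 20]),
    StripeStats(30, 40, [30, 35, 40]),
]

if __name__ == "__main__":
    values, read = scan(STRIPES, ">", 25)
    print(values, "stripes read:", read)
```

With the sample data, a predicate such as `> 25` needs to read only one of the three stripes. Without pushdown, every stripe would be read and the filtering would happen afterwards on the HAWQ side.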
> Support PXF filter push down for ORC
> ------------------------------------
>
>                 Key: HAWQ-886
>                 URL: https://issues.apache.org/jira/browse/HAWQ-886
>             Project: Apache HAWQ
>          Issue Type: New Feature
>          Components: PXF
>            Reporter: Shivram Mani
>            Assignee: Shivram Mani
>             Fix For: 2.1.0
>
>
> Currently, when HAWQ reads ORC files via PXF (using the default Hive
> profile), it doesn’t push any of the filter information down to the
> underlying ORC reader. The only filtering possible right now is at the
> partition level, and it is done generically for all Hive tables.
> ORC internally contains file-level, stripe-level and row-level statistics,
> including information such as min and max values. For more information,
> refer to https://orc.apache.org/docs/indexes.html
> The proposal here is to introduce a new PXF profile optimized for ORC files,
> which leverages these stats to improve the performance of HAWQ queries with
> predicates. We will also use the vectorized approach while reading, as
> opposed to the existing reader, which is row-based and more expensive.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)