[ https://issues.apache.org/jira/browse/HAWQ-886?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Goden Yao updated HAWQ-886:
---------------------------
    Fix Version/s:     (was: 2.0.1.0-incubating)
                   backlog

> Support PXF filter push down for ORC
> ------------------------------------
>
>                 Key: HAWQ-886
>                 URL: https://issues.apache.org/jira/browse/HAWQ-886
>             Project: Apache HAWQ
>          Issue Type: New Feature
>          Components: PXF
>            Reporter: Shivram Mani
>            Assignee: Shivram Mani
>             Fix For: backlog
>
>
> Currently, when HAWQ reads ORC files via PXF (using the default Hive 
> profile), it does not push any filter information down to the underlying 
> ORC reader. The only filtering possible right now is at the partition 
> level, and it is done generically for all Hive tables.
> ORC internally stores file-level, stripe-level, and row-level statistics, 
> including information such as min and max values. For more information, 
> refer to https://orc.apache.org/docs/indexes.html
> The proposal here is to introduce a new PXF profile optimized for ORC files 
> that leverages these statistics to improve the performance of HAWQ queries 
> with predicates. We will also use the vectorized approach 
> (VectorizedRowBatch) while reading, along with SearchArgument to build the 
> filter, as opposed to the existing expensive row-based reader.
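
To illustrate why stripe-level min/max statistics help queries with predicates, here is a minimal Python sketch of statistics-based pruning. It does not use the ORC or PXF APIs; the Stripe class and both functions are hypothetical names invented for this example, and the statistics are computed on the fly, whereas a real ORC file stores them in its metadata.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Stripe:
    """Hypothetical stand-in for an ORC stripe holding one integer column."""
    rows: List[int]

    @property
    def min_value(self) -> int:
        # Real ORC files store this statistic; here it is computed for brevity.
        return min(self.rows)

    @property
    def max_value(self) -> int:
        return max(self.rows)


def may_contain_greater_than(stripe: Stripe, threshold: int) -> bool:
    """Return False only when the stats prove no row satisfies `col > threshold`."""
    return stripe.max_value > threshold


def scan_greater_than(stripes: List[Stripe], threshold: int) -> Tuple[List[int], int]:
    """Evaluate `col > threshold`, skipping stripes that the stats rule out."""
    matches: List[int] = []
    stripes_read = 0
    for stripe in stripes:
        if not may_contain_greater_than(stripe, threshold):
            continue  # pruned: no rows in this stripe are decoded
        stripes_read += 1
        # Surviving stripes are filtered a whole batch at a time.
        matches.extend(value for value in stripe.rows if value > threshold)
    return matches, stripes_read


stripes = [Stripe([1, 5, 9]), Stripe([20, 25, 30]), Stripe([8, 12, 40])]
matches, stripes_read = scan_greater_than(stripes, 15)
print(matches, stripes_read)
```

In this sketch the first stripe is never read because its maximum value cannot satisfy the predicate, which is the kind of saving the proposed profile aims for by handing the filter to the ORC reader.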



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
