[ 
https://issues.apache.org/jira/browse/HAWQ-886?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shivram Mani updated HAWQ-886:
------------------------------
    Description: 
Currently, when HAWQ reads ORC files via PXF (using the default Hive profile), 
it doesn’t push any filter information down to the underlying ORC reader. The 
only filtering possible right now is at the partition level, and it is done 
generically for all Hive tables.

ORC internally contains file-level, stripe-level and row-level statistics, 
including information such as min and max values. For more information, refer to 
https://orc.apache.org/docs/indexes.html

The proposal here is to introduce a new PXF profile optimized for ORC files 
that leverages these statistics to improve the performance of HAWQ queries with 
predicates. We will also use the vectorized approach while reading, as opposed 
to the existing reader, which is row-based and more expensive.
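
As a rough illustration of why the min/max statistics matter, the sketch below 
shows how a reader could decide which stripes to skip for a simple predicate. 
It is not PXF or ORC code: StripeStats, stripes_to_read and the sample values 
are hypothetical names and data made up for this example, and a real ORC reader 
handles many more types, operators and null cases.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class StripeStats:
    """Hypothetical stand-in for the min/max statistics kept per stripe."""
    stripe_id: int
    min_value: int
    max_value: int


def stripes_to_read(stats: List[StripeStats], op: str, literal: int) -> List[int]:
    """Return ids of stripes that may contain rows matching `column <op> literal`."""

    def may_match(s: StripeStats) -> bool:
        if op == "=":
            return s.min_value <= literal <= s.max_value
        if op == "<":
            return s.min_value < literal
        if op == "<=":
            return s.min_value <= literal
        if op == ">":
            return s.max_value > literal
        if op == ">=":
            return s.max_value >= literal
        # Unknown operator: be conservative and read the stripe.
        return True

    return [s.stripe_id for s in stats if may_match(s)]


# Made-up statistics for three stripes of a single integer column.
stats = [StripeStats(0, 1, 100), StripeStats(1, 101, 200), StripeStats(2, 201, 300)]

# For `column > 150`, stripe 0 cannot contain a match and is skipped.
selected = stripes_to_read(stats, ">", 150)
print(selected)
```

Without push down, all three stripes would be read and every row filtered 
afterwards on the HAWQ side; with the predicate available to the reader, 
stripes whose value range excludes the predicate never need to be decoded.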


  was:
Currently HAWQ when reading ORC files via PXF (using the default Hive profile) 
doesn’t pass any of the filter information down. The only filter that is 
possible right now is at the level of partition and is generically done for all 
Hive tables.

ORC internally contains file level, stripe level and row level statistics 
including information such as min,max values etc. For more information refer to 
https://orc.apache.org/docs/indexes.html

The proposal here is to possibly introduce a new profile optimized for ORC 
files and to leverage these stats to improve the performance of HAWQ queries 
with predicates. We will also use the Vectorized approach while reading as 
opposed to the existing reader which is row based and more expensive.



> Support PXF filter push down for ORC
> ------------------------------------
>
>                 Key: HAWQ-886
>                 URL: https://issues.apache.org/jira/browse/HAWQ-886
>             Project: Apache HAWQ
>          Issue Type: New Feature
>          Components: PXF
>            Reporter: Shivram Mani
>            Assignee: Shivram Mani
>             Fix For: 2.1.0
>
>
> Currently, when HAWQ reads ORC files via PXF (using the default Hive 
> profile), it doesn’t push any filter information down to the underlying ORC 
> reader. The only filtering possible right now is at the partition level, and 
> it is done generically for all Hive tables.
> ORC internally contains file-level, stripe-level and row-level statistics, 
> including information such as min and max values. For more information, refer 
> to https://orc.apache.org/docs/indexes.html
> The proposal here is to introduce a new PXF profile optimized for ORC files 
> that leverages these statistics to improve the performance of HAWQ queries with 
> predicates. We will also use the vectorized approach while reading, as opposed 
> to the existing reader, which is row-based and more expensive.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
