Yash Datta created PARQUET-128:
----------------------------------

             Summary: Optimize the parquet RecordReader implementation when filterpredicate is pushed down
                 Key: PARQUET-128
                 URL: https://issues.apache.org/jira/browse/PARQUET-128
             Project: Parquet
          Issue Type: Improvement
          Components: parquet-mr
    Affects Versions: 1.6.0rc2
            Reporter: Yash Datta
             Fix For: parquet-mr_1.6.0


The RecordReader implementation currently reads all the columns before 
applying the filter predicate and deciding whether to keep or discard the row.

We could instead have a RecordReader that first assembles only the columns on 
which filters are applied (usually a few), evaluates the filter, and then 
either assembles the remaining columns if the row is kept or skips them if it 
is discarded.
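
To make the idea concrete, here is a minimal sketch of the two strategies. It 
is a toy model, not the parquet-mr API: the class and method names are 
hypothetical, columns are plain int arrays, and only a single filter column 
is handled. A real reader would also have to advance the skipped column 
readers so they stay aligned with the current row.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntPredicate;

/**
 * Toy model of filter-first record assembly. This is NOT the parquet-mr API:
 * every name here is hypothetical, and columns are plain in-memory int arrays.
 */
public class FilterFirstAssemblySketch {

    /** Counts column values materialized, so the two strategies can be compared. */
    static long valuesAssembled = 0;

    /** Current behaviour: assemble every column of every row, then apply the filter. */
    static List<int[]> readEager(int[][] columns, int filterColumn, IntPredicate predicate) {
        List<int[]> kept = new ArrayList<>();
        int rowCount = columns[0].length;
        for (int row = 0; row < rowCount; row++) {
            int[] record = new int[columns.length];
            for (int col = 0; col < columns.length; col++) {
                record[col] = columns[col][row];
                valuesAssembled++;
            }
            if (predicate.test(record[filterColumn])) {
                kept.add(record);
            }
        }
        return kept;
    }

    /** Proposed behaviour: read the filter column first, assemble the rest only on a match. */
    static List<int[]> readFilterFirst(int[][] columns, int filterColumn, IntPredicate predicate) {
        List<int[]> kept = new ArrayList<>();
        int rowCount = columns[0].length;
        for (int row = 0; row < rowCount; row++) {
            int filterValue = columns[filterColumn][row];
            valuesAssembled++;
            if (!predicate.test(filterValue)) {
                continue; // row discarded: the remaining columns are never assembled
            }
            int[] record = new int[columns.length];
            record[filterColumn] = filterValue;
            for (int col = 0; col < columns.length; col++) {
                if (col != filterColumn) {
                    record[col] = columns[col][row];
                    valuesAssembled++;
                }
            }
            kept.add(record);
        }
        return kept;
    }
}
```

Both methods return the same rows; they differ only in how many column 
values are touched for the rows that the filter rejects.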

The performance improvement from this change is seen to be significant. It is 
greatest when filtering returns only a small number of rows (which is usually 
the case) and there are many columns.
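
A rough way to see why selectivity and column count matter is to count the 
column values each strategy has to assemble. The sketch below is a 
back-of-the-envelope model only: the class and method names are hypothetical, 
the inputs in the note after it are made-up illustrative numbers rather than 
measurements, and it ignores I/O, decoding and the cost of skipping values.

```java
/**
 * Back-of-the-envelope count of column values assembled per strategy.
 * Hypothetical helper, not part of parquet-mr; ignores I/O and decoding costs.
 */
public class AssemblyCostSketch {

    /** Eager strategy: every column of every row is assembled. */
    static long eagerCost(long rows, int totalColumns) {
        return rows * totalColumns;
    }

    /**
     * Filter-first strategy: the filter columns are assembled for every row,
     * the remaining columns only for rows that pass the filter.
     */
    static long filterFirstCost(long rows, int totalColumns, int filterColumns, long matchingRows) {
        return rows * filterColumns + matchingRows * (totalColumns - filterColumns);
    }
}
```

For example, with 1,000,000 rows, 50 columns, 2 filter columns and 10,000 
matching rows, eagerCost gives 50,000,000 values and filterFirstCost gives 
2,480,000. When every row matches, the two counts are equal, so in this model 
the saving comes entirely from the rejected rows.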



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
