Zhenxiao Luo created PARQUET-131:
------------------------------------

             Summary: Supporting Vectorized APIs in Parquet
                 Key: PARQUET-131
                 URL: https://issues.apache.org/jira/browse/PARQUET-131
             Project: Parquet
          Issue Type: Improvement
          Components: parquet-mr
            Reporter: Zhenxiao Luo
            Assignee: Zhenxiao Luo


Vectorized query execution could significantly improve performance for SQL
engines such as Hive, Drill, and Presto. Instead of processing one row at a
time, a vectorized engine processes a batch of rows at a time. Within a batch,
each column is represented as a vector of a primitive data type. SQL engines
can apply predicates to these vectors very efficiently, rather than pushing
each row through every operator before the next row can be processed.
Parquet is an efficient columnar data representation, so it would be nice if
it supported vectorized APIs: all SQL engines could then read vectors directly
from Parquet files and run vectorized execution over the Parquet file format.
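
To make the batch idea concrete, here is a minimal sketch of applying a
predicate to one column vector. It is only an illustration: the names
LongColumnVector and filterGreaterThan are hypothetical and are not part of
parquet-mr or of the linked proposal.

```java
import java.util.Arrays;

public class VectorizedFilterSketch {

    /** Hypothetical column vector: one primitive array per column in a batch. */
    static final class LongColumnVector {
        final long[] values;

        LongColumnVector(long[] values) {
            this.values = values;
        }
    }

    /**
     * Evaluates "value > threshold" over the first batchSize entries of the
     * column. Indexes of matching rows are written to selected, and the
     * number of matches is returned.
     */
    static int filterGreaterThan(LongColumnVector col, int batchSize,
                                 long threshold, int[] selected) {
        int count = 0;
        // One tight loop over a primitive array, instead of sending each
        // row through the whole operator pipeline one at a time.
        for (int i = 0; i < batchSize; i++) {
            if (col.values[i] > threshold) {
                selected[count++] = i;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // A batch of five rows for a single (assumed) BIGINT column.
        LongColumnVector price = new LongColumnVector(new long[] {5, 42, 17, 99, 3});
        int[] selected = new int[price.values.length];
        int n = filterGreaterThan(price, price.values.length, 10L, selected);
        // Row indexes that passed the predicate.
        System.out.println(Arrays.toString(Arrays.copyOf(selected, n)));
    }
}
```

Downstream operators would then only visit the row indexes in selected, which
is one common way for a vectorized engine to carry filter results along with
a batch.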
 
Detailed proposal:
https://gist.github.com/zhenxiao/2728ce4fe0a7be2d3b30



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
