[ https://issues.apache.org/jira/browse/PARQUET-131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14233942#comment-14233942 ]
Dong Chen commented on PARQUET-131:
-----------------------------------
Thanks [~brocknoland], PrestoDB team, and Drill team for the progress and plan!
Adding the allocator interface is a good idea.
Looking forward to the POC. I hope I can help with the Hive part then.
> Supporting Vectorized APIs in Parquet
> -------------------------------------
>
> Key: PARQUET-131
> URL: https://issues.apache.org/jira/browse/PARQUET-131
> Project: Parquet
> Issue Type: Improvement
> Components: parquet-mr
> Reporter: Zhenxiao Luo
> Assignee: Zhenxiao Luo
> Attachments: Parquet-Vectorized-APIs.pdf, ParquetInPresto.pdf
>
>
> Vectorized Query Execution could bring big performance improvements to SQL
> engines like Hive, Drill, and Presto. Instead of processing one row at a
> time, Vectorized Query Execution streamlines operations by processing a
> batch of rows at a time. Within one batch, each column is represented as a
> vector of a primitive data type. SQL engines can apply predicates very
> efficiently on these vectors, rather than sending a single row through all
> the operators before the next row can be processed.
> Since Parquet is an efficient columnar data representation, it would be nice
> if it supported Vectorized APIs, so that all SQL engines could read vectors
> from Parquet files and do vectorized execution on the Parquet file format.
>
> Detailed proposal:
> https://gist.github.com/zhenxiao/2728ce4fe0a7be2d3b30
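
To make the batch-and-vector idea in the description concrete, here is a minimal Java sketch. It is not the API from the linked proposal or from parquet-mr: RowBatch, selectPriceGreaterThan, sumQuantity, and the price/quantity columns are all hypothetical names made up for illustration. It only shows a batch whose columns are primitive arrays, a predicate evaluated over one column in a tight loop, and a later operator that reads only the selected rows.

```java
import java.util.Arrays;

// Hypothetical sketch: none of these names come from Parquet, Hive, Drill, or Presto.
final class VectorizedFilterSketch {

    /** A batch of rows stored column-wise: one primitive array per column. */
    static final class RowBatch {
        final long[] price;    // hypothetical column of longs
        final int[] quantity;  // hypothetical column of ints
        final int size;        // number of rows in this batch

        RowBatch(long[] price, int[] quantity) {
            if (price.length != quantity.length) {
                throw new IllegalArgumentException("column lengths differ");
            }
            this.price = price;
            this.quantity = quantity;
            this.size = price.length;
        }
    }

    /** Evaluates "price > threshold" over the whole column and returns the qualifying row indices. */
    static int[] selectPriceGreaterThan(RowBatch batch, long threshold) {
        int[] selected = new int[batch.size];
        int count = 0;
        for (int i = 0; i < batch.size; i++) {
            if (batch.price[i] > threshold) {
                selected[count++] = i;
            }
        }
        return Arrays.copyOf(selected, count);
    }

    /** A downstream operator: sums quantity over the selected rows only. */
    static long sumQuantity(RowBatch batch, int[] selected) {
        long total = 0;
        for (int idx : selected) {
            total += batch.quantity[idx];
        }
        return total;
    }

    public static void main(String[] args) {
        RowBatch batch = new RowBatch(new long[] {5, 20, 7, 30}, new int[] {1, 2, 3, 4});
        int[] selected = selectPriceGreaterThan(batch, 10);
        System.out.println(Arrays.toString(selected)); // rows 1 and 3 pass the predicate
        System.out.println(sumQuantity(batch, selected));
    }
}
```

The selection-vector approach used here is one common way to pass filter results between operators; an engine could just as well compact the batch or use a bitmap.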