[ https://issues.apache.org/jira/browse/FLINK-11899?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17014909#comment-17014909 ]

Jingsong Lee commented on FLINK-11899:
--------------------------------------

Hi [~hpeter], another way is re-writing the readers instead of re-using the 
Hive readers, for example:

[https://github.com/flink-tpc-ds/flink/tree/tpcds-master/flink-table/flink-table-runtime-blink/src/main/java/org/apache/flink/table/runtime/parquet]

I think that is the better way; otherwise we will have to deal with multiple 
Hive versions again, which is annoying.

> Introduce vectorized parquet InputFormat for blink runtime
> ----------------------------------------------------------
>
>                 Key: FLINK-11899
>                 URL: https://issues.apache.org/jira/browse/FLINK-11899
>             Project: Flink
>          Issue Type: Sub-task
>          Components: Table SQL / Runtime
>            Reporter: Jingsong Lee
>            Assignee: Zhenqiu Huang
>            Priority: Major
>             Fix For: 1.11.0
>
>
> VectorizedParquetInputFormat is introduced to read Parquet data in batches.
> When returning each row of data, instead of actually retrieving each field, 
> we use BaseRow's abstraction to return a columnar, row-like view.
> This will greatly improve downstream filtering scenarios, since there is no 
> need to access redundant fields in rows that have been filtered out.
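
As a rough illustration of that columnar-view idea, here is a minimal, 
self-contained sketch in Java. The class and field names below are 
hypothetical stand-ins, not Flink's actual BaseRow / vectorized-batch API: 
the reader decodes whole column vectors at once, and each "row" handed 
downstream is only a cheap (batch, rowIndex) view.

public class ColumnarRowViewSketch {

    /** Hypothetical stand-in for a decoded Parquet column batch. */
    static final class ColumnBatch {
        final long[] idColumn;       // column 0, decoded in bulk
        final double[] priceColumn;  // column 1, decoded in bulk
        final int numRows;

        ColumnBatch(long[] ids, double[] prices) {
            this.idColumn = ids;
            this.priceColumn = prices;
            this.numRows = ids.length;
        }
    }

    /** Row-like view over one position in a batch; no per-row copying. */
    static final class ColumnarRowView {
        private ColumnBatch batch;
        private int rowIndex;

        void pointTo(ColumnBatch batch, int rowIndex) {
            this.batch = batch;
            this.rowIndex = rowIndex;
        }

        long getId()      { return batch.idColumn[rowIndex]; }
        double getPrice() { return batch.priceColumn[rowIndex]; }
    }

    public static void main(String[] args) {
        // Pretend this batch came from a vectorized Parquet reader.
        ColumnBatch batch = new ColumnBatch(
                new long[]  {1L, 2L, 3L, 4L},
                new double[]{9.5, 15.0, 3.2, 42.0});

        ColumnarRowView row = new ColumnarRowView();
        double total = 0;
        for (int i = 0; i < batch.numRows; i++) {
            row.pointTo(batch, i);       // reuse one view object per batch
            if (row.getId() % 2 == 0) {  // filter reads only the id column
                total += row.getPrice(); // other columns touched only for survivors
            }
        }
        System.out.println("sum of prices for even ids: " + total);
    }
}

The point of the sketch: decoding is amortized over a whole batch, and the 
filter on getId() never dereferences the price column for rows it rejects, 
which is exactly the saving the description above is after.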


