[ https://issues.apache.org/jira/browse/FLINK-11899?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17014909#comment-17014909 ]
Jingsong Lee commented on FLINK-11899:
--------------------------------------

Hi [~hpeter], another way is to re-write the readers instead of re-using the Hive readers, just like: [https://github.com/flink-tpc-ds/flink/tree/tpcds-master/flink-table/flink-table-runtime-blink/src/main/java/org/apache/flink/table/runtime/parquet]

I think that is the better way; otherwise we will have to deal with multiple Hive versions again, which is annoying.

> Introduce vectorized parquet InputFormat for blink runtime
> ----------------------------------------------------------
>
>                 Key: FLINK-11899
>                 URL: https://issues.apache.org/jira/browse/FLINK-11899
>             Project: Flink
>          Issue Type: Sub-task
>          Components: Table SQL / Runtime
>            Reporter: Jingsong Lee
>            Assignee: Zhenqiu Huang
>            Priority: Major
>             Fix For: 1.11.0
>
>
> VectorizedParquetInputFormat is introduced to read parquet data in batches.
> When returning each row of data, instead of actually retrieving each field,
> we use BaseRow's abstraction to return a columnar, row-like view. This will
> greatly improve downstream filtering scenarios, since there is no need to
> access redundant fields of the filtered data.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
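The columnar row-view idea in the issue description can be sketched as follows. This is a minimal illustration, not Flink's actual API: the class names (`IntColumnVector`, `ColumnarRowView`, `ColumnarRowSketch`) and the two-column example data are hypothetical. The point it demonstrates is that a row object is only a cursor over column vectors, so a downstream filter that rejects a row on one field never touches the row's other fields.

```java
// Hypothetical sketch of a columnar, row-like view over a batch of data.
// Not Flink's real VectorizedParquetInputFormat or BaseRow API.
public class ColumnarRowSketch {

    // One column of a batch, stored as a primitive array.
    static final class IntColumnVector {
        final int[] values;
        IntColumnVector(int[] values) { this.values = values; }
        int get(int rowId) { return values[rowId]; }
    }

    // A row "view": no per-row object materialization. Fields are fetched
    // from the underlying column vectors only when they are accessed.
    static final class ColumnarRowView {
        private final IntColumnVector[] columns;
        private int rowId;
        ColumnarRowView(IntColumnVector[] columns) { this.columns = columns; }
        void setRowId(int rowId) { this.rowId = rowId; }
        int getInt(int colId) { return columns[colId].get(rowId); }
    }

    // Sum column 1 over the rows where column 0 passes a filter.
    // Column 1 is read only for rows that pass, illustrating why a
    // lazy row view helps filtered scenarios.
    static int filteredSum() {
        IntColumnVector c0 = new IntColumnVector(new int[]{1, 2, 3});
        IntColumnVector c1 = new IntColumnVector(new int[]{10, 20, 30});
        ColumnarRowView row = new ColumnarRowView(new IntColumnVector[]{c0, c1});

        int sum = 0;
        for (int r = 0; r < 3; r++) {
            row.setRowId(r);
            if (row.getInt(0) > 1) {   // filter touches column 0 only
                sum += row.getInt(1);  // column 1 read only for passing rows
            }
        }
        return sum;
    }

    public static void main(String[] args) {
        System.out.println(filteredSum()); // prints 50 (20 + 30)
    }
}
```

With an eager row format, all fields of every row would be copied out of the batch before the filter runs; with the view, filtered-out rows cost only one column access.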