Liya Fan created FLINK-13053:
--------------------------------

             Summary: Vectorization Support in Flink
                 Key: FLINK-13053
                 URL: https://issues.apache.org/jira/browse/FLINK-13053
             Project: Flink
          Issue Type: New Feature
          Components: Table SQL / Runtime
            Reporter: Liya Fan
            Assignee: Liya Fan
         Attachments: image-2019-07-02-15-26-39-550.png

Vectorization is a popular technique in SQL engines today. Compared with the 
traditional row-based approach, it has some distinct advantages, for example:

 * Better use of CPU resources (e.g. SIMD)
 * More compact memory layout
 * Friendlier to compressed data formats
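
To make the contrast concrete, here is a minimal sketch of the same filtered 
aggregation evaluated row by row and over columnar arrays. It is an 
illustration only: the class, field, and method names are hypothetical, are 
not Flink APIs, and are not taken from the proposal.

```java
import java.util.Arrays;
import java.util.List;

/** Illustration only: all names here are hypothetical, not Flink APIs. */
final class VectorizationSketch {

    /** Row-based layout: one object per record, fields stored together. */
    static final class OrderRow {
        final int quantity;
        final double price;

        OrderRow(int quantity, double price) {
            this.quantity = quantity;
            this.price = price;
        }
    }

    /** Row-at-a-time evaluation: every record is reached through an object reference. */
    static double revenueRowBased(List<OrderRow> rows, int minQuantity) {
        double total = 0.0;
        for (OrderRow row : rows) {
            if (row.quantity >= minQuantity) {
                total += row.quantity * row.price;
            }
        }
        return total;
    }

    /**
     * Batch-at-a-time evaluation over a columnar layout: one primitive array
     * per field, with index i identifying the same record in both arrays.
     * The loop runs over contiguous memory, which is the kind of code a JIT
     * compiler has a chance to turn into SIMD instructions.
     */
    static double revenueColumnar(int[] quantity, double[] price, int minQuantity) {
        double total = 0.0;
        for (int i = 0; i < quantity.length; i++) {
            if (quantity[i] >= minQuantity) {
                total += quantity[i] * price[i];
            }
        }
        return total;
    }

    public static void main(String[] args) {
        List<OrderRow> rows = Arrays.asList(
                new OrderRow(1, 2.0), new OrderRow(5, 3.0), new OrderRow(10, 1.5));
        int[] quantity = {1, 5, 10};
        double[] price = {2.0, 3.0, 1.5};

        System.out.println(revenueRowBased(rows, 5));
        System.out.println(revenueColumnar(quantity, price, 5));
    }
}
```

Both methods compute the same result. The difference is only in how the data 
is laid out and traversed, which is where the advantages listed above come 
from.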

 

Currently, Flink's SQL engine is row-based for both stream and batch 
workloads. To gain the above benefits, we want to bring vectorization to 
Flink. This involves substantial changes to the existing code base. Therefore, 
we propose a plan to carry out these changes in small, incremental steps, so 
that existing features are not affected. We want to apply it to batch 
workloads first. The details can be found in our proposal.

 

Over the past months, we have developed an initial implementation of the above 
ideas. Early evaluations on TPC-H benchmarks show that vectorization yields 
substantial performance improvements (see the figure below). More details can 
be found in our proposal.

  !image-2019-07-02-15-26-39-550.png!

Special thanks to @Kurt Young’s team for all the kind help.

Special thanks to @Piotr Nowojski for all the valuable feedback and helpful 
suggestions.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
