Hi Jark,

Thanks for your feedback. According to my initial assessment, the work effort is relatively large.
Moreover, I will add the test results for all queries to the FLIP.

Best,
Ron

On Thu, Jun 1, 2023 at 20:45, Jark Wu <imj...@gmail.com> wrote:

> Hi Ron,
>
> Thanks a lot for the great proposal. The FLIP looks good to me in general.
> It does not look like easy work, but the performance sounds promising, so I
> think it's worth doing.
>
> Besides, if there is a complete test graph with all TPC-DS queries, the
> effect of this FLIP will be more intuitive.
>
> Best,
> Jark
>
> On Wed, 31 May 2023 at 14:27, liu ron <ron9....@gmail.com> wrote:
>
> > Hi, Jingsong
> >
> > Thanks for your valuable suggestions.
> >
> > Best,
> > Ron
> >
> > On Tue, May 30, 2023 at 13:22, Jingsong Li <jingsongl...@gmail.com> wrote:
> >
> > > Thanks Ron for your information.
> > >
> > > I suggest that it can be written in the Motivation of FLIP.
> > >
> > > Best,
> > > Jingsong
> > >
> > > On Tue, May 30, 2023 at 9:57 AM liu ron <ron9....@gmail.com> wrote:
> > > >
> > > > Hi, Jingsong
> > > >
> > > > Thanks for your review. We have tested it on TPC-DS and got a 12%
> > > > gain overall when supporting only the Calc, HashJoin, and HashAgg
> > > > operators. In some queries, we even get more than 30% gain, so it
> > > > looks like an effective approach.
> > > >
> > > > Best,
> > > > Ron
> > > >
> > > > On Mon, May 29, 2023 at 14:33, Jingsong Li <jingsongl...@gmail.com> wrote:
> > > >
> > > > > Thanks Ron for the proposal.
> > > > >
> > > > > Do you have some benchmark results for the performance improvement? I
> > > > > am more concerned about the improvement on Flink than the data in
> > > > > other papers.
> > > > >
> > > > > Best,
> > > > > Jingsong
> > > > >
> > > > > On Mon, May 29, 2023 at 2:16 PM liu ron <ron9....@gmail.com> wrote:
> > > > > >
> > > > > > Hi, dev
> > > > > >
> > > > > > I'd like to start a discussion about FLIP-315: Support Operator
> > > > > > Fusion Codegen for Flink SQL[1]
> > > > > >
> > > > > > As main memory grows, query performance is more and more determined
> > > > > > by the raw CPU costs of query processing itself. This is because
> > > > > > query processing techniques based on interpreted execution show poor
> > > > > > performance on modern CPUs due to lack of locality and frequent
> > > > > > instruction mis-prediction. Therefore, the industry is also
> > > > > > researching how to improve engine performance by increasing operator
> > > > > > execution efficiency. In addition, while optimizing Flink's
> > > > > > performance for TPC-DS queries, we found that a significant amount of
> > > > > > CPU time was spent on virtual function calls, framework collector
> > > > > > calls, and invalid calculations, which can be optimized to improve
> > > > > > the overall engine performance. After some investigation, we found
> > > > > > that Operator Fusion Codegen, which was proposed by Thomas Neumann in
> > > > > > the paper[2], can address these problems. I have finished a PoC[3] to
> > > > > > verify its feasibility and validity.
> > > > > >
> > > > > > Looking forward to your feedback.
> > > > > >
> > > > > > [1]: https://cwiki.apache.org/confluence/display/FLINK/FLIP-315+Support+Operator+Fusion+Codegen+for+Flink+SQL
> > > > > > [2]: http://www.vldb.org/pvldb/vol4/p539-neumann.pdf
> > > > > > [3]: https://github.com/lsyldliu/flink/tree/OFCG
> > > > > >
> > > > > > Best,
> > > > > > Ron