Great news, Stephan! Why not make the code available as a fork of Flink under Alibaba's GitHub account? That would allow us to do easy diffs in the GitHub UI and create PRs of cherry-picked commits if needed. I can imagine that the Blink codebase has a lot of branches of its own, so just pushing a couple of branches to the main Flink repo is not ideal. Looking forward to it!
Cheers, Fokko

On Tue, 22 Jan 2019 at 03:48, Shaoxuan Wang <wshaox...@gmail.com> wrote:
> Big +1 to contributing the Blink codebase directly into the Apache Flink
> project. Looking forward to the new journey.
>
> Regards,
> Shaoxuan
>
> On Tue, Jan 22, 2019 at 3:52 AM Xiaowei Jiang <xiaow...@gmail.com> wrote:
> > Thanks Stephan! We are hoping to make the process as non-disruptive as
> > possible to the Flink community. Making the Blink codebase public is
> > the first step that hopefully facilitates further discussions.
> >
> > Xiaowei
> >
> > On Monday, January 21, 2019, 11:46:28 AM PST, Stephan Ewen
> > <se...@apache.org> wrote:
> >
> > Dear Flink Community!
> >
> > Some of you may have heard it already from announcements or from a
> > Flink Forward talk: Alibaba has decided to open source its in-house
> > improvements to Flink, called Blink!
> > First of all, big thanks to the team that developed these improvements
> > and made this contribution possible!
> >
> > Blink has some very exciting enhancements, most prominently on the
> > Table API/SQL side and the unified execution of these programs. For
> > batch (bounded) data, the SQL execution has full TPC-DS coverage
> > (which is a big deal), and the execution is more than 10x faster than
> > the current SQL runtime in Flink. Blink has also added support for
> > catalogs, improved the failover speed of batch queries, and improved
> > resource management. It also makes some good steps in the direction of
> > more deeply unifying the batch and streaming execution.
> >
> > The proposal is to merge Blink's enhancements into Flink, to give
> > Flink's SQL/Table API and execution a big boost in usability and
> > performance.
> >
> > Just to avoid any confusion: this is not a suggested change of focus
> > to batch processing, nor would this break with any of the streaming
> > architecture and vision of Flink.
> > This contribution follows very much the principle of "batch is a
> > special case of streaming". As a special case, batch makes special
> > optimizations possible. In its current state, Flink does not exploit
> > many of these optimizations. This contribution adds exactly these
> > optimizations and makes the streaming model of Flink applicable to
> > harder batch use cases.
> >
> > Assuming that the community is excited about this as well, and in
> > favor of these enhancements to Flink's capabilities, below are some
> > thoughts on how this contribution and integration could work.
> >
> > --- Making the code available ---
> >
> > At the moment, the Blink code is in the form of a big Flink fork
> > (rather than isolated patches on top of Flink), so the integration is
> > unfortunately not as easy as merging a few patches or pull requests.
> >
> > To support a non-disruptive merge of such a big contribution, I
> > believe it makes sense to make the code of the fork available in the
> > Flink project first. From there on, we can start to work on the
> > details of merging the enhancements, including the refactoring of the
> > necessary parts in the Flink master and the Blink code to make a merge
> > possible without repeatedly breaking compatibility.
> >
> > The first question is where to put the code of the Blink fork during
> > the merging procedure. My first thought was to temporarily add a
> > repository (like "flink-blink-staging"), but we could also put it into
> > a special branch in the main Flink repository.
> >
> > I will start a separate thread to discuss a possible strategy for
> > handling and merging such a big contribution.
> >
> > Best,
> > Stephan