A big +1 to contributing the Blink codebase directly into the Apache Flink project. Looking forward to the new journey.
Regards,
Shaoxuan

On Tue, Jan 22, 2019 at 3:52 AM Xiaowei Jiang <xiaow...@gmail.com> wrote:

> Thanks Stephan! We are hoping to make the process as non-disruptive as
> possible to the Flink community. Making the Blink codebase public is the
> first step that hopefully facilitates further discussions.
>
> Xiaowei
>
> On Monday, January 21, 2019, 11:46:28 AM PST, Stephan Ewen <se...@apache.org> wrote:
>
> Dear Flink Community!
>
> Some of you may have heard it already from announcements or from a Flink
> Forward talk: Alibaba has decided to open source its in-house improvements
> to Flink, called Blink! First of all, a big thanks to the team that
> developed these improvements and made this contribution possible!
>
> Blink has some very exciting enhancements, most prominently on the Table
> API/SQL side and the unified execution of these programs. For batch
> (bounded) data, the SQL execution has full TPC-DS coverage (which is a big
> deal), and the execution is more than 10x faster than the current SQL
> runtime in Flink. Blink has also added support for catalogs, improved the
> failover speed of batch queries, and improved resource management. It also
> takes some good steps toward more deeply unifying batch and streaming
> execution.
>
> The proposal is to merge Blink's enhancements into Flink, to give Flink's
> SQL/Table API and execution a big boost in usability and performance.
>
> Just to avoid any confusion: this is not a suggested change of focus to
> batch processing, nor would this break with any of the streaming
> architecture and vision of Flink. This contribution follows very much the
> principle of "batch is a special case of streaming". As a special case,
> batch makes special optimizations possible. In its current state, Flink
> does not exploit many of these optimizations. This contribution adds
> exactly these optimizations and makes the streaming model of Flink
> applicable to harder batch use cases.
> Assuming that the community is excited about this as well, and in favor of
> these enhancements to Flink's capabilities, below are some thoughts on how
> this contribution and integration could work.
>
> --- Making the code available ---
>
> At the moment, the Blink code is in the form of a big Flink fork (rather
> than isolated patches on top of Flink), so the integration is unfortunately
> not as easy as merging a few patches or pull requests.
>
> To support a non-disruptive merge of such a big contribution, I believe it
> makes sense to make the code of the fork available in the Flink project
> first. From there on, we can start to work on the details of merging the
> enhancements, including the refactoring of the necessary parts in the
> Flink master and the Blink code to make a merge possible without
> repeatedly breaking compatibility.
>
> The first question is where to put the code of the Blink fork during the
> merging procedure. My first thought was to temporarily add a repository
> (like "flink-blink-staging"), but we could also put it into a special
> branch in the main Flink repository.
>
> I will start a separate thread to discuss a possible strategy for handling
> and merging such a big contribution.
>
> Best,
> Stephan
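For readers following along: the "special branch" option mentioned above can be demonstrated with plain git. This is only a minimal sketch under stated assumptions; the local paths, the `blink-staging` branch name, and the stand-in repositories below are all hypothetical, not the actual Flink or Blink repositories.

```shell
# Hypothetical sketch of the "special branch" option: import the full
# history of a fork as a new branch of the main repository. All paths
# and the branch name "blink-staging" are illustrative stand-ins.
set -e
DEMO="${TMPDIR:-/tmp}/blink-merge-demo"
rm -rf "$DEMO" && mkdir -p "$DEMO"

# Stand-in for the main Flink repository.
git -c init.defaultBranch=master init -q "$DEMO/flink"
git -C "$DEMO/flink" -c user.email=dev@example.org -c user.name=demo \
    commit -q --allow-empty -m "flink: initial commit"

# Stand-in for the Blink fork (in reality a large, diverged fork).
git clone -q "$DEMO/flink" "$DEMO/blink"
git -C "$DEMO/blink" -c user.email=dev@example.org -c user.name=demo \
    commit -q --allow-empty -m "blink: in-house enhancements"

# Import the fork's history as a dedicated branch; master is untouched.
git -C "$DEMO/flink" remote add blink "$DEMO/blink"
git -C "$DEMO/flink" fetch -q blink
git -C "$DEMO/flink" branch blink-staging blink/master
git -C "$DEMO/flink" branch --list   # shows blink-staging alongside master
```

The appeal of this route over a separate staging repository is that the fork's history ends up in the same object store as master, so later refactoring commits can be cherry-picked or merged across without a second remote.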