Great news, Stephan!

Why not make the code available as a fork of Flink under Alibaba's GitHub
account? That would let us do easy diffs in the GitHub UI and create PRs
from cherry-picked commits if needed. I can imagine that the Blink codebase
has a lot of branches of its own, so just pushing a couple of branches to
the main Flink repo is not ideal. Looking forward to it!

Cheers, Fokko

On Tue, 22 Jan 2019 at 03:48, Shaoxuan Wang <wshaox...@gmail.com> wrote:

> big +1 to contribute Blink codebase directly into the Apache Flink project.
> Looking forward to the new journey.
>
> Regards,
> Shaoxuan
>
> On Tue, Jan 22, 2019 at 3:52 AM Xiaowei Jiang <xiaow...@gmail.com> wrote:
>
> > Thanks, Stephan! We are hoping to make the process as non-disruptive as
> > possible to the Flink community. Making the Blink codebase public is the
> > first step that hopefully facilitates further discussions.
> > Xiaowei
> >
> > On Monday, January 21, 2019, 11:46:28 AM PST, Stephan Ewen <se...@apache.org> wrote:
> >
> >  Dear Flink Community!
> >
> > Some of you may have heard it already from announcements or from a Flink
> > Forward talk:
> > Alibaba has decided to open source its in-house improvements to Flink,
> > called Blink!
> > First of all, big thanks to the team that developed these improvements and
> > made this contribution possible!
> >
> > Blink has some very exciting enhancements, most prominently on the Table
> > API/SQL side
> > and the unified execution of these programs. For batch (bounded) data, the
> > SQL execution
> > has full TPC-DS coverage (which is a big deal), and the execution is more
> > than 10x faster
> > than the current SQL runtime in Flink. Blink has also added support for
> > catalogs,
> > improved the failover speed of batch queries and the resource management.
> > It also
> > makes some good steps in the direction of more deeply unifying the batch
> > and streaming
> > execution.
> >
> > The proposal is to merge Blink's enhancements into Flink, to give Flink's
> > SQL/Table API and
> > execution a big boost in usability and performance.
> >
> > Just to avoid any confusion: This is not a suggested change of focus to
> > batch processing,
> > nor would this break with any of the streaming architecture and vision of
> > Flink.
> > This contribution follows very much the principle of "batch is a special
> > case of streaming".
> > As a special case, batch makes special optimizations possible. In its
> > current state,
> > Flink does not exploit many of these optimizations. This contribution adds
> > exactly these
> > optimizations and makes the streaming model of Flink applicable to harder
> > batch use cases.
> >
> > Assuming that the community is excited about this as well, and in favor of
> > these enhancements
> > to Flink's capabilities, below are some thoughts on how this contribution
> > and integration
> > could work.
> >
> > --- Making the code available ---
> >
> > At the moment, the Blink code is in the form of a big Flink fork (rather
> > than isolated
> > patches on top of Flink), so the integration is unfortunately not as easy
> > as merging a
> > few patches or pull requests.
> >
> > To support a non-disruptive merge of such a big contribution, I believe it
> > makes sense to make
> > the code of the fork available in the Flink project first.
> > From there on, we can start to work on the details for merging the
> > enhancements, including
> > the refactoring of the necessary parts in the Flink master and the Blink
> > code to make a
> > merge possible without repeatedly breaking compatibility.
> >
> > The first question is where do we put the code of the Blink fork during the
> > merging procedure?
> > My first thought was to temporarily add a repository (like
> > "flink-blink-staging"), but we could
> > also put it into a special branch in the main Flink repository.
> >
> >
> > I will start a separate thread about discussing a possible strategy to
> > handle and merge
> > such a big contribution.
> >
> > Best,
> > Stephan
> >
>
