Thanks, Stephan! We hope to make the process as non-disruptive as
possible for the Flink community. Making the Blink codebase public is the
first step, which we hope will facilitate further discussion.
Xiaowei

    On Monday, January 21, 2019, 11:46:28 AM PST, Stephan Ewen 
<se...@apache.org> wrote:  
 
 Dear Flink Community!

Some of you may have heard it already from announcements or from a Flink
Forward talk:
Alibaba has decided to open source its in-house improvements to Flink,
called Blink!
First of all, big thanks to the team that developed these improvements and
made this contribution possible!

Blink has some very exciting enhancements, most prominently on the Table
API/SQL side and the unified execution of these programs. For batch
(bounded) data, the SQL execution has full TPC-DS coverage (which is a big
deal), and the execution is more than 10x faster than the current SQL
runtime in Flink. Blink has also added support for catalogs, improved the
failover speed of batch queries and the resource management. It also makes
some good steps in the direction of more deeply unifying the batch and
streaming execution.

The proposal is to merge Blink's enhancements into Flink, to give Flink's
SQL/Table API and execution a big boost in usability and performance.

Just to avoid any confusion: This is not a suggested change of focus to
batch processing, nor would this break with any of the streaming
architecture and vision of Flink.
This contribution follows very much the principle of "batch is a special
case of streaming".
As a special case, batch makes special optimizations possible. In its
current state, Flink does not exploit many of these optimizations. This
contribution adds exactly these optimizations and makes the streaming model
of Flink applicable to harder batch use cases.

Assuming that the community is excited about this as well, and in favor of
these enhancements to Flink's capabilities, below are some thoughts on how
this contribution and integration could work.

--- Making the code available ---

At the moment, the Blink code is in the form of a big Flink fork (rather
than isolated patches on top of Flink), so the integration is unfortunately
not as easy as merging a few patches or pull requests.

To support a non-disruptive merge of such a big contribution, I believe it
makes sense to make the code of the fork available in the Flink project
first.
From there on, we can start to work on the details for merging the
enhancements, including the refactoring of the necessary parts in the Flink
master and the Blink code to make a merge possible without repeatedly
breaking compatibility.

The first question is where do we put the code of the Blink fork during the
merging procedure?
My first thought was to temporarily add a repository (like
"flink-blink-staging"), but we could also put it into a special branch in
the main Flink repository.


I will start a separate thread to discuss a possible strategy to handle and
merge such a big contribution.

Best,
Stephan
  
