Exciting to see this happening! Regarding the docs, have we done a diff that shows how much Flink's and Blink's documentation (flink/docs) differ? For example, how many pages differ, and what percentage of each page has changed? How many new pages (for new features) does Blink have? If we had such a summary or visualization, it might give us a better idea of which approach to go with.
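Just to illustrate what I mean, a rough summary could come from a small script like the one below. This is only a sketch under assumptions: both doc trees are checked out locally, pages are Markdown (`*.md`) files, and `page_diff_summary` is a hypothetical helper, not an existing tool. The per-page score is derived from Python's `difflib.SequenceMatcher.ratio()`, so it is a rough dissimilarity measure rather than an exact percentage of changed lines.

```python
import difflib
from pathlib import Path


def page_diff_summary(flink_docs, blink_docs, pattern="*.md"):
    """Compare two documentation trees page by page (hypothetical helper).

    Returns (changed, new_pages):
      changed: for every page present in both trees, a 0-100 dissimilarity
               score (0 = identical, 100 = no lines in common)
      new_pages: pages that exist only in the Blink tree
    """
    flink_root, blink_root = Path(flink_docs), Path(blink_docs)
    flink_pages = {p.relative_to(flink_root) for p in flink_root.rglob(pattern)}
    blink_pages = {p.relative_to(blink_root) for p in blink_root.rglob(pattern)}

    changed = {}
    for rel in sorted(flink_pages & blink_pages):
        old = (flink_root / rel).read_text(encoding="utf-8").splitlines()
        new = (blink_root / rel).read_text(encoding="utf-8").splitlines()
        # ratio() returns 1.0 for identical line sequences and 0.0 when no lines match
        similarity = difflib.SequenceMatcher(None, old, new).ratio()
        changed[str(rel)] = round(100 * (1 - similarity), 1)

    new_pages = sorted(str(p) for p in blink_pages - flink_pages)
    return changed, new_pages
```

Running something like `page_diff_summary("flink/docs", "blink/docs")` on the two checkouts (paths assumed) would give a score for each shared page plus the list of Blink-only pages, which could then be sorted or plotted.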
Another perspective is that, although the main feature differences between Flink and Blink that the community is interested in are SQL/Table API and batch, Blink's code changes seem to be much more extensive and touch more modules and behaviors. As a user, I'd love to have a more consistent experience of understanding and trying Blink, and a separate versioned website would work best in that case.

Thanks,
Bowen

On Thu, Jan 24, 2019 at 4:22 AM Kurt Young <ykt...@gmail.com> wrote:

> Sure, I will do the rebase before pushing the branch.
>
> On Thu, Jan 24, 2019 at 18:20, Timo Walther <twal...@apache.org> wrote:
>
> > Regarding the content of a `blink-1.5` branch, is it possible to rebase the big Blink commit on top of the current master or the last Flink release?
> >
> > I don't mean a full rebase here, but just forking the branch from current Flink, putting the Blink content into the repository, and committing it. This would let us see in a diff which classes and lines have changed and which are still the same. I guess this would be much more helpful than a branch with one big commit that has no common origin.
> >
> > Thanks,
> > Timo
> >
> > On 24.01.19 at 02:54, Becket Qin wrote:
> >
> > > Thanks Stephan,
> > >
> > > The plan makes sense to me.
> > >
> > > Regarding the docs, it seems better to have a separate versioned website because there are a lot of changes spread all over the place. We can add a banner to remind users that they are looking at the Blink docs, which are temporary and will eventually be merged into Flink master. (The banner is pretty similar to what users see when they visit the docs of old Flink versions <https://ci.apache.org/projects/flink/flink-docs-release-1.5/dev/libs/ml/quickstart.html> [1]).
> > >
> > > Thanks,
> > >
> > > Jiangjie (Becket) Qin
> > >
> > > [1] https://ci.apache.org/projects/flink/flink-docs-release-1.5/dev/libs/ml/quickstart.html
> > >
> > > On Thu, Jan 24, 2019 at 6:21 AM Shaoxuan Wang <wshaox...@gmail.com> wrote:
> > >
> > >> Thanks Stephan,
> > >>
> > >> The entire plan looks good to me. WRT the "Docs for Blink", a subsection should be good enough if we just outline what Blink has changed. However, we have written detailed introductions to Blink based on the structure of the current Flink release documentation (those introductions are distributed across the subsections). Does it make sense to create the Blink documentation as a separate one under the documentation section, say blink-1.5 (temporary, not a release)?
> > >>
> > >> Regards,
> > >> Shaoxuan
> > >>
> > >> On Wed, Jan 23, 2019 at 10:15 PM Stephan Ewen <se...@apache.org> wrote:
> > >>
> > >>> Nice to see this lively discussion.
> > >>>
> > >>> *--- Branch Versus Repository ---*
> > >>>
> > >>> Looks like this is converging towards pushing a branch. How about naming the branch simply "blink-1.5"? That would be in line with the 1.5 version branch of Flink, which is simply called "release-1.5".
> > >>>
> > >>> *--- SGA ---*
> > >>>
> > >>> The SGA (Software Grant Agreement) should either be filed already or be in the process of being filed.
> > >>>
> > >>> *--- Offering Jars for Blink ---*
> > >>>
> > >>> As Chesnay and Timo mentioned, we cannot easily offer a "Release" of Blink (source or binary), because that would require thorough checking of licenses and creating/bundling license files. That is a lot of work, as we recently experienced again in the Flink master.
> > >>>
> > >>> What we can do is upload compiled jar files and link to them somewhere in the Blink docs.
> > >>> We need to add a disclaimer that these are convenience jars and not an official Apache release. I hope that would work for the users who are curious to try things out.
> > >>>
> > >>> *--- Docs for Blink ---*
> > >>>
> > >>> Do we need a versioned website here? If not, can we simply make this a subsection of the current Flink snapshot docs? Next to "Flink Development" and "Internals", we could have a section on "Blink branch". I think it is crucial, though, to make it clear that this is temporary and will eventually be subsumed by the main release, just so that users do not get confused.
> > >>>
> > >>> Best,
> > >>> Stephan
> > >>>
> > >>> On Wed, Jan 23, 2019 at 12:23 PM Becket Qin <becket....@gmail.com> wrote:
> > >>>
> > >>>> Really excited to see Blink joining the Flink community!
> > >>>>
> > >>>> My two cents regarding repo vs. branch: I am +1 for a branch in Flink. Among many things, what's most important at this point is probably to make the Blink code available to developers so people can discuss the merge strategy. Creating a branch is probably one of the fastest ways to do that. We can always create a separate repo later if necessary.
> > >>>>
> > >>>> WRT the docs and jar distribution, it is true that we are going to do some major refactoring of the code. But I can imagine some curious users may still want to try out something in Blink, and it would be good if we could do them a favor. Legally, my hunch is that it is probably OK for someone to just build the jars and docs and host them somewhere for convenience. But it should be clear that this is just for convenience purposes rather than an official release from Apache (unless we would like to make it official).
> > >>>>
> > >>>> Thanks,
> > >>>>
> > >>>> Jiangjie (Becket) Qin
> > >>>>
> > >>>> On Wed, Jan 23, 2019 at 6:48 PM Chesnay Schepler <ches...@apache.org> wrote:
> > >>>>
> > >>>>> From the ASF side, jar files do not require a vote/release process; this is at the discretion of the PMC.
> > >>>>>
> > >>>>> However, I have my doubts whether at this time we could even create a source release of Blink, given that we'd have to vet the code base first. Even without a source release we could still distribute jars, but we would not be allowed to advertise them to users, as they do not constitute an official release.
> > >>>>>
> > >>>>> On 23.01.2019 11:41, Timo Walther wrote:
> > >>>>>
> > >>>>>> As far as I know, we will not provide any binaries but only the source code. JAR files on Apache servers would need an official voting/release process. Interested users can build Blink themselves using `mvn clean package`.
> > >>>>>>
> > >>>>>> @Stephan: Please correct me if I'm wrong.
> > >>>>>>
> > >>>>>> Regards,
> > >>>>>> Timo
> > >>>>>>
> > >>>>>> On 23.01.19 at 11:16, Kurt Young wrote:
> > >>>>>>
> > >>>>>>> Hi Timo,
> > >>>>>>>
> > >>>>>>> What about the jar files? Will Blink's jars be uploaded to the Apache repository? If not, I think it will be very inconvenient for users who want to try Blink and view the documents when they need some help from the docs.
> > >>>>>>>
> > >>>>>>> Best,
> > >>>>>>> Kurt
> > >>>>>>>
> > >>>>>>> On Wed, Jan 23, 2019 at 6:09 PM Timo Walther <twal...@apache.org> wrote:
> > >>>>>>>
> > >>>>>>>> Hi Kurt,
> > >>>>>>>>
> > >>>>>>>> I would not make Blink's documentation visible to users or search engines via a website. Otherwise this would communicate that Blink is an official release.
> > >>>>>>>> I would suggest putting the Blink docs into `/docs`; people can build them with `./docs/build.sh -pi` if they are interested. I would not invest time into setting up a docs infrastructure.
> > >>>>>>>>
> > >>>>>>>> Regards,
> > >>>>>>>> Timo
> > >>>>>>>>
> > >>>>>>>> On 23.01.19 at 08:56, Kurt Young wrote:
> > >>>>>>>>
> > >>>>>>>>> Thanks @Stephan for this exciting announcement!
> > >>>>>>>>>
> > >>>>>>>>> From my point of view, I would prefer to use a branch. It makes the message "Blink is part of Flink" more straightforward and clear.
> > >>>>>>>>>
> > >>>>>>>>> Apart from the location of the Blink code, there are some other questions, like which version we should use and where we put Blink's documents. Currently, we chose "1.5.1-blink-r0" as Blink's version since Blink forked from Flink 1.5.1. We also added some docs to Blink, just as Flink did. Can Blink use a website like "https://ci.apache.org/projects/flink/flink-docs-release-1.7/" for all of Blink's docs, changed to something like https://ci.apache.org/projects/flink/flink-docs-blink-r0/ ?
> > >>>>>>>>>
> > >>>>>>>>> Best,
> > >>>>>>>>> Kurt
> > >>>>>>>>>
> > >>>>>>>>> On Wed, Jan 23, 2019 at 10:55 AM Hequn Cheng <chenghe...@gmail.com> wrote:
> > >>>>>>>>>
> > >>>>>>>>>> Hi all,
> > >>>>>>>>>>
> > >>>>>>>>>> @Stephan Thanks a lot for driving these efforts. I think a lot of people are already waiting for this.
> > >>>>>>>>>> +1 for opening the Blink source code.
> > >>>>>>>>>> Either a separate repository or a special branch is OK for me. Hopefully, this will not last too long.
> > >>>>>>>>>>
> > >>>>>>>>>> Best, Hequn
> > >>>>>>>>>>
> > >>>>>>>>>> On Tue, Jan 22, 2019 at 11:35 PM Jark Wu <imj...@gmail.com> wrote:
> > >>>>>>>>>>
> > >>>>>>>>>>> Great news! Looking forward to the new wave of developments.
> > >>>>>>>>>>>
> > >>>>>>>>>>> If Blink needs continuous updates, bug fixes, and releases, maybe a separate repository is a better idea.
> > >>>>>>>>>>>
> > >>>>>>>>>>> Best,
> > >>>>>>>>>>> Jark
> > >>>>>>>>>>>
> > >>>>>>>>>>> On Tue, 22 Jan 2019 at 18:29, Dominik Wosiński <wos...@gmail.com> wrote:
> > >>>>>>>>>>>
> > >>>>>>>>>>>> Hey!
> > >>>>>>>>>>>> I also think that creating a separate branch for Blink in the Flink repo is a better idea than creating a fork, as IMHO it will allow merging changes more easily.
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> Best Regards,
> > >>>>>>>>>>>> Dom.
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> On Tue, 22 Jan 2019 at 10:09, Ufuk Celebi <u...@apache.org> wrote:
> > >>>>>>>>>>>>
> > >>>>>>>>>>>>> Hey Stephan and others,
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> thanks for the summary. I'm very excited about the outlined improvements. :-)
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> Separate branch vs. fork: I'm fine with either of the suggestions. Depending on the expected strategy for merging the changes, the expected number of additional changes, etc., either one or the other approach might be better suited.
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> – Ufuk
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> On Tue, Jan 22, 2019 at 9:20 AM Kurt Young <ykt...@gmail.com> wrote:
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>>> Hi Driesprong,
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>> Glad to hear that you're interested in Blink's code.
> > >>>>>>>>>>>>>> Actually, Blink only has one branch by itself, so either a separate repo or a branch in Flink works for sharing Blink's code.
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>> Best,
> > >>>>>>>>>>>>>> Kurt
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>> On Tue, Jan 22, 2019 at 2:30 PM Driesprong, Fokko <fo...@driesprong.frl> wrote:
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>> Great news Stephan!
> > >>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>> Why not make the code available by having a fork of Flink on Alibaba's GitHub account? This would allow us to do easy diffs in the GitHub UI and create PRs of cherry-picked commits if needed. I can imagine that the Blink codebase has a lot of branches by itself, so just pushing a couple of branches to the main Flink repo is not ideal. Looking forward to it!
> > >>>>>>>>>>>>>>> Cheers, Fokko
> > >>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>> On Tue, 22 Jan 2019 at 03:48, Shaoxuan Wang <wshaox...@gmail.com> wrote:
> > >>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>> Big +1 for contributing the Blink codebase directly into the Apache Flink project. Looking forward to the new journey.
> > >>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>> Regards,
> > >>>>>>>>>>>>>>>> Shaoxuan
> > >>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>> On Tue, Jan 22, 2019 at 3:52 AM Xiaowei Jiang <xiaow...@gmail.com> wrote:
> > >>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>> Thanks Stephan!
> > >>>>>>>>>>>>>>>>> We are hoping to make the process as non-disruptive as possible for the Flink community. Making the Blink codebase public is the first step, which hopefully facilitates further discussions.
> > >>>>>>>>>>>>>>>>> Xiaowei
> > >>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>> On Monday, January 21, 2019, 11:46:28 AM PST, Stephan Ewen <se...@apache.org> wrote:
> > >>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>> Dear Flink Community!
> > >>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>> Some of you may have heard it already from announcements or from a Flink Forward talk: Alibaba has decided to open source its in-house improvements to Flink, called Blink! First of all, big thanks to the team that developed these improvements and made this contribution possible!
> > >>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>> Blink has some very exciting enhancements, most prominently on the Table API/SQL side and in the unified execution of these programs. For batch (bounded) data, the SQL execution has full TPC-DS coverage (which is a big deal), and the execution is more than 10x faster than the current SQL runtime in Flink.
> > >>>>>>>>>>>>>>>>> Blink has also added support for catalogs and improved the failover speed of batch queries and the resource management. It also makes some good steps in the direction of more deeply unifying batch and streaming execution.
> > >>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>> The proposal is to merge Blink's enhancements into Flink, to give Flink's SQL/Table API and execution a big boost in usability and performance.
> > >>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>> Just to avoid any confusion: This is not a suggested change of focus to batch processing, nor would this break with any of the streaming architecture and vision of Flink. This contribution follows very much the principle of "batch is a special case of streaming". As a special case, batch makes special optimizations possible. In its current state, Flink does not exploit many of these optimizations. This contribution adds exactly these optimizations and makes the streaming model of Flink applicable to harder batch use cases.
> > >>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>> Assuming that the community is excited about this as well, and in favor of these enhancements to Flink's capabilities, below are some thoughts on how this contribution and integration could work.
> > >>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>> --- Making the code available ---
> > >>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>> At the moment, the Blink code is in the form of a big Flink fork (rather than isolated patches on top of Flink), so the integration is unfortunately not as easy as merging a few patches or pull requests.
> > >>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>> To support a non-disruptive merge of such a big contribution, I believe it makes sense to make the code of the fork available in the Flink project first. From there on, we can start to work on the details of merging the enhancements, including the refactoring of the necessary parts in the Flink master and the Blink code, to make a merge possible without repeatedly breaking compatibility.
> > >>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>> The first question is: where do we put the code of the Blink fork during the merging procedure?
> > >>>>>>>>>>>>>>>>> My first thought was to temporarily add a repository (like "flink-blink-staging"), but we could also put it into a special branch in the main Flink repository. I will start a separate thread to discuss a possible strategy for handling and merging such a big contribution.
> > >>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>> Best,
> > >>>>>>>>>>>>>>>>> Stephan
>
> --
> Best,
> Kurt