I'm ok with point 3. Concerning point 8: Why do we have to build flink-core twice after having it built as a dependency for flink-libraries? This seems wrong to me.
Cheers,
Till

On Wed, Mar 15, 2017 at 4:23 PM, Robert Metzger <rmetz...@apache.org> wrote:

> Thank you. Running on AWS is a good idea!
> Let me know if you (or anybody else) wants to help me with the
> infrastructure work! Any help is much appreciated (as I've said before, I
> don't really have time for doing this, but it has to be done :) )
>
> I'm against creating two new repositories. I fear that this introduces too
> much complexity and too many repositories.
> "flink" and "flink-libraries" are hopefully enough to get the build time
> significantly down.
> We can also consider putting the connectors into the "flink-libraries" repo
> if we need to further reduce the build time.
>
> We should probably move "flink-table" out of "flink-libraries" if we want
> to keep "flink-table" in the main repo. (This would eliminate the
> "flink-libraries" module from main.)
>
> Also, I agree that "flink-statebackend-rocksdb" is not correctly placed in
> contrib anymore.
>
>
> On Wed, Mar 15, 2017 at 4:07 PM, Greg Hogan <c...@greghogan.com> wrote:
>
> > Robert, appreciate your kickstarting this task.
> >
> > We should compare the verification time with and without the listed
> > modules. I’ll try to run this by tomorrow on AWS and on Travis.
> >
> > Should we maintain separate repos for flink-contrib and flink-libraries?
> > Are you intending that we move flink-table out of flink-libraries (and
> > perhaps flink-statebackend-rocksdb out of flink-contrib)?
> >
> > Greg
> >
> >
> > > On Mar 15, 2017, at 9:55 AM, Robert Metzger <rmetz...@apache.org> wrote:
> > >
> > > Thank you for looking into this Till.
> > >
> > > I think we then have to split the repositories.
> > > My main motivation for doing this is that it seems to be the only
> > > feasible way of scaling the community to allow more committers working
> > > on the libraries.
> > >
> > > I'll take care of getting things started.
> > >
> > > As the next steps I propose to:
> > > 1. Ask INFRA to rename
> > > https://git-wip-us.apache.org/repos/asf?p=flink-connectors.git;a=summary
> > > to "flink-libraries"
> > > 2. Ask INFRA to set up GitHub and travis integration for
> > > "flink-libraries"
> > > 3. Put the code of "flink-ml", "flink-gelly", "flink-python",
> > > "flink-cep", "flink-scala-shell", "flink-storm" into the new repository.
> > > (I decided against moving flink-contrib there, because rocksdb is in the
> > > contrib module. For flink-table, I'm undecided, but I kept it in the
> > > main repo because it's probably going to interact more with the core
> > > code in the future.)
> > > I'll try to preserve the history of those modules when splitting them
> > > into the new repo.
> > > 4. I'll close all pull requests against those modules in the main repo.
> > > 5. I'll set up a minimal documentation page for the library repository,
> > > similar to the main documentation.
> > > 6. I'll update the documentation build process to build both
> > > documentations & link them to each other
> > > 7. I'll update the nightly deployment process to include both
> > > repositories
> > > 8. I'll update the release script to create the Flink release out of
> > > both repositories. In order to put the libraries into the opt/ dir of
> > > the release, I'll need to change the build of "flink-dist" so that it
> > > first builds flink core, then the libraries and then the core again with
> > > the libraries as an additional dependency.
> > >
> > > The main question for the community is: do you agree with point 3?
> > > Would you like to include more or less?
> > >
> > > I'll start with 1. and 2. tomorrow morning.
> > >
> > >
> > >
> > > On Wed, Mar 15, 2017 at 1:48 PM, Till Rohrmann <trohrm...@apache.org>
> > > wrote:
> > >
> > >> In theory we could have a merging bot which solves the problem of the
> > >> "commit window".
> > >> Once the PR passes all tests and has enough +1s, the bot could do the
> > >> merging and, thus, it effectively linearizes the merge process.
> > >>
> > >> I think the second point is actually a disadvantage because there is
> > >> not such an immediate incentive/pressure to fix the broken module if it
> > >> lives in a separate repository. Furthermore, breaking API changes in
> > >> the core will most likely go unnoticed for some time in other modules
> > >> which are not developed so actively. In the worst case these things
> > >> will only be noticed when we try to make a release.
> > >>
> > >> But I also agree that we are not Google and we don't have the
> > >> capacities to maintain such a smooth build process that we can keep all
> > >> the code in a single repository.
> > >>
> > >> I looked a bit into Gradle and as far as I can tell it offers some nice
> > >> features wrt incrementally building projects. This would be beneficial
> > >> for local development but it would not solve our build time problems on
> > >> Travis.
> > >> Gradle intends to introduce a task result cache which allows reusing
> > >> results across builds. This could help when building on Travis,
> > >> however, it is not yet fully implemented. Moreover, migrating from
> > >> Maven to Gradle won't come for free (there's simply no free lunch out
> > >> there) and we might risk introducing new bugs. Therefore, I would vote
> > >> to split the repository in order to mitigate our current problems with
> > >> Travis and the build time in general. Whether to use a different build
> > >> system or not can then be discussed as an orthogonal question.
> > >>
> > >> Cheers,
> > >> Till
> > >>
> > >> On Tue, Mar 14, 2017 at 8:05 PM, Stephan Ewen <se...@apache.org> wrote:
> > >>
> > >>> Some other thoughts on how a repository split would help.
> > >>> I am not sure for all of them, so please comment:
> > >>>
> > >>>   - There is less competition for a "commit window". It happens a lot
> > >>>     already that you run all tests and want to commit, but there was a
> > >>>     commit in the meantime. You rebase, need to re-test, again commit
> > >>>     in the meantime.
> > >>>     For a "linear" commit history, this may become a bottleneck
> > >>>     eventually as well.
> > >>>
> > >>>   - There is less risk of broken master. If one repository/module
> > >>>     breaks its master, the others can still continue.
> > >>>
> > >>> Stephan
> > >>>
> > >>>
> > >>> On Fri, Mar 10, 2017 at 12:20 PM, Till Rohrmann <trohrm...@apache.org>
> > >>> wrote:
> > >>>
> > >>>> Thanks for all your input. In order to wrap the discussion up I'd
> > >>>> like to summarize the mentioned points:
> > >>>>
> > >>>> The problem of increasing build times and complexity of the project
> > >>>> has been acknowledged. Ideally we would have everything in one
> > >>>> repository using an incremental build tool. Since Maven does not
> > >>>> properly support this we would have to switch our build tool to
> > >>>> something like Gradle, for example.
> > >>>>
> > >>>> Another option is introducing build profiles for different sets of
> > >>>> modules as well as separating integration and unit tests. The third
> > >>>> alternative would be creating sub-projects with their own
> > >>>> repositories. I actually think that these two proposals are not
> > >>>> necessarily exclusive and it would also make sense to have a
> > >>>> separation between unit and integration tests if we split the
> > >>>> repository.
> > >>>>
> > >>>> The overall consensus seems to be that we don't want to split the
> > >>>> community and want to keep everything under the same umbrella.
> > >>>> I think this is the right way to go, because otherwise some parts of
> > >>>> the project could become second class citizens. Given that and that
> > >>>> we continue using Maven, I still think that creating sub-projects for
> > >>>> the libraries, for example, could be beneficial. A split could reduce
> > >>>> the project's complexity and make it potentially easier for libraries
> > >>>> to get actively developed. The main concern is setting up the build
> > >>>> infrastructure to aggregate docs from multiple repositories and
> > >>>> making them publicly available.
> > >>>>
> > >>>> Since I started this thread and I would really like to see Flink's ML
> > >>>> library being revived again, I'd volunteer investigating first
> > >>>> whether it is doable establishing a proper incremental build for
> > >>>> Flink. If that should not be possible, I will look into splitting the
> > >>>> repository, first only for the libraries. I'll share my results with
> > >>>> the community once I'm done with the investigation.
> > >>>>
> > >>>> Cheers,
> > >>>> Till
> > >>>>
> > >>>> On Fri, Feb 24, 2017 at 3:50 PM, Robert Metzger <rmetz...@apache.org>
> > >>>> wrote:
> > >>>>
> > >>>>> @Jin Mingjian: You cannot use the paid travis version for open
> > >>>>> source projects. It only works for private repositories (at least
> > >>>>> back then when we've asked them about that).
> > >>>>>
> > >>>>> @Stephan: I don't think that incremental builds will be available
> > >>>>> with Maven anytime soon.
> > >>>>>
> > >>>>> I agree that we need to fix the build time issue on Travis. I've
> > >>>>> recently pushed a commit to now use three instead of two test
> > >>>>> groups.
> > >>>>> But I don't think that this is a feasible long-term solution.
> > >>>>>
> > >>>>> If this discussion is only about reducing the build and test time,
> > >>>>> introducing build profiles for different components as Aljoscha
> > >>>>> suggested would solve the problem Till mentioned.
> > >>>>> Also, if we decide that travis is not a good tool anymore for the
> > >>>>> testing, I guess we can find a different solution. There are now
> > >>>>> competitors to Travis that might be willing to offer a paid plan for
> > >>>>> an open source project, or we set up our own infra on a server
> > >>>>> sponsored by one of the contributing companies.
> > >>>>> If we want to solve "community issues" with the change as well, then
> > >>>>> I think it's worth the effort of splitting up Flink into different
> > >>>>> repositories.
> > >>>>>
> > >>>>> Splitting up repositories is not a trivial task in my opinion. As
> > >>>>> others have mentioned before, we need to consider the following
> > >>>>> things:
> > >>>>> - How are we going to build the documentation? Ideally every repo
> > >>>>> should contain its docs, so we would need to pull them together when
> > >>>>> building the main docs.
> > >>>>> - How do we organize the dependencies? If we have a library
> > >>>>> repository depend on snapshot Flink versions, we need to make sure
> > >>>>> that the snapshot deployment always works. This also means that
> > >>>>> people working on a library repository will pull from snapshot OR
> > >>>>> need to build first locally.
> > >>>>> - We need to update the release scripts
> > >>>>>
> > >>>>> If we commit to do these changes, we need to assign at least one
> > >>>>> committer (yes, in this case we need somebody who can commit, for
> > >>>>> example for updating the buildbot stuff) who volunteers to do the
> > >>>>> change.
> > >>>>> I've done a lot of infrastructure work in the past, but I'm
> > >>>>> currently pretty booked with many other things, so I don't
> > >>>>> realistically see myself doing that. Max who used to work on these
> > >>>>> things is taking some time off.
> > >>>>> I think we need, best case, 3 days for the change, worst case 5
> > >>>>> days. The problem is that there are no "unit tests" for the infra
> > >>>>> stuff, so many things are "trial and error" (like Apache's buildbot,
> > >>>>> our release scripts, the doc scripts, maven stuff, nightly builds).
> > >>>>>
> > >>>>>
> > >>>>>
> > >>>>> On Thu, Feb 23, 2017 at 1:33 PM, Stephan Ewen <se...@apache.org>
> > >>>>> wrote:
> > >>>>>
> > >>>>>> If we can get incremental builds to work, that would actually be
> > >>>>>> the preferred solution in my opinion.
> > >>>>>>
> > >>>>>> Many companies have invested heavily in making a "single
> > >>>>>> repository" code base work, because it has the advantage of not
> > >>>>>> having to update/publish several repositories first.
> > >>>>>> However, the strong prerequisite for that is an incremental build
> > >>>>>> system that builds only (fine grained) what it has to build. I am
> > >>>>>> not sure how we could make that work
> > >>>>>> with Maven and Travis...
> > >>>>>>
> > >>>>>> On Wed, Feb 22, 2017 at 10:42 PM, Greg Hogan <c...@greghogan.com>
> > >>>>>> wrote:
> > >>>>>>
> > >>>>>>> An additional option for reducing time to build and test is
> > >>>>>>> parallel execution. This would help users more than on TravisCI
> > >>>>>>> since we're generally running on multi-core machines rather than
> > >>>>>>> VM slices.
> > >>>>>>>
> > >>>>>>> Is the idea that each user would only check out the modules that
> > >>>>>>> he or she is developing with?
> > >>>>>>> For example, if a developer is not working on flink-mesos or
> > >>>>>>> flink-yarn then the "flink-deploy" module would not be cloned to
> > >>>>>>> their filesystem?
> > >>>>>>>
> > >>>>>>> We can run a TravisCI nightly build on each repo to validate
> > >>>>>>> against API changes.
> > >>>>>>>
> > >>>>>>> Greg
> > >>>>>>>
> > >>>>>>> On Wed, Feb 22, 2017 at 12:24 PM, Fabian Hueske <fhue...@gmail.com>
> > >>>>>>> wrote:
> > >>>>>>>
> > >>>>>>>> Hi everybody,
> > >>>>>>>>
> > >>>>>>>> I think this should be a discussion about the benefits and
> > >>>>>>>> drawbacks of separating the code into distinct repositories from
> > >>>>>>>> a development point of view.
> > >>>>>>>> So I agree with Stephan that we should not divide the community
> > >>>>>>>> by creating separate groups of committers.
> > >>>>>>>> Also the discussion about independent releases is not strictly
> > >>>>>>>> related to the decision, IMO.
> > >>>>>>>>
> > >>>>>>>> I see a few pros and cons for splitting the code base into
> > >>>>>>>> separate repositories which (I think) haven't been mentioned
> > >>>>>>>> before:
> > >>>>>>>> pros:
> > >>>>>>>> - IDE setup will be leaner. It is not necessary to compile the
> > >>>>>>>> whole code base to run a test after switching a branch.
> > >>>>>>>> cons:
> > >>>>>>>> - developing library features that require changes in the core /
> > >>>>>>>> APIs becomes more time consuming due to back-and-forth between
> > >>>>>>>> code bases. However, I think this is not very often the case.
> > >>>>>>>>
> > >>>>>>>> Aljoscha has good points as well. Many of the build issues could
> > >>>>>>>> be solved by different build profiles and configurations.
> > >>>>>>>>
> > >>>>>>>> Best, Fabian
> > >>>>>>>>
> > >>>>>>>> 2017-02-22 14:59 GMT+01:00 Gábor Hermann <m...@gaborhermann.com>:
> > >>>>>>>>
> > >>>>>>>>> @Stephan:
> > >>>>>>>>>
> > >>>>>>>>> Although I tried to raise some issues about splitting
> > >>>>>>>>> committers, I'm still strongly in favor of some kind of
> > >>>>>>>>> restructuring. We just have to be conscious about the
> > >>>>>>>>> disadvantages.
> > >>>>>>>>>
> > >>>>>>>>> Not splitting the committers could leave the libraries in the
> > >>>>>>>>> same stalling status, described by Till. Of course, dedicating
> > >>>>>>>>> current committers as shepherds of the libraries could easily
> > >>>>>>>>> resolve the issue. But that requires time from current
> > >>>>>>>>> committers. It seems like trade-offs between code quality, speed
> > >>>>>>>>> of development, and committer efforts.
> > >>>>>>>>>
> > >>>>>>>>> From what I see in the discussion about ML, there are many
> > >>>>>>>>> people willing to contribute as well as production use-cases.
> > >>>>>>>>> This means we could and should move forward. However, the
> > >>>>>>>>> development speed is significantly slowed down by stalling PRs.
> > >>>>>>>>> The proposal for contributors helping the review process did not
> > >>>>>>>>> really work out so far. In my opinion, either code quality (by
> > >>>>>>>>> more easily accepting new committers) or some committer time
> > >>>>>>>>> (reviewing/merging) should be sacrificed to move forward. As
> > >>>>>>>>> Till has indicated, it would be shameful if we let this
> > >>>>>>>>> contribution effort die.
> > >>>>>>>>>
> > >>>>>>>>> Cheers,
> > >>>>>>>>> Gabor
> > >>>>>>>>>
> > >>>>>>>>>
> > >>>>>>>>
> > >>>>>>>
> > >>>>>>
> > >>>>>
> > >>>>
> > >>>
> > >>
> > >
> >
>