Thank you all for the positive feedback. I will start putting together a page in the wiki.
@Jark: Azure Pipelines provides a free service that is even better than what Travis provides for free: 10 parallel builds with 6-hour timeouts.
@Chesnay: I will answer your questions in the yet-to-be-written documentation in the wiki.

On Thu, Dec 5, 2019 at 11:58 AM Arvid Heise <ar...@ververica.com> wrote:

> +1 I had good experiences with Azure Pipelines in the past.
>
> On Thu, Dec 5, 2019 at 11:35 AM Aljoscha Krettek <aljos...@apache.org>
> wrote:
>
> > +1
> >
> > Thanks for the effort! The tooling seems to be quite a bit nicer and I
> > like that we can grow by adding more machines.
> >
> > Best,
> > Aljoscha
> >
> > > On 5. Dec 2019, at 03:18, Jark Wu <imj...@gmail.com> wrote:
> > >
> > > +1 for Azure Pipelines because it promises better performance.
> > >
> > > However, I have 2 concerns:
> > >
> > > 1) Travis provides a free personal service for testing personal branches.
> > > Usually, contributors use this feature to test a PoC or to run CRON jobs
> > > for pull requests. Using a local machine costs a lot of time. Does AZP
> > > provide the same free service?
> > > 2) Currently, we have deployed a webhook [1] to receive Travis CI build
> > > notifications [2] and send them to the bui...@flink.apache.org mailing list.
> > > We need to figure out how to send Azure build results to the
> > > mailing list. This [3] might be the way to go.
> > >
> > > builds@f.a.o mailing list
> > >
> > > Best,
> > > Jark
> > >
> > > [1]: https://github.com/wuchong/flink-notification-bot
> > > [2]: https://docs.travis-ci.com/user/notifications/#configuring-webhook-notifications
> > > [3]: https://docs.microsoft.com/en-us/azure/devops/service-hooks/overview?view=azure-devops
> > >
> > > On Wed, 4 Dec 2019 at 22:48, Jeff Zhang <zjf...@gmail.com> wrote:
> > >
> > >> +1
> > >>
> > >> Till Rohrmann <trohrm...@apache.org> wrote on Wed, Dec 4, 2019 at 10:43 PM:
> > >>
> > >>> +1 for moving to Azure Pipelines as it promises better scalability and
> > >>> tooling. Looking forward to having faster builds and hence shorter
> > >>> feedback cycles :-)
> > >>>
> > >>> Cheers,
> > >>> Till
> > >>>
> > >>> On Wed, Dec 4, 2019 at 1:24 PM Chesnay Schepler <ches...@apache.org>
> > >>> wrote:
> > >>>
> > >>>> @robert Can you expand on how the Azure setup interacts with CiBot? Do
> > >>>> we have to continue mirroring builds into flink-ci? How will the
> > >>>> cronjob configuration work? We should have a general idea of how to
> > >>>> implement this before proceeding.
> > >>>> Additionally, moving /all/ jobs into flink-ci requires setting up the
> > >>>> environment variables we have; can we set these up via files, or will
> > >>>> we have to give all committers permissions for flink-ci/flink?
> > >>>>
> > >>>> On 04/12/2019 12:55, Chesnay Schepler wrote:
> > >>>>> From what I've seen so far, Azure will provide us a better experience,
> > >>>>> so I'd say +1 for the transition as a whole.
> > >>>>>
> > >>>>> I'd delay merging at least until the feature branch is cut.
> > >>>>> Given the parental leave, it may even make sense to only start merging
> > >>>>> in January afterwards, to reduce the total time taken for the
> > >>>>> transition.
> > >>>>>
> > >>>>> Reviews could maybe be made earlier, but I'm wondering whether anyone
> > >>>>> would even have the time at the moment to do so.
> > >>>>>
> > >>>>> On 04/12/2019 12:35, Kurt Young wrote:
> > >>>>>> Thanks Robert for driving this. There is another big pain point with
> > >>>>>> the current Travis setup: its cache mechanism fails from time to time.
> > >>>>>> Around 50% of the build failures are caused by cache problems. I
> > >>>>>> reported this issue to Travis but have not received a response yet.
> > >>>>>> So big +1 from my side.
> > >>>>>>
> > >>>>>> Just one comment: it's close to the 1.10 feature freeze, and we will
> > >>>>>> spend some time making tests stable before the release. I hope this
> > >>>>>> replacement can happen after the 1.10 release; otherwise it will be an
> > >>>>>> unstable factor during release testing.
> > >>>>>>
> > >>>>>> Best,
> > >>>>>> Kurt
> > >>>>>>
> > >>>>>> On Wed, Dec 4, 2019 at 7:16 PM Zhu Zhu <reed...@gmail.com> wrote:
> > >>>>>>
> > >>>>>>> Thanks Robert for the updates! And thanks a lot for all the efforts to
> > >>>>>>> investigate, experiment with, and tune Azure Pipelines for building
> > >>>>>>> Flink. Big +1 for it.
> > >>>>>>>
> > >>>>>>> It would be great if the community build could be extended with
> > >>>>>>> custom machines, so that tests are not queued for long as the number
> > >>>>>>> of PRs grows daily.
> > >>>>>>>
> > >>>>>>> The increased timeout would also be very helpful.
> > >>>>>>> The 50-minute timeout for free Travis accounts is currently a pain,
> > >>>>>>> especially when we'd like to run e2e tests in our own Travis. I had to
> > >>>>>>> manually split the jobs to make them pass.
> > >>>>>>>
> > >>>>>>> Thanks,
> > >>>>>>> Zhu Zhu
> > >>>>>>>
> > >>>>>>> Robert Metzger <rmetz...@apache.org> wrote on Wed, Dec 4, 2019 at 6:36 PM:
> > >>>>>>>
> > >>>>>>>> Hi all,
> > >>>>>>>>
> > >>>>>>>> as a follow-up to our discussion on reducing the build time [1], I
> > >>>>>>>> would like to propose migrating our build infrastructure to Azure
> > >>>>>>>> Pipelines (away from Travis).
> > >>>>>>>>
> > >>>>>>>> I believe that we have reached the limits of what Travis can provide
> > >>>>>>>> the Flink community, and I don't want the build system to limit or
> > >>>>>>>> influence the project's growth.
> > >>>>>>>>
> > >>>>>>>> *Benefits:*
> > >>>>>>>> 1. Free Travis accounts are limited to 5 parallel builds, with a
> > >>>>>>>> timeout of 50 minutes. Azure offers *10 parallel builds with 300-minute
> > >>>>>>>> timeouts* for free for open source projects.
> > >>>>>>>> 2. Azure Pipelines allows us to *add custom build machines* to the
> > >>>>>>>> pool of 10 free parallel builders.
> > >>>>>>>> This will allow the Flink community to scale the available build
> > >>>>>>>> capacity as the project grows. We are dependent on donations from
> > >>>>>>>> supporting companies, but I believe that it is easier for companies
> > >>>>>>>> to donate machines than money.
> > >>>>>>>> Alibaba is willing to provide 10 machines with 32 cores each to the
> > >>>>>>>> Flink project for this purpose.
> > >>>>>>>> In addition, Xiyuan, who's working on adding ARM support for Flink,
> > >>>>>>>> provided me with 2 ARM machines (16 cores each).
> > >>>>>>>> I want to use the custom, more efficient build machines for building
> > >>>>>>>> Flink's pull requests and master pushes.
> > >>>>>>>> 3. *Azure Pipelines is a more feature-rich tool*, allowing us, for
> > >>>>>>>> example, to transfer intermediate build artifacts between pipeline
> > >>>>>>>> stages. This will allow us to make the build more reliable (we are
> > >>>>>>>> currently abusing the caching mechanism in Travis for this).
> > >>>>>>>> It also has some basic analytics on test results, flaky tests, etc.
> > >>>>>>>>
> > >>>>>>>> *Known problems:*
> > >>>>>>>> - Initially, we might see different build instabilities than before.
> > >>>>>>>> - There's a higher maintenance overhead for the custom build machines
> > >>>>>>>> (keeping them up to date, etc.).
> > >>>>>>>> - We cannot use the build status integration of AZP, because it
> > >>>>>>>> requires write access to the repository's source. The foundation does
> > >>>>>>>> not allow that [2].
> > >>>>>>>> I propose to extend flinkbot / the flink-ci repository instead.
> > >>>>>>>>
> > >>>>>>>> *Current Status:*
> > >>>>>>>> - I'm able [3] to execute [4] the current custom build scripts on
> > >>>>>>>> Azure Pipelines. This means that we will have one compile stage and N
> > >>>>>>>> testing jobs in the 2nd stage. Currently, we have N=10 testing jobs.
> > >>>>>>>> The time from the start of a build until all tests have completed is
> > >>>>>>>> 1 hour and 22 minutes.
> > >>>>>>>> - I'm working on getting the nightly end-to-end tests to run on the
> > >>>>>>>> new infrastructure.
> > >>>>>>>> - I'm working on getting the build to work on our pool of custom
> > >>>>>>>> machines as well.
> > >>>>>>>> - I'm working on setting up the full matrix of builds (different
> > >>>>>>>> Scala, Hadoop, etc. versions) for the nightlies.
> > >>>>>>>>
> > >>>>>>>> *Next Steps:*
> > >>>>>>>> - I propose to document the entire build system in the Flink Wiki.
> > >>>>>>>> - Once Azure can cover the same pull request tests as Travis, I would
> > >>>>>>>> set it up to run in parallel (including Flinkbot posting links to
> > >>>>>>>> Azure). I hope that this phase lasts only 1-2 weeks, so that we do
> > >>>>>>>> not have to maintain things concurrently. I will monitor the build
> > >>>>>>>> stability closely, but would expect some support from the
> > >>>>>>>> contributors with debugging potential issues.
> > >>>>>>>> - Once there are no problems with the new setup, we remove the Travis
> > >>>>>>>> setup.
> > >>>>>>>> - Independently, I will work on triggering builds from master /
> > >>>>>>>> release-branch pushes, as well as cron builds from the master branch
> > >>>>>>>> ... all this will be described in the Wiki.
> > >>>>>>>>
> > >>>>>>>> *Timeline:*
> > >>>>>>>> - Once I have the feeling that people are supportive of the idea, I
> > >>>>>>>> will start documenting in the Wiki. The first pull requests should
> > >>>>>>>> show up after a few more days.
> > >>>>>>>> I will take a one-month parental leave starting sometime later in
> > >>>>>>>> December, which will probably delay things a bit. I hope to have
> > >>>>>>>> everything finished by the end of January.
> > >>>>>>>>
> > >>>>>>>> I'm happy to hear your thoughts on this work.
> > >>>>>>>> If nobody objects, I will start documenting the system and preparing
> > >>>>>>>> everything for the migration.
> > >>>>>>>>
> > >>>>>>>> Best,
> > >>>>>>>> Robert
> > >>>>>>>>
> > >>>>>>>> [1] https://lists.apache.org/thread.html/b90aa518fcabce94f8e1de4132f46120fae613db6e95a2705f1bd1ea@%3Cdev.flink.apache.org%3E
> > >>>>>>>> [2] https://issues.apache.org/jira/browse/INFRA-17030
> > >>>>>>>> [3] https://github.com/rmetzger/flink/tree/azure_playground
> > >>>>>>>> [4] https://dev.azure.com/rmetzger/Flink/_build?definitionId=4&_a=summary
> > >>
> > >> --
> > >> Best Regards
> > >>
> > >> Jeff Zhang