+1 for migrating to Azure Pipelines, as it can give us shorter build times and faster responses.
Best,
Congxian

Xiyuan Wang <wangxiyuan1...@gmail.com> wrote on Mon, Dec 9, 2019 at 10:13 AM:

> Hi Robert,
> Thanks for bringing up this topic. The 2 ARM machines (16 cores) that I donated are just for PoC testing. We (Huawei) can donate more once we move to the official Azure pipeline. :)
>
> Robert Metzger <rmetz...@apache.org> wrote on Fri, Dec 6, 2019 at 3:25 AM:
>
> > Thanks for your comments, Yun.
> > If there's strong support for idea 2, it would actually make my life easier: the migration would be easier to do.
> >
> > I also noticed that the uploads to transfer.sh were broken, but this should be fixed in the "rmetzger.flink" builds (coming from rmetzger/flink). The builds in "flink-ci.flink" (coming from flink-ci/flink) might have trouble with transfer.sh.
> >
> > On Thu, Dec 5, 2019 at 5:50 PM Yun Tang <myas...@live.com> wrote:
> >
> > > Hi Robert
> > >
> > > Really exciting to see this new, more powerful CI tool that gets rid of the 50-minute limit of the free Travis CI account.
> > >
> > > After reading the wiki, I support idea 2 of AZP-setup version-2.
> > >
> > > However, after digging into some failing builds at https://dev.azure.com/rmetzger/Flink/_build, I found that we cannot view the logs of some IT cases, which travis_watchdog previously uploaded to transfer.sh.
> > > I think this feature is also easy to implement in AZP, right?
> > >
> > > Best
> > > Yun Tang
> > >
> > > On 12/6/19, 12:19 AM, "Robert Metzger" <rmetz...@apache.org> wrote:
> > >
> > >     I've created a first draft of my plans in the wiki:
> > >     https://cwiki.apache.org/confluence/display/FLINK/%5Bpreview%5D+Azure+Pipelines
> > >     I'm looking forward to your comments.
> > >
> > >     On Thu, Dec 5, 2019 at 12:37 PM Robert Metzger <rmetz...@apache.org> wrote:
> > >
> > > > Thank you all for the positive feedback. I will start putting together a page in the wiki.
> > > >
> > > > @Jark: Azure Pipelines provides a free service that is even better than what Travis provides for free: 10 parallel builds with 6-hour timeouts.
> > > >
> > > > @Chesnay: I will answer your questions in the yet-to-be-written documentation in the wiki.
> > > >
> > > > On Thu, Dec 5, 2019 at 11:58 AM Arvid Heise <ar...@ververica.com> wrote:
> > > >
> > > >> +1 I had good experiences with Azure Pipelines in the past.
> > > >>
> > > >> On Thu, Dec 5, 2019 at 11:35 AM Aljoscha Krettek <aljos...@apache.org> wrote:
> > > >>
> > > >> > +1
> > > >> >
> > > >> > Thanks for the effort! The tooling seems to be quite a bit nicer and I like that we can grow by adding more machines.
> > > >> >
> > > >> > Best,
> > > >> > Aljoscha
> > > >> >
> > > >> > > On 5. Dec 2019, at 03:18, Jark Wu <imj...@gmail.com> wrote:
> > > >> > >
> > > >> > > +1 for Azure Pipelines because it promises better performance.
> > > >> > >
> > > >> > > However, I have 2 concerns:
> > > >> > >
> > > >> > > 1) Travis provides a free personal service for testing personal branches. Usually, contributors use this feature to test PoCs or run cron jobs for pull requests. Using a local machine costs a lot of time. Does AZP provide the same free service?
> > > >> > > 2) Currently, we have deployed a webhook [1] that receives Travis CI build notifications [2] and sends them to the bui...@flink.apache.org mailing list. We need to figure out how to send Azure build results to the mailing list. This [3] might be the way to go.
> > > >> > >
> > > >> > > builds@f.a.o mailing list
> > > >> > >
> > > >> > > Best,
> > > >> > > Jark
> > > >> > >
> > > >> > > [1]: https://github.com/wuchong/flink-notification-bot
> > > >> > > [2]: https://docs.travis-ci.com/user/notifications/#configuring-webhook-notifications
> > > >> > > [3]: https://docs.microsoft.com/en-us/azure/devops/service-hooks/overview?view=azure-devops
> > > >> > >
> > > >> > > On Wed, 4 Dec 2019 at 22:48, Jeff Zhang <zjf...@gmail.com> wrote:
> > > >> > >
> > > >> > >> +1
> > > >> > >>
> > > >> > >> Till Rohrmann <trohrm...@apache.org> wrote on Wed, Dec 4, 2019 at 10:43 PM:
> > > >> > >>
> > > >> > >>> +1 for moving to Azure Pipelines as it promises better scalability and tooling. Looking forward to having faster builds and hence shorter feedback cycles :-)
> > > >> > >>>
> > > >> > >>> Cheers,
> > > >> > >>> Till
> > > >> > >>>
> > > >> > >>> On Wed, Dec 4, 2019 at 1:24 PM Chesnay Schepler <ches...@apache.org> wrote:
> > > >> > >>>
> > > >> > >>>> @robert Can you expand on how the Azure setup interacts with CiBot? Do we have to continue mirroring builds into flink-ci? How will the cronjob configuration work? We should have a general idea of how to implement this before proceeding.
> > > >> > >>>> Additionally, moving /all/ jobs into flink-ci requires setting up the environment variables we have; can we set these up via files or will we have to give all committers permissions for flink-ci/flink?
> > > >> > >>>>
> > > >> > >>>> On 04/12/2019 12:55, Chesnay Schepler wrote:
> > > >> > >>>>> From what I've seen so far, Azure will provide us a better experience, so I'd say +1 for the transition as a whole.
> > > >> > >>>>>
> > > >> > >>>>> I'd delay the merge at least until the feature branch is cut.
> > > >> > >>>>> Given the parental leave, it may even make sense to only start merging in January afterwards, to reduce the total time taken for the transition.
> > > >> > >>>>>
> > > >> > >>>>> Reviews could maybe be done earlier, but I'm wondering whether anyone would even have the time at the moment to do so.
> > > >> > >>>>>
> > > >> > >>>>> On 04/12/2019 12:35, Kurt Young wrote:
> > > >> > >>>>>> Thanks Robert for driving this. There is another big pain point with the current Travis setup: its cache mechanism fails from time to time. Almost 50% of the build failures are caused by cache problems. I reported this issue to Travis but have gotten no response yet. So big +1 from my side.
> > > >> > >>>>>>
> > > >> > >>>>>> Just one comment: it's close to the 1.10 feature freeze, and we will spend some time making the tests stable before the release. I would like this replacement to happen after the 1.10 release; otherwise it will be an unstable factor during release testing.
> > > >> > >>>>>>
> > > >> > >>>>>> Best,
> > > >> > >>>>>> Kurt
> > > >> > >>>>>>
> > > >> > >>>>>> On Wed, Dec 4, 2019 at 7:16 PM Zhu Zhu <reed...@gmail.com> wrote:
> > > >> > >>>>>>
> > > >> > >>>>>>> Thanks Robert for the updates! And thanks a lot for all the efforts to investigate, experiment with, and tune Azure Pipelines for Flink builds.
> > > >> > >>>>>>> Big +1 for it.
> > > >> > >>>>>>>
> > > >> > >>>>>>> It would be great if the community build capacity could be extended with custom machines, so that tests are not queued for long as the number of PRs grows daily.
> > > >> > >>>>>>>
> > > >> > >>>>>>> The increased timeout would also be very helpful.
> > > >> > >>>>>>> The 50-minute timeout for free Travis accounts is currently a pain, especially when we'd like to run e2e tests in our own Travis. I had to manually split the jobs to make it possible for them to pass.
> > > >> > >>>>>>>
> > > >> > >>>>>>> Thanks,
> > > >> > >>>>>>> Zhu Zhu
> > > >> > >>>>>>>
> > > >> > >>>>>>> Robert Metzger <rmetz...@apache.org> wrote on Wed, Dec 4, 2019 at 6:36 PM:
> > > >> > >>>>>>>
> > > >> > >>>>>>>> Hi all,
> > > >> > >>>>>>>>
> > > >> > >>>>>>>> as a follow-up to our discussion on reducing the build time [1], I would like to propose migrating our build infrastructure to Azure Pipelines (away from Travis).
> > > >> > >>>>>>>>
> > > >> > >>>>>>>> I believe that we have reached the limits of what Travis can provide the Flink community, and I don't want the build system to limit or influence the project's growth.
> > > >> > >>>>>>>>
> > > >> > >>>>>>>> *Benefits:*
> > > >> > >>>>>>>> 1. Free Travis accounts are limited to 5 parallel builds, with a timeout of 50 minutes. Azure offers *10 parallel builds with 300 minute timeouts* for free for open source projects.
> > > >> > >>>>>>>> 2. Azure Pipelines allows us to *add custom build machines* to the pool of 10 free parallel builders.
> > > >> > >>>>>>>> This will allow the Flink community to scale the available build capacity as the project grows. We are dependent on donations from supporting companies, but I believe that it is easier for companies to donate machines than money.
> > > >> > >>>>>>>> Alibaba is willing to provide 10 machines, with 32 cores each, to the Flink project for this purpose.
> > > >> > >>>>>>>> In addition, Xiyuan, who's working on adding ARM support for Flink, provided me with 2 ARM machines (16 cores each).
> > > >> > >>>>>>>> I want to use the custom, more efficient build machines for building Flink's pull requests and master pushes.
> > > >> > >>>>>>>> 3. *Azure Pipelines is a more feature-rich tool*, allowing us, for example, to transfer intermediate build artifacts between pipeline stages. This will allow us to make the build more reliable (we are currently abusing the caching mechanism in Travis for this).
> > > >> > >>>>>>>> It also has some basic analytics on test results, flaky tests, etc.
> > > >> > >>>>>>>>
> > > >> > >>>>>>>> *Known problems:*
> > > >> > >>>>>>>> - Initially, we might see different build instabilities than before
> > > >> > >>>>>>>> - There's a higher maintenance overhead for the custom build machines (keeping them up to date etc.)
> > > >> > >>>>>>>> - We cannot use the build status integration of AZP, because it requires write access to the repository's source. The foundation does not allow that [2].
> > > >> > >>>>>>>> I propose to extend flinkbot / the flink-ci repository.
> > > >> > >>>>>>>>
> > > >> > >>>>>>>> *Current Status:*
> > > >> > >>>>>>>> - I'm able [3] to execute [4] the current custom build scripts on Azure Pipelines: This means that we will have one compile stage, and N testing jobs in the 2nd stage. Currently, we have N=10 testing jobs.
> > > >> > >>>>>>>> The time from the start of a build until all tests have completed is 1h22 minutes.
> > > >> > >>>>>>>> - I'm working on getting the nightly end-to-end tests to run on the new infrastructure.
> > > >> > >>>>>>>> - I'm working on getting the build to work on our pool of custom machines as well
> > > >> > >>>>>>>> - I'm working on setting up the full matrix of builds (different Scala, Hadoop etc. versions) for the nightlies
> > > >> > >>>>>>>>
> > > >> > >>>>>>>> *Next Steps:*
> > > >> > >>>>>>>> - I propose to document the entire build system in the Flink Wiki
> > > >> > >>>>>>>> - Once Azure can cover the same pull request tests as Travis, I would set it up to run in parallel (including Flinkbot posting links to Azure). I hope that this phase lasts for 1-2 weeks only, so that we do not have to maintain things concurrently. I will monitor the build stability closely, but would expect some support with debugging potential issues from the contributors.
> > > >> > >>>>>>>> - Once there are no problems with the new setup, we remove the Travis setup.
> > > >> > >>>>>>>> - Independently, I will work on triggering builds from master / release-branch pushes, as well as cron builds from the master branch ... all this will be described in the Wiki.
> > > >> > >>>>>>>>
> > > >> > >>>>>>>> *Timeline:*
> > > >> > >>>>>>>> - Once I have the feeling that people are supportive of the idea, I will start documenting in the Wiki. The first pull requests should show up after a few more days.
> > > >> > >>>>>>>> I will take one month of parental leave starting some time later in December, which will probably delay things a bit. I hope to have everything finished by the end of January.
> > > >> > >>>>>>>>
> > > >> > >>>>>>>> I'm happy to hear your thoughts on this work.
> > > >> > >>>>>>>> If nobody objects, I will start documenting the system and prepare everything for the migration.
> > > >> > >>>>>>>>
> > > >> > >>>>>>>> Best,
> > > >> > >>>>>>>> Robert
> > > >> > >>>>>>>>
> > > >> > >>>>>>>> [1] https://lists.apache.org/thread.html/b90aa518fcabce94f8e1de4132f46120fae613db6e95a2705f1bd1ea@%3Cdev.flink.apache.org%3E
> > > >> > >>>>>>>> [2] https://issues.apache.org/jira/browse/INFRA-17030
> > > >> > >>>>>>>> [3] https://github.com/rmetzger/flink/tree/azure_playground
> > > >> > >>>>>>>> [4] https://dev.azure.com/rmetzger/Flink/_build?definitionId=4&_a=summary
> > > >> > >>
> > > >> > >> --
> > > >> > >> Best Regards
> > > >> > >>
> > > >> > >> Jeff Zhang