Hi Robert,

Really exciting to see this new, more powerful CI tool that gets rid of the 50-minute limit of the free Travis CI account.
After reading the wiki, I support idea 2 of AZP-setup version-2. However,
after digging into some failing builds at
https://dev.azure.com/rmetzger/Flink/_build , I found that we cannot view the
logs of some IT cases, which travis_watchdog previously uploaded to
transfer.sh. I think this feature would also be easy to implement in AZP,
right?

Best
Yun Tang

On 12/6/19, 12:19 AM, "Robert Metzger" <rmetz...@apache.org> wrote:

I've created a first draft of my plans in the wiki:
https://cwiki.apache.org/confluence/display/FLINK/%5Bpreview%5D+Azure+Pipelines.
I'm looking forward to your comments.

On Thu, Dec 5, 2019 at 12:37 PM Robert Metzger <rmetz...@apache.org> wrote:

> Thank you all for the positive feedback. I will start putting together a
> page in the wiki.
>
> @Jark: Azure Pipelines provides a free services, that is even better than
> what Travis provides for free: 10 parallel builds with 6 hours timeouts.
>
> @Chesnay: I will answer your questions in the yet-to-be-written
> documentation in the wiki.
>
>
> On Thu, Dec 5, 2019 at 11:58 AM Arvid Heise <ar...@ververica.com> wrote:
>
>> +1 I had good experiences with Azure pipelines in the past.
>>
>> On Thu, Dec 5, 2019 at 11:35 AM Aljoscha Krettek <aljos...@apache.org>
>> wrote:
>>
>> > +1
>> >
>> > Thanks for the effort! The tooling seems to be quite a bit nicer and I
>> > like that we can grow by adding more machines.
>> >
>> > Best,
>> > Aljoscha
>> >
>> > > On 5. Dec 2019, at 03:18, Jark Wu <imj...@gmail.com> wrote:
>> > >
>> > > +1 for Azure pipeline because it promises better performance.
>> > >
>> > > However, I have 2 concerns:
>> > >
>> > > 1) Travis provides personal free service for testing personal branches.
>> > > Usually, contributors use this feature to test PoC or run CRON jobs for
>> > > pull requests.
>> > > Using local machine will cost a lot of time. Does AZP provides the same
>> > > free service?
>> > > 2) Currently, we deployed a webhook [1] to receive Travis CI build
>> > > notifications [2] and send to bui...@flink.apache.org mailing list.
>> > > We need to figure out a way how to send Azure build results to the
>> > > mailing list. And this [3] might be the way to go.
>> > >
>> > > builds@f.a.o mailing list
>> > >
>> > > Best,
>> > > Jark
>> > >
>> > > [1]: https://github.com/wuchong/flink-notification-bot
>> > > [2]: https://docs.travis-ci.com/user/notifications/#configuring-webhook-notifications
>> > > [3]: https://docs.microsoft.com/en-us/azure/devops/service-hooks/overview?view=azure-devops
>> > >
>> > >
>> > > On Wed, 4 Dec 2019 at 22:48, Jeff Zhang <zjf...@gmail.com> wrote:
>> > >
>> > >> +1
>> > >>
>> > >> Till Rohrmann <trohrm...@apache.org> wrote on Wed, Dec 4, 2019 at 10:43 PM:
>> > >>
>> > >>> +1 for moving to Azure pipelines as it promises better scalability and
>> > >>> tooling. Looking forward to having faster builds and hence shorter
>> > >>> feedback cycles :-)
>> > >>>
>> > >>> Cheers,
>> > >>> Till
>> > >>>
>> > >>> On Wed, Dec 4, 2019 at 1:24 PM Chesnay Schepler <ches...@apache.org>
>> > >>> wrote:
>> > >>>
>> > >>>> @robert Can you expand how the azure setup interacts with CiBot? Do we
>> > >>>> have to continue mirroring builds into flink-ci? How will the cronjob
>> > >>>> configuration work? We should have a general idea on how to implement
>> > >>>> this before proceeding.
>> > >>>> Additionally, moving /all/ jobs into flink-ci requires setting up the
>> > >>>> environment variables we have; can we set these up via files or will we
>> > >>>> have to give all committers permissions for flink-ci/flink?
>> > >>>>
>> > >>>> On 04/12/2019 12:55, Chesnay Schepler wrote:
>> > >>>>> From what I've seen so far Azure will provide us a better experience,
>> > >>>>> so I'd say +1 for the transition as a whole.
>> > >>>>>
>> > >>>>> I'd delay merge at least until the feature branch is cut.
>> > >>>>> Given the parental leave it may even make sense to only start merging
>> > >>>>> in January afterwards, to reduce the total time taken for the
>> > >>>>> transition.
>> > >>>>>
>> > >>>>> Reviews could maybe be made earlier, but I'm wondering whether anyone
>> > >>>>> would even have the time at the moment to do so.
>> > >>>>>
>> > >>>>> On 04/12/2019 12:35, Kurt Young wrote:
>> > >>>>>> Thanks Robert for driving this. There is another big pain point of
>> > >>>>>> current travis, which is its cache mechanism will fail from time to
>> > >>>>>> time. Almost around 50% of the build fails are caused by cache
>> > >>>>>> problem. I opened this issue to travis but got no response yet. So
>> > >>>>>> big +1 from my side.
>> > >>>>>>
>> > >>>>>> Just one comment, it's close to 1.10 feature freeze and we will spend
>> > >>>>>> some time to make tests stable before release. I wish this
>> > >>>>>> replacement can happen after 1.10 release, otherwise it will be a
>> > >>>>>> unstable factor during release testing.
>> > >>>>>>
>> > >>>>>> Best,
>> > >>>>>> Kurt
>> > >>>>>>
>> > >>>>>>
>> > >>>>>> On Wed, Dec 4, 2019 at 7:16 PM Zhu Zhu <reed...@gmail.com> wrote:
>> > >>>>>>
>> > >>>>>>> Thanks Robert for the updates! And thanks a lot for all the efforts to
>> > >>>>>>> investigate, experiment and tune Azure Pipelines for Flink building.
>> > >>>>>>> Big +1 for it.
>> > >>>>>>>
>> > >>>>>>> It would be great that the community building can be extended with
>> > >>>>>>> custom machines so that the tests would not be queued for long with
>> > >>>>>>> daily growing PRs.
>> > >>>>>>>
>> > >>>>>>> The increased timeout would be also very helpful.
>> > >>>>>>> The 50min timeout for free travis accounts is a pain currently,
>> > >>>>>>> especially when we'd like to run e2e tests in our own travis. And I
>> > >>>>>>> had to manually split the jobs to make it possible to pass.
>> > >>>>>>>
>> > >>>>>>> Thanks,
>> > >>>>>>> Zhu Zhu
>> > >>>>>>>
>> > >>>>>>> Robert Metzger <rmetz...@apache.org> wrote on Wed, Dec 4, 2019 at 6:36 PM:
>> > >>>>>>>
>> > >>>>>>>> Hi all,
>> > >>>>>>>>
>> > >>>>>>>> as a follow up from our discussion on reducing the build time [1], I
>> > >>>>>>>> would like to propose migrating our build infrastructure to Azure
>> > >>>>>>>> Pipelines (away from Travis).
>> > >>>>>>>>
>> > >>>>>>>> I believe that we have reached the limits of what Travis can provide
>> > >>>>>>>> the Flink community, and I don't want the build system to limit or
>> > >>>>>>>> influence the project's growth.
>> > >>>>>>>>
>> > >>>>>>>> *Benefits:*
>> > >>>>>>>> 1. The free Travis account are limited to 5 parallel builds, with a
>> > >>>>>>>> timeout of 50 minutes. Azure offers *10 parallel builds with 300
>> > >>>>>>>> minute timeouts* for free for open source projects.
>> > >>>>>>>> 2. Azure Pipelines allows us to *add custom build machines* to the
>> > >>>>>>>> pool of 10 free parallel builders.
>> > >>>>>>>> This will allow the Flink community to scale the available build
>> > >>>>>>>> capacity as the project grows. We are dependent on donations from
>> > >>>>>>>> supporting companies, but I believe that it is easier for companies to
>> > >>>>>>>> donate machines than money.
>> > >>>>>>>> Alibaba is willing to provide 10 machines, with 32 cores each to the
>> > >>>>>>>> Flink project for this purpose.
>> > >>>>>>>> In addition, Xiyuan, who's working on adding ARM support for Flink
>> > >>>>>>>> provided me with 2 ARM machines (16 cores each).
>> > >>>>>>>> I want to use the custom, more efficient build machines for building
>> > >>>>>>>> Flink's pull requests and master-pushes.
>> > >>>>>>>> 3. *Azure Pipelines is a more feature-rich tool*, allowing for
>> > >>>>>>>> example to transfer intermediate build artifacts between pipeline
>> > >>>>>>>> stages. This will allow us to make the build more reliable (we are
>> > >>>>>>>> currently abusing the caching mechanism in Travis for this).
>> > >>>>>>>> It also has some basic analytics on test results / flaky tests etc.
>> > >>>>>>>>
>> > >>>>>>>> *Known problems:*
>> > >>>>>>>> - Initially, we might see different build instabilities than before
>> > >>>>>>>> - There's a higher maintenance overhead for the custom build machines
>> > >>>>>>>> (keeping them up to date etc.)
>> > >>>>>>>> - We can not use the build status integration of AZP, because they
>> > >>>>>>>> require write access to the repository's source. The foundation does
>> > >>>>>>>> not allow that [2].
>> > >>>>>>>> I propose to extend flinkbot / the flink-ci repository.
>> > >>>>>>>>
>> > >>>>>>>> *Current Status:*
>> > >>>>>>>> - I'm able [3] to execute [4] the current custom build scripts on
>> > >>>>>>>> Azure Pipelines: This means that we will have one compile stage, and N
>> > >>>>>>>> testing jobs in the 2nd stage. Currently, we have N=10 testing jobs.
>> > >>>>>>>> The time from the start of a build till all tests have completed is
>> > >>>>>>>> 1h22 minutes.
>> > >>>>>>>> - I'm working on getting the nightly end to end tests to run on the
>> > >>>>>>>> new infrastructure.
>> > >>>>>>>> - I'm working on getting the build to work on our pool of custom
>> > >>>>>>>> machines as well
>> > >>>>>>>> - I'm working on setting up the full matrix of builds (different
>> > >>>>>>>> scala, hadoop etc. versions) for the nightlies
>> > >>>>>>>>
>> > >>>>>>>> *Next Steps:*
>> > >>>>>>>> - I propose to document the entire build system in the Flink Wiki
>> > >>>>>>>> - Once Azure can cover the same pull request tests as Travis, I
>> > >>>>>>>> would set it up to run in parallel (including Flinkbot posting links
>> > >>>>>>>> to Azure). I hope that this phase lasts for 1-2 weeks only, so that we
>> > >>>>>>>> do not have to maintain things concurrently. I will monitor the build
>> > >>>>>>>> stability closely, but would expect some support with debugging
>> > >>>>>>>> potential issues from the contributors.
>> > >>>>>>>> - Once there are no problems with the new setup, we remove the Travis
>> > >>>>>>>> setup.
>> > >>>>>>>> - Independently, I will work on triggering builds from master /
>> > >>>>>>>> release - branch pushes, as well as cron builds from the master branch
>> > >>>>>>>> ... all this will be described in the Wiki.
>> > >>>>>>>>
>> > >>>>>>>>
>> > >>>>>>>> *Timeline:*
>> > >>>>>>>> - Once I have the feeling that people are supportive of the idea, I
>> > >>>>>>>> will start documenting in the Wiki. The first pull requests should
>> > >>>>>>>> show up after a few more days.
>> > >>>>>>>> I will do a one month parental leave starting some time later in
>> > >>>>>>>> December, which will probably delay things a bit. I hope to have
>> > >>>>>>>> everything finished by end of January.
>> > >>>>>>>>
>> > >>>>>>>> I'm happy to hear your thoughts on this work.
>> > >>>>>>>> If nobody objects, I will start documenting the system and prepare
>> > >>>>>>>> everything for the migration.
>> > >>>>>>>>
>> > >>>>>>>> Best,
>> > >>>>>>>> Robert
>> > >>>>>>>>
>> > >>>>>>>>
>> > >>>>>>>> [1] https://lists.apache.org/thread.html/b90aa518fcabce94f8e1de4132f46120fae613db6e95a2705f1bd1ea@%3Cdev.flink.apache.org%3E
>> > >>>>>>>> [2] https://issues.apache.org/jira/browse/INFRA-17030
>> > >>>>>>>> [3] https://github.com/rmetzger/flink/tree/azure_playground
>> > >>>>>>>> [4] https://dev.azure.com/rmetzger/Flink/_build?definitionId=4&_a=summary
>> > >>
>> > >> --
>> > >> Best Regards
>> > >>
>> > >> Jeff Zhang