+1

Thanks for the effort! The tooling seems to be quite a bit nicer and I like that we can grow by adding more machines.
Best,
Aljoscha

> On 5. Dec 2019, at 03:18, Jark Wu <imj...@gmail.com> wrote:
>
> +1 for Azure pipeline because it promises better performance.
>
> However, I have 2 concerns:
>
> 1) Travis provides a free personal service for testing personal branches.
> Usually, contributors use this feature to test PoCs or run CRON jobs for
> pull requests.
> Using a local machine will cost a lot of time. Does AZP provide the same
> free service?
> 2) Currently, we have deployed a webhook [1] to receive Travis CI build
> notifications [2] and send them to the bui...@flink.apache.org mailing list.
> We need to figure out how to send Azure build results to the
> mailing list. And this [3] might be the way to go.
>
> builds@f.a.o mailing list
>
> Best,
> Jark
>
> [1]: https://github.com/wuchong/flink-notification-bot
> [2]: https://docs.travis-ci.com/user/notifications/#configuring-webhook-notifications
> [3]: https://docs.microsoft.com/en-us/azure/devops/service-hooks/overview?view=azure-devops
>
>
>
> On Wed, 4 Dec 2019 at 22:48, Jeff Zhang <zjf...@gmail.com> wrote:
>
>> +1
>>
>> Till Rohrmann <trohrm...@apache.org> wrote on Wed, Dec 4, 2019 at 10:43 PM:
>>
>>> +1 for moving to Azure pipelines as it promises better scalability and
>>> tooling. Looking forward to having faster builds and hence shorter feedback
>>> cycles :-)
>>>
>>> Cheers,
>>> Till
>>>
>>> On Wed, Dec 4, 2019 at 1:24 PM Chesnay Schepler <ches...@apache.org>
>>> wrote:
>>>
>>>> @robert Can you expand on how the azure setup interacts with CiBot? Do we
>>>> have to continue mirroring builds into flink-ci? How will the cronjob
>>>> configuration work? We should have a general idea on how to implement
>>>> this before proceeding.
>>>> Additionally, moving /all/ jobs into flink-ci requires setting up the
>>>> environment variables we have; can we set these up via files or will we
>>>> have to give all committers permissions for flink-ci/flink?
>>>>
>>>> On 04/12/2019 12:55, Chesnay Schepler wrote:
>>>>> From what I've seen so far Azure will provide us a better experience,
>>>>> so I'd say +1 for the transition as a whole.
>>>>>
>>>>> I'd delay the merge at least until the feature branch is cut.
>>>>> Given the parental leave it may even make sense to only start merging
>>>>> in January afterwards, to reduce the total time taken for the transition.
>>>>>
>>>>> Reviews could maybe be made earlier, but I'm wondering whether anyone
>>>>> would even have the time at the moment to do so.
>>>>>
>>>>> On 04/12/2019 12:35, Kurt Young wrote:
>>>>>> Thanks Robert for driving this. There is another big pain point of the
>>>>>> current travis setup: its cache mechanism fails from time to time.
>>>>>> Around 50% of the build failures are caused by cache problems. I reported
>>>>>> this issue to travis but have got no response yet. So big +1 from my side.
>>>>>>
>>>>>> Just one comment: it's close to the 1.10 feature freeze and we will spend
>>>>>> some time making tests stable before the release. I wish this replacement
>>>>>> could happen after the 1.10 release, otherwise it will be an unstable
>>>>>> factor during release testing.
>>>>>>
>>>>>> Best,
>>>>>> Kurt
>>>>>>
>>>>>>
>>>>>> On Wed, Dec 4, 2019 at 7:16 PM Zhu Zhu <reed...@gmail.com> wrote:
>>>>>>
>>>>>>> Thanks Robert for the updates! And thanks a lot for all the efforts to
>>>>>>> investigate, experiment and tune Azure Pipelines for Flink building.
>>>>>>> Big +1 for it.
>>>>>>>
>>>>>>> It would be great if the community build capacity could be extended with
>>>>>>> custom machines so that the tests would not be queued for long with the
>>>>>>> daily growing number of PRs.
>>>>>>>
>>>>>>> The increased timeout would also be very helpful.
>>>>>>> The 50min timeout for free travis accounts is a pain currently, especially
>>>>>>> when we'd like to run e2e tests in our own travis.
>>>>>>> And I had to manually split the jobs to make it possible to pass.
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Zhu Zhu
>>>>>>>
>>>>>>> Robert Metzger <rmetz...@apache.org> wrote on Wed, Dec 4, 2019 at 6:36 PM:
>>>>>>>
>>>>>>>> Hi all,
>>>>>>>>
>>>>>>>> as a follow up from our discussion on reducing the build time [1], I would
>>>>>>>> like to propose migrating our build infrastructure to Azure Pipelines (away
>>>>>>>> from Travis).
>>>>>>>>
>>>>>>>> I believe that we have reached the limits of what Travis can provide the
>>>>>>>> Flink community, and I don't want the build system to limit or influence
>>>>>>>> the project's growth.
>>>>>>>>
>>>>>>>> *Benefits:*
>>>>>>>> 1. The free Travis accounts are limited to 5 parallel builds, with a timeout
>>>>>>>> of 50 minutes. Azure offers *10 parallel builds with 300 minute timeouts*
>>>>>>>> for free for open source projects.
>>>>>>>> 2. Azure Pipelines allows us to *add custom build machines* to the pool of
>>>>>>>> 10 free parallel builders.
>>>>>>>> This will allow the Flink community to scale the available build capacity
>>>>>>>> as the project grows. We are dependent on donations from supporting
>>>>>>>> companies, but I believe that it is easier for companies to donate machines
>>>>>>>> than money.
>>>>>>>> Alibaba is willing to provide 10 machines, with 32 cores each, to the Flink
>>>>>>>> project for this purpose.
>>>>>>>> In addition, Xiyuan, who's working on adding ARM support for Flink, provided
>>>>>>>> me with 2 ARM machines (16 cores each).
>>>>>>>> I want to use the custom, more efficient build machines for building
>>>>>>>> Flink's pull requests and master-pushes.
>>>>>>>> 3. *Azure Pipelines is a more feature-rich tool*, allowing for example to
>>>>>>>> transfer intermediate build artifacts between pipeline stages.
>>>>>>>> This will allow us to make the build more reliable (we are currently
>>>>>>>> abusing the caching mechanism in Travis for this).
>>>>>>>> It also has some basic analytics on test results / flaky tests etc.
>>>>>>>>
>>>>>>>> *Known problems:*
>>>>>>>> - Initially, we might see different build instabilities than before
>>>>>>>> - There's a higher maintenance overhead for the custom build machines
>>>>>>>> (keeping them up to date etc.)
>>>>>>>> - We can not use the build status integration of AZP, because they require
>>>>>>>> write access to the repository's source. The foundation does not allow that
>>>>>>>> [2].
>>>>>>>> I propose to extend flinkbot / the flink-ci repository.
>>>>>>>>
>>>>>>>> *Current Status:*
>>>>>>>> - I'm able [3] to execute [4] the current custom build scripts on Azure
>>>>>>>> Pipelines: This means that we will have one compile stage, and N testing
>>>>>>>> jobs in the 2nd stage. Currently, we have N=10 testing jobs.
>>>>>>>> The time from the start of a build till all tests have completed is 1h22
>>>>>>>> minutes.
>>>>>>>> - I'm working on getting the nightly end to end tests to run on the new
>>>>>>>> infrastructure.
>>>>>>>> - I'm working on getting the build to work on our pool of custom machines
>>>>>>>> as well
>>>>>>>> - I'm working on setting up the full matrix of builds (different scala,
>>>>>>>> hadoop etc. versions) for the nightlies
>>>>>>>>
>>>>>>>> *Next Steps:*
>>>>>>>> - I propose to document the entire build system in the Flink Wiki
>>>>>>>> - Once Azure can cover the same pull request tests as Travis, I would set
>>>>>>>> it up to run in parallel (including Flinkbot posting links to Azure). I
>>>>>>>> hope that this phase lasts for 1-2 weeks only, so that we do not have to
>>>>>>>> maintain things concurrently.
>>>>>>>> I will monitor the build stability closely,
>>>>>>>> but would expect some support with debugging potential issues from the
>>>>>>>> contributors.
>>>>>>>> - Once there are no problems with the new setup, we remove the Travis
>>>>>>>> setup.
>>>>>>>> - Independently, I will work on triggering builds from master /
>>>>>>>> release-branch pushes, as well as cron builds from the master branch ...
>>>>>>>> all this will be described in the Wiki.
>>>>>>>>
>>>>>>>>
>>>>>>>> *Timeline:*
>>>>>>>> - Once I have the feeling that people are supportive of the
>>>>>>>> idea, I will start documenting in the Wiki. The first pull requests should
>>>>>>>> show up after a few more days.
>>>>>>>> I will do a one month parental leave starting some time later in December,
>>>>>>>> which will probably delay things a bit. I hope to have everything finished
>>>>>>>> by end of January.
>>>>>>>>
>>>>>>>> I'm happy to hear your thoughts on this work.
>>>>>>>> If nobody objects, I will start documenting the system and prepare
>>>>>>>> everything for the migration.
>>>>>>>>
>>>>>>>> Best,
>>>>>>>> Robert
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> [1] https://lists.apache.org/thread.html/b90aa518fcabce94f8e1de4132f46120fae613db6e95a2705f1bd1ea@%3Cdev.flink.apache.org%3E
>>>>>>>> [2] https://issues.apache.org/jira/browse/INFRA-17030
>>>>>>>> [3] https://github.com/rmetzger/flink/tree/azure_playground
>>>>>>>> [4] https://dev.azure.com/rmetzger/Flink/_build?definitionId=4&_a=summary
>>
>>
>> --
>> Best Regards
>>
>> Jeff Zhang
>>