Thank you all for the positive feedback. I will start putting together a
page in the wiki.

@Jark: Azure Pipelines provides a free service that is even better than
what Travis provides for free: 10 parallel builds with 6-hour timeouts.

@Chesnay: I will answer your questions in the yet-to-be-written
documentation in the wiki.


On Thu, Dec 5, 2019 at 11:58 AM Arvid Heise <ar...@ververica.com> wrote:

> +1 I had good experiences with Azure pipelines in the past.
>
> On Thu, Dec 5, 2019 at 11:35 AM Aljoscha Krettek <aljos...@apache.org>
> wrote:
>
> > +1
> >
> > Thanks for the effort! The tooling seems to be quite a bit nicer and I
> > like that we can grow by adding more machines.
> >
> > Best,
> > Aljoscha
> >
> > > On 5. Dec 2019, at 03:18, Jark Wu <imj...@gmail.com> wrote:
> > >
> > > +1 for Azure pipeline because it promises better performance.
> > >
> > > However, I have 2 concerns:
> > >
> > > > 1) Travis provides a free personal service for testing personal branches.
> > > > Contributors usually use this feature to test PoCs or run CRON jobs for
> > > > pull requests.
> > > >    Using a local machine costs a lot of time. Does AZP provide the same
> > > > free service?
> > > > 2) Currently, we have a webhook [1] deployed that receives Travis CI build
> > > > notifications [2] and sends them to the bui...@flink.apache.org mailing list.
> > > >    We need to figure out how to send Azure build results to the
> > > > mailing list. [3] might be the way to go.
> > >
> > > builds@f.a.o mailing list
> > >
> > > Best,
> > > Jark
> > >
> > > [1]: https://github.com/wuchong/flink-notification-bot
> > > [2]:
> > >
> >
> https://docs.travis-ci.com/user/notifications/#configuring-webhook-notifications
> > > [3]:
> > >
> >
> https://docs.microsoft.com/en-us/azure/devops/service-hooks/overview?view=azure-devops
> > >
> > >
> > >
> > > On Wed, 4 Dec 2019 at 22:48, Jeff Zhang <zjf...@gmail.com> wrote:
> > >
> > >> +1
> > >>
> > > >> On Wed, Dec 4, 2019 at 10:43 PM Till Rohrmann <trohrm...@apache.org> wrote:
> > >>
> > > >>> +1 for moving to Azure pipelines as it promises better scalability and
> > > >>> tooling. Looking forward to having faster builds and hence shorter
> > > >>> feedback cycles :-)
> > >>>
> > >>> Cheers,
> > >>> Till
> > >>>
> > >>> On Wed, Dec 4, 2019 at 1:24 PM Chesnay Schepler <ches...@apache.org>
> > >>> wrote:
> > >>>
> > > >>>> @robert Can you expand on how the Azure setup interacts with CiBot? Do
> > > >>>> we have to continue mirroring builds into flink-ci? How will the
> > > >>>> cronjob configuration work? We should have a general idea of how to
> > > >>>> implement this before proceeding.
> > > >>>> Additionally, moving all jobs into flink-ci requires setting up the
> > > >>>> environment variables we have; can we set these up via files or will
> > > >>>> we have to give all committers permissions for flink-ci/flink?
> > >>>>
> > >>>> On 04/12/2019 12:55, Chesnay Schepler wrote:
> > > >>>>> From what I've seen so far Azure will provide us with a better
> > > >>>>> experience, so I'd say +1 for the transition as a whole.
> > > >>>>>
> > > >>>>> I'd delay the merge at least until the feature branch is cut.
> > > >>>>> Given the parental leave it may even make sense to only start merging
> > > >>>>> in January afterwards, to reduce the total time taken for the
> > > >>>>> transition.
> > > >>>>>
> > > >>>>> Reviews could maybe be done earlier, but I'm wondering whether anyone
> > > >>>>> would even have the time at the moment to do so.
> > >>>>>
> > >>>>> On 04/12/2019 12:35, Kurt Young wrote:
> > > >>>>>> Thanks Robert for driving this. There is another big pain point with
> > > >>>>>> the current Travis setup: its cache mechanism fails from time to time.
> > > >>>>>> Around 50% of the build failures are caused by cache problems. I
> > > >>>>>> reported this issue to Travis but have not gotten a response yet. So
> > > >>>>>> big +1 from my side.
> > > >>>>>>
> > > >>>>>> Just one comment: we are close to the 1.10 feature freeze and will
> > > >>>>>> spend some time making the tests stable before the release. I wish this
> > > >>>>>> replacement could happen after the 1.10 release; otherwise it will be
> > > >>>>>> an unstable factor during release testing.
> > >>>>>>
> > >>>>>> Best,
> > >>>>>> Kurt
> > >>>>>>
> > >>>>>>
> > >>>>>> On Wed, Dec 4, 2019 at 7:16 PM Zhu Zhu <reed...@gmail.com> wrote:
> > >>>>>>
> > > >>>>>>> Thanks Robert for the updates! And thanks a lot for all the efforts to
> > > >>>>>>> investigate, experiment with and tune Azure Pipelines for building Flink.
> > > >>>>>>> Big +1 for it.
> > > >>>>>>>
> > > >>>>>>> It would be great if the community build capacity could be extended with
> > > >>>>>>> custom machines, so that tests would not be queued for long as the number
> > > >>>>>>> of PRs grows daily.
> > > >>>>>>>
> > > >>>>>>> The increased timeout would also be very helpful.
> > > >>>>>>> The 50-minute timeout for free Travis accounts is currently a pain,
> > > >>>>>>> especially when we'd like to run e2e tests in our own Travis accounts. I
> > > >>>>>>> had to manually split the jobs to make it possible for them to pass.
> > >>>>>>>
> > >>>>>>> Thanks,
> > >>>>>>> Zhu Zhu
> > >>>>>>>
> > > >>>>>>> On Wed, Dec 4, 2019 at 6:36 PM Robert Metzger <rmetz...@apache.org> wrote:
> > >>>>>>>
> > >>>>>>>> Hi all,
> > >>>>>>>>
> > > >>>>>>>> as a follow-up to our discussion on reducing the build time [1], I
> > > >>>>>>>> would like to propose migrating our build infrastructure to Azure
> > > >>>>>>>> Pipelines (away from Travis).
> > > >>>>>>>>
> > > >>>>>>>> I believe that we have reached the limits of what Travis can provide
> > > >>>>>>>> the Flink community, and I don't want the build system to limit or
> > > >>>>>>>> influence the project's growth.
> > >>>>>>>>
> > >>>>>>>> *Benefits:*
> > > >>>>>>>> 1. Free Travis accounts are limited to 5 parallel builds, with a
> > > >>>>>>>> timeout of 50 minutes. Azure offers *10 parallel builds with 300 minute
> > > >>>>>>>> timeouts* for free for open source projects.
> > > >>>>>>>> 2. Azure Pipelines allows us to *add custom build machines* to the pool
> > > >>>>>>>> of 10 free parallel builders.
> > > >>>>>>>> This will allow the Flink community to scale the available build
> > > >>>>>>>> capacity as the project grows. We are dependent on donations from
> > > >>>>>>>> supporting companies, but I believe that it is easier for companies to
> > > >>>>>>>> donate machines than money.
> > > >>>>>>>> Alibaba is willing to provide 10 machines, with 32 cores each, to the
> > > >>>>>>>> Flink project for this purpose.
> > > >>>>>>>> In addition, Xiyuan, who's working on adding ARM support for Flink,
> > > >>>>>>>> provided me with 2 ARM machines (16 cores each).
> > > >>>>>>>> I want to use the custom, more efficient build machines for building
> > > >>>>>>>> Flink's pull requests and master-pushes.
> > > >>>>>>>> 3. *Azure Pipelines is a more feature-rich tool*, allowing us, for
> > > >>>>>>>> example, to transfer intermediate build artifacts between pipeline
> > > >>>>>>>> stages. This will allow us to make the build more reliable (we are
> > > >>>>>>>> currently abusing the caching mechanism in Travis for this).
> > > >>>>>>>> It also has some basic analytics on test results / flaky tests etc.
> > >>>>>>>> *Known problems:*
> > > >>>>>>>> - Initially, we might see different build instabilities than before
> > > >>>>>>>> - There's a higher maintenance overhead for the custom build machines
> > > >>>>>>>> (keeping them up to date etc.)
> > > >>>>>>>> - We cannot use the build status integration of AZP, because it
> > > >>>>>>>> requires write access to the repository's source. The foundation does
> > > >>>>>>>> not allow that [2].
> > > >>>>>>>> I propose to extend flinkbot / the flink-ci repository.
> > >>>>>>>>
> > >>>>>>>> *Current Status:*
> > > >>>>>>>> - I'm able [3] to execute [4] the current custom build scripts on
> > > >>>>>>>> Azure Pipelines: This means that we will have one compile stage, and N
> > > >>>>>>>> testing jobs in the 2nd stage. Currently, we have N=10 testing jobs.
> > > >>>>>>>> The time from the start of a build till all tests have completed is
> > > >>>>>>>> 1h22 minutes.
> > > >>>>>>>> - I'm working on getting the nightly end to end tests to run on the
> > > >>>>>>>> new infrastructure.
> > > >>>>>>>> - I'm working on getting the build to work on our pool of custom
> > > >>>>>>>> machines as well
> > > >>>>>>>> - I'm working on setting up the full matrix of builds (different
> > > >>>>>>>> scala, hadoop etc. versions) for the nightlies
> > >>>>>>>>
> > >>>>>>>> *Next Steps:*
> > > >>>>>>>> - I propose to document the entire build system in the Flink Wiki
> > > >>>>>>>> - Once Azure can cover the same pull request tests as Travis, I would
> > > >>>>>>>> set it up to run in parallel (including Flinkbot posting links to
> > > >>>>>>>> Azure). I hope that this phase lasts for 1-2 weeks only, so that we do
> > > >>>>>>>> not have to maintain things concurrently. I will monitor the build
> > > >>>>>>>> stability closely, but would expect some support with debugging
> > > >>>>>>>> potential issues from the contributors.
> > > >>>>>>>> - Once there are no problems with the new setup, we remove the Travis
> > > >>>>>>>> setup.
> > > >>>>>>>> - Independently, I will work on triggering builds from master /
> > > >>>>>>>> release-branch pushes, as well as cron builds from the master branch ...
> > > >>>>>>>> all this will be described in the Wiki.
> > >>>>>>>>
> > >>>>>>>>
> > > >>>>>>>> *Timeline:*
> > > >>>>>>>> - Once I have the feeling that people are supportive of the idea, I
> > > >>>>>>>> will start documenting in the Wiki. The first pull requests should show
> > > >>>>>>>> up after a few more days.
> > > >>>>>>>> I will take a one-month parental leave starting some time later in
> > > >>>>>>>> December, which will probably delay things a bit. I hope to have
> > > >>>>>>>> everything finished by the end of January.
> > >>>>>>>>
> > >>>>>>>> I'm happy to hear your thoughts on this work.
> > > >>>>>>>> If nobody objects, I will start documenting the system and prepare
> > > >>>>>>>> everything for the migration.
> > >>>>>>>>
> > >>>>>>>> Best,
> > >>>>>>>> Robert
> > >>>>>>>>
> > >>>>>>>>
> > >>>>>>>>
> > >>>>>>>> [1]
> > >>>>>>>>
> > >>>>>>>>
> > >>>>>>>
> > >>>>
> > >>>
> > >>
> >
> https://lists.apache.org/thread.html/b90aa518fcabce94f8e1de4132f46120fae613db6e95a2705f1bd1ea@%3Cdev.flink.apache.org%3E
> > >>>>>>>
> > >>>>>>>> [2] https://issues.apache.org/jira/browse/INFRA-17030
> > >>>>>>>> [3] https://github.com/rmetzger/flink/tree/azure_playground
> > >>>>>>>> [4]
> > >>>>>>>
> > >>>
> https://dev.azure.com/rmetzger/Flink/_build?definitionId=4&_a=summary
> > >>>>>
> > >>>>>
> > >>>>>
> > >>>>
> > >>>>
> > >>>
> > >>
> > >>
> > >> --
> > >> Best Regards
> > >>
> > >> Jeff Zhang
> > >>
> >
> >
>
