Thanks for your comments, Yun.
If there's strong support for idea 2, it would actually make my
life easier: the migration would be simpler to do.

I also noticed that the uploads to transfer.sh were broken, but this should
be fixed in the "rmetzger.flink" builds (coming from rmetzger/flink). The
builds in "flink-ci.flink" (coming from flink-ci/flink) might have trouble
with transfer.sh.


On Thu, Dec 5, 2019 at 5:50 PM Yun Tang <myas...@live.com> wrote:

> Hi Robert
>
> Really exciting to see this new, more powerful CI tool that gets rid of the
> 50-minute limit of the Travis CI free account.
>
> After reading the wiki, I support idea 2 of AZP-setup version-2.
>
> However, after digging into some failing builds at
> https://dev.azure.com/rmetzger/Flink/_build , I found that we cannot view the
> logs of some IT cases, which were previously uploaded to transfer.sh by
> travis_watchdog.
> I think this feature is also easy to implement in AZP, right?
>
> Best
> Yun Tang
>
> On 12/6/19, 12:19 AM, "Robert Metzger" <rmetz...@apache.org> wrote:
>
>     I've created a first draft of my plans in the wiki:
>
> https://cwiki.apache.org/confluence/display/FLINK/%5Bpreview%5D+Azure+Pipelines
> .
>     I'm looking forward to your comments.
>
>     On Thu, Dec 5, 2019 at 12:37 PM Robert Metzger <rmetz...@apache.org>
> wrote:
>
>     > Thank you all for the positive feedback. I will start putting
> together a
>     > page in the wiki.
>     >
>     > @Jark: Azure Pipelines provides a free service that is even better than
>     > what Travis provides for free: 10 parallel builds with 6-hour timeouts.
>     >
>     > @Chesnay: I will answer your questions in the yet-to-be-written
>     > documentation in the wiki.
>     >
>     >
>     > On Thu, Dec 5, 2019 at 11:58 AM Arvid Heise <ar...@ververica.com>
> wrote:
>     >
>     >> +1 I had good experiences with Azure pipelines in the past.
>     >>
>     >> On Thu, Dec 5, 2019 at 11:35 AM Aljoscha Krettek <
> aljos...@apache.org>
>     >> wrote:
>     >>
>     >> > +1
>     >> >
>     >> > Thanks for the effort! The tooling seems to be quite a bit nicer
> and I
>     >> > like that we can grow by adding more machines.
>     >> >
>     >> > Best,
>     >> > Aljoscha
>     >> >
>     >> > > On 5. Dec 2019, at 03:18, Jark Wu <imj...@gmail.com> wrote:
>     >> > >
>     >> > > +1 for Azure pipeline because it promises better performance.
>     >> > >
>     >> > > However, I have 2 concerns:
>     >> > >
>     >> > > 1) Travis provides personal free service for testing personal
>     >> branches.
>     >> > > Usually, contributors use this feature to test PoC or run CRON
> jobs
>     >> for
>     >> > > pull requests.
>     >> > >    Using a local machine will cost a lot of time. Does AZP provide the
>     >> > > same free service?
>     >> > > 2) Currently, we deployed a webhook [1] to receive Travis CI
> build
>     >> > > notifications [2] and send to bui...@flink.apache.org mailing
> list.
>     >> > >    We need to figure out how to send Azure build results to the
>     >> > > mailing list. And this [3] might be the way to go.
>     >> > >
>     >> > > builds@f.a.o mailing list
>     >> > >
>     >> > > Best,
>     >> > > Jark
>     >> > >
>     >> > > [1]: https://github.com/wuchong/flink-notification-bot
>     >> > > [2]:
>     >> > >
>     >> >
>     >>
> https://docs.travis-ci.com/user/notifications/#configuring-webhook-notifications
>     >> > > [3]:
>     >> > >
>     >> >
>     >>
> https://docs.microsoft.com/en-us/azure/devops/service-hooks/overview?view=azure-devops
>     >> > >
>     >> > >
>     >> > >
>     >> > > On Wed, 4 Dec 2019 at 22:48, Jeff Zhang <zjf...@gmail.com>
> wrote:
>     >> > >
>     >> > >> +1
>     >> > >>
>     >> > >> Till Rohrmann <trohrm...@apache.org> 于2019年12月4日周三 下午10:43写道:
>     >> > >>
>     >> > >>> +1 for moving to Azure pipelines as it promises better
> scalability
>     >> and
>     >> > >>> tooling. Looking forward to having faster builds and hence
> shorter
>     >> > >> feedback
>     >> > >>> cycles :-)
>     >> > >>>
>     >> > >>> Cheers,
>     >> > >>> Till
>     >> > >>>
>     >> > >>> On Wed, Dec 4, 2019 at 1:24 PM Chesnay Schepler <
> ches...@apache.org
>     >> >
>     >> > >>> wrote:
>     >> > >>>
>     >> > >>>> @robert Can you expand how the azure setup interacts with
> CiBot?
>     >> Do we
>     >> > >>>> have to continue mirroring builds into flink-ci? How will the
>     >> cronjob
>     >> > >>>> configuration work? We should have a general idea on how to
>     >> implement
>     >> > >>>> this before proceeding.
>     >> > >>>> Additionally, moving /all/ jobs into flink-ci requires
> setting up
>     >> the
>     >> > >>>> environment variables we have; can we set these up via files
> or
>     >> will
>     >> > we
>     >> > >>>> have to give all committers permissions for flink-ci/flink?
>     >> > >>>>
>     >> > >>>> On 04/12/2019 12:55, Chesnay Schepler wrote:
>     >> > >>>>> From what I've seen so far Azure will provide us a better
>     >> experience,
>     >> > >>>>> so I'd say +1 for the transition as a whole.
>     >> > >>>>>
>     >> > >>>>> I'd delay merge at least until the feature branch is cut.
>     >> > >>>>> Given the parental leave it may even make sense to only
> start
>     >> merging
>     >> > >>>>> in January afterwards, to reduce the total time taken for
> the
>     >> > >>> transition.
>     >> > >>>>>
>     >> > >>>>> Reviews could maybe be made earlier, but I'm wondering
> whether
>     >> anyone
>     >> > >>>>> would even have the time at the moment to do so.
>     >> > >>>>>
>     >> > >>>>> On 04/12/2019 12:35, Kurt Young wrote:
>     >> > >>>>>> Thanks Robert for driving this. There is another big pain
> point
>     >> of
>     >> > >>>>>> current
>     >> > >>>>>> travis,
>     >> > >>>>>> which is its cache mechanism will fail from time to time.
> Almost
>     >> > >>>>>> around 50%
>     >> > >>>>>> of
>     >> > >>>>>> the build fails are caused by cache problem. I opened this
> issue
>     >> to
>     >> > >>>>>> travis
>     >> > >>>>>> but
>     >> > >>>>>> got no response yet. So big +1 from my side.
>     >> > >>>>>>
>     >> > >>>>>> Just one comment, it's close to 1.10 feature freeze and we
> will
>     >> > >> spend
>     >> > >>>>>> some
>     >> > >>>>>> time
>     >> > >>>>>> to make tests stable before release. I wish this
> replacement can
>     >> > >>> happen
>     >> > >>>>>> after
>     >> > >>>>>> 1.10 release, otherwise it will be an unstable factor during
>     >> release
>     >> > >>>>>> testing.
>     >> > >>>>>>
>     >> > >>>>>> Best,
>     >> > >>>>>> Kurt
>     >> > >>>>>>
>     >> > >>>>>>
>     >> > >>>>>> On Wed, Dec 4, 2019 at 7:16 PM Zhu Zhu <reed...@gmail.com>
>     >> wrote:
>     >> > >>>>>>
>     >> > >>>>>>> Thanks Robert for the updates! And thanks a lot for all
> the
>     >> efforts
>     >> > >>> to
>     >> > >>>>>>> investigate, experiment and tune Azure Pipelines for Flink
>     >> > >> building.
>     >> > >>>>>>> Big +1 for it.
>     >> > >>>>>>>
>     >> > >>>>>>> It would be great that the community building can be
> extended
>     >> with
>     >> > >>>>>>> custom
>     >> > >>>>>>> machines so that the tests would not be queued for long
> with
>     >> daily
>     >> > >>>>>>> growing
>     >> > >>>>>>> PRs.
>     >> > >>>>>>>
>     >> > >>>>>>> The increased timeout would be also very helpful.
>     >> > >>>>>>> The 50min timeout for free travis accounts is a pain
> currently,
>     >> > >>>>>>> especially
>     >> > >>>>>>> when we'd like to run e2e tests in our own travis. And I
> had to
>     >> > >>>>>>> manually
>     >> > >>>>>>> split the jobs to make it possible to pass.
>     >> > >>>>>>>
>     >> > >>>>>>> Thanks,
>     >> > >>>>>>> Zhu Zhu
>     >> > >>>>>>>
>     >> > >>>>>>> Robert Metzger <rmetz...@apache.org> 于2019年12月4日周三
> 下午6:36写道:
>     >> > >>>>>>>
>     >> > >>>>>>>> Hi all,
>     >> > >>>>>>>>
>     >> > >>>>>>>> as a follow up from our discussion on reducing the build
> time
>     >> > >> [1], I
>     >> > >>>>>>> would
>     >> > >>>>>>>> like to propose migrating our build infrastructure to
> Azure
>     >> > >>> Pipelines
>     >> > >>>>>>> (away
>     >> > >>>>>>>> from Travis).
>     >> > >>>>>>>>
>     >> > >>>>>>>> I believe that we have reached the limits of what Travis
> can
>     >> > >>>>>>>> provide the
>     >> > >>>>>>>> Flink community, and I don't want the build system to
> limit or
>     >> > >>>>>>>> influence
>     >> > >>>>>>>> the project's growth.
>     >> > >>>>>>>>
>     >> > >>>>>>>> *Benefits:*
>     >> > >>>>>>>> 1. The free Travis accounts are limited to 5 parallel
> builds,
>     >> with
>     >> > >> a
>     >> > >>>>>>> timeout
>     >> > >>>>>>>> of 50 minutes. Azure offers *10 parallel builds with 300
> minute
>     >> > >>>>>>>> timeouts
>     >> > >>>>>>>> *for
>     >> > >>>>>>>> free for open source projects.
>     >> > >>>>>>>> 2. Azure Pipelines allows us to *add custom build
> machines* to
>     >> the
>     >> > >>>>>>>> pool
>     >> > >>>>>>> of
>     >> > >>>>>>>> 10 free parallel builders.
>     >> > >>>>>>>> This will allow the Flink community to scale the
> available
>     >> build
>     >> > >>>>>>>> capacity
>     >> > >>>>>>>> as the project grows. We are dependent on donations from
>     >> > >> supporting
>     >> > >>>>>>>> companies, but I believe that it is easier for companies
> to
>     >> donate
>     >> > >>>>>>> machines
>     >> > >>>>>>>> than money.
>     >> > >>>>>>>> Alibaba is willing to provide 10 machines, with 32 cores
> each
>     >> to
>     >> > >> the
>     >> > >>>>>>> Flink
>     >> > >>>>>>>> project for this purpose.
>     >> > >>>>>>>> In addition, Xiyuan, who's working on adding ARM support
> for
>     >> Flink
>     >> > >>>>>>> provided
>     >> > >>>>>>>> me with 2 ARM machines (16 cores each).
>     >> > >>>>>>>> I want to use the custom, more efficient build machines
> for
>     >> > >> building
>     >> > >>>>>>>> Flink's pull requests and master-pushes.
>     >> > >>>>>>>> 3. *Azure Pipelines is a more feature-rich tool*,
> allowing for
>     >> > >>>>>>>> example to
>     >> > >>>>>>>> transfer intermediate build artifacts between pipeline
> stages.
>     >> > >> This
>     >> > >>>>>>>> will
>     >> > >>>>>>>> allow us to make the build more reliable (we are
> currently
>     >> abusing
>     >> > >>> the
>     >> > >>>>>>>> caching mechanism in Travis for this).
>     >> > >>>>>>>> It also has some basic analytics on test results / flaky
> tests
>     >> > >> etc.
>     >> > >>>>>>>>
>     >> > >>>>>>>> *Known problems:*
>     >> > >>>>>>>> - Initially, we might see different build instabilities
> than
>     >> > >> before
>     >> > >>>>>>>> - There's a higher maintenance overhead for the custom
> build
>     >> > >>> machines
>     >> > >>>>>>>> (keeping them up to date etc.)
>     >> > >>>>>>>> - We cannot use the build status integration of AZP,
> because
>     >> they
>     >> > >>>>>>> require
>     >> > >>>>>>>> write access to the repository's source. The foundation
> does
>     >> not
>     >> > >>> allow
>     >> > >>>>>>> that
>     >> > >>>>>>>> [2].
>     >> > >>>>>>>> I propose to extend flinkbot / the flink-ci repository.
>     >> > >>>>>>>>
>     >> > >>>>>>>> *Current Status:*
>     >> > >>>>>>>> - I'm able [3] to execute [4] the current custom build
> scripts
>     >> on
>     >> > >>>>>>>> Azure
>     >> > >>>>>>>> Pipelines: This means that we will have one compile
> stage, and
>     >> N
>     >> > >>>>>>>> testing
>     >> > >>>>>>>> jobs in the 2nd stage. Currently, we have N=10 testing
> jobs.
>     >> > >>>>>>>> The time from the start of a build till all tests have
>     >> completed
>     >> > >> is
>     >> > >>>>>>>> 1h22
>     >> > >>>>>>>> minutes.
>     >> > >>>>>>>> - I'm working on getting the nightly end to end tests to
> run on
>     >> > >> the
>     >> > >>>>>>>> new
>     >> > >>>>>>>> infrastructure.
>     >> > >>>>>>>> - I'm working on getting the build to work on our pool of
>     >> custom
>     >> > >>>>>>>> machines
>     >> > >>>>>>>> as well
>     >> > >>>>>>>> - I'm working on setting up the full matrix of builds
>     >> (different
>     >> > >>>>>>>> scala,
>     >> > >>>>>>>> hadoop etc. versions) for the nightlies
>     >> > >>>>>>>>
>     >> > >>>>>>>> *Next Steps:*
>     >> > >>>>>>>> - I propose to document the entire build system in the
> Flink
>     >> Wiki
>     >> > >>>>>>>> - Once Azure can cover the same pull request tests as
> Travis, I
>     >> > >>>>>>>> would set
>     >> > >>>>>>>> it up to run in parallel (including Flinkbot posting
> links to
>     >> > >>>>>>>> Azure). I
>     >> > >>>>>>>> hope that this phase lasts for 1-2 weeks only, so that
> we do
>     >> not
>     >> > >>>>>>>> have to
>     >> > >>>>>>>> maintain things concurrently. I will monitor the build
>     >> stability
>     >> > >>>>>>>> closely,
>     >> > >>>>>>>> but would expect some support with debugging potential
> issues
>     >> from
>     >> > >>> the
>     >> > >>>>>>>> contributors.
>     >> > >>>>>>>> - Once there are no problems with the new setup, we
> remove the
>     >> > >>> Travis
>     >> > >>>>>>>> setup.
>     >> > >>>>>>>> - Independently, I will work on triggering builds from
> master /
>     >> > >>>>>>>> release-branch pushes, as well as cron builds from the master
> branch
>     >> ...
>     >> > >>>>>>>> all this
>     >> > >>>>>>>> will be described in the Wiki.
>     >> > >>>>>>>>
>     >> > >>>>>>>>
>     >> > >>>>>>>> *Timeline:*
>     >> > >>>>>>>> - Once I have the feeling that people are
>     >> supportive of
>     >> > >>> the
>     >> > >>>>>>>> idea, I will start documenting in the Wiki. The first
> pull
>     >> > >> requests
>     >> > >>>>>>> should
>     >> > >>>>>>>> show up after a few more days.
>     >> > >>>>>>>> I will do a one month parental leave starting some time
> later
>     >> in
>     >> > >>>>>>> December,
>     >> > >>>>>>>> which will probably delay things a bit. I hope to have
>     >> everything
>     >> > >>>>>>> finished
>     >> > >>>>>>>> by end of January.
>     >> > >>>>>>>>
>     >> > >>>>>>>> I'm happy to hear your thoughts on this work.
>     >> > >>>>>>>> If nobody objects, I will start documenting the system
> and
>     >> prepare
>     >> > >>>>>>>> everything for the migration.
>     >> > >>>>>>>>
>     >> > >>>>>>>> Best,
>     >> > >>>>>>>> Robert
>     >> > >>>>>>>>
>     >> > >>>>>>>>
>     >> > >>>>>>>>
>     >> > >>>>>>>> [1]
>     >> > >>>>>>>>
>     >> > >>>>>>>>
>     >> > >>>>>>>
>     >> > >>>>
>     >> > >>>
>     >> > >>
>     >> >
>     >>
> https://lists.apache.org/thread.html/b90aa518fcabce94f8e1de4132f46120fae613db6e95a2705f1bd1ea@%3Cdev.flink.apache.org%3E
>     >> > >>>>>>>
>     >> > >>>>>>>> [2] https://issues.apache.org/jira/browse/INFRA-17030
>     >> > >>>>>>>> [3]
> https://github.com/rmetzger/flink/tree/azure_playground
>     >> > >>>>>>>> [4]
>     >> > >>>>>>>
>     >> > >>>
>     >>
> https://dev.azure.com/rmetzger/Flink/_build?definitionId=4&_a=summary
>     >> > >>>>>
>     >> > >>>>>
>     >> > >>>>>
>     >> > >>>>
>     >> > >>>>
>     >> > >>>
>     >> > >>
>     >> > >>
>     >> > >> --
>     >> > >> Best Regards
>     >> > >>
>     >> > >> Jeff Zhang
>     >> > >>
>     >> >
>     >> >
>     >>
>     >
>
>
>
