Hi Robert

It's really exciting to see this new, more powerful CI tool, which gets rid of
the 50-minute limit of the free Travis CI account.

After reading the wiki, I support idea 2 of AZP-setup version-2. 

However, after digging into some failing builds at
https://dev.azure.com/rmetzger/Flink/_build , I found that we cannot view the
logs of some IT cases, which were previously uploaded to transfer.sh by
travis_watchdog.
I think this feature would also be easy to implement in AZP, right?

Best
Yun Tang

On 12/6/19, 12:19 AM, "Robert Metzger" <rmetz...@apache.org> wrote:

    I've created a first draft of my plans in the wiki:
    
    https://cwiki.apache.org/confluence/display/FLINK/%5Bpreview%5D+Azure+Pipelines.
    I'm looking forward to your comments.
    
    On Thu, Dec 5, 2019 at 12:37 PM Robert Metzger <rmetz...@apache.org> wrote:
    
    > Thank you all for the positive feedback. I will start putting together a
    > page in the wiki.
    >
    > @Jark: Azure Pipelines provides a free service that is even better than
    > what Travis provides for free: 10 parallel builds with 6-hour timeouts.
    >
    > @Chesnay: I will answer your questions in the yet-to-be-written
    > documentation in the wiki.
    >
    >
    > On Thu, Dec 5, 2019 at 11:58 AM Arvid Heise <ar...@ververica.com> wrote:
    >
    >> +1 I had good experiences with Azure pipelines in the past.
    >>
    >> On Thu, Dec 5, 2019 at 11:35 AM Aljoscha Krettek <aljos...@apache.org>
    >> wrote:
    >>
    >> > +1
    >> >
    >> > Thanks for the effort! The tooling seems to be quite a bit nicer and I
    >> > like that we can grow by adding more machines.
    >> >
    >> > Best,
    >> > Aljoscha
    >> >
    >> > > On 5. Dec 2019, at 03:18, Jark Wu <imj...@gmail.com> wrote:
    >> > >
    >> > > +1 for Azure pipeline because it promises better performance.
    >> > >
    >> > > However, I have 2 concerns:
    >> > >
    >> > > 1) Travis provides a personal free service for testing personal branches.
    >> > > Usually, contributors use this feature to test PoCs or run CRON jobs for
    >> > > pull requests.
    >> > >    Using a local machine will cost a lot of time. Does AZP provide the
    >> > > same free service?
    >> > > 2) Currently, we have deployed a webhook [1] to receive Travis CI build
    >> > > notifications [2] and send them to the bui...@flink.apache.org mailing list.
    >> > >    We need to figure out a way to send Azure build results to the
    >> > > mailing list. And this [3] might be the way to go.
    >> > >
    >> > > builds@f.a.o mailing list
    >> > >
    >> > > Best,
    >> > > Jark
    >> > >
    >> > > [1]: https://github.com/wuchong/flink-notification-bot
    >> > > [2]: https://docs.travis-ci.com/user/notifications/#configuring-webhook-notifications
    >> > > [3]: https://docs.microsoft.com/en-us/azure/devops/service-hooks/overview?view=azure-devops
    >> > >
    >> > >
    >> > >
    >> > > On Wed, 4 Dec 2019 at 22:48, Jeff Zhang <zjf...@gmail.com> wrote:
    >> > >
    >> > >> +1
    >> > >>
    >> > >> On Wed, Dec 4, 2019 at 10:43 PM Till Rohrmann <trohrm...@apache.org> wrote:
    >> > >>
    >> > >>> +1 for moving to Azure pipelines as it promises better scalability and
    >> > >>> tooling. Looking forward to having faster builds and hence shorter
    >> > >>> feedback cycles :-)
    >> > >>>
    >> > >>> Cheers,
    >> > >>> Till
    >> > >>>
    >> > >>> On Wed, Dec 4, 2019 at 1:24 PM Chesnay Schepler <ches...@apache.org>
    >> > >>> wrote:
    >> > >>>
    >> > >>>> @robert Can you expand how the azure setup interacts with CiBot? Do we
    >> > >>>> have to continue mirroring builds into flink-ci? How will the cronjob
    >> > >>>> configuration work? We should have a general idea on how to implement
    >> > >>>> this before proceeding.
    >> > >>>> Additionally, moving /all/ jobs into flink-ci requires setting up the
    >> > >>>> environment variables we have; can we set these up via files or will we
    >> > >>>> have to give all committers permissions for flink-ci/flink?
    >> > >>>>
    >> > >>>> On 04/12/2019 12:55, Chesnay Schepler wrote:
    >> > >>>>> From what I've seen so far Azure will provide us a better experience,
    >> > >>>>> so I'd say +1 for the transition as a whole.
    >> > >>>>>
    >> > >>>>> I'd delay merge at least until the feature branch is cut.
    >> > >>>>> Given the parental leave it may even make sense to only start merging
    >> > >>>>> in January afterwards, to reduce the total time taken for the
    >> > >>>>> transition.
    >> > >>>>>
    >> > >>>>> Reviews could maybe be made earlier, but I'm wondering whether anyone
    >> > >>>>> would even have the time at the moment to do so.
    >> > >>>>>
    >> > >>>>> On 04/12/2019 12:35, Kurt Young wrote:
    >> > >>>>>> Thanks Robert for driving this. There is another big pain point of the
    >> > >>>>>> current Travis setup: its cache mechanism fails from time to time. Around
    >> > >>>>>> 50% of the build failures are caused by cache problems. I reported this
    >> > >>>>>> issue to Travis but have received no response yet. So big +1 from my side.
    >> > >>>>>>
    >> > >>>>>> Just one comment: it's close to the 1.10 feature freeze and we will
    >> > >>>>>> spend some time making tests stable before the release. I wish this
    >> > >>>>>> replacement could happen after the 1.10 release, otherwise it will be an
    >> > >>>>>> unstable factor during release testing.
    >> > >>>>>>
    >> > >>>>>> Best,
    >> > >>>>>> Kurt
    >> > >>>>>>
    >> > >>>>>>
    >> > >>>>>> On Wed, Dec 4, 2019 at 7:16 PM Zhu Zhu <reed...@gmail.com> wrote:
    >> > >>>>>>
    >> > >>>>>>> Thanks Robert for the updates! And thanks a lot for all the efforts to
    >> > >>>>>>> investigate, experiment and tune Azure Pipelines for Flink building.
    >> > >>>>>>> Big +1 for it.
    >> > >>>>>>>
    >> > >>>>>>> It would be great if the community builds could be extended with custom
    >> > >>>>>>> machines so that the tests would not be queued for long with the daily
    >> > >>>>>>> growing number of PRs.
    >> > >>>>>>>
    >> > >>>>>>> The increased timeout would also be very helpful.
    >> > >>>>>>> The 50min timeout for free Travis accounts is a pain currently,
    >> > >>>>>>> especially when we'd like to run e2e tests in our own Travis. I had to
    >> > >>>>>>> manually split the jobs to make it possible for them to pass.
    >> > >>>>>>>
    >> > >>>>>>> Thanks,
    >> > >>>>>>> Zhu Zhu
    >> > >>>>>>>
    >> > >>>>>>> On Wed, Dec 4, 2019 at 6:36 PM Robert Metzger <rmetz...@apache.org> wrote:
    >> > >>>>>>>
    >> > >>>>>>>> Hi all,
    >> > >>>>>>>>
    >> > >>>>>>>> as a follow up from our discussion on reducing the build time [1], I
    >> > >>>>>>>> would like to propose migrating our build infrastructure to Azure
    >> > >>>>>>>> Pipelines (away from Travis).
    >> > >>>>>>>>
    >> > >>>>>>>> I believe that we have reached the limits of what Travis can provide the
    >> > >>>>>>>> Flink community, and I don't want the build system to limit or influence
    >> > >>>>>>>> the project's growth.
    >> > >>>>>>>>
    >> > >>>>>>>> *Benefits:*
    >> > >>>>>>>> 1. The free Travis accounts are limited to 5 parallel builds, with a
    >> > >>>>>>>> timeout of 50 minutes. Azure offers *10 parallel builds with 300 minute
    >> > >>>>>>>> timeouts* for free for open source projects.
    >> > >>>>>>>> 2. Azure Pipelines allows us to *add custom build machines* to the
    >> > >>>>>>>> pool of 10 free parallel builders.
    >> > >>>>>>>> This will allow the Flink community to scale the available build
    >> > >>>>>>>> capacity as the project grows. We are dependent on donations from
    >> > >>>>>>>> supporting companies, but I believe that it is easier for companies to
    >> > >>>>>>>> donate machines than money.
    >> > >>>>>>>> Alibaba is willing to provide 10 machines, with 32 cores each, to the
    >> > >>>>>>>> Flink project for this purpose.
    >> > >>>>>>>> In addition, Xiyuan, who's working on adding ARM support for Flink,
    >> > >>>>>>>> provided me with 2 ARM machines (16 cores each).
    >> > >>>>>>>> I want to use the custom, more efficient build machines for building
    >> > >>>>>>>> Flink's pull requests and master-pushes.
    >> > >>>>>>>> 3. *Azure Pipelines is a more feature-rich tool*, allowing for example to
    >> > >>>>>>>> transfer intermediate build artifacts between pipeline stages. This
    >> > >>>>>>>> will allow us to make the build more reliable (we are currently abusing
    >> > >>>>>>>> the caching mechanism in Travis for this).
    >> > >>>>>>>> It also has some basic analytics on test results / flaky tests etc.
    >> > >>>>>>>>
    >> > >>>>>>>> *Known problems:*
    >> > >>>>>>>> - Initially, we might see different build instabilities than before
    >> > >>>>>>>> - There's a higher maintenance overhead for the custom build machines
    >> > >>>>>>>> (keeping them up to date etc.)
    >> > >>>>>>>> - We cannot use the build status integration of AZP, because it
    >> > >>>>>>>> requires write access to the repository's source. The foundation does
    >> > >>>>>>>> not allow that [2].
    >> > >>>>>>>> I propose to extend flinkbot / the flink-ci repository.
    >> > >>>>>>>>
    >> > >>>>>>>> *Current Status:*
    >> > >>>>>>>> - I'm able [3] to execute [4] the current custom build scripts on Azure
    >> > >>>>>>>> Pipelines: This means that we will have one compile stage, and N testing
    >> > >>>>>>>> jobs in the 2nd stage. Currently, we have N=10 testing jobs.
    >> > >>>>>>>> The time from the start of a build till all tests have completed is
    >> > >>>>>>>> 1 hour 22 minutes.
    >> > >>>>>>>> - I'm working on getting the nightly end to end tests to run on the
    >> > >>>>>>>> new infrastructure.
    >> > >>>>>>>> - I'm working on getting the build to work on our pool of custom
    >> > >>>>>>>> machines as well
    >> > >>>>>>>> - I'm working on setting up the full matrix of builds (different
    >> > >>>>>>>> scala, hadoop etc. versions) for the nightlies
    >> > >>>>>>>>
    >> > >>>>>>>> *Next Steps:*
    >> > >>>>>>>> - I propose to document the entire build system in the Flink Wiki
    >> > >>>>>>>> - Once Azure can cover the same pull request tests as Travis, I
    >> > >>>>>>>> would set it up to run in parallel (including Flinkbot posting links to
    >> > >>>>>>>> Azure). I hope that this phase lasts for 1-2 weeks only, so that we do
    >> > >>>>>>>> not have to maintain things concurrently. I will monitor the build
    >> > >>>>>>>> stability closely, but would expect some support with debugging
    >> > >>>>>>>> potential issues from the contributors.
    >> > >>>>>>>> - Once there are no problems with the new setup, we remove the Travis
    >> > >>>>>>>> setup.
    >> > >>>>>>>> - Independently, I will work on triggering builds from master /
    >> > >>>>>>>> release-branch pushes, as well as cron builds from the master branch
    >> > >>>>>>>> ... all this will be described in the Wiki.
    >> > >>>>>>>>
    >> > >>>>>>>>
    >> > >>>>>>>> *Timeline:*
    >> > >>>>>>>> - Once I have the feeling that people are supportive of the
    >> > >>>>>>>> idea, I will start documenting in the Wiki. The first pull requests
    >> > >>>>>>>> should show up after a few more days.
    >> > >>>>>>>> I will do a one month parental leave starting some time later in
    >> > >>>>>>>> December, which will probably delay things a bit. I hope to have
    >> > >>>>>>>> everything finished by end of January.
    >> > >>>>>>>>
    >> > >>>>>>>> I'm happy to hear your thoughts on this work.
    >> > >>>>>>>> If nobody objects, I will start documenting the system and prepare
    >> > >>>>>>>> everything for the migration.
    >> > >>>>>>>>
    >> > >>>>>>>> Best,
    >> > >>>>>>>> Robert
    >> > >>>>>>>>
    >> > >>>>>>>>
    >> > >>>>>>>>
    >> > >>>>>>>> [1] https://lists.apache.org/thread.html/b90aa518fcabce94f8e1de4132f46120fae613db6e95a2705f1bd1ea@%3Cdev.flink.apache.org%3E
    >> > >>>>>>>
    >> > >>>>>>>> [2] https://issues.apache.org/jira/browse/INFRA-17030
    >> > >>>>>>>> [3] https://github.com/rmetzger/flink/tree/azure_playground
    >> > >>>>>>>> [4] https://dev.azure.com/rmetzger/Flink/_build?definitionId=4&_a=summary
    >> > >>>>>
    >> > >>>>>
    >> > >>>>>
    >> > >>>>
    >> > >>>>
    >> > >>>
    >> > >>
    >> > >>
    >> > >> --
    >> > >> Best Regards
    >> > >>
    >> > >> Jeff Zhang
    >> > >>
    >> >
    >> >
    >>
    >
    
