+1

Thanks for the effort! The tooling seems to be quite a bit nicer and I like 
that we can grow by adding more machines.

Best,
Aljoscha

> On 5. Dec 2019, at 03:18, Jark Wu <imj...@gmail.com> wrote:
> 
> +1 for Azure pipeline because it promises better performance.
> 
> However, I have 2 concerns:
> 
> 1) Travis provides a free personal service for testing personal branches.
> Contributors usually use this feature to test PoCs or run cron jobs for
> pull requests.
>    Testing on a local machine takes a lot of time. Does AZP provide the same
> free service?
> 2) Currently, we have a webhook [1] deployed that receives Travis CI build
> notifications [2] and sends them to the bui...@flink.apache.org mailing list.
>    We need to figure out how to send Azure build results to the
> mailing list. Azure DevOps service hooks [3] might be the way to go.
> 
> builds@f.a.o mailing list
> 
> Best,
> Jark
> 
> [1]: https://github.com/wuchong/flink-notification-bot
> [2]:
> https://docs.travis-ci.com/user/notifications/#configuring-webhook-notifications
> [3]:
> https://docs.microsoft.com/en-us/azure/devops/service-hooks/overview?view=azure-devops
> 
> 
> 
> On Wed, 4 Dec 2019 at 22:48, Jeff Zhang <zjf...@gmail.com> wrote:
> 
>> +1
>> 
>> Till Rohrmann <trohrm...@apache.org> wrote on Wed, Dec 4, 2019 at 10:43 PM:
>> 
>>> +1 for moving to Azure pipelines as it promises better scalability and
>>> tooling. Looking forward to having faster builds and hence shorter
>>> feedback cycles :-)
>>> 
>>> Cheers,
>>> Till
>>> 
>>> On Wed, Dec 4, 2019 at 1:24 PM Chesnay Schepler <ches...@apache.org>
>>> wrote:
>>> 
>>>> @robert Can you expand on how the Azure setup interacts with CiBot? Do we
>>>> have to continue mirroring builds into flink-ci? How will the cronjob
>>>> configuration work? We should have a general idea of how to implement
>>>> this before proceeding.
>>>> Additionally, moving /all/ jobs into flink-ci requires setting up the
>>>> environment variables we have; can we set these up via files or will we
>>>> have to give all committers permissions for flink-ci/flink?
>>>> 
>>>> On 04/12/2019 12:55, Chesnay Schepler wrote:
>>>>> From what I've seen so far Azure will provide us a better experience,
>>>>> so I'd say +1 for the transition as a whole.
>>>>> 
>>>>> I'd delay the merge at least until the feature branch is cut.
>>>>> Given the parental leave, it may even make sense to start merging
>>>>> only afterwards, in January, to reduce the total time taken for the
>>>>> transition.
>>>>> 
>>>>> Reviews could maybe be made earlier, but I'm wondering whether anyone
>>>>> would even have the time at the moment to do so.
>>>>> 
>>>>> On 04/12/2019 12:35, Kurt Young wrote:
>>>>>> Thanks Robert for driving this. Another big pain point of the current
>>>>>> Travis setup is that its cache mechanism fails from time to time. Almost
>>>>>> 50% of the build failures are caused by cache problems. I reported this
>>>>>> issue to Travis but have received no response yet. So a big +1 from my
>>>>>> side.
>>>>>> 
>>>>>> Just one comment: we are close to the 1.10 feature freeze, and we will
>>>>>> spend some time stabilizing the tests before the release. I would prefer
>>>>>> this replacement to happen after the 1.10 release; otherwise it will be
>>>>>> an unstable factor during release testing.
>>>>>> 
>>>>>> Best,
>>>>>> Kurt
>>>>>> 
>>>>>> 
>>>>>> On Wed, Dec 4, 2019 at 7:16 PM Zhu Zhu <reed...@gmail.com> wrote:
>>>>>> 
>>>>>>> Thanks Robert for the updates! And thanks a lot for all the efforts to
>>>>>>> investigate, experiment with, and tune Azure Pipelines for Flink builds.
>>>>>>> Big +1 for it.
>>>>>>> 
>>>>>>> It would be great if the community builds could be extended with custom
>>>>>>> machines, so that tests are not queued for long as the number of PRs
>>>>>>> grows daily.
>>>>>>> 
>>>>>>> The increased timeout would also be very helpful.
>>>>>>> The 50-minute timeout for free Travis accounts is currently a pain,
>>>>>>> especially when we'd like to run e2e tests in our own Travis. I had to
>>>>>>> manually split the jobs to make them pass.
>>>>>>> 
>>>>>>> Thanks,
>>>>>>> Zhu Zhu
>>>>>>> 
>>>>>>> Robert Metzger <rmetz...@apache.org> wrote on Wed, Dec 4, 2019 at 6:36 PM:
>>>>>>> 
>>>>>>>> Hi all,
>>>>>>>> 
>>>>>>>> as a follow-up to our discussion on reducing the build time [1], I would
>>>>>>>> like to propose migrating our build infrastructure to Azure Pipelines
>>>>>>>> (away from Travis).
>>>>>>>> 
>>>>>>>> I believe that we have reached the limits of what Travis can provide the
>>>>>>>> Flink community, and I don't want the build system to limit or influence
>>>>>>>> the project's growth.
>>>>>>>> 
>>>>>>>> *Benefits:*
>>>>>>>> 1. Free Travis accounts are limited to 5 parallel builds, with a timeout
>>>>>>>> of 50 minutes. Azure offers *10 parallel builds with 300-minute timeouts*
>>>>>>>> for free for open source projects.
>>>>>>>> 2. Azure Pipelines allows us to *add custom build machines* to the pool of
>>>>>>>> 10 free parallel builders.
>>>>>>>> This will allow the Flink community to scale the available build capacity
>>>>>>>> as the project grows. We are dependent on donations from supporting
>>>>>>>> companies, but I believe that it is easier for companies to donate
>>>>>>>> machines than money.
>>>>>>>> Alibaba is willing to provide 10 machines, with 32 cores each, to the
>>>>>>>> Flink project for this purpose.
>>>>>>>> In addition, Xiyuan, who's working on adding ARM support for Flink,
>>>>>>>> provided me with 2 ARM machines (16 cores each).
>>>>>>>> I want to use the custom, more efficient build machines for building
>>>>>>>> Flink's pull requests and master pushes.
>>>>>>>> 3. *Azure Pipelines is a more feature-rich tool*, allowing us, for
>>>>>>>> example, to transfer intermediate build artifacts between pipeline
>>>>>>>> stages. This will allow us to make the build more reliable (we are
>>>>>>>> currently abusing the caching mechanism in Travis for this).
>>>>>>>> It also has some basic analytics on test results, flaky tests, etc.
>>>>>>>> 
>>>>>>>> *Known problems:*
>>>>>>>> - Initially, we might see different build instabilities than before
>>>>>>>> - There's a higher maintenance overhead for the custom build machines
>>>>>>>> (keeping them up to date etc.)
>>>>>>>> - We cannot use the build status integration of AZP, because it requires
>>>>>>>> write access to the repository's source. The foundation does not allow
>>>>>>>> that [2].
>>>>>>>> I propose to extend flinkbot / the flink-ci repository instead.
>>>>>>>> 
>>>>>>>> *Current Status:*
>>>>>>>> - I'm able [3] to execute [4] the current custom build scripts on Azure
>>>>>>>> Pipelines: This means that we will have one compile stage, and N testing
>>>>>>>> jobs in the 2nd stage. Currently, we have N=10 testing jobs.
>>>>>>>> The time from the start of a build until all tests have completed is
>>>>>>>> 1 hour 22 minutes.
>>>>>>>> - I'm working on getting the nightly end-to-end tests to run on the new
>>>>>>>> infrastructure.
>>>>>>>> - I'm working on getting the build to work on our pool of custom machines
>>>>>>>> as well.
>>>>>>>> - I'm working on setting up the full matrix of builds (different Scala,
>>>>>>>> Hadoop etc. versions) for the nightlies.
>>>>>>>> 
>>>>>>>> *Next Steps:*
>>>>>>>> - I propose to document the entire build system in the Flink Wiki.
>>>>>>>> - Once Azure can cover the same pull request tests as Travis, I would set
>>>>>>>> it up to run in parallel (including Flinkbot posting links to Azure). I
>>>>>>>> hope that this phase lasts for 1-2 weeks only, so that we do not have to
>>>>>>>> maintain things concurrently. I will monitor the build stability closely,
>>>>>>>> but would expect some support with debugging potential issues from the
>>>>>>>> contributors.
>>>>>>>> - Once there are no problems with the new setup, we remove the Travis
>>>>>>>> setup.
>>>>>>>> - Independently, I will work on triggering builds from master /
>>>>>>>> release-branch pushes, as well as cron builds from the master branch.
>>>>>>>> All this will be described in the Wiki.
>>>>>>>> 
>>>>>>>> 
>>>>>>>> *Timeline:*
>>>>>>>> - Once I have the feeling that people are supportive of the idea, I will
>>>>>>>> start documenting in the Wiki. The first pull requests should show up
>>>>>>>> after a few more days.
>>>>>>>> I will take one month of parental leave starting some time later in
>>>>>>>> December, which will probably delay things a bit. I hope to have
>>>>>>>> everything finished by the end of January.
>>>>>>>> 
>>>>>>>> I'm happy to hear your thoughts on this work.
>>>>>>>> If nobody objects, I will start documenting the system and prepare
>>>>>>>> everything for the migration.
>>>>>>>> 
>>>>>>>> Best,
>>>>>>>> Robert
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> [1] https://lists.apache.org/thread.html/b90aa518fcabce94f8e1de4132f46120fae613db6e95a2705f1bd1ea@%3Cdev.flink.apache.org%3E
>>>>>>>> [2] https://issues.apache.org/jira/browse/INFRA-17030
>>>>>>>> [3] https://github.com/rmetzger/flink/tree/azure_playground
>>>>>>>> [4] https://dev.azure.com/rmetzger/Flink/_build?definitionId=4&_a=summary
>>>>> 
>>>>> 
>>>>> 
>>>> 
>>>> 
>>> 
>> 
>> 
>> --
>> Best Regards
>> 
>> Jeff Zhang
>> 
