Re: [DISCUSS] Migrate build infrastructure from Travis CI to Azure Pipelines

2020-02-28 Thread Robert Metzger
Quick update on this effort: Since yesterday, I have been experimenting with
running the end to end tests with each pull request and each "master" push.
I hope that this helps to uncover issues earlier (without waiting for the
nightly test execution).
The tests run for almost 3 hours, so the overall build status will remain
"PENDING" for quite a while. You should still get the regular compile /
test results sooner (depending on the time of day).
We might run into capacity issues with the end to end test execution for
each PR. I'll monitor this closely and report back.
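
To put the capacity concern in rough numbers, here is a back-of-the-envelope
sketch. It assumes that the 10 parallel builds mentioned elsewhere in this
thread are all available for end to end runs, and that each run occupies one
build slot for the full ~3 hours. Both figures and the function name are
assumptions for illustration, not measurements of the actual setup.

```python
# Rough upper bound on end to end test runs per day.
# Assumptions (not measurements): 10 parallel build slots, ~3 hours per run,
# one run occupies exactly one slot for its whole duration.

def max_e2e_runs_per_day(parallel_jobs: int, hours_per_run: float) -> int:
    """Return how many complete runs fit into 24 hours across all slots."""
    runs_per_slot = int(24 // hours_per_run)  # whole runs per slot per day
    return parallel_jobs * runs_per_slot


if __name__ == "__main__":
    print(max_e2e_runs_per_day(parallel_jobs=10, hours_per_run=3.0))
```

If the number of PR and "master" pushes per day gets close to this bound,
builds would start to queue.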

In general, please let me know if you have any problems with the new
CI setup.
For test failures, I'm happy to fix any issues caused by the build system;
just file a ticket for the new "Build System / Azure Pipelines" component.


On Mon, Feb 17, 2020 at 12:23 PM Robert Metzger  wrote:

> @Leonard: On Azure, I'm not splitting the execution of the end to end
> tests anymore. We won't have the overhead of compiling the same profile
> multiple times anymore.
>
>
> @all: We have recently merged a first version of the Azure configuration
> files to Flink [1]. This will allow us to build pull requests with all the
> additional checks we had in place for Travis as well.
> In the next few days, I'm going to build pushes and the nightly crons on
> Azure as well.
>
> From now on, you can set up Azure Pipelines for your own Flink fork as
> well, and execute end to end tests there quite easily [2].
> I'll be closely monitoring the new setup in the coming days. Expect some
> smaller issues while not all pull requests have my changes (at some point,
> I will change a configuration in Azure, which will break builds that do not
> have my changes)
> Once Azure is stable, and we have the same features as the Travis build,
> we'll stop processing builds on Travis.
>
>
> [1] https://github.com/apache/flink/pull/10976
> [2]
> https://cwiki.apache.org/confluence/display/FLINK/%5Bpreview%5D+Azure+Pipelines#id-[preview]AzurePipelines-Runningendtoendtests:
>
> On Mon, Dec 9, 2019 at 2:16 PM Leonard Xu  wrote:
>
>> +1 for the migration.
>> *10 parallel builds with 300 minute timeouts * is very useful for tasks
>> that takes long time like e2e tests.
>> And in Travis, looks like we compile entire project for every cron task
>> even if they use same profile, eg:
>>  `name: e2e - misc - hadoop 2.8
>>   name: e2e - ha - hadoop 2.8
>>   name: e2e - sticky - hadoop 2.8
>>   name: e2e - checkpoints - hadoop 2.8
>>   name: e2e - container - hadoop 2.8
>>   name: e2e - heavy - hadoop 2.8
>>   name: e2e - tpcds - hadoop 2.8`
>> We will compile entire project with profile `hadoop 2.8`  7 times, and
>> every task will take about 25  minutes.
>> @robert @chesnay Should we consider to compile once for multi cron task
>> which have same profile in the new Azure Pipelines?
>>
>> Best,
>> Leonard Xu
>>
>> > On Dec 9, 2019, at 11:57, Congxian Qiu  wrote:
>> >
>> > +1 for migrating to Azure pipelines as this can have shorter build time,
>> > and faster response.
>> >
>> > Best,
>> > Congxian
>> >
>> >
>> > On Mon, Dec 9, 2019 at 10:13 AM Xiyuan Wang wrote:
>> >
>> >> Hi Robert,
>> >>  Thanks for bring up this topic. The 2 ARM machines(16cores) which I
>> >> donated is just for POC test. We(Huawei) can donate more once moving to
>> >> official Azure pipeline. :)
>> >>
>> >> On Fri, Dec 6, 2019 at 3:25 AM Robert Metzger wrote:
>> >>
>> >>> Thanks for your comments Yun.
>> >>> If there's strong support for idea 2, it would actually make my
>> >>> life easier: the migration would be easier to do.
>> >>>
>> >>> I also noticed that the uploads to transfer.sh were broken, but this
>> >> should
>> >>> be fixed in the "rmetzger.flink" builds (coming from rmetzger/flink).
>> The
>> >>> builds in "flink-ci.flink" (coming from flink-ci/flink) might have
>> >> troubles
>> >>> with transfer.sh.
>> >>>
>> >>>
>> >>> On Thu, Dec 5, 2019 at 5:50 PM Yun Tang  wrote:
>> >>>
>> >>>> Hi Robert
>> >>>>
>> >>>> Really exciting to see this new more powerful CI tool to get rid of the 50
>> >>>> minutes limit of traivs-CI free account.
>> >>>>
>> >>>> After reading the wiki, I support idea 2 of AZP-setup version-2.
>> >>>>
>> >>>> However, after I dig into some failing builds at
>> >>>> https://dev.azure.com/rmetzger/Flink/_build , I found we cannot view the
>> >>>> logs of some IT cases which would be uploaded by traivs_watchdog to
>> >>>> transfer.sh previously.
>> >>>> I think this feature is also easy to implement in AZP, right?
>> >>>>
>> >>>> Best
>> >>>> Yun Tang
>> >>>>
>> >>>> On 12/6/19, 12:19 AM, "Robert Metzger"  wrote:
>> >>>>
>> >>>> I've created a first draft of my plans in the wiki:
>> >>>>
>> >>>> https://cwiki.apache.org/confluence/display/FLINK/%5Bpreview%5D+Azure+Pipelines .
>> >>>> I'm looking forward to your comments.

Re: [DISCUSS] Migrate build infrastructure from Travis CI to Azure Pipelines

2020-02-17 Thread Robert Metzger
@Leonard: On Azure, I'm not splitting the execution of the end to end tests
anymore. We won't have the overhead of compiling the same profile multiple
times anymore.


@all: We have recently merged a first version of the Azure configuration
files to Flink [1]. This will allow us to build pull requests with all the
additional checks we had in place for Travis as well.
In the next few days, I'm going to build pushes and the nightly crons on
Azure as well.

From now on, you can set up Azure Pipelines for your own Flink fork as
well, and execute end to end tests there quite easily [2].
I'll be closely monitoring the new setup in the coming days. Expect some
smaller issues while not all pull requests have my changes (at some point,
I will change a configuration in Azure, which will break builds that do not
have my changes)
Once Azure is stable, and we have the same features as the Travis build,
we'll stop processing builds on Travis.


[1] https://github.com/apache/flink/pull/10976
[2]
https://cwiki.apache.org/confluence/display/FLINK/%5Bpreview%5D+Azure+Pipelines#id-[preview]AzurePipelines-Runningendtoendtests:

On Mon, Dec 9, 2019 at 2:16 PM Leonard Xu  wrote:

> +1 for the migration.
> *10 parallel builds with 300 minute timeouts * is very useful for tasks
> that takes long time like e2e tests.
> And in Travis, looks like we compile entire project for every cron task
> even if they use same profile, eg:
>  `name: e2e - misc - hadoop 2.8
>   name: e2e - ha - hadoop 2.8
>   name: e2e - sticky - hadoop 2.8
>   name: e2e - checkpoints - hadoop 2.8
>   name: e2e - container - hadoop 2.8
>   name: e2e - heavy - hadoop 2.8
>   name: e2e - tpcds - hadoop 2.8`
> We will compile entire project with profile `hadoop 2.8`  7 times, and
> every task will take about 25  minutes.
> @robert @chesnay Should we consider to compile once for multi cron task
> which have same profile in the new Azure Pipelines?
>
> Best,
> Leonard Xu
>
> > On Dec 9, 2019, at 11:57, Congxian Qiu  wrote:
> >
> > +1 for migrating to Azure pipelines as this can have shorter build time,
> > and faster response.
> >
> > Best,
> > Congxian
> >
> >
> > On Mon, Dec 9, 2019 at 10:13 AM Xiyuan Wang wrote:
> >
> >> Hi Robert,
> >>  Thanks for bring up this topic. The 2 ARM machines(16cores) which I
> >> donated is just for POC test. We(Huawei) can donate more once moving to
> >> official Azure pipeline. :)
> >>
> >> On Fri, Dec 6, 2019 at 3:25 AM Robert Metzger wrote:
> >>
> >>> Thanks for your comments Yun.
> >>> If there's strong support for idea 2, it would actually make my
> >>> life easier: the migration would be easier to do.
> >>>
> >>> I also noticed that the uploads to transfer.sh were broken, but this
> >> should
> >>> be fixed in the "rmetzger.flink" builds (coming from rmetzger/flink).
> The
> >>> builds in "flink-ci.flink" (coming from flink-ci/flink) might have
> >> troubles
> >>> with transfer.sh.
> >>>
> >>>
> >>> On Thu, Dec 5, 2019 at 5:50 PM Yun Tang  wrote:
> >>>
> >>>> Hi Robert
> >>>>
> >>>> Really exciting to see this new more powerful CI tool to get rid of the 50
> >>>> minutes limit of traivs-CI free account.
> >>>>
> >>>> After reading the wiki, I support idea 2 of AZP-setup version-2.
> >>>>
> >>>> However, after I dig into some failing builds at
> >>>> https://dev.azure.com/rmetzger/Flink/_build , I found we cannot view the
> >>>> logs of some IT cases which would be uploaded by traivs_watchdog to
> >>>> transfer.sh previously.
> >>>> I think this feature is also easy to implement in AZP, right?
> >>>>
> >>>> Best
> >>>> Yun Tang
> >>>>
> >>>> On 12/6/19, 12:19 AM, "Robert Metzger"  wrote:
> >>>>
> >>>> I've created a first draft of my plans in the wiki:
> >>>>
> >>>> https://cwiki.apache.org/confluence/display/FLINK/%5Bpreview%5D+Azure+Pipelines .
> >>>> I'm looking forward to your comments.
> 
> On Thu, Dec 5, 2019 at 12:37 PM Robert Metzger <
> >> rmetz...@apache.org>
>  wrote:
> 
> > Thank you all for the positive feedback. I will start putting
>  together a
> > page in the wiki.
> >
> > @Jark: Azure Pipelines provides a free services, that is even
> >>> better
>  than
> > what Travis provides for free: 10 parallel builds with 6 hours
>  timeouts.
> >
> > @Chesnay: I will answer your questions in the yet-to-be-written
> > documentation in the wiki.
> >
> >
> > On Thu, Dec 5, 2019 at 11:58 AM Arvid Heise  >>>
>  wrote:
> >
> >> +1 I had good experiences with Azure pipelines in the past.
> >>
> >> On Thu, Dec 5, 2019 at 11:35 AM Aljoscha Krettek <
>  aljos...@apache.org>
> >> wrote:
> >>
> >>> +1
> >>>
> >>> Thanks for the effort! The tooling seems to be quite a bit
> >> nicer
>  and I
> >>> like that we can grow by adding more machines.
> >>>
> >>> Best,
> >>> Aljoscha
> >>>
>  On 5. Dec 2019, at 03:18, Jark Wu  wrote:
> 
> 

Re: [DISCUSS] Migrate build infrastructure from Travis CI to Azure Pipelines

2019-12-09 Thread Leonard Xu
+1 for the migration.
*10 parallel builds with 300 minute timeouts* is very useful for tasks that
take a long time, like the e2e tests.
Also, in Travis it looks like we compile the entire project for every cron task,
even if the tasks use the same profile, e.g.:
 `name: e2e - misc - hadoop 2.8
  name: e2e - ha - hadoop 2.8
  name: e2e - sticky - hadoop 2.8
  name: e2e - checkpoints - hadoop 2.8
  name: e2e - container - hadoop 2.8
  name: e2e - heavy - hadoop 2.8
  name: e2e - tpcds - hadoop 2.8`
We compile the entire project with the `hadoop 2.8` profile 7 times, and every
task takes about 25 minutes.
@robert @chesnay Should we consider compiling once for multiple cron tasks that
share the same profile in the new Azure Pipelines setup?
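
The overhead described above can be sketched as simple arithmetic. The sketch
reads the "about 25 minutes" as the compile time per task, which is an
assumption. The job list is taken from the names above, and the function name
is made up for illustration.

```python
# Compile overhead for cron jobs that share one build profile.
# Assumption: ~25 minutes is the compile time per task, as read from the mail.
from typing import Sequence

E2E_JOBS = ["misc", "ha", "sticky", "checkpoints", "container", "heavy", "tpcds"]
COMPILE_MINUTES = 25


def total_compile_minutes(jobs: Sequence[str], compile_once: bool) -> int:
    """Total minutes spent compiling for all jobs of one profile."""
    if not jobs:
        return 0
    if compile_once:
        return COMPILE_MINUTES  # one shared compile step for all jobs
    return COMPILE_MINUTES * len(jobs)  # every job compiles on its own


if __name__ == "__main__":
    per_job = total_compile_minutes(E2E_JOBS, compile_once=False)
    shared = total_compile_minutes(E2E_JOBS, compile_once=True)
    print(per_job, shared, per_job - shared)
```

Under these assumptions, compiling once per profile would save roughly two and
a half hours of build time per nightly run of the `hadoop 2.8` jobs.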

Best,
Leonard Xu

> On Dec 9, 2019, at 11:57, Congxian Qiu  wrote:
> 
> +1 for migrating to Azure pipelines as this can have shorter build time,
> and faster response.
> 
> Best,
> Congxian
> 
> 
> On Mon, Dec 9, 2019 at 10:13 AM Xiyuan Wang wrote:
> 
>> Hi Robert,
>>  Thanks for bring up this topic. The 2 ARM machines(16cores) which I
>> donated is just for POC test. We(Huawei) can donate more once moving to
>> official Azure pipeline. :)
>> 
>> On Fri, Dec 6, 2019 at 3:25 AM Robert Metzger wrote:
>> 
>>> Thanks for your comments Yun.
>>> If there's strong support for idea 2, it would actually make my
>>> life easier: the migration would be easier to do.
>>> 
>>> I also noticed that the uploads to transfer.sh were broken, but this
>> should
>>> be fixed in the "rmetzger.flink" builds (coming from rmetzger/flink). The
>>> builds in "flink-ci.flink" (coming from flink-ci/flink) might have
>> troubles
>>> with transfer.sh.
>>> 
>>> 
>>> On Thu, Dec 5, 2019 at 5:50 PM Yun Tang  wrote:
>>> 
>>>> Hi Robert
>>>>
>>>> Really exciting to see this new more powerful CI tool to get rid of the 50
>>>> minutes limit of traivs-CI free account.
>>>>
>>>> After reading the wiki, I support idea 2 of AZP-setup version-2.
>>>>
>>>> However, after I dig into some failing builds at
>>>> https://dev.azure.com/rmetzger/Flink/_build , I found we cannot view the
>>>> logs of some IT cases which would be uploaded by traivs_watchdog to
>>>> transfer.sh previously.
>>>> I think this feature is also easy to implement in AZP, right?
>>>>
>>>> Best
>>>> Yun Tang
>>>>
>>>> On 12/6/19, 12:19 AM, "Robert Metzger"  wrote:
>>>>
>>>> I've created a first draft of my plans in the wiki:
>>>>
>>>> https://cwiki.apache.org/confluence/display/FLINK/%5Bpreview%5D+Azure+Pipelines .
>>>> I'm looking forward to your comments.
>>>>
On Thu, Dec 5, 2019 at 12:37 PM Robert Metzger <
>> rmetz...@apache.org>
 wrote:
 
> Thank you all for the positive feedback. I will start putting
 together a
> page in the wiki.
> 
> @Jark: Azure Pipelines provides a free services, that is even
>>> better
 than
> what Travis provides for free: 10 parallel builds with 6 hours
 timeouts.
> 
> @Chesnay: I will answer your questions in the yet-to-be-written
> documentation in the wiki.
> 
> 
> On Thu, Dec 5, 2019 at 11:58 AM Arvid Heise >> 
 wrote:
> 
>> +1 I had good experiences with Azure pipelines in the past.
>> 
>> On Thu, Dec 5, 2019 at 11:35 AM Aljoscha Krettek <
 aljos...@apache.org>
>> wrote:
>> 
>>> +1
>>> 
>>> Thanks for the effort! The tooling seems to be quite a bit
>> nicer
 and I
>>> like that we can grow by adding more machines.
>>> 
>>> Best,
>>> Aljoscha
>>> 
 On 5. Dec 2019, at 03:18, Jark Wu  wrote:
 
 +1 for Azure pipeline because it promises better
>> performance.
 
 However, I have 2 concerns:
 
 1) Travis provides personal free service for testing
>> personal
>> branches.
 Usually, contributors use this feature to test PoC or run
>> CRON
 jobs
>> for
 pull requests.
   Using local machine will cost a lot of time. Does AZP
 provides the
>>> same
 free service?
 2) Currently, we deployed a webhook [1] to receive Travis CI
 build
 notifications [2] and send to bui...@flink.apache.org
>> mailing
 list.
   We need to figure out a way how to send Azure build
>> results
 to the
 mailing list. And this [3] might be the way to go.
 
 builds@f.a.o mailing list
 
 Best,
 Jark
 
 [1]: https://github.com/wuchong/flink-notification-bot
 [2]:
 
>>> 
>> 
 
>>> 
>> https://docs.travis-ci.com/user/notifications/#configuring-webhook-notifications
 [3]:
 
>>> 
>> 
 
>>> 
>> https://docs.microsoft.com/en-us/azure/devops/service-hooks/overview?view=azure-devops
 
 
 
 On Wed, 4 Dec 2019 at 22:48, Jeff Zhang 
 wrote:
 
> +1
> 
> On Wed, Dec 4, 2019, Till Rohrmann
>>> 

Re: [DISCUSS] Migrate build infrastructure from Travis CI to Azure Pipelines

2019-12-08 Thread Congxian Qiu
+1 for migrating to Azure Pipelines, as this can give us shorter build times
and faster responses.

Best,
Congxian


On Mon, Dec 9, 2019 at 10:13 AM Xiyuan Wang wrote:

> Hi Robert,
>   Thanks for bring up this topic. The 2 ARM machines(16cores) which I
> donated is just for POC test. We(Huawei) can donate more once moving to
> official Azure pipeline. :)
>
> On Fri, Dec 6, 2019 at 3:25 AM Robert Metzger wrote:
>
> > Thanks for your comments Yun.
> > If there's strong support for idea 2, it would actually make my
> > life easier: the migration would be easier to do.
> >
> > I also noticed that the uploads to transfer.sh were broken, but this
> should
> > be fixed in the "rmetzger.flink" builds (coming from rmetzger/flink). The
> > builds in "flink-ci.flink" (coming from flink-ci/flink) might have
> troubles
> > with transfer.sh.
> >
> >
> > On Thu, Dec 5, 2019 at 5:50 PM Yun Tang  wrote:
> >
> > > Hi Robert
> > >
> > > Really exciting to see this new more powerful CI tool to get rid of the
> > 50
> > > minutes limit of traivs-CI free account.
> > >
> > > After reading the wiki, I support idea 2 of AZP-setup version-2.
> > >
> > > However, after I dig into some failing builds at
> > > https://dev.azure.com/rmetzger/Flink/_build , I found we cannot view
> the
> > > logs of some IT cases which would be uploaded by traivs_watchdog to
> > > transfer.sh previously.
> > > I think this feature is also easy to implement in AZP, right?
> > >
> > > Best
> > > Yun Tang
> > >
> > > On 12/6/19, 12:19 AM, "Robert Metzger"  wrote:
> > >
> > > I've created a first draft of my plans in the wiki:
> > >
> > >
> >
> https://cwiki.apache.org/confluence/display/FLINK/%5Bpreview%5D+Azure+Pipelines
> > > .
> > > I'm looking forward to your comments.
> > >
> > > On Thu, Dec 5, 2019 at 12:37 PM Robert Metzger <
> rmetz...@apache.org>
> > > wrote:
> > >
> > > > Thank you all for the positive feedback. I will start putting
> > > together a
> > > > page in the wiki.
> > > >
> > > > @Jark: Azure Pipelines provides a free services, that is even
> > better
> > > than
> > > > what Travis provides for free: 10 parallel builds with 6 hours
> > > timeouts.
> > > >
> > > > @Chesnay: I will answer your questions in the yet-to-be-written
> > > > documentation in the wiki.
> > > >
> > > >
> > > > On Thu, Dec 5, 2019 at 11:58 AM Arvid Heise  >
> > > wrote:
> > > >
> > > >> +1 I had good experiences with Azure pipelines in the past.
> > > >>
> > > >> On Thu, Dec 5, 2019 at 11:35 AM Aljoscha Krettek <
> > > aljos...@apache.org>
> > > >> wrote:
> > > >>
> > > >> > +1
> > > >> >
> > > >> > Thanks for the effort! The tooling seems to be quite a bit
> nicer
> > > and I
> > > >> > like that we can grow by adding more machines.
> > > >> >
> > > >> > Best,
> > > >> > Aljoscha
> > > >> >
> > > >> > > On 5. Dec 2019, at 03:18, Jark Wu  wrote:
> > > >> > >
> > > >> > > +1 for Azure pipeline because it promises better
> performance.
> > > >> > >
> > > >> > > However, I have 2 concerns:
> > > >> > >
> > > >> > > 1) Travis provides personal free service for testing
> personal
> > > >> branches.
> > > >> > > Usually, contributors use this feature to test PoC or run
> CRON
> > > jobs
> > > >> for
> > > >> > > pull requests.
> > > >> > >Using local machine will cost a lot of time. Does AZP
> > > provides the
> > > >> > same
> > > >> > > free service?
> > > >> > > 2) Currently, we deployed a webhook [1] to receive Travis CI
> > > build
> > > >> > > notifications [2] and send to bui...@flink.apache.org
> mailing
> > > list.
> > > >> > >We need to figure out a way how to send Azure build
> results
> > > to the
> > > >> > > mailing list. And this [3] might be the way to go.
> > > >> > >
> > > >> > > builds@f.a.o mailing list
> > > >> > >
> > > >> > > Best,
> > > >> > > Jark
> > > >> > >
> > > >> > > [1]: https://github.com/wuchong/flink-notification-bot
> > > >> > > [2]:
> > > >> > >
> > > >> >
> > > >>
> > >
> >
> https://docs.travis-ci.com/user/notifications/#configuring-webhook-notifications
> > > >> > > [3]:
> > > >> > >
> > > >> >
> > > >>
> > >
> >
> https://docs.microsoft.com/en-us/azure/devops/service-hooks/overview?view=azure-devops
> > > >> > >
> > > >> > >
> > > >> > >
> > > >> > > On Wed, 4 Dec 2019 at 22:48, Jeff Zhang 
> > > wrote:
> > > >> > >
> > > >> > >> +1
> > > >> > >>
> > > >> > >> On Wed, Dec 4, 2019 at 10:43 PM Till Rohrmann wrote:
> > > >> > >>
> > > >> > >>> +1 for moving to Azure pipelines as it promises better
> > > scalability
> > > >> and
> > > >> > >>> tooling. Looking forward to having faster builds and hence
> > > shorter
> > > >> > >> feedback
> > > >> > >>> cycles :-)
> > > >> > >>>
> > > >> > >>> Cheers,
> > > >> > >>> Till
> > > >> > >>>
> > > >> > 

Re: [DISCUSS] Migrate build infrastructure from Travis CI to Azure Pipelines

2019-12-08 Thread Xiyuan Wang
Hi Robert,
  Thanks for bringing up this topic. The 2 ARM machines (16 cores) which I
donated are just for a POC test. We (Huawei) can donate more once we move to
the official Azure pipeline. :)

On Fri, Dec 6, 2019 at 3:25 AM Robert Metzger wrote:

> Thanks for your comments Yun.
> If there's strong support for idea 2, it would actually make my
> life easier: the migration would be easier to do.
>
> I also noticed that the uploads to transfer.sh were broken, but this should
> be fixed in the "rmetzger.flink" builds (coming from rmetzger/flink). The
> builds in "flink-ci.flink" (coming from flink-ci/flink) might have troubles
> with transfer.sh.
>
>
> On Thu, Dec 5, 2019 at 5:50 PM Yun Tang  wrote:
>
> > Hi Robert
> >
> > Really exciting to see this new more powerful CI tool to get rid of the
> 50
> > minutes limit of traivs-CI free account.
> >
> > After reading the wiki, I support idea 2 of AZP-setup version-2.
> >
> > However, after I dig into some failing builds at
> > https://dev.azure.com/rmetzger/Flink/_build , I found we cannot view the
> > logs of some IT cases which would be uploaded by traivs_watchdog to
> > transfer.sh previously.
> > I think this feature is also easy to implement in AZP, right?
> >
> > Best
> > Yun Tang
> >
> > On 12/6/19, 12:19 AM, "Robert Metzger"  wrote:
> >
> > I've created a first draft of my plans in the wiki:
> >
> >
> https://cwiki.apache.org/confluence/display/FLINK/%5Bpreview%5D+Azure+Pipelines
> > .
> > I'm looking forward to your comments.
> >
> > On Thu, Dec 5, 2019 at 12:37 PM Robert Metzger 
> > wrote:
> >
> > > Thank you all for the positive feedback. I will start putting
> > together a
> > > page in the wiki.
> > >
> > > @Jark: Azure Pipelines provides a free services, that is even
> better
> > than
> > > what Travis provides for free: 10 parallel builds with 6 hours
> > timeouts.
> > >
> > > @Chesnay: I will answer your questions in the yet-to-be-written
> > > documentation in the wiki.
> > >
> > >
> > > On Thu, Dec 5, 2019 at 11:58 AM Arvid Heise 
> > wrote:
> > >
> > >> +1 I had good experiences with Azure pipelines in the past.
> > >>
> > >> On Thu, Dec 5, 2019 at 11:35 AM Aljoscha Krettek <
> > aljos...@apache.org>
> > >> wrote:
> > >>
> > >> > +1
> > >> >
> > >> > Thanks for the effort! The tooling seems to be quite a bit nicer
> > and I
> > >> > like that we can grow by adding more machines.
> > >> >
> > >> > Best,
> > >> > Aljoscha
> > >> >
> > >> > > On 5. Dec 2019, at 03:18, Jark Wu  wrote:
> > >> > >
> > >> > > +1 for Azure pipeline because it promises better performance.
> > >> > >
> > >> > > However, I have 2 concerns:
> > >> > >
> > >> > > 1) Travis provides personal free service for testing personal
> > >> branches.
> > >> > > Usually, contributors use this feature to test PoC or run CRON
> > jobs
> > >> for
> > >> > > pull requests.
> > >> > >Using local machine will cost a lot of time. Does AZP
> > provides the
> > >> > same
> > >> > > free service?
> > >> > > 2) Currently, we deployed a webhook [1] to receive Travis CI
> > build
> > >> > > notifications [2] and send to bui...@flink.apache.org mailing
> > list.
> > >> > >We need to figure out a way how to send Azure build results
> > to the
> > >> > > mailing list. And this [3] might be the way to go.
> > >> > >
> > >> > > builds@f.a.o mailing list
> > >> > >
> > >> > > Best,
> > >> > > Jark
> > >> > >
> > >> > > [1]: https://github.com/wuchong/flink-notification-bot
> > >> > > [2]:
> > >> > >
> > >> >
> > >>
> >
> https://docs.travis-ci.com/user/notifications/#configuring-webhook-notifications
> > >> > > [3]:
> > >> > >
> > >> >
> > >>
> >
> https://docs.microsoft.com/en-us/azure/devops/service-hooks/overview?view=azure-devops
> > >> > >
> > >> > >
> > >> > >
> > >> > > On Wed, 4 Dec 2019 at 22:48, Jeff Zhang 
> > wrote:
> > >> > >
> > >> > >> +1
> > >> > >>
> > >> > >> On Wed, Dec 4, 2019 at 10:43 PM Till Rohrmann wrote:
> > >> > >>
> > >> > >>> +1 for moving to Azure pipelines as it promises better
> > scalability
> > >> and
> > >> > >>> tooling. Looking forward to having faster builds and hence
> > shorter
> > >> > >> feedback
> > >> > >>> cycles :-)
> > >> > >>>
> > >> > >>> Cheers,
> > >> > >>> Till
> > >> > >>>
> > >> > >>> On Wed, Dec 4, 2019 at 1:24 PM Chesnay Schepler <
> > ches...@apache.org
> > >> >
> > >> > >>> wrote:
> > >> > >>>
> > >> >  @robert Can you expand how the azure setup interacts with
> > CiBot?
> > >> Do we
> > >> >  have to continue mirroring builds into flink-ci? How will
> the
> > >> cronjob
> > >> >  configuration work? We should have a general idea on how to
> > >> implement
> > >> >  this before proceeding.
> > >> > 

Re: [DISCUSS] Migrate build infrastructure from Travis CI to Azure Pipelines

2019-12-05 Thread Robert Metzger
Thanks for your comments Yun.
If there's strong support for idea 2, it would actually make my
life easier: the migration would be easier to do.

I also noticed that the uploads to transfer.sh were broken, but this should
be fixed in the "rmetzger.flink" builds (coming from rmetzger/flink). The
builds in "flink-ci.flink" (coming from flink-ci/flink) might have trouble
with transfer.sh.


On Thu, Dec 5, 2019 at 5:50 PM Yun Tang  wrote:

> Hi Robert
>
> Really exciting to see this new more powerful CI tool to get rid of the 50
> minutes limit of traivs-CI free account.
>
> After reading the wiki, I support idea 2 of AZP-setup version-2.
>
> However, after I dig into some failing builds at
> https://dev.azure.com/rmetzger/Flink/_build , I found we cannot view the
> logs of some IT cases which would be uploaded by traivs_watchdog to
> transfer.sh previously.
> I think this feature is also easy to implement in AZP, right?
>
> Best
> Yun Tang
>
> On 12/6/19, 12:19 AM, "Robert Metzger"  wrote:
>
> I've created a first draft of my plans in the wiki:
>
> https://cwiki.apache.org/confluence/display/FLINK/%5Bpreview%5D+Azure+Pipelines
> .
> I'm looking forward to your comments.
>
> On Thu, Dec 5, 2019 at 12:37 PM Robert Metzger 
> wrote:
>
> > Thank you all for the positive feedback. I will start putting
> together a
> > page in the wiki.
> >
> > @Jark: Azure Pipelines provides a free services, that is even better
> than
> > what Travis provides for free: 10 parallel builds with 6 hours
> timeouts.
> >
> > @Chesnay: I will answer your questions in the yet-to-be-written
> > documentation in the wiki.
> >
> >
> > On Thu, Dec 5, 2019 at 11:58 AM Arvid Heise 
> wrote:
> >
> >> +1 I had good experiences with Azure pipelines in the past.
> >>
> >> On Thu, Dec 5, 2019 at 11:35 AM Aljoscha Krettek <
> aljos...@apache.org>
> >> wrote:
> >>
> >> > +1
> >> >
> >> > Thanks for the effort! The tooling seems to be quite a bit nicer
> and I
> >> > like that we can grow by adding more machines.
> >> >
> >> > Best,
> >> > Aljoscha
> >> >
> >> > > On 5. Dec 2019, at 03:18, Jark Wu  wrote:
> >> > >
> >> > > +1 for Azure pipeline because it promises better performance.
> >> > >
> >> > > However, I have 2 concerns:
> >> > >
> >> > > 1) Travis provides personal free service for testing personal
> >> branches.
> >> > > Usually, contributors use this feature to test PoC or run CRON
> jobs
> >> for
> >> > > pull requests.
> >> > >Using local machine will cost a lot of time. Does AZP
> provides the
> >> > same
> >> > > free service?
> >> > > 2) Currently, we deployed a webhook [1] to receive Travis CI
> build
> >> > > notifications [2] and send to bui...@flink.apache.org mailing
> list.
> >> > >We need to figure out a way how to send Azure build results
> to the
> >> > > mailing list. And this [3] might be the way to go.
> >> > >
> >> > > builds@f.a.o mailing list
> >> > >
> >> > > Best,
> >> > > Jark
> >> > >
> >> > > [1]: https://github.com/wuchong/flink-notification-bot
> >> > > [2]:
> >> > >
> >> >
> >>
> https://docs.travis-ci.com/user/notifications/#configuring-webhook-notifications
> >> > > [3]:
> >> > >
> >> >
> >>
> https://docs.microsoft.com/en-us/azure/devops/service-hooks/overview?view=azure-devops
> >> > >
> >> > >
> >> > >
> >> > > On Wed, 4 Dec 2019 at 22:48, Jeff Zhang 
> wrote:
> >> > >
> >> > >> +1
> >> > >>
> >> > >> On Wed, Dec 4, 2019 at 10:43 PM Till Rohrmann wrote:
> >> > >>
> >> > >>> +1 for moving to Azure pipelines as it promises better
> scalability
> >> and
> >> > >>> tooling. Looking forward to having faster builds and hence
> shorter
> >> > >> feedback
> >> > >>> cycles :-)
> >> > >>>
> >> > >>> Cheers,
> >> > >>> Till
> >> > >>>
> >> > >>> On Wed, Dec 4, 2019 at 1:24 PM Chesnay Schepler <
> ches...@apache.org
> >> >
> >> > >>> wrote:
> >> > >>>
> >> >  @robert Can you expand how the azure setup interacts with
> CiBot?
> >> Do we
> >> >  have to continue mirroring builds into flink-ci? How will the
> >> cronjob
> >> >  configuration work? We should have a general idea on how to
> >> implement
> >> >  this before proceeding.
> >> >  Additionally, moving /all /jobs into flink-ci requires
> setting up
> >> the
> >> >  environment variables we have; can we set these up via files
> or
> >> will
> >> > we
> >> >  have to give all committers permissions for flink-ci/flink?
> >> > 
> >> >  On 04/12/2019 12:55, Chesnay Schepler wrote:
> >> > > From what I've seen so far Azure will provide us a better
> >> experience,
> >> > > so I'd say +1 for the transition as a whole.
> >> > 

Re: [DISCUSS] Migrate build infrastructure from Travis CI to Azure Pipelines

2019-12-05 Thread Yun Tang
Hi Robert

Really exciting to see this new, more powerful CI tool, which gets rid of the
50 minute limit of the Travis CI free account.

After reading the wiki, I support idea 2 of AZP-setup version-2.

However, after digging into some failing builds at
https://dev.azure.com/rmetzger/Flink/_build , I found that we cannot view the
logs of some IT cases, which were previously uploaded to transfer.sh by
travis_watchdog.
I think this feature would also be easy to implement in AZP, right?
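
As a rough illustration of the upload step, here is a minimal sketch that
prepares an HTTP PUT request for a log archive. It is not how travis_watchdog
does it. The helper name and the file name are hypothetical, and the sketch
assumes that transfer.sh accepts PUT uploads at https://transfer.sh/<name>.
The request is only built, not sent.

```python
# Sketch: prepare an HTTP PUT upload of a log archive to transfer.sh.
# Assumptions: helper name and file name are hypothetical; transfer.sh is
# assumed to accept PUT uploads at https://transfer.sh/<name>.
import urllib.request


def build_upload_request(name: str, payload: bytes) -> urllib.request.Request:
    """Prepare (but do not send) a PUT request carrying the payload."""
    return urllib.request.Request(
        url=f"https://transfer.sh/{name}",
        data=payload,
        method="PUT",
    )


if __name__ == "__main__":
    req = build_upload_request("it-case-logs.tgz", b"example log bytes")
    print(req.get_method(), req.full_url)
    # Sending it would be: urllib.request.urlopen(req).read().decode()
```

In a CI job, such a step would have to run even when the tests fail, since
that is when the logs are needed.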

Best
Yun Tang

On 12/6/19, 12:19 AM, "Robert Metzger"  wrote:

I've created a first draft of my plans in the wiki:

https://cwiki.apache.org/confluence/display/FLINK/%5Bpreview%5D+Azure+Pipelines.
I'm looking forward to your comments.

On Thu, Dec 5, 2019 at 12:37 PM Robert Metzger  wrote:

> Thank you all for the positive feedback. I will start putting together a
> page in the wiki.
>
> @Jark: Azure Pipelines provides a free services, that is even better than
> what Travis provides for free: 10 parallel builds with 6 hours timeouts.
>
> @Chesnay: I will answer your questions in the yet-to-be-written
> documentation in the wiki.
>
>
> On Thu, Dec 5, 2019 at 11:58 AM Arvid Heise  wrote:
>
>> +1 I had good experiences with Azure pipelines in the past.
>>
>> On Thu, Dec 5, 2019 at 11:35 AM Aljoscha Krettek 
>> wrote:
>>
>> > +1
>> >
>> > Thanks for the effort! The tooling seems to be quite a bit nicer and I
>> > like that we can grow by adding more machines.
>> >
>> > Best,
>> > Aljoscha
>> >
>> > > On 5. Dec 2019, at 03:18, Jark Wu  wrote:
>> > >
>> > > +1 for Azure pipeline because it promises better performance.
>> > >
>> > > However, I have 2 concerns:
>> > >
>> > > 1) Travis provides personal free service for testing personal
>> branches.
>> > > Usually, contributors use this feature to test PoC or run CRON jobs
>> for
>> > > pull requests.
>> > >Using local machine will cost a lot of time. Does AZP provides the
>> > same
>> > > free service?
>> > > 2) Currently, we deployed a webhook [1] to receive Travis CI build
>> > > notifications [2] and send to bui...@flink.apache.org mailing list.
>> > >We need to figure out a way how to send Azure build results to the
>> > > mailing list. And this [3] might be the way to go.
>> > >
>> > > builds@f.a.o mailing list
>> > >
>> > > Best,
>> > > Jark
>> > >
>> > > [1]: https://github.com/wuchong/flink-notification-bot
>> > > [2]:
>> > >
>> >
>> 
https://docs.travis-ci.com/user/notifications/#configuring-webhook-notifications
>> > > [3]:
>> > >
>> >
>> 
https://docs.microsoft.com/en-us/azure/devops/service-hooks/overview?view=azure-devops
>> > >
>> > >
>> > >
>> > > On Wed, 4 Dec 2019 at 22:48, Jeff Zhang  wrote:
>> > >
>> > >> +1
>> > >>
>> > >> On Wed, Dec 4, 2019 at 10:43 PM Till Rohrmann wrote:
>> > >>
>> > >>> +1 for moving to Azure pipelines as it promises better scalability
>> and
>> > >>> tooling. Looking forward to having faster builds and hence shorter
>> > >> feedback
>> > >>> cycles :-)
>> > >>>
>> > >>> Cheers,
>> > >>> Till
>> > >>>
>> > >>> On Wed, Dec 4, 2019 at 1:24 PM Chesnay Schepler > >
>> > >>> wrote:
>> > >>>
>> >  @robert Can you expand how the azure setup interacts with CiBot?
>> Do we
>> >  have to continue mirroring builds into flink-ci? How will the
>> cronjob
>> >  configuration work? We should have a general idea on how to
>> implement
>> >  this before proceeding.
>> >  Additionally, moving /all /jobs into flink-ci requires setting up
>> the
>> >  environment variables we have; can we set these up via files or
>> will
>> > we
>> >  have to give all committers permissions for flink-ci/flink?
>> > 
>> >  On 04/12/2019 12:55, Chesnay Schepler wrote:
>> > > From what I've seen so far Azure will provide us a better
>> experience,
>> > > so I'd say +1 for the transition as a whole.
>> > >
>> > > I'd delay merge at least until the feature branch is cut.
>> > > Given the parental leave it may even make sense to only start
>> merging
>> > > in January afterwards, to reduce the total time taken for the
>> > >>> transition.
>> > >
>> > > Reviews could maybe be made earlier, but I'm wondering whether
>> anyone
>> > > would even have the time at the moment to do so.
>> > >
>> > > On 04/12/2019 12:35, Kurt Young wrote:
>> > >> Thanks Robert for driving this. There is another big pain point
>> of
>> > >> current
>> > >> travis,
>> > >> which is its cache mechanism will fail from time to time. Almost
>> > >> around 

Re: [DISCUSS] Migrate build infrastructure from Travis CI to Azure Pipelines

2019-12-05 Thread Robert Metzger
I've created a first draft of my plans in the wiki:
https://cwiki.apache.org/confluence/display/FLINK/%5Bpreview%5D+Azure+Pipelines.
I'm looking forward to your comments.

On Thu, Dec 5, 2019 at 12:37 PM Robert Metzger  wrote:

> Thank you all for the positive feedback. I will start putting together a
> page in the wiki.
>
> @Jark: Azure Pipelines provides a free service that is even better than
> what Travis provides for free: 10 parallel builds with 6-hour timeouts.
>
> @Chesnay: I will answer your questions in the yet-to-be-written
> documentation in the wiki.
>
>
> On Thu, Dec 5, 2019 at 11:58 AM Arvid Heise  wrote:
>
>> +1 I had good experiences with Azure pipelines in the past.
>>
>> On Thu, Dec 5, 2019 at 11:35 AM Aljoscha Krettek 
>> wrote:
>>
>> > +1
>> >
>> > Thanks for the effort! The tooling seems to be quite a bit nicer and I
>> > like that we can grow by adding more machines.
>> >
>> > Best,
>> > Aljoscha
>> >
>> > > On 5. Dec 2019, at 03:18, Jark Wu  wrote:
>> > >
>> > > +1 for Azure pipeline because it promises better performance.
>> > >
>> > > However, I have 2 concerns:
>> > >
>> > > 1) Travis provides a free personal service for testing personal branches.
>> > > Contributors usually use it to test PoCs or to run CRON jobs for
>> > > pull requests.
>> > > Testing on a local machine takes a lot of time. Does AZP provide the same
>> > > free service?
>> > > 2) Currently, we have a webhook [1] deployed that receives Travis CI build
>> > > notifications [2] and forwards them to the bui...@flink.apache.org mailing list.
>> > > We need to figure out how to send Azure build results to the
>> > > mailing list. Azure DevOps service hooks [3] might be the way to go.
>> > >
>> > > builds@f.a.o mailing list
>> > >
>> > > Best,
>> > > Jark
>> > >
>> > > [1]: https://github.com/wuchong/flink-notification-bot
>> > > [2]: https://docs.travis-ci.com/user/notifications/#configuring-webhook-notifications
>> > > [3]: https://docs.microsoft.com/en-us/azure/devops/service-hooks/overview?view=azure-devops
>> > >
>> > >
>> > >
>> > > On Wed, 4 Dec 2019 at 22:48, Jeff Zhang  wrote:
>> > >
>> > >> +1
>> > >>
>> > >> Till Rohrmann wrote on Wed, Dec 4, 2019 at 10:43 PM:
>> > >>
>> > >>> +1 for moving to Azure pipelines as it promises better scalability
>> and
>> > >>> tooling. Looking forward to having faster builds and hence shorter
>> > >> feedback
>> > >>> cycles :-)
>> > >>>
>> > >>> Cheers,
>> > >>> Till
>> > >>>
>> > >>> On Wed, Dec 4, 2019 at 1:24 PM Chesnay Schepler wrote:
>> > >>>
>> >  @robert Can you expand how the azure setup interacts with CiBot?
>> Do we
>> >  have to continue mirroring builds into flink-ci? How will the
>> cronjob
>> >  configuration work? We should have a general idea on how to
>> implement
>> >  this before proceeding.
>> >  Additionally, moving /all /jobs into flink-ci requires setting up
>> the
>> >  environment variables we have; can we set these up via files or
>> will
>> > we
>> >  have to give all committers permissions for flink-ci/flink?
>> > 
>> >  On 04/12/2019 12:55, Chesnay Schepler wrote:
>> > > From what I've seen so far Azure will provide us a better
>> experience,
>> > > so I'd say +1 for the transition as a whole.
>> > >
>> > > I'd delay merge at least until the feature branch is cut.
>> > > Given the parental leave it may even make sense to only start
>> merging
>> > > in January afterwards, to reduce the total time taken for the
>> > >>> transition.
>> > >
>> > > Reviews could maybe be made earlier, but I'm wondering whether
>> anyone
>> > > would even have the time at the moment to do so.
>> > >
>> > > On 04/12/2019 12:35, Kurt Young wrote:
>> > >> Thanks Robert for driving this. There is another big pain point
>> of
>> > >> current
>> > >> travis,
>> > >> which is its cache mechanism will fail from time to time. Almost
>> > >> around 50%
>> > >> of
>> > >> the build fails are caused by cache problem. I opened this issue
>> to
>> > >> travis
>> > >> but
>> > >> got no response yet. So big +1 from my side.
>> > >>
>> > >> Just one comment, it's close to 1.10 feature freeze and we will
>> > >> spend
>> > >> some
>> > >> time
>> > >> to make tests stable before release. I wish this replacement can
>> > >>> happen
>> > >> after
>> > >> 1.10 release, otherwise it will be a unstable factor during
>> release
>> > >> testing.
>> > >>
>> > >> Best,
>> > >> Kurt
>> > >>
>> > >>
>> > >> On Wed, Dec 4, 2019 at 7:16 PM Zhu Zhu 
>> wrote:
>> > >>
>> > >>> Thanks Robert for the updates! And thanks a lot for all the
>> efforts
>> > >>> to
>> > >>> investigate, experiment and tune Azure Pipelines for Flink
>> > >> building.
>> > >>> Big +1 for it.
>> > >>>
>> > >>> It would be great that the community building can be extended
>> with
>> > >>> custom
>> > 

Re: [DISCUSS] Migrate build infrastructure from Travis CI to Azure Pipelines

2019-12-05 Thread Robert Metzger
Thank you all for the positive feedback. I will start putting together a
page in the wiki.

@Jark: Azure Pipelines provides a free service that is even better than
what Travis provides for free: 10 parallel builds with 6-hour timeouts.

@Chesnay: I will answer your questions in the yet-to-be-written
documentation in the wiki.


On Thu, Dec 5, 2019 at 11:58 AM Arvid Heise  wrote:

> +1 I had good experiences with Azure pipelines in the past.
>
> On Thu, Dec 5, 2019 at 11:35 AM Aljoscha Krettek 
> wrote:
>
> > +1
> >
> > Thanks for the effort! The tooling seems to be quite a bit nicer and I
> > like that we can grow by adding more machines.
> >
> > Best,
> > Aljoscha
> >
> > > On 5. Dec 2019, at 03:18, Jark Wu  wrote:
> > >
> > > +1 for Azure pipeline because it promises better performance.
> > >
> > > However, I have 2 concerns:
> > >
> > > > 1) Travis provides a free personal service for testing personal branches.
> > > > Contributors usually use it to test PoCs or to run CRON jobs for
> > > > pull requests.
> > > > Testing on a local machine takes a lot of time. Does AZP provide the same
> > > > free service?
> > > > 2) Currently, we have a webhook [1] deployed that receives Travis CI build
> > > > notifications [2] and forwards them to the bui...@flink.apache.org mailing list.
> > > > We need to figure out how to send Azure build results to the
> > > > mailing list. Azure DevOps service hooks [3] might be the way to go.
> > >
> > > builds@f.a.o mailing list
> > >
> > > Best,
> > > Jark
> > >
> > > [1]: https://github.com/wuchong/flink-notification-bot
> > > [2]:
> > >
> >
> https://docs.travis-ci.com/user/notifications/#configuring-webhook-notifications
> > > [3]:
> > >
> >
> https://docs.microsoft.com/en-us/azure/devops/service-hooks/overview?view=azure-devops
> > >
> > >
> > >
> > > On Wed, 4 Dec 2019 at 22:48, Jeff Zhang  wrote:
> > >
> > >> +1
> > >>
> > >> Till Rohrmann wrote on Wed, Dec 4, 2019 at 10:43 PM:
> > >>
> > >>> +1 for moving to Azure pipelines as it promises better scalability
> and
> > >>> tooling. Looking forward to having faster builds and hence shorter
> > >> feedback
> > >>> cycles :-)
> > >>>
> > >>> Cheers,
> > >>> Till
> > >>>
> > >>> On Wed, Dec 4, 2019 at 1:24 PM Chesnay Schepler 
> > >>> wrote:
> > >>>
> >  @robert Can you expand how the azure setup interacts with CiBot? Do
> we
> >  have to continue mirroring builds into flink-ci? How will the
> cronjob
> >  configuration work? We should have a general idea on how to
> implement
> >  this before proceeding.
> >  Additionally, moving /all /jobs into flink-ci requires setting up
> the
> >  environment variables we have; can we set these up via files or will
> > we
> >  have to give all committers permissions for flink-ci/flink?
> > 
> >  On 04/12/2019 12:55, Chesnay Schepler wrote:
> > > From what I've seen so far Azure will provide us a better
> experience,
> > > so I'd say +1 for the transition as a whole.
> > >
> > > I'd delay merge at least until the feature branch is cut.
> > > Given the parental leave it may even make sense to only start
> merging
> > > in January afterwards, to reduce the total time taken for the
> > >>> transition.
> > >
> > > Reviews could maybe be made earlier, but I'm wondering whether
> anyone
> > > would even have the time at the moment to do so.
> > >
> > > On 04/12/2019 12:35, Kurt Young wrote:
> > >> Thanks Robert for driving this. There is another big pain point of
> > >> current
> > >> travis,
> > >> which is its cache mechanism will fail from time to time. Almost
> > >> around 50%
> > >> of
> > >> the build fails are caused by cache problem. I opened this issue
> to
> > >> travis
> > >> but
> > >> got no response yet. So big +1 from my side.
> > >>
> > >> Just one comment, it's close to 1.10 feature freeze and we will
> > >> spend
> > >> some
> > >> time
> > >> to make tests stable before release. I wish this replacement can
> > >>> happen
> > >> after
> > >> 1.10 release, otherwise it will be a unstable factor during
> release
> > >> testing.
> > >>
> > >> Best,
> > >> Kurt
> > >>
> > >>
> > >> On Wed, Dec 4, 2019 at 7:16 PM Zhu Zhu  wrote:
> > >>
> > >>> Thanks Robert for the updates! And thanks a lot for all the
> efforts
> > >>> to
> > >>> investigate, experiment and tune Azure Pipelines for Flink
> > >> building.
> > >>> Big +1 for it.
> > >>>
> > >>> It would be great that the community building can be extended
> with
> > >>> custom
> > >>> machines so that the tests would not be queued for long with
> daily
> > >>> growing
> > >>> PRs.
> > >>>
> > >>> The increased timeout would be also very helpful.
> > >>> The 50min timeout for free travis accounts is a pain currently,
> > >>> especially
> > >>> when we'd like to run e2e tests in our own travis. And I had to
> > >>> manually
> > >>> split 

Re: [DISCUSS] Migrate build infrastructure from Travis CI to Azure Pipelines

2019-12-05 Thread Arvid Heise
+1 I had good experiences with Azure pipelines in the past.

On Thu, Dec 5, 2019 at 11:35 AM Aljoscha Krettek 
wrote:

> +1
>
> Thanks for the effort! The tooling seems to be quite a bit nicer and I
> like that we can grow by adding more machines.
>
> Best,
> Aljoscha
>
> > On 5. Dec 2019, at 03:18, Jark Wu  wrote:
> >
> > +1 for Azure pipeline because it promises better performance.
> >
> > However, I have 2 concerns:
> >
> > > 1) Travis provides a free personal service for testing personal branches.
> > > Contributors usually use it to test PoCs or to run CRON jobs for
> > > pull requests.
> > > Testing on a local machine takes a lot of time. Does AZP provide the same
> > > free service?
> > > 2) Currently, we have a webhook [1] deployed that receives Travis CI build
> > > notifications [2] and forwards them to the bui...@flink.apache.org mailing list.
> > > We need to figure out how to send Azure build results to the
> > > mailing list. Azure DevOps service hooks [3] might be the way to go.
> >
> > builds@f.a.o mailing list
> >
> > Best,
> > Jark
> >
> > [1]: https://github.com/wuchong/flink-notification-bot
> > [2]:
> >
> https://docs.travis-ci.com/user/notifications/#configuring-webhook-notifications
> > [3]:
> >
> https://docs.microsoft.com/en-us/azure/devops/service-hooks/overview?view=azure-devops
> >
> >
> >
> > On Wed, 4 Dec 2019 at 22:48, Jeff Zhang  wrote:
> >
> >> +1
> >>
> >> Till Rohrmann wrote on Wed, Dec 4, 2019 at 10:43 PM:
> >>
> >>> +1 for moving to Azure pipelines as it promises better scalability and
> >>> tooling. Looking forward to having faster builds and hence shorter
> >> feedback
> >>> cycles :-)
> >>>
> >>> Cheers,
> >>> Till
> >>>
> >>> On Wed, Dec 4, 2019 at 1:24 PM Chesnay Schepler 
> >>> wrote:
> >>>
>  @robert Can you expand how the azure setup interacts with CiBot? Do we
>  have to continue mirroring builds into flink-ci? How will the cronjob
>  configuration work? We should have a general idea on how to implement
>  this before proceeding.
>  Additionally, moving /all /jobs into flink-ci requires setting up the
>  environment variables we have; can we set these up via files or will
> we
>  have to give all committers permissions for flink-ci/flink?
> 
>  On 04/12/2019 12:55, Chesnay Schepler wrote:
> > From what I've seen so far Azure will provide us a better experience,
> > so I'd say +1 for the transition as a whole.
> >
> > I'd delay merge at least until the feature branch is cut.
> > Given the parental leave it may even make sense to only start merging
> > in January afterwards, to reduce the total time taken for the
> >>> transition.
> >
> > Reviews could maybe be made earlier, but I'm wondering whether anyone
> > would even have the time at the moment to do so.
> >
> > On 04/12/2019 12:35, Kurt Young wrote:
> >> Thanks Robert for driving this. There is another big pain point of
> >> current
> >> travis,
> >> which is its cache mechanism will fail from time to time. Almost
> >> around 50%
> >> of
> >> the build fails are caused by cache problem. I opened this issue to
> >> travis
> >> but
> >> got no response yet. So big +1 from my side.
> >>
> >> Just one comment, it's close to 1.10 feature freeze and we will
> >> spend
> >> some
> >> time
> >> to make tests stable before release. I wish this replacement can
> >>> happen
> >> after
> >> 1.10 release, otherwise it will be a unstable factor during release
> >> testing.
> >>
> >> Best,
> >> Kurt
> >>
> >>
> >> On Wed, Dec 4, 2019 at 7:16 PM Zhu Zhu  wrote:
> >>
> >>> Thanks Robert for the updates! And thanks a lot for all the efforts
> >>> to
> >>> investigate, experiment and tune Azure Pipelines for Flink
> >> building.
> >>> Big +1 for it.
> >>>
> >>> It would be great that the community building can be extended with
> >>> custom
> >>> machines so that the tests would not be queued for long with daily
> >>> growing
> >>> PRs.
> >>>
> >>> The increased timeout would be also very helpful.
> >>> The 50min timeout for free travis accounts is a pain currently,
> >>> especially
> >>> when we'd like to run e2e tests in our own travis. And I had to
> >>> manually
> >>> split the jobs to make it possible to pass.
> >>>
> >>> Thanks,
> >>> Zhu Zhu
> >>>
> >>> Robert Metzger wrote on Wed, Dec 4, 2019 at 6:36 PM:
> >>>
>  Hi all,
> 
>  as a follow up from our discussion on reducing the build time
> >> [1], I
> >>> would
>  like to propose migrating our build infrastructure to Azure
> >>> Pipelines
> >>> (away
>  from Travis).
> 
>  I believe that we have reached the limits of what Travis can
>  provide the
>  Flink community, and I don't want the build system to limit or
>  influence
>  the project's growth.
> 
>  

Re: [DISCUSS] Migrate build infrastructure from Travis CI to Azure Pipelines

2019-12-05 Thread Aljoscha Krettek
+1

Thanks for the effort! The tooling seems to be quite a bit nicer and I like 
that we can grow by adding more machines.

Best,
Aljoscha

> On 5. Dec 2019, at 03:18, Jark Wu  wrote:
> 
> +1 for Azure pipeline because it promises better performance.
> 
> However, I have 2 concerns:
> 
> 1) Travis provides a free personal service for testing personal branches.
> Contributors usually use it to test PoCs or to run CRON jobs for
> pull requests.
> Testing on a local machine takes a lot of time. Does AZP provide the same
> free service?
> 2) Currently, we have a webhook [1] deployed that receives Travis CI build
> notifications [2] and forwards them to the bui...@flink.apache.org mailing list.
> We need to figure out how to send Azure build results to the
> mailing list. Azure DevOps service hooks [3] might be the way to go.
> 
> builds@f.a.o mailing list
> 
> Best,
> Jark
> 
> [1]: https://github.com/wuchong/flink-notification-bot
> [2]:
> https://docs.travis-ci.com/user/notifications/#configuring-webhook-notifications
> [3]:
> https://docs.microsoft.com/en-us/azure/devops/service-hooks/overview?view=azure-devops
> 
> 
> 
> On Wed, 4 Dec 2019 at 22:48, Jeff Zhang  wrote:
> 
>> +1
>> 
>> Till Rohrmann wrote on Wed, Dec 4, 2019 at 10:43 PM:
>> 
>>> +1 for moving to Azure pipelines as it promises better scalability and
>>> tooling. Looking forward to having faster builds and hence shorter
>> feedback
>>> cycles :-)
>>> 
>>> Cheers,
>>> Till
>>> 
>>> On Wed, Dec 4, 2019 at 1:24 PM Chesnay Schepler 
>>> wrote:
>>> 
 @robert Can you expand how the azure setup interacts with CiBot? Do we
 have to continue mirroring builds into flink-ci? How will the cronjob
 configuration work? We should have a general idea on how to implement
 this before proceeding.
 Additionally, moving /all /jobs into flink-ci requires setting up the
 environment variables we have; can we set these up via files or will we
 have to give all committers permissions for flink-ci/flink?
 
 On 04/12/2019 12:55, Chesnay Schepler wrote:
> From what I've seen so far Azure will provide us a better experience,
> so I'd say +1 for the transition as a whole.
> 
> I'd delay merge at least until the feature branch is cut.
> Given the parental leave it may even make sense to only start merging
> in January afterwards, to reduce the total time taken for the
>>> transition.
> 
> Reviews could maybe be made earlier, but I'm wondering whether anyone
> would even have the time at the moment to do so.
> 
> On 04/12/2019 12:35, Kurt Young wrote:
>> Thanks Robert for driving this. There is another big pain point of
>> current
>> travis,
>> which is its cache mechanism will fail from time to time. Almost
>> around 50%
>> of
>> the build fails are caused by cache problem. I opened this issue to
>> travis
>> but
>> got no response yet. So big +1 from my side.
>> 
>> Just one comment, it's close to 1.10 feature freeze and we will
>> spend
>> some
>> time
>> to make tests stable before release. I wish this replacement can
>>> happen
>> after
>> 1.10 release, otherwise it will be a unstable factor during release
>> testing.
>> 
>> Best,
>> Kurt
>> 
>> 
>> On Wed, Dec 4, 2019 at 7:16 PM Zhu Zhu  wrote:
>> 
>>> Thanks Robert for the updates! And thanks a lot for all the efforts
>>> to
>>> investigate, experiment and tune Azure Pipelines for Flink
>> building.
>>> Big +1 for it.
>>> 
>>> It would be great that the community building can be extended with
>>> custom
>>> machines so that the tests would not be queued for long with daily
>>> growing
>>> PRs.
>>> 
>>> The increased timeout would be also very helpful.
>>> The 50min timeout for free travis accounts is a pain currently,
>>> especially
>>> when we'd like to run e2e tests in our own travis. And I had to
>>> manually
>>> split the jobs to make it possible to pass.
>>> 
>>> Thanks,
>>> Zhu Zhu
>>> 
>>> Robert Metzger wrote on Wed, Dec 4, 2019 at 6:36 PM:
>>> 
 Hi all,
 
 as a follow up from our discussion on reducing the build time
>> [1], I
>>> would
 like to propose migrating our build infrastructure to Azure
>>> Pipelines
>>> (away
 from Travis).
 
 I believe that we have reached the limits of what Travis can
 provide the
 Flink community, and I don't want the build system to limit or
 influence
 the project's growth.
 
 *Benefits:*
 1. The free Travis account are limited to 5 parallel builds, with
>> a
>>> timeout
 of 50 minutes. Azure offers *10 parallel builds with 300 minute
 timeouts
 *for
 free for open source projects.
 2. Azure Pipelines allows us to *add custom build machines* to the
 pool
>>> of
 10 free parallel 

Re: [DISCUSS] Migrate build infrastructure from Travis CI to Azure Pipelines

2019-12-04 Thread Jark Wu
+1 for Azure pipeline because it promises better performance.

However, I have 2 concerns:

1) Travis provides a free personal service for testing personal branches.
Contributors usually use it to test PoCs or to run CRON jobs for
pull requests.
Testing on a local machine takes a lot of time. Does AZP provide the same
free service?
2) Currently, we have a webhook [1] deployed that receives Travis CI build
notifications [2] and forwards them to the bui...@flink.apache.org mailing list.
We need to figure out how to send Azure build results to the
mailing list. Azure DevOps service hooks [3] might be the way to go.

builds@f.a.o mailing list

Best,
Jark

[1]: https://github.com/wuchong/flink-notification-bot
[2]:
https://docs.travis-ci.com/user/notifications/#configuring-webhook-notifications
[3]:
https://docs.microsoft.com/en-us/azure/devops/service-hooks/overview?view=azure-devops
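
To illustrate concern 2), here is a minimal sketch of how a webhook receiver
could turn a build-completed event into a mail for the builds list. It is not
taken from the notification bot [1]. The payload field names (resource,
buildNumber, status, definition.name) are assumptions that should be checked
against the service hooks documentation [3], and build_notification and the
recipient address are hypothetical. The sketch only builds the message; it
does not send it.

```python
import json
from email.message import EmailMessage


def build_notification(payload: dict, recipient: str) -> EmailMessage:
    """Turn a (hypothetical) build-completed webhook payload into an email."""
    # Assumed payload layout; missing fields fall back to "unknown".
    resource = payload.get("resource", {})
    build_number = resource.get("buildNumber", "unknown")
    status = resource.get("status", "unknown")
    definition = resource.get("definition", {}).get("name", "unknown")

    msg = EmailMessage()
    msg["To"] = recipient
    msg["Subject"] = f"[{definition}] Build {build_number}: {status}"
    # Attach the raw event as the body so that no information is lost.
    msg.set_content(json.dumps(payload, indent=2, sort_keys=True))
    return msg


if __name__ == "__main__":
    sample = {
        "resource": {
            "buildNumber": "42",
            "status": "succeeded",
            "definition": {"name": "flink-ci"},
        }
    }
    print(build_notification(sample, "builds@example.org")["Subject"])
```

Keeping the formatting step separate from receiving and sending would make it
possible to reuse the existing mail delivery and only swap the payload parsing.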



On Wed, 4 Dec 2019 at 22:48, Jeff Zhang  wrote:

> +1
>
> Till Rohrmann wrote on Wed, Dec 4, 2019 at 10:43 PM:
>
> > +1 for moving to Azure pipelines as it promises better scalability and
> > tooling. Looking forward to having faster builds and hence shorter
> feedback
> > cycles :-)
> >
> > Cheers,
> > Till
> >
> > On Wed, Dec 4, 2019 at 1:24 PM Chesnay Schepler 
> > wrote:
> >
> > > @robert Can you expand how the azure setup interacts with CiBot? Do we
> > > have to continue mirroring builds into flink-ci? How will the cronjob
> > > configuration work? We should have a general idea on how to implement
> > > this before proceeding.
> > > Additionally, moving /all /jobs into flink-ci requires setting up the
> > > environment variables we have; can we set these up via files or will we
> > > have to give all committers permissions for flink-ci/flink?
> > >
> > > On 04/12/2019 12:55, Chesnay Schepler wrote:
> > > > From what I've seen so far Azure will provide us a better experience,
> > > > so I'd say +1 for the transition as a whole.
> > > >
> > > > I'd delay merge at least until the feature branch is cut.
> > > > Given the parental leave it may even make sense to only start merging
> > > > in January afterwards, to reduce the total time taken for the
> > transition.
> > > >
> > > > Reviews could maybe be made earlier, but I'm wondering whether anyone
> > > > would even have the time at the moment to do so.
> > > >
> > > > On 04/12/2019 12:35, Kurt Young wrote:
> > > >> Thanks Robert for driving this. There is another big pain point of
> > > >> current
> > > >> travis,
> > > >> which is its cache mechanism will fail from time to time. Almost
> > > >> around 50%
> > > >> of
> > > >> the build fails are caused by cache problem. I opened this issue to
> > > >> travis
> > > >> but
> > > >> got no response yet. So big +1 from my side.
> > > >>
> > > >> Just one comment, it's close to 1.10 feature freeze and we will
> spend
> > > >> some
> > > >> time
> > > >> to make tests stable before release. I wish this replacement can
> > happen
> > > >> after
> > > >> 1.10 release, otherwise it will be a unstable factor during release
> > > >> testing.
> > > >>
> > > >> Best,
> > > >> Kurt
> > > >>
> > > >>
> > > >> On Wed, Dec 4, 2019 at 7:16 PM Zhu Zhu  wrote:
> > > >>
> > > >>> Thanks Robert for the updates! And thanks a lot for all the efforts
> > to
> > > >>> investigate, experiment and tune Azure Pipelines for Flink
> building.
> > > >>> Big +1 for it.
> > > >>>
> > > >>> It would be great that the community building can be extended with
> > > >>> custom
> > > >>> machines so that the tests would not be queued for long with daily
> > > >>> growing
> > > >>> PRs.
> > > >>>
> > > >>> The increased timeout would be also very helpful.
> > > >>> The 50min timeout for free travis accounts is a pain currently,
> > > >>> especially
> > > >>> when we'd like to run e2e tests in our own travis. And I had to
> > > >>> manually
> > > >>> split the jobs to make it possible to pass.
> > > >>>
> > > >>> Thanks,
> > > >>> Zhu Zhu
> > > >>>
> > > >>> Robert Metzger wrote on Wed, Dec 4, 2019 at 6:36 PM:
> > > >>>
> > >  Hi all,
> > > 
> > >  as a follow up from our discussion on reducing the build time
> [1], I
> > > >>> would
> > >  like to propose migrating our build infrastructure to Azure
> > Pipelines
> > > >>> (away
> > >  from Travis).
> > > 
> > >  I believe that we have reached the limits of what Travis can
> > >  provide the
> > >  Flink community, and I don't want the build system to limit or
> > >  influence
> > >  the project's growth.
> > > 
> > >  *Benefits:*
> > >  1. The free Travis account are limited to 5 parallel builds, with
> a
> > > >>> timeout
> > >  of 50 minutes. Azure offers *10 parallel builds with 300 minute
> > >  timeouts
> > >  *for
> > >  free for open source projects.
> > >  2. Azure Pipelines allows us to *add custom build machines* to the
> > >  pool
> > > >>> of
> > >  10 free parallel builders.
> > >  This will allow the Flink community to scale the available build
> > >  

Re: [DISCUSS] Migrate build infrastructure from Travis CI to Azure Pipelines

2019-12-04 Thread Jeff Zhang
+1

Till Rohrmann wrote on Wed, Dec 4, 2019 at 10:43 PM:

> +1 for moving to Azure pipelines as it promises better scalability and
> tooling. Looking forward to having faster builds and hence shorter feedback
> cycles :-)
>
> Cheers,
> Till
>
> On Wed, Dec 4, 2019 at 1:24 PM Chesnay Schepler 
> wrote:
>
> > @robert Can you expand how the azure setup interacts with CiBot? Do we
> > have to continue mirroring builds into flink-ci? How will the cronjob
> > configuration work? We should have a general idea on how to implement
> > this before proceeding.
> > Additionally, moving /all /jobs into flink-ci requires setting up the
> > environment variables we have; can we set these up via files or will we
> > have to give all committers permissions for flink-ci/flink?
> >
> > On 04/12/2019 12:55, Chesnay Schepler wrote:
> > > From what I've seen so far Azure will provide us a better experience,
> > > so I'd say +1 for the transition as a whole.
> > >
> > > I'd delay merge at least until the feature branch is cut.
> > > Given the parental leave it may even make sense to only start merging
> > > in January afterwards, to reduce the total time taken for the
> transition.
> > >
> > > Reviews could maybe be made earlier, but I'm wondering whether anyone
> > > would even have the time at the moment to do so.
> > >
> > > On 04/12/2019 12:35, Kurt Young wrote:
> > >> Thanks Robert for driving this. There is another big pain point of
> > >> current
> > >> travis,
> > >> which is its cache mechanism will fail from time to time. Almost
> > >> around 50%
> > >> of
> > >> the build fails are caused by cache problem. I opened this issue to
> > >> travis
> > >> but
> > >> got no response yet. So big +1 from my side.
> > >>
> > >> Just one comment, it's close to 1.10 feature freeze and we will spend
> > >> some
> > >> time
> > >> to make tests stable before release. I wish this replacement can
> happen
> > >> after
> > >> 1.10 release, otherwise it will be a unstable factor during release
> > >> testing.
> > >>
> > >> Best,
> > >> Kurt
> > >>
> > >>
> > >> On Wed, Dec 4, 2019 at 7:16 PM Zhu Zhu  wrote:
> > >>
> > >>> Thanks Robert for the updates! And thanks a lot for all the efforts
> to
> > >>> investigate, experiment and tune Azure Pipelines for Flink building.
> > >>> Big +1 for it.
> > >>>
> > >>> It would be great that the community building can be extended with
> > >>> custom
> > >>> machines so that the tests would not be queued for long with daily
> > >>> growing
> > >>> PRs.
> > >>>
> > >>> The increased timeout would be also very helpful.
> > >>> The 50min timeout for free travis accounts is a pain currently,
> > >>> especially
> > >>> when we'd like to run e2e tests in our own travis. And I had to
> > >>> manually
> > >>> split the jobs to make it possible to pass.
> > >>>
> > >>> Thanks,
> > >>> Zhu Zhu
> > >>>
> > >>> Robert Metzger wrote on Wed, Dec 4, 2019 at 6:36 PM:
> > >>>
> >  Hi all,
> > 
> >  as a follow up from our discussion on reducing the build time [1], I
> > >>> would
> >  like to propose migrating our build infrastructure to Azure
> Pipelines
> > >>> (away
> >  from Travis).
> > 
> >  I believe that we have reached the limits of what Travis can
> >  provide the
> >  Flink community, and I don't want the build system to limit or
> >  influence
> >  the project's growth.
> > 
> >  *Benefits:*
> >  1. The free Travis account are limited to 5 parallel builds, with a
> > >>> timeout
> >  of 50 minutes. Azure offers *10 parallel builds with 300 minute
> >  timeouts
> >  *for
> >  free for open source projects.
> >  2. Azure Pipelines allows us to *add custom build machines* to the
> >  pool
> > >>> of
> >  10 free parallel builders.
> >  This will allow the Flink community to scale the available build
> >  capacity
> >  as the project grows. We are dependent on donations from supporting
> >  companies, but I believe that it is easier for companies to donate
> > >>> machines
> >  than money.
> >  Alibaba is willing to provide 10 machines, with 32 cores each to the
> > >>> Flink
> >  project for this purpose.
> >  In addition, Xiyuan, who's working on adding ARM support for Flink
> > >>> provided
> >  me with 2 ARM machines (16 cores each).
> >  I want to use the custom, more efficient build machines for building
> >  Flink's pull requests and master-pushes.
> >  3. *Azure Pipelines is a more feature-rich tool*, allowing for
> >  example to
> >  transfer intermediate build artifacts between pipeline stages. This
> >  will
> >  allow us to make the build more reliable (we are currently abusing
> the
> >  caching mechanism in Travis for this).
> >  It also has some basic analytics on test results / flaky tests etc.
> > 
> >  *Known problems:*
> >  - Initially, we might see different build instabilities than before
> >  - There's a higher maintenance overhead for the 

Re: [DISCUSS] Migrate build infrastructure from Travis CI to Azure Pipelines

2019-12-04 Thread Till Rohrmann
+1 for moving to Azure pipelines as it promises better scalability and
tooling. Looking forward to having faster builds and hence shorter feedback
cycles :-)

Cheers,
Till

On Wed, Dec 4, 2019 at 1:24 PM Chesnay Schepler  wrote:

> @robert Can you expand how the azure setup interacts with CiBot? Do we
> have to continue mirroring builds into flink-ci? How will the cronjob
> configuration work? We should have a general idea on how to implement
> this before proceeding.
> Additionally, moving /all /jobs into flink-ci requires setting up the
> environment variables we have; can we set these up via files or will we
> have to give all committers permissions for flink-ci/flink?
>
> On 04/12/2019 12:55, Chesnay Schepler wrote:
> > From what I've seen so far Azure will provide us a better experience,
> > so I'd say +1 for the transition as a whole.
> >
> > I'd delay merge at least until the feature branch is cut.
> > Given the parental leave it may even make sense to only start merging
> > in January afterwards, to reduce the total time taken for the transition.
> >
> > Reviews could maybe be made earlier, but I'm wondering whether anyone
> > would even have the time at the moment to do so.
> >
> > On 04/12/2019 12:35, Kurt Young wrote:
> >> Thanks Robert for driving this. Another big pain point of the current
> >> Travis setup is its cache mechanism, which fails from time to time. Around
> >> 50% of the build failures are caused by cache problems. I reported this
> >> issue to Travis but have not received a response yet. So big +1 from my side.
> >>
> >> Just one comment: we are close to the 1.10 feature freeze and will spend
> >> some time making tests stable before the release. I would prefer this
> >> replacement to happen after the 1.10 release; otherwise it will be an
> >> unstable factor during release testing.
> >>
> >> Best,
> >> Kurt
> >>
> >>
> >> On Wed, Dec 4, 2019 at 7:16 PM Zhu Zhu  wrote:
> >>
> >>> Thanks Robert for the updates! And thanks a lot for all the efforts to
> >>> investigate, experiment and tune Azure Pipelines for Flink building.
> >>> Big +1 for it.
> >>>
> >>> It would be great that the community building can be extended with
> >>> custom
> >>> machines so that the tests would not be queued for long with daily
> >>> growing
> >>> PRs.
> >>>
> >>> The increased timeout would be also very helpful.
> >>> The 50min timeout for free travis accounts is a pain currently,
> >>> especially
> >>> when we'd like to run e2e tests in our own travis. And I had to
> >>> manually
> >>> split the jobs to make it possible to pass.
> >>>
> >>> Thanks,
> >>> Zhu Zhu
> >>>
> >>> Robert Metzger  于2019年12月4日周三 下午6:36写道:
> >>>
>  Hi all,
> 
>  as a follow up from our discussion on reducing the build time [1], I
> >>> would
>  like to propose migrating our build infrastructure to Azure Pipelines
> >>> (away
>  from Travis).
> 
>  I believe that we have reached the limits of what Travis can
>  provide the
>  Flink community, and I don't want the build system to limit or
>  influence
>  the project's growth.
> 
>  *Benefits:*
>  1. The free Travis account are limited to 5 parallel builds, with a
> >>> timeout
>  of 50 minutes. Azure offers *10 parallel builds with 300 minute
>  timeouts
>  *for
>  free for open source projects.
>  2. Azure Pipelines allows us to *add custom build machines* to the
>  pool
> >>> of
>  10 free parallel builders.
>  This will allow the Flink community to scale the available build
>  capacity
>  as the project grows. We are dependent on donations from supporting
>  companies, but I believe that it is easier for companies to donate
> >>> machines
>  than money.
>  Alibaba is willing to provide 10 machines, with 32 cores each to the
> >>> Flink
>  project for this purpose.
>  In addition, Xiyuan, who's working on adding ARM support for Flink
> >>> provided
>  me with 2 ARM machines (16 cores each).
>  I want to use the custom, more efficient build machines for building
>  Flink's pull requests and master-pushes.
>  3. *Azure Pipelines is a more feature-rich tool*, allowing for
>  example to
>  transfer intermediate build artifacts between pipeline stages. This
>  will
>  allow us to make the build more reliable (we are currently abusing the
>  caching mechanism in Travis for this).
>  It also has some basic analytics on test results / flaky tests etc.
> 
>  *Known problems:*
>  - Initially, we might see different build instabilities than before
>  - There's a higher maintenance overhead for the custom build machines
>  (keeping them up to date etc.)
>  - We can not use the build status integration of AZP, because they
> >>> require
>  write access to the repository's source. The foundation does not allow
> >>> that
>  [2].
>  I propose to extend flinkbot / the flink-ci 

Re: [DISCUSS] Migrate build infrastructure from Travis CI to Azure Pipelines

2019-12-04 Thread Chesnay Schepler
@robert Can you expand on how the Azure setup interacts with CiBot? Do we
have to continue mirroring builds into flink-ci? How will the cronjob
configuration work? We should have a general idea of how to implement
this before proceeding.
Additionally, moving /all/ jobs into flink-ci requires setting up the
environment variables we have; can we set these up via files or will we
have to give all committers permissions for flink-ci/flink?



Re: [DISCUSS] Migrate build infrastructure from Travis CI to Azure Pipelines

2019-12-04 Thread Chesnay Schepler
From what I've seen so far, Azure will provide us with a better experience,
so I'd say +1 for the transition as a whole.

I'd delay the merge at least until the feature branch is cut.
Given the parental leave, it may even make sense to only start merging in
January afterwards, to reduce the total time taken for the transition.

Reviews could maybe be done earlier, but I'm wondering whether anyone
would even have the time at the moment to do so.



Re: [DISCUSS] Migrate build infrastructure from Travis CI to Azure Pipelines

2019-12-04 Thread Kurt Young
Thanks Robert for driving this. There is another big pain point with the
current Travis setup: its cache mechanism fails from time to time. Around
50% of the build failures are caused by cache problems. I reported this
issue to Travis but have not gotten a response yet. So a big +1 from my side.

Just one comment: we are close to the 1.10 feature freeze and will spend
some time making the tests stable before the release. I would prefer this
replacement to happen after the 1.10 release, otherwise it will be an
unstable factor during release testing.

Best,
Kurt



Re: [DISCUSS] Migrate build infrastructure from Travis CI to Azure Pipelines

2019-12-04 Thread Zhu Zhu
Thanks Robert for the updates! And thanks a lot for all the effort to
investigate, experiment with and tune Azure Pipelines for building Flink.
Big +1 for it.

It would be great if the community build capacity could be extended with
custom machines, so that tests are not queued for long as the number of PRs
grows daily.

The increased timeout would also be very helpful.
The 50-minute timeout for free Travis accounts is currently a pain,
especially when we'd like to run e2e tests in our own Travis setup. I had to
manually split the jobs to make them pass.
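
As an illustration of what such a manual split amounts to, here is a minimal
sketch in Python. It is not taken from the Flink build scripts: the suite
names, the durations and the 45-minute budget (5 minutes of headroom under the
50-minute limit) are hypothetical, and first-fit decreasing is just one simple
heuristic for packing suites into jobs.

```python
from typing import Dict, List


def split_into_jobs(durations_min: Dict[str, float], budget_min: float) -> List[List[str]]:
    """Pack test suites into jobs so that no job exceeds budget_min.

    Uses first-fit decreasing: suites are visited longest first, and each
    one goes into the first job that still has room for it.
    """
    jobs: List[List[str]] = []   # suite names per job
    loads: List[float] = []      # accumulated minutes per job
    # Longest suites first; ties are broken by name to keep the result stable.
    for name, minutes in sorted(durations_min.items(), key=lambda kv: (-kv[1], kv[0])):
        if minutes > budget_min:
            raise ValueError(f"{name} alone exceeds the budget of {budget_min} minutes")
        for i, load in enumerate(loads):
            if load + minutes <= budget_min:
                jobs[i].append(name)
                loads[i] += minutes
                break
        else:
            # No existing job has room, so open a new one.
            jobs.append([name])
            loads.append(minutes)
    return jobs


# Hypothetical suite durations in minutes.
suites = {"e2e-a": 30, "e2e-b": 25, "e2e-c": 20, "e2e-d": 15, "e2e-e": 10}

print(split_into_jobs(suites, budget_min=45))   # needs several jobs
print(split_into_jobs(suites, budget_min=300))  # everything fits into one job
```

With a 300-minute budget the same suites fit into a single job, which is why
the longer timeout removes the need for this kind of splitting.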

Thanks,
Zhu Zhu

On Wed, Dec 4, 2019 at 6:36 PM Robert Metzger  wrote:

> Hi all,
>
> as a follow up from our discussion on reducing the build time [1], I would
> like to propose migrating our build infrastructure to Azure Pipelines (away
> from Travis).
>
> I believe that we have reached the limits of what Travis can provide the
> Flink community, and I don't want the build system to limit or influence
> the project's growth.
>
> *Benefits:*
> 1. Free Travis accounts are limited to 5 parallel builds, with a timeout
> of 50 minutes. Azure offers *10 parallel builds with 300 minute timeouts*
> for free for open source projects.
> 2. Azure Pipelines allows us to *add custom build machines* to the pool of
> 10 free parallel builders.
> This will allow the Flink community to scale the available build capacity
> as the project grows. We are dependent on donations from supporting
> companies, but I believe that it is easier for companies to donate machines
> than money.
> Alibaba is willing to provide 10 machines, with 32 cores each to the Flink
> project for this purpose.
> In addition, Xiyuan, who's working on adding ARM support for Flink, provided
> me with 2 ARM machines (16 cores each).
> I want to use the custom, more efficient build machines for building
> Flink's pull requests and master-pushes.
> 3. *Azure Pipelines is a more feature-rich tool*, allowing us, for example, to
> transfer intermediate build artifacts between pipeline stages. This will
> allow us to make the build more reliable (we are currently abusing the
> caching mechanism in Travis for this).
> It also has some basic analytics on test results / flaky tests etc.
>
> *Known problems:*
> - Initially, we might see different build instabilities than before
> - There's a higher maintenance overhead for the custom build machines
> (keeping them up to date etc.)
> - We cannot use the build status integration of AZP, because it requires
> write access to the repository's source. The foundation does not allow that
> [2].
> I propose to extend flinkbot / the flink-ci repository instead.
>
> *Current Status:*
> - I'm able [3] to execute [4] the current custom build scripts on Azure
> Pipelines: This means that we will have one compile stage, and N testing
> jobs in the 2nd stage. Currently, we have N=10 testing jobs.
> The time from the start of a build until all tests have completed is
> 1 hour 22 minutes.
> - I'm working on getting the nightly end to end tests to run on the new
> infrastructure.
> - I'm working on getting the build to work on our pool of custom machines
> as well
> - I'm working on setting up the full matrix of builds (different scala,
> hadoop etc. versions) for the nightlies
>
> *Next Steps:*
> - I propose to document the entire build system in the Flink Wiki
> - Once Azure can cover the same pull request tests as Travis, I would set
> it up to run in parallel (including Flinkbot posting links to Azure). I
> hope that this phase lasts for 1-2 weeks only, so that we do not have to
> maintain things concurrently. I will monitor the build stability closely,
> but would expect some support with debugging potential issues from the
> contributors.
> - Once there are no problems with the new setup, we remove the Travis
> setup.
> - Independently, I will work on triggering builds from master / release
> branch pushes, as well as cron builds from the master branch. All of this
> will be described in the Wiki.
>
>
> *Timeline:*
> - Once I have the feeling that people are supportive of the idea, I will
> start documenting in the Wiki. The first pull requests should show up
> after a few more days.
> I will take a one-month parental leave starting some time later in December,
> which will probably delay things a bit. I hope to have everything finished
> by the end of January.
>
> I'm happy to hear your thoughts on this work.
> If nobody objects, I will start documenting the system and prepare
> everything for the migration.
>
> Best,
> Robert
>
>
>
> [1] https://lists.apache.org/thread.html/b90aa518fcabce94f8e1de4132f46120fae613db6e95a2705f1bd1ea@%3Cdev.flink.apache.org%3E
> [2] https://issues.apache.org/jira/browse/INFRA-17030
> [3] https://github.com/rmetzger/flink/tree/azure_playground
> [4] https://dev.azure.com/rmetzger/Flink/_build?definitionId=4&_a=summary
>