Re: [VOTE] Release Apache MXNet (incubating) version 1.7.0.rc1

2020-07-22 Thread Patrick Mu
+ 1

Tested custom operators: all examples using custom operators pass, and no 
errors or regressions were found.

Ziyi

On 2020/07/22 06:56:46, Kshitij Kalambarkar  
wrote: 
> + 1
> 
> * Built from source on Ubuntu 18.04 with CUDA, CUDNN
> * Verified test_higher_order_grad.py
> 
> Great job!
> 
> On Wed, Jul 22, 2020 at 12:02 PM Chaitanya Bapat 
> wrote:
> 
> > +1
> >
> > - Built from source on Ubuntu18 with CUDA ON, USE_INT64_TENSOR_SIZE ON
> > - Verified large tensor tests work as expected on a p3.16xl instance [with
> > 8 Tesla V100 GPUs]
> > - Verified OpPerf utility works as expected.
> >
> > Steps followed:
> > https://gist.github.com/ChaiBapchya/8a5131932693d4ca47281368c752b726
> >
> > Thanks Ciyong for leading with the releases. Incredible job.
> >
> > Regards,
> > Chai
> >
> >
> > On Tue, 21 Jul 2020 at 23:05, Karan Jariwala 
> > wrote:
> >
> > > +1
> > >
> > > > Built from source on Ubuntu 18 with CUDA/CUDNN/NCCL ON and verified with
> > > > Horovod 0.19.5 by running unit tests and integration tests.
> > >
> > > Thanks,
> > > Karan
> > >
> > > On Tue, Jul 21, 2020 at 10:23 PM Sheng Zha  wrote:
> > >
> > > > +1. I checked:
> > > >
> > > > [x] Are release files in correct location? Yes
> > > > [x] Do release files have the word incubating in their name? Yes
> > > > [x] Are the digital signature and hashes correct? Yes
> > > > [x] Does DISCLAIMER file exist? Yes, DISCLAIMER-WIP
> > > > > [x] Do LICENSE and NOTICE files exist? Yes
> > > > > [x] Is the LICENSE and NOTICE text correct?
> > > > > Yes, though the license still reads "Copyright [] [name of copyright
> > > > > owner]", which needs correction.
> > > >
> > > > [x] Is the NOTICE year correct? Yes
> > > > > [x] Un-included software dependencies are not mentioned in LICENSE or
> > > > > NOTICE?
> > > > > No. mshadow is now contributed to MXNet via software grant and should
> > > > > be removed from NOTICE.
> > > >
> > > > [x] License information is not mentioned in NOTICE? Confirmed
> > > >
> > > > Is there any 3rd party code contained inside the release? If so:
> > > > [x] Does the software have a compatible license? Yes. Minor issue:
> > > > Dual license in cmake/Modules/FindJeMalloc.cmake.
> > > >
> > > > [x] Are all software licenses mentioned in LICENSE? Yes
> > > > > [x] Is the full text of the licenses (or pointers to it) in LICENSE?
> > > > > Yes
> > > >
> > > > Is any of this code Apache licensed? Do they have NOTICE files? If so:
> > > > [x] Have relevant parts of those NOTICE files been added to this NOTICE
> > > > file?
> > > > No. TVM NOTICE file hasn't been included.
> > > >
> > > > [x] Do all source files have ASF headers?
> > > > Yes, except those in 3rdparty folder and those mentioned in license.
> > > > [x] Do the contents of the release match with what's tagged in version
> > > > control? Yes
> > > > [x] Are there any unexpected binary files in the release? No
> > > > > [x] Can you compile from source? Are the instructions clear? Yes, the
> > > > > Makefile is present and is straightforward.
> > > > Is the issue minor? Yes
> > > > Could it possibly be fixed in the next release? Yes
> > > >
> > > > I vote with:
> > > > [x] +1 release the software
> > > >
> > > >
> > > > On 2020/07/20 17:25:50, "Skalicky, Sam" 
> > > > wrote:
> > > > > +1
> > > > >
> > > > > Tested:
> > > > > > - Make flow building from source, verified all example/extensions/*
> > > > > > work correctly
> > > > > > - staticbuild flow cpu & cu102 variants producing the pip wheels,
> > > > > > tested with custom extension library
> > > > >
> > > > > Sam
> > > > >
> > > > > On 7/20/20, 4:07 AM, "Chen, Ciyong"  wrote:
> > > > >
> > > > > Thanks Aston, Patric for the vote.
> > > > >
> > > > > Hi Community,
> > > > >
> > > > > > I would like to call for action to test/validate/vote for the
> > > > > > release candidate (1.7.0.rc1).
> > > > > > As we have not reached a quorum, I would like to extend the
> > > > > > voting process to July 22, 23:59:59 PST.
> > > > > > Please set aside some time and provide feedback if you have tried
> > > > > > the pre-release code base, thanks!
> > > > >
> > > > > Best Regards,
> > > > > Ciyong
> > > > >
> > > > > -Original Message-
> > > > > From: Zhao, Patric 
> > > > > Sent: Monday, July 20, 2020 11:36 AM
> > > > > To: dev@mxnet.incubator.apache.org
> > > > > > Cc: d...@mxnet.apache.org; Bob Paulin; Henri Yandell; Jason Dai;
> > > > > > Markus Weimer; Michael Wall
> > > > > > Subject: RE: [VOTE] Release Apache MXNet (incubating) version
> > > > > > 1.7.0.rc1
> > > > >
> > > > > +1
> > > > >
> > > > > > Passed the performance benchmarking for CPU tests and no
> > > > > > regression was found.
> > > > >
> > > > >

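The "digital signature and hashes" item in Sheng's checklist above can be partly illustrated in code. The following is a minimal Python sketch of the hash half of that check only. The helper names and the sample bytes are hypothetical; a real check would read the release archive and the published checksum file from the dist server, and would additionally verify the GPG signature, which this sketch does not attempt.

```python
import hashlib
import hmac


def sha512_hex(data: bytes) -> str:
    """Return the SHA-512 digest of data as lowercase hex."""
    return hashlib.sha512(data).hexdigest()


def digest_matches(data: bytes, published: str) -> bool:
    """Compare a computed digest with a published checksum line.

    Assumption: the published line looks like "<hex>  <filename>", so only
    the first whitespace-separated token is compared, case-insensitively.
    """
    tokens = published.split()
    if not tokens:
        return False
    return hmac.compare_digest(sha512_hex(data), tokens[0].lower())


# Stand-in for the release archive contents (hypothetical bytes).
artifact = b"example release archive bytes"
published_line = sha512_hex(artifact) + "  example-release.tar.gz"

print(digest_matches(artifact, published_line))
print(digest_matches(artifact + b"tampered", published_line))
```

In practice the bytes would come from reading the downloaded archive in binary mode, and the checksum line from the file published next to it.
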
Re: RE: RE: [VOTE] Release Apache MXNet (incubating) version 1.7.0.rc0

2020-07-13 Thread Patrick Mu
It happens only on CPU. I did more runs and found that the runtime 
fluctuates heavily, but the average regression is ~10%. 

From the previous benchmarks I also found some worse regressions between 1.5 
and 1.6, such as Inception inference on CPU, and those regressions were not 
caught. 

My two cents: it might not be a blocker for the release, and we have room 
for improvement in the upcoming 2.0 and 1.7.1 if necessary.

Ziyi
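
When single runs fluctuate this much, one way to state a regression is to compare means over repeated runs and report the spread next to it. A minimal sketch follows; the latency lists are made-up numbers, not measurements from this thread, and both helper functions are hypothetical names.

```python
from statistics import mean, stdev


def relative_regression(baseline, candidate):
    """Return (candidate_mean - baseline_mean) / baseline_mean.

    Positive values mean the candidate is slower on average.
    """
    base = mean(baseline)
    return (mean(candidate) - base) / base


def coefficient_of_variation(samples):
    """Sample standard deviation divided by the mean (unitless noise measure)."""
    return stdev(samples) / mean(samples)


# Hypothetical per-run latencies in milliseconds (illustrative only).
runs_old = [100.0, 96.0, 104.0, 98.0, 102.0]
runs_new = [112.0, 101.0, 119.0, 108.0, 110.0]

print(f"average regression: {relative_regression(runs_old, runs_new):.1%}")
print(f"noise (old): {coefficient_of_variation(runs_old):.1%}")
print(f"noise (new): {coefficient_of_variation(runs_new):.1%}")
```

If the noise figure is of the same order as the measured regression, more runs are needed before drawing a conclusion.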

On 2020/07/13 08:40:32, "Chen, Ciyong"  wrote: 
> Thanks Ziyi,
> 
> May I ask on which platform you noticed the performance regression, CPU or 
> GPU? A ~20% regression would be a large gap.
> 
> Thanks,
> -Ciyong
> 
> -Original Message-
> From: Patrick Mu  
> Sent: Monday, July 13, 2020 4:13 PM
> To: d...@mxnet.apache.org
> Subject: Re: RE: [VOTE] Release Apache MXNet (incubating) version 1.7.0.rc0
> 
> Hi Ciyong,
> 
> I have reverted the commit, and I am able to train Yolov3 with no problem.
> 
> However, I also noticed a ~20% regression in 1.7 compared with 1.6 for 
> Yolov3 inference with the Module API, so we are going to discuss tomorrow 
> whether that would be an issue for 1.7.
> 
> Thanks,
> Ziyi
> 
> On 2020/07/13 02:19:28, "Chen, Ciyong"  wrote: 
> > Hi Ziyi, Xingjian,
> > 
> > Thanks for reporting the issues from the GluonCV/AutoGluon perspective.
> > I did a quick test by reverting 
> > https://github.com/apache/incubator-mxnet/pull/18358, and the behavior is 
> > then the same as 1.6.0 for the cases in the gist 
> > (https://gist.github.com/sxjscience/944066c82e566f1b89b01fa226678890).
> > 
> > Considering that many end users rely on Gluon-based APIs/models, and that 
> > introducing a new patch to fix this issue could be risky, I agree that 
> > reverting this PR (#18358) might be the best option for the 1.7.0 release.
> > But I wonder whether there are other test cases covering this feature, 
> > which would help track this kind of code change in the future. Could you 
> > also help verify that this revert resolves the breakage on your side?
> > 
> > > Thus, the real issue is: Should we support pickling a Gluon Block? If 
> > > not, should we support combining multiprocessing.pool with the Gluon 
> > > Block?
> > This seems more like a new feature for the MXNet Gluon Block; perhaps we 
> > can make it available in the next patch/minor release?
> > 
> > Thanks,
> > -Ciyong
> > 
> > -Original Message-
> > From: Xingjian SHI  
> > Sent: Saturday, July 11, 2020 4:27 AM
> > To: dev@mxnet.incubator.apache.org; d...@mxnet.apache.org
> > Subject: Re: [VOTE] Release Apache MXNet (incubating) version 1.7.0.rc0
> > 
> > Thanks Ziyi,
> > 
> > I discovered the same issue when trying to use AutoGluon with 
> > 1.7.0rc0 and would like to share my findings:
> > 
> > Basically, I don't think Gluon Block is designed to be picklable. But 
> > pickling does work for some cases in the old version:
> > 
> > I've included two cases in the gist 
> > (https://gist.github.com/sxjscience/944066c82e566f1b89b01fa226678890).
> > 
> > - Case1: we construct a gluon block, hybridize it and feed one NDArray to 
> > help initialize the block. After that, it will no longer be picklable. 
> > - Case2: we just construct a gluon block and it will be picklable in 1.6.0, 
> > but won't be picklable in 1.7.0.
> > 
> > Thus, the real issue is: Should we support pickling a Gluon Block? If 
> > not, should we support combining multiprocessing.pool with the Gluon Block? 
> > For reference, PyTorch supports pickling the nn.Module as shown in: 
> > https://gist.github.com/sxjscience/90b812a66d445e759c55eedc3ef93668 and 
> > also in the doc 
> > (https://pytorch.org/tutorials/beginner/saving_loading_models.html). 
> > 
> > Best,
> > Xingjian
> > 
> > 
> > On 7/10/20, 11:31 AM, "Patrick Mu"  wrote:
> > 
> > Hi Ciyong, 
> > 
> > I just discovered an issue with 1.7 that causes Yolo training 
> > with the latest Gluon CV Yolo to fail.
> > 
> > The PR that causes the failure is 
> > https://github.com/apache/incubator-mxnet/pull/18358, which modifies basic 
> > blocks of Gluon to fix a memory leak issue.
> > 
> > I talked with Leonard, the author of the PR, and he said he found the 
> > root cause, but patching that PR would modify those Gluon basic blocks 
> > further, which might be risky for existing models and various customer 
> > models.
> > 
> > So my 2-cents i

Re: RE: [VOTE] Release Apache MXNet (incubating) version 1.7.0.rc0

2020-07-13 Thread Patrick Mu
Hi Ciyong,

I have reverted the commit, and I am able to train Yolov3 with no problem.

However, I also noticed a ~20% regression in 1.7 compared with 1.6 for 
Yolov3 inference with the Module API, so we are going to discuss tomorrow 
whether that would be an issue for 1.7.

Thanks,
Ziyi

On 2020/07/13 02:19:28, "Chen, Ciyong"  wrote: 
> Hi Ziyi, Xingjian,
> 
> Thanks for reporting the issues from the GluonCV/AutoGluon perspective.
> I did a quick test by reverting 
> https://github.com/apache/incubator-mxnet/pull/18358, and the behavior is 
> then the same as 1.6.0 for the cases in the gist 
> (https://gist.github.com/sxjscience/944066c82e566f1b89b01fa226678890).
> 
> Considering that many end users rely on Gluon-based APIs/models, and that 
> introducing a new patch to fix this issue could be risky, I agree that 
> reverting this PR (#18358) might be the best option for the 1.7.0 release.
> But I wonder whether there are other test cases covering this feature, 
> which would help track this kind of code change in the future. Could you 
> also help verify that this revert resolves the breakage on your side?
> 
> > Thus, the real issue is: Should we support pickling a Gluon Block? If 
> > not, should we support combining multiprocessing.pool with the Gluon Block?
> This seems more like a new feature for the MXNet Gluon Block; perhaps we can 
> make it available in the next patch/minor release?
> 
> Thanks,
> -Ciyong
> 
> -Original Message-
> From: Xingjian SHI  
> Sent: Saturday, July 11, 2020 4:27 AM
> To: dev@mxnet.incubator.apache.org; d...@mxnet.apache.org
> Subject: Re: [VOTE] Release Apache MXNet (incubating) version 1.7.0.rc0
> 
> Thanks Ziyi,
> 
> I discovered the same issue when trying to use AutoGluon with 1.7.0rc0 
> and would like to share my findings:
> 
> Basically, I don't think Gluon Block is designed to be picklable. But 
> pickling does work for some cases in the old version:
> 
> I've included two cases in the gist 
> (https://gist.github.com/sxjscience/944066c82e566f1b89b01fa226678890).
> 
> - Case1: we construct a gluon block, hybridize it and feed one NDArray to 
> help initialize the block. After that, it will no longer be picklable. 
> - Case2: we just construct a gluon block and it will be picklable in 1.6.0, 
> but won't be picklable in 1.7.0.
> 
> Thus, the real issue is: Should we support pickling a Gluon Block? If not, 
> should we support combining multiprocessing.pool with the Gluon Block? For 
> reference, PyTorch supports pickling the nn.Module as shown in: 
> https://gist.github.com/sxjscience/90b812a66d445e759c55eedc3ef93668 and also 
> in the doc 
> (https://pytorch.org/tutorials/beginner/saving_loading_models.html). 
> 
> Best,
> Xingjian
> 
> 
> On 7/10/20, 11:31 AM, "Patrick Mu"  wrote:
> 
> Hi Ciyong, 
> 
> I just discovered an issue with 1.7 that causes Yolo training 
> with the latest Gluon CV Yolo to fail.
> 
> The PR that causes the failure is 
> https://github.com/apache/incubator-mxnet/pull/18358, which modifies basic 
> blocks of Gluon to fix a memory leak issue.
> 
> I talked with Leonard, the author of the PR, and he said he found the root 
> cause, but patching that PR would modify those Gluon basic blocks further, 
> which might be risky for existing models and various customer models.
> 
> So my two cents is to revert this PR in 1.7 and try patching the PR in 1.x 
> and 2.0, meaning that 1.7 won't have the memory usage optimization from that 
> feature.
> 
> I'd like to hear what you think about this issue.
> 
> Thanks,
> Ziyi
> 
> 
> On 2020/07/10 06:18:02, "Chen, Ciyong"  wrote: 
> > Hi Community,
> > 
> > I would like to call for action to test/validate/vote for the release 
> > candidate (1.7.0.rc0).
> > As there has not been any voting result during the scheduled time window, 
> > I would like to extend the time window to July 13, 23:59:59 PST.
> > Please set aside some time and provide feedback if you have tried the 
> > pre-release code base, thanks!
> > 
> > Best regards,
> > Ciyong
> > 
> > -Original Message-
> > From: Chen, Ciyong  
> > Sent: Monday, July 6, 2020 10:48 PM
> > To: d...@mxnet.apache.org
> > Cc: Bob Paulin; Henri Yandell; Jason Dai; Markus Weimer; Michael Wall
> > Subject: RE: [VOTE] Release Apache MXNet (incubating) version 1.7.0.rc0
> > 
> > For the language bindings and windows platform, may I have your support 
> to help verify these fea

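The pickling question Xingjian raises in this thread matters for multiprocessing because arguments sent to worker processes are serialized with pickle. The sketch below is a plain-Python analogy, not the Gluon Block API and not a diagnosis of the 1.7.0rc0 failure: `ToyBlock`, `export_state` and `from_state` are hypothetical names, and a lock stands in for any runtime handle that pickle cannot serialize. It shows how an object's attributes can stop being picklable after first use, and one workaround of shipping only plain state.

```python
import pickle
import threading


class ToyBlock:
    """Stand-in for a model object; this is NOT the Gluon Block API."""

    def __init__(self, weights):
        self.weights = list(weights)
        self._cache = None  # runtime-only handle, created on first call

    def __call__(self, x):
        if self._cache is None:
            # A lock stands in for any runtime handle that pickle rejects.
            self._cache = threading.Lock()
        return sum(w * x for w in self.weights)

    def export_state(self):
        """Return only plain data that pickle can serialize."""
        return {"weights": list(self.weights)}

    @classmethod
    def from_state(cls, state):
        """Rebuild a fresh block from exported plain data."""
        return cls(state["weights"])


def can_pickle(obj) -> bool:
    """Report whether pickle.dumps succeeds for obj."""
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PicklingError, TypeError, AttributeError):
        return False


block = ToyBlock([1.0, 2.0])
print(can_pickle(vars(block)))  # attributes before the first call
block(3.0)
print(can_pickle(vars(block)))  # attributes now include the runtime handle

# Ship plain state across the process boundary and rebuild on the other side.
payload = pickle.dumps(block.export_state())
rebuilt = ToyBlock.from_state(pickle.loads(payload))
print(rebuilt(3.0) == block(3.0))
```

Whether Gluon should support pickling directly, or leave this kind of state export to the caller, is the open design question in the thread.
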
Re: RE: [VOTE] Release Apache MXNet (incubating) version 1.7.0.rc0

2020-07-10 Thread Patrick Mu
Hi Ciyong, 

I just discovered an issue with 1.7 that causes Yolo training with the 
latest Gluon CV Yolo to fail.

The PR that causes the failure is 
https://github.com/apache/incubator-mxnet/pull/18358, which modifies basic 
blocks of Gluon to fix a memory leak issue.

I talked with Leonard, the author of the PR, and he said he found the root 
cause, but patching that PR would modify those Gluon basic blocks further, 
which might be risky for existing models and various customer models.

So my two cents is to revert this PR in 1.7 and try patching the PR in 1.x and 
2.0, meaning that 1.7 won't have the memory usage optimization from that feature.

I'd like to hear what you think about this issue.

Thanks,
Ziyi


On 2020/07/10 06:18:02, "Chen, Ciyong"  wrote: 
> Hi Community,
> 
> I would like to call for action to test/validate/vote for the release 
> candidate (1.7.0.rc0).
> As there has not been any voting result during the scheduled time window, I 
> would like to extend the time window to July 13, 23:59:59 PST.
> Please set aside some time and provide feedback if you have tried the 
> pre-release code base, thanks!
> 
> Best regards,
> Ciyong
> 
> -Original Message-
> From: Chen, Ciyong  
> Sent: Monday, July 6, 2020 10:48 PM
> To: d...@mxnet.apache.org
> Cc: Bob Paulin; Henri Yandell; Jason Dai; Markus Weimer; Michael Wall
> Subject: RE: [VOTE] Release Apache MXNet (incubating) version 1.7.0.rc0
> 
> For the language bindings and Windows platform, may I have your support to 
> help verify these features? Thanks!
> 
> @lanking520 to help verify Scala/Java
> @gigasquid to help verify Clojure
> @hetong007 to help verify R
> @yajiedesign to help verify the Windows platform
> 
> Best regards,
> Ciyong Chen
> 
> -Original Message-
> From: Chen, Ciyong 
> Sent: Monday, July 6, 2020 10:39 PM
> To: d...@mxnet.apache.org
> Cc: Bob Paulin; Henri Yandell; Jason Dai; Markus Weimer; Michael Wall
> Subject: [VOTE] Release Apache MXNet (incubating) version 1.7.0.rc0
> 
> Dear MXNet community,
> 
> This is the vote to release Apache MXNet (incubating) version 1.7.0. Voting 
> will start July 6, 23:59:59 PST and close on July 9, 23:59:59 PST.
> 
> Link to release notes:
> https://cwiki.apache.org/confluence/display/MXNET/1.7.0+Release+notes
> 
> Link to release candidate:
> https://github.com/apache/incubator-mxnet/releases/tag/1.7.0.rc0
> 
> Link to source and signatures on apache dist server:
> https://dist.apache.org/repos/dist/dev/incubator/mxnet/1.7.0.rc0
> 
> Please remember to TEST first before voting accordingly:
> +1 = approve
> +0 = no opinion
> -1 = disapprove (provide reason)
> 
> Additional notes:
> 
>   *   There was an issue and discussion[1] about a few numpy operators 
> failing with numpy 1.19.0, released on Jun 20, 2020. The issue exists in all 
> branches (they work with numpy <= 1.18.5). As the numpy operators are still 
> an experimental feature in the 1.7.0 release and mainly target the MXNet 2.0 
> release, I decided not to block the vote and instead let the Community 
> decide whether this is a blocker for the release.
> 
> [1] https://github.com/apache/incubator-mxnet/issues/18600
> 
> Best regards,
> Ciyong Chen
> 
> 

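The numpy note in the vote email above gives a concrete bound (works with numpy <= 1.18.5, fails with 1.19.0). A minimal sketch of checking a version string against that bound follows. The helper names are hypothetical, and the parser only looks at the leading numeric part of the version, which is an assumption that ignores pre-release ordering rules.

```python
import re

# From the note above: the numpy operators work with numpy <= 1.18.5.
LAST_KNOWN_GOOD = (1, 18, 5)


def parse_release(version: str):
    """Parse the leading numeric 'X.Y.Z' part of a version string into a tuple."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"no numeric version found in {version!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def numpy_ops_expected_to_work(numpy_version: str) -> bool:
    """True when the version is at or below the last known good release."""
    return parse_release(numpy_version) <= LAST_KNOWN_GOOD


for candidate in ("1.17.4", "1.18.5", "1.19.0", "1.19.0rc1"):
    print(candidate, numpy_ops_expected_to_work(candidate))
```

In a real environment the string passed in would be `numpy.__version__`.
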

Re: RE: Updates for 1.7.0 minor release

2020-05-13 Thread Patrick Mu
Hi Ciyong,

We found a GPU memory usage regression triggered by PR 
https://github.com/apache/incubator-mxnet/pull/17767, which was pushed to the 
2.0, 1.x and 1.7 branches.

I have reverted this commit in 2.0, but we should also revert it in the 1.x and 
1.7 branches. I have opened a revert PR against 1.x: 
https://github.com/apache/incubator-mxnet/pull/18309.

Could you help merge the revert into 1.x and 1.7 before making the rc0 tag?

Thanks,
Ziyi

On 2020/05/12 00:58:22, "Chen, Ciyong"  wrote: 
> Hi Chai,
> 
> Thanks a lot for your kind help in fixing this 😊
> I will continue with the remaining steps of the release process.
> 
> Thanks,
> -Ciyong
> 
> -Original Message-
> From: Chaitanya Bapat  
> Sent: Tuesday, May 12, 2020 8:14 AM
> To: dev@mxnet.incubator.apache.org
> Subject: Re: Updates for 1.7.0 minor release
> 
> Hello Ciyong,
> 
> With https://github.com/apache/incubator-mxnet/pull/18261 merged, the nightly 
> pipeline passes for 1.7.x. So as far as the 2 nightly test pipelines are 
> concerned [NightlyTests and NightlyTestsForBinaries], 1.7.x is good to go!
> 
> Thanks,
> Chai
> 
> On Sun, 10 May 2020 at 04:53, Chen, Ciyong  wrote:
> 
> > Hi MXNet Community,
> >
> > Here are some updates after the code freeze.
> > 1. Nightly tests[1] and nightly binaries tests[2] were enabled, many 
> > thanks to Chaitanya who helped to create and activate these jobs for 
> > the v1.7.x branch.
> > 2. A nightly test failure (incorrect with_seed path) was fixed by 
> > Chaitanya [3]
> > 3. A bug fix for external graph pass by Sam [4]
> > 4. Recently, there was another failing case (test_large_vector.test_nn) in 
> > the nightly test[5], and Chaitanya is helping to address this issue[6]
> >
> > I'll keep monitoring the nightly test before making a rc0 tag.
> > Please let me know if you have any other issues that should be 
> > included/fixed in this release.
> >
> > Thanks,
> > -Ciyong
> >
> > ---
> > [1] http://jenkins.mxnet-ci.amazon-ml.com/view/Nightly%20Tests/job/NightlyTests/job/v1.7.x/
> > [2] http://jenkins.mxnet-ci.amazon-ml.com/view/Nightly%20Tests/job/NightlyTestsForBinaries/job/v1.7.x/
> > [3] https://github.com/apache/incubator-mxnet/pull/18220
> > [4] https://github.com/apache/incubator-mxnet/pull/18237
> > [5] http://jenkins.mxnet-ci.amazon-ml.com/job/NightlyTestsForBinaries/job/v1.7.x/2/execution/node/232/log/
> > [6] https://github.com/apache/incubator-mxnet/pull/18261
> >
> >
> > -Original Message-
> > From: Chen, Ciyong 
> > Sent: Sunday, April 26, 2020 3:29 PM
> > To: dev@mxnet.incubator.apache.org
> > Cc: Marco de Abreu 
> > Subject: Code freeze for 1.7.0 minor release
> >
> > Hi MXNet Community,
> >
> > Code freeze for the 1.7.0 minor release is in effect (last commit: 38e6634)!
> > This means no more NEW features will be accepted for 
> > this release.
> >
> > Many thanks to everyone who helped with submitting/backporting/reviewing 
> > the PRs targeting this release.
> > I've created draft Release Notes for the 1.7.0 release[1]. Please 
> > review them; any comments/suggestions are highly appreciated.
> >
> > Currently, the nightly test pipeline [2][3] for v1.7.x is not 
> > triggered, cc @Marco de Abreu to help take a look.
> > I will keep monitoring the nightly test results for the current code 
> > base, and continue to go through the rest of the release process.
> >
> > [1] https://cwiki.apache.org/confluence/display/MXNET/1.7.0+Release+Notes
> > [2] http://jenkins.mxnet-ci.amazon-ml.com/view/Nightly%20Tests/job/NightlyTests/job/v1.7.x/
> > [3] http://jenkins.mxnet-ci.amazon-ml.com/view/Nightly%20Tests/job/NightlyTestsForBinaries/job/v1.7.x/
> >
> >
> > Thanks,
> > -Ciyong
> >
> >
> 


[NOTIFICATION] Degraded CI

2020-03-19 Thread Patrick Mu
Dear Community,

Our developers are still investigating the "Cannot contact " issue in 
our CI system. We previously upgraded our CI master instance, but 
unfortunately that did not fix the issue. Currently ~50% of unix-gpu jobs are 
failing due to this issue and need to be re-triggered to pass the CI.

We are actively working to find the root cause. Sorry for any inconvenience caused.

Best Regards,
Ziyi
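
To give a rough sense of what a ~50% failure rate means for re-triggering, here is a small sketch. It assumes each run of a job fails independently with the same probability and that the infrastructure error is the only cause of failure; neither assumption is confirmed by the notice, and the function names are hypothetical.

```python
def pass_within(attempts: int, failure_rate: float) -> float:
    """Probability that at least one of `attempts` independent runs passes."""
    return 1.0 - failure_rate ** attempts


def expected_runs(failure_rate: float) -> float:
    """Mean number of runs until the first pass (geometric distribution)."""
    return 1.0 / (1.0 - failure_rate)


FAILURE_RATE = 0.5  # the "~50% of unix-gpu jobs" figure from the notice

for attempts in (1, 2, 3, 4):
    print(attempts, f"{pass_within(attempts, FAILURE_RATE):.3f}")
print("expected runs:", expected_runs(FAILURE_RATE))
```

Under these assumptions a job needs two runs on average, and about one job in eight still fails after three tries.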


[NOTIFICATION] CI BACK ONLINE

2020-03-18 Thread Patrick Mu
Dear Community,

We have restarted the CI master on a more powerful instance with greater 
network and I/O bandwidth.
 
CI is now fully back online, and you can retrigger any pending PRs.

Thanks,
Ziyi


[NOTIFICATION] CI Restart

2020-03-18 Thread Patrick Mu
Dear Community,

Our developers have identified frequent occurrences of the "Cannot contact " 
issue in our CI system. Sheng and Leonard have helped investigate this and have 
found that the CI master's network bandwidth reaching its limit is probably the 
cause. To spare developers the burden of repeated CI retriggering, 
we decided to take the following steps:

1) Stop the CI Jenkins master
2) Resize the CI master instance to a larger instance for more network 
bandwidth capacity
3) Restart the master

The workflow will take less than 1 hour to complete (ideally 5-10 mins).

In the meantime, if you already have PRs currently running in the CI, please 
resubmit your PRs to make sure they will run the pipeline after the restart.

We are sorry for any inconvenience caused.

Best Regards,

Ziyi