Hello everyone,

The move has just been completed, and the old big pipeline as well as the corresponding job have been disabled. From now on, you will see the detailed status messages below your PRs.

Some people recently wanted to make modifications to the Jenkinsfiles. In that case, your PR will show a merge conflict. The new Jenkinsfiles are available at [1].

Yesterday, I indexed all PRs with our CI system to make sure that each one gets properly validated and our merge processes don't get impaired. Everything looks good so far, but due to the flakiness of our tests, it's quite unlikely that every single test has passed. If your particular PR shows a failure for a certain test, please follow the same procedure as usual and retrigger it by pushing another commit.

From now on, you can also trigger partial runs of the CI. For this, just reach out to a committer and they will be happy to trigger that specific job on your behalf.

If somebody in the community is interested, we would also be happy to collaborate on a bot that allows controlling CI runs, such as retriggering certain jobs or requesting additional non-PR jobs to run (e.g. when you made changes to nightly).

Thanks, everybody, for being patient and so collaborative during this transition time. I'm looking forward to everybody's contributions.

Best regards,
Marco

[1]: https://github.com/apache/incubator-mxnet/tree/master/ci/jenkins

On Sat, Dec 1, 2018 at 4:27 AM Marco de Abreu <marco.g.ab...@googlemail.com> wrote:

> Thanks Naveen and Gavin!
>
> #1 has been completed and every job has finished its processing.
>
> #2 is the ticket with infra:
> https://issues.apache.org/jira/browse/INFRA-17346
>
> I'm now waiting for their response.
>
> -Marco
>
> On Fri, Nov 30, 2018 at 8:25 PM Naveen Swamy <mnnav...@gmail.com> wrote:
>
>> Hi Marco/Gavin,
>>
>> Thanks for the clarification. I was not aware that it has been tested on a
>> separate test environment(this is what I was suggesting and make the
>> changes in a more controlled manner), last time the change was made, many
>> PRs were left dangling and developers had to go trigger and I triggered
>> them at least 5 times before it succeeded today.
>>
>> Appreciate all the hard work to make CI better.
>>
>> -Naveen
>>
>> On Fri, Nov 30, 2018 at 8:50 AM Gavin M. Bell <gavin.max.b...@gmail.com> wrote:
>>
>> > Hey Folks,
>> >
>> > Marco has been running this change in dev, with flying colors, for some
>> > time. This is not an experiment but a roll out that was announced. We also
>> > decided to make this change post the release cut so limit the blast radius
>> > from any critical obligations to the community. Marco is accountable for
>> > this work and will address any issues that may occur as he has been put
>> > on-call. We have, to our best ability, mitigated as much risk as possible
>> > and now it is time to pull the trigger. The community will enjoy a bit
>> > more visibility and clarity into the test process which will be
>> > advantageous, as well as allowing us to extend our infrastructure in a way
>> > that affords us more flexibility.
>> >
>> > No pending PRs will be impacted.
>> >
>> > Thank you for your support as we evolve this system to better serve the
>> > community.
>> >
>> > -Gavin
>> >
>> > On Fri, Nov 30, 2018 at 5:23 PM Marco de Abreu
>> > <marco.g.ab...@googlemail.com.invalid> wrote:
>> >
>> > > Hello Naveen, this is not an experiment. Everything has been tested in our
>> > > test system and is considered working 100%. This is not a test but actually
>> > > the move into production - the merge into master happened a week ago. We
>> > > now just have to put all PRs into the catalogue, which means that all PRs
>> > > have to be analyzed with the new pipelines - the only thing that will be
>> > > noticeable is that the CI is under higher load.
>> > >
>> > > The pending PRs will not be impacted. The existing pipeline is still
>> > > running in parallel and everything will behave as before.
>> > >
>> > > -Marco
>> > >
>> > > On Fri, Nov 30, 2018 at 4:41 PM Naveen Swamy <mnnav...@gmail.com> wrote:
>> > >
>> > > > Marco, run your experiments on a branch - set up, test it well and then
>> > > > bring it to the master.
>> > > >
>> > > > > On Nov 30, 2018, at 6:53 AM, Marco de Abreu <marco.g.ab...@googlemail.com.INVALID> wrote:
>> > > > >
>> > > > > Hello,
>> > > > >
>> > > > > I'm now moving forward with #1. I will try to get to #3 as soon as possible
>> > > > > to reduce parallel jobs in our CI. You might notice some unfinished jobs. I
>> > > > > will let you know as soon as this process has been completed. Until then,
>> > > > > please bare with me since we have hundreds of jobs to run in order to
>> > > > > validate all PRs.
>> > > > >
>> > > > > Best regards,
>> > > > > Marco
>> > > > >
>> > > > > On Fri, Nov 30, 2018 at 1:36 AM Marco de Abreu <marco.g.ab...@googlemail.com> wrote:
>> > > > >
>> > > > >> Hello,
>> > > > >>
>> > > > >> since the release branch has now been cut, I would like to move forward
>> > > > >> with the CI improvements for the master branch. This would include the
>> > > > >> following actions:
>> > > > >> 1. Re-enable the new Jenkins job
>> > > > >> 2. Request Apache Infra to move the protected branch check from the main
>> > > > >> pipeline to our new ones
>> > > > >> 3. Merge https://github.com/apache/incubator-mxnet/pull/13474 - this
>> > > > >> finalizes the deprecation process
>> > > > >>
>> > > > >> If nobody objects, I would like to start with #1 soon. Mentors, could you
>> > > > >> please assist to create the Apache Infra ticket? I would then take it from
>> > > > >> there and talk to Infra.
>> > > > >>
>> > > > >> Best regards,
>> > > > >> Marco
>> > > > >>
>> > > > >> On Mon, Nov 26, 2018 at 2:47 AM kellen sunderland <kellen.sunderl...@gmail.com> wrote:
>> > > > >>
>> > > > >>> Sorry, [1] meant to reference
>> > > > >>> https://issues.jenkins-ci.org/browse/JENKINS-37984 .
>> > > > >>>
>> > > > >>> On Sun, Nov 25, 2018 at 5:41 PM kellen sunderland <kellen.sunderl...@gmail.com> wrote:
>> > > > >>>
>> > > > >>>> Marco and I ran into another urgent issue over the weekend that was
>> > > > >>>> causing builds to fail. This issue was unrelated to any feature
>> > > > >>>> development work, or other CI fixes applied recently, but it did require
>> > > > >>>> quite a bit of work from Marco (and a little from me) to fix.
>> > > > >>>>
>> > > > >>>> We spent enough time on the problem that it caused us to take a step back
>> > > > >>>> and consider how we could both fix issues in CI and support the 1.4 release
>> > > > >>>> with the least impact possible on MXNet devs. Marco had planned to make a
>> > > > >>>> significant change to the CI to fix a long-standing Jenkins error [1], but
>> > > > >>>> we feel that most developers would prioritize having a stable build
>> > > > >>>> environment for the next few weeks over having this fix in place.
>> > > > >>>>
>> > > > >>>> To properly introduce a new CI system the intent was to do a gradual
>> > > > >>>> blue/green roll out of the fix. To manage this rollout would have taken
>> > > > >>>> operational effort and double compute load as we run systems in parallel.
>> > > > >>>> This risks outages due to scaling limits, and we’d rather make this change
>> > > > >>>> during a period of low-developer activity, i.e. shortly after the 1.4
>> > > > >>>> release.
>> > > > >>>>
>> > > > >>>> This means that from now until the 1.4 release, in order to reduce
>> > > > >>>> complexity MXNet developers should only see a single Jenkins verification
>> > > > >>>> check, and a single Travis check.
>> > > > >>>>
>> > > > >>>>
>> > > > >>>
>> > > > >>
>> > > >
>> > >
>> >
>> >
>> > --
>> > Sincerely,
>> > Gavin M. Bell
>> >
>> > "Never mistake a clear view for a short distance."
>> > -Paul Saffo
>> >
>> >