Thanks for the update, Marco, and for all the hard work put into the CI!

On Sat, Dec 1, 2018 at 1:21 PM Marco de Abreu <marco.g.ab...@googlemail.com.invalid> wrote:
> Hello everyone,
>
> the move has just been completed and the old big pipeline as well as the
> corresponding job have been disabled. From now on, you will see the detailed
> status messages below your PRs.
>
> Some people wanted to make modifications to the Jenkinsfiles recently. In
> that case, your PR will show a merge conflict. The new Jenkinsfiles are
> available at [1].
>
> Yesterday, I indexed all PRs with our CI system to make sure that each
> one gets properly validated and our merge processes don't get impaired.
> Everything looks good so far, but due to the flakiness of our tests, it's
> quite unlikely that every single test has passed. If your particular PR
> shows a failure for a certain test, please follow the same procedure as
> usual and retrigger it by pushing another commit. From now on, you can also
> trigger partial runs of the CI. For this, just hit up a committer and they
> will be happy to trigger that specific job on your behalf.
>
> If somebody in the community is interested, we would also be happy to
> collaborate on a bot that allows controlling CI runs, like retriggering
> certain jobs or requesting additional non-PR jobs to run - e.g. when you made
> changes to nightly, etc.
>
> Thanks everybody for being patient and so collaborative during this
> transition time. I'm looking forward to everybody's contributions.
>
> Best regards,
> Marco
>
> [1]: https://github.com/apache/incubator-mxnet/tree/master/ci/jenkins
>
> On Sat, Dec 1, 2018 at 4:27 AM Marco de Abreu <marco.g.ab...@googlemail.com>
> wrote:
>
> > Thanks Naveen and Gavin!
> >
> > #1 has been completed and every job has finished its processing.
> >
> > #2 is the ticket with infra:
> > https://issues.apache.org/jira/browse/INFRA-17346
> >
> > I'm now waiting for their response.
> >
> > -Marco
> >
> > On Fri, Nov 30, 2018 at 8:25 PM Naveen Swamy <mnnav...@gmail.com> wrote:
> >
> >> Hi Marco/Gavin,
> >>
> >> Thanks for the clarification.
> >> I was not aware that it had been tested on a separate test
> >> environment (this is what I was suggesting, so that the changes are made
> >> in a more controlled manner). Last time the change was made, many
> >> PRs were left dangling and developers had to go retrigger them; I triggered
> >> them at least 5 times before it succeeded today.
> >>
> >> Appreciate all the hard work to make CI better.
> >>
> >> -Naveen
> >>
> >> On Fri, Nov 30, 2018 at 8:50 AM Gavin M. Bell <gavin.max.b...@gmail.com>
> >> wrote:
> >>
> >> > Hey Folks,
> >> >
> >> > Marco has been running this change in dev, with flying colors, for some
> >> > time. This is not an experiment but a rollout that was announced. We also
> >> > decided to make this change post the release cut so as to limit the blast
> >> > radius for any critical obligations to the community. Marco is accountable
> >> > for this work and will address any issues that may occur, as he has been put
> >> > on call. We have, to the best of our ability, mitigated as much risk as
> >> > possible and now it is time to pull the trigger. The community will enjoy a
> >> > bit more visibility and clarity into the test process, which will be
> >> > advantageous, as well as allowing us to extend our infrastructure in a way
> >> > that affords us more flexibility.
> >> >
> >> > No pending PRs will be impacted.
> >> >
> >> > Thank you for your support as we evolve this system to better serve the
> >> > community.
> >> >
> >> > -Gavin
> >> >
> >> > On Fri, Nov 30, 2018 at 5:23 PM Marco de Abreu
> >> > <marco.g.ab...@googlemail.com.invalid> wrote:
> >> >
> >> > > Hello Naveen, this is not an experiment. Everything has been tested in our
> >> > > test system and is considered working 100%. This is not a test but actually
> >> > > the move into production - the merge into master happened a week ago.
> >> > > We now just have to put all PRs into the catalogue, which means that all PRs
> >> > > have to be analyzed with the new pipelines - the only thing that will be
> >> > > noticeable is that the CI is under higher load.
> >> > >
> >> > > The pending PRs will not be impacted. The existing pipeline is still
> >> > > running in parallel and everything will behave as before.
> >> > >
> >> > > -Marco
> >> > >
> >> > > On Fri, Nov 30, 2018 at 4:41 PM Naveen Swamy <mnnav...@gmail.com> wrote:
> >> > >
> >> > > > Marco, run your experiments on a branch - set up, test it well and then
> >> > > > bring it to the master.
> >> > > >
> >> > > > > On Nov 30, 2018, at 6:53 AM, Marco de Abreu <marco.g.ab...@googlemail.com.INVALID> wrote:
> >> > > > >
> >> > > > > Hello,
> >> > > > >
> >> > > > > I'm now moving forward with #1. I will try to get to #3 as soon as possible
> >> > > > > to reduce parallel jobs in our CI. You might notice some unfinished jobs. I
> >> > > > > will let you know as soon as this process has been completed. Until then,
> >> > > > > please bear with me since we have hundreds of jobs to run in order to
> >> > > > > validate all PRs.
> >> > > > >
> >> > > > > Best regards,
> >> > > > > Marco
> >> > > > >
> >> > > > > On Fri, Nov 30, 2018 at 1:36 AM Marco de Abreu <marco.g.ab...@googlemail.com>
> >> > > > > wrote:
> >> > > > >
> >> > > > >> Hello,
> >> > > > >>
> >> > > > >> since the release branch has now been cut, I would like to move forward
> >> > > > >> with the CI improvements for the master branch. This would include the
> >> > > > >> following actions:
> >> > > > >> 1. Re-enable the new Jenkins job
> >> > > > >> 2. Request Apache Infra to move the protected branch check from the main
> >> > > > >> pipeline to our new ones
> >> > > > >> 3. Merge https://github.com/apache/incubator-mxnet/pull/13474 - this
> >> > > > >> finalizes the deprecation process
> >> > > > >>
> >> > > > >> If nobody objects, I would like to start with #1 soon. Mentors, could you
> >> > > > >> please assist in creating the Apache Infra ticket? I would then take it from
> >> > > > >> there and talk to Infra.
> >> > > > >>
> >> > > > >> Best regards,
> >> > > > >> Marco
> >> > > > >>
> >> > > > >> On Mon, Nov 26, 2018 at 2:47 AM kellen sunderland <kellen.sunderl...@gmail.com> wrote:
> >> > > > >>
> >> > > > >>> Sorry, [1] meant to reference
> >> > > > >>> https://issues.jenkins-ci.org/browse/JENKINS-37984 .
> >> > > > >>>
> >> > > > >>> On Sun, Nov 25, 2018 at 5:41 PM kellen sunderland <kellen.sunderl...@gmail.com> wrote:
> >> > > > >>>
> >> > > > >>>> Marco and I ran into another urgent issue over the weekend that was
> >> > > > >>>> causing builds to fail. This issue was unrelated to any feature
> >> > > > >>>> development work, or other CI fixes applied recently, but it did require
> >> > > > >>>> quite a bit of work from Marco (and a little from me) to fix.
> >> > > > >>>>
> >> > > > >>>> We spent enough time on the problem that it caused us to take a step back
> >> > > > >>>> and consider how we could both fix issues in CI and support the 1.4 release
> >> > > > >>>> with the least impact possible on MXNet devs. Marco had planned to make a
> >> > > > >>>> significant change to the CI to fix a long-standing Jenkins error [1], but
> >> > > > >>>> we feel that most developers would prioritize having a stable build
> >> > > > >>>> environment for the next few weeks over having this fix in place.
> >> > > > >>>>
> >> > > > >>>> To properly introduce a new CI system the intent was to do a gradual
> >> > > > >>>> blue/green rollout of the fix. Managing this rollout would have taken
> >> > > > >>>> operational effort and double compute load as we run systems in parallel.
> >> > > > >>>> This risks outages due to scaling limits, and we’d rather make this change
> >> > > > >>>> during a period of low developer activity, i.e. shortly after the 1.4
> >> > > > >>>> release.
> >> > > > >>>>
> >> > > > >>>> This means that from now until the 1.4 release, in order to reduce
> >> > > > >>>> complexity, MXNet developers should only see a single Jenkins verification
> >> > > > >>>> check, and a single Travis check.
> >> >
> >> > --
> >> > Sincerely,
> >> > Gavin M. Bell
> >> >
> >> > "Never mistake a clear view for a short distance."
> >> > -Paul Saffo