Thanks for the update, Marco, and for all the hard work put into the CI!

On Sat, Dec 1, 2018 at 1:21 PM Marco de Abreu
<marco.g.ab...@googlemail.com.invalid> wrote:

> Hello everyone,
>
> the move has just been completed and the old big pipeline as well as the
> corresponding job have been disabled. From now on, you will see the detailed
> status messages below your PRs.
>
> Some people wanted to make modifications to the Jenkinsfiles recently. In
> that case, your PR will show a merge conflict. The new Jenkinsfiles are
> available at [1].
>
> Yesterday, I indexed all PRs with our CI system to make sure that each
> one gets properly validated and our merge processes don't get impaired.
> Everything looks good so far, but due to the flakiness of our tests, it's
> quite unlikely that every single test has passed. If your particular PR
> shows a failure for a certain test, please follow the same procedure as
> usual and retrigger it by pushing another commit. From now on, you can also
> trigger partial runs of the CI. For this, just hit up a committer and they
> will be happy to trigger that specific job on your behalf.
>
> If somebody in the community is interested, we would also be happy to
> collaborate on a bot that allows contributors to control CI runs, e.g.
> retriggering certain jobs or requesting additional non-PR jobs to run when
> you have made changes to nightly.
>
> Thanks everybody for being patient and so collaborative during this
> transition time. I'm looking forward to everybody's contributions.
>
> Best regards,
> Marco
>
> [1]: https://github.com/apache/incubator-mxnet/tree/master/ci/jenkins
>
> On Sat, Dec 1, 2018 at 4:27 AM Marco de Abreu <
> marco.g.ab...@googlemail.com>
> wrote:
>
> > Thanks Naveen and Gavin!
> >
> > #1 has been completed and every job has finished its processing.
> >
> > #2 is the ticket with infra:
> > https://issues.apache.org/jira/browse/INFRA-17346
> >
> > I'm now waiting for their response.
> >
> > -Marco
> >
> > On Fri, Nov 30, 2018 at 8:25 PM Naveen Swamy <mnnav...@gmail.com> wrote:
> >
> >> Hi Marco/Gavin,
> >>
> >> Thanks for the clarification. I was not aware that it had been tested on a
> >> separate test environment (this is what I was suggesting, so that the
> >> changes are made in a more controlled manner). The last time the change was
> >> made, many PRs were left dangling and developers had to retrigger them; I
> >> triggered them at least 5 times before it succeeded today.
> >>
> >> Appreciate all the hard work to make CI better.
> >>
> >> -Naveen
> >>
> >> On Fri, Nov 30, 2018 at 8:50 AM Gavin M. Bell <gavin.max.b...@gmail.com>
> >> wrote:
> >>
> >> > Hey Folks,
> >> >
> >> > Marco has been running this change in dev, with flying colors, for some
> >> > time. This is not an experiment but a rollout that was announced. We also
> >> > decided to make this change after the release cut to limit the blast radius
> >> > with respect to any critical obligations to the community. Marco is
> >> > accountable for this work and will address any issues that may occur, as he
> >> > has been put on-call. We have, to the best of our ability, mitigated as much
> >> > risk as possible, and now it is time to pull the trigger. The community will
> >> > enjoy a bit more visibility and clarity into the test process, which will be
> >> > advantageous, as well as allowing us to extend our infrastructure in a way
> >> > that affords us more flexibility.
> >> >
> >> > No pending PRs will be impacted.
> >> >
> >> > Thank you for your support as we evolve this system to better serve the
> >> > community.
> >> >
> >> > -Gavin
> >> >
> >> > On Fri, Nov 30, 2018 at 5:23 PM Marco de Abreu
> >> > <marco.g.ab...@googlemail.com.invalid> wrote:
> >> >
> >> > > Hello Naveen, this is not an experiment. Everything has been tested in our
> >> > > test system and is considered working 100%. This is not a test but actually
> >> > > the move into production - the merge into master happened a week ago. We
> >> > > now just have to put all PRs into the catalogue, which means that all PRs
> >> > > have to be analyzed with the new pipelines - the only thing that will be
> >> > > noticeable is that the CI is under higher load.
> >> > >
> >> > > The pending PRs will not be impacted. The existing pipeline is still
> >> > > running in parallel and everything will behave as before.
> >> > >
> >> > > -Marco
> >> > >
> >> > > On Fri, Nov 30, 2018 at 4:41 PM Naveen Swamy <mnnav...@gmail.com>
> >> wrote:
> >> > >
> >> > > > Marco, run your experiments on a branch - set up, test it well and then
> >> > > > bring it to the master.
> >> > > >
> >> > > > > On Nov 30, 2018, at 6:53 AM, Marco de Abreu <
> >> > > > marco.g.ab...@googlemail.com.INVALID> wrote:
> >> > > > >
> >> > > > > Hello,
> >> > > > >
> >> > > > > I'm now moving forward with #1. I will try to get to #3 as soon as possible
> >> > > > > to reduce parallel jobs in our CI. You might notice some unfinished jobs. I
> >> > > > > will let you know as soon as this process has been completed. Until then,
> >> > > > > please bear with me since we have hundreds of jobs to run in order to
> >> > > > > validate all PRs.
> >> > > > >
> >> > > > > Best regards,
> >> > > > > Marco
> >> > > > >
> >> > > > > On Fri, Nov 30, 2018 at 1:36 AM Marco de Abreu <
> >> > > > marco.g.ab...@googlemail.com>
> >> > > > > wrote:
> >> > > > >
> >> > > > >> Hello,
> >> > > > >>
> >> > > > >> since the release branch has now been cut, I would like to move forward
> >> > > > >> with the CI improvements for the master branch. This would include the
> >> > > > >> following actions:
> >> > > > >> 1. Re-enable the new Jenkins job
> >> > > > >> 2. Request Apache Infra to move the protected branch check from the main
> >> > > > >> pipeline to our new ones
> >> > > > >> 3. Merge https://github.com/apache/incubator-mxnet/pull/13474 - this
> >> > > > >> finalizes the deprecation process
> >> > > > >>
> >> > > > >> If nobody objects, I would like to start with #1 soon. Mentors, could you
> >> > > > >> please assist in creating the Apache Infra ticket? I would then take it from
> >> > > > >> there and talk to Infra.
> >> > > > >>
> >> > > > >> Best regards,
> >> > > > >> Marco
> >> > > > >>
> >> > > > >> On Mon, Nov 26, 2018 at 2:47 AM kellen sunderland <
> >> > > > >> kellen.sunderl...@gmail.com> wrote:
> >> > > > >>
> >> > > > >>> Sorry, [1] meant to reference
> >> > > > >>> https://issues.jenkins-ci.org/browse/JENKINS-37984 .
> >> > > > >>>
> >> > > > >>> On Sun, Nov 25, 2018 at 5:41 PM kellen sunderland <
> >> > > > >>> kellen.sunderl...@gmail.com> wrote:
> >> > > > >>>
> >> > > > >>>> Marco and I ran into another urgent issue over the weekend that was
> >> > > > >>>> causing builds to fail. This issue was unrelated to any feature
> >> > > > >>>> development work, or other CI fixes applied recently, but it did require
> >> > > > >>>> quite a bit of work from Marco (and a little from me) to fix.
> >> > > > >>>>
> >> > > > >>>> We spent enough time on the problem that it caused us to take a step back
> >> > > > >>>> and consider how we could both fix issues in CI and support the 1.4 release
> >> > > > >>>> with the least impact possible on MXNet devs. Marco had planned to make a
> >> > > > >>>> significant change to the CI to fix a long-standing Jenkins error [1], but
> >> > > > >>>> we feel that most developers would prioritize having a stable build
> >> > > > >>>> environment for the next few weeks over having this fix in place.
> >> > > > >>>>
> >> > > > >>>> To properly introduce a new CI system, the intent was to do a gradual
> >> > > > >>>> blue/green rollout of the fix. Managing this rollout would have taken
> >> > > > >>>> operational effort and double the compute load, as we would run both systems
> >> > > > >>>> in parallel. This risks outages due to scaling limits, and we’d rather make
> >> > > > >>>> this change during a period of low developer activity, i.e. shortly after
> >> > > > >>>> the 1.4 release.
> >> > > > >>>>
> >> > > > >>>> This means that from now until the 1.4 release, in order to reduce
> >> > > > >>>> complexity, MXNet developers should only see a single Jenkins verification
> >> > > > >>>> check and a single Travis check.
> >> > > > >>>>
> >> > > > >>>>
> >> > > > >>>
> >> > > > >>
> >> > > >
> >> > >
> >> >
> >> >
> >> > --
> >> > Sincerely,
> >> > Gavin M. Bell
> >> >
> >> >  "Never mistake a clear view for a short distance."
> >> >               -Paul Saffo
> >> >
> >>
> >
>
