Sandeep and Meghna have been working in the background, collecting input and
preparing a doc. To drive the discussion forward, I would like to ask
everybody to contribute to
https://docs.google.com/document/d/17PEasQ2VWrXi2Cf7IGZSWGZMawxDkdlavUDASzUmLjk/edit?usp=sharing

Let's converge on requirements and architecture, so we can move forward with
implementation.

I would like to suggest that Pedro and Meghna lead the discussion and
help resolve suggestions.

I assume we need a vote once we have converged on a good draft to call it a
plan and move forward with implementation. As we are all unhappy with the
current CI situation, I would also suggest a phased approach, so we can get
back to a reliable and efficient basic CI quickly and add advanced
capabilities over time.

Steffen

On Wed, Nov 1, 2017 at 1:14 PM kellen sunderland <
kellen.sunderl...@gmail.com> wrote:

> Hey Henri, I think that's what a few of us are advocating.  Running a set
> of quick tests as part of the PR process, and then a more detailed
> regression test suite periodically (say every 4 hours). This fits nicely
> into a tagging or 2 branch development system.  Commits will be tagged (or
> merged into a stable branch) as soon as they pass the detailed regression
> testing.
>
> On Wed, Nov 1, 2017 at 9:07 PM, Hen <bay...@apache.org> wrote:
>
> > Random question - can the CI be split such that the Apache CI is doing a
> > basic set of checks on that hardware, and is hooked to a PR, while there
> > is a larger "Is trunk good for release?" test that is running
> > periodically rather than on every PR?
> >
> > i.e.: do we need each PR to be run on varied hardware, or can we have this
> > two-tier approach?
> >
> > Hen
> >
> > On Fri, Oct 20, 2017 at 1:01 PM, sandeep krishnamurthy <
> > sandeep.krishn...@gmail.com> wrote:
> >
> > > Hello all,
> > >
> > > I am hereby opening up a discussion thread on how we can stabilize the
> > > Apache MXNet CI build system.
> > >
> > > Problems:
> > >
> > > ========
> > >
> > > Recently, we have seen the following issues with the Apache MXNet CI
> > > build systems:
> > >
> > >    1. Apache Jenkins master is overloaded and we see issues such as being
> > >    unable to trigger builds and difficulty loading and viewing the Blue
> > >    Ocean and other Jenkins build status pages.
> > >    2. We are generating too many requests/interactions for the Apache
> > >    Infra team:
> > >       1. Addition/deletion of slaves: caused by scaling activity,
> > >       recycling, troubleshooting or any action leading to a change of
> > >       slave machines.
> > >       2. Plugins / other Jenkins Master configurations.
> > >       3. Experimentation on CI pipelines.
> > >    3. Issues are harder to debug and resolve: since access to master and
> > >    slaves is not with the same community, Infra and the community have
> > >    to dive deep together on all action items.
> > >
> > > Possible Solutions:
> > >
> > > ==============
> > >
> > >    1. Can we set up a separate Jenkins CI build system for Apache MXNet
> > >    outside Apache Infra?
> > >    2. Can we have a separate Jenkins Master in Apache Infra for MXNet?
> > >    3. Review design of current setup, refine and fill the gaps.
> > >
> > > @ Mentors/Infra team/Community:
> > >
> > > ==========================
> > >
> > > Please provide your suggestions on how we can proceed further and work
> > > on stabilizing the CI build systems for MXNet.
> > >
> > > Also, if the community decides on a separate Jenkins CI build system,
> > > what important points should be taken care of apart from the below:
> > >
> > >    1. Community being able to access the build page for build statuses.
> > >    2. Committers being able to log in with Apache credentials.
> > >    3. Hook setup from apache/incubator-mxnet repo to Jenkins master.
> > >
> > >
> > > Irrespective of the solution we come up with, I think we should initiate
> > > a technical design discussion on how to set up the CI build system,
> > > probably a 1- or 2-page document with the architecture, reviewed with
> > > Infra and community members.
> > >
> > > ***There were a few proposals and discussions on the Slack channel; to
> > > reach the wider community, I am moving that discussion formally to this
> > > list.
> > >
> > >
> > > My Proposal: Option 1 - Set up separate Jenkins CI build system.
> > >
> > > Thanks,
> > >
> > > Sandeep
> > >
> > >
> > >
> > > --
> > > Sandeep Krishnamurthy
> > >
> >
>
