Thanks for setting up the document, guys. It looks like a solid basis to
start working on!

Marco, Kellen and I have already added some comments.

Pedro


On Sun, Nov 5, 2017 at 3:43 AM, Meghna Baijal
<meghnabaijal2...@gmail.com> wrote:
> Kellen, Thank you for your comments in the doc.
> Sure Steffen, I will continue to merge everyone’s comments into the doc and
> work with Pedro to finalize it.
> And then we can vote on the options.
>
> Thanks,
> Meghna Baijal
>
>
> On Sat, Nov 4, 2017 at 6:34 AM, Steffen Rochel <steffenroc...@gmail.com>
> wrote:
>
>> Sandeep and Meghna have been working in the background, collecting input and
>> preparing a doc. I suggest we drive the discussion forward and would like to
>> ask everybody to contribute to
>> https://docs.google.com/document/d/17PEasQ2VWrXi2Cf7IGZSWGZMawxDkdlavUDASzUmLjk/edit?usp=sharing
>>
>> Let's converge on requirements and architecture, so we can move forward with
>> implementation.
>>
>> I would like to suggest that Pedro and Meghna lead the discussion and
>> help to resolve suggestions.
>>
>> I assume we need a vote once we have converged on a good draft, to call it a
>> plan and move forward with implementation. As we are all unhappy with the
>> current CI situation, I would also suggest a phased approach, so we can get
>> back to a reliable and efficient basic CI quickly and add advanced
>> capabilities over time.
>>
>> Steffen
>>
>> On Wed, Nov 1, 2017 at 1:14 PM kellen sunderland <
>> kellen.sunderl...@gmail.com> wrote:
>>
>> > Hey Henri, I think that's what a few of us are advocating: running a set
>> > of quick tests as part of the PR process, and then a more detailed
>> > regression test suite periodically (say every 4 hours). This fits nicely
>> > into a tagging or two-branch development system. Commits will be tagged
>> > (or merged into a stable branch) as soon as they pass the detailed
>> > regression testing.
>> >
>> > On Wed, Nov 1, 2017 at 9:07 PM, Hen <bay...@apache.org> wrote:
>> >
>> > > Random question - can the CI be split such that the Apache CI is doing a
>> > > basic set of checks on that hardware, and is hooked to a PR, while there
>> > > is a larger "Is trunk good for release?" test that is running
>> > > periodically rather than on every PR?
>> > >
>> > > i.e., do we need each PR to be run on varied hardware, or can we have
>> > > this two-tier approach?
>> > >
>> > > Hen
>> > >
>> > > On Fri, Oct 20, 2017 at 1:01 PM, sandeep krishnamurthy <
>> > > sandeep.krishn...@gmail.com> wrote:
>> > >
>> > > > Hello all,
>> > > >
>> > > > I am hereby opening up a discussion thread on how we can stabilize the
>> > > > Apache MXNet CI build system.
>> > > >
>> > > > Problems:
>> > > >
>> > > > ========
>> > > >
>> > > > Recently, we have seen the following issues with the Apache MXNet CI
>> > > > build system:
>> > > >
>> > > >    1. The Apache Jenkins master is overloaded, and we see issues such as
>> > > >    being unable to trigger builds and difficulty loading and viewing the
>> > > >    Blue Ocean and other Jenkins build status pages.
>> > > >    2. We are generating too many requests to, and interactions with, the
>> > > >    Apache Infra team:
>> > > >       1. Addition/deletion of slaves: caused by scaling activity,
>> > > >       recycling, troubleshooting, or any action leading to a change of
>> > > >       slave machines.
>> > > >       2. Plugins / other Jenkins master configurations.
>> > > >       3. Experimentation on CI pipelines.
>> > > >    3. Issues are harder to debug and resolve: since access to the master
>> > > >    and the slaves is not with the same community, Infra and the community
>> > > >    have to dive deep together on all action items.
>> > > >
>> > > > Possible Solutions:
>> > > >
>> > > > ==============
>> > > >
>> > > >    1. Can we set up a separate Jenkins CI build system for Apache MXNet
>> > > >    outside Apache Infra?
>> > > >    2. Can we have a separate Jenkins Master in Apache Infra for MXNet?
>> > > >    3. Review design of current setup, refine and fill the gaps.
>> > > >
>> > > > @ Mentors/Infra team/Community:
>> > > >
>> > > > ==========================
>> > > >
>> > > > Please provide your suggestions on how we can proceed further and work on
>> > > > stabilizing the CI build systems for MXNet.
>> > > >
>> > > > Also, if the community decides on a separate Jenkins CI build system, what
>> > > > important points should be taken care of apart from the ones below?
>> > > >
>> > > >    1. Community being able to access the build page for build statuses.
>> > > >    2. Committers being able to log in with Apache credentials.
>> > > >    3. Hook setup from the apache/incubator-mxnet repo to the Jenkins master.
>> > > >
>> > > >
>> > > > Irrespective of the solution we come up with, I think we should initiate a
>> > > > technical design discussion on how to set up the CI build system, probably
>> > > > a 1- or 2-page document with the architecture, reviewed with Infra and
>> > > > community members.
>> > > >
>> > > > ***There were a few proposals and discussions on the Slack channel. To
>> > > > reach wider community members, I am moving that discussion formally to
>> > > > this list.
>> > > >
>> > > >
>> > > > My Proposal: Option 1 - Set up a separate Jenkins CI build system.
>> > > >
>> > > > Thanks,
>> > > >
>> > > > Sandeep
>> > > >
>> > > >
>> > > >
>> > > > --
>> > > > Sandeep Krishnamurthy
>> > > >
>> > >
>> >
>>
