Sandeep and Meghna have been working in the background, collecting input and preparing a doc. To drive the discussion forward, I would like to ask everybody to contribute to https://docs.google.com/document/d/17PEasQ2VWrXi2Cf7IGZSWGZMawxDkdlavUDASzUmLjk/edit?usp=sharing
Let's converge on requirements and architecture so we can move forward with implementation. I would like to suggest that Pedro and Meghna lead the discussion and help resolve suggestions. I assume we need a vote once we have converged on a good draft, so that we can call it a plan and move forward with implementation.

Since we are all unhappy with the current CI situation, I would also suggest a phased approach, so we can get back to a reliable and efficient basic CI quickly and add advanced capabilities over time.

Steffen

On Wed, Nov 1, 2017 at 1:14 PM kellen sunderland <kellen.sunderl...@gmail.com> wrote:

> Hey Henri, I think that's what a few of us are advocating. Running a set
> of quick tests as part of the PR process, and then a more detailed
> regression test suite periodically (say every 4 hours). This fits nicely
> into a tagging or 2 branch development system. Commits will be tagged (or
> merged into a stable branch) as soon as they pass the detailed regression
> testing.
>
> On Wed, Nov 1, 2017 at 9:07 PM, Hen <bay...@apache.org> wrote:
>
> > Random question - can the CI be split such that the Apache CI is doing a
> > basic set of checks on that hardware, and is hooked to a PR, while there
> > is a larger "Is trunk good for release?" test that is running
> > periodically rather than on every PR?
> >
> > ie: do we need each PR to be run on varied hardware, or can we have this
> > two tier approach?
> >
> > Hen
> >
> > On Fri, Oct 20, 2017 at 1:01 PM, sandeep krishnamurthy <
> > sandeep.krishn...@gmail.com> wrote:
> >
> > > Hello all,
> > >
> > > I am hereby opening up a discussion thread on how we can stabilize
> > > Apache MXNet CI build system.
> > >
> > > Problems:
> > >
> > > ========
> > >
> > > Recently, we have seen following issues with Apache MXNet CI build
> > > systems:
> > >
> > > 1. Apache Jenkins master is overloaded and we see issues like - unable
> > >    to trigger builds, difficult to load and view the blue ocean and
> > >    other Jenkins build status page.
> > > 2. We are generating too many request/interaction on Apache Infra team.
> > >    1. Addition/deletion of new slave: Caused from scaling activity,
> > >       recycling, troubleshooting or any actions leading to change of
> > >       slave machines.
> > >    2. Plugins / other Jenkins Master configurations.
> > >    3. Experimentation on CI pipelines.
> > > 3. Harder to debug and resolve issues - Since access to master and
> > >    slave is not with the same community, it requires Infra and
> > >    community to dive deep together on all action items.
> > >
> > > Possible Solutions:
> > >
> > > ==============
> > >
> > > 1. Can we set up a separate Jenkins CI build system for Apache MXNet
> > >    outside Apache Infra?
> > > 2. Can we have a separate Jenkins Master in Apache Infra for MXNet?
> > > 3. Review design of current setup, refine and fill the gaps.
> > >
> > > @ Mentors/Infra team/Community:
> > >
> > > ==========================
> > >
> > > Please provide your suggestions on how we can proceed further and work
> > > on stabilizing the CI build systems for MXNet.
> > >
> > > Also, if the community decides on separate Jenkins CI build system,
> > > what important points should be taken care of apart from the below:
> > >
> > > 1. Community being able to access the build page for build statuses.
> > > 2. Committers being able to login with apache credentials.
> > > 3. Hook setup from apache/incubator-mxnet repo to Jenkins master.
> > >
> > > Irrespective of the solution we come up, I think we should initiate a
> > > technical design discussion on how to setup the CI build system.
> > > Probably 1 or 2 pager documents with the architecture and review with
> > > Infra and community members.
> > >
> > > ***There were few proposal and discussion on the slack channel, to
> > > reach wider community members, moving that discussion formally to this
> > > list.
> > >
> > > My Proposal: Option 1 - Set up separate Jenkins CI build system.
> > >
> > > Thanks,
> > >
> > > Sandeep
> > >
> > > --
> > > Sandeep Krishnamurthy