Thanks for setting up the document, guys. It looks like a solid basis to start working on!
Marco, Kellen and I have already added some comments.

Pedro

On Sun, Nov 5, 2017 at 3:43 AM, Meghna Baijal <meghnabaijal2...@gmail.com> wrote:

> Kellen, Thank you for your comments in the doc.
> Sure Steffen, I will continue to merge everyone’s comments into the doc and
> work with Pedro to finalize it.
> And then we can vote on the options.
>
> Thanks,
> Meghna Baijal
>
>
> On Sat, Nov 4, 2017 at 6:34 AM, Steffen Rochel <steffenroc...@gmail.com>
> wrote:
>
>> Sandeep and Meghna have been working in the background collecting input and
>> preparing a doc. I suggest we drive the discussion forward and would like to
>> ask everybody to contribute to
>> https://docs.google.com/document/d/17PEasQ2VWrXi2Cf7IGZSWGZMawxDkdlavUDASzUmLjk/edit?usp=sharing
>>
>> Let's converge on requirements and architecture, so we can move forward with
>> implementation.
>>
>> I would like to suggest that Pedro and Meghna lead the discussion and
>> help to resolve suggestions.
>>
>> I assume we need a vote once we have converged on a good draft to call it a
>> plan and move forward with implementation. As we all are unhappy with the
>> current CI situation I would also suggest a phased approach, so we can get
>> back to reliable and efficient basic CI quickly and add advanced
>> capabilities over time.
>>
>> Steffen
>>
>> On Wed, Nov 1, 2017 at 1:14 PM kellen sunderland <
>> kellen.sunderl...@gmail.com> wrote:
>>
>> > Hey Henri, I think that's what a few of us are advocating: running a set
>> > of quick tests as part of the PR process, and then a more detailed
>> > regression test suite periodically (say every 4 hours). This fits nicely
>> > into a tagging or two-branch development system. Commits will be tagged
>> > (or merged into a stable branch) as soon as they pass the detailed
>> > regression testing.
>> >
>> > On Wed, Nov 1, 2017 at 9:07 PM, Hen <bay...@apache.org> wrote:
>> >
>> > > Random question - can the CI be split such that the Apache CI is doing
>> > > a basic set of checks on that hardware, and is hooked to a PR, while
>> > > there is a larger "Is trunk good for release?" test that is running
>> > > periodically rather than on every PR?
>> > >
>> > > i.e., do we need each PR to be run on varied hardware, or can we have
>> > > this two-tier approach?
>> > >
>> > > Hen
>> > >
>> > > On Fri, Oct 20, 2017 at 1:01 PM, sandeep krishnamurthy <
>> > > sandeep.krishn...@gmail.com> wrote:
>> > >
>> > > > Hello all,
>> > > >
>> > > > I am hereby opening up a discussion thread on how we can stabilize
>> > > > the Apache MXNet CI build system.
>> > > >
>> > > > Problems:
>> > > >
>> > > > ========
>> > > >
>> > > > Recently, we have seen the following issues with the Apache MXNet CI
>> > > > build systems:
>> > > >
>> > > >    1. Apache Jenkins master is overloaded and we see issues like -
>> > > >    unable to trigger builds, difficult to load and view the Blue Ocean
>> > > >    and other Jenkins build status pages.
>> > > >    2. We are generating too many requests/interactions for the Apache
>> > > >    Infra team.
>> > > >       1. Addition/deletion of new slaves: caused by scaling activity,
>> > > >       recycling, troubleshooting or any actions leading to a change of
>> > > >       slave machines.
>> > > >       2. Plugins / other Jenkins Master configurations.
>> > > >       3. Experimentation on CI pipelines.
>> > > >    3. Harder to debug and resolve issues - Since access to master and
>> > > >    slave is not with the same community, it requires Infra and
>> > > >    community to dive deep together on all action items.
>> > > >
>> > > > Possible Solutions:
>> > > >
>> > > > ==============
>> > > >
>> > > >    1. Can we set up a separate Jenkins CI build system for Apache
>> > > >    MXNet outside Apache Infra?
>> > > >    2. Can we have a separate Jenkins Master in Apache Infra for MXNet?
>> > > >    3. Review design of current setup, refine and fill the gaps.
>> > > >
>> > > > @ Mentors/Infra team/Community:
>> > > >
>> > > > ==========================
>> > > >
>> > > > Please provide your suggestions on how we can proceed further and
>> > > > work on stabilizing the CI build systems for MXNet.
>> > > >
>> > > > Also, if the community decides on a separate Jenkins CI build system,
>> > > > what important points should be taken care of apart from the below:
>> > > >
>> > > >    1. Community being able to access the build page for build
>> > > >    statuses.
>> > > >    2. Committers being able to log in with Apache credentials.
>> > > >    3. Hook setup from apache/incubator-mxnet repo to Jenkins master.
>> > > >
>> > > >
>> > > > Irrespective of the solution we come up with, I think we should
>> > > > initiate a technical design discussion on how to set up the CI build
>> > > > system. Probably a 1 or 2 page document with the architecture, and a
>> > > > review with Infra and community members.
>> > > >
>> > > > ***There were a few proposals and discussions on the Slack channel;
>> > > > to reach wider community members, I am moving that discussion
>> > > > formally to this list.
>> > > >
>> > > >
>> > > > My Proposal: Option 1 - Set up a separate Jenkins CI build system.
>> > > >
>> > > > Thanks,
>> > > >
>> > > > Sandeep
>> > > >
>> > > >
>> > > >
>> > > > --
>> > > > Sandeep Krishnamurthy
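
As a rough illustration of the two-tier idea raised in the thread (a quick set of checks on each PR, a fuller regression run periodically, and tagging commits once they pass it), here is a minimal Python sketch. The suite names, trigger names, and tag format are all hypothetical and chosen only for illustration; nothing here reflects the actual MXNet Jenkins configuration.

```python
from typing import Dict, List, Optional

# Hypothetical suite names; the thread does not name specific test suites.
QUICK_SUITES: List[str] = ["lint", "unit"]
REGRESSION_SUITES: List[str] = QUICK_SUITES + ["integration", "multi_hardware"]


def suites_for(trigger: str) -> List[str]:
    """Pick the suites to run for a given (hypothetical) trigger type."""
    if trigger == "pull_request":
        # Tier 1: fast checks that gate every PR.
        return list(QUICK_SUITES)
    if trigger == "scheduled":
        # Tier 2: full regression run, e.g. every 4 hours.
        return list(REGRESSION_SUITES)
    raise ValueError(f"unknown trigger: {trigger!r}")


def stable_tag(commit: str, results: Dict[str, bool]) -> Optional[str]:
    """Return a tag name if every regression suite passed, else None.

    A suite missing from `results` counts as not passed.
    """
    if all(results.get(suite, False) for suite in REGRESSION_SUITES):
        return f"stable-{commit[:7]}"
    return None
```

A scheduled job would call `suites_for("scheduled")`, run those suites against the current head of the main branch, and tag (or merge to a stable branch) only when `stable_tag` returns a name.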