After a decision is reached, I am willing to add tasks to the Apache MXNet JIRA.

On Mon, Nov 6, 2017 at 6:15 AM, Pedro Larroy <pedro.larroy.li...@gmail.com> wrote:
> Thanks for setting up the document guys, looks like a solid basis to
> start to work on!
>
> Marco, Kellen and I have already added some comments.
>
> Pedro
>
>
> On Sun, Nov 5, 2017 at 3:43 AM, Meghna Baijal
> <meghnabaijal2...@gmail.com> wrote:
> > Kellen, Thank you for your comments in the doc.
> > Sure Steffen, I will continue to merge everyone’s comments into the doc and
> > work with Pedro to finalize it.
> > And then we can vote on the options.
> >
> > Thanks,
> > Meghna Baijal
> >
> >
> > On Sat, Nov 4, 2017 at 6:34 AM, Steffen Rochel <steffenroc...@gmail.com>
> > wrote:
> >
> >> Sandeep and Meghna have been working in background collecting input and
> >> preparing a doc. I suggest to drive discussion forward and would like to
> >> ask everybody to contribute to
> >> https://docs.google.com/document/d/17PEasQ2VWrXi2Cf7IGZSWGZMawxDkdlavUDASzUmLjk/edit?usp=sharing
> >>
> >> Lets converge on requirements and architecture, so we can move forward with
> >> implementation.
> >>
> >> I would like to suggest for Pedro and Meghna to lead the discussion and
> >> help to resolve suggestions.
> >>
> >> I assume we need a vote once we are converged on a good draft to call it a
> >> plan and move forward with implementation. As we all are unhappy with the
> >> current CI situation I would also suggest a phased approach, so we can get
> >> back to reliable and efficient basic CI quickly and add advanced
> >> capabilities over time.
> >>
> >> Steffen
> >>
> >> On Wed, Nov 1, 2017 at 1:14 PM kellen sunderland <
> >> kellen.sunderl...@gmail.com> wrote:
> >>
> >> > Hey Henri, I think that's what a few of us are advocating. Running a set
> >> > of quick tests as part of the PR process, and then a more detailed
> >> > regression test suite periodically (say every 4 hours). This fits nicely
> >> > into a tagging or 2 branch development system.
> >> > Commits will be tagged (or
> >> > merged into a stable branch) as soon as they pass the detailed regression
> >> > testing.
> >> >
> >> > On Wed, Nov 1, 2017 at 9:07 PM, Hen <bay...@apache.org> wrote:
> >> >
> >> > > Random question - can the CI be split such that the Apache CI is doing a
> >> > > basic set of checks on that hardware, and is hooked to a PR, while there is
> >> > > a larger "Is trunk good for release?" test that is running periodically
> >> > > rather than on every PR?
> >> > >
> >> > > ie: do we need each PR to be run on varied hardware, or can we have this
> >> > > two tier approach?
> >> > >
> >> > > Hen
> >> > >
> >> > > On Fri, Oct 20, 2017 at 1:01 PM, sandeep krishnamurthy <
> >> > > sandeep.krishn...@gmail.com> wrote:
> >> > >
> >> > > > Hello all,
> >> > > >
> >> > > > I am hereby opening up a discussion thread on how we can stabilize Apache
> >> > > > MXNet CI build system.
> >> > > >
> >> > > > Problems:
> >> > > >
> >> > > > ========
> >> > > >
> >> > > > Recently, we have seen following issues with Apache MXNet CI build
> >> > > > systems:
> >> > > >
> >> > > >    1. Apache Jenkins master is overloaded and we see issues like - unable
> >> > > >    to trigger builds, difficult to load and view the blue ocean and other
> >> > > >    Jenkins build status page.
> >> > > >    2. We are generating too many request/interaction on Apache Infra team.
> >> > > >       1. Addition/deletion of new slave: Caused from scaling activity,
> >> > > >       recycling, troubleshooting or any actions leading to change of slave
> >> > > >       machines.
> >> > > >       2. Plugins / other Jenkins Master configurations.
> >> > > >       3. Experimentation on CI pipelines.
> >> > > >    3. Harder to debug and resolve issues - Since access to master and slave
> >> > > >    is not with the same community, it requires Infra and community to dive
> >> > > >    deep together on all action items.
> >> > > >
> >> > > > Possible Solutions:
> >> > > >
> >> > > > ==============
> >> > > >
> >> > > >    1. Can we set up a separate Jenkins CI build system for Apache MXNet
> >> > > >    outside Apache Infra?
> >> > > >    2. Can we have a separate Jenkins Master in Apache Infra for MXNet?
> >> > > >    3. Review design of current setup, refine and fill the gaps.
> >> > > >
> >> > > > @ Mentors/Infra team/Community:
> >> > > >
> >> > > > ==========================
> >> > > >
> >> > > > Please provide your suggestions on how we can proceed further and work on
> >> > > > stabilizing the CI build systems for MXNet.
> >> > > >
> >> > > > Also, if the community decides on separate Jenkins CI build system, what
> >> > > > important points should be taken care of apart from the below:
> >> > > >
> >> > > >    1. Community being able to access the build page for build statuses.
> >> > > >    2. Committers being able to login with apache credentials.
> >> > > >    3. Hook setup from apache/incubator-mxnet repo to Jenkins master.
> >> > > >
> >> > > >
> >> > > > Irrespective of the solution we come up, I think we should initiate a
> >> > > > technical design discussion on how to setup the CI build system. Probably 1
> >> > > > or 2 pager documents with the architecture and review with Infra and
> >> > > > community members.
> >> > > >
> >> > > > ***There were few proposal and discussion on the slack channel, to reach
> >> > > > wider community members, moving that discussion formally to this list.
> >> > > >
> >> > > >
> >> > > > My Proposal: Option 1 - Set up separate Jenkins CI build system.
> >> > > >
> >> > > > Thanks,
> >> > > >
> >> > > > Sandeep
> >> > > >
> >> > > >
> >> > > >
> >> > > > --
> >> > > > Sandeep Krishnamurthy
> >> > > >
> >> > >
> >> >
> >> >