A few comments/suggestions:

1) Can we have this nice list of todo items on the Apache MXNet wiki page to track them better?

2) Can we have a set of owners for each set of tests and each source code directory? One problem I have observed is that when a test fails, it is difficult to find an owner who will take responsibility for fixing the test or promptly identifying the culprit code -- this causes master to stay broken for many days.

3) Specifically, we need an owner for the Windows setup -- nobody seems to know much about it -- please feel free to correct me if required.

4) +1 to having a list of all feature requests on Jira or a similar commonly and easily accessible system.

5) -1 to the branching model. I was the gatekeeper for the branching model at Informix for database kernel code being merged to master, alongside my day job as a database kernel engineer, for around nine months, and so hold the opinion that a branching model just shifts the burden from one place to another. We don't have a dedicated team to run a branching model. If we really need a buildable master every day, we could simply tag every successful build on master as last_clean_build -- and use this tag to get a clean master at any time. How many Apache projects do development on separate branches?

6) FYI: Rahul (rahul003@) has fixed various warnings in this PR: https://github.com/apache/incubator-mxnet/pull/7109 and has added a test that fails if any warning is found. We can build on top of his work.

7) FYI: For the unit-test problems, Meghna identified that some unit-test run times have increased significantly in recent builds. We need volunteers to help diagnose the root cause here:

   Unit Test Task       Build #337   Build #500   Build #556
   Python 2: GPU Win        25           38           40
   Python 3: GPU Win        15           38           46
   Python 2: CPU            25           35           80
   Python 3: CPU            14           28           72
   R: CPU                   20           34           24
   R: GPU                    5           24           24

8) Ensure that all submitted PRs have corresponding documentation on http://mxnet.io.
It may be fine to have documentation follow the code changes as long as there is ownership ensuring this task is done in a timely manner. For example, I have requested the Nvidia team to submit PRs to update the documentation on http://mxnet.io for the Volta changes to MXNet.

9) Ensure that mega-PRs have some level of design or architecture document(s) shared on the Apache MXNet wiki. A mega-PR must come with both unit tests and nightly/integration tests submitted to demonstrate a high quality level.

10) Finally, how do we get ownership for code submitted to MXNet? When something fails in a code segment that only a small set of folks know about, what is the expected SLA for a response from them? When users deploy MXNet in production environments, they will expect some form of SLA for support and a patch release.

Regards,
Bhavin Thaker.

On Wed, Nov 1, 2017 at 8:20 AM, Pedro Larroy <pedro.larroy.li...@gmail.com> wrote:
> +1 That would be great.
>
> On Mon, Oct 30, 2017 at 5:35 PM, Hen <bay...@apache.org> wrote:
> > How about we ask for a new mxnet repo to store all the config in?
> >
> > On Fri, Oct 27, 2017 at 05:30 Pedro Larroy <pedro.larroy.li...@gmail.com>
> > wrote:
> >
> >> Just to provide a high level overview of the ideas and proposals
> >> coming from different sources for the requirements for testing and
> >> validation of builds:
> >>
> >> * Have terraform files for the testing infrastructure. Infrastructure
> >> as code (IaC). Minus non-emulated / non-cloud-based embedded
> >> hardware. ("Single command" replication of the testing infrastructure,
> >> no manual steps.)
> >>
> >> * CI software based on Jenkins, unless someone thinks there's a better
> >> alternative.
> >>
> >> * Use autoscaling groups and improve staggered build + test steps to
> >> achieve higher parallelism and shorter feedback times.
> >>
> >> * Switch to a branching model based on stable master + integration
> >> branch.
> >> PRs are merged into dev/integration, which runs extended
> >> nightly tests, and are then merged into master, preferably in an
> >> automated way after successful extended testing.
> >> Master is always tested and always buildable. Release branches or
> >> tags in master as usual for releases.
> >>
> >> * Build + test feedback time targeting less than 15 minutes.
> >> (Currently a build on a 16-core machine takes 7m.) This involves a lot
> >> of refactoring of tests, moving expensive tests / big smoke tests to
> >> nightlies on the integration branch, plus tests on IoT devices and
> >> power and performance regressions...
> >>
> >> * Add code coverage and other quality metrics.
> >>
> >> * Eliminate warnings and treat warnings as errors. We have spent time
> >> tracking down "undefined behaviour" bugs that could have been caught
> >> by compiler warnings.
> >>
> >> Is there something I'm missing or additional things that come to your
> >> mind that you would wish to add?
> >>
> >> Pedro.
> >>
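P.S. The last_clean_build tagging idea from item 5 could look roughly like the sketch below. The tag name, the throwaway demo repository, and the commit messages are illustrative assumptions, not an existing MXNet convention; in a real pipeline only the tagging step would run, at the end of a successful master build.

```shell
#!/bin/sh
# Sketch of item 5: after every successful CI build on master,
# move a well-known tag to that commit so anyone can check out a
# known-good master at any time. Demo uses a throwaway repo.
set -e

# Throwaway repo standing in for the real checkout (demo only).
repo=$(mktemp -d)
cd "$repo"
git init -q
git -c user.email=ci@example.com -c user.name=ci \
    commit -q --allow-empty -m "build 1"

# After a successful build: (re-)point the tag at the current commit.
# -f lets the same tag name be reused for every clean build.
git tag -f last_clean_build

# A later clean build simply moves the tag forward.
git -c user.email=ci@example.com -c user.name=ci \
    commit -q --allow-empty -m "build 2"
git tag -f last_clean_build

# Consumers get a clean master by checking out the tag.
git checkout -q last_clean_build
git log -1 --format=%s   # prints: build 2
```

In real CI the tag would also be pushed (e.g. `git push -f origin last_clean_build`) so everyone sees the same known-good commit; note that force-moving a published tag requires consumers to re-fetch it.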
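P.P.S. A warnings gate in the spirit of the check mentioned in item 6 (fail the build when the compiler log contains any warning) could be sketched as follows. The log file names and the gcc/clang-style ": warning:" pattern are illustrative assumptions, not the actual test from PR #7109.

```shell
#!/bin/sh
# Sketch of a "fail the build on any compiler warning" gate (item 6).
# check_warnings returns 0 (pass) when the log contains no
# gcc/clang-style warning lines, non-zero (fail) otherwise.
check_warnings() {
    ! grep -q ': warning:' "$1"
}

# Demo logs standing in for real compiler output (illustrative only).
printf '[ 10%%] Building CXX object src/foo.cc.o\n' > clean.log
printf "src/foo.cc:12:5: warning: unused variable 'x' [-Wunused-variable]\n" > warn.log

check_warnings clean.log && echo "clean.log: PASS"
check_warnings warn.log  || echo "warn.log: FAIL (would break the build)"
```

In CI this gate would run right after compilation on the real build log, complementing -Werror for toolchains where some warnings cannot be promoted to errors.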