we don’t seem to have touched on this yet, but what’s the vision for how we
“encourage” people to fix their tests (assuming we have a rough idea who is
responsible)? honor system? complaining on dev? blocking PR merges? prayer?

it’s been pointed out a few times that not one disabled test has been fixed
in all this time...


On Sun, Jan 14, 2018 at 12:53 PM Marco de Abreu <
marco.g.ab...@googlemail.com> wrote:

> Sheng, could you provide a list of tests which you would cover with the
> flaky-plugin? I totally agree with the point that we should not create a
> release if we have reduced test coverage and it should be our highest
> priority to restore it properly. I'd propose that if a test takes less than
> 5 seconds, it can be covered by the flaky-plugin with a retry-count of 5.
> Flaky tests which take longer than 5 seconds have to be fixed before
> re-enabling and must not use the flaky-plugin, in order to address
> Bhavin's concerns.
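>
> For illustration, a minimal sketch of what covering a short test with the
> flaky plugin could look like (the test itself is a made-up example, not one
> of the disabled tests):
>
>     import numpy as np
>     from flaky import flaky
>
>     # Hypothetical short-running test: retried up to 5 times and passes
>     # as soon as any single run passes.
>     @flaky(max_runs=5, min_passes=1)
>     def test_short_operator():
>         a = np.random.uniform(size=(3, 3))
>         assert np.allclose(a + a, 2 * a)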
>
> I'd advise against the nightly solution, as this basically limits
> visibility of results to Amazon employees - nobody else really interacts
> with that CI system, and results are not directly reported (unless we
> take some effort to create notifications etc., but that time is better spent
> actually fixing the tests).
>
> -Marco
>
> On Sun, Jan 14, 2018 at 9:49 PM, Sheng Zha <zhash...@apache.org> wrote:
>
> > Hi Bhavin,
> >
> > Thank you for the support. Running it nightly is a great idea in that it
> > doesn't compromise the coverage and we can still get notified fairly soon
> > when things are breaking. Is there a way to subscribe to its result
> > report?
> >
> > -sz
> >
> > On 2018-01-14 12:28, Bhavin Thaker <bhavintha...@gmail.com> wrote:
> > > Hi Sheng,
> > >
> > > I agree with doubling down on the efforts to fix the flaky tests but do
> > > not agree with compromising the stability of the test automation.
> > >
> > > As a compromise, we could probably run the flaky tests as part of the
> > > nightly test automation -- would that work?
> > >
> > > I like your suggestion in another email thread of using this:
> > > https://pypi.python.org/pypi/flaky. Maybe we could have a higher rerun
> > > count as part of the nightly tests to get better test automation stability.
> > >
> > > Bhavin Thaker.
> > >
> > > On Sun, Jan 14, 2018 at 12:21 PM, Sheng Zha <zhash...@apache.org> wrote:
> > >
> > > > Hi Bhavin,
> > > >
> > > > Thanks for sharing your thoughts. Regarding the usage of the 'flaky'
> > > > plugin for retrying flaky tests, it's proposed as a compromise, given
> > > > that it will take time to properly fix the tests and we still need
> > > > coverage in the meantime.
> > > >
> > > > I'm not sure that releasing before these tests are re-enabled is the
> > > > way to go, as it's not good practice to release features that are not
> > > > covered by tests. Having done it before doesn't make it right. In that
> > > > sense, release efforts shouldn't be a blocker for re-enabling tests.
> > > > Rather, it should be the other way around, and a release should happen
> > > > only after we recover the lost test coverage.
> > > >
> > > > I hope that we would do the right thing for our users. Thanks.
> > > >
> > > > -sz
> > > >
> > > > On 2018-01-14 11:00, Bhavin Thaker <bhavintha...@gmail.com> wrote:
> > > > > Hi Sheng,
> > > > >
> > > > > Thank you for your efforts and this proposal to improve the tests.
> > > > > Here are my thoughts.
> > > > >
> > > > > Shouldn’t the focus be to _engineer_ each test to be reliable instead
> > > > > of compromising and discussing the relative tradeoffs in re-enabling
> > > > > flaky tests? Is the test failure probability really 10%?
> > > > >
> > > > > As you correctly mention, the experience of making the tests reliable
> > > > > will then serve as the standard for adding new tests, rather than
> > > > > continuing to chase the elusive goal of reliable tests.
> > > > >
> > > > > Hence, my non-binding vote is:
> > > > > -1 for proposal #1 for re-enabling flaky tests.
> > > > > +1 for proposal #2 for setting the standard for adding reliable tests.
> > > > >
> > > > > I suggest that we NOT compromise on the quality and reliability of the
> > > > > tests, similar to the high bar maintained for the MXNet source code.
> > > > >
> > > > > If the final vote is to re-enable flaky tests, then I propose that we
> > > > > enable them immediately AFTER the next MXNet release instead of doing
> > > > > it during the upcoming release.
> > > > >
> > > > > Bhavin Thaker.
> > > > >
> > > > > On Sat, Jan 13, 2018 at 2:20 PM, Marco de Abreu <
> > > > > marco.g.ab...@googlemail.com> wrote:
> > > > >
> > > > > > Hello Sheng,
> > > > > >
> > > > > > thanks a lot for leading this task!
> > > > > >
> > > > > > +1 for both points. Additionally, I'd propose to add the requirement
> > > > > > to specify a reason if a new test takes more than X seconds (say 10)
> > > > > > or adds an external dependency.
> > > > > >
> > > > > > Looking forward to getting these tests fixed :)
> > > > > >
> > > > > > Best regards,
> > > > > > Marco
> > > > > >
> > > > > > On Sat, Jan 13, 2018 at 11:14 PM, Sheng Zha <zhash...@apache.org>
> > > > > > wrote:
> > > > > >
> > > > > > > Hi MXNet community,
> > > > > > >
> > > > > > > Thanks to the efforts of several community members, we identified
> > > > > > > many flaky tests. These tests are currently disabled to ensure the
> > > > > > > smooth execution of continuous integration (CI). As a result, we
> > > > > > > lost coverage of those features. They need to be fixed and
> > > > > > > re-enabled to ensure the quality of our releases. I'd like to
> > > > > > > propose the following:
> > > > > > >
> > > > > > > 1, Re-enable flaky Python tests with retries if feasible
> > > > > > > Although the tests are unstable, they would still be able to catch
> > > > > > > breaking changes. For example, suppose a test fails randomly with
> > > > > > > 10% probability; the probability that all three retries fail then
> > > > > > > becomes 0.1%. On the other hand, a breaking change would result in
> > > > > > > 100% failure. Although this could increase the testing time, it's a
> > > > > > > compromise that can help avoid bigger problems.
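> > > > > > >
> > > > > > > As a quick sanity check of that number (assuming each run fails
> > > > > > > independently with the same probability):
> > > > > > >
> > > > > > >     # Probability that a flaky test fails every one of its runs.
> > > > > > >     p_fail = 0.10           # assumed per-run failure probability
> > > > > > >     runs = 3                # number of attempts
> > > > > > >     print(p_fail ** runs)   # 0.001, i.e. 0.1%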
> > > > > > >
> > > > > > > 2, Set a standard for new tests
> > > > > > > I think having criteria that new tests should follow can help
> > > > > > > improve not only the quality of the tests but also the quality of
> > > > > > > the code. I propose the following standard for tests:
> > > > > > > - Reliably passing with good coverage
> > > > > > > - Avoid randomness unless necessary (see the sketch below)
> > > > > > > - Avoid external dependencies unless necessary (e.g. due to license)
> > > > > > > - Not resource-intensive unless necessary (e.g. scaling tests)
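> > > > > > >
> > > > > > > As a rough illustration of the "avoid randomness" point, a
> > > > > > > hypothetical test that fixes its seeds and compares against NumPy
> > > > > > > with tolerances (not a test from the tracker):
> > > > > > >
> > > > > > >     import numpy as np
> > > > > > >     import mxnet as mx
> > > > > > >
> > > > > > >     def test_elementwise_add():
> > > > > > >         # Fix the seeds so inputs are reproducible across CI runs.
> > > > > > >         np.random.seed(42)
> > > > > > >         mx.random.seed(42)
> > > > > > >         a = np.random.uniform(-1, 1, size=(4, 5))
> > > > > > >         b = np.random.uniform(-1, 1, size=(4, 5))
> > > > > > >         out = (mx.nd.array(a) + mx.nd.array(b)).asnumpy()
> > > > > > >         # Compare with a tolerance rather than exact equality.
> > > > > > >         assert np.allclose(out, a + b, rtol=1e-5, atol=1e-7)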
> > > > > > >
> > > > > > > In addition, I'd like to call for volunteers to help with fixing
> > > > > > > the tests. New members are especially welcome, as it's a good
> > > > > > > opportunity to become familiar with MXNet. Also, I'd like to ask
> > > > > > > that members who wrote the feature/test help, either by fixing the
> > > > > > > tests or by helping others understand the issues.
> > > > > > >
> > > > > > > The effort on fixing the tests is tracked at:
> > > > > > > https://github.com/apache/incubator-mxnet/issues/9412
> > > > > > >
> > > > > > > Best regards,
> > > > > > > Sheng
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>
