On 05/12/2007, Michael G Schwern <[EMAIL PROTECTED]> wrote:
> I'm going to sum up this reply, because it got long but kept on the same 
> themes.
>
> *  TODO tests provide you with information about what tests the author decided
> to ignore.
> **  Commented out tests provide you with NO information.
> **  Most TODO tests would have otherwise been commented out.
>
> *  How you interpret that information is up to you.
> **  Most folks don't care, so the default is to be quiet.
>
> *  The decision for what is success and what is failure lies with the author
> **  There's nothing we can do to stop that.
> **  But TODO tests allow you to reinterpret the author's desires.
>
> *  TAP::Harness (aka Test::Harness 3) has fairly easy ways to control how
>    TODO tests are interpreted.
> **  It could be made easier, especially WRT controlling "make test"
> **  CPAN::Reporter could be made aware of TODO passes.
>
>
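
For what it's worth, here is roughly how that last point looks with the
current TAP::Harness API. This is an untested sketch; the method names are
as I read them in the TAP::Parser::Aggregator docs, and the paths are made up:

use strict;
use warnings;
use TAP::Harness;

my $harness    = TAP::Harness->new({ verbosity => 0, lib => ['blib/lib'] });
my $aggregator = $harness->runtests(glob 't/*.t');

# todo_passed() should list the test scripts whose TODO tests
# unexpectedly passed, which is exactly the signal being discussed.
if (my @surprises = $aggregator->todo_passed) {
    warn "TODO tests unexpectedly passing in: @surprises\n";
}
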
> Fergal Daly wrote:
> > On 05/12/2007, Michael G Schwern <[EMAIL PROTECTED]> wrote:
> >> Since this whole discussion has become a bit unhinged from reality, maybe you can give
> >> some concrete examples of the problems you're talking about?  You obviously
> >> have some specific breakdowns in mind.
> >
> > I don't, I'm arguing against what has been put forward as good
> > practice when there are other better practices that are approximately
> > as easy and don't have the same downsides.
> >
> > In fairness though these bad practices were far more strongly
> > advocated in the previous thread on this topic than in this one.
>
> I don't know what thread that was, or if I was involved, so maybe I'm not the
> best person to be arguing with.
>
>
> >> The final choice, incrementing the dependency version to one that does not yet
> >> exist, boils down to "it won't work".  It's also ill-advised to anticipate
> >> that version X+1 will fix a given bug as on more than one occasion an
> >> anticipated bug has not been fixed in the next version.
> >
> > As I said earlier though, in Module::Build you have the option of
> > saying version < X and then when it's finally fixed, you can say !X
> > (and !X+1 if that didn't fix it).
>
> Yep, rich dependencies are helpful.
>
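
For illustration, the Module::Build side of that can be expressed as a rich
dependency in Build.PL. A rough sketch, with invented version numbers:

use strict;
use warnings;
use Module::Build;

Module::Build->new(
    module_name => 'Bar',
    requires    => {
        # accept any reasonably recent Foo except the release known
        # to carry the bug
        'Foo' => '>= 1.00, != 1.23',
    },
)->create_build_script;
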
>
> >> There is also the "I don't think feature X works in Y environment" problem.
> >> For example, say you have something that depends on symlinks.  You could
> >> hard-code in your test to skip if on Windows or some such, but that's often too
> >> broad.  Maybe they'll add them in a later version, or with a different
> >> filesystem (it's happened on VMS) or with some fancy 3rd party hack.  It's
> >> nice to get that information back.
> >
> > How do you get this information back? Unexpected passes are not
> > reported to you. If you want to be informed about things like this a
> > TODO is not a very good way to do it.
>
> The TODO test is precisely the way to do it, it provides all the information
> needed.  We just don't have the infrastructure to report it back.

Right, so arguments that it is useful because you get information back
are not yet true. In fact, the general consensus seems to be that we
don't want reports of unexpected successes, so there is no plan to ever
get this information.
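
For concreteness, the pattern under discussion would look something like
this. An untested sketch: the probe is the standard symlink idiom from
perlfunc, and the function being tested is invented:

use strict;
use warnings;
use Test::More tests => 1;

# Does this platform/filesystem support symlinks at all?
my $have_symlinks = eval { symlink('', ''); 1 };

TODO: {
    local $TODO = $have_symlinks
        ? undef
        : "symlink support not expected on this platform";

    # stand-in for whatever feature actually needs symlinks
    ok( eval { My::Module::link_things(); 1 }, "symlink-based feature" );
}
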

> As discussed before, what's needed is a higher resolution than just "pass" and
> "fail" for the complete test run.  That's the "Result: PASS/TODO" discussed
> earlier.  Things like CPAN::Reporter could then send that information back to
> the author.  It's a fairly trivial change for Test::Harness.
>
> The important thing is that "report back" is no longer locked to "fail".

Yes, this is the crux. People appear to be using TODO as a way of
avoiding failure reports about things that are already known and outside
their control. This comes at the expense of installing untested code onto
users' machines (when I say untested here, I admit that the tests have
run, but their results have been ignored).

This use of TODO trades user safety for developer convenience. Since we
have agreed that the developer is not in a position to decide which
tests are important for a given user, you can't argue that a developer
would only do this for unimportant tests.

It seems very odd to me that a test can be worth running this week but
not worth running next week because of things that may not even exist in
the user's environment (they may not have the known-bad version of Foo
installed).

The importance of the test has not changed. Only the worth of the
failure report has changed.

This could be solved by having another classification of test, the
"not my fault" test, used as follows:

BLAME: {
  # test_Foo() might just be a version check or might be a feature check
  my $foo_broken = test_Foo();
  local $BLAME = "Foo is broken, see RT #12345" if $foo_broken;

  ok(Foo::thing());
}

The module would install just fine in the presence of a working Foo; it
would fail to install in the presence of a broken Foo, but no report
would be sent to the author.

This gives both safety for users and convenience for developers. This
is what I meant by smarter tools.
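
Just to make the sketch above concrete, test_Foo() could be as simple as a
version check, or an actual probe of the broken behaviour. The version
number and function name here are invented:

sub test_Foo {
    require Foo;

    # simplest form: the release known to carry RT #12345
    return 1 if $Foo::VERSION == 1.23;

    # or probe the broken behaviour itself
    return 1 unless eval { Foo::frobnicate('known-tricky input'); 1 };

    return 0;
}
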

...snip...

> > A far easier way to be notified when Foo starts working is
> > to write an explicit test for Foo's functionality and run it whenever
> > you see a new Foo.
>
> Humans are really awful at rote work, especially over long periods of time,
> and I don't want to waste my brainpower remembering to manually run special
> tests for special conditions.  Bleh.

This is not the first time I've heard this argument, but as far as I
know nothing currently automates what you propose either. So I don't
see how you can argue that my method is rote work when your method is
also rote work.

To automate my method, all you do is write a test for Foo and then every
day download the latest Foo and run the test for the breakage. With
your method, you download Foo, run Bar's test suite (making sure it
picks up the latest Foo, which you presumably didn't install into
/usr/lib/perl because it has a good chance of being broken) and check
Bar's test output to see if any TODO tests have started passing.
Of course, you don't know which TODO tests are relevant, and you don't
know whether you should take action if just one of them starts passing
or only if all of them start passing.

It seems to me that the custom test method is easier to automate and
gives a clearer signal.
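
A rough sketch of that automation, run from a daily cron job. The paths
and test file name are invented, and cpanm and prove are used here purely
for brevity:

use strict;
use warnings;

my $scratch = '/tmp/foo-breakage-check';

# Build the newest Foo into a throwaway lib so nothing system-wide changes.
system('cpanm', '--notest', '--local-lib', $scratch, 'Foo') == 0
    or die "could not fetch and build the latest Foo\n";

# Run only the test that targets the known breakage.
my $failed = system('prove', '-I', "$scratch/lib/perl5", 't/foo-breakage.t');

print $failed ? "Foo is still broken\n"
              : "Foo looks fixed, time to revisit the workaround\n";
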

> >> So I think the problem you're concerned with is poor release decisions.  TODO
> >> tests are just a tool being employed therein.
> >
> > The point is that you have no idea what functionality is important for
> > your users. Disabling (with TODO or any other means) tests that test
> > previously working functionality that might be critical for a given
> > user is always a poor release decision in my book.
>
> Your opinion, and generally mine, too.  I do agree with the philosophy of
> "version X+1 should be no worse than version X", but reality intervenes.  And
> authoring a general purpose testing library means to be realistic, not
> idealistic (while always nudging towards the idealistic).
>
> If one does have to release with breakage, rather than have folks make the
> choice between "release with failing tests and get lots of redundant reports
> and stop automated installers" and "comment out failing tests (and probably
> forget to uncomment them later)" TODO tests give a third more palatable option
> to allow tests to remain and still be active.
>
> Consider also that you're not always blotting out previously working
> functionality.  Often you get a bug report of something that never quite
> worked.  Good practice says you write a test.  If you can't fix it before
> release, wrap it in a TODO test.

That's fine; I'm not arguing against this, as it is the proper use of TODO.
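
For reference, that proper use is just the standard Test::More idiom,
something like the following (the ticket number and method are invented):

use Test::More tests => 1;

TODO: {
    local $TODO = "new_thing() not implemented yet, see RT #99999";

    ok( eval { Bar->new_thing; 1 }, "new_thing works" );
}
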

F

...unobjectionable stuff snipped...
