Re: shouldn't "UNEXPECTEDLY SUCCEEDED" mean failure?

2007-12-08 Thread nadim khemir
FYI: http://socialtext.useperl.at/woc/index.cgi?todo_test_bounties


Re: shouldn't "UNEXPECTEDLY SUCCEEDED" mean failure?

2007-12-05 Thread Michael G Schwern
Fergal Daly wrote:
> The importance of the test has not changed. Only the worth of the
> failure report has changed.
> 
> This could be solved by having another classification of test, the
> "not my fault" test used as follows
> 
> BLAME: {
>   my $foo_broken = test_Foo();   # might be a version check or a feature check
>   local $BLAME = "Foo is broken, see RT #12345" if $foo_broken;
> 
>   ok( Foo::thing() );
> }
> 
> The module would install just fine in the presence of a working Foo,
> the module would fail to install in the presence of a broken Foo but
> no report should be sent to the author.
> 
> This gives both safety for users and convenience for developers. This
> is what I meant by smarter tools.

I hope you don't mind if I cut out the rest of the increasingly head-butting
argument and jump straight to this interesting bit.

As much as my brain screams "DO NOT WANT!!!" [1] because it smacks of
"expected failure" it might be just what we're looking for.  The allows the
author to program in "I know this is broken, don't bug me about it" without
completely silencing the test.

However, I think it will be very open to abuse.  I'm also not sure how this
will be different from simply having the option of making failing TODO tests
fail for the user but not report back to the author.

It still boils down to trusting the author.


[1] http://www.mgroves.com/images/do_not_want_star_wars.jpg


-- 
There will be snacks.


Re: shouldn't "UNEXPECTEDLY SUCCEEDED" mean failure?

2007-12-05 Thread Fergal Daly
On 05/12/2007, Michael G Schwern <[EMAIL PROTECTED]> wrote:
> I'm going to sum up this reply, because it got long but kept on the same 
> themes.
>
> *  TODO tests provide you with information about what tests the author decided
> to ignore.
> **  Commented out tests provide you with NO information.
> **  Most TODO tests would have otherwise been commented out.
>
> *  How you interpret that information is up to you.
> **  Most folks don't care, so the default is to be quiet.
>
> *  The decision for what is success and what is failure lies with the author
> **  There's nothing we can do to stop that.
> **  But TODO tests allow you to reinterpret the author's desires.
>
> *  TAP::Harness (aka Test::Harness 3) has fairly easy ways to control how
>TODO tests are interpreted.
> **  It could be made easier, especially WRT controlling "make test"
> **  CPAN::Reporter could be made aware of TODO passes.
>
>
> Fergal Daly wrote:
> > On 05/12/2007, Michael G Schwern <[EMAIL PROTECTED]> wrote:
> >> Since this whole discussion has unhinged a bit from reality, maybe you can give
> >> some concrete examples of the problems you're talking about?  You obviously
> >> have some specific breakdowns in mind.
> >
> > I don't, I'm arguing against what has been put forward as good
> > practice when there are other better practices that are approximately
> > as easy and don't have the same downsides.
> >
> > In fairness though these bad practices were far more strongly
> > advocated in the previous thread on this topic than in this one.
>
> I don't know what thread that was, or if I was involved, so maybe I'm not the
> best person to be arguing with.
>
>
> >> The final choice, incrementing the dependency version to one that does not 
> >> yet
> >> exist, boils down to "it won't work".  It's also ill advised to anticipate
> >> that version X+1 will fix a given bug as on more than one occasion an
> >> anticipated bug has not been fixed in the next version.
> >
> > As I said earlier though, in Module::Build you have the option of
> > saying version < X and then when it's finally fixed, you can say !X
> > (and !X+1 if that didn't fix it).
>
> Yep, rich dependencies are helpful.
>
>
> >> There is also the "I don't think feature X works in Y environment" problem.
> >> For example, say you have something that depends on symlinks.  You could 
> >> hard
> >> code in your test to skip if on Windows or some such, but that's often too
> >> broad.  Maybe they'll add them in a later version, or with a different
> >> filesystem (it's happened on VMS) or with some fancy 3rd party hack.  It's
> >> nice to get that information back.
> >
> > How do you get this information back? Unexpected passes are not
> > reported to you. If you want to be informed about things like this a
> > TODO is not a very good way to do it.
>
> The TODO test is precisely the way to do it, it provides all the information
> needed.  We just don't have the infrastructure to report it back.

Right, so arguments that it is useful because you get information back
are not yet true.  In fact, the general consensus is that we don't
want reports of unexpected successes, so there is no plan to ever get
this information.

> As discussed before, what's needed is a higher resolution than just "pass" and
> "fail" for the complete test run.  That's the "Result: PASS/TODO" discussed
> earlier.  Things like CPAN::Reporter could then send that information back to
> the author.  It's a fairly trivial change for Test::Harness.
>
> The important thing is that "report back" is no longer locked to "fail".

Yes, this is the crux. People appear to be using TODO as a way of avoiding
failure reports about breakage that is outside their control and already
known. This comes at the expense of installing untested code onto users'
machines (when I say untested here, I admit that the tests have run but
their results have been ignored).

This use of TODO trades user safety for developer convenience. Since we
have agreed that the developer is not in a position to decide which
tests are important for a given user, you can't argue that a developer
would only do this for unimportant tests.

It seems very odd to me that a test can be worth running this week but
not worth running next week due to things that may not even exist in
the user's environment (they may not have the known-bad version of Foo
installed).

The importance of the test has not changed. Only the worth of the
failure report has changed.

This could be solved by having another classification of test, the
"not my fault" test used as follows

BLAME: {
  my $foo_broken = test_Foo();   # might be a version check or a feature check
  local $BLAME = "Foo is broken, see RT #12345" if $foo_broken;

  ok( Foo::thing() );
}

The module would install just fine in the presence of a working Foo,
the module would fail to install in the presence of a broken Foo but
no report should be sent to the author.

This gives both safety for users and convenience for developers. This
is what I meant by smarter tools.

Re: shouldn't "UNEXPECTEDLY SUCCEEDED" mean failure?

2007-12-05 Thread Nicholas Clark
On Tue, Dec 04, 2007 at 08:25:10PM -0800, Michael G Schwern wrote:

> Fergal Daly wrote:

> > You have no idea what version of Foo they're using
> 
> Well, you do with version dependency declarations so you control the range.
> New versions are, of course, open to breakage but at some point you have to
> trust something.

I'm also missing why no-one has suggested

local $TODO = "Frobulator low on pie" if Foo->VERSION < 3.14;

which ought to mean that TODO tests stop being TODO if the user has a new
enough version of Foo installed. (ie, you *do* know what version of Foo
they're using, at least in the general case)
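
Spelled out as a full TODO block, that would look roughly like this sketch
(Foo, the version number and the test body are placeholders):

use Test::More tests => 1;
use Foo;    # the dependency with the known bug (placeholder name)

TODO: {
    local $TODO = "Frobulator low on pie" if Foo->VERSION < 3.14;

    ok( frobulate(), "frobulation works once Foo is fixed" );
}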

Nicholas Clark


Re: shouldn't "UNEXPECTEDLY SUCCEEDED" mean failure?

2007-12-05 Thread Paul Johnson
On Tue, Dec 04, 2007 at 04:05:07PM -0800, Michael G Schwern wrote:

I've written a couple of replies to this thread with similar content,
but not sent them for one reason or another.  Perhaps I can be more
succinct here.

> Then there are folks who embrace the whole test first thing and write out lots
> and lots of tests beforehand.  Maybe you decide not to implement them all
> before shipping.  Rather than delete or comment out those tests, just wrap
> them in TODO blocks.  Then you don't have to do any fiddling with the tests
> before and after release, something which leads to an annoying shear between
> the code the author uses and the code users use.

I believe that everyone is comfortable with this use of TODOs.

> There is also the "I don't think feature X works in Y environment" problem.
> For example, say you have something that depends on symlinks.  You could hard
> code in your test to skip if on Windows or some such, but that's often too
> broad.  Maybe they'll add them in a later version, or with a different
> filesystem (it's happened on VMS) or with some fancy 3rd party hack.  It's
> nice to get that information back.

Here is where opinion seems to diverge.  I tend to agree with Fergal
here and say you should check whether symlinks are available and skip
the test if they are not.  But I can see your use case and wouldn't
necessarily want to forbid it.
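
The check-and-skip version is just the stock SKIP idiom; a rough sketch
(the tested function is a placeholder):

use Test::More tests => 1;

SKIP: {
    # symlink() dies at run time on platforms that don't implement it
    skip "symlinks not available on this platform", 1
        unless eval { symlink( "", "" ); 1 };

    ok( my_symlink_feature(), "symlink-based feature works" );
}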

When people use a variable for two different purposes ...
we tell them to use another variable.

When people use a subroutine to do two different things ...
we tell them to split it up.

When people use a TODO test for two different situations ...
we can't agree on its behaviour.

So perhaps we need another type of test.  Eric seems to be suggesting we
need another N types of test.  I suspect YAGNI, but what do I know?  I'd
have said that to this "second" use of TODO too.

-- 
Paul Johnson - [EMAIL PROTECTED]
http://www.pjcj.net


Re: shouldn't "UNEXPECTEDLY SUCCEEDED" mean failure?

2007-12-04 Thread Michael G Schwern
I'm going to sum up this reply, because it got long but kept on the same themes.

*  TODO tests provide you with information about what tests the author decided
to ignore.
**  Commented out tests provide you with NO information.
**  Most TODO tests would have otherwise been commented out.

*  How you interpret that information is up to you.
**  Most folks don't care, so the default is to be quiet.

*  The decision for what is success and what is failure lies with the author
**  There's nothing we can do to stop that.
**  But TODO tests allow you to reinterpret the author's desires.

*  TAP::Harness (aka Test::Harness 3) has fairly easy ways to control how
   TODO tests are interpreted.
**  It could be made easier, especially WRT controlling "make test"
**  CPAN::Reporter could be made aware of TODO passes.


Fergal Daly wrote:
> On 05/12/2007, Michael G Schwern <[EMAIL PROTECTED]> wrote:
>> Since this whole discussion has unhinged a bit from reality, maybe you can give
>> some concrete examples of the problems you're talking about?  You obviously
>> have some specific breakdowns in mind.
> 
> I don't, I'm arguing against what has been put forward as good
> practice when there are other better practices that are approximately
> as easy and don't have the same downsides.
> 
> In fairness though these bad practices were far more strongly
> advocated in the previous thread on this topic than in this one.

I don't know what thread that was, or if I was involved, so maybe I'm not the
best person to be arguing with.


>> The final choice, incrementing the dependency version to one that does not 
>> yet
>> exist, boils down to "it won't work".  It's also ill advised to anticipate
>> that version X+1 will fix a given bug as on more than one occasion an
>> anticipated bug has not been fixed in the next version.
> 
> As I said earlier though, in Module::Build you have the option of
> saying version < X and then when it's finally fixed, you can say !X
> (and !X+1 if that didn't fix it).

Yep, rich dependencies are helpful.


>> There is also the "I don't think feature X works in Y environment" problem.
>> For example, say you have something that depends on symlinks.  You could hard
>> code in your test to skip if on Windows or some such, but that's often too
>> broad.  Maybe they'll add them in a later version, or with a different
>> filesystem (it's happened on VMS) or with some fancy 3rd party hack.  It's
>> nice to get that information back.
> 
> How do you get this information back? Unexpected passes are not
> reported to you. If you want to be informed about things like this a
> TODO is not a very good way to do it.

The TODO test is precisely the way to do it, it provides all the information
needed.  We just don't have the infrastructure to report it back.

As discussed before, what's needed is a higher resolution than just "pass" and
"fail" for the complete test run.  That's the "Result: PASS/TODO" discussed
earlier.  Things like CPAN::Reporter could then send that information back to
the author.  It's a fairly trivial change for Test::Harness.

The important thing is that "report back" is no longer locked to "fail".


>>> I'm talking about people converting tests that were working just fine
>>> to be TODO tests because the latest version of Foo (an external
>>> module) has a new bug. While Foo is broken, they don't want lots of
>>> bug reports from CPAN testers that they can't do anything about.
>>>
>>> This use of TODO allows you to silence the alarm and also gives you a
>>> way to spot when the alarm condition has passed. It's convenient for
>>> developers but it's 2 fingers to users who can now get false passes
>>> from the test suites,
>> It still boils down to what known bugs the author is willing to release with.
>>  Once the author has decided they don't want to hear about a broken
>> dependency,  and that the breakage isn't important, the damage is done.  The
>> TODO test is orthogonal.
>>
>> Again, consider the alternative which is to comment the test out.  Then you
>> have NO information.
> 
> Who's "you"?

You == user.


> If you==user then a failing TODO test and commented out test are
> indistinguishable unless you go digging in the code or TAP stream.

As they say, "works as designed".  The author decided the failures aren't
important.  Don't like it?  Take it up with the author.  Most folks don't care
about that information, they just want the thing installed.

You (meaning Fergal Daly) can dig them out with some Test::Harness hackery,
and maybe that should be easier if you really care about it.  The important
thing is the information is there, encoded in the tests, and you can get at it
programmatically.

The alternative is to comment the failing test out in which case you have *no*
information and those who are interested cannot get it out.


> A passing TODO is just confusing.

That's a function of how it's displayed.  "UNEXPECTEDLY SUCCEEDED", I agree,
was confusing.  No question.  TH 3's toned-down wording is an improvement.

Re: shouldn't "UNEXPECTEDLY SUCCEEDED" mean failure?

2007-12-04 Thread Fergal Daly
On 05/12/2007, Michael G Schwern <[EMAIL PROTECTED]> wrote:
> Since this whole discussion has unhinged a bit from reality, maybe you can give
> some concrete examples of the problems you're talking about?  You obviously
> have some specific breakdowns in mind.

I don't, I'm arguing against what has been put forward as good
practice when there are other better practices that are approximately
as easy and don't have the same downsides.

In fairness though these bad practices were far more strongly
advocated in the previous thread on this topic than in this one.

> Fergal Daly wrote:
> >> Modules do not have a binary state of working or not working.  They're
> >> composed of piles of (often too many) features.  Code can be shippable 
> >> without
> >> every single thing working.
> >
> > You're right, I was being binary, but you were being unary. There are 3 
> > cases,
> >
> > 1 the breakage was not so important, so you don't bail no matter what
> > version you find.
> > 2 it's fuzzy, maybe it's OK to use Foo version X but once Foo version
> > X+1 has been released you want to force people to use it
> > 3 the breakage is serious, you always want to bail if you find Foo
> > version X (and so you definitely don't switch the tests to TODO).
> >
> > You claimed 2 is always the case.  I claimed that 1 and 3 occur.
>
> If I did, that wasn't my intent.  I only talked about #2 because it's the only
> one that results in the user seeing passing TODO tests, which is what we were
> talking about.
>
>
> > I'm
> > happy to admit that 2 can also occur. The point remains, you would
> > not necessarily change your module's requirements as a reaction to X+1
> > being released. You might, or you might change it beforehand if it
> > really matters or you might not change it at all.
>
> And I might dip my head in whipped cream and go give a random stranger a foot
> bath.  You seem to have covered all possibilities, good and bad.  I'm not sure
> to what end.
>
> The final choice, incrementing the dependency version to one that does not yet
> exist, boils down to "it won't work".  It's also ill advised to anticipate
> that version X+1 will fix a given bug as on more than one occasion an
> anticipated bug has not been fixed in the next version.

As I said earlier though, in Module::Build you have the option of
saying version < X and then when it's finally fixed, you can say !X
(and !X+1 if that didn't fix it).
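
In Build.PL terms that would look roughly like this (the module name and
version numbers are made up):

requires => {
    'Foo' => '< 1.20',              # while Foo 1.20 is the known-broken release
},

# ...and once Foo 1.21 ships the fix, exclude only the bad release:

requires => {
    'Foo' => '>= 1.00, != 1.20',
},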

> Anyhow, to get back to the point, it boils down to an author's decision how to
> deal with a known bug.  TODO tests are orthogonal.
>
>
> >> Maybe we're arguing two different situations.  Yours seems to be when 
> >> there is
> >> a broken version of a dependency, but a known working version exists.  In 
> >> this
> >> case, you're right, it's better resolved with a rich dependency system.
> >
> > I think maybe we are.
> >
> > You're talking about where someone writes a TODO for a feature that
> > has never worked. That's legit, although I still think there's
> > something odd about it as you personally have nothing "to do". I agree
> > it's not dangerous.
>
> Sure you do, you have to watch for when the dependency fixes its bug.  But
> that's boring and rote, what computers are for!  So you write a TODO test to
> automate the process.  [1]

That's back to the other case; I'm only talking about taking an
existing, previously passing test and marking it TODO.

> In a large project, sometimes things get implemented when you implement other
> things.  This is generally more applicable to bugs, but sometimes to minor
> features.
>
> Then there are folks who embrace the whole test first thing and write out lots
> and lots of tests beforehand.  Maybe you decide not to implement them all
> before shipping.  Rather than delete or comment out those tests, just wrap
> them in TODO blocks.  Then you don't have to do any fiddling with the tests
> before and after release, something which leads to an annoying shear between
> the code the author uses and the code users use.

That's all fine.

> There is also the "I don't think feature X works in Y environment" problem.
> For example, say you have something that depends on symlinks.  You could hard
> code in your test to skip if on Windows or some such, but that's often too
> broad.  Maybe they'll add them in a later version, or with a different
> filesystem (it's happened on VMS) or with some fancy 3rd party hack.  It's
> nice to get that information back.

How do you get this information back? Unexpected passes are not
reported to you. If you want to be informed about things like this a
TODO is not a very good way to do it.

I would say you should test if the feature is there. If it is, run the
tests and enable the feature; if not, don't run the tests and disable
the feature.

I think conditional enabling of not-so-important features depending on
test results is actually a far better way to do this, although we have
no infrastructure for that at the moment.

> I'm talking about people ...

Re: shouldn't "UNEXPECTEDLY SUCCEEDED" mean failure?

2007-12-04 Thread Michael G Schwern
Since this whole discussion has unhinged a bit from reality, maybe you can give
some concrete examples of the problems you're talking about?  You obviously
have some specific breakdowns in mind.


Fergal Daly wrote:
>> Modules do not have a binary state of working or not working.  They're
>> composed of piles of (often too many) features.  Code can be shippable 
>> without
>> every single thing working.
> 
> You're right, I was being binary, but you were being unary. There are 3 cases,
> 
> 1 the breakage was not so important, so you don't bail no matter what
> version you find.
> 2 it's fuzzy, maybe it's OK to use Foo version X but once Foo version
> X+1 has been released you want to force people to use it
> 3 the breakage is serious, you always want to bail if you find Foo
> version X (and so you definitely don't switch the tests to TODO).
>
> You claimed 2 is always the case.  I claimed that 1 and 3 occur.

If I did, that wasn't my intent.  I only talked about #2 because it's the only
one that results in the user seeing passing TODO tests, which is what we were
talking about.


> I'm
> happy to admit that 2 can also occur. The point remains, you would
> not necessarily change your module's requirements as a reaction to X+1
> being released. You might, or you might change it beforehand if it
> really matters or you might not change it at all.

And I might dip my head in whipped cream and go give a random stranger a foot
bath.  You seem to have covered all possibilities, good and bad.  I'm not sure
to what end.

The final choice, incrementing the dependency version to one that does not yet
exist, boils down to "it won't work".  It's also ill advised to anticipate
that version X+1 will fix a given bug as on more than one occasion an
anticipated bug has not been fixed in the next version.

Anyhow, to get back to the point, it boils down to an author's decision how to
deal with a known bug.  TODO tests are orthogonal.


>> Maybe we're arguing two different situations.  Yours seems to be when there 
>> is
>> a broken version of a dependency, but a known working version exists.  In 
>> this
>> case, you're right, it's better resolved with a rich dependency system.
> 
> I think maybe we are.
> 
> You're talking about where someone writes a TODO for a feature that
> has never worked. That's legit, although I still think there's
> something odd about it as you personally have nothing "to do". I agree
> it's not dangerous.

Sure you do, you have to watch for when the dependency fixes its bug.  But
that's boring and rote, what computers are for!  So you write a TODO test to
automate the process.  [1]

In a large project, sometimes things get implemented when you implement other
things.  This is generally more applicable to bugs, but sometimes to minor
features.

Then there are folks who embrace the whole test first thing and write out lots
and lots of tests beforehand.  Maybe you decide not to implement them all
before shipping.  Rather than delete or comment out those tests, just wrap
them in TODO blocks.  Then you don't have to do any fiddling with the tests
before and after release, something which leads to an annoying shear between
the code the author uses and the code users use.

There is also the "I don't think feature X works in Y environment" problem.
For example, say you have something that depends on symlinks.  You could hard
code in your test to skip if on Windows or some such, but that's often too
broad.  Maybe they'll add them in a later version, or with a different
filesystem (it's happened on VMS) or with some fancy 3rd party hack.  It's
nice to get that information back.


> I'm talking about people converting tests that were working just fine
> to be TODO tests because the latest version of Foo (an external
> module) has a new bug. While Foo is broken, they don't want lots of
> bug reports from CPAN testers that they can't do anything about.
> 
> This use of TODO allows you to silence the alarm and also gives you a
> way to spot when the alarm condition has passed. It's convenient for
> developers but it's 2 fingers to users who can now get false passes
> from the test suites,

It still boils down to what known bugs the author is willing to release with.
 Once the author has decided they don't want to hear about a broken
dependency,  and that the breakage isn't important, the damage is done.  The
TODO test is orthogonal.

Again, consider the alternative which is to comment the test out.  Then you
have NO information.

So I think the problem you're concerned with is poor release decisions.  TODO
tests are just a tool being employed therein.


[1] Don't get too hung up on names; things only get one even though they can do
lots of things.  I'm sure you've written lots of perl programs that didn't do
much extracting or reporting.


-- 
Reality is that which, when you stop believing in it, doesn't go away.
-- Phillip K. Dick


Re: shouldn't "UNEXPECTEDLY SUCCEEDED" mean failure?

2007-12-04 Thread Eric Wilhelm
# from Fergal Daly
# on Tuesday 04 December 2007 15:12:

>I'm talking about people converting tests that were working just fine
>to be TODO tests because the latest version of Foo (an external
>module) has a new bug. While Foo is broken, they don't want lots of
>bug reports from CPAN testers that they can't do anything about.

To me, the fact that we're discussing TODO tests in this context is an 
indicator that the testchain badly needs richer reporting and/or 
there's something missing in the CPAN/CPAN testers system/workflow.

--Eric
-- 
Anyone who has the power to make you believe absurdities has the power
to make you commit injustices.
--Voltaire
---
http://scratchcomputing.com
---


Re: shouldn't "UNEXPECTEDLY SUCCEEDED" mean failure?

2007-12-04 Thread Fergal Daly
On 02/12/2007, Michael G Schwern <[EMAIL PROTECTED]> wrote:
> Fergal Daly wrote:
> >> As long as you're releasing a new version, why would you not upgrade your
> >> module's dependency to use the version that works?
> >
> > Your module either is or isn't usable with version X of Foo.
> >
> > If it is usable then you would not change your dependency before or
> > after the bug in version X is fixed (maybe I have a good reason not to
> > upgrade Foo and you wouldn't want your module to refuse to install if
> > it is actually usable).
> >
> > If it isn't usable then marking your tests as TODO was the wrong thing
> > to do in the first place, you should have bailed out due to
> > incompatibility with version X and not bothered to run any tests at
> > all. I think ExtUtils::MM does not have any way to specify complex
> > version dependencies but with Module::Build you could say
>
> ETOOBINARY
>
> Modules do not have a binary state of working or not working.  They're
> composed of piles of (often too many) features.  Code can be shippable without
> every single thing working.

You're right, I was being binary, but you were being unary. There are 3 cases,

1 the breakage was not so important, so you don't bail no matter what
version you find.
2 it's fuzzy, maybe it's OK to use Foo version X but once Foo version
X+1 has been released you want to force people to use it
3 the breakage is serious, you always want to bail if you find Foo
version X (and so you definitely don't switch the tests to TODO).

You claimed 2 is always the case. I claimed that 1 and 3 occur. I'm
happy to admit that 2 can also occur. The point remains, you would
not necessarily change your module's requirements as a reaction to X+1
being released. You might, or you might change it beforehand if it
really matters or you might not change it at all.

> The TODO test is useful when the working version *does not yet* exist.  If
> it's a minor feature or bug then rather than hold up the whole release waiting
> for someone else to fix their shit, you can mark it TODO and release.  This is
> the author's decision to go ahead and release with a known bug.  We do it all
> the time, just not necessarily with a formal TODO test.
>
> > I am basically against the practice of using TODO to cope with
> > external breakage. Not taking unexpected passes seriously encourages
> > this practice. Apart from there being other ways to handle external
> > breakage that seem easier, using TODO is actually dangerous as it can
> cause false passes in 2 ways. Say version X of Foo has a non-serious
> bug, so you release version Y of Bar with some tests marked TODO. Then
> we risk
>
> Maybe we're arguing two different situations.  Yours seems to be when there is
> a broken version of a dependency, but a known working version exists.  In this
> case, you're right, it's better resolved with a rich dependency system.

I think maybe we are.

You're talking about where someone writes a TODO for a feature that
has never worked. That's legit, although I still think there's
something odd about it as you personally have nothing "to do". I agree
it's not dangerous.

I'm talking about people converting tests that were working just fine
to be TODO tests because the latest version of Foo (an external
module) has a new bug. While Foo is broken, they don't want lots of
bug reports from CPAN testers that they can't do anything about.

This use of TODO allows you to silence the alarm and also gives you a
way to spot when the alarm condition has passed. It's convenient for
developers but it's 2 fingers to users who can now get false passes
from the test suites,

F

> My case is when a working version of the dependency does not exist, or the
> last working version is so old it's more trouble than it's worth.  In this
> case the author decides the bug is not critical, can't be worked around and
> doesn't want to wait for fix in the dependency.  The decision is whether or
> not to release with a known bug.  After that, wrapping it in a TODO test is
> just an alternative to commenting it out.
>
> Compare with the more common alternative for shipping with a known bug which
> is to simply not have a test at all.
>
>
> > 1 Version X+1 of Foo is even worse and will cause Bar to eat your dog.
> > Sadly for your dog, the test that might have warned him has been
> > marked TODO.
>
> If they release Bar with a known bug against Foo X where your dog's fur is
> merely a bit ruffled, then that's ok.  If version X+1 of Foo causes Bar to eat
> your dog then why didn't their tests catch that?  Was there not a "dog not
> eaten" test?  If not then that's just an incomplete test, the TODO test has
> nothing to do with that.
>
> The "dog not eaten" test wouldn't have been part of the TODO test, that part
> worked fine when the author released and they'd have gotten the "todo passed"
> message and known to move it out of the TODO block.
>
> Or maybe they're just a cat person.
>
> Point is, there's multiple points where good testing practice has to break
> down for this situation to occur.  The use of TODO test is orthogonal.

Re: shouldn't "UNEXPECTEDLY SUCCEEDED" mean failure?

2007-12-03 Thread A. Pagaltzis
* Michael G Schwern <[EMAIL PROTECTED]> [2007-12-04 03:00]:
> So I read two primary statements here.
> 
> 1) Anything unexpected is suspicious. This includes unexpected
>success.
> 
> 2) Anything unexpected should be reported back to the author.
> 
> The first is controversial, and leads to the conclusion that
> TODO passes should fail.

The first doesn’t seem controversial to me; everyone agrees, I
think, that passing TODOs merit investigation in some sense or
other.

> The second is not controversial, but it erroneously leads to
> the conclusion that TODO passes should fail.

The second one wasn’t primarily proposed that I could see at all.
Some people brought it up, but it was not a particularly central
part of the discussion.

Eric mentioned asking the author, as part of investigating TODO
passes (which as I mentioned I think we all agree about). That
seems like a reasonable position to me; how it implies that TODO
passes should count as failures, I don’t understand.

> So what we need is a "pass with caveats" or, as Eric pointed
> out, some way for the harness to communicate its results in
> a machine-parsable way.

I thought that was already the overall thrust of the discussion.

I agree completely, in any case.

> Maybe "Result: TODO_PASS" is enough.

WFM.

Regards,
-- 
Aristotle Pagaltzis // 


Re: shouldn't "UNEXPECTEDLY SUCCEEDED" mean failure?

2007-12-03 Thread Michael G Schwern
So I read two primary statements here.

1)  Anything unexpected is suspicious.  This includes unexpected success.

2)  Anything unexpected should be reported back to the author.

The first is controversial, and leads to the conclusion that TODO passes
should fail.

The second is not controversial, but it erroneously leads to the conclusion
that TODO passes should fail.  That's the only mechanism we currently have for
telling the user "hey, something weird happened.  Pay attention!"  It's also
how we normally report stuff back to the author.  Also, there are only two easily
identifiable states for a test: Pass and fail.

So what we need is a "pass with caveats" or, as Eric pointed out, some way for
the harness to communicate its results in a machine-parsable way.  The very
beginnings of such a hack was put in for CPAN::Reporter in the "Result" line
that is output at the end of the test.  Ideally you'd have the harness
spitting out its full conclusions... somehow... without cluttering up the
human readable output.  But maybe "Result: TODO_PASS" is enough.


-- 
Stabbing you in the face for your own good.


Re: shouldn't "UNEXPECTEDLY SUCCEEDED" mean failure?

2007-12-03 Thread nadim khemir
On Sunday 02 December 2007 18:48, Chris Dolan wrote:
> ...
>
> In this fashion, we use TODO tests to track when bugs in our
> dependent packages get fixed, and then when we can remove workarounds
> therefore.
>
> Chris


This discussion is interesting and it's always educational to understand why
others do things in what might at first appear to be a strange way.

One of the things I wrote was that I wanted _my_ tests to fail on a passed
TODO. This would let others use whatever process they see fit.

I have no doubt that any of us can deal with the "passed TODO" but we may not
be the most representative developers. I like the principle of least surprise,
and passing TODOs are very, very surprising, so surprising that (most) people
writing TODOs don't expect them to pass at all (I like to believe), and most
people using modules might be surprised by that too.

How we deal with passed TODOs in _our_ modules is easy. The question is how
you deal with _other people's modules_ passing TODOs.

I mail the author, look in the code, mail here and discuss.

Most, I believe (and it's regrettable), don't even bother looking. They scan
for 'OK' and are happy with that. (I used to; now I like to see xx/yyy tests,
which gives me a (false) sense of security.)

There is a third kind of module user: enterprise users, people who follow a
process, people investing money in development and who don't like to take
risks. For those people, "Unexpectedly passed" screams "BROKEN".

There is no simple solution to this problem, but I would like to be able to
check my modules more thoroughly, and I would like to be able to set up CPAN so
I can say "unexpectedly succeeded" == failed, so I can avoid some less competent
manager coming and putting a list of 'weird modules used in my project' under
my nose.

Cheers, Nadim.

PS: Some of you always send two mails, one to the original author and one
to the list. It's OK, but getting the information just once is, IMO, better.





Re: shouldn't "UNEXPECTEDLY SUCCEEDED" mean failure?

2007-12-03 Thread Adrian Howard


On 3 Dec 2007, at 04:34, Michael G Schwern wrote:
[snip]

> This doesn't mean that people don't use dies_ok() when they should use
> throws_ok().  Every tool is open to abuse.  The solution is not to remove the
> tool, but figure out why it's being abused.  Maybe the answer is as simple as
> putting throws_ok() first in the Test::Exception documentation?

[snip]

Which it has had in the Subversion repo for some time now. I just have to
get my lazy arse in gear to release it. Which I will do once I've
fixed the extant Win bug. Tonight probably.


Adrian 'in the TODO passes are not failures camp' Howard


Re: shouldn't "UNEXPECTEDLY SUCCEEDED" mean failure?

2007-12-03 Thread Adrian Howard


On 3 Dec 2007, at 10:26, A. Pagaltzis wrote:

> * Ovid <[EMAIL PROTECTED]> [2007-12-02 16:50]:
> > Breaking the toolchain is bad.
>
> You can almost imagine Curtis murmuring those words even in
> his sleep…

I have lost count of the number of times Andy/Ovid said this over the
LPW weekend :-)


Adrian

Re: shouldn't "UNEXPECTEDLY SUCCEEDED" mean failure?

2007-12-03 Thread A. Pagaltzis
* Ovid <[EMAIL PROTECTED]> [2007-12-02 16:50]:
> Breaking the toolchain is bad.

You can almost imagine Curtis murmuring those words even in
his sleep…

Regards,
-- 
Aristotle Pagaltzis // 


Re: shouldn't "UNEXPECTEDLY SUCCEEDED" mean failure?

2007-12-02 Thread Michael G Schwern
Chris Dolan wrote:
> On Dec 2, 2007, at 1:34 PM, nadim khemir wrote:
>> Because a TODO means that it is not done, not: it might happen to be
>> done but I'm not really sure, maybe I get lucky.
>>
>> Either one removes the TODO and all is fine. Or it might just be a
>> side effect
>> that you haven't planned that makes the test pass. Calling "unexpected
>> things" "features" does not make me feel more sure about quality.

(I missed the above part of the original post)

The red herring is "unexpected", and it's good that wording was removed from
TH 3.  The feature is expected, but that it worked is unexpected.  You're
talking like the TODO test is some slap-dash bit of code.  The TODO test is
crafted to test a specific feature or bug, just like any other test.

What you're asking, essentially, is how do you know that the passing TODO test
is testing what it's supposed to test?

Well, remove "TODO" from that question and it reduces to the more general
question:  how do you know a passing test is testing what it's supposed to
test?  That is a question with plenty of answers.


>> This is exactly why I complained loudly about Test::Exception::dies_ok.
>> It died, but not for the problem I expected. Others may be better or have
>> more luck than me, though.

There's nothing wrong with dies_ok(), if I understand your complaint.  It does
what it says on the tin, tests that the code died.  And that's useful,
sometimes you don't know or don't care or can't accurately predict what the
exception will be.  In these cases, without dies_ok() you'd have to fake up
some static exception for throws_ok().
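
Side by side, the two look roughly like this (risky_call() and the error
pattern are invented for illustration):

use Test::More tests => 2;
use Test::Exception;

# throws_ok() checks *what* the code died with; dies_ok() only that it died.
throws_ok { risky_call() } qr/no such frobulator/, "died for the expected reason";
dies_ok   { risky_call() }                         "died, reason unspecified";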

This doesn't mean that people don't use dies_ok() when they should use
throws_ok().  Every tool is open to abuse.  The solution is not to remove the
tool, but figure out why it's being abused.  Maybe the answer is as simple as
putting throws_ok() first in the Test::Exception documentation?

As an analogy, look at ok() vs is() in Test::More.  ok() just checks that the
statement is true, it doesn't care what it actually is.  is() gets more
specific.  In general, it's better to use is() but often you don't know (or
care) what the value actually is, just that it's true.  Removing ok() from
Test::More would increase the resolution but also remove a lot of flexibility,
just as removing dies_ok() would.
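
For illustration (frobnicate() and its expected value are made up):

use Test::More tests => 2;

ok( frobnicate(),              "returned something true" );   # any true value passes
is( frobnicate(), "cromulent", "returned exactly the expected value" );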


>> I really would like to have -Werror for my tests.

A warning is "hey, you might have done something wrong!"  A passing TODO test
is "hey, you might have done something right!"  :)


-- 
If at first you don't succeed--you fail.
-- "Portal" demo


Re: shouldn't "UNEXPECTEDLY SUCCEEDED" mean failure?

2007-12-02 Thread Chris Dolan

On Dec 2, 2007, at 4:11 PM, Michael G Schwern wrote:

> Fergal Daly wrote:
> > Another downside of using TODO like this is that when the external
> > module is fixed, you have to release a new version of your module with
> > the TODOs removed. These tests will start failing for anyone who
> > upgrades your module but not the broken one, but in reality nothing has
> > changed for that user,
>
> As long as you're releasing a new version, why would you not upgrade your
> module's dependency to use the version that works?

Yep, that's it exactly.  See, Perl::Critic's problem was that the PPI
SVN repository had important bug fixes for almost a full year
before an actual PPI release was pushed to CPAN.  Some of those bugs
caused significant false negatives in Perl::Critic's code checks.
The test was right, but the code was wrong, which sure seems like a
perfect definition of a TODO test.  I personally was developing
against the SVN PPI and adding TODOs and workarounds to support the
older version.


Since PPI 1.200 was released with all of the bug fixes and we  
released a Perl::Critic that requires 1.200, I fully agree that we  
*should* have removed the TODO flag on those tests.  But there's  
always other work to be done...  Patches welcome.  :-)


The problem with skipped tests is that they're easier for developers  
to ignore than TODO tests.  And you have to now worry about three  
things instead of two: is the code right, is the test right, and is  
the skip conditional right.  I've gotten the latter wrong before and  
lean toward TODO tests as a result.


Chris



Re: shouldn't "UNEXPECTEDLY SUCCEEDED" mean failure?

2007-12-02 Thread Chris Dolan

On Dec 2, 2007, at 1:34 PM, nadim khemir wrote:

> Because a TODO means that it is not done, not: it might happen to be done
> but I'm not really sure, maybe I get lucky.
>
> Either one removes the TODO and all is fine. Or it might just be a side
> effect that you hadn't planned for that makes the test pass. Calling
> "unexpected things" "features" does not make me feel more sure about
> quality.
>
> This is exactly why I complained loudly about Test::Exception::dies_ok.
> It died, but not for the problem I expected. Others may be better or have
> more luck than me, though.
>
> I really would like to have -Werror for my tests.
>
> Cheers, Nadim.


Then how do you prefer to express functionality that might work or  
might not depending on what other modules the user has installed on  
his/her machine?  I think skip() is inferior to TODO in such cases  
because the latter expresses the author's intention to make this  
feature work universally in the future.  For skip(), you don't know  
whether it expectedly failed or unexpectedly succeeded -- it's just  
skipped.


Chris



Re: shouldn't "UNEXPECTEDLY SUCCEEDED" mean failure?

2007-12-02 Thread Michael G Schwern
Fergal Daly wrote:
>> As long as you're releasing a new version, why would you not upgrade your
>> module's dependency to use the version that works?
> 
> Your module either is or isn't usable with version X of Foo.
>
> If it is usable then you would not change your dependency before or
> after the bug in version X is fixed (maybe I have a good reason not to
> upgrade Foo and you wouldn't want your module to refuse to install if
> it is actually usable).
> 
> If it isn't usable then marking your tests as TODO was the wrong thing
> to do in the first place, you should have bailed out due to
> incompatibility with version X and not bothered to run any tests at
> all. I think ExtUtils::MM does not have any way to specify complex
> version dependencies but with Module::Build you could say

ETOOBINARY

Modules do not have a binary state of working or not working.  They're
composed of piles of (often too many) features.  Code can be shippable without
every single thing working.

The TODO test is useful when the working version *does not yet* exist.  If
it's a minor feature or bug then rather than hold up the whole release waiting
for someone else to fix their shit, you can mark it TODO and release.  This is
the author's decision to go ahead and release with a known bug.  We do it all
the time, just not necessarily with a formal TODO test.


> I am basically against the practice of using TODO to cope with
> external breakage. Not taking unexpected passes seriously encourages
> this practice. Apart from there being other ways to handle external
> breakage that seem easier, using TODO is actually dangerous as it can
> cause false passes in 2 ways. Say version X of Foo has a non-serious
> bug, so you release version Y of Bar with some tests marked TODO. Then
> we risk

Maybe we're arguing two different situations.  Yours seems to be when there is
a broken version of a dependency, but a known working version exists.  In this
case, you're right, it's better resolved with a rich dependency system.

My case is when a working version of the dependency does not exist, or the
last working version is so old it's more trouble than it's worth.  In this
case the author decides the bug is not critical, can't be worked around and
doesn't want to wait for a fix in the dependency.  The decision is whether or
not to release with a known bug.  After that, wrapping it in a TODO test is
just an alternative to commenting it out.

Compare with the more common alternative for shipping with a known bug which
is to simply not have a test at all.


> 1 Version X+1 of Foo is even worse and will cause Bar to eat your dog.
> Sadly for your dog, the test that might have warned him has been
> marked TODO.

If they release Bar with a known bug against Foo X where your dog's fur is
merely a bit ruffled, then that's ok.  If version X+1 of Foo causes Bar to eat
your dog then why didn't their tests catch that?  Was there not a "dog not
eaten" test?  If not then that's just an incomplete test, the TODO test has
nothing to do with that.

The "dog not eaten" test wouldn't have been part of the TODO test, that part
worked fine when the author released and they'd have gotten the "todo passed"
message and known to move it out of the TODO block.

Or maybe they're just a cat person.

Point is, there's multiple points where good testing practice has to break
down for this situation to occur.  The use of TODO test is orthogonal.


> 2 You're using version X-1 of Foo, everything is sweet, your dog can
> relax. You upgrade to version Y+1 of Bar which has a newly introduced
> dog-eating bug. This bug goes undetected because the tests are marked
> TODO. So long Fido.

That's the author's (poor) decision to release with a known critical dog
eating bug.  The fact that it's in a TODO test is incidental.


> I still have not seen an example of using TODO in this manner that
> isn't better handled in a different way.
>
> As before, I am not advocating changing the current Test::* behaviour
> to fail on unexpected passes as that would just be a mess. It's just
> that whenever this is discussed it ends up with people advocating what
> I consider wrong and dangerous uses of TODO and so I am pointing this
> out again,

Most of the cases above boil down to "author decided to release with a known
critical bug" or "tests didn't check for a possible critical bug".

You're right in that marking something TODO is not an excuse to release with a
known critical bug, but I don't think anyone's arguing that.


-- 
I do have a cause though. It's obscenity. I'm for it.
- Tom Lehrer


Re: shouldn't "UNEXPECTEDLY SUCCEEDED" mean failure?

2007-12-02 Thread Fergal Daly
If you reply to this, please make sure you reply to the 2 cases
involving the dog; this is my main objection to using TODO tests in
this manner.

On 02/12/2007, Michael G Schwern <[EMAIL PROTECTED]> wrote:
> Fergal Daly wrote:
> > One of the supposed benefits of using TODO is that you will notice
> > when the external module has been fixed. That's reasonable but I don't
> > see a need to inflict the confusion of unexpectedly passing tests on
> > all your users to achieve this.
>
> Maybe we should just change the wording and presentation so we're not
> inflicting so much.
>
> Part of the problem is it screams "OMG!  UNEXPECTEDLY SUCCEEDED!" and the user
> goes "whoa, all caps" and doesn't know what to do.  It's the most screamingest
> part of Test::Harness 2.
>
> Fortunately, Test::Harness 3 toned it down and made it easier to identify 
> them.
>
> Test Summary Report
> ---
> /Users/schwern/tmp/todo.t (Wstat: 0 Tests: 2 Failed: 0)
>   TODO passed:   1-2
>
> TAP::Parser also has a "todo_passed" test summary method so you can
> potentially customize behavior of passing todo tests at your end.
>
>
> I agree with Eric, these tests are extra credit.  "Unexpectedly working
> better" != "failure" except in the most convoluted situations.  Their
> intention is to act as an alternative to commenting out a test which you can't
> fix right now.  An executable TODO list that tells you when you're done, so
> you don't forget.
>
> It should not halt installation, nothing's wrong as far as the user's
> concerned.  However, it does mean "investigate" and it would be nice if this
> information got back to the author.  It would be nice if CPAN::Reporter
> reported passing TODO tests... somehow.
>
>
> > Another downside of using TODO like this is that when the external
> > module is fixed, you have to release a new version of your module with
> > the TODOs removed. These tests will start failing for anyone who
> > upgrades your module but not broken one but in reality nothing has
> > changed for that user,
>
> As long as you're releasing a new version, why would you not upgrade your
> module's dependency to use the version that works?

Your module either is or isn't usable with version X of Foo.

If it is usable then you would not change your dependency before or
after the bug in version X is fixed (maybe I have a good reason not to
upgrade Foo and you wouldn't want your module to refuse to install if
it is actually usable).

If it isn't usable then marking your tests as TODO was the wrong thing
to do in the first place; you should have bailed out due to
incompatibility with version X and not bothered to run any tests at
all. I think ExtUtils::MM does not have any way to specify complex
version dependencies, but with Module::Build you could say

requires => {
  'Foo' => ' < X '
}

and you leave your tests untouched. When Foo X+1 comes out, you just
update your requires to say Foo => '!X' and everybody's happy.


I am basically against the practice of using TODO to cope with
external breakage. Not taking unexpected passes seriously encourages
this practice. Apart from there being other ways to handle external
breakage that seem easier, using TODO is actually dangerous as it can
cause false passes in 2 ways. Say version X of Foo has a non-serious
bug, so you release version Y of Bar with some tests marked TODO. Then
we risk

1 Version X+1 of Foo is even worse and will cause Bar to eat your dog.
Sadly for your dog, the test that might have warned him has been
marked TODO.

2 You're using version X-1 of Foo, everything is sweet, your dog can
relax. You upgrade to version Y+1 of Bar which has a newly introduced
dog-eating bug. This bug goes undetected because the tests are marked
TODO. So long Fido.

I still have not seen an example of using TODO in this manner that
isn't better handled in a different way.

As before, I am not advocating changing the current Test::* behaviour
to fail on unexpected passes as that would just be a mess. It's just
that whenever this is discussed it ends up with people advocating what
I consider wrong and dangerous uses of TODO and so I am pointing this
out again,

F


Re: shouldn't "UNEXPECTEDLY SUCCEEDED" mean failure?

2007-12-02 Thread Michael G Schwern
Fergal Daly wrote:
> One of the supposed benefits of using TODO is that you will notice
> when the external module has been fixed. That's reasonable but I don't
> see a need to inflict the confusion of unexpectedly passing tests on
> all your users to achieve this.

Maybe we should just change the wording and presentation so we're not
inflicting so much.

Part of the problem is it screams "OMG!  UNEXPECTEDLY SUCCEEDED!" and the user
goes "whoa, all caps" and doesn't know what to do.  It's the most screamingest
part of Test::Harness 2.

Fortunately, Test::Harness 3 toned it down and made it easier to identify them.

Test Summary Report
---
/Users/schwern/tmp/todo.t (Wstat: 0 Tests: 2 Failed: 0)
  TODO passed:   1-2

TAP::Parser also has a "todo_passed" test summary method so you can
potentially customize behavior of passing todo tests at your end.
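
Something along these lines should work (a rough, untested sketch; check the
TAP::Parser docs for the exact constructor arguments):

use TAP::Parser;

my $parser = TAP::Parser->new( { exec => [ $^X, "t/todo.t" ] } );
$parser->run;

# todo_passed() lists the test numbers that were TODO but passed anyway
if ( my @unexpected = $parser->todo_passed ) {
    print "TODO tests that passed: @unexpected\n";
}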


I agree with Eric, these tests are extra credit.  "Unexpectedly working
better" != "failure" except in the most convoluted situations.  Their
intention is to act as an alternative to commenting out a test which you can't
fix right now.  An executable TODO list that tells you when you're done, so
you don't forget.

It should not halt installation, nothing's wrong as far as the user's
concerned.  However, it does mean "investigate" and it would be nice if this
information got back to the author.  It would be nice if CPAN::Reporter
reported passing TODO tests... somehow.


> Another downside of using TODO like this is that when the external
> module is fixed, you have to release a new version of your module with
> the TODOs removed. These tests will start failing for anyone who
> upgrades your module but not broken one but in reality nothing has
> changed for that user,

As long as you're releasing a new version, why would you not upgrade your
module's dependency to use the version that works?


-- 
I am somewhat preoccupied telling the laws of physics to shut up and sit down.
-- Vaarsuvius, "Order of the Stick"
   http://www.giantitp.com/comics/oots0107.html


Re: shouldn't "UNEXPECTEDLY SUCCEEDED" mean failure?

2007-12-02 Thread Fergal Daly
http://www.mail-archive.com/perl-qa@perl.org/msg06865.html

has the previous round on this topic. My memory is hazy but my view
was that people are using TODO in strange ways and making this a
failure would break that. The strange way I remember (and has been
brought up again by Chris Dolan) is related to dealing with external
modules that are broken.

The idea is that you mark a test TODO when it  depends on an external
module whose latest version is broken. To me, this seems far better
handled by checking if the external dep is working correctly or not
and if not, SKIPing the affected tests.

One of the supposed benefits of using TODO is that you will notice
when the external module has been fixed. That's reasonable but I don't
see a need to inflict the confusion of unexpectedly passing tests on
all your users to achieve this. You could do that with a
developer-only test and that test should also be sent to the
maintainer of the broken module.

Another downside of using TODO like this is that when the external
module is fixed, you have to release a new version of your module with
the TODOs removed. These tests will start failing for anyone who
upgrades your module but not the broken one, but in reality nothing has
changed for that user, the installed modules are still identical but
the tests that were considered "ok to fail" have now morphed into
"must pass".

Again this is avoided by simply skipping the tests if you find the
well-known breakage in the external module.

So I agree with you but a lot of other people don't,

F

On 02/12/2007, nadim khemir <[EMAIL PROTECTED]> wrote:
> The subject says it all. IE:
>
> All tests successful (2 subtests UNEXPECTEDLY SUCCEEDED), 7 tests skipped.
> Passed TODO Stat Wstat TODOs Pass  List of Passed
> ---
> t/20_policies.t   152  578 583
>
> (nice reporting though)
>
> Nadim.
>


Re: shouldn't "UNEXPECTEDLY SUCCEEDED" mean failure?

2007-12-02 Thread Eric Wilhelm
# from nadim khemir
# on Sunday 02 December 2007 11:34:

>> How is "extra credit" *ever* any sort of failure?
>
>Because a TODO means that it is not done, not: it might happen to be
> done but I'm not really sure, maybe I get lucky.

No, the latter is almost exactly what "todo" means.  More precisely: it 
might pass in some situations/platforms/days-of-week, but addressing 
the case where it fails is an item on the todo list.

  ok 1 # TODO blah blah blah

That looks like success.

You can't treat passing TODO tests as failures.  Treat them as a 
different sort of success, flag them as "unexpected" or "requires 
attention" or "wibble" or whatever, but don't mark them as failures.  
Ever.

Is the problem simply that we have only "PASSED" and "FAILED" to choose 
from as answers?  If so, fix the problem where the problem lies instead 
of trying to unintuitively wedge more information into a boolean.

--Eric
-- 
To succeed in the world, it is not enough to be stupid, you must also be
well-mannered.
--Voltaire
---
http://scratchcomputing.com
---


Re: shouldn't "UNEXPECTEDLY SUCCEEDED" mean failure?

2007-12-02 Thread nadim khemir
On Sunday 02 December 2007 18:51, Eric Wilhelm wrote:
> # from Ovid
>
> # on Sunday 02 December 2007 07:47:
> >--- nadim khemir <[EMAIL PROTECTED]> wrote:
> >> The subject says it all. IE:
>
> ...
>
> >> t/20_policies.t   152  578 583
> >
> >It just means "you need to investigate this further".  Personally, I
> >would like to see it optionally mean failure,...
>
> How is "extra credit" *ever* any sort of failure?

Because a TODO means that it is not done, not: it might happen to be done but
I'm not really sure, maybe I get lucky.

Either one removes the TODO and all is fine. Or it might just be a side effect
that you hadn't planned for that makes the test pass. Calling "unexpected
things" "features" does not make me feel more sure about quality.

This is exactly why I complained loudly about Test::Exception::dies_ok. It died,
but not for the problem I expected. Others may be better or have more luck
than me, though.

I really would like to have -Werror for my tests.

Cheers, Nadim.


Re: shouldn't "UNEXPECTEDLY SUCCEEDED" mean failure?

2007-12-02 Thread Paul Johnson
On Sun, Dec 02, 2007 at 09:51:49AM -0800, Eric Wilhelm wrote:

> How is "extra credit" *ever* any sort of failure?

How is "didn't do what I expected" *ever* any sort of success?

Just playing devil's advocate here really, but experience has taught me
to be rather conservative when it comes to tests and that I'm not clever
enough to predict all the things that can go wrong, so the fewer I let
through the better.

-- 
Paul Johnson - [EMAIL PROTECTED]
http://www.pjcj.net


Re: shouldn't "UNEXPECTEDLY SUCCEEDED" mean failure?

2007-12-02 Thread Chris Dolan

On Dec 2, 2007, at 9:37 AM, nadim khemir wrote:


> The subject says it all. IE:
>
> All tests successful (2 subtests UNEXPECTEDLY SUCCEEDED), 7 tests skipped.
> Passed TODO Stat Wstat TODOs Pass  List of Passed
> ---
> t/20_policies.t   152  578 583
>
> (nice reporting though)
>
> Nadim.


No.  In this case, it means that we coded against PPI v1.118 which  
had bugs which were fixed in PPI v1.200. So, the passing TODO tests  
mean that Perl::Critic is behaving correctly for the more modern PPI  
which you presumably have installed.


In this fashion, we use TODO tests to track when bugs in our
dependent packages get fixed, and hence when we can remove the
workarounds for them.


Chris



Re: shouldn't "UNEXPECTEDLY SUCCEEDED" mean failure?

2007-12-02 Thread Eric Wilhelm
# from Ovid
# on Sunday 02 December 2007 07:47:

>--- nadim khemir <[EMAIL PROTECTED]> wrote:
>> The subject says it all. IE:
...
>> t/20_policies.t   152  578 583
>
>It just means "you need to investigate this further".  Personally, I
>would like to see it optionally mean failure,...

How is "extra credit" *ever* any sort of failure?

It does mean "the tests need to be adjusted", and maybe "the report 
should contain more info than just a boolean 'true'."

But it should definitely not even be an option to report it as a 
failure.

In an automated publishing tool, I would treat it as non-shippable by 
default but have an option to ship it anyway.  For instance:  a tricky 
cross-platform bug might fail on machines that the author doesn't have.

In nightly smoke reports, you might have the "everything's ok" set to 
silent, but would want to send mail when TODO()s pass.

Perhaps that means prove's return code is non-zero, or that the $answer 
ne "Result: PASS\n", but only if those are treated as "not failure".  
YAML?

--Eric
-- 
"Beware of bugs in the above code; I have only proved it correct, not
tried it."
--Donald Knuth
---
http://scratchcomputing.com
---


Re: shouldn't "UNEXPECTEDLY SUCCEEDED" mean failure?

2007-12-02 Thread nadim khemir
On Sunday 02 December 2007 16:47, Ovid wrote:
> --- nadim khemir <[EMAIL PROTECTED]> wrote:
> > The subject says it all. IE:
> >
> > All tests successful (2 subtests UNEXPECTEDLY SUCCEEDED), 7 tests
> > skipped.
> > Passed TODO Stat Wstat TODOs Pass  List of Passed
>
> ---
>
> > t/20_policies.t   152  578 583
>
> It just means "you need to investigate this further".  Personally, I
> would like to see it optionally mean failure, but you'd break a huge
> number of modules on the CPAN which have passing TODO tests if you made
> that mandatory.  Breaking the toolchain is bad.
>
> Cheers,
> Ovid

Not when the tool chain is bad ;)

But fine by me: if there were an option that I could use to force that kind of
behavior on my modules, I'd be happy enough.

Nadim.

PS: I'll also report the passing tests to the module author.


Re: shouldn't "UNEXPECTEDLY SUCCEEDED" mean failure?

2007-12-02 Thread Ovid
--- nadim khemir <[EMAIL PROTECTED]> wrote:

> The subject says it all. IE:
> 
> All tests successful (2 subtests UNEXPECTEDLY SUCCEEDED), 7 tests
> skipped.
> Passed TODO Stat Wstat TODOs Pass  List of Passed
>
---
> t/20_policies.t   152  578 583

It just means "you need to investigate this further".  Personally, I
would like to see it optionally mean failure, but you'd break a huge
number of modules on the CPAN which have passing TODO tests if you made
that mandatory.  Breaking the toolchain is bad.

Cheers,
Ovid

--
Buy the book  - http://www.oreilly.com/catalog/perlhks/
Perl and CGI  - http://users.easystreet.com/ovid/cgi_course/
Personal blog - http://publius-ovidius.livejournal.com/
Tech blog - http://use.perl.org/~Ovid/journal/