Andreas J. Koenig wrote:
>>>>>> On Mon, 10 Dec 2007 21:12:51 -0800, Michael G Schwern <[EMAIL PROTECTED]> said:
> 
>   > Adam Kennedy posed me a stumper on #toolchain tonight.  In short, having a
>   > test which checks your signature doesn't appear to be an actual deterrent to
>   > tampering.  The man-in-the-middle can just delete the test, or just the
>   > SIGNATURE file since it's not required.  So why ship a signature test?
> 
> Asking the wrong question. None of our testsuites is there to protect
> against spoof or attacks. That's simply not the goal. Same thing for
> 00-signature.t

We would seem to be agreeing.  If the goal of the test suite is not to protect
against spoofing, and if it doesn't accomplish that anyway, why put a
signature check in there?
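
(For reference, the test under discussion is the boilerplate t/00-signature.t
that gets passed around; the usual version boils down to roughly this sketch,
give or take the exact wording:

    use strict;
    use warnings;
    use Test::More;

    # Skips rather than fails when the prerequisites are missing, which is
    # exactly why it's no deterrent: no Module::Signature installed, or no
    # SIGNATURE file shipped, and the check never happens.
    if ( !eval { require Module::Signature; 1 } ) {
        plan skip_all => "Module::Signature not installed";
    }
    elsif ( !-e 'SIGNATURE' ) {
        plan skip_all => "No SIGNATURE file in this distribution";
    }
    else {
        plan tests => 1;
        ok( Module::Signature::verify() == Module::Signature::SIGNATURE_OK(),
            "valid SIGNATURE" );
    }

Delete the test or the SIGNATURE file, as a man-in-the-middle can, and it
skips cleanly instead of catching anything.)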


>   > The only thing I can think of is to ensure the author that the signature
>   > they're about to ship is valid, but that's not something that needs to be shipped.
> 
> Has the world changed over night? Are we now questioning tests instead
> of encouraging them? Do now suddenly authors have to justify their
> testing efforts?
>
> I don't mind if we set up a few rules what tests should and should not
> do, but then this topic needs to be put into perspective.
> 
>   > It appears that a combination of a CHECKSUMS check against another CPAN mirror
>   > and a SIGNATURE check by a utility external to the code being checked is
>   > effective, and that's what the CPAN shell does.  The CHECKSUMS check makes
>   > sure the distribution hasn't been tampered with.  Checking against a CPAN
>   > mirror other than the one you downloaded the distribution from checks that the
>   > mirror has not been compromised.  Checking the SIGNATURE ensures that the
>   > module is from who you think it's from.
> 
> Yupp. And testing the signature in a test is better than not testing
> it because a bug in a signature or in crypto software is as alarming
> as a bug in perl or a module.

I believe this to be outside the scope of a given module's tests.  It's not
the responsibility of every CPAN module to make sure that your crypto software
is working.  Or perl.  Or the C compiler.  Or make.  That's the job of the
toolchain modules which more directly use them (CPAN, Module::Signature,
MakeMaker, Module::Build, etc...). [1]
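
(And those tools do cover it more directly.  Module::Signature ships a
standalone cpansign utility that can verify a distribution from the outside,
and the CPAN shell can be told to do the same check at install time; if memory
serves, the knob is check_sigs:

    cpan> o conf check_sigs 1
    cpan> o conf commit

That's the "utility external to the code being checked" case from above: the
verification happens when CPAN unpacks the distribution, before its test suite
ever runs, so it doesn't depend on the dist shipping its own signature test.)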

At some point you have to trust that the tools work; you can't test the whole
universe.  You simply don't have the time.

That brings me to the central reason why we've started to examine tests for
removal.  There's a certain cost/benefit ratio to be considered.  What's the
cost of implementing and maintaining a test, what's the benefit and does the
benefit justify the cost?  What's the opportunity cost, could you be doing
something more useful with that time and effort?  Finally, what's the cost in
terms of test suite confidence?  How many false negatives are your users
willing to endure before they lose confidence?

The fixed cost of a test is in writing it.  This includes both writing the
test itself and possibly altering the code being tested to make it testable.
It's a fixed cost because you do it once and then you're done.

The recurring costs include diagnosing failures.  The user loses time due to
a halted installation.  They contact the author, who has to diagnose the
failure and communicate the results back to the user.  If the test found
a bug, then the cost has a benefit and it's worthwhile.  But if the test
failed because it's a bad test, or because of something that's out of the
author's control and/or that the user doesn't care about, then there's little
or no benefit.

Then there's the cost of confidence.  Tests are only useful if someone pays
attention to them.  A failed test should be a clear indication of an actual
problem.  This is why "expected failures" (and their related "expected
warnings") are so insidious.  False failures erode the mental link between
"test failure" and "bug".  Get enough of them, and it doesn't take much, and
people start to ignore any failure.  This is one of the most dangerous social
problems for a test suite.

A test that results in a lot of false negatives has a high recurring cost and
no benefit.

Finally there's the question of opportunity cost.  Instead of writing and
maintaining a faulty test, what else could you have been doing with that time?
Could you have been doing something with an even higher benefit?  If so, you
should do it instead.


Let's look at the example of Test::More.  The last release has 120 passes and
just 4 failures.
http://cpantesters.perl.org/show/Test-Simple.html#Test-Simple-0.74

What are those four failures?  Three are due to a threading bug in certain
vendor-patched versions of perl; one is due to the broken signature test.

Look at the previous gamma release, 0.72.  256 passes, 9 failures.
5 due to the threading bug, 4 from the signature test.

0.71:  73 passes, 2 failures.  1 signature, 1 threads

0.70:  221 passes, 12 failures.  3 signature, 9 threads

And so on.  That's nine months with nothing but false negatives.  The
signature test is not actually indicating a failure in Test::More, so it's of
no benefit to me or the users, and the bug has already been reported to
Module::Signature.

The threading test is indicating a perl bug that's very difficult to detect
[2], only seems to exist in vendor-patched perls, is something I can't do
anything about, and is unlikely to affect anyone since there are so few
threads users.  It's already been reported to the various vendors, but it'll
clear up as soon as they stop mixing bleadperl patches into 5.8.

In short, I'm paying for somebody else's known bugs.  I get nothing.
Test::More gets nothing.  The tools get nothing.  Cost with no benefit.  So
why am I incurring these costs?  Maybe the individual users find out their
tools are broken, but it's not my job to tell them that.

I've kept the threading test in because the perl bug it's tickling does have a
direct effect on Test::More, and it could indicate future threading issues,
but lacking any way to resolve it I'm tempted to pull it.

The signature test, otoh, does not indicate anything that affects Test::More.
The ability or inability to check the signature has nothing to do with the
operation of Test::More.  So why am I checking it?  The Test::More test suite
isn't a full-service gas station.  It's not going to wash the windows and
check the oil and give you directions.  It makes sure Test::More works and
that's that.


As you can see, this is a considered analysis.  In general, redundant tests
are ok.  They're often not truly redundant but just have a large overlap.  And
this all assumes the tests have some benefit.

And often it's more trouble than it's worth to ferret out test redundancies.
Worse yet is the creeping mental attitude of "should I write this test?  Can I
justify the cost?"  This is like the related attitude of "can I justify
cleaning up this code?"  Unchecked, both lead to paralysis.

But in this case it's a test [the signature test] which has no benefit to the
module, indicates no failure of module functionality, reports on known bugs in
other tools, and is a check that should be (and is) done by other modules more
directly.  Cost and redundancy with no benefit.


[1]  There are exceptions.  For example, if you rely on a specific,
questionable (possibly undocumented) feature you should test that it's still
there to make diagnosing and debugging easier when it inevitably changes out
from under you.

[2]  You have to run the test dozens or hundreds of times to get it to fail on
an affected perl.


-- 
We do what we must because we can.
For the good of all of us,
Except the ones who are dead.
    -- Jonathan Coulton, "Still Alive"
