Re: sensitivity vs. specificity in software testing

Ralph Corderoy Sat, 08 Apr 2023 10:26:42 -0700

Hi Branden,

> My personal test procedures, I think, adequately do this for man(7);
> every time I'm about to push I render all of our man pages (about 60
> source documents) to text and compare them to my cache of the ones I
> rendered the last time I pushed.

Yes, that's good practice for a lone developer.  Making it a hurdle that
everyone's changes must clear could be nice to have.

> I think there is a risk here of confounding macro package and
> formatter problems with output driver problems.  All should be tested,
> but not necessarily together except for inputs designed as integration
> tests.

I think the distinction between us is that I'm not talking about
designed inputs.  ‘.DS .bp .DE’ isn't a typical designed input.

> With the benefit of a few years' experience, I would claim that our
> defect rate in output drivers is pretty low compared to that in the
> formatter and (particularly) macro packages.

Yes, I'd have thought that likely.  Though fuzzing device drivers would
be fun.  :-)

> That is why the tests I've written have demonstrated an increasing
> bent toward use of "groff -Z" and "groff -a"; these produce
> device-independent output and "an abstract preview of output",
> respectively.

troff's output is device-dependent, as I just mentioned in another
thread, but I know what you mean.

    $ sdiff -s -w64 <(troff <<<foo) <(troff -TX100 <<<bar)
    x T ps                        | x T X100
    x res 72000 1 1               | x res 100 1 1
    s10000                        | s10
    V12000                        | V16
    H72000                        | H100
    tfoo                          | cb07a07rh5
    n12000 0                      | n16 0
    V792000                       | V1100
    $

I agree such formats are useful for written tests.

> I reiterate though, that the bugs we tend to encounter are detectable
> before getting to the output driver.

The bugs encountered so far are, yes.  But there are also the bugs not
yet seen.

The aim of formatting a corpus to pixels would be to quickly test a
growing set of real-world documents, to which adding another document
would be cheap.  The output of a preprocessor, troff, or a device driver
may change intentionally, and eyeballing those changes across the whole
corpus would be tedious and error-prone.  Intentional changes to the
pixels are rarer.  And eyeballing pixels to see the nature of a change
tends to be quick compared to working out what a diff at some stage of
the pipeline represents.

So a corpus diffed as pixels serves a different purpose from
hand-written coverage or regression tests, just as fuzzing attacks the
problem from yet another angle.
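A minimal sketch of that workflow in shell, under stated assumptions: every corpus document is taken to use the -ms macros (a real corpus would need per-document options), Ghostscript's gs must be on PATH, and render_pages and compare_pages are hypothetical names.  A byte-for-byte cmp of the PNG files stands in for a true pixel comparison, which assumes gs writes identical files for identical pages.

```shell
# render_pages DOC OUTDIR (hypothetical helper): format DOC with groff's
# PostScript device and rasterize each page into OUTDIR/page-NNN.png at
# 100 dpi greyscale.  Assumes -ms macros and Ghostscript on PATH.
render_pages() {
    mkdir -p "$2" &&
    groff -ms -Tps "$1" |
        gs -q -dSAFER -dBATCH -dNOPAUSE -sDEVICE=pnggray -r100 \
            -sOutputFile="$2/page-%03d.png" -
}

# compare_pages OLDDIR NEWDIR (hypothetical helper): print the name of
# every page whose bytes differ, that is new, or that has disappeared;
# return 1 if there was any such page, else 0.
compare_pages() {
    cp_status=0
    for new in "$2"/*.png; do
        [ -e "$new" ] || continue              # glob matched nothing
        if ! cmp -s "$1/${new##*/}" "$new"; then
            echo "changed: ${new##*/}"         # differs, or is a new page
            cp_status=1
        fi
    done
    for old in "$1"/*.png; do
        [ -e "$old" ] || continue
        if [ ! -e "$2/${old##*/}" ]; then
            echo "removed: ${old##*/}"         # page no longer produced
            cp_status=1
        fi
    done
    return "$cp_status"
}
```

Render each document before and after a change into fresh, empty directories (stale pages from a longer earlier rendering would otherwise linger), then run compare_pages on each pair; only the pages it names need eyeballing.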

> Several ways to skin this cat.  :)

Yes.  I'd be tempted to have a standard text encoding that stays readable
but compresses each run of two or more empty lines into a single "-N" line.

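    awk '
        !length     { b++; next }       # count a run of empty lines
        b == 1      { print "" }        # a lone empty line stays as it is
        b > 1       { print "-" b }     # a longer run becomes "-N"
                    { b = 0 }
        /^-+[0-9]/  { printf "-" }      # protect lines of dashes then a digit
        1
        END         { if (b == 1) print ""; if (b > 1) print "-" b }
    '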
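To check that such an encoding loses nothing a diff would care about, it helps to be able to undo it.  A minimal sketch of the inverse, assuming the only marker lines are the "-N" lines the awk program above prints and that protected lines carry exactly one extra leading "-"; blank_decode is a hypothetical name.

```shell
# blank_decode (hypothetical helper): read the encoding on stdin and
# write the original text, expanding each "-N" line back into N empty
# lines and removing the extra "-" from protected lines.
blank_decode() {
    awk '
        /^-[0-9]+$/ {                    # marker for a run of N empty lines
            n = substr($0, 2) + 0
            while (n-- > 0) print ""
            next
        }
        /^--+[0-9]/ { sub(/^-/, "") }    # drop the protecting "-"
        1
    '
}
```

Piping a rendering through the encoder and then blank_decode should reproduce it, which is a cheap check that two different renderings cannot collapse to the same encoded text.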

One could also highlight or encode tabs and end-of-line white space,
making them obvious to the reader and protecting them from accidental
change.
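A sketch of one such marking in the same awk style, assuming angle-bracket markers are acceptable: each tab becomes <TAB> and a run of n trailing spaces becomes <SPn>.  The name show_ws is hypothetical.

```shell
# show_ws (hypothetical helper): make tabs and trailing spaces visible.
# Not reversible: text that already contains "<TAB>" would be
# indistinguishable from a marked tab.
show_ws() {
    awk '
        { gsub(/\t/, "<TAB>") }          # every tab, wherever it occurs
        match($0, / +$/) {               # a run of spaces at end of line
            $0 = substr($0, 1, RSTART - 1) "<SP" RLENGTH ">"
        }
        1
    '
}
```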

-- 
Cheers, Ralph.