On Sep 12, 2006, at 9:24 AM, Salve J Nilsen wrote:
>> Any metric that catches bad things, particularly bad technical
>> things, is going to be just fine.
>> Metrics that try to push "good" behavior are fraught with trouble,
>> because they start pushing people in odd directions.
>
> Do you have an example of this? (Any pointer would be wonderful.)
I have two: pod.t and pod_coverage.t. These are pointless to run on
an end-user's machine. At best they are redundant to immutable tests
already run on the author's machine and just waste processor cycles.
At worst they fail and cause false negative test reports. The
prevalence of those two tests in CPAN modules is almost entirely due
to the influence of CPANTS.
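For concreteness, here is the boilerplate that most of those
distributions ship more or less verbatim (the pattern recommended in
the Test::Pod and Test::Pod::Coverage documentation):

```perl
# Typical t/pod.t: checks every .pm/.pod file for POD syntax errors.
use Test::More;
eval "use Test::Pod 1.00";
plan skip_all => "Test::Pod 1.00 required for testing POD" if $@;
all_pod_files_ok();
```

```perl
# Typical t/pod_coverage.t: checks that every public sub is documented.
use Test::More;
eval "use Test::Pod::Coverage 1.00";
plan skip_all => "Test::Pod::Coverage 1.00 required" if $@;
all_pod_coverage_ok();
```

Note that this boilerplate at least skips rather than fails when the
test modules are absent; the false negatives presumably come from
version skew and stricter checks on end-user machines.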
Despite the criticisms above, the CPANTS POD tests have ultimately
succeeded: they have convinced authors to do a better job of
documenting all methods, or marking private methods as such. I think
few would dispute that the POD tests, in particular, have had a net
positive effect on the quality of CPAN.
===
Now begins a huge digression on encouraging good behavior vs.
discouraging bad behavior, leading to a recommendation for CPANTS.
One flaw in the language of Adam's assertion is that he doesn't
properly distinguish the goals and metrics of CPANTS. Discouraging a
specific bad behavior is just a way of encouraging other unspecified
behavior, which could be good or bad. IF FEASIBLE, it's always
better to encourage good behavior. The danger is not metrics that
encourage good behavior, but instead metrics that encourage a
specific good behavior when there are a multitude of equally-valid
good behaviors. In that case, discouraging the bad behaviors is the
best you can do. I believe that's what Adam was trying to say.
I'm going to continue with the specific example of POD Kwalitee. The
CPANTS goal is (obviously) to encourage higher quality
documentation. However, that's a hard thing for a computer to
measure. So, instead we try to discourage specific bad behaviors:
POD syntax errors and undocumented subroutines.
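That second behavior is mechanically easy to detect. A sketch using
Pod::Coverage (the module underneath Test::Pod::Coverage); the package
name here is purely a placeholder:

```perl
# Sketch: list undocumented ("naked") public subroutines in a package.
# 'My::Module' is illustrative; the module must already be installed
# or otherwise loadable.
use Pod::Coverage;

my $pc = Pod::Coverage->new(package => 'My::Module');
if (defined $pc->coverage) {
    printf "POD coverage: %.0f%%\n", 100 * $pc->coverage;
    print "undocumented: $_\n" for $pc->naked;
}
else {
    print "unrated: ", $pc->why_unrated, "\n";
}
```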
Let me run through an exercise. In the first step, consider how one
would arrive at the need for CPANTS POD tests:
Goal: encourage high-quality CPAN packages
  Assertion: high-quality packages have high-quality documentation
    Assertion: high-quality documentation is parseable by doc tools
      Subgoal: discourage invalid POD
        Measure: Is the POD valid for each module in the package?
    Assertion: high-quality documentation describes every public
      subroutine
      Subgoal: discourage undocumented subs
        Measure: Does each module in the package have documentation
          for every public sub?
The next step in the exercise is how to implement those measures. The
current CPANTS uses simple proxies: namely, we assume that if there
is a t/*pod.t file present then the POD is valid, and if there is a
t/*pod_coverage.t present then all subroutines are documented.
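In other words, the proxy measures reduce to file-existence tests;
something along these lines (a sketch of the idea, not the actual
CPANTS code):

```perl
# Sketch of the proxy metric: kwalitee credit is granted for the mere
# presence of the test files, regardless of what they do when run.
sub has_proper_pod_tests {
    my ($dist_dir) = @_;
    my @pod_tests      = glob("$dist_dir/t/*pod.t");
    my @coverage_tests = glob("$dist_dir/t/*pod_coverage.t");
    return (@pod_tests && @coverage_tests) ? 1 : 0;
}
```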
Note that my subgoals are stated as discouraging bad behavior. It's
always easier to test for failures than successes (case in point:
governments usually create laws, not commandments). The CPANTS POD
tests, however, check for good behavior ("Thou shalt add pod.t to thy
package") instead of checking for bad behavior ("Thou shalt not
include invalid POD in thy module").
Wouldn't it be better to just measure the POD validity directly
instead of using a proxy for that measurement? As an outsider, I'll
guess that CPANTS resorts to the proxies for these reasons, in order
of importance:
1) reliability
2) ease of implementation
3) speed of evaluation
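For the record, the direct measurement is not hard in principle; here
is a sketch using the core Pod::Checker module:

```perl
# Sketch: count POD syntax errors in a file directly, instead of
# inferring validity from the presence of a t/*pod.t file.
use Pod::Checker;

sub pod_error_count {
    my ($file) = @_;
    my $checker = Pod::Checker->new(-warnings => 0);
    $checker->parse_from_file($file, \*STDERR);
    return $checker->num_errors();   # -1 means the file has no POD
}
```

The hard part is not the measurement itself but making its results
cheap, reliable, and reproducible for authors, which is where the
three reasons above come in.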
Certainly, CPANTS wants to avoid false negatives at all costs. Its
impact on the community is purely voluntary, so it wants to avoid
antagonizing authors. If CPANTS mistakenly says that your module has
incomplete POD coverage when you *know* that you have documented
every method, you're going to be annoyed. Some authors may decline
to participate in CPANTS if they get annoyed enough. So, false
negatives are perilous to the success of the entire project.
I believe the main reason that CPANTS measures t/*pod*.t existence
instead of directly running Test::Pod/Test::Pod::Coverage is that the
latter is harder for the author to judge consistently before he/she
uploads to CPAN. But, with the improved availability of offline
CPANTS analysis (via Module::CPANTS::Analyse), it should be feasible
for authors to get rid of more complex false negatives before
uploading to CPAN.
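Concretely, that author-time check might look like this
(cpants_lint.pl ships with Module::CPANTS::Analyse; the exact script
name and invocation may vary by version, and the tarball name is a
placeholder):

```shell
# Build the release tarball, then analyse it offline with the same
# code that cpants.perl.org runs.
perl Makefile.PL
make dist
cpants_lint.pl My-Module-0.01.tar.gz
```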
So, as a technological expedient, CPANTS is encouraging a sub-optimal
good behavior (adding t/*pod*.t to CPAN releases) in the process of
trying to discourage a bad behavior. To fix this, we need to remove
the need for the expedient. That means letting CPANTS perform more
complicated analyses and letting authors test those analyses offline
exactly as they would be performed online on cpants.perl.org.
Thus, I finally get to an action item: CPANTS should encourage
authors to run Module::CPANTS::Analyse offline before uploading to
CPAN. I assert that if we can convince authors to perform more
thorough tests of their packages at author-time, then the quality of
CPAN will improve. And the more closely the metrics match our real
quality goals, the bigger the quality delta we will achieve.
Chris
--
Chris Dolan, Software Developer, Clotho Advanced Media Inc.
608-294-7900, fax 294-7025, 1435 E Main St, Madison WI 53703
vCard: http://www.chrisdolan.net/ChrisDolan.vcf
Clotho Advanced Media, Inc. - Creators of MediaLandscape Software
(http://www.media-landscape.com/) and partners in the revolutionary
Croquet project (http://www.opencroquet.org/)