Those who don't follow TAP::Parser closely might now know about the issue that's currently being struggled with. It has to do with how TAP::Parser guarantees that STDOUT and STDERR will be in sync, something Test::Harness does not guarantee.
Test::Harness only captures STDOUT from a test script. It leaves STDERR alone and thus it normally goes to the screen. This can cause subtle problems where STDOUT and STDERR get out of sync. STDOUT can be slightly delayed by TH's having to read it, process it and then spit it back out while STDERR is displaying to the screen as fast as the test script can print it. So sometimes when you use verbose test output warnings can appear to come before the test which caused it. This is a pretty common problem with anything that tries to capture and then stitch back together STDOUT and STDERR. Ovid said he had many requests and people begging for him to fix this in TAP::Parser. So he did. TAP::Parser solves this by piping STDOUT and STDERR together into one stream. Since its reading from one stream everything comes in order and thus can be spit out in order. Seems simple and everyone is happy, right? There's a price for this. And I want to make folks aware of this price. Ovid is insistent that syncing the streams is very important to TAP::Parser users, but maybe once they know what they're paying for it folks won't be quite so insistent and we can remove it. First thing is breaks, and probably most important: No warnings. Everything going to STDERR is eaten by TAP::Parser. It cannot know the difference between lines going to STDOUT and STDERR so it cannot tell a warning from some junk printed to STDOUT. $ cat ~/tmp/foo.t #!/usr/bin/perl -w print "1..1\n"; $foo = 1 + undef; print "ok 1\n"; print "This is some diagnostic junk that shouldn't be shown.\n"; $ prove ~/tmp/foo.t /Users/schwern/tmp/foo....Name "main::foo" used only once: possible typo at /Users/schwern/tmp/foo.t line 4. Use of uninitialized value in addition (+) at /Users/schwern/tmp/foo.t line 4. /Users/schwern/tmp/foo....ok All tests successful. Files=1, Tests=1, 0 wallclock secs ( 0.00 cusr + 0.00 csys = 0.00 CPU) $ runtests ~/tmp/foo.t /Users/schwern/tmp/foo......ok All tests successful. Files=1, Tests=1, 0 wallclock secs ( 0.00 cusr + 0.00 csys = 0.00 CPU) This is really bad, its a severe loss of test information and cripples the ability to debug tests. One alternative is to display all junk lines. This will make passing tests noisy. Noise when there is no problem is bad, for the same reason that "expected" warnings are bad. It desensitizes the user and they won't pay attention to them when there's a real failure. Its an important principle of a good test suite. Number two, it displays either *all* diagnostics, meaning diagnostics printed to STDOUT and failure diagnostics printed to STDERR or it displays *no* diagnostics $ cat ~/tmp/foo.t #!/usr/bin/perl -w $| = 1; print "1..1\n"; print "# This is some info about the next test which should not be shown\n"; print "not ok 1\n"; print STDERR "# This is a failure diagnostic about the test which failed.\n"; $ prove ~/tmp/foo.t /Users/schwern/tmp/foo....# This is a failure diagnostic about the test which failed. /Users/schwern/tmp/foo....FAILED test 1 Failed 1/1 tests, 0.00% okay Failed Test Stat Wstat Total Fail List of Failed ------------------------------------------------------------------------------- /Users/schwern/tmp/foo.t 1 1 1 Failed 1/1 test scripts. 1/1 subtests failed. Files=1, Tests=1, 0 wallclock secs ( 0.00 cusr + 0.00 csys = 0.00 CPU) Failed 1/1 test programs. 1/1 subtests failed. $ runtests ~/tmp/foo.t /Users/schwern/tmp/foo......# This is some info about the next test which should not be shown /Users/schwern/tmp/foo......1/1 # This is a failure diagnostic about the test which failed. /Users/schwern/tmp/foo...... Failed 1/1 subtests Test Summary Report ------------------- /Users/schwern/tmp/foo.t (Wstat: 0 Tests: 1 Failed: 1) Failed tests: 1 Files=1, Tests=1, 0 wallclock secs ( 0.00 cusr + 0.00 csys = 0.00 CPU) $ runtests -q ~/tmp/foo.t /Users/schwern/tmp/foo...... Failed 1/1 subtests Test Summary Report ------------------- /Users/schwern/tmp/foo.t (Wstat: 0 Tests: 1 Failed: 1) Failed tests: 1 Files=1, Tests=1, 0 wallclock secs ( 0.00 cusr + 0.00 csys = 0.00 CPU) Again, your choice is either noisy passing tests or no failure information. There is no option to only display the failure diagnostic like prove does. Losing failure diagnostics is worse, you can't debug the test output. The third problem is how TODO tests are handled. TAP::Parser has a heuristic to guess when a set of failure diagnostics are associated with a TODO test so it doesn't display them. The heuristic is simply "don't display anything between the TODO failure and the next test". This mean you can lose important diagnostic information. $ cat ~/tmp/foo.t #!/usr/bin/perl -w $| = 1; print "1..1\n"; print "not ok 1 # TODO\n"; print "# This is a failure diagnostic about the test which failed.\n"; print "# They should not be shown because its TODO.\n"; print STDERR "# This is about the next failure. It should be shown.\n"; print "not ok 2\n"; $ prove ~/tmp/foo.t /Users/schwern/tmp/foo....# This is about the next failure. It should be shown. /Users/schwern/tmp/foo....FAILED test 2 Failed 1/1 tests, 0.00% okay Failed Test Stat Wstat Total Fail List of Failed ------------------------------------------------------------------------------- /Users/schwern/tmp/foo.t 1 2 2 Failed 1/1 test scripts. 0/1 subtests failed. Files=1, Tests=1, 0 wallclock secs ( 0.00 cusr + 0.00 csys = 0.00 CPU) Failed 1/1 test programs. 0/1 subtests failed. $ runtests ~/tmp/foo.t /Users/schwern/tmp/foo...... All 1 subtests passed Test Summary Report ------------------- /Users/schwern/tmp/foo.t (Wstat: 0 Tests: 2 Failed: 1) Failed tests: 2 Parse errors: Bad plan. You planned 1 tests but ran 2. Files=1, Tests=2, 0 wallclock secs ( 0.00 cusr + 0.00 csys = 0.00 CPU) There are various additional heuristics proposed to patch up these things, but they'll always be heuristcs and thus will always have uncertainty. Uncertainty and heuristics should be ruthlessly removed from a testing system. Another suggestion is that the user should run with -q normally and rerun a failing test with -v to get all their information. This is a poor option for several reasons. First, a passing test might have a boatload of warnings which you'll never see and thus won't know to rerun the test. Second, heisenbugs can cause a test to fail one moment and pass another. Thus the second run will not provide the same information. Third, rerunning a test is often slow either because the test itself is slow or because there's a long delay in having it rerun. Consider a user who emails you a bug report. You have to mail them back and ask them to rerun the test and then wait for their replay, an awfully long lag time. It might be impossible to rerun the test as the results were archived somewhere, for example CPAN Testers, and the same environment is no longer available. Finally, a good parser should shield the user from having to manually eyeball raw test output. What it boils down to is this: Displaying STDOUT and STDERR a little out of sync isn't good and should be fixed, but it doesn't lose us important information. Everything that should be displayed is displayed, perhaps slightly out of order. The consequences of fixing it *does* lose important information. It introduces heuristics and uncertainty to decide what should and should not be displayed. You can no longer be sure that your harness is telling you everything, maybe it swallowed something. Uncertainty and heuristics are bad in a testing system. So to sync up STDOUT and STDERR you pay a price. You fix one thing and break three others. This is not a way forward. The simplest solution, and one which unblocks Test::Harness 3.0 because TAP::Parser breaks compatibility in its current state, is to remove the syncing feature.