>>>>> On Mon, 8 Sep 2008 16:36:00 -0700, Eric Wilhelm <[EMAIL PROTECTED]> said:

  > # from Andreas J. Koenig
  > # on Monday 08 September 2008 15:16:

 >> Since yesterday I have downloaded and analysed ~56000 testreports from
 >> cpantesters and found ~135 distros that have been tested by both MB
 >> 0.2808 and 0.2808_03. There is only one result (Test-Group-0.12) that
 >> looks bad but it turns out to be due to broken Test::More 0.81_01. All
 >> others suggest that _03 is doing well.

  > Umm... okay.

  > 1.  I see a lot of m/0.2808_03 +FAIL/ in there.

OK, let me walk you through them. First off, there are ten cases in the
file I sent you.

B-Generate-1.13                        0.2808             FAIL           5
B-Generate-1.13                        0.2808             PASS           6
B-Generate-1.13                        0.2808_03          FAIL           1

  So the above is a case where it's impossible to judge without
  looking at the report, but at the same time we cannot have any
  expectation about a single event when the previous outcomes were
  mixed. Let's call this case a DUNNO.

CGI-Application-Plugin-ValidateRM-2.2  0.2808             FAIL          18
CGI-Application-Plugin-ValidateRM-2.2  0.2808_03          FAIL           2

  Seems like exactly the right behaviour. Let's call this case an OK.

Devel-LeakTrace-0.05                   0.2808             FAIL          43
Devel-LeakTrace-0.05                   0.2808             PASS           6
Devel-LeakTrace-0.05                   0.2808_03          FAIL           1

  It's a DUNNO, but the likelihood is high that we need not look more
  closely at this one.

HTTP-Proxy-0.23                        0.2808             FAIL           8
HTTP-Proxy-0.23                        0.2808             PASS           5
HTTP-Proxy-0.23                        0.2808_03          FAIL           6
HTTP-Proxy-0.23                        0.2808_03          PASS           1

  Although it's a DUNNO, the distribution of fails and passes looks
  quite good.

Math-BaseCalc-1.012                    0.2808             FAIL           9
Math-BaseCalc-1.012                    0.2808             PASS           9
Math-BaseCalc-1.012                    0.2808_03          FAIL           1

  A DUNNO.

Metaweb-0.05                           0.2808             FAIL          14
Metaweb-0.05                           0.2808             PASS          10
Metaweb-0.05                           0.2808_03          FAIL           1

  DUNNO

Parse-BACKPAN-Packages-0.33            0.2808             FAIL          18
Parse-BACKPAN-Packages-0.33            0.2808             PASS           8
Parse-BACKPAN-Packages-0.33            0.2808_03          FAIL           1

  DUNNO

Template-Plugin-Class-0.13             0.2808             FAIL           6
Template-Plugin-Class-0.13             0.2808             PASS          55
Template-Plugin-Class-0.13             0.2808_03          FAIL           1

  DUNNO

Test-Group-0.12                        0.2808             PASS          47
Test-Group-0.12                        0.2808_03          FAIL           1

  A WHOAA THERE, that seems to indicate that something's wrong. But as I
  explained in the previous mail, this is due to the broken Test-Simple
  dev release.

Test-JSON-0.06                         0.2808             FAIL          15
Test-JSON-0.06                         0.2808             PASS          44
Test-JSON-0.06                         0.2808_03          FAIL           1

  A DUNNO again.


So to sum up: two of the ten cases support the view that _03 is doing
fine, one appears to speak against it but has been shown to be a false
alarm, and the seven remaining ones are simply DUNNOs that we can
ignore because they give us no reason for doubt.

  > Did you chase-down several of those?

No.

  > Are you saying that having
  > "some fail" on 0.2808 implies that "some fail" on 0.2808_03 means
  > no regression, or did you manage to somehow correlate the
  > 0.2808_03 fails to the same machines sending 0.2808 fails?

As explained above, I used judgement. If somebody with strong
statistics fu can measure the trustworthiness of the data in favor of
a release, please speak up.
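
Just to make concrete what such a measurement could look like, here is
a minimal sketch (not something I have run over the whole data set)
that applies a two-sided Fisher's exact test to the aggregated
PASS/FAIL counts of a single distro. The counts in the example are the
HTTP-Proxy-0.23 rows from above; everything else is just illustration.

use strict;
use warnings;

# log-factorial, to keep the arithmetic stable
sub lfact {
    my $n = shift;
    my $s = 0;
    $s += log($_) for 2 .. $n;
    return $s;
}

# hypergeometric probability of one particular 2x2 table
#   a b   (row 1 = 0.2808:    FAIL, PASS)
#   c d   (row 2 = 0.2808_03: FAIL, PASS)
sub table_prob {
    my ($a, $b, $c, $d) = @_;
    my $n = $a + $b + $c + $d;
    return exp(  lfact($a + $b) + lfact($c + $d)
               + lfact($a + $c) + lfact($b + $d)
               - lfact($n)
               - lfact($a) - lfact($b) - lfact($c) - lfact($d));
}

# two-sided Fisher's exact test: sum the probabilities of all tables
# with the same margins that are no more likely than the observed one
sub fisher_2x2 {
    my ($a, $b, $c, $d) = @_;
    my $p_obs = table_prob($a, $b, $c, $d);
    my $row1  = $a + $b;
    my $col1  = $a + $c;
    my $n     = $a + $b + $c + $d;
    my $p     = 0;
    for my $x (0 .. $row1) {
        my $xb = $row1 - $x;
        my $xc = $col1 - $x;
        my $xd = $n - $row1 - $xc;
        next if $xc < 0 || $xd < 0;
        my $px = table_prob($x, $xb, $xc, $xd);
        $p += $px if $px <= $p_obs + 1e-12;
    }
    return $p;
}

# HTTP-Proxy-0.23: 0.2808 saw 8 FAIL / 5 PASS, 0.2808_03 saw 6 FAIL / 1 PASS
printf "p = %.3f\n", fisher_2x2(8, 5, 6, 1);

A large p-value here only says that the observed counts cannot
distinguish the two versions, which is more or less what I called a
DUNNO above.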

  > 2.  Where are these reports coming from?

As I said, I have (well, CPAN::Testers::ParseReport has) downloaded
~56000 reports from
http://www.nntp.perl.org/group/perl.cpan.testers/
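
The aggregation behind the tables above is conceptually nothing more
than counting (distro, Module::Build version, grade) triples. A toy
sketch of that counting step, assuming the interesting fields have
already been pulled out of each report into one whitespace-separated
line (that input format is made up for the example; in reality
CPAN::Testers::ParseReport does the extraction):

use strict;
use warnings;

# Expected input (hypothetical format, one line per report):
#   <distro> <Module::Build version> <grade>
my %count;
while (<>) {
    chomp;
    my ($dist, $mb_version, $grade) = split ' ';
    next unless defined $grade;
    $count{$dist}{$mb_version}{$grade}++;
}

for my $dist (sort keys %count) {
    for my $mb (sort keys %{ $count{$dist} }) {
        for my $grade (sort keys %{ $count{$dist}{$mb} }) {
            printf "%-38s %-18s %-6s %4d\n",
                $dist, $mb, $grade, $count{$dist}{$mb}{$grade};
        }
    }
}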

  > Again, the subject of false 
  > fails:  I would hate for testers to be pummelling other authors with 
  > alpha M::B errors while the M::B maintainers are left blissfully 
  > ignorant.

<plug>
Toolchain maintainers will probably want to use ctgetreports which
comes with CPAN::Testers::ParseReport.
</plug>
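
For example, to pull all reports for a distro and see which
Module::Build version each report was generated with, the invocation
is roughly of the form

  ctgetreports --q mod:Module::Build Template-Plugin-Class

(the manpage documents the full set of --q query fields).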

If dev releases pummel other authors, that's a call for better tests.
If your tests are good, then release early, release often, and watch
the results on cpantesters. That is the point of cpantesters for
toolchain modules: their maintainers can watch not only their own test
results but all test results they might be involved in.

-- 
andreas
