# from Andreas J. Koenig
# on Tuesday 09 September 2008 00:09:

>OK, I walk you through them. First off, there are ten cases in the
>file I sent you.
>...
> So the above is a case where it's impossible to judge without
> looking at the report but at the same time we cannot have any
> expectations about a single event when the previous outcome was
> diverse. Let's call it a case DUNNO.
>
>CGI-Application-Plugin-ValidateRM-2.2  0.2808     FAIL 18
>CGI-Application-Plugin-ValidateRM-2.2  0.2808_03  FAIL 2
>
> Seems like the exact right behaviour. Let's call it a case OK.
> ...
>So to sum up, we have found that two of the ten support the view that
>_03 is doing fine, one appears to be against but is proved wrong, so
>seven remaining are simply DUNNOs that we can ignore because they do
>not indicate that we have to doubt.
>
> > Did you chase-down several of those?
>
>No.
The judgement makes sense assuming an even scatter of machine
profiles, etc.  If it is not actually possible to correlate 0.2808_03
results to 0.2808 results, then I suppose judgement is all we can get.
(Being the sort that I am, I would feel a lot better with some form of
1:1 comparisons.)

>If somebody with strong statistics fu can measure the trustworthiness
>of the data in favor of a release, please speak up.

As something to consider for cpantesters 2.0, the ability to analyze
this sort of thing without statistics would be very useful.

> > 2. Where are these reports coming from?
>
>I have said it, I have (well, CPAN::Testers::ParseReport has)
>downloaded 56000 reports from
>http://www.nntp.perl.org/group/perl.cpan.testers/

No, I meant which *testers*.  How are the alpha versions getting
installed?  Is it manually, via some option in the automated smoke
tools, or what?  I have been under the impression that alpha
dependencies never got installed automatically.

>If dev releases pummel other authors it's a call for better tests. If
>your tests are good, then release early, release often and watch the
>results on cpantesters. The point of cpantesters for toolchain
>modules: they may not only watch their own but all test results where
>they might be involved.

How does this process work?  If I release an alpha of M::B with a bug,
how long before that irritates, distracts, and confuses a bunch of
other authors?  Meanwhile, I have to watch test results for 2000+
other dists to find it?  And what triggers the testing of dists that
use M::B -- is it only a newly uploaded dist?

Yes, Module::Build needs better tests.  It also needs somebody with
the time to write them.  (If Devel::Cover worked, I imagine it would
tell me that the coverage is rather low.)

If I had the time, before a release I would run the M::B tests on
multiple platforms and perl versions, then for each of those run
through a group of installs for dists known to use Module::Build with
known results from a previous run.  Those results could be pass or
fail -- the metric is whether the same dist does exactly the same
thing (e.g. builds ok and fails test X) with both versions of M::B.
That would be what I would consider a controlled test with
quantifiable results.  (A rough sketch of that sort of harness is
tacked on below my sig.)  Granted, you cannot prove a negative, but
the scientific method would give me a lot more confidence than
"probably is okay".

Thanks,
Eric
--
I arise in the morning torn between a desire to improve the world and
a desire to enjoy the world. This makes it hard to plan the day.
--E.B. White
---------------------------------------------------
http://scratchcomputing.com
---------------------------------------------------
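
P.S.  To make the "controlled test" idea above concrete, here is a
rough sketch of the kind of 1:1 harness I have in mind.  It assumes
the two Module::Build versions are already unpacked/installed into two
separate lib directories and that you have a hand-picked list of dist
directories to exercise; all of the paths and dist names below are
made-up placeholders, and a real harness would also need per-platform
and per-perl handling, logging, and better cleanup.

  #!/usr/bin/perl
  # Sketch only: run each dist's Build.PL/Build/Build test under two
  # installed copies of Module::Build and compare the outcomes 1:1.
  # Lib paths and dist directories are hypothetical placeholders.
  use strict;
  use warnings;

  my %mb_lib = (
    stable => '/tmp/mb-0.2808/lib',     # hypothetical 0.2808 install
    alpha  => '/tmp/mb-0.2808_03/lib',  # hypothetical 0.2808_03 install
  );
  my @dists = ('/tmp/dists/Foo-Bar-1.00', '/tmp/dists/Baz-Quux-0.03');

  my %result;  # $result{$dist}{stable|alpha} = 'pass' or 'fail'
  for my $dist (@dists) {
    for my $which (qw(stable alpha)) {
      chdir $dist or die "chdir $dist: $!";
      # PERL5LIB decides which Module::Build the child perl loads
      local $ENV{PERL5LIB} = $mb_lib{$which};
      my $ok = system($^X, 'Build.PL') == 0
            && system($^X, 'Build') == 0
            && system($^X, 'Build', 'test') == 0;
      $result{$dist}{$which} = $ok ? 'pass' : 'fail';
      system($^X, 'Build', 'realclean') if -e 'Build';
    }
  }

  # The metric: not "did it pass", but "did it do the same thing
  # under both versions of M::B".
  for my $dist (sort keys %result) {
    my ($s, $a) = @{ $result{$dist} }{qw(stable alpha)};
    printf "%-40s %s\n", $dist,
      $s eq $a ? "same ($s)" : "DIFFERS (stable=$s, alpha=$a)";
  }

Scale that over a decent list of M::B-using dists and a few perls and
a _03 either matches the previous version dist-for-dist or it doesn't.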