On 5 Aug 2003 05:53:15 -0700, [EMAIL PROTECTED] (Louis T) wrote: [ snip, previous]
> I have more than 3000 bug reports. Understanding them all is as
> complex as the system is. This is why I want to sample it. Let's say
> that I will pick one report every 10 reports. This number is not fixed
> yet. My problem is here.

I still have great doubts about whether a statistical approach is
useful. It is going to depend, VERY strongly, on whether the 3000
errors can be treated as "independent" of each other.

 - They would be pretty much independent of each other if they arose
from 3000 programs written by 3000 different people; all in the same
computer language, or each in a different computer language; ... and
maybe a few other conditions would occur to me if I saw some numbers.

Or, they could be pretty much independent if all 3000 arose from the
same computer program -- that is the other way to aim for relative
independence. Again, you might want to figure on
all-from-one-programmer, or 3000 programmers, if you want to assert
independence.

>
> What I have read about the chi square seems quite interesting. I would
> like to say that the conclusion deducted from my sample is :
> "I have a 90% probability that the real proportion of structural bug
> (category 4) is inside the interval 10% to 20%." This interval will be
> given by my sampling.
> I hope that my knowledge of english does not make this sentence to
> foggy (???).
>

Okay. I think you are asking about describing a small fraction, and
putting a Confidence Limit around it. That part of the problem, the
numeric part, is not too hard.

For a small proportion, we can consider the "counts" to be numbers
that are distributed as Poisson. In that case, the square root of
the count is very close to being "normal" with standard error of 1/2.

Next step: the usual 95% CI is built by taking the mean, +/- twice
the SE -- which would thus be +/- 1.0 added to the square root of
the count.
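
The square-root rule described above can be sketched in a few lines of Python. This is only an illustration under the stated Poisson assumption: the function names `sqrt_poisson_ci` and `as_fractions` and the parameter `z` are invented here, and the clamp at zero for very small counts is an added assumption, not something the rule itself specifies.

```python
import math

# Standard error of sqrt(count) under the Poisson approximation.
SQRT_COUNT_SE = 0.5


def sqrt_poisson_ci(count, z=2.0):
    """Approximate CI on a Poisson count via the square-root transform.

    z=2.0 gives the usual (roughly) 95% interval; a smaller z, such as
    1.645, would give roughly 90%.
    """
    if count < 0:
        raise ValueError("count must be non-negative")
    root = math.sqrt(count)
    half_width = z * SQRT_COUNT_SE
    # Assumption: clamp at zero so a tiny count cannot produce a
    # negative lower limit on the square-root scale.
    low_root = max(root - half_width, 0.0)
    high_root = root + half_width
    # Square the limits to get back to the scale of counts.
    return low_root ** 2, high_root ** 2


def as_fractions(interval, n_sampled):
    """Translate a CI on counts into a CI on the sampled fraction."""
    low, high = interval
    return low / n_sampled, high / n_sampled


if __name__ == "__main__":
    ci = sqrt_poisson_ci(25)
    print(ci)
    print(as_fractions(ci, 500))
```

Keeping the interval on counts and converting to fractions in a separate step mirrors the point that the same interval applies whatever the sample size was.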
Example: If the Poisson count (something under 20% of what was
sampled) is 25, then the 95% CI on counts is (16, 36) since that is
the range implied by 5.0 +/- 1.0 .

You write the CI most readily on 'counts' but it translates directly
to fractions. It works as 25 out of 500, or out of 5000, or whatever.

-- 
Rich Ulrich, [EMAIL PROTECTED]
http://www.pitt.edu/~wpilib/index.html
"Taxes are the price we pay for civilization." Justice Holmes.
