You may be able to assume a normal approximation to the binomial distribution and perform a one-sided test for proportions:
H0: p1 = p2  vs  HA: p1 > p2

where
p1 = proportion of errors in Year 1
p2 = proportion of errors in Year 2

As a rule, the normal approximation to the binomial only applies where
np > 10 AND n(1-p) > 10. This assumption may fail in your case for
individual hospitals where the counts are low or zero. However, you could
test all hospitals combined between Year 1 and Year 2. That will at least
give you an idea of whether the overall error rate decreased.

I found these resources on the Internet that may help:
http://www.tufts.edu/~gdallal/p.htm
Chapter 7.3.3 of http://www.itl.nist.gov/div898/handbook/index.htm

cre wrote:
> I've been out of the stats game since my u/grad days, but I think I have
> come across the need for doing some inferential stats. I'll explain my
> logic and see what you think.
>
> I analyse admissions data for a few hundred hospitals. Last year (Y2) I
> ran their data through a piece of software that picked up problems in
> the diagnosis codes. When I found problems I sent the data back to the
> hospitals for their assessment, and they had the option of correcting
> it. The rate of error was calculated as the number of records that
> generated a particular error over the number that could potentially have
> generated the error. The previous year (Y1), their data was not run
> through the software. I want to show that applying the treatment
> (sending the errors back) in Y2 resulted in significantly fewer errors
> than in Y1.
>
> Issue 1: The number of records that could potentially generate the
> errors (the denominator) at each hospital is not constant from one year
> to the next. For example:
>
> Hosp   Y1 Denom   Y1 Num   Y2 Denom   Y2 Num
> A      30017      198      31098      56
> B      378        3        420        0
> .
> .
> .
>
> To counter this I used the logic that if a denominator changed by a
> factor of 31098/30017, then with no treatment applied we should expect
> the numerator to change by the same factor. So I ended up with an actual
> Y2 numerator and an expected Y2 numerator.
> I wanted to test whether expected was significantly different from
> actual.
>
> Issue 2: The number of records hospitals will report is not normally
> (Gaussian) distributed (kurtosis 4.7, skew 2.3 for the expected
> numerator). We have a lot of very small hospitals (few records), some
> medium-sized, and some extremely large hospitals. When I take the log of
> the denominators and numerators, the data fits a straight line very
> nicely, so a transformation seems to be in order (r-squared .85
> (denominator) to .95 (numerator)). BUT a number of the hospitals will
> have a zero value in the expected numerator, the actual numerator, or
> both, and you can't take the log of zero. Zero, however, is very
> valuable information (the hospital had no errors in the diagnosis codes
> in question), and excluding the zero values would compromise the
> measurement of improvement. Is it reasonable to add a constant (like 1)
> to every value in the actual and expected numerators to get around this?
>
> To me, a simple t-test on the transformed data would either support or
> discredit my hypothesis that sending the errors back to the hospitals
> and giving them an opportunity to fix the data contributed to a
> statistically significantly smaller number of errors reported at the end
> of the second year. BUT I'm not sure about getting around the log of
> zero.
>
> Regards
>
> GPO
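P.S. A minimal sketch of the suggested one-sided two-proportion z-test,
using only Python's standard library. The function names are made up for
illustration, and hospital A's counts from the quoted table serve as
example data; in practice you would feed in the pooled totals across all
hospitals, as suggested above.

```python
import math

def two_proportion_z(x1, n1, x2, n2):
    """Test H0: p1 = p2 against HA: p1 > p2 (i.e. fewer errors in Year 2).

    x1, x2 = error counts; n1, n2 = records at risk in each year.
    Returns the z statistic and the one-sided p-value.
    """
    p1_hat = x1 / n1
    p2_hat = x2 / n2
    # Pooled estimate of the common proportion under H0.
    p_pool = (x1 + x2) / (n1 + n2)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
    z = (p1_hat - p2_hat) / se
    # Upper-tail probability of the standard normal.
    p_value = 0.5 * math.erfc(z / math.sqrt(2))
    return z, p_value

def normal_approx_ok(x, n):
    """Rule of thumb from above: np > 10 AND n(1 - p) > 10."""
    p = x / n
    return n * p > 10 and n * (1 - p) > 10

# Hospital A, Year 1 vs Year 2 (counts from the quoted table).
print(normal_approx_ok(198, 30017), normal_approx_ok(56, 31098))
z, p = two_proportion_z(198, 30017, 56, 31098)
print(z, p)
```

For very small hospitals (like B, with zero Year-2 errors) the
`normal_approx_ok` check fails and an exact test would be needed instead.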

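For concreteness, the expected-numerator scaling from Issue 1 and the
log(x + 1) workaround asked about in Issue 2 amount to the following.
This only illustrates the computation GPO describes (the helper names are
made up); it is not a recommendation for the add-1 constant over the
proportions test suggested above.

```python
import math

def expected_y2_num(y1_num, y1_denom, y2_denom):
    """Issue 1 logic: with no treatment, the numerator should scale by
    the same factor as the denominator."""
    return y1_num * y2_denom / y1_denom

def log_plus_one(x):
    """Issue 2 workaround: add a constant of 1 so zero counts stay in
    the analysis, since log(0) is undefined."""
    return math.log(x + 1)

# Hospital A: with no treatment we would expect about 205 errors in Y2,
# against the 56 actually observed.
print(expected_y2_num(198, 30017, 31098))
# A zero count maps to 0 rather than blowing up.
print(log_plus_one(0))  # 0.0
```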