You may be able to use the normal approximation to the binomial
distribution and perform a one-sided two-sample test of proportions:

H0: p1 = p2   vs.   HA: p1 > p2

where

p1 = proportion of errors in Year 1
p2 = proportion of errors in Year 2
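
Under H0 the usual pooled test statistic is

    z = (p1hat - p2hat) / sqrt( phat*(1 - phat)*(1/n1 + 1/n2) )

where x1 and x2 are the error counts, n1 and n2 are the numbers of
records that could have generated the error in each year, p1hat = x1/n1,
p2hat = x2/n2, and phat = (x1 + x2)/(n1 + n2) is the pooled proportion.
Reject H0 for large positive z.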

As a rule of thumb, the normal approximation to the binomial only
applies when np > 10 AND n(1-p) > 10. This assumption may fail in your
case for individual hospitals where the counts are low or zero.
However, you could test all hospitals combined between Year 1 and
Year 2. That will at least give you an idea of whether the overall
error rate decreased.
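
As a minimal sketch, here is the pooled test in Python on the combined
totals. The counts below are made up for illustration; substitute the
real combined numerators and denominators for each year.

    from math import sqrt
    from scipy.stats import norm

    # Hypothetical combined totals -- replace with the real sums over
    # all hospitals for each year (these numbers are made up).
    x1, n1 = 1250, 410000    # Year 1: error records, records at risk
    x2, n2 = 900, 425000     # Year 2: error records, records at risk

    p1_hat = x1 / n1
    p2_hat = x2 / n2
    p_hat = (x1 + x2) / (n1 + n2)     # pooled proportion under H0

    # Rule-of-thumb check for the normal approximation.
    assert n1 * p_hat > 10 and n1 * (1 - p_hat) > 10
    assert n2 * p_hat > 10 and n2 * (1 - p_hat) > 10

    se = sqrt(p_hat * (1 - p_hat) * (1.0 / n1 + 1.0 / n2))
    z = (p1_hat - p2_hat) / se        # large positive z supports HA: p1 > p2
    p_value = norm.sf(z)              # one-sided (upper-tail) p-value

    print("z = %.3f, one-sided p = %.4g" % (z, p_value))

If you would rather use a packaged routine, proportions_ztest in
statsmodels.stats.proportion with alternative='larger' computes the
same pooled z.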

I found these resources on the Internet that may help:

http://www.tufts.edu/~gdallal/p.htm

Section 7.3.3 of the NIST/SEMATECH e-Handbook of Statistical Methods:
http://www.itl.nist.gov/div898/handbook/index.htm

cre wrote:
> I've been out of the stats game since my undergrad days, but I think I have come
> across the need for doing some inferential stats. I'll explain my logic and
> see what you think.
> 
> I analyse admissions data for a few hundred hospitals. Last year (Y2) I ran
> their data through a piece of software that picked up problems in the
> diagnosis codes. When I found problems I sent the data back to the hospitals
> for their assessment, and they had the option of correcting it. The rate of
> error was calculated as the number of records that generated a particular
> error over the number that potentially could have generated the error. The
> previous year (Y1), their data was not run through the software. I want to
> show that the application of the treatment (sending the errors back) in Y2
> resulted in significantly fewer errors than in Y1.
> 
> Issue 1:
> The number of records that could potentially generate the errors (denom) at
> each hospital is not constant from one year to the next. For example:
> 
> Hosp    Y1 Denom    Y1 Num    Y2 Denom    Y2 Num
> A          30017       198       31098        56
> B            378         3         420         0
> .
> .
> .
> 
> 
> To counter this I used the logic that if a denominator changed by a
> factor of 31098/30017, then with no treatment applied we should expect
> the numerator to change by the same factor (so hospital A's expected Y2
> numerator is 198 * 31098/30017, about 205). So I ended up with an actual
> Y2 numerator and an expected Y2 numerator. I wanted to test whether
> expected was significantly different from actual.
> 
> Issue 2: The number of records hospitals will report is not normally
> (Gaussian) distributed (kurtosis 4.7, skew 2.3 for expected numerator). We
> have a lot of very small hospitals (few records), some medium sized and some
> extremely large hospitals. When I take the log of the denominators and
> numerators, the data fits a straight line very nicely, so a
> transformation seems to be in order (r-squared .85 (denom) to .95
> (numerator)). BUT a number
> of the hospitals will have a zero value in either the expected numerator, or
> the actual numerator (or both). You can't take the log of zero. Zero,
> however, is very valuable information (the hospital had no errors in the
> diagnosis codes in question), and excluding the zero values would compromise
> the measurement of improvement. Is it reasonable to add a constant (like 1)
> to every value in the actual and expected numerators to get around this?
> 
> To me, a simple t-test on the transformed data would either support or
> discredit my hypothesis that sending the errors back to the hospitals and
> giving them an opportunity to fix the data contributed to a statistically
> significantly smaller number of errors reported at the end of the second
> year. BUT, I'm not sure about getting around the log of zero.
> 
> Regards
> 
> GPO
