I've been out of the stats game since my undergrad days, but I think I've
come across the need for some inferential statistics. I'll explain my logic
and see what you think.

I analyse admissions data for a few hundred hospitals. Last year (Y2) I ran
their data through a piece of software that picked up problems in the
diagnosis codes. When I found problems I sent the data back to the hospitals
for their assessment, and they had the option of correcting it. The rate of
error was calculated as the number of records that generated a particular
error divided by the number that could potentially have generated that
error. The previous year (Y1), their data was not run through the software.
I want to show that applying the treatment (sending the errors back) in Y2
resulted in significantly fewer errors than in Y1.

Issue 1:
The number of records that could potentially generate the errors (denom) at
each hospital is not constant from one year to the next. For example:

Hosp    Y1 Denom    Y1 Num    Y2 Denom    Y2 Num
A          30017       198       31098        56
B            378         3         420         0
...


To counter this I used the logic that if a denominator changed by a factor
of 31098/30017, then with no treatment applied we should expect the
numerator to change by the same factor. So I ended up with an actual Y2
numerator and an expected Y2 numerator (for hospital A, 198 * 31098/30017,
or about 205). I wanted to test whether the expected values were
significantly different from the actual values.
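
In case it helps to see the scaling step concretely, here is a minimal
sketch in Python (numpy assumed; the arrays just hold the two example
hospitals from the table above):

    import numpy as np

    # Example counts for hospitals A and B from the table above.
    y1_denom = np.array([30017, 378])   # records that could generate the error, Y1
    y1_num   = np.array([198, 3])       # records that did generate the error, Y1
    y2_denom = np.array([31098, 420])   # records that could generate the error, Y2
    y2_num   = np.array([56, 0])        # records that did generate the error, Y2

    # With no treatment effect, the numerator should scale with the
    # denominator: expected Y2 num = Y1 num * (Y2 denom / Y1 denom).
    y2_expected = y1_num * (y2_denom / y1_denom)
    print(y2_expected)   # hospital A: 198 * 31098/30017, about 205.1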

Issue 2: The number of records hospitals report is not normally (Gaussian)
distributed (kurtosis 4.7, skew 2.3 for the expected numerator). We have a
lot of very small hospitals (few records), some medium-sized ones, and some
extremely large hospitals. When I take the log of the denominators and
numerators, the data fit a straight line very nicely, so a transformation
seems to be in order (R-squared .85 for the denominators, .95 for the
numerators). BUT a number of the hospitals have a zero value in either the
expected numerator or the actual numerator (or both), and you can't take
the log of zero. Zero, however, is very valuable information (the hospital
had no errors in the diagnosis codes in question), and excluding the zero
values would compromise the measurement of improvement. Is it reasonable to
add a constant (like 1) to every value in the actual and expected
numerators to get around this?
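
For what it's worth, the log(x + 1) transform you describe is one line in
Python; log1p(x) computes log(1 + x) with the constant fixed at 1, so the
zero counts survive the transform (continuing the sketch above):

    # log1p(x) = log(1 + x), so a zero count maps to 0 rather than -inf.
    log_actual   = np.log1p(y2_num)
    log_expected = np.log1p(y2_expected)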

To me, a simple t-test on the transformed data (a paired one, since each
hospital contributes both an expected and an actual value) would either
support or discredit my hypothesis that sending the errors back to the
hospitals and giving them an opportunity to fix the data contributed to a
statistically significantly smaller number of errors reported at the end of
the second year. BUT I'm not sure about getting around the log of zero.
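
Here is a minimal sketch of that test (scipy assumed, continuing the
arrays above; ttest_rel is two-sided, so a one-sided p-value for "actual
below expected" would be half of it when the t statistic has the right
sign):

    from scipy import stats

    # Paired t-test on the shifted-log values: each hospital contributes
    # one (expected, actual) pair, so the per-hospital differences carry
    # the test. In practice these arrays would hold a few hundred hospitals.
    t_stat, p_value = stats.ttest_rel(log_expected, log_actual)
    print(f"t = {t_stat:.3f}, p = {p_value:.4f}")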

Regards

GPO



