Hi All,

Here are the results of a more detailed analysis of the data.

"There are lies, damned lies and statistics!"

In summary:

THE MEDIAN FATALITY RATES OF GLIDING AND DRIVING ARE NOT SIGNIFICANTLY DIFFERENT

THE FATALITY RATE PER 100,000 MEMBERS HAS A SIMILAR DISTRIBUTION IN BOTH GROUPS

FATALITY RATES PER 100,000 HOURS, PER 100 MILLION KM, PER MILLION TRIPS COME FROM DIFFERENT DISTRIBUTIONS, WITH GLIDING HAVING FATALITY RATES CLUSTERING AT LOWER VALUES THAN DRIVING

GLIDING DEATH RATES HAVE A POISSON DISTRIBUTION

(Significance tested at the 5% level, i.e. each test accepts a 5% (1 in 20) chance of reporting a difference when none exists.)

The following e-mail is a bit heavy, read on if interested (off-list feedback welcome):

Yesterday I gave comparative risks, which seem pretty alarming. However, these rates mean nothing if there is no statistically significant difference between the groups.

The supposition made was that the fatality rates for glider pilots and drivers are normally distributed (in other words, that the fatality rates have a "bell shaped curve" distribution, i.e. that the fatality rates are centred around an average (mean) value with a measure of dispersion (standard deviation)).

The fatality rates for car driving are normally distributed (using the Lilliefors test for normality), whereas the gliding fatality rates are not normally (Gaussian) distributed (the distributions are quite asymmetric, with a peak at zero and values on the "right side of the curve").

USING PARAMETRIC STATISTICS

Because the gliding rates are not normally distributed, any statistical test that relies upon the underlying distributions being normal (such as the Z-test or T-test) will give an inaccurate result. Parametric statistics have better power to detect differences between groups, so long as the underlying distributions are Gaussian. Due to their sensitivity, parametric tests are more influenced by outlying values.

Because there are low numbers of deaths from gliding per year, any increase in the number of deaths per year will have a marked impact upon the death rate (i.e.
if there is 1 gliding death in one year and 2 gliding deaths the following year, the number of gliding deaths has doubled (100% increase), whereas with ~1800 driving fatalities per year, one extra death will increase the number of deaths by only about 0.06%).

Using the Z-test (a classic parametric test, population Sdev known from data), the results are thus:

1) Fatality rate per 100,000 members (n1=10, n2=10), Gliding=29.0 (popn Sdev=36.4), driving=15.3 (popn Sdev=1.88), Z-statistic=1.19, one sided P-value=0.118, two sided P-value=0.235.
HENCE no significant difference in fatality rates per 100,000 members at the 5% level.

2) Fatality rate per 100,000 hours (n1=10, n2=9), Gliding=1.23 (popn Sdev=1.53), driving=0.04 (popn Sdev=0.04), Z-statistic=2.46, one sided P-value=0.007, two sided P-value=0.014.
HENCE there is a significant difference in fatality rates per 100,000 hours at the 5% level, with gliding having a significantly higher fatality rate per 100,000 hours than driving.

3) Fatality rate per 100 million km (n1=10, n2=9), Gliding=12.7 (popn Sdev=15.8), driving=1.12 (popn Sdev=0.11), Z-statistic=2.32, one sided P-value=0.010, two sided P-value=0.020.
HENCE there is a significant difference in fatality rates per 100 million km at the 5% level, with gliding having a significantly higher fatality rate per 100 million km than driving.

4) Fatality rate per million trips (n1=10, n2=9), Gliding=9.16 (popn Sdev=11.3), driving=0.05 (popn Sdev=0.01), Z-statistic=2.54, one sided P-value=0.005, two sided P-value=0.011.
HENCE there is a significant difference in fatality rates per million trips at the 5% level, with gliding having a significantly higher fatality rate per million trips than driving.

USING NON PARAMETRIC STATISTICS

Numbers of gliding deaths are not normally distributed (vide infra). Non parametric tests are less powerful than parametric tests, but more robust (i.e. outlying data has less of an effect upon the result).
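For anyone wanting to check the Z-statistics quoted in the parametric results above, here is a minimal sketch in Python. It assumes the usual two-sample Z-test on summary statistics, with the standard error taken as sqrt(sd1^2/n1 + sd2^2/n2). That this is exactly how the original figures were computed is an assumption, and the function name two_sample_z and its arguments are illustrative labels only.

```python
import math


def two_sample_z(mean1, sd1, n1, mean2, sd2, n2):
    """Two-sample Z-test from summary statistics.

    Assumption: the population standard deviations are treated as known
    and the two groups are independent.
    """
    # Standard error of the difference between the two group means.
    standard_error = math.sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)
    z = (mean1 - mean2) / standard_error
    # Upper-tail probability of a standard normal, P(Z >= z).
    one_sided_p = 0.5 * math.erfc(z / math.sqrt(2))
    # Probability of a value at least as extreme in either tail.
    two_sided_p = math.erfc(abs(z) / math.sqrt(2))
    return z, one_sided_p, two_sided_p


# Fatality rate per 100,000 members: gliding versus driving.
z, p_one, p_two = two_sample_z(29.0, 36.4, 10, 15.3, 1.88, 10)
print(f"Z = {z:.2f}, one sided P = {p_one:.3f}, two sided P = {p_two:.3f}")
```

Feeding in the per-member figures should land close to the Z-statistic and P-values quoted for that measure. Small differences in the last digit are expected for the other measures, because the means and standard deviations given here are already rounded.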
There are 6 years out of the 10 studied in which no gliding fatality occurred, whereas there were road fatalities in every year studied. Put another way, for 6 years out of the last ten, the gliding fatality rate (that is, zero deaths!) is less than the road fatality rate (as measured per member, per km travelled, per hour etc.). In other words, the glider pilots' median fatality rate is zero by all means of measurement. Does this mean that the death rates are lower for gliding?

The number of gliding fatalities per year follows a Poisson distribution (Kolmogorov-Smirnov (K-S) statistic (vide infra) max |D|=0.4, P value>0.99) with lambda=0.9 (i.e. an average of 0.9 deaths per year). Using this distribution, the chance of zero gliding deaths per year is 41%, the chance of 1 or more gliding deaths per year is 59%, the chance of 2 or more gliding deaths per year is 23%, the chance of 3 or more gliding deaths per year is 6%, the chance of 4 or more gliding deaths per year is 1%, and so on.

The K-S test is used to see if data from different groups come from the same underlying distribution (irrespective of whether the underlying distribution is Gaussian or not). The Rank-Sum (Mann-Whitney U) test compares the ranks of one group versus another (it is essentially a comparison of medians).

Using the K-S test and Rank-Sum test, the results are thus:

1) Fatality rate per 100,000 members (n1=10, n2=10), K-S test max|D|=0.6, p=0.052, min D=-0.6, p=0.015. Rank sum=95, p>0.05.
HENCE the fatality rates per 100,000 members are from the same underlying distribution (accepted at the 5% level). Gliding death rates per 100,000 members cluster at a lower rate than driving. The median death rates per 100,000 members in both groups are not significantly different.

2) Fatality rate per 100,000 hours (n1=10, n2=9), K-S test max|D|=0.6, p=0.030, min D=-0.6, p=0.026. Rank sum=91, p>0.05.
HENCE the fatality rates per 100,000 hours are from a different underlying distribution (accepted at the 5% level).
Gliding death rates per 100,000 hours cluster at a lower rate than driving. The median death rates per 100,000 hours in both groups are comparable.

3) Fatality rate per 100 million km (n1=10, n2=9), K-S test max|D|=0.6, p=0.030, min D=-0.6, p=0.026. Rank sum=91, p>0.05.
HENCE the fatality rates per 100 million km are from a different underlying distribution (accepted at the 5% level). Gliding death rates per 100 million km cluster at a lower rate than driving. The median death rates per 100 million km in both groups are comparable.

4) Fatality rate per million trips (n1=10, n2=9), K-S test max|D|=0.6, p=0.030, min D=-0.6, p=0.026. Rank sum=91, p>0.05.
HENCE the fatality rates per million trips are from a different underlying distribution (accepted at the 5% level). Gliding death rates per million trips cluster at a lower rate than driving. The median death rates per million trips in both groups are comparable.

The above results occur because gliding has fatality-free years.

SUMMARY

The low number of deaths per year of glider pilots compared with the high number of road deaths makes comparison difficult. Any change in the number of gliding deaths can lead to large fluctuations in average gliding death rates when compared to driving.

Using parametric statistics leads to the conclusion that gliding has higher average death rates than driving (apart from per 100,000 members), provided that normality (Gaussian nature) of glider death rates is assumed.

Using non-parametric statistics leads to the conclusion that gliding has comparable median death rates to driving, and in fact by some measures has a lower death rate than driving. Vide supra!

Input from statisticians is welcome. If the raw data is required, please contact me off list and I can provide it as an Excel spreadsheet.

Cheers,
Michael Texler

Abbreviations:
Vide infra = see below
Vide supra = see above
popn Sdev = population standard deviation

--
* You are subscribed to the aus-soaring mailing list.
