Hi All,

Upon more detailed analysis of the data, here are my findings.

"There are lies and damned statistics!"

In summary:

THE MEDIAN FATALITY RATES OF GLIDING AND DRIVING ARE NOT SIGNIFICANTLY
DIFFERENT

THE FATALITY RATE PER 100,000 MEMBERS HAS A SIMILAR DISTRIBUTION IN BOTH
GROUPS

FATALITY RATES PER 100,000 HOURS, PER 100 MILLION KM, PER MILLION TRIPS COME
FROM DIFFERENT DISTRIBUTIONS, WITH GLIDING HAVING FATALITY RATES CLUSTERING
AT LOWER VALUES THAN DRIVING

GLIDING DEATHS PER YEAR FOLLOW A POISSON DISTRIBUTION

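(All tests at the 5% significance level, i.e. where a difference is reported as
significant, there is a 5% (1 in 20) chance of seeing a difference that large
by chance alone if no real difference exists.)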


The rest of this e-mail is a bit heavy; read on if interested (off-list
feedback welcome):

Yesterday I gave comparative risks, which seem pretty alarming. However,
these rates mean nothing unless the differences between the groups are
statistically significant.

The supposition made was that the fatality rates for glider pilots and
drivers are normally distributed. In other words, that the fatality rates
have a "bell-shaped curve" distribution, centred on an average (mean) value
with a spread measured by the standard deviation.

The fatality rates for car driving are normally distributed (using the
Lilliefors test for normality), whereas the gliding fatality rates are not
normally (Gaussian) distributed: their distributions are markedly asymmetric,
with a peak at zero and the remaining values out on the right-hand side of
the curve.

USING PARAMETRIC STATISTICS

Hence any statistical test that relies on the underlying distributions being
normal (such as the Z-test or t-test) will give an inaccurate result.
Parametric statistics have better power at detecting differences between
groups, as long as the underlying distributions are Gaussian. Because of
this sensitivity, parametric tests are more influenced by outlying values.
As there are low numbers of gliding deaths per year, any increase in the
number of deaths per year has a marked impact on the death rate. For
example, if there is 1 gliding death in one year and 2 gliding deaths the
following year, the number of gliding deaths has doubled (a 100% increase),
whereas with ~1800 driving fatalities per year, one extra death increases
the number of deaths by only about 0.06%.
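A minimal Python sketch of the arithmetic in the example above, using the round figures quoted (1 then 2 gliding deaths; about 1800 road deaths plus one extra). The helper name `percent_increase` is just for illustration.

```python
def percent_increase(before: float, after: float) -> float:
    """Percentage change from one year's death count to the next."""
    return 100.0 * (after - before) / before


# Gliding: 1 death one year, 2 the next.
gliding_change = percent_increase(1, 2)
# Driving: about 1800 deaths one year, one extra death the next.
driving_change = percent_increase(1800, 1801)

print(f"gliding: {gliding_change:.0f}% increase")
print(f"driving: {driving_change:.3f}% increase")
```

This prints a 100% increase for gliding against about 0.056% (exactly 1/1800) for driving, which is why a single extra gliding death swings the gliding rates so much.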

Using the Z-test (a classic parametric test, with each population standard
deviation taken from the data), the results are:

1) Fatality rate per 100,000 members (n1=10, n2=10), Gliding=29.0 (popn
Sdev=36.4), driving=15.3 (popn Sdev=1.88), Z-statistic=1.19, one sided
P-value = 0.118, two sided P-value=0.235. HENCE no significant difference in
fatality rates per 100,000 members at the 5% level.

2) Fatality rate per 100,000 hours (n1=10, n2=9), Gliding=1.23 (popn
Sdev=1.53), driving=0.04 (popn Sdev=0.04), Z-statistic=2.46, one sided
P-value = 0.007, two sided P-value=0.014. HENCE there is a significant
difference in fatality rates per 100,000 hours at the 5% level, with gliding
having a significantly higher fatality rate per 100,000 hours than driving.

3) Fatality rate per 100 million km (n1=10, n2=9), Gliding=12.7 (popn
Sdev=15.8), driving=1.12 (popn Sdev=0.11), Z-statistic=2.32, one sided
P-value = 0.010, two sided P-value=0.020. HENCE there is a significant
difference in fatality rates per 100 million km at the 5% level, with
gliding having a significantly higher fatality rate per 100 million km than
driving.

4) Fatality rate per million trips (n1=10, n2=9), Gliding=9.16 (popn
Sdev=11.3), driving=0.05 (popn Sdev=0.01), Z-statistic=2.54, one sided
P-value = 0.005, two sided P-value=0.011. HENCE there is a significant
difference in fatality rates per million trips at the 5% level, with gliding
having a significantly higher fatality rate per million trips than driving.
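The four Z-statistics can be re-derived from the summary figures with a short Python sketch. It assumes the usual two-sample form Z = (mean1 - mean2) / sqrt(sd1^2/n1 + sd2^2/n2), treating each standard deviation as known. That formula is an assumption about what was computed, but it reproduces the reported values to within the rounding of the quoted means and standard deviations (for example about 2.55 rather than 2.54 per million trips). Normal tail probabilities come from the standard library's `math.erfc`.

```python
import math


def two_sample_z(mean1, sd1, n1, mean2, sd2, n2):
    """Two-sample Z-test from summary statistics.

    Treats each standard deviation as the known population value.
    Returns (z, one-sided p, two-sided p); the one-sided p tests whether
    group 1 (gliding) has the higher mean.
    """
    se = math.sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)
    z = (mean1 - mean2) / se
    p_one = 0.5 * math.erfc(z / math.sqrt(2))   # P(Z >= z)
    p_two = math.erfc(abs(z) / math.sqrt(2))    # P(|Z| >= |z|)
    return z, p_one, p_two


# (label, gliding mean, gliding sd, n1, driving mean, driving sd, n2)
ROWS = [
    ("per 100,000 members", 29.0, 36.4, 10, 15.3, 1.88, 10),
    ("per 100,000 hours", 1.23, 1.53, 10, 0.04, 0.04, 9),
    ("per 100 million km", 12.7, 15.8, 10, 1.12, 0.11, 9),
    ("per million trips", 9.16, 11.3, 10, 0.05, 0.01, 9),
]

results = {}
for label, m1, s1, n1, m2, s2, n2 in ROWS:
    z, p1, p2 = two_sample_z(m1, s1, n1, m2, s2, n2)
    results[label] = (z, p1, p2)
    print(f"{label:20s} Z={z:.2f} one-sided p={p1:.3f} two-sided p={p2:.3f}")
```

Note that the huge gliding standard deviations dominate the standard error, which is exactly the sensitivity to a few high-fatality years described above.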

USING NON-PARAMETRIC STATISTICS

Numbers of gliding deaths are not normally distributed (vide infra).
Non-parametric tests are less powerful than parametric tests, but more
robust (i.e. outlying data have less effect on the result).

In 6 of the 10 years studied there were no gliding fatalities, whereas there
were road fatalities in every year studied. Put another way, in 6 of the
last ten years the gliding fatality rate (zero deaths!) was lower than the
road fatality rate, however it is measured (per member, per km travelled,
per hour, etc.). So the glider pilots' median fatality rate is zero by every
measure. Does this mean that the death rates are lower for gliding?
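Why six fatality-free years are enough to make the median zero can be checked with Python's `statistics.median`. The four non-zero rates below are hypothetical placeholders, since the actual values are not given here; any positive numbers give the same result. With ten values the median is the average of the 5th and 6th smallest, and with six zeros both of those are zero.

```python
import statistics


def median_rate(nonzero_rates, zero_years=6):
    """Median of annual fatality rates when `zero_years` years had no deaths."""
    rates = [0.0] * zero_years + list(nonzero_rates)
    return statistics.median(rates)


# Hypothetical placeholder rates for the four years with gliding deaths.
print(median_rate([1.0, 50.0, 100.0, 1000.0]))               # 0.0
# Contrast: with only five fatality-free years out of ten, the median
# becomes the average of a zero and the smallest non-zero rate.
print(median_rate([2.0, 3.0, 4.0, 5.0, 6.0], zero_years=5))  # 1.0
```

The helper `median_rate` is illustrative only; the point is that the zero median holds whatever the four non-zero years were.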

The number of gliding fatalities per year follows a Poisson distribution
(Kolmogorov-Smirnov (K-S) test, vide infra: max |D|=0.4, P value>0.99) with
lambda=0.9, i.e. an average of 0.9 deaths per year. Using this distribution,
the chance of zero gliding deaths in a year is 41%, the chance of 1 or more
is 59%, of 2 or more is 23%, of 3 or more is 6%, of 4 or more is 1%, and so
on.
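The probabilities above follow directly from the Poisson formula P(X = k) = e^(-lambda) lambda^k / k!, with lambda = 0.9 as quoted. A minimal standard-library Python sketch (the function names are illustrative):

```python
import math

LAMBDA = 0.9  # mean gliding deaths per year, as quoted above


def poisson_pmf(k: int, lam: float) -> float:
    """P(X = k) for a Poisson(lam) count."""
    return math.exp(-lam) * lam ** k / math.factorial(k)


def prob_at_least(k: int, lam: float) -> float:
    """P(X >= k) = 1 - P(X <= k - 1)."""
    return 1.0 - sum(poisson_pmf(i, lam) for i in range(k))


p_zero = poisson_pmf(0, LAMBDA)
print(f"P(0 deaths) = {p_zero:.2f}")
for k in range(1, 5):
    print(f"P({k} or more deaths) = {prob_at_least(k, LAMBDA):.2f}")
```

This prints 0.41, 0.59, 0.23, 0.06 and 0.01, matching the percentages given.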

The K-S test checks whether data from different groups come from the same
underlying distribution (whether or not that distribution is Gaussian). The
Rank-Sum (Mann-Whitney U) test compares the ranks of one group against those
of the other; when the two distributions have a similar shape it is
essentially a comparison of medians.
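For readers who want to see what the two statistics actually measure, here is a small pure-Python sketch. `ks_two_sample` computes D(x) = F_a(x) - F_b(x), the difference between the two empirical cumulative distributions, at every pooled data point; `rank_sum` adds up the ranks of group a in the pooled data, giving tied values (such as repeated zero years) their average rank. The function names and the toy numbers are made up for illustration and are not the real data. P-values for samples this small should come from exact tables or statistical software rather than from this sketch.

```python
def ecdf(sample, x):
    """Fraction of the sample that is <= x."""
    return sum(v <= x for v in sample) / len(sample)


def ks_two_sample(a, b):
    """Two-sample K-S statistics evaluated at the pooled data points.

    Returns (max |D|, max D, min D) with D(x) = F_a(x) - F_b(x).
    """
    diffs = [ecdf(a, x) - ecdf(b, x) for x in sorted(set(a) | set(b))]
    return max(abs(d) for d in diffs), max(diffs), min(diffs)


def rank_sum(a, b):
    """Wilcoxon rank sum of group a; tied values get their average rank."""
    pooled = sorted(a + b)
    ranks = {}
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j] == pooled[i]:
            j += 1
        # 0-based positions i..j-1 hold ranks i+1..j; share their mean.
        ranks[pooled[i]] = (i + 1 + j) / 2
        i = j
    return sum(ranks[v] for v in a)


# Made-up toy data: a "gliding-like" group with tied zeros, and another.
group_a = [0, 0, 0, 5]
group_b = [1, 2, 3]
d_abs, d_max, d_min = ks_two_sample(group_a, group_b)
print(f"max|D|={d_abs:.2f}, max D={d_max:.2f}, min D={d_min:.2f}")
print(f"rank sum of group a = {rank_sum(group_a, group_b):.1f}")
```

With the toy data this prints max|D|=0.75, max D=0.75, min D=-0.25 and a rank sum of 13.0: the cluster of zeros pushes group a's cumulative distribution up early, while its single large value still ranks last.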

Using the K-S test and Rank-Sum test, the results are thus:

1) Fatality rate per 100,000 members (n1=10, n2=10), K-S test max|D|=0.6,
p=0.052, min D=-0.6, p=0.015. Rank sum = 95, p>0.05. HENCE the hypothesis
that the fatality rates per 100,000 members come from the same underlying
distribution cannot be rejected at the 5% level. Gliding death rates per
100,000 members cluster at lower values than driving. The median death rates
per 100,000 members in the two groups are not significantly different.

2) Fatality rate per 100,000 hours (n1=10, n2=9), K-S test max|D|=0.6,
p=0.030, min D=-0.6, p=0.026. Rank sum = 91, p>0.05. HENCE the fatality
rates per 100,000 hours come from different underlying distributions
(significant at the 5% level). Gliding death rates per 100,000 hours cluster
at lower values than driving. The median death rates per 100,000 hours in
the two groups are not significantly different.

3) Fatality rate per 100 million km (n1=10, n2=9), K-S test max|D|=0.6,
p=0.030, min D=-0.6, p=0.026. Rank sum = 91, p>0.05. HENCE the fatality
rates per 100 million km come from different underlying distributions
(significant at the 5% level). Gliding death rates per 100 million km
cluster at lower values than driving. The median death rates per 100 million
km in the two groups are not significantly different.

4) Fatality rate per million trips (n1=10, n2=9), K-S test max|D|=0.6,
p=0.030, min D=-0.6, p=0.026. Rank sum = 91, p>0.05. HENCE the fatality
rates per million trips come from different underlying distributions
(significant at the 5% level). Gliding death rates per million trips cluster
at lower values than driving. The median death rates per million trips in
the two groups are not significantly different.
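As a rough cross-check on the "p>0.05" rank-sum results, the large-sample normal approximation can be applied to the reported rank sums: under the null hypothesis the rank sum of group 1 has mean n1(N+1)/2 and variance n1*n2*(N+1)/12, where N = n1 + n2. This sketch assumes the reported sums belong to the gliding group (n1 = 10) and ignores the correction for tied values (the six zero years), which would shrink the variance somewhat; neither assumption is stated in the analysis above, and the helper name is illustrative.

```python
import math


def rank_sum_normal_p(w, n1, n2):
    """Two-sided p-value for rank sum w of group 1 (size n1).

    Large-sample normal approximation, with no correction for ties.
    """
    n = n1 + n2
    mean = n1 * (n + 1) / 2
    sd = math.sqrt(n1 * n2 * (n + 1) / 12)
    z = (w - mean) / sd
    return z, math.erfc(abs(z) / math.sqrt(2))


# Reported rank sums, assumed here to be for the gliding group (n1 = 10).
CASES = [
    ("per 100,000 members", 95, 10, 10),
    ("hours / km / trips", 91, 10, 9),
]
for label, w, n1, n2 in CASES:
    z, p = rank_sum_normal_p(w, n1, n2)
    print(f"{label}: z={z:.2f}, two-sided p={p:.2f}")
```

Both p-values come out well above 0.05 (roughly 0.45), consistent with the medians not differing significantly.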

These results arise because gliding has fatality-free years.

SUMMARY
The low number of gliding deaths per year compared with the high number of
road deaths makes comparison difficult: small changes in the number of
gliding deaths lead to large fluctuations in average gliding death rates
when compared with driving. Parametric statistics lead to the conclusion
that gliding has higher average death rates than driving (apart from per
100,000 members), but this assumes that gliding death rates are normal
(Gaussian), which they are not.

Non-parametric statistics lead to the conclusion that gliding has median
death rates comparable to driving, and by some measures its death rates
cluster at lower values than driving's.

Vide supra!

Input from statisticians is welcome. If the raw data are required, please
contact me off-list and I can provide them as an Excel spreadsheet.

Cheers,

Michael Texler



Abbreviations:
Vide infra = see below
Vide supra = see above
popn Sdev = population standard deviation


--
  * You are subscribed to the aus-soaring mailing list.
  * To Unsubscribe: send email to [EMAIL PROTECTED]
  * with "unsubscribe aus-soaring" in the body of the message
  * or with "help" in the body of the message for more information.
