On 10/03/2016 6:50 PM, Texler, Michael wrote:
I've not seen them described that way in the road safety literature that I'm
familiar with. How would that work? If the number of accidents is on the Y
axis, what variable would the X axis have? If we go with road accidents (my
field of expertise) it can't be age/driving experience, because the accident
stats in NO way form a Poisson distribution when age/experience is your X-axis
variable. (Actually, road prangs by age/experience gives you more of a U-shaped
curve.) Also, the rate of accidents (be they road prangs or glider prangs) isn't
constant over time (as required for a Poisson distribution to be your
distribution of choice) - it varies by time of day, for fairly obvious reasons,
as well as with other things (day of the week, long weekends, etc.).
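
One rough way to check the constant-rate point: for Poisson counts the variance equals the mean, so the variance-to-mean ratio of counts taken over equal-length slots should sit near 1. The sketch below assumes made-up crash counts for twelve two-hour slots of the day; the function name and the numbers are purely illustrative, not real data.

```python
from statistics import mean, variance


def dispersion_index(counts):
    """Variance-to-mean ratio of event counts from equal-length time slots.

    Near 1 is consistent with a constant-rate Poisson process;
    well above 1 suggests the underlying rate is changing between slots.
    """
    m = mean(counts)
    if m == 0:
        raise ValueError("need at least one event to compute the ratio")
    return variance(counts) / m


# Hypothetical crash counts per two-hour slot, midnight to midnight (made up).
counts_by_slot = [3, 2, 1, 4, 12, 9, 7, 8, 14, 11, 6, 4]
print(round(dispersion_index(counts_by_slot), 2))
```

A ratio well above 1, as these invented numbers give, is the sort of thing you would expect if crashes cluster around particular times of day.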
You appear to be coming at the issue with a rather different statistical
approach to the ones I'm familiar with. Could you spell out your
approach/methods in more detail? It's always interesting to hear how folk in
other fields approach problems I'm familiar with. :-)
I am approaching it as counting events occurring over a duration of time
(analogous to say counting disintegrations per second for radioactive decay).
Y axis would be the accident rate with any metric that you care to choose (e.g.
accidents per 1,000 hours flown, accidents per 100km travelled, accidents per
1,000 flights etc.).
X axis would be a duration of time, i.e. over one year, over 10 years, over 100
years.
Then it is a case of using the appropriate test to compare the two groups (null
hypothesis being that the accident rate between two groups is the same).
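
A minimal sketch of one such test, assuming the counts in each group really are Poisson with a constant rate (which is the very assumption under dispute here). It is the exact conditional test: if the two rates are equal then, given the total number of events, the first group's count is binomial with probability equal to its share of the exposure. The function name and the example figures are made up for illustration.

```python
from math import comb


def poisson_rate_test(k1, t1, k2, t2):
    """Two-sided exact conditional test that two Poisson rates are equal.

    k1, k2: event counts; t1, t2: exposures (hours, flights, km, ...).
    Under the null hypothesis, given n = k1 + k2 events in total,
    k1 ~ Binomial(n, t1 / (t1 + t2)).
    Returns the summed probability of every outcome that is no more
    likely than the observed one. Intended for modest counts only
    (very large n would overflow the float conversion).
    """
    n = k1 + k2
    p0 = t1 / (t1 + t2)
    pmf = [comb(n, k) * p0**k * (1 - p0) ** (n - k) for k in range(n + 1)]
    observed = pmf[k1]
    # Small relative tolerance so that ties are counted despite rounding.
    return min(1.0, sum(p for p in pmf if p <= observed * (1 + 1e-9)))


# Hypothetical: 12 accidents vs 30 accidents, each over 10,000 hours.
print(poisson_rate_test(12, 10_000, 30, 10_000))
```

A small value says a split this lopsided would be unusual if both groups shared one accident rate; it says nothing about whether the Poisson assumption itself holds.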
I'm afraid I'm still not with you. *Which* two groups, exactly?
Displaying all recorded traffic accidents over time in that way will (if
you use Australian data) give you a single line that (depending on the
period covered, but let's go with "the last 20 years") trends downward
over time. Who are you comparing against whom, in your example?
A fairly blunt measure, granted.
Given your experience with road accidents analysis, how would you approach it?
Well, it would depend on exactly which question was being asked. If we
were interested in the numbers of accidents had by drivers of different
ages, my previous example (up in the first para quoted above) was a
simple descriptive graph showing difference in number of accidents by
age, for a set amount of time (a year, say). Or we could do it another
way, and have a graph with dates along the X axis, and separate lines
(one for each age group, maybe 16-25, 26-35 and so on) showing how
accident numbers have changed over time for each age group, if we were
interested in seeing if there were any obvious differences in crash
rates over time by age group.
Or, if the question is whether a particular time of day is more crash-prone
than other times, we could graph all the accidents occurring in the last
year with the X axis showing hours of the day (midnight-0200, 0201-0400,
etc). Or whatever. All this is pretty basic stuff. We could go on from
there, and report means and standard deviations for age groups/time
periods/whatever of interest, and see if anything leaps out in terms of
obvious differences or trends. But that still isn't going to get you
anything you might want to discuss using null hypotheses or p values ...
for that you really do need actual *inferential* statistical tests, with
specific groups that you are comparing. And this broad-brush descriptive
approach isn't going to give you that. You need to narrow it down a bit.
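
As a rough sketch of the time-of-day tally described above, assuming all we have is the hour (0-23) at which each crash happened. The slot boundaries, the function name and the sample hours are illustrative assumptions, not anything from a real dataset.

```python
from collections import Counter


def crashes_by_two_hour_slot(crash_hours):
    """Count crashes per two-hour slot: 0 = 00:00-01:59, ..., 11 = 22:00-23:59."""
    if any(h < 0 or h > 23 for h in crash_hours):
        raise ValueError("hours must be in the range 0-23")
    counts = Counter(h // 2 for h in crash_hours)
    return [counts.get(slot, 0) for slot in range(12)]


# Hypothetical hours of the day at which crashes occurred (made up).
sample_hours = [7, 8, 8, 9, 13, 16, 17, 17, 17, 18, 22, 23]
for slot, n in enumerate(crashes_by_two_hour_slot(sample_hours)):
    print(f"{2 * slot:02d}00-{2 * slot + 1:02d}59: {n}")
```

This only describes the data; it is the table you would graph, not a test of anything.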
So: let's come back to the original topic that started all this - glider
accidents. How would I approach that?
Well, first would be deciding exactly what question I want an answer to.
Do I want to know if the glider prang rate is increasing or decreasing
over time? Or do I want to know whether more crashes are happening in
comps than in cross-country gliding? Or how the glider crash rate as a
whole compares with the number of motorcycle crashes for a given period?
Let's go with the last one, since we were also discussing that earlier.
Firstly, getting a good source of data for *both* of those elements in
the comparison is tricky. So I'm gonna handwave past that and assume
that we have good quality data on both of these, including exposure data
(i.e. how much time each pilot/rider actually spent flying/riding
during that time period), because exposure is critical for topics like
this: it means absolutely nothing to say that there were 12 glider
prangs and 355 bike prangs in a given period, if we don't *also* know
that there were a lot more riders on the road, riding for a lot more
overall hours, than there were glider pilots in the air during the same
period.
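
To put rough numbers on the exposure point: the crash counts below are the 12 and 355 from the example above, but the exposure hours are invented purely for illustration, as is the function name.

```python
def rate_per_1000_hours(crashes, exposure_hours):
    """Crashes per 1,000 hours of exposure (flying or riding)."""
    if exposure_hours <= 0:
        raise ValueError("exposure must be positive")
    return 1000 * crashes / exposure_hours


# Exposure figures are made up; only the crash counts come from the example.
glider_rate = rate_per_1000_hours(12, 40_000)      # assumed 40,000 hours flown
bike_rate = rate_per_1000_hours(355, 2_000_000)    # assumed 2,000,000 hours ridden

print(f"gliders: {glider_rate:.4f} per 1,000 h")
print(f"bikes:   {bike_rate:.4f} per 1,000 h")
```

With these invented exposures the group with far fewer crashes has the higher rate per hour, which is exactly why the raw counts on their own tell you nothing.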
OK. So now I hypothetically have ten years' worth of crash rates per
hour of flying or riding for the respective groups, and I want to
compare them. This is where the inferential statistics come in. There
will be differences between any two groups that are simply random
chance, but the real trick is identifying *actual* differences through
the "noise" of random variation. We want to perform a simple comparison
of the two groups, to see if they basically have the same means and
variances - i.e. is it reasonable to assume they're both samples from
one overall population? (Yes, I know they're probably not in real life,
but that's how the statistical tests work.) In this example I'd probably
go for a t-test for independent samples (since we're assuming that the
bikers and the pilots are, by and large, different people). And what
that would give me would be a p value which, as came up earlier, is
basically the probability of seeing a difference at least this large
between the groups if random chance alone were at work (i.e. if both
really were samples from one population). So if we get a p value of .05
from my t-test, that tells us that a difference this big would turn up
only about 5% of the time by fluke, which by the usual convention is
taken as evidence of a real difference between our bikers and glider pilots.
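
A minimal sketch of that comparison, using made-up yearly crash rates for the two groups. It computes Welch's version of the independent-samples t statistic (which does not assume equal variances) and then, instead of looking the statistic up in a t distribution, gets a p value from an exact permutation test: relabel the pooled values in every possible way and count how often the difference in means is at least as big as the observed one. That is a swapped-in technique, not the t-test's own p value, and every name and number here is an assumption for illustration.

```python
from itertools import combinations
from math import sqrt
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t statistic for two independent samples."""
    return (mean(a) - mean(b)) / sqrt(variance(a) / len(a) + variance(b) / len(b))


def permutation_p_value(a, b):
    """Exact two-sided permutation p value for the difference in means.

    Returns the fraction of all possible relabellings of the pooled data
    whose absolute difference in group means is at least the observed one.
    Only practical for small samples (the number of splits grows quickly).
    """
    pooled = list(a) + list(b)
    n_a, n_b = len(a), len(b)
    total = sum(pooled)
    observed = abs(mean(a) - mean(b))
    hits = splits = 0
    for idx in combinations(range(len(pooled)), n_a):
        sum_a = sum(pooled[i] for i in idx)
        diff = abs(sum_a / n_a - (total - sum_a) / n_b)
        hits += diff >= observed - 1e-12  # tolerance for float rounding
        splits += 1
    return hits / splits


# Hypothetical crashes per 1,000 hours for each of ten years (made up).
glider_rates = [0.31, 0.28, 0.35, 0.30, 0.27, 0.33, 0.29, 0.32, 0.26, 0.30]
bike_rates = [0.19, 0.22, 0.18, 0.21, 0.17, 0.20, 0.16, 0.19, 0.18, 0.17]

print("Welch t:", round(welch_t(glider_rates, bike_rates), 2))
print("permutation p:", permutation_p_value(glider_rates, bike_rates))
```

The permutation version makes the logic of a p value fairly concrete: it is the share of "chance-only" relabellings that look at least as extreme as what was actually observed.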
Let's mix it up a bit. What if we want to add other factors into the
model to see if that makes any difference... age, say. Are the patterns
of accidents for pilots and bikers of different ages similar? Does it
matter what the age of the vehicle they're flying/riding is? For those
I'd probably run a regression or analysis of variance of some kind on
the data, with the exact type dependent on the exact nature of the
additional factor(s) I'm plugging into the model. Or let's say I come
across a group of bikers who also fly gliders. That's extra-useful,
because, with the *same* individuals doing both activities, we can get a
*lot* more statistical power out of whatever model we choose.
Repeated-measures analysis of variance may well be my tool of choice for
that sort of analysis. Or maybe even a mixed-effects general linear
model (now, *those* can get complex enough to lead to tears and tearing
of hair...)
And so on it goes.
Does that help clarify things?
Teal
_______________________________________________
Aus-soaring mailing list
Aus-soaring@lists.base64.com.au
http://lists.base64.com.au/listinfo/aus-soaring