On 10/03/2016 6:50 PM, Texler, Michael wrote:
  I've not seen them described that way in the road safety literature that I'm 
familiar with. How would that work? If the number of accidents is on the Y 
axis, what variable would the X axis have? If we go with road accidents (my 
field of expertise) it can't be age/driving experience, because the accident 
stats in NO way form a Poisson distribution when age/experience is your X-axis 
variable. (Actually, road prangs by age/experience give you more of a U-shaped 
curve.) Also, the rates of accidents (be they road prangs or glider prangs) 
aren't constant over time (as required for a Poisson distribution to be your 
distribution of choice) - they vary by time of day, for fairly obvious reasons, 
as well as other things (day of the week, long weekends, etc etc).

You appear to be approaching the issue from a rather different statistical 
approach to the ones I'm familiar with. Could you spell out your 
approach/methods in more detail? It's always interesting to hear how folk in 
other fields approach problems I'm familiar with. :-)
I am approaching it as counting events occurring over a duration of time 
(analogous to say counting disintegrations per second for radioactive decay).

Y axis would be the accident rate with any metric that you care to choose (i.e. 
accidents per 1,000 hours flown, accidents per 100km travelled, accidents per 
1,000 flights etc.).
X axis would be a duration of time, i.e. over one year, over 10 years, over 100 
years.

Then it is a case of using the appropriate test to compare the two groups (null 
hypothesis being that the accident rate between two groups is the same).
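
For what it's worth, here is a minimal sketch of one such test, assuming the accidents in each group really do behave as Poisson counts over a known amount of exposure. It uses the exact conditional (binomial) comparison of two Poisson rates; the function name and all of the numbers are made up for illustration and are not real accident data.

```python
from math import comb

def poisson_rate_test(k1, t1, k2, t2):
    """Two-sided exact conditional test that two Poisson rates are equal.

    k1, k2 are event counts; t1, t2 are the exposures (e.g. hours flown).
    Given the total n = k1 + k2, under the null hypothesis of equal rates
    k1 follows Binomial(n, t1 / (t1 + t2)).
    """
    n = k1 + k2
    p = t1 / (t1 + t2)
    pmf = [comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]
    observed = pmf[k1]
    # Add up every outcome that is no more likely than the one observed
    # (the small tolerance guards against floating-point ties).
    p_value = sum(q for q in pmf if q <= observed * (1 + 1e-9))
    return min(1.0, p_value)

# Hypothetical: 12 accidents in 40,000 hours vs 30 accidents in 50,000 hours.
p_value = poisson_rate_test(12, 40_000, 30, 50_000)
print(f"p = {p_value:.3f}")
```

A small p value here would suggest the two rates differ by more than counting noise alone would explain, but only under the constant-rate Poisson assumption.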

I'm afraid I'm still not with you. *Which* two groups, exactly? Displaying all recorded traffic accidents over time in that way will (if you use Australian data) give you a single line that (depending on the period covered, but let's go with "the last 20 years") trends downward over time. Who are you comparing against whom, in your example?

A fairly blunt measure granted.

Given your experience with road accidents analysis, how would you approach it?

Well, it would depend on exactly which question was being asked. If we were interested in the numbers of accidents had by drivers of different ages, my previous example (up in the first para quoted above) was a simple descriptive graph showing difference in number of accidents by age, for a set amount of time (a year, say). Or we could do it another way, and have a graph with dates along the X axis, and separate lines (one for each age group, maybe 16-25, 26-35 and so on) showing how accident numbers have changed over time for each age group, if we were interested in seeing if there were any obvious differences in crash rates over time by age group.

Or, if the question is whether a particular time of day is more crash-prone than other times, we could graph all the accidents occurring in the last year with the X axis showing hours of the day (midnight-0200, 0201-0400, etc). Or whatever. All this is pretty basic stuff. We could go on from there, and report means and standard deviations for age groups/time periods/whatever of interest, and see if anything leaps out in terms of obvious differences or trends. But that still isn't going to get you anything you might want to discuss using null hypotheses or p values ... for that you really do need actual *inferential* statistical tests, with specific groups that you are comparing. And this broad-brush descriptive approach isn't going to give you that. You need to narrow it down a bit.

So: let's come back to the original topic that started all this - glider accidents. How would I approach that?

Well, first would be deciding exactly what question I want an answer to. Do I want to know if the glider prang rate is increasing or decreasing over time? Or do I want to know whether more crashes are happening in comps than in cross-country gliding? Or how the glider crash rate as a whole compares with the number of motorcycle crashes for a given period?

Let's go with the last one, since we were also discussing that earlier. Firstly, getting a good source of data for *both* of those elements in the comparison is tricky. So I'm gonna handwave past that and assume that we have good quality data on both of these, including exposure data (i.e. how much time was spent per pilot/rider actually flying/riding during that time period), because exposure is critical for topics like this: it means absolutely nothing to say that there were 12 glider prangs and 355 bike prangs in a given period, if we don't *also* know that there were a lot more bikers on the road, riding for a lot more overall hours, than there were glider pilots in the air during the same period.
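
To make the exposure point concrete, here's a rough sketch. The crash counts are the ones from my example above; the exposure hours are pure invention for illustration (I have no idea what the real figures are), and the function name is just something I made up.

```python
def rate_per_1000_hours(accidents, exposure_hours):
    """Accidents per 1,000 hours of exposure (flying or riding)."""
    return 1000 * accidents / exposure_hours

# Counts from the example; the exposure hours are invented placeholders.
glider = rate_per_1000_hours(12, 60_000)      # 12 prangs over 60,000 hours flown
bike = rate_per_1000_hours(355, 5_000_000)    # 355 prangs over 5,000,000 hours ridden

print(f"gliders: {glider:.3f} per 1,000 hours")
print(f"bikes:   {bike:.3f} per 1,000 hours")
```

With these made-up exposure figures the group with far fewer crashes ends up with the higher rate per hour, which is exactly why raw counts on their own tell you nothing.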

OK. So now I hypothetically have ten years' worth of crash rates per hour of flying or riding for the respective groups, and I want to compare them. This is where the inferential statistics come in. There will be differences between any two groups that are simply random chance, but the real trick is identifying *actual* differences through the "noise" of random variation. We want to perform a simple comparison of the two groups, to see if they basically have the same means and variances - i.e. is it reasonable to assume they're both samples from one overall population? (Yes, I know they're probably not in real life, but that's how the statistical tests work.) In this example I'd probably go for a t-test for independent samples (since we're assuming that the bikers and the pilots are, by and large, different people). And what that would give me would be a probability value (the p value you mentioned earlier), which is basically the probability of seeing a difference at least this big between the groups if random chance were the only thing at work, i.e. if there were no real difference. So if we get a p value of .05 from my t-test, that tells us that a difference this large would turn up only about 5% of the time by fluke alone if the groups really were the same. Strictly speaking, that is not the same thing as a 95% chance that the difference is real, but a small p value does count as evidence against the "no real difference" assumption.
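
A minimal sketch of that comparison, assuming ten yearly rates per group. I've used Welch's version of the independent-samples t-test here (it doesn't assume equal variances), which is my choice for the sketch rather than anything the data dictates. All the rates below are invented, and the function name is hypothetical.

```python
from statistics import mean, variance

def welch_t(a, b):
    """Welch's t statistic and approximate degrees of freedom for two samples."""
    va = variance(a) / len(a)   # squared standard error of mean(a)
    vb = variance(b) / len(b)   # squared standard error of mean(b)
    t = (mean(a) - mean(b)) / (va + vb) ** 0.5
    # Welch-Satterthwaite approximation to the degrees of freedom
    df = (va + vb) ** 2 / (va**2 / (len(a) - 1) + vb**2 / (len(b) - 1))
    return t, df

# Invented yearly crash rates per 1,000 hours, ten years for each group.
glider_rates = [0.21, 0.18, 0.25, 0.19, 0.22, 0.20, 0.17, 0.23, 0.19, 0.21]
bike_rates = [0.08, 0.07, 0.09, 0.06, 0.08, 0.07, 0.07, 0.06, 0.08, 0.07]

t, df = welch_t(glider_rates, bike_rates)
print(f"t = {t:.2f}, df = {df:.1f}")
```

The t value and degrees of freedom then get looked up against the t distribution (any stats package will do this for you) to get the p value.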

Let's mix it up a bit. What if we want to add other factors into the model to see if that makes any difference... age, say. Are the patterns of accidents for pilots and bikers of different ages similar? Does it matter what the age of the vehicle they're flying/riding is? For those I'd probably run a regression or analysis of variance of some kind on the data, with the exact type dependent on the exact nature of the additional factor(s) I'm plugging into the model. Or let's say I come across a group of bikers who also fly gliders. That's extra-useful, because, with the *same* individuals doing both activities, we can get a *lot* more statistical power out of whatever model we choose. Repeated-measures analysis of variance may well be my tool of choice for that sort of analysis. Or maybe even a mixed-effects linear model (now, *those* can get complex enough to lead to tears and tearing of hair...)

And so on it goes.

Does that help clarify things?


Teal



