[EMAIL PROTECTED] (Tony) wrote in message news:<[EMAIL PROTECTED]>...
> Hi,
>
> One of my colleagues is working on a report to provide feedback to a
> number of organisations that have provided data to us. One of the
> statistics collected was how many weeks it took to fill a vacancy.
> In reporting this statistic back to the organisations my colleague
> asked whether she should use the mean or the median. The reply from
> her supervisor was "if the results are normally distributed then use
> the mean otherwise use the median". I am sure this is sage advice,
> but why?

I would not call it sage advice. It depends on what you want to know.
If interest centers on the average time, then work with the mean. But
beware: if a vacancy remains unfilled, you have a problem incorporating
it in the average (since you don't know the final number of weeks), and
potentially an even bigger problem if you omit it. All you know is that
the number of weeks must exceed the time observed so far, which will
generally be large, so omitting it makes the average lower than it
should be. This is known as censoring. If the average is the desired
quantity, you'll need to make some assumption about distributional
shape to estimate it.

It is possible that the median is of greater interest in this case (in
which case that's why you use it). It also doesn't suffer from the
censoring issue, unless the level of censoring is large.

> I would have thought that if the distribution was normally
> distributed then the mean and the median would be roughly similar
> figures since the normal curve has the frequency distributed around
> the mean.

For any symmetric distribution whose mean exists, the population mean
and median are the same. However, the sample median and the sample mean
have different efficiencies (a more efficient estimator gets a better
estimate of a quantity from a given sample). For example, for normal
distributions and large samples, the median is about 64% as efficient
as the mean (the ratio is 2/pi, so you need about 57% more observations
to pin down the population mean as closely). And if the distribution
isn't symmetric, the sample median will generally be biased as an
estimate of the mean - and generally, distributions are not symmetric.

> My further thinking about this was that for any
> distribution the mean will always be the upper bound of the median.
> Is this correct?

No. For right-skewed distributions it will typically happen (depending
on how you measure skewness) that the population mean is larger than
the median, and the sample mean will usually follow suit. For
left-skewed distributions the ordering is typically reversed, so the
mean is not an upper bound in general.

Glen
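
For anyone who wants to check the efficiency and censoring points
numerically, here is a rough simulation sketch. Everything specific in
it is an assumption made for illustration and not taken from the data
discussed above: the function names, the sample sizes, the exponential
"weeks to fill" distribution with a mean of 8 weeks, and the 20-week
cutoff after which a vacancy is treated as still unfilled.

```python
import random
import statistics


def relative_efficiency(n=101, reps=4000, seed=0):
    """Estimate var(sample mean) / var(sample median) for N(0, 1) samples.

    For large n this ratio should be close to 2/pi (about 0.64).
    """
    rng = random.Random(seed)
    means, medians = [], []
    for _ in range(reps):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        means.append(statistics.fmean(sample))
        medians.append(statistics.median(sample))
    return statistics.pvariance(means) / statistics.pvariance(medians)


def censoring_demo(n=2000, cutoff=20.0, seed=1):
    """Compare mean and median when long vacancies are censored.

    Hypothetical data: right-skewed times drawn from an exponential
    distribution with mean 8 weeks. Vacancies longer than `cutoff`
    weeks are treated as still unfilled when the report is written.
    """
    rng = random.Random(seed)
    weeks = [rng.expovariate(1.0 / 8.0) for _ in range(n)]
    filled_only = [w for w in weeks if w <= cutoff]
    # For the median we only need to know that a censored value is at
    # least `cutoff`, so recording it as `cutoff` is enough as long as
    # fewer than half of the values are censored.
    censored = [min(w, cutoff) for w in weeks]
    return {
        "true_mean": statistics.fmean(weeks),
        "mean_dropping_unfilled": statistics.fmean(filled_only),
        "true_median": statistics.median(weeks),
        "median_with_censoring": statistics.median(censored),
    }


if __name__ == "__main__":
    print("efficiency of median relative to mean:", relative_efficiency())
    for name, value in censoring_demo().items():
        print(name, round(value, 2))
```

With these assumed settings, the mean computed after dropping unfilled
vacancies comes out below the mean of the full data, while the median
is unaffected by the censoring, which is the pattern described above.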
