I've been following the correspondence between "saisat" and Rich Ulrich
with some interest.  A few comments are embedded below:

On 20 Feb 2003, saisat wrote (edited):

> I have a "peer" group of distributions with the following data

Should we know what you intend to imply by '"peer" group' in order to
understand the question(s) you want to ask?

    M = mean,  SD = standard deviation,  N = number of data values:
> Distribution 1 :  M1, SD1, N1
> Distribution 2 :  M2, SD2, N2
> Distribution 3 :  M3, SD3, N3
> ...
> Distribution n :  Mn, SDn, Nn
>
> What is the most accurate way to determine the overall mean and the
> standard deviation of the above distributions?

Depends partly on what you want to mean by "accurate":  that is, what
criterion for accuracy interests you.  For the mean, one way is to add
(M1+...+Mn) and divide by n, giving each distribution equal weight.
Another is to weight the Mi by the Ni, giving each individual
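observation equal weight ("one man, one vote").  If the Ni are nearly
equal, OR if the various Mi are not very far apart, there won't be much
difference between these.  The same tends to hold when the number of
distributions (your "n") is fairly large, provided the Ni are not
systematically related to the Mi.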
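
To make the two averaging schemes concrete, here is a minimal sketch in
Python.  The function names and the numbers are made up purely for
illustration; they are not from saisat's data.

```python
def unweighted_mean(means):
    """Average of the group means: each distribution gets equal weight."""
    return sum(means) / len(means)


def weighted_mean(means, sizes):
    """Average weighted by group size: each observation gets equal weight."""
    total_n = sum(sizes)
    return sum(m * n for m, n in zip(means, sizes)) / total_n


# Hypothetical summary data: three groups of very unequal size.
means = [10.0, 12.0, 20.0]
sizes = [100, 50, 10]

print(unweighted_mean(means))       # 14.0
print(weighted_mean(means, sizes))  # 11.25
```

Here the small group with the large mean pulls the unweighted average
well above the weighted one; with equal Ni the two would coincide.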

For the SD, it depends also on what you mean by "overall", and on what
you mean by "of the above distributions".  You could be asking (1) "What
is the average SD of these n distributions?" or (2) "What is the SD of
the distribution I would get if I could combine all the data sets into
one humungous data set and calculate its mean and SD in the usual way?"
or (3) "What is the SD of these n distributions, each considered as a
single entity?" (that is, "What's the SD of the distribution of the n
means M1, M2, ..., Mn?").  These are different questions, with (in
general) different answers.
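
The three readings can be sketched from the summary statistics alone.
What follows is an illustration under stated assumptions, not a
recommendation: the function names are invented, each SDi is assumed to
have been computed with the usual N-1 denominator, and reading (1) is
taken as a plain average of the SDs (some would average the variances
instead).  The small raw data sets exist only to check reading (2)
against a direct calculation.

```python
import math
import statistics


def average_sd(sds):
    """Reading (1): plain average of the n standard deviations."""
    return sum(sds) / len(sds)


def combined_sd(means, sds, sizes):
    """Reading (2): SD of all observations pooled into one data set.

    Assumes each SD used the N-1 denominator.  The total sum of squares
    is the within-group part plus the between-group part.
    """
    total_n = sum(sizes)
    grand_mean = sum(m * n for m, n in zip(means, sizes)) / total_n
    within = sum((n - 1) * s ** 2 for s, n in zip(sds, sizes))
    between = sum(n * (m - grand_mean) ** 2 for m, n in zip(means, sizes))
    return math.sqrt((within + between) / (total_n - 1))


def sd_of_means(means):
    """Reading (3): SD of the n means, each group treated as one entity."""
    return statistics.stdev(means)


# Hypothetical raw data, used only to verify reading (2).
groups = [[1.0, 2.0, 3.0], [10.0, 12.0, 14.0, 16.0]]
means = [statistics.mean(g) for g in groups]
sds = [statistics.stdev(g) for g in groups]
sizes = [len(g) for g in groups]
pooled = [x for g in groups for x in g]

print("(1) average SD: ", average_sd(sds))
print("(2) combined SD:", combined_sd(means, sds, sizes))
print("    direct check:", statistics.stdev(pooled))
print("(3) SD of means:", sd_of_means(means))
```

The second and third printed lines should agree up to rounding, while
(1) and (3) will in general differ from (2) and from each other.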

> Also, I need to remove any distributions that are not similar to the
> rest of the distributions in the "peer" group.

Rich has already pointed out that this is usually a BAD idea, as he
interpreted your question.  But what do you mean by "remove"?   Among
the possibilities are
 (1) "discard, throw away, data never again to be seen by human eyes"
 (2) "set aside for subsequent analysis and comparison to 'the rest of
the distributions'".

Rich apparently thought you meant (1).

You have not described WHY you "need to remove...", which might give us
some notion of how to assist you (aside from Rich's recommendation, "Go
talk to a real live statistician", which may indeed be the best advice
you can get).  Nor have you said what you want to mean by "similar" (or,
perhaps more to the point, "not similar");  I presume you are not using
it in the sense of grade-9 Geometry I, for instance?

If you find some of these questions hard to answer, then you will have
begun to understand the point of asking them.  Good luck!

 -----------------------------------------------------------------------
 Donald F. Burrill                                            [EMAIL PROTECTED]
 56 Sebbins Pond Drive, Bedford, NH 03110                 (603) 626-0816
