Herman Rubin wrote:
> 
> In article <8smcpv$41r$[EMAIL PROTECTED]>,
> Choi, Young Sung <[EMAIL PROTECTED]> wrote:
> >I am a statistically poor researcher and have a statistical problem.
> 
> >I have two candidate distributions, A(theta1) and B(theta1, theta2) to model
> >my data.
> >Then how should I determine the best distribution for my data?
> >Could you suggest an easy book that explains how to select a distribution
> >when making a probability model, and how to test the goodness of fit of the
> >selected distribution against other candidates?
> 
> The decision as to what probability models are appropriate
> must come from understanding your subject, not from any
> use of simple distributions from probability or statistics
> textbooks.  Above all, do not use what you know or do not
> know about statistical methods to influence this stage; a
> good statistician might be able to tell you that certain
> assumptions are NOT important, but, as a statistician, he must
> not suggest a model.  However, he may be able to ask you
> the questions which must be answered to produce a good model.

        Herman's advice may be good in "mature" disciplines in which the
processes introducing randomness are truly and completely understood. 
Thermodynamics, for instance, or... I'm sure there was another one
somewhere?

        But what if one wants to model (say) rainfall, human heights, or the
number of ticks on a sheep? By the time one has a complete enough
understanding of meteorology, human growth processes, or tick ecology to
come up with an _a_priori_ model that one trusts at least as well as
one trusts the data, one doesn't really need to do statistics
any more, just probability theory. (As in thermodynamics...)

        Suppose somebody *does* come up with a theoretical argument that shows
that (say) birth weights ought to be normally distributed. And suppose
the data disagree? What should one do? It would seem as if Herman's
advice would lead one to say either "Then so much the worse for the
data", or "That is what comes of trying to do statistics when one is not
yet infallible", or at most "As our theoretical model does not fit the
data, we cannot proceed and will go out to the pub instead." 
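
        (To make "the data disagree" concrete: checking a sample against a
theoretical normal model is routine. Below is a minimal sketch of one way to
do it, assuming Python with numpy and scipy available; it is my own
illustration, not anything Herman or the original poster proposed, and the
birth_weights sample is simulated purely so the snippet runs.)

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(0)
    # Hypothetical birth-weight sample in grams; a real analysis would load data.
    birth_weights = rng.normal(loc=3400, scale=450, size=500)

    # Shapiro-Wilk test of the hypothesis that the sample came from a normal
    # distribution; a small p-value means the data disagree with the theory.
    stat, p_value = stats.shapiro(birth_weights)
    print(f"Shapiro-Wilk W = {stat:.3f}, p = {p_value:.3f}")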

        I would argue that in _most_ areas where statistics is needed, there
are no theories capable of justifying a certain model _a_priori_, and
there never will be. (There may be theories capable of justifying an
approximate model, but as argued above such a model must still be tested
to see if it works!)  Thus, in reality, the "understanding of your
subject" will reduce to using the distribution that your colleagues used
last year. And why did _they_ use it? Ultimately, either because it fit
some related data set or for some worse reason.

        I would certainly agree that one must not choose models in the teeth of
the data _because_ they are simple, and one must not accept models
merely because one has a small and toothless data set that has not got
the power to defend itself against baseless allegations.   However, if
one has a large enough data set that one can say that any model that
fits it must be very _close_ to a certain simple model, I do not see the
harm (and I do see the utility) of using that simple model.
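
        (In the original poster's situation, with a one-parameter family
A(theta1) and a two-parameter family B(theta1, theta2), letting a large data
set adjudicate can be as simple as fitting both by maximum likelihood and
comparing a penalized fit such as AIC. A rough sketch, again my own and
assuming Python with scipy; exponential and gamma stand in for A and B, and
the data are simulated.)

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(1)
    data = rng.gamma(shape=2.0, scale=3.0, size=1000)  # hypothetical positive data

    # Family A: one-parameter exponential, fitted by maximum likelihood.
    loc_a, scale_a = stats.expon.fit(data, floc=0)
    loglik_a = stats.expon.logpdf(data, loc_a, scale_a).sum()

    # Family B: two-parameter gamma, also fitted by maximum likelihood.
    shape_b, loc_b, scale_b = stats.gamma.fit(data, floc=0)
    loglik_b = stats.gamma.logpdf(data, shape_b, loc_b, scale_b).sum()

    # AIC = 2k - 2*log-likelihood: the extra parameter in B must earn its keep.
    aic_a = 2 * 1 - 2 * loglik_a
    aic_b = 2 * 2 - 2 * loglik_b
    print(f"AIC A (exponential) = {aic_a:.1f}, AIC B (gamma) = {aic_b:.1f}")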

        With small data sets, unless one has a model justified by a larger and
closely related data set, nonparametric or robust techniques are safer.
For very small data sets, in many cases, you cannot proceed and should
go off to the pub...
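
        (For what "safer" can look like in practice: a tiny, distribution-free
sketch, once more my own and assuming Python with numpy, of a bootstrap
confidence interval for the median, which commits to no parametric family at
all.)

    import numpy as np

    rng = np.random.default_rng(2)
    # Hypothetical small, skewed sample; stands in for real measurements.
    sample = rng.lognormal(mean=1.0, sigma=0.6, size=20)

    # Resample with replacement and record the median of each resample.
    boot_medians = np.array([
        np.median(rng.choice(sample, size=sample.size, replace=True))
        for _ in range(5000)
    ])
    lo, hi = np.percentile(boot_medians, [2.5, 97.5])
    print(f"median = {np.median(sample):.2f},"
          f" 95% bootstrap CI = ({lo:.2f}, {hi:.2f})")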

        -Robert Dawson


