On 20 Mar 2001 15:50:07 -0800, [EMAIL PROTECTED] (Brian M.
Schott) wrote:

BMS:  " Is it necessary to approximately balance the sample size in
the groups of the validation or holdout sample to  develop a good
discriminant function?"

The answer is generally No, but I'm not sure 
what the alternatives are supposed to be.

The 'holdout sample', if there's just one, doesn't 
do much to develop the function; it illustrates it.
One point: being representative usually matters 
a lot (as opposed to merely being numerous).  
Never throw away free cases that you have in-hand,
just for the sake of achieving balance.


BMS:  " I am a little unclear about the extent to which the prior
probabilities can be used to adjust for sample proportions which do
not represent the population proportions.  I suspect there is a
difference  between predictive and descriptive discriminant analysis
in regard to this question, btw. But I cannot find a textbook that
addresses this question. "

In the usual, ordinary, discriminant function, the 'prior
probabilities'  play absolutely no role in the mathematics 
of the solution.  The DF is, in other terms, a problem in 
canonical correlation; or an eigenvector problem.  There's
no place in the basic problem for those weights to enter in.
[ You might assign weights on the groups, if you look at 
step-wise inclusion of variables -- but that whole prospect is
unappealing.  I have not bothered to see what is implemented. ]
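
To make the "eigenvector problem" remark concrete, here is a minimal
sketch for the two-group case.  It assumes the textbook formulation
(leading eigenvector of inv(Sw) Sb, with Sw the pooled within-group
scatter and Sb the between-group scatter); the function name and the
simulated data are made up for illustration.  Note that no prior
probability appears anywhere in it.

```python
import numpy as np

def discriminant_direction(x0, x1):
    """Two-group discriminant direction from inv(Sw) Sb v = lambda v."""
    m0, m1 = x0.mean(axis=0), x1.mean(axis=0)
    # Pooled within-group scatter matrix.
    sw = (x0 - m0).T @ (x0 - m0) + (x1 - m1).T @ (x1 - m1)
    # Between-group scatter, up to a constant factor.
    d = (m1 - m0).reshape(-1, 1)
    sb = d @ d.T
    # Leading eigenvector of inv(Sw) Sb.
    eigvals, eigvecs = np.linalg.eig(np.linalg.solve(sw, sb))
    v = np.real(eigvecs[:, np.argmax(np.real(eigvals))])
    # Closed form for two groups: inv(Sw) (m1 - m0).
    closed_form = np.linalg.solve(sw, d).ravel()
    return v, closed_form

# Hypothetical, deliberately unbalanced samples (200 vs. 50 cases).
rng = np.random.default_rng(0)
x0 = rng.normal([0.0, 0.0], 1.0, size=(200, 2))
x1 = rng.normal([2.0, 1.0], 1.0, size=(50, 2))

v, w = discriminant_direction(x0, x1)
# The two directions agree up to sign and scale, so this is close to 1.
cosine = abs(v @ w) / (np.linalg.norm(v) * np.linalg.norm(w))
print(cosine)
```

The group sizes affect the estimates of the means and of Sw, but
there is no slot in the computation for a prior.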

The 'prior probabilities'  are used in the step that describes the
(predicted) group memberships.   They decide where you draw the lines. 
In your terms, you might say, it is a part of the 'descriptive
analysis' only.  However, there is NO  *analysis*  in a sense - 
you just have the description.  
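
A small sketch of that point, assuming the usual normal-theory rule
for two groups with a common covariance matrix (assign to group 1
when w'x exceeds the cutoff).  The function name, the prior values
and the simulated data are hypothetical.  The prior enters only the
cutoff, as a log-odds shift; the direction w is the same whatever
prior you pick.

```python
import numpy as np

def lda_direction_and_cutoff(x0, x1, prior1=0.5):
    """Return (w, cutoff); classify x into group 1 when w @ x > cutoff."""
    m0, m1 = x0.mean(axis=0), x1.mean(axis=0)
    n0, n1 = len(x0), len(x1)
    # Pooled within-group covariance estimate.
    pooled = ((x0 - m0).T @ (x0 - m0) + (x1 - m1).T @ (x1 - m1)) / (n0 + n1 - 2)
    # Direction of the discriminant function: no prior involved.
    w = np.linalg.solve(pooled, m1 - m0)
    # The prior only moves the cutoff, by the log prior odds.
    cutoff = w @ (m0 + m1) / 2 - np.log(prior1 / (1 - prior1))
    return w, cutoff

# Hypothetical samples: 200 cases in group 0, 50 in group 1.
rng = np.random.default_rng(1)
x0 = rng.normal([0.0, 0.0], 1.0, size=(200, 2))
x1 = rng.normal([2.0, 1.0], 1.0, size=(50, 2))

w_equal, c_equal = lda_direction_and_cutoff(x0, x1, prior1=0.5)
w_prop, c_prop = lda_direction_and_cutoff(x0, x1, prior1=0.2)

print(np.allclose(w_equal, w_prop))  # same direction under both priors
print(c_prop - c_equal)              # cutoff shift equals log(0.8 / 0.2)
```

A lower prior for group 1 raises the cutoff, so fewer cases get
assigned to group 1; nothing else about the function changes.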

Furthermore:  Using the priors is (often) not well understood.  
Most writers avoid saying much, because they haven't 
figured them out, either.

USUALLY, you do not need to (and should not) use priors.  
In the cases where they are used, USUALLY the adjustment 
(away from 50-50) should be less than proportionate to the Ns.
USUALLY, it is fair to draw a cutoff score (almost) arbitrarily.

I don't have an explicit reference.  The textbooks on my shelf
don't say much.  I can suggest looking for texts that mention
cost-benefit, and Decisions.  You might want to read journal
references on those topics, which are from the 1970s in my
texts.  I have not seen the new ones, but I suspect there are
citations for 1995+, to go along with  multiple-category logistic
regression being available in SPSS and SAS.  -- Try Google.

-- 
Rich Ulrich, [EMAIL PROTECTED]
http://www.pitt.edu/~wpilib/index.html


