[R] Cross-validation for Linear Discriminant Analysis

2004-09-15 Thread Yu Shao
Hello:

I am new to R and statistics and I have two questions.

First, I need help interpreting the cross-validation results from the R
linear discriminant analysis function lda. I did the following:

lda (group ~ Var1 + Var2, CV=T)

where CV=T tells lda to do cross-validation. The output of lda includes
the posterior probabilities, among other things, but I can't find an error
term (like the delta returned by cv.glm). My question is: how do I get such
an error term from the output? Can I simply calculate the prediction
accuracy using the posterior probabilities from the cross-validation, and
use that to measure the quality of the model?
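
Concretely, what I have in mind is something like the following sketch, on
invented stand-in data (the data frame d and its variables are hypothetical):

library(MASS)   # lda() is in package MASS

## Invented two-group, two-predictor data standing in for the real data.
set.seed(1)
d <- data.frame(group = factor(rep(c("a", "b"), each = 50)),
                Var1  = c(rnorm(50), rnorm(50, mean = 1.5)),
                Var2  = c(rnorm(50), rnorm(50, mean = 1.5)))

## With CV = TRUE, lda() returns leave-one-out predictions directly:
## $class holds the predicted class for each observation, $posterior
## the matrix of posterior probabilities.
fit <- lda(group ~ Var1 + Var2, data = d, CV = TRUE)

## One candidate `error term': the leave-one-out misclassification rate.
mean(fit$class != d$group)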

Another question is more basic: how do I determine whether an lda model is
significant? (There is no p-value.) Thanks,

Yu Shao

Wadsworth Research Center
Department of Health of New York State
Albany, NY 12208



Re: [R] Cross-validation for Linear Discriminant Analysis

2004-09-15 Thread Prof Brian Ripley
On Wed, 15 Sep 2004, Yu Shao wrote:

 I am new to R and statistics and I have two questions.

Perhaps then you need to start by explaining why you are using LDA.
Please take a good look at the posting guide.

 First, I need help interpreting the cross-validation results from the R
 linear discriminant analysis function lda.

You mean Professor Ripley's function lda in package MASS, I guess.

 I did the following:
 
 lda (group ~ Var1 + Var2, CV=T)

R allows you to use meaningful names, so please do so.
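
For instance (rewriting your call on the built-in iris data, purely as an
illustration, with TRUE written out since T is an ordinary variable that
can be reassigned):

library(MASS)

## The same style of call, but with self-explanatory names and CV = TRUE.
fit <- lda(Species ~ Sepal.Length + Sepal.Width, data = iris, CV = TRUE)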

 where CV=T tells lda to do cross-validation. The output of lda includes
 the posterior probabilities, among other things, but I can't find an error
 term (like the delta returned by cv.glm). My question is: how do I get such
 an error term from the output? Can I simply calculate the prediction
 accuracy using the posterior probabilities from the cross-validation, and
 use that to measure the quality of the model?

cv.glm as in Dr Canty's package boot?  If you are trying to predict
classifications, LDA is not the right tool, and LOO CV probably is not
either.  There is no unique definition of `error term' (true for cv.glm as
well), and people have written whole books about how to assess
classifiers.  LDA is about `discrimination', not `allocation', in the jargon
used ca. 1960.
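
To see why a single number undersells the problem, tabulate the true classes
against the leave-one-out predictions (continuing the invented data d from
the sketch in your message); the two off-diagonal cells are different kinds
of error, and whether they matter equally depends on the application:

## Continuing the invented data d and the CV = TRUE fit from above.
fit <- lda(group ~ Var1 + Var2, data = d, CV = TRUE)

## Confusion matrix: rows are true classes, columns LOO predictions.
table(true = d$group, predicted = fit$class)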

 Another question is more basic: how do I determine whether an lda model is
 significant? (There is no p-value.) Thanks,

Please do read the references on the ?lda page.  It's not a useful
question, as LDA is about discriminating between populations and makes the
unrealistic assumption of multivariate normality.  (Analogously, for linear
regression there are ways to test whether the fit is (statistically)
`significant', but knowledgeable users almost never do so.)
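
For completeness, and not as a recommendation: the classical formal test in
this direction is Wilks' lambda from a one-way MANOVA, which tests whether
the group mean vectors differ at all, and it leans on the same normality
assumption.  On the invented data d from the earlier sketch:

## Wilks' lambda test that the group mean vectors coincide; manova()
## and its summary method are in base R (package stats).
fit.manova <- manova(cbind(Var1, Var2) ~ group, data = d)
summary(fit.manova, test = "Wilks")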

Perhaps more realistic advice is to suggest you seek some statistical 
consultancy.

-- 
Brian D. Ripley,                  [EMAIL PROTECTED]
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595
