Dear R-help list,

Apparently lda from the MASS package can be used when the variables are collinear. It only issues a warning in that case, but it still defines a classification rule and produces results.

However, I can't find on the help page how exactly it does this. I have a suspicion (it may look at the hyperplane containing the class means, using some kind of default/trivial within-group covariance matrix) but I'd like to know in detail if possible.

I find it particularly puzzling that it produces different results depending on whether I use CV=TRUE or run a manual LOO cross-validation.

While constructing an example, I realised that CV=TRUE puzzles me not only in the collinear case. The example is below. It also produces different (though rather similar) results for p=10, where the variables are no longer collinear.

See here:

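library(MASS)
set.seed(12345)
n <- 50
p <- 200 # or p <- 10
# n x p matrix of independent standard normal noise
testdata <- matrix(ncol=p,nrow=n)
for (i in 1:p)
  testdata[,i] <- rnorm(n)
# two classes of 25 observations each
class <- as.factor(c(rep(1,25),rep(2,25)))

# built-in leave-one-out cross-validation
lda1 <- lda(x=testdata,grouping=class,CV=TRUE)
table1 <- table(lda1$class,class)

# manual leave-one-out cross-validation
y.lda <- rep(NA, n)
for(i in 1:n){
  testset <- testdata[i,,drop=FALSE]
  trainset <- testdata[-i,]
  model.lda <- lda(x=trainset,grouping=class[-i])
  # store the predicted label rather than the factor's integer code
  y.lda[i] <- as.character(predict(model.lda, testset)$class)
}
table2 <- table(y.lda, class)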

table1
   class
     1  2
  1 14 16
  2 11  9

table2
     class
y.lda  1  2
    1 15 10
    2 10 15

With p=10:
table1
   class
     1  2
  1 10 11
  2 15 14
table2
     class
y.lda  1  2
    1 10 12
    2 15 13
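
One check that might narrow this down for p=10: in the manual loop, lda estimates the class priors from the 49 remaining observations, so the class of the held-out observation gets prior 24/49 rather than 25/50. The sketch below assumes (I have not confirmed this from the help page) that CV=TRUE keeps the full-sample priors, and reruns the manual LOO with the priors fixed to the full-sample proportions. The names full_prior, loo_default and loo_fixed are only introduced for this illustration.

```r
library(MASS)
set.seed(12345)
n <- 50
p <- 10
testdata <- matrix(ncol = p, nrow = n)
for (i in 1:p)
  testdata[, i] <- rnorm(n)
class <- as.factor(c(rep(1, 25), rep(2, 25)))

# class proportions in the full sample (0.5 and 0.5 here)
full_prior <- as.vector(table(class) / n)

# built-in leave-one-out cross-validation
cv_fit <- lda(x = testdata, grouping = class, CV = TRUE)

loo_default <- character(n)  # priors re-estimated on each training set
loo_fixed <- character(n)    # priors fixed to the full-sample proportions
for (i in seq_len(n)) {
  train <- testdata[-i, , drop = FALSE]
  test <- testdata[i, , drop = FALSE]
  fit_default <- lda(x = train, grouping = class[-i])
  fit_fixed <- lda(x = train, grouping = class[-i], prior = full_prior)
  loo_default[i] <- as.character(predict(fit_default, test)$class)
  loo_fixed[i] <- as.character(predict(fit_fixed, test)$class)
}

# number of observations classified differently from CV=TRUE
sum(loo_default != as.character(cv_fit$class))
sum(loo_fixed != as.character(cv_fit$class))
```

If the second count drops to zero, the priors would account for the p=10 discrepancy. This would not by itself explain the collinear case with p=200.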


Any explanation?

Best regards,
Christian


*** --- ***
Christian Hennig
University College London, Department of Statistical Science
Gower St., London WC1E 6BT, phone +44 207 679 1698
chr...@stats.ucl.ac.uk, www.homepages.ucl.ac.uk/~ucakche
