Hi! This is a question about lda (MASS) in R on a particular dataset. I'm not a specialist in any of this, but: first, with the well-known "iris" dataset, I tried using lda to discriminate versicolor from the other two classes, and I got approx. 70% accuracy when testing on the training set. In iris, versicolor sits "between" the 2 other classes, so one can expect lda not to perform well, since it cannot cluster the negative instances (setosa + virginica) together. (Is this correct?) (KNN = 96% in xval.)
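For concreteness, here is a minimal sketch of the iris experiment described above. The column name `target` and the label "other" for the pooled setosa + virginica class are assumptions made for illustration; the original call is not shown in this post, so this is only one way the experiment could have been set up.

```r
library(MASS)

# Binary problem: versicolor vs. the two other species pooled together.
# "target" and the label "other" are illustrative names, not from the post.
iris2 <- iris
iris2$target <- factor(ifelse(iris2$Species == "versicolor",
                              "versicolor", "other"))
iris2$Species <- NULL

# Fit LDA on the four measurements.
fit <- lda(target ~ ., data = iris2)

# Resubstitution accuracy: predict on the same rows used for fitting.
pred <- predict(fit, iris2)$class
train_acc <- mean(pred == iris2$target)

print(table(predicted = pred, actual = iris2$target))
print(train_acc)
```

The confusion table makes it visible which of the pooled "other" instances end up on the wrong side of the single linear boundary.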
Now I use my "real" dataset (900 instances, 21 attributes), whose 2 classes can be separated with an accuracy of no more than 80% (10-fold xval) with KNN, SVM, C4.5 and the like. So I was very surprised to see that lda also gets an accuracy of 80% on it, because lda is very simple (for a 2-class problem, it finds the best line and uses the projections onto that line for classification).

So my question is: how does lda (in MASS) use the projections to make the decision? Usually the decision for a test instance is made using the means and variances of the 2 classes, but there are other possibilities (especially in higher dimensions).

Thanks for any ideas; the doc is a bit sparse, and so is Venables & Ripley's book on this particular matter.

Samuel

PS: Does anybody know how to use the CV option of lda to do xval? I can't get it to work.

______________________________________________
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
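The two questions above can be probed empirically with a short sketch. It uses iris as a stand-in, since the real dataset is not available here, and the names `iris2`, `target` and `manual` are illustrative. The decision rule tested is an assumption, not a statement of what MASS does internally: project onto the single discriminant, then pick the class whose projected mean is closest, adjusted by the log of the class prior. The code only checks whether that rule agrees with `predict()`. The last part relies on the documented behaviour of `CV = TRUE`, which returns leave-one-out classes and posteriors (not a fitted object, and not 10-fold).

```r
library(MASS)

# Stand-in data: versicolor vs. the rest (illustrative names).
iris2 <- iris
iris2$target <- factor(ifelse(iris2$Species == "versicolor",
                              "versicolor", "other"))
iris2$Species <- NULL
lev <- levels(iris2$target)

fit <- lda(target ~ ., data = iris2)
p   <- predict(fit, iris2)

# Projections onto the single discriminant (LD1) and projected class means.
z <- p$x[, 1]
m <- sapply(split(z, iris2$target), mean)

# Assumed rule: smallest 0.5 * (z - m_k)^2 - log(prior_k) wins.
score <- sapply(seq_along(lev), function(k)
  0.5 * (z - m[k])^2 - log(fit$prior[k]))
manual <- factor(lev[apply(score, 1, which.min)], levels = lev)

# Fraction of rows where the assumed rule matches predict().
agree <- mean(manual == p$class)
print(agree)

# Leave-one-out cross-validation via the CV argument.
cv <- lda(target ~ ., data = iris2, CV = TRUE)
loo_acc <- mean(cv$class == iris2$target)
print(loo_acc)
```

If `agree` comes out as 1, the assumed rule reproduces the classes from `predict()` on this data. For 10-fold xval the folds would have to be built by hand, calling `lda` and `predict` once per fold.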