On Tue, 16 Mar 2010, Emanuele Olivetti wrote:

> dataset coming from the same distribution. In case of SVMs (and some
> others) and leave-one-out this estimator is proved to be almost
> unbiased [0]. Yarik, you mentioned issues with bias in this case, can
> you send me some references on that? I am eagerly collecting
> information on the topic.

yes -- I did mention bias... and yes there is one, like with any other
estimate ;)  Indeed, in general LOO is known to provide a small (and
small is generally != 0) bias and often a large variance of the
estimate (as you pointed out below).  But let's look into the 'fine
print' for the theorem/propositions you've mentioned:
,---
| 1. The term "almost" refers to the fact that the leave-one-out error
|    provides an estimate for training on sets of size m-1 rather than
|    m, cf. Proposition 7.4.
`---

So, as any other estimator, LOO does have its bias, which is known to
be quite small... and this note points to what is probably a critical
point, one which becomes important on small data set sizes: it is an
estimate for training on m-1 samples... let's keep that in mind for
now, and now the proposition:

,---
| Proposition 7.4  The expectation of the number of Support Vectors
| obtained during training on a training set of size m, divided by m,
| is an upper bound on the expected probability of test error of the
| SVM trained on training sets of size m-1.
`---

and that is what makes SVMs great -- a decision boundary derived from a
relatively small number of samples out of the whole large population is
known to provide good generalization performance.  But then what
happens if, once again, we have a small sample size and a large feature
space?  We get lots of SVs (in my experience you can easily end up with
all of your samples as SVs, even with a relatively hard margin, i.e.
high C), and the theoretical bounds become wider; hence "almost" might
lose its meaning of "close to none", because none of the estimates
become reliable (the variance is high).

Going back to comment 1., it becomes very important if you have a small
sample size, possibly balanced, and then you assess performance on m-1
samples, where the set becomes unbalanced:

> In the leave-one-out case you have just one example to compute the
> error rate on the test set which can be (as Yarik said) 0% or
> 100%. This is a poor estimate of the error rate for that fold, but it
> will be used just to compute the final average, which is then OK.

and you can easily end up with your average at 100% (got that myself
quite often with SVM when doing this evil thing, i.e. taking just a
single sample out) or at 0% (never crafted such a case, but it is easy
to construct one if you already have one giving 100% error ;) ) as
well.
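A minimal sketch of that pathology (a made-up toy, not from the thread:
a trivial majority-vote "classifier" stands in for the SVM, and
`majority_label`/`loo_error` are hypothetical names).  On a perfectly
balanced binary set, every leave-one-out split is unbalanced *against*
the held-out sample's class, so every fold errs and the average lands
at 100% with zero variance:

```python
from collections import Counter

def majority_label(labels):
    """Label most common in the training split."""
    return Counter(labels).most_common(1)[0][0]

def loo_error(labels):
    """Leave-one-out error of a majority-vote classifier (features ignored)."""
    errors = []
    for i, true_label in enumerate(labels):
        train = labels[:i] + labels[i + 1:]        # m-1 samples, now unbalanced
        predicted = majority_label(train)
        errors.append(1.0 if predicted != true_label else 0.0)
    return sum(errors) / len(errors)

# Perfectly balanced set: 5 samples per class -- every fold is wrong.
print(loo_error([0] * 5 + [1] * 5))    # 1.0, i.e. 100% error, zero variance
```

A real SVM would use the features, of course, but on structureless
noise it degenerates toward exactly this majority-vote behaviour.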
So here comes your LOO bias, with literally no variance, simply because
it still sits within the bounds -- which are now very wide ;)

> is not a problem: in cross-validation the variance depends on the
> number/size of the folds so it is quite arbitrary (and of little use
> in most of practical cases) since you are free to decide this
> number/size.

well, not exactly in the case of fMRI (read: non-independent samples),
where you are actually quite restricted in what you can take as
hold-out pieces.  Imagine a database of handwritten digits (e.g. MNIST)
where exactly the same picture of each digit was present in the data
set multiple times: what would happen to your LOO bias if you didn't
take care to keep "the same" pictures together (in terms of
training/testing splits)?  And that is imho one of the nuisances of
applying any generalization estimate / mixed-effects analysis to fMRI
data.

> As for the small sample set case I am reading about it and testing
> some code. It seems to lead to high bias under some regimes.

here we go.... to get consistent results -- just analyze some nice
noise without structure, so the SVM has no chance to learn anything,
the bound gets wide, and you get your bias.

> But it
> sounds conceivable to me since small samples can't represent properly
> the problem except in very simple cases. Again if you have literature
> on this please let me know.

I never really investigated this topic deeply enough to have good
references I digested myself and found worth sharing, sorry ;)  I was
just sharing my own experience (and what others (e.g. advisors,
teachers) put into my brain).  But this one might become interesting on
a touching point (if you don't have it in your toolbelt yet):

http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.109.1930&rep=rep1&type=pdf
An Analysis of the Anti-Learning Phenomenon for the Class Symmetric
Polyhedron

> I can share mine if interested.

sure!
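The duplicated-pictures scenario can be sketched like this (a made-up
toy, not PyMVPA code: `nn_predict`/`loo_accuracy` are hypothetical
names, and a 1-nearest-neighbour classifier stands in for whatever you
would actually train).  Plain LOO lets each held-out sample find its
exact duplicate in the training set, so it "generalizes" perfectly even
though the labels carry no structure; holding duplicates out together,
group-wise, removes the leak:

```python
def nn_predict(train, test_x):
    """1-NN prediction under squared Euclidean distance."""
    return min(train, key=lambda xy: sum((a - b) ** 2
                                         for a, b in zip(xy[0], test_x)))[1]

def loo_accuracy(samples, groups=None):
    """LOO accuracy; with `groups`, hold out whole groups together."""
    if groups is None:
        groups = list(range(len(samples)))     # plain LOO: every sample alone
    hits = total = 0
    for g in set(groups):
        train = [s for s, gg in zip(samples, groups) if gg != g]
        for x, y in (s for s, gg in zip(samples, groups) if gg == g):
            hits += nn_predict(train, x) == y
            total += 1
    return hits / total

# Four distinct "pictures" with XOR-like labels, each picture duplicated.
base = [((0.0, 0.0), 0), ((1.0, 0.0), 1), ((0.0, 1.0), 1), ((1.0, 1.0), 0)]
samples = base * 2                             # exact duplicates in the set
print(loo_accuracy(samples))                           # 1.0 -- duplicate leaks in
print(loo_accuracy(samples, groups=[0, 1, 2, 3] * 2))  # 0.0 -- leak removed
```

(On this XOR-like toy the honest, grouped estimate even drops below
chance -- incidentally the anti-learning flavour of the paper above.)
With fMRI data the "duplicates" are temporally correlated volumes, and
the groups are runs/chunks.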
-- 
                                  .-.
=------------------------------   /v\  ----------------------------=
Keep in touch                    // \\     (yoh@|www.)onerussian.com
Yaroslav Halchenko              /(   )\               ICQ#: 60653192
                   Linux User    ^^-^^    [175555]

_______________________________________________
Pkg-ExpPsy-PyMVPA mailing list
[email protected]
http://lists.alioth.debian.org/mailman/listinfo/pkg-exppsy-pymvpa

