fair enough... btw are chunks 5 and may be 1 the ones which gave you head ache?
On Sat, 28 Mar 2009, Per B. Sederberg wrote: > Here's the summary: > In [41]: print dat.summary() > Dataset / float32 2142 x 29155 > uniq: 6 chunks 2 labels > stats: mean=-4.0314e-06 std=0.916641 var=0.84023 min=-1.23648 max=43.8952 > Counts of labels in each chunk: > chunks\labels 0 1 > --- --- > 0 173 184 > 1 168 189 > 2 184 173 > 3 188 169 > 4 173 184 > 5 196 161 > Summary per label across chunks > label mean std min max #chunks > 0 180 9.81 168 196 6 > 1 177 9.81 161 189 6 > Summary per chunk across labels > chunk mean std min max #labels > 0 178 5.5 173 184 2 > 1 178 10.5 168 189 2 > 2 178 5.5 173 184 2 > 3 178 9.5 169 188 2 > 4 178 5.5 173 184 2 > 5 178 17.5 161 196 2 > It does look like there are some outliers in there based on the min > and max above. Note that I have already zscored the data at this > point. > I'm getting some really good classification now with the > regularization parameter in there. At some point I may try looking > for outliers and explicitly removing them and then seeing if I need > the regularization. The good news is that the regularization is > seeming to have no effect on the chunks of the data that didn't need > it before, so I'm not really hurting anything by having it there (at > least for this dataset). > Best, > P > 2009/3/28 Yaroslav Halchenko <[email protected]>: > > Hypothesis: > > I think that one of the possible causes for such behavior is whenever > > your features are not normed to lie approximately in the same range... > > assume that you have one which is 100 and the rest is "proper" ones in > > [-5, 5] (zscores?). Or you have some outliers/excursions from this > > 'normed' case. Then while doing summation/dot-products. That heavy > > feature would demolish the effect of variance in the "proper" features. > > What is the histogram of your features? or just > > print dataset.summary() > > ? > > On Sat, 28 Mar 2009, Per B. Sederberg wrote: > >> Howdy all: > >> I've got it working for my data (had to add what I thought was a > >> pretty hefty regularization of .001 before it stopped giving me the > >> error, but it turns out I get the same classification performance with > >> a value of 2.0). > >> I made it so that regularization is always applied, but that it > >> defaults to 0.0, so the default behavior of the classifier is > >> unchanged. Here's the commit message: > >> -------------------- > >> I'm not sure what to call it, so I put it all. Anyway, I just added > >> the regularization > >> parameter to be used all the time by GPR. It still defaults to 0.0, > >> so there should be no > >> change for people currently using GPR with no issues. However, this > >> enables the manual > >> setting of the regularization if you run into the not positive, > >> definite error. I figured > >> that if you were going to use regularization, you would want to use it > >> all the time and not > >> have different values on different folds of your data. > >> ---------------------- > >> Thanks for all the input today. I now have working GPR classification :) > >> Best, > >> Per > >> On Sat, Mar 28, 2009 at 12:48 PM, Per B. Sederberg <[email protected]> > >> wrote: > >> > On Sat, Mar 28, 2009 at 12:06 PM, Scott Gorlin <[email protected]> wrote: > >> >> you could try 10x the calculated eps :) > >> >> Depending on how long it takes to run your classifier, why don't you try > >> >> varying epsilon logarithmically and plotting the error as a function of > >> >> epsilon. Most likely you'll see a local min around some optimum value > >> >> of > >> >> regularization, and then it will plateau out as epsilon increases - this > >> >> should, at least, convince you that a high epsilon isn't that bad (if > >> >> it is, > >> >> then your data probably aren't as separable as they should be...) > >> > A great idea! I'll expose the regularization parameter as a variable > >> > to GPR and try it out. > >> >> All that high epsilon does in Tikhonov regularization is penalize steep > >> >> hyperplanes, as it adds the norm of a function to the loss, so the > >> >> results > >> >> will be smoother (and theoretically generalize better). So, it should > >> >> be > >> >> set so that epsilon*function norm is ballpark equal to the raw > >> >> classifier > >> >> loss, so that it makes a difference (though I don't know the exact form > >> >> in > >> >> the GPR implementation; i'm just trying to say that there is an > >> >> effective > >> >> range which isn't necessarily approaching 0). As long as Xfold > >> >> validation > >> >> still works with high epsilon, there's nothing to worry about - in fact, > >> >> that's what it's designed to do. > >> > So, this begs the question of whether we should have some level of > >> > regularization all the time. The main difficulty would be figuring > >> > out what it should be (ballpark value) for different datasets (i.e., > >> > whether there is some principled first-pass for setting it based on > >> > the features of your data.) > >> > Thanks for all the info, > >> > Per > >> >> if you still get a linalg error with very high epsilon, then there's > >> >> probably a different problem ;) > >> >> Per B. Sederberg wrote: > >> >>> hi s&e: > >> >>> this is almost exactly what I did, but I still get the same error. in > >> >>> numpy you get the precision with: > >> >>> np.finfo(np.float64).eps > >> >>> I tried using that value, but actually checking the eps of the C > >> >>> matrix, and also tried using the float32 eps, which is greater than 2 > >> >>> times the float64 eps. all to no avail, so i'm worried my data will > >> >>> need a much larger eps. what are the bad consequences I could run > >> >>> into by adding in a major regularization? > >> >>> thanks, > >> >>> p > >> >>> ps: in case it's informative, I have over 1000 samples so the kernel > >> >>> matrix is quite large... > >> >>> On 3/28/09, Scott Gorlin <[email protected]> wrote: > >> >>>> I haven't checked the python syntax, but in Matlab, the eps function > >> >>>> can > >> >>>> take a number and return the smallest recordable numerical difference > >> >>>> at > >> >>>> that level. If there is an equivalent in numpy then (say 2x) this > >> >>>> would > >> >>>> be a good alternative default epsilon (calculated at train time) > >> >>>> Emanuele Olivetti wrote: > >> >>>>> Dear Per and Scott, > >> >>>>> You are both right and I feel quite guilty since I did not spend > >> >>>>> time on GPR as I promised long time ago (basically lack of time). The > >> >>>>> epsilon issue is exactly what Scott says: I discussed the same issue > >> >>>>> with Yarick some time ago but then did not work on a principled > >> >>>>> solution. > >> >>>>> I'm fine if you patch the current code in order to increase epsilon > >> >>>>> till a > >> >>>>> certain extent. Any idea about alternative/principled solutions? > >> >>>>> Moreover I'm (slowly) after another numerical instability due, I > >> >>>>> guess, to > >> >>>>> underflow in summation which could affect the marginal likelihood > >> >>>>> computation (which means it is highly relevant for GPR). This is more > >> >>>>> likely to happen when the dataset is large. So be prepared to that > >> >>>>> ;-) > >> >>>>> As far as I remember I forced the kernel matrix to be double > >> >>>>> precision > >> >>>>> for exactly the same reasons you mentioned, but checking more > >> >>>>> accurately this fact is highly desirable. > >> >>>>> Unfortunately I'm not going to work on GPR very soon, even though > >> >>>>> I'd love to. So any patch is warmly welcome, as well as bug reports > >> >>>>> ;-) > >> >>>>> Best, > >> >>>>> Emanuele > >> >>>>> Per B. Sederberg wrote: > >> >>>>>> Hi Scott: > >> >>>>>> You are correct on all accounts. epsilon is currently hard-coded > >> >>>>>> and > >> >>>>>> either of your proposed solutions would help to make GPR work more > >> >>>>>> frequently/consistently. I'll try out some combination of both and > >> >>>>>> report back. > >> >>>>>> Best, > >> >>>>>> Per > >> >>>>>> On Sat, Mar 28, 2009 at 9:46 AM, Scott Gorlin <[email protected]> > >> >>>>>> wrote: > >> >>>>>>> Epsilon is the Tikhonov regularization parameter, no? (Haven't > >> >>>>>>> read > >> >>>>>>> much about this classifier, but it looks similar to the RLS > >> >>>>>>> solution > >> >>>>>>> to > >> >>>>>>> me...). As such it should be adjustable to enable punishing > >> >>>>>>> overfit > >> >>>>>>> solutions, so maybe it should actually be a formal parameter. > >> >>>>>>> If the default returns a LinAlg error, you could also stick the > >> >>>>>>> cholesky > >> >>>>>>> decomposition in a finite loop where epsilon is doubled each time > >> >>>>>>> (though not too many iterations...) > >> >>>>>>> On a wild tangent, also check that your kernel matrix (here > >> >>>>>>> self._C) > >> >>>>>>> is > >> >>>>>>> double precision, since 1e-20 is close to the precision limit for > >> >>>>>>> single > >> >>>>>>> floats. > >> >>>>>>> Per B. Sederberg wrote: > >> >>>>>>>> Hi Everybody (and Emanuele in particular): > >> >>>>>>>> I've tried GPR on about 5 different datasets and I've never > >> >>>>>>>> actually > >> >>>>>>>> gotten it to run through an entire cross validation process. I > >> >>>>>>>> always > >> >>>>>>>> get the following error: > >> >>>>>>>> /home/per/lib/python/mvpa/clfs/gpr.pyc in _train(self, data) > >> >>>>>>>> 308 except SLAError: > >> >>>>>>>> 309 epsilon = 1.0e-20 * N.eye(self._C.shape[0]) > >> >>>>>>>> --> 310 self._L = SLcholesky(self._C + epsilon, > >> >>>>>>>> lower=True) > >> >>>>>>>> 311 self._LL = (self._L, True) > >> >>>>>>>> 312 pass > >> >>>>>>>> /usr/lib/python2.5/site-packages/scipy/linalg/decomp.pyc in > >> >>>>>>>> cholesky(a, lower, overwrite_a) > >> >>>>>>>> 552 potrf, = get_lapack_funcs(('potrf',),(a1,)) > >> >>>>>>>> 553 c,info = > >> >>>>>>>> potrf(a1,lower=lower,overwrite_a=overwrite_a,clean=1) > >> >>>>>>>> --> 554 if info>0: raise LinAlgError, "matrix not positive > >> >>>>>>>> definite" > >> >>>>>>>> 555 if info<0: raise ValueError,\ > >> >>>>>>>> 556 'illegal value in %-th argument of internal > >> >>>>>>>> potrf'%(-info) > >> >>>>>>>> LinAlgError: matrix not positive definite > >> >>>>>>>> I think Yarik mentioned changing epsilon or something like that, > >> >>>>>>>> but > >> >>>>>>>> it seems to me that having to change source code to tweak a > >> >>>>>>>> classifier > >> >>>>>>>> to not crash is not ideal. > >> >>>>>>>> Do you have any suggestions for how to fix this? I've been > >> >>>>>>>> tantalized > >> >>>>>>>> with some very good transfer errors in the folds that GPR was > >> >>>>>>>> able to > >> >>>>>>>> complete before crashing. Is there a permanent change we could > >> >>>>>>>> make > >> >>>>>>>> to the GPR code to make it more stable across datasets? Or could > >> >>>>>>>> we > >> >>>>>>>> turn epsilon into a configurable variable when setting up the > >> >>>>>>>> classifier? If so, what should I change epsilon to in order to > >> >>>>>>>> see > >> >>>>>>>> if > >> >>>>>>>> I can get it to run? > >> >>>>>>>> Thanks, > >> >>>>>>>> Per > >> >>>>>>>> _______________________________________________ > >> >>>>>>>> Pkg-ExpPsy-PyMVPA mailing list > >> >>>>>>>> [email protected] > >> >>>>>>>> http://lists.alioth.debian.org/mailman/listinfo/pkg-exppsy-pymvpa > >> >>>>>>> _______________________________________________ > >> >>>>>>> Pkg-ExpPsy-PyMVPA mailing list > >> >>>>>>> [email protected] > >> >>>>>>> http://lists.alioth.debian.org/mailman/listinfo/pkg-exppsy-pymvpa > >> >>>>>> _______________________________________________ > >> >>>>>> Pkg-ExpPsy-PyMVPA mailing list > >> >>>>>> [email protected] > >> >>>>>> http://lists.alioth.debian.org/mailman/listinfo/pkg-exppsy-pymvpa > >> _______________________________________________ > >> Pkg-ExpPsy-PyMVPA mailing list > >> [email protected] > >> http://lists.alioth.debian.org/mailman/listinfo/pkg-exppsy-pymvpa > > -- > > Yaroslav Halchenko > > Research Assistant, Psychology Department, Rutgers-Newark > > Student Ph.D. @ CS Dept. NJIT > > Office: (973) 353-1412 | FWD: 82823 | Fax: (973) 353-1171 > > 101 Warren Str, Smith Hall, Rm 4-105, Newark NJ 07102 > > WWW: http://www.linkedin.com/in/yarik -- Yaroslav Halchenko Research Assistant, Psychology Department, Rutgers-Newark Student Ph.D. @ CS Dept. NJIT Office: (973) 353-1412 | FWD: 82823 | Fax: (973) 353-1171 101 Warren Str, Smith Hall, Rm 4-105, Newark NJ 07102 WWW: http://www.linkedin.com/in/yarik _______________________________________________ Pkg-ExpPsy-PyMVPA mailing list [email protected] http://lists.alioth.debian.org/mailman/listinfo/pkg-exppsy-pymvpa

