hi s&e: this is almost exactly what I did, but I still get the same error. in numpy you get the precision with:
np.finfo(np.float64).eps I tried using that value, but actually checking the eps of the C matrix, and also tried using the float32 eps, which is greater than 2 times the float64 eps. all to no avail, so i'm worried my data will need a much larger eps. what are the bad consequences I could run into by adding in a major regularization? thanks, p ps: in case it's informative, I have over 1000 samples so the kernel matrix is quite large... On 3/28/09, Scott Gorlin <[email protected]> wrote: > I haven't checked the python syntax, but in Matlab, the eps function can > take a number and return the smallest recordable numerical difference at > that level. If there is an equivalent in numpy then (say 2x) this would > be a good alternative default epsilon (calculated at train time) > > Emanuele Olivetti wrote: >> Dear Per and Scott, >> >> You are both right and I feel quite guilty since I did not spend >> time on GPR as I promised long time ago (basically lack of time). The >> epsilon issue is exactly what Scott says: I discussed the same issue >> with Yarick some time ago but then did not work on a principled solution. >> I'm fine if you patch the current code in order to increase epsilon >> till a >> certain extent. Any idea about alternative/principled solutions? >> >> Moreover I'm (slowly) after another numerical instability due, I >> guess, to >> underflow in summation which could affect the marginal likelihood >> computation (which means it is highly relevant for GPR). This is more >> likely to happen when the dataset is large. So be prepared to that ;-) >> >> As far as I remember I forced the kernel matrix to be double precision >> for exactly the same reasons you mentioned, but checking more >> accurately this fact is highly desirable. >> >> Unfortunately I'm not going to work on GPR very soon, even though >> I'd love to. So any patch is warmly welcome, as well as bug reports ;-) >> >> Best, >> >> Emanuele >> >> >> Per B. Sederberg wrote: >>> Hi Scott: >>> >>> You are correct on all accounts. epsilon is currently hard-coded and >>> either of your proposed solutions would help to make GPR work more >>> frequently/consistently. I'll try out some combination of both and >>> report back. >>> >>> Best, >>> Per >>> >>> On Sat, Mar 28, 2009 at 9:46 AM, Scott Gorlin <[email protected]> wrote: >>> >>>> Epsilon is the Tikhonov regularization parameter, no? (Haven't read >>>> much about this classifier, but it looks similar to the RLS solution to >>>> me...). As such it should be adjustable to enable punishing overfit >>>> solutions, so maybe it should actually be a formal parameter. >>>> >>>> If the default returns a LinAlg error, you could also stick the >>>> cholesky >>>> decomposition in a finite loop where epsilon is doubled each time >>>> (though not too many iterations...) >>>> >>>> On a wild tangent, also check that your kernel matrix (here self._C) is >>>> double precision, since 1e-20 is close to the precision limit for >>>> single >>>> floats. >>>> >>>> Per B. Sederberg wrote: >>>> >>>>> Hi Everybody (and Emanuele in particular): >>>>> >>>>> I've tried GPR on about 5 different datasets and I've never actually >>>>> gotten it to run through an entire cross validation process. I always >>>>> get the following error: >>>>> >>>>> /home/per/lib/python/mvpa/clfs/gpr.pyc in _train(self, data) >>>>> 308 except SLAError: >>>>> 309 epsilon = 1.0e-20 * N.eye(self._C.shape[0]) >>>>> --> 310 self._L = SLcholesky(self._C + epsilon, >>>>> lower=True) >>>>> 311 self._LL = (self._L, True) >>>>> 312 pass >>>>> >>>>> /usr/lib/python2.5/site-packages/scipy/linalg/decomp.pyc in >>>>> cholesky(a, lower, overwrite_a) >>>>> 552 potrf, = get_lapack_funcs(('potrf',),(a1,)) >>>>> 553 c,info = >>>>> potrf(a1,lower=lower,overwrite_a=overwrite_a,clean=1) >>>>> --> 554 if info>0: raise LinAlgError, "matrix not positive >>>>> definite" >>>>> 555 if info<0: raise ValueError,\ >>>>> 556 'illegal value in %-th argument of internal >>>>> potrf'%(-info) >>>>> >>>>> LinAlgError: matrix not positive definite >>>>> >>>>> I think Yarik mentioned changing epsilon or something like that, but >>>>> it seems to me that having to change source code to tweak a classifier >>>>> to not crash is not ideal. >>>>> >>>>> Do you have any suggestions for how to fix this? I've been tantalized >>>>> with some very good transfer errors in the folds that GPR was able to >>>>> complete before crashing. Is there a permanent change we could make >>>>> to the GPR code to make it more stable across datasets? Or could we >>>>> turn epsilon into a configurable variable when setting up the >>>>> classifier? If so, what should I change epsilon to in order to see if >>>>> I can get it to run? >>>>> >>>>> Thanks, >>>>> Per >>>>> >>>>> _______________________________________________ >>>>> Pkg-ExpPsy-PyMVPA mailing list >>>>> [email protected] >>>>> http://lists.alioth.debian.org/mailman/listinfo/pkg-exppsy-pymvpa >>>>> >>>>> >>>> >>>> _______________________________________________ >>>> Pkg-ExpPsy-PyMVPA mailing list >>>> [email protected] >>>> http://lists.alioth.debian.org/mailman/listinfo/pkg-exppsy-pymvpa >>>> >>>> >>> >>> _______________________________________________ >>> Pkg-ExpPsy-PyMVPA mailing list >>> [email protected] >>> http://lists.alioth.debian.org/mailman/listinfo/pkg-exppsy-pymvpa >>> >>> >> > _______________________________________________ Pkg-ExpPsy-PyMVPA mailing list [email protected] http://lists.alioth.debian.org/mailman/listinfo/pkg-exppsy-pymvpa

