On 10/11/2011 11:21, Andreas Müller wrote:
> On 11/10/2011 12:18 AM, Gael Varoquaux wrote:
>> On Wed, Nov 09, 2011 at 11:00:34PM +0100, Andreas Mueller wrote:
>>> As in the other thread, usually one has to scan for parameters anyway.
>>> Computing every value just once and then storing it seems OK to me. For
>>> example, for the chi2 kernel, there is very efficient code available by
>>> Christoph Lampert using SSE2 instructions. I used precomputed kernel
>>> matrices for multi-instance kernels. I could easily implement them on
>>> the GPU using batches and then store them once and for all. If I had to
>>> do memory transfers for every single example that I need the kernel
>>> for, it would be very slow.
>>> Maybe these are special use cases, but I think they are valid ones.
>> They are, but the question is: can they be answered in a toolkit meant to
>> be used from Python, where there is a large function-call overhead? I
>> don't know the answer to this question; to be fair, I am just raising it.
> Maybe I wasn't clear in making my point: I was trying to say
> that computing the whole Gram matrix worked just fine for me.
>
> I think the large function-call overhead makes other solutions
> impractical.

I agree with Andy, and my use cases seem to be similar. To sum up my point of view:
- not everyone works only on large scale ;-)
- linear methods are trendy, but I still believe in kernels
- precomputing the full kernel matrices allows for optimization tricks that matter a lot in practice (e.g. parallelization)
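For anyone following along, a minimal sketch of the workflow we are discussing: compute the full Gram matrix once (here with sklearn's own chi2 kernel, standing in for an external C/SSE2 implementation) and feed it to an SVM via kernel='precomputed'. The random data is of course just a placeholder:

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.metrics.pairwise import chi2_kernel

# Placeholder data: 100 non-negative feature histograms
# (e.g. Bag-of-Features), binary labels.
rng = np.random.RandomState(0)
X = rng.rand(100, 20)
y = rng.randint(0, 2, 100)

# Compute the full Gram matrix once; in practice this could come
# from an external optimized implementation, stored on disk.
K = chi2_kernel(X, gamma=0.5)

# The SVM then only sees the precomputed matrix.
clf = SVC(kernel='precomputed')
clf.fit(K, y)
pred = clf.predict(K)  # at test time: kernel between test and train samples
```

The point is that the expensive kernel evaluations happen exactly once, outside the estimator, regardless of how often you refit.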
Some details of my use case, for those that are interested: I work on action recognition, and most of the datasets have a small number of samples (< 10k). I use complex video models (not necessarily vector-based) and complex kernels. The cost of kernel computations is quite high, therefore I coded it in C, precompute the Gram matrices off-line and store them, then have fun with sklearn :-)

In the case of Gaussian RBF kernels with custom distances, I precompute the distances, not the whole kernel. That way, when I want to cross-validate the bandwidth parameter, I just have to exponentiate the distance matrix when the bandwidth changes, instead of recomputing the kernel.

Finally, with a small number of samples, I found it faster to compute all possible pairwise kernel evaluations offline (and in parallel, because it is embarrassingly parallel of course), even though not all of them might be used, e.g. with SVMs. You don't know the support vectors in advance, and changing C changes the SVs, so you would have to recompute kernel evaluations unless you cache them. Furthermore, the best C values on my problems are high ones, and therefore I have almost all points as SVs. That was also true in my experience on the Pascal VOC challenge with RBF chi-square kernels on Bag-of-Features.

Hope this helps.
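The bandwidth trick above can be sketched as follows. Assuming a precomputed matrix D of pairwise squared distances (here built with NumPy, but it could be any custom distance computed off-line), cross-validating the bandwidth only costs one element-wise exponentiation per candidate value:

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

# Placeholder data and a precomputed squared-distance matrix D.
# In practice D would be loaded from disk, computed once off-line.
rng = np.random.RandomState(0)
X = rng.rand(60, 5)
y = (X[:, 0] > 0.5).astype(int)
D = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)

scores = {}
for gamma in [0.1, 1.0, 10.0]:
    # Changing the bandwidth is just an exponentiation of D,
    # not a recomputation of the underlying distances.
    K = np.exp(-gamma * D)
    scores[gamma] = cross_val_score(SVC(kernel='precomputed'), K, y, cv=3)
```

cross_val_score handles precomputed kernels by slicing both rows and columns of K according to the train/test split, so the full matrix can be reused across folds and bandwidth values.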
Cheers,
Adrien

_______________________________________________
Scikit-learn-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
