On 10/11/2011 11:21, Andreas Müller wrote:
> On 11/10/2011 12:18 AM, Gael Varoquaux wrote:
>> On Wed, Nov 09, 2011 at 11:00:34PM +0100, Andreas Mueller wrote:
>>> As in the other thread, usually one has to scan for parameters anyway.
>>> Computing every value just once and then storing it seems ok to me. For
>>> example, for the chi2 kernel, there is very efficient code available by
>>> Christoph Lampert using SSE2 instructions. I used precomputed kernel
>>> matrices for multi-instance kernels. I could easily implement them on
>>> the GPU using batches and then store them once and for all. If I had to
>>> do memory transfers for every single example that I need the kernel
>>> for, it would be very slow.
>>> Maybe these are special use cases but I think they are valid ones.
>> They are, but the question is: can they be answered in a toolkit meant to
>> be used from Python, where there is a large function-call overhead? I
>> don't know the answer to this question, to be fair, I am just raising it.
> Maybe I wasn't clear in making my point: I was trying to say
> that computing the whole gram matrix worked just fine for me.
>
> I think the large function call overhead makes other solutions
> impractical.
I agree with Andy, and my use cases seem to be similar. To sum up my 
point of view:
- not everyone works only on large scale ;-)
- linear methods are trendy but I still believe in kernels
- precomputing the full kernel matrices allows for optimization tricks 
that matter a lot in practice (e.g. parallelization)
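To illustrate the parallelization point: since every row of a Gram matrix can be computed independently, filling it in parallel is straightforward. The sketch below is illustrative, not the actual C code mentioned later; it uses a chi-square kernel as a stand-in for any expensive custom kernel, and the worker count is arbitrary.

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor

def chi2_kernel_row(x, X):
    """One row of a chi-square Gram matrix: k(x, x_j) = exp(-d_chi2(x, x_j))."""
    num = (x - X) ** 2
    den = x + X
    # Guard against 0/0 when both histogram bins are zero.
    safe = np.where(den > 0, den, 1.0)
    d = np.where(den > 0, num / safe, 0.0).sum(axis=1)
    return np.exp(-d)

def gram_matrix(X, n_workers=4):
    """Compute the full Gram matrix, one independent row per task."""
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        rows = pool.map(lambda x: chi2_kernel_row(x, X), X)
        return np.vstack(list(rows))
```

Rows are embarrassingly parallel, so the same structure works with processes or MPI when the per-row cost dominates scheduling overhead.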

Some details of my use case for those that are interested:

I work on action recognition and most of the datasets have a small 
number of samples (< 10k). I use complex video models (not necessarily 
vector-based) and complex kernels. The cost of kernel computations is 
quite high, therefore I coded it in C, precompute the gram matrices 
off-line and store them, then have fun with sklearn :-)

In the case of Gaussian RBF kernels with custom distances, I precompute 
the distances, not the whole kernel. That way, when I cross-validate the 
bandwidth parameter, I only have to exponentiate the distance matrix for 
each new bandwidth instead of recomputing the kernel from scratch.
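The bandwidth trick above amounts to a single vectorized exponentiation per candidate value. A minimal sketch (the function name and the gamma grid are illustrative):

```python
import numpy as np

def rbf_from_distances(D, gamma):
    """RBF kernel from a precomputed (squared) distance matrix D."""
    return np.exp(-gamma * D)

# D is computed once, offline, with whatever custom distance you like:
# for gamma in [0.01, 0.1, 1.0]:
#     K = rbf_from_distances(D, gamma)
#     ... cross-validate an estimator on K with kernel='precomputed'
```

The expensive pairwise-distance pass happens once; each bandwidth candidate then costs only an element-wise `exp` over the matrix.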

Finally, with a small number of samples, I found it faster to compute 
all possible pairwise kernel evaluations offline (and in parallel, 
because it is embarrassingly parallel of course), even though not all of 
them might be used, e.g. with SVMs. You don't know the support vectors 
in advance, and changing C changes the support vectors, so you would 
have to recompute kernel evaluations unless you cache them. Furthermore, 
the best C values on my problems are high ones, so almost all points end 
up as support vectors anyway. That was also true in my experience on the 
Pascal VOC challenge with RBF chi-square kernels on Bag-of-Features.
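This workflow maps directly onto scikit-learn's `kernel='precomputed'` interface: build the Gram matrix once, then reuse it across the whole C grid. A minimal sketch, with a linear kernel standing in for an expensive custom one and an arbitrary C grid:

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.RandomState(0)
X = rng.randn(60, 4)
y = (X[:, 0] + 0.1 * rng.randn(60) > 0).astype(int)

# Gram matrix computed once, e.g. offline and in parallel.
K = X @ X.T

n_sv = {}
for C in [0.1, 1.0, 100.0]:
    clf = SVC(kernel='precomputed', C=C).fit(K, y)
    n_sv[C] = clf.support_.size  # the support set changes with C
```

Since `K` never changes, sweeping C touches no kernel evaluations at all, which is exactly the point made above.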

Hope this helps.

Cheers,

Adrien
_______________________________________________
Scikit-learn-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general