Re: [Scikit-learn-general] Consistent API for handling affinity / Gram / kernel as data input [was GraphLasso pull request and feature]

Mathieu Blondel Wed, 09 Nov 2011 12:02:13 -0800

On Thu, Nov 10, 2011 at 2:31 AM, Olivier Grisel
<[email protected]> wrote:


> Maybe it would be better to have a dedicated method for this use case
> rather than using complex kwargs in the `fit` method. What about using
> a `fit_from_kernel` or `fit_from_affinity` method (if you find a
> better name)?

For me, the main problem with precomputed kernels is the data
representation rather than the method names.

1) storing a symmetric matrix in a 2d dense array is inefficient memory-wise
2) some values in the matrix may never be needed during training (this
is the case for SVMs for example) so computing them all is a waste
3) for SVC.predict, requiring an n_test x n_train dense array is a huge waste

For 3), I'd suggest using an n_test x n_SV array instead.
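
To illustrate why the n_test x n_SV block is enough, here is a minimal
NumPy sketch. It does not call scikit-learn: the support indices, dual
coefficients, intercept and the `rbf_kernel` helper are all made up for
the example. It only shows that the columns belonging to non-support
vectors never contribute to the decision values.

```python
import numpy as np

def rbf_kernel(A, B, gamma=0.5):
    """Gaussian kernel between the rows of A and the rows of B (illustrative helper)."""
    sq_dists = (
        (A ** 2).sum(axis=1)[:, None]
        + (B ** 2).sum(axis=1)[None, :]
        - 2.0 * A @ B.T
    )
    return np.exp(-gamma * sq_dists)

rng = np.random.default_rng(0)
X_train = rng.normal(size=(200, 5))
X_test = rng.normal(size=(10, 5))

# Hypothetical fitted model: only 3 of the 200 training points are support vectors.
support = np.array([4, 17, 120])        # indices into X_train
dual_coef = np.array([0.7, -1.2, 0.5])  # one coefficient per support vector
intercept = 0.1

# Full n_test x n_train block: every non-SV column is multiplied by zero.
K_full = rbf_kernel(X_test, X_train)
coef_full = np.zeros(len(X_train))
coef_full[support] = dual_coef
scores_full = K_full @ coef_full + intercept

# Reduced n_test x n_SV block: same decision values, far fewer kernel evaluations.
K_sv = rbf_kernel(X_test, X_train[support])
scores_sv = K_sv @ dual_coef + intercept
```

Here `K_full` has 10 x 200 entries while `K_sv` has 10 x 3, and both give
the same scores up to floating-point rounding.
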
For 1), using an upper triangular packed format (i.e. storing the values
in a 1d array) would be a solution and would be easy to use from
Cython. Combined with mmap arrays, it would allow storing large Gram
matrices (but wouldn't solve problem 2)).
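
A rough sketch of the packed format, assuming row-major storage of the
upper triangle with the diagonal included. The function names
`pack_upper`, `packed_index` and `packed_get` are hypothetical, not part
of any existing API.

```python
import numpy as np

def pack_upper(K):
    """Copy the upper triangle of K (diagonal included) into a 1d array, row by row."""
    return K[np.triu_indices(K.shape[0])]

def packed_index(i, j, n):
    """Position of entry (i, j) of an n x n symmetric matrix in the packed array."""
    if i > j:
        i, j = j, i  # symmetry: K[i, j] == K[j, i]
    # Row i starts after the i previous rows, which hold n, n-1, ..., n-i+1 entries.
    return i * n - i * (i - 1) // 2 + (j - i)

def packed_get(packed, i, j, n):
    """Read entry (i, j) from the packed 1d array."""
    return packed[packed_index(i, j, n)]

rng = np.random.default_rng(0)
X = rng.normal(size=(6, 3))
K = X @ X.T                 # symmetric linear-kernel Gram matrix
n = K.shape[0]
packed = pack_upper(K)      # n * (n + 1) / 2 values instead of n * n
```

Since the packed array is a plain 1d array, the same indexing should
work if `packed` is an `np.memmap` backed by a file on disk.
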

User-defined kernel functions could be an answer to all of 1), 2) and 3)
if the function were called on demand, but I think the current
implementation just pre-computes the entire Gram matrix.
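
To make "on demand" concrete, here is a small sketch of a wrapper that
computes rows of the Gram matrix only when they are requested and keeps
the most recent ones in a cache. The `LazyKernel` class and its
`cache_rows` parameter are hypothetical; this is not how scikit-learn
currently behaves.

```python
import numpy as np
from functools import lru_cache

class LazyKernel:
    """Evaluate rows of the Gram matrix on demand, keeping only recent rows in memory."""

    def __init__(self, X, kernel, cache_rows=64):
        self.X = X
        self.kernel = kernel
        self.n_rows_computed = 0
        # Least-recently-used cache keyed by the row index.
        self._cached_row = lru_cache(maxsize=cache_rows)(self._compute_row)

    def _compute_row(self, i):
        self.n_rows_computed += 1
        return np.array([self.kernel(self.X[i], x) for x in self.X])

    def row(self, i):
        """Return row i of the Gram matrix, computing it if it is not cached."""
        return self._cached_row(int(i))

    def __call__(self, i, j):
        """Return the single kernel value K[i, j]."""
        return self.row(i)[j]

def linear(a, b):
    return float(a @ b)

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
K = LazyKernel(X, linear, cache_rows=8)

# A solver that only ever touches rows 3 and 7 triggers two row computations.
values = [K(3, 7), K(7, 3), K(3, 50), K(7, 99)]
```

Rows that the solver never asks for are never computed, and memory use
is bounded by `cache_rows` rows rather than the full n x n matrix.
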

Currently, people who want to learn SVMs on large-scale datasets with
custom kernels should probably use libsvm's C++ API directly.

Mathieu

_______________________________________________
Scikit-learn-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
