Hi Olivier,
The code looks very well written. I think it would fit well in
scikit-learn. The API would have to be modified to fit the scikit-learn
format. You can read more about that at the developers' page:
http://scikit-learn.org/stable/developers/index.html
It will also require some unit tests, some examples in the doc string,
and some narrative documentation (and perhaps a plotting example or two)
on how it can be used. No need to do all of that before submitting a
pull request: you can start with what you have now (label it WIP: work
in progress), and through the github comments we'll help you get the
code to the point where it can be merged.
Again, thanks for doing this! I'm actually using some NMF in my work
right now, and I'm excited to try out these variants.
Jake
On 03/16/12 08:30, Olivier Mangin wrote:
On 03/15/2012 06:41 PM, Olivier Grisel wrote:
On March 13, 2012, at 02:32, Olivier Mangin <[email protected]>
wrote:
Hello,
Since I am currently using NMF for my robotics research, I have an
implementation of some algorithms from [1]_ that could extend
scikit-learn's current implementation.
More precisely, the current implementation covers the Frobenius norm
case (for measuring the reconstruction error) and the sparsity
enforcement method introduced in [2]_. [1]_ describes algorithms that
generalize to all beta-divergences (the Frobenius norm corresponds to
beta = 2), as well as adaptations of some of these algorithms to L_1
regularization (amongst other things).
My implementation covers algorithms for beta-divergence minimization
based on three kinds of approaches presented in [1]_:
* gradient descent
* majorization-minimization (which leads to multiplicative updates)
* heuristic updates (these generalize, to all values of beta, the
multiplicative updates commonly used for NMF with the Frobenius norm
(beta = 2), Kullback-Leibler (beta = 1), and Itakura-Saito (beta = 0)
divergences)
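To illustrate the heuristic update for a generic beta, here is a rough
numpy sketch (a minimal illustration I wrote for this email, not my
actual implementation; the helper names are made up):

```python
import numpy as np

def multiplicative_update_H(V, W, H, beta=2.0, eps=1e-12):
    # One heuristic multiplicative update of H for D_beta(V || W @ H).
    # beta = 2: Frobenius, beta = 1: Kullback-Leibler, beta = 0: Itakura-Saito.
    WH = W @ H + eps                      # current reconstruction, kept positive
    numerator = W.T @ (WH ** (beta - 2.0) * V)
    denominator = W.T @ (WH ** (beta - 1.0)) + eps
    return H * numerator / denominator    # Hadamard product and division

def beta_divergence(V, WH, beta=2.0, eps=1e-12):
    # D_beta(V || WH), with the special cases beta = 0 and beta = 1.
    V, WH = V + eps, WH + eps
    if beta == 0.0:                       # Itakura-Saito
        return np.sum(V / WH - np.log(V / WH) - 1.0)
    if beta == 1.0:                       # Kullback-Leibler
        return np.sum(V * np.log(V / WH) - V + WH)
    return np.sum((V ** beta + (beta - 1.0) * WH ** beta
                   - beta * V * WH ** (beta - 1.0)) / (beta * (beta - 1.0)))

rng = np.random.RandomState(0)
V = rng.rand(20, 30)                      # toy non-negative data
W, H = rng.rand(20, 5), rng.rand(5, 30)   # fixed dictionary, random activations
before = beta_divergence(V, W @ H, beta=1.0)
for _ in range(20):                       # iterating the update decreases the divergence
    H = multiplicative_update_H(V, W, H, beta=1.0)
after = beta_divergence(V, W @ H, beta=1.0)
```

For beta between 1 and 2 this update coincides with the
majorization-minimization one, so the divergence is non-increasing.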
I was planning to integrate my code into a BetaNMF class in
'sklearn/decomposition/nmf.py' as an alternative to the existing
ProjectedGradientNMF, and to enable access to the various algorithms
through arguments to the fit and transform methods (I think the default
should be heuristic multiplicative updates).
Is this duplicated work? Is it the right way to do it?
Sounds interesting. I don't think this is duplicated work. Do you have
example scripts that show how it performs on application tasks? AFAIK
the Itakura-Saito divergence is useful for audio signal analysis. How
does your implementation scale compared to the existing sklearn
implementation? What is the maximum number of samples / features /
extracted components that it can process in less than 1 min on some
realistic datasets? Rough numbers, no need to be specific.
Hi,
Thanks for your answers.
To give some details about the questions you asked:
The code is quite short (200 lines, covering three different update
methods). The algorithm is based on updates inside an outer loop. All
updates are coded through numpy array operations, more precisely
Hadamard (element-wise) multiplications and matrix products, so the
complexity of each update is roughly that of the matrix product (the
dimensions involved are those of the target factorization). In around
1 minute it can factorize a data matrix of a few hundred rows by a
few hundred columns.
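To make the cost concrete, here is a rough timing sketch (again just an
illustration, not the attached code): one update involves only
element-wise operations and matrix products whose dimensions are those
of the factorization.

```python
import time
import numpy as np

rng = np.random.RandomState(0)
n, m, k = 300, 300, 10                 # a "hundred(s) by hundred(s)" sized problem
V = rng.rand(n, m)                     # non-negative data matrix
W, H = rng.rand(n, k), rng.rand(k, m)  # dictionary and activations

start = time.time()
for _ in range(100):
    # One KL-style multiplicative update of H: only Hadamard
    # operations and matrix products of sizes (k, n) x (n, m).
    WH = W @ H + 1e-12
    H *= (W.T @ (V / WH)) / (W.T @ np.ones_like(V) + 1e-12)
elapsed = time.time() - start          # well under a minute on a laptop
```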
I've attached the current version of the code.
Currently the code can be used in the following manner:
- import the BetaNMF class:
from nmf import BetaNMF
- instantiate a BetaNMF object:
nmf = BetaNMF(X, k, beta=1.2)
# X is the data matrix (each example is a column), k is the size
of the targeted dictionary
- perform one or more iterations (for small data tens of iterations
are OK, for bigger data hundreds are needed):
nmf.factorize(iterations=50)
- to get the reconstruction error (as the beta-divergence between the
data and the reconstruction), just call:
nmf.error()
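In case the attachment does not reach everyone, here is a minimal
self-contained sketch mirroring the interface described above (an
illustrative reimplementation using heuristic multiplicative updates
only, not the attached nmf.py):

```python
import numpy as np

class BetaNMF:
    """Sketch of the interface described above (illustrative, not the
    attached code). X is the data matrix (one example per column), k
    the size of the targeted dictionary."""

    def __init__(self, X, k, beta=2.0, seed=0, eps=1e-12):
        self.X, self.beta, self.eps = X, beta, eps
        rng = np.random.RandomState(seed)
        n, m = X.shape
        self.W = rng.rand(n, k)   # dictionary
        self.H = rng.rand(k, m)   # activations

    def factorize(self, iterations=50):
        # Alternating multiplicative updates of H and W for D_beta(X || W H).
        b, eps = self.beta, self.eps
        for _ in range(iterations):
            WH = self.W @ self.H + eps
            self.H *= (self.W.T @ (WH ** (b - 2) * self.X)
                       / (self.W.T @ WH ** (b - 1) + eps))
            WH = self.W @ self.H + eps
            self.W *= ((WH ** (b - 2) * self.X) @ self.H.T
                       / (WH ** (b - 1) @ self.H.T + eps))

    def error(self):
        # Beta-divergence between the data and its reconstruction.
        b = self.beta
        V, WH = self.X + self.eps, self.W @ self.H + self.eps
        if b == 0:   # Itakura-Saito
            return np.sum(V / WH - np.log(V / WH) - 1.0)
        if b == 1:   # Kullback-Leibler
            return np.sum(V * np.log(V / WH) - V + WH)
        return np.sum((V ** b + (b - 1) * WH ** b
                       - b * V * WH ** (b - 1)) / (b * (b - 1)))

# Usage mirroring the steps above:
X = np.random.RandomState(1).rand(50, 40)
nmf = BetaNMF(X, 5, beta=1.2)
e0 = nmf.error()
nmf.factorize(iterations=50)
e1 = nmf.error()
```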
I hope these details will be helpful.
Olivier
_______________________________________________
Scikit-learn-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general