Hi Olivier,
The code looks very well written. I think it would fit well in
scikit-learn. The API would have to be modified to fit the scikit-learn
format. You can read more about that at the developers' page:
http://scikit-learn.org/stable/developers/index.html
It will also require some unit tests, some examples in the doc string,
and some narrative documentation (and perhaps a plotting example or two)
on how it can be used. No need to do all of that before submitting a
pull request: you can start with what you have now (label it WIP: work
in progress), and through the github comments we'll help you get the
code to the point where it can be merged.
Again, thanks for doing this! I'm actually using some NMF in my work
right now, and I'm excited to try out these variants.
Jake
On 03/16/12 08:30, Olivier Mangin wrote:
On 03/15/2012 06:41 PM, Olivier Grisel wrote:
On March 13, 2012, at 02:32, Olivier Mangin <[email protected]>
wrote:
Hello,
Since I am currently using NMF for my robotics research, I have an
implementation of some algorithms from [1]_ that could extend
scikit-learn's current implementation.
More precisely, the current implementation covers the Frobenius norm
case (for measuring the reconstruction error) and the sparsity
enforcement method introduced in [2]_. [1]_ describes algorithms that
generalize to all beta-divergences (the Frobenius norm corresponds to
beta = 2), as well as adaptations of some of these algorithms to L_1
regularization (amongst other things).
My implementation covers algorithms for beta-divergence minimization
based on three kinds of approaches presented in [1]_:
* gradient descent
* majorization-minimization (which leads to multiplicative updates)
* heuristic updates (these generalize, to all values of beta, the
multiplicative updates commonly used for NMF with the Frobenius norm
(beta = 2), Kullback-Leibler (beta = 1), and Itakura-Saito (beta = 0)
divergences)
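To illustrate the heuristic update for a generic beta, here is a rough
numpy sketch (a minimal illustration I wrote for this email, not my
actual implementation; the helper names are made up):

```python
import numpy as np

def multiplicative_update_H(V, W, H, beta=2.0, eps=1e-12):
    # One heuristic multiplicative update of H for D_beta(V || W @ H).
    # beta = 2: Frobenius, beta = 1: Kullback-Leibler, beta = 0: Itakura-Saito.
    WH = W @ H + eps                      # current reconstruction, kept positive
    numerator = W.T @ (WH ** (beta - 2.0) * V)
    denominator = W.T @ (WH ** (beta - 1.0)) + eps
    return H * numerator / denominator    # Hadamard product and division

def beta_divergence(V, WH, beta=2.0, eps=1e-12):
    # D_beta(V || WH), with the special cases beta = 0 and beta = 1.
    V, WH = V + eps, WH + eps
    if beta == 0.0:                       # Itakura-Saito
        return np.sum(V / WH - np.log(V / WH) - 1.0)
    if beta == 1.0:                       # Kullback-Leibler
        return np.sum(V * np.log(V / WH) - V + WH)
    return np.sum((V ** beta + (beta - 1.0) * WH ** beta
                   - beta * V * WH ** (beta - 1.0)) / (beta * (beta - 1.0)))

rng = np.random.RandomState(0)
V = rng.rand(20, 30)                      # toy non-negative data
W, H = rng.rand(20, 5), rng.rand(5, 30)   # fixed dictionary, random activations
before = beta_divergence(V, W @ H, beta=1.0)
for _ in range(20):                       # iterating the update decreases the divergence
    H = multiplicative_update_H(V, W, H, beta=1.0)
after = beta_divergence(V, W @ H, beta=1.0)
```

For beta between 1 and 2 this update coincides with the
majorization-minimization one, so the divergence is non-increasing.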
I was planning to integrate my code into a BetaNMF class in
'sklearn/decomposition/nmf.py' as an alternative to the existing
ProjectedGradientNMF, and to enable access to the various algorithms
through arguments to the fit and transform methods (I think the default
should be heuristic multiplicative updates).
Is this duplicated work? Is it the right way to do it?
Sounds interesting. I don't think this is duplicated work. Do you have
example scripts that show how it performs on application tasks? AFAIK
the Itakura-Saito divergence is useful for audio signal analysis. How
does your implementation scale compared to the existing sklearn
implementation? What is the maximum number of samples / features /
extracted components that it can process in less than 1 min on some
realistic datasets? Rough numbers, no need to be specific.
Hi,
Thanks for your answers.
To give some details about the questions you asked:
The code is quite short (200 lines, covering three different update
methods). The algorithm is based on updates inside an outer loop. All
updates are coded through numpy array operations, more precisely
Hadamard (element-wise) multiplications and matrix products, so the
complexity of each update is roughly that of the matrix product (the
dimensions involved are those of the target factorization). In around
1 minute it can factorize a data matrix of a few hundred rows by a
few hundred columns.
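To make the cost concrete, here is a rough timing sketch (again just an
illustration, not the attached code): one update involves only
element-wise operations and matrix products whose dimensions are those
of the factorization.

```python
import time
import numpy as np

rng = np.random.RandomState(0)
n, m, k = 300, 300, 10                 # a "hundred(s) by hundred(s)" sized problem
V = rng.rand(n, m)                     # non-negative data matrix
W, H = rng.rand(n, k), rng.rand(k, m)  # dictionary and activations

start = time.time()
for _ in range(100):
    # One KL-style multiplicative update of H: only Hadamard
    # operations and matrix products of sizes (k, n) x (n, m).
    WH = W @ H + 1e-12
    H *= (W.T @ (V / WH)) / (W.T @ np.ones_like(V) + 1e-12)
elapsed = time.time() - start          # well under a minute on a laptop
```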
I've attached the current version of the code.
Currently the code can be used in the following manner:
- import the BetaNMF class:
from nmf import BetaNMF
- instantiate a BetaNMF object:
nmf = BetaNMF(X, k, beta=1.2)
# X is the data matrix (each example is a column), k is the size
of the targeted dictionary
- perform one or more iterations (for small data tens of iterations
are OK, for bigger data hundreds are needed):
nmf.factorize(iterations=50)
- to get the reconstruction error (as the beta-divergence between the
data and the reconstruction), just call:
nmf.error()
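In case the attachment does not reach everyone, here is a minimal
self-contained sketch mirroring the interface described above (an
illustrative reimplementation using heuristic multiplicative updates
only, not the attached nmf.py):

```python
import numpy as np

class BetaNMF:
    """Sketch of the interface described above (illustrative, not the
    attached code). X is the data matrix (one example per column), k
    the size of the targeted dictionary."""

    def __init__(self, X, k, beta=2.0, seed=0, eps=1e-12):
        self.X, self.beta, self.eps = X, beta, eps
        rng = np.random.RandomState(seed)
        n, m = X.shape
        self.W = rng.rand(n, k)   # dictionary
        self.H = rng.rand(k, m)   # activations

    def factorize(self, iterations=50):
        # Alternating multiplicative updates of H and W for D_beta(X || W H).
        b, eps = self.beta, self.eps
        for _ in range(iterations):
            WH = self.W @ self.H + eps
            self.H *= (self.W.T @ (WH ** (b - 2) * self.X)
                       / (self.W.T @ WH ** (b - 1) + eps))
            WH = self.W @ self.H + eps
            self.W *= ((WH ** (b - 2) * self.X) @ self.H.T
                       / (WH ** (b - 1) @ self.H.T + eps))

    def error(self):
        # Beta-divergence between the data and its reconstruction.
        b = self.beta
        V, WH = self.X + self.eps, self.W @ self.H + self.eps
        if b == 0:   # Itakura-Saito
            return np.sum(V / WH - np.log(V / WH) - 1.0)
        if b == 1:   # Kullback-Leibler
            return np.sum(V * np.log(V / WH) - V + WH)
        return np.sum((V ** b + (b - 1) * WH ** b
                       - b * V * WH ** (b - 1)) / (b * (b - 1)))

# Usage mirroring the steps above:
X = np.random.RandomState(1).rand(50, 40)
nmf = BetaNMF(X, 5, beta=1.2)
e0 = nmf.error()
nmf.factorize(iterations=50)
e1 = nmf.error()
```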
I hope these details will be helpful.
Olivier
_______________________________________________
Scikit-learn-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general