On 03/15/2012 06:41 PM, Olivier Grisel wrote:
On 13 March 2012 02:32, Olivier Mangin <[email protected]> wrote:
Hello,
Since I am currently using NMF for my robotics research, I have an
implementation of some algorithms from [1]_ that could extend
scikit-learn's current implementation.
More precisely, the current implementation covers the Frobenius norm case
(for measuring the reconstruction error) and the sparsity enforcement
method introduced in [2]_. [1]_ describes algorithms that generalize to
all the beta-divergences (the Frobenius norm corresponds to beta = 2), as
well as the adaptation of some of these algorithms to L_1 regularization
(among other things).
My implementation covers algorithms for beta-divergence minimization
based on three kinds of approaches presented in [1]_:
* gradient descent
* maximization-minimization (which leads to multiplicative updates)
* heuristic updates (these generalize to all beta the multiplicative
updates commonly used for NMF with the Frobenius norm (beta = 2),
Kullback-Leibler (beta = 1) and Itakura-Saito (beta = 0) divergences)
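To make the special cases above concrete, here is a minimal standalone
sketch of the beta-divergence (my own illustration, not the attached
implementation; it omits the epsilon smoothing the real code uses):

```python
import numpy as np

def beta_div(beta, x, y):
    """Element-wise beta-divergence d_beta(x | y), summed over all entries."""
    if beta == 0:    # Itakura-Saito divergence
        q = x / y
        return np.sum(q - np.log(q) - 1)
    elif beta == 1:  # (generalized) Kullback-Leibler divergence
        return np.sum(x * np.log(x / y) - x + y)
    else:            # generic case; beta = 2 is half the squared Frobenius norm
        return np.sum((x ** beta + (beta - 1) * y ** beta
                       - beta * x * y ** (beta - 1)) / (beta * (beta - 1)))

x = np.array([1.0, 2.0, 3.0])
y = np.array([1.5, 2.0, 2.5])
print(beta_div(2, x, y))  # 0.25 == 0.5 * ||x - y||**2
```

Note that all three cases vanish when x == y, as a divergence should.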
I was planning to integrate my code into a BetaNMF class in
'sklearn/decomposition/nmf.py' as an alternative to the existing
ProjectedGradientNMF, and to enable access to the various algorithms
through arguments to the fit and transform methods (I think the default
should be the heuristic multiplicative updates).
Is this duplicated work? Is it the right way to do it?
Sounds interesting. I don't think this is duplicated work. Do you have
example scripts that show how it performs on application tasks? AFAIK
the Itakura-Saito divergence is useful for audio signal analysis. How
does your implementation scale compared to the existing sklearn
implementation? What is the maximum number of samples / features /
extracted components that it can process in less than 1 min on some
realistic datasets? Rough numbers, no need to be specific.
Hi,
Thanks for your answers.
To give some details about the questions you asked:
The code is quite short (200 lines of code with three different update
methods).
The algorithm consists of update rules inside an outer loop. All updates
are coded through numpy array operations, more precisely Hadamard
(element-wise) multiplications, divisions and matrix products. So the
complexity of each update is roughly that of the matrix product (the
dimensions involved are those of the target factorization). In around one
minute it can factorize a data matrix of a few hundred rows by a few
hundred columns.
I've attached the current version of the code.
Currently the code can be used in the following manner:
- import the BetaNMF class:
from nmf import BetaNMF
- instantiate a BetaNMF object:
nmf = BetaNMF(X, k, beta=1.2)
# X is the data matrix (each example is a column); k is the size of
# the target dictionary
- perform one or more iterations (for small data, tens are OK; for bigger
data, hundreds of iterations are needed):
nmf.factorize(iterations=50)
- to get the reconstruction error (as the beta-divergence between the data
and its reconstruction), just call:
nmf.error()
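As a self-contained sanity check of the scheme these steps describe (my own
sketch, not the attached file; names W, H and the sizes are illustrative),
here are the heuristic multiplicative updates for beta = 2 on random data:

```python
import numpy as np

rng = np.random.RandomState(0)
EPS = 1e-9

X = rng.rand(30, 20)            # data: 30 features, 20 examples (columns)
W = rng.rand(30, 5)             # dictionary with k = 5 atoms
H = rng.rand(5, 20)             # coefficients

def frob_error(X, W, H):
    # reconstruction error in the Frobenius norm (beta = 2 case)
    return np.sqrt(np.sum((X - W.dot(H)) ** 2))

before = frob_error(X, W, H)
for _ in range(50):             # tens of iterations suffice on small data
    R = W.dot(H) + EPS
    W *= X.dot(H.T) / R.dot(H.T)      # multiplicative update of W
    R = W.dot(H) + EPS
    H *= W.T.dot(X) / W.T.dot(R)      # multiplicative update of H
after = frob_error(X, W, H)
print(before, '->', after)      # the error decreases monotonically
```

These updates keep W and H nonnegative by construction, since every factor
in the update is nonnegative.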
I hope these details will be helpful.
Olivier
# -*- coding: utf-8 -*-
"""Set of tools to perform Non-negative Matrix Factorization with optional
sparseness constraints.

Algorithms described in:

Lee, D. D., & Seung, H. S. (2001). Algorithms for Non-negative Matrix
Factorization. Advances in Neural Information Processing Systems
(pp. 556-562). MIT Press.

Hoyer, P. O. (2004). Non-negative Matrix Factorization with Sparseness
Constraints. Journal of Machine Learning Research, 5, 1457-1469.

Févotte, C., & Idier, J. (2011). Algorithms for Nonnegative Matrix
Factorization with the β-divergence. Neural Computation, 23(9), 2421-2456.
doi:10.1162/NECO_a_00168
"""
__author__ = 'Olivier Mangin <[email protected]>'
__date__ = '06/2011'
import numpy as np

EPSILON = 1.e-9

# Norms (equivalent in beta values):
FRO = 2
DKL = 1
IS = 0

# Beta NMF update methods
GRADIENT = 0
HEURISTIC = 1
MAXMIN = 2
UPDATES = {'gradient': GRADIENT, 'heuristic': HEURISTIC, 'maxmin': MAXMIN}
def frobenius_norm(M):
    """Returns the Frobenius norm of the matrix M."""
    return np.sqrt(np.sum(M ** 2))
def beta_divergence(beta, X, Y, weights=1.):
    """Returns the beta-divergence from matrix X to Y.
    This is computed element-wise and then summed over all coefficients.
    """
    if beta == 0:
        q = np.divide(X, Y + EPSILON)
        v = q - np.log(q) - 1
    elif beta == 1:
        v = np.multiply(X, np.log(np.divide(X, Y + EPSILON))) - X + Y
    else:
        v = X ** beta
        v += (beta - 1) * ((Y + EPSILON) ** beta)
        v -= beta * np.multiply(X, (Y + EPSILON) ** (beta - 1))
        v /= beta * (beta - 1)
    return np.sum(weights * v)
def der_beta_div_der_y(beta, X, Y):
    """Returns the result of the element-wise application to matrices X and Y
    of the partial derivative of the beta-divergence with respect to its
    second variable.
    """
    if beta == 0:
        # d_beta'(x|y) = 1 / y - x / y**2
        Yinv = 1. / (Y + EPSILON)
        return Yinv - X * (Yinv ** 2)
    elif beta == 1:
        # d_beta'(x|y) = 1 - x / y
        return 1. - X / (Y + EPSILON)
    else:
        # d_beta'(x|y) = y ** (beta - 1) - x * (y ** (beta - 2))
        return (Y + EPSILON) ** (beta - 1) - X * ((Y + EPSILON) ** (beta - 2))
def grad_W(beta, X, W, H, weights=1.):
    """Returns the gradient (as a matrix) of the loss function with respect
    to W, for given values of X, W, H.
    """
    return np.dot(weights * der_beta_div_der_y(beta, X, np.dot(W, H)), H.T)

def grad_H(beta, X, W, H, weights=1.):
    """Returns the gradient (as a matrix) of the loss function with respect
    to H, for given values of X, W, H.
    """
    return np.dot(W.T, weights * der_beta_div_der_y(beta, X, np.dot(W, H)))
    # Could also be defined as grad_W(beta, X.T, H.T, W.T).T
def heuristic_W(beta, X, W, H, weights=1.):
    reconstr = weights * np.dot(W, H)
    numerator = np.dot(X * ((reconstr + EPSILON) ** (beta - 2)), H.T)
    return np.divide(numerator,
                     np.dot((reconstr + EPSILON) ** (beta - 1), H.T))

def heuristic_H(beta, X, W, H, weights=1.):
    reconstr = weights * np.dot(W, H)
    numerator = np.dot(W.T, X * ((reconstr + EPSILON) ** (beta - 2)))
    return np.divide(numerator,
                     np.dot(W.T, (reconstr + EPSILON) ** (beta - 1)))
    # Same remark as for the gradient.
def gamma_exponent(beta):
    """Exponent used for the heuristic update from [Fevotte2011]."""
    if beta < 1.:
        return 1. / (2. - beta)
    elif beta > 2:
        return 1. / (beta - 1)
    else:
        return 1.
class BetaNMF:
    """Class to compute Non-negative Matrix Factorization with a
    beta-divergence cost.
    @param data: numpy array, shape (data dimension, nb examples)
    @param num_bases: inner dimension of the factorization
    @param custom_init: if specified, custom values for W and H
        initialization, as a pair (W, H)
    @param weights: weights on the cost function, used as coefficients for
        each element of a matrix of the same size as data. If a smaller
        dimension is provided, it behaves like numpy.multiply (broadcasting).
    """

    def __init__(self, data, num_bases, beta=2, custom_init=None, weights=1.):
        self.X = data
        (self.f, self.n) = data.shape  # Nb features, Nb examples
        self.k = num_bases
        self.beta = beta
        self.H = None
        self.W = None
        self.weights = weights
        if custom_init is None:
            self.init()
        else:
            self.W, self.H = custom_init

    def init(self):
        """Init W and H with random nonnegative values."""
        self.W = np.random.random((self.f, self.k))
        self.H = np.random.random((self.k, self.n))
    def factorize(self, update='heuristic',
                  iterations=1, eta=0.1, subit=10):
        """Perform updates on W and H to achieve a factorization of X as WH.
        Different update rules are available:
        - 'gradient': simple alternate projected gradient descent
          on the beta-divergence,
        - 'maxmin': Maximization-Minimization updates (see [Fevotte2011]),
        - 'heuristic': (default) heuristic updates (see [Fevotte2011]).
        """
        try:
            up = UPDATES[update]
        except KeyError:
            raise ValueError("Update should be in %r." % list(UPDATES.keys()))
        for _ in range(iterations):
            if up == GRADIENT:
                # Alternate projected gradient updates
                self.grad_update_W(eta, subit)
                self.grad_update_H(eta, subit)
            elif up == MAXMIN:
                # Maximization-Minimization updates
                self.max_min_update_W()
                self.max_min_update_H()
            elif up == HEURISTIC:
                # Heuristic updates
                self.heuristic_update_W()
                self.heuristic_update_H()
    # Update rules

    def grad_update_W(self, eta, subit):
        for _ in range(subit):
            gradW = grad_W(self.beta, self.X, self.W, self.H,
                           weights=self.weights)
            self.W -= eta / (1 - eta) * gradW
            self.W *= (1 - eta) * (self.W > 0)

    def grad_update_H(self, eta, subit):
        for _ in range(subit):
            gradH = grad_H(self.beta, self.X, self.W, self.H,
                           weights=self.weights)
            self.H -= eta / (1 - eta) * gradH
            self.H *= (1 - eta) * (self.H > 0)

    def max_min_update_W(self):
        self.W *= (heuristic_W(self.beta, self.X, self.W, self.H,
                               weights=self.weights)
                   ** gamma_exponent(self.beta))

    def max_min_update_H(self):
        self.H *= (heuristic_H(self.beta, self.X, self.W, self.H,
                               weights=self.weights)
                   ** gamma_exponent(self.beta))
        # Gamma exponent could be computed at init. Does it really matter?

    def heuristic_update_W(self):
        self.W *= heuristic_W(self.beta, self.X, self.W, self.H,
                              weights=self.weights)

    def heuristic_update_H(self):
        self.H *= heuristic_H(self.beta, self.X, self.W, self.H,
                              weights=self.weights)
    # Errors and performance estimations

    def error(self):
        return beta_divergence(self.beta, self.X, np.dot(self.W, self.H),
                               weights=self.weights)

    # Projections

    def scale(self, factors):
        """Scale W columns and H rows inversely, according to the given
        coefficients.
        """
        factors = np.array(factors)[np.newaxis, :]
        self.W *= factors
        self.H /= factors.T + EPSILON
_______________________________________________
Scikit-learn-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general