On 03/15/2012 06:41 PM, Olivier Grisel wrote:
On March 13, 2012 at 02:32, Olivier Mangin <[email protected]> wrote:
Hello,

Since I am currently using NMF for my robotics research, I have an
implementation of some algorithms from [1]_ that could extend
scikit-learn current implementation.

More precisely, the current implementation covers the Frobenius norm case
(for measuring the reconstruction error) and the sparsity enforcement
method introduced in [2]_. [1]_ describes algorithms that generalize to
all the beta-divergences (the Frobenius norm corresponds to beta = 2), as
well as the adaptation of some of these algorithms to L_1 regularization
(amongst other things).

My implementation covers algorithms for beta-divergence minimization
based on three kinds of approaches presented in [1]_:
* gradient descent
* maximization-minimization (which leads to multiplicative updates)
* heuristic updates (generalizing to all beta the multiplicative updates
commonly used for NMF with the Frobenius norm (beta = 2), Kullback-Leibler
(beta = 1) and Itakura-Saito (beta = 0) divergences)

I was planning to integrate my code into a BetaNMF class in
'sklearn/decomposition/nmf.py' as an alternative to the existing
ProjectedGradientNMF, and to enable access to the various algorithms
through arguments to the fit and transform methods (I think the default
should be the heuristic multiplicative updates).

Is this duplicated work? Is it the right way to do it?
Sounds interesting. I don't think this is duplicated work. Do you have
example scripts that show how it performs on application tasks? AFAIK
the Itakura-Saito divergence is useful for audio signal analysis. How
does your implementation scale compared to the existing sklearn
implementation? What is the maximum number of samples / features /
extracted components that it can process in less than 1 min on some
realistic datasets? Rough numbers, no need to be precise.


Hi,
Thanks for your answers.

To give some details in answer to the questions you asked:

The code is quite short (about 200 lines, with three different update methods).

The algorithm consists of update rules wrapped in an outer loop. All updates are coded as numpy array operations, more precisely Hadamard (element-wise) multiplications, divisions and matrix products. The complexity of each update is therefore roughly that of a matrix product (with the dimensions involved in the target factorization). In around 1 minute it can factorize a data matrix of a few hundred rows by a few hundred columns.
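To make the cost per update concrete, here is a rough standalone sketch (not the attached code) of one multiplicative update for the Frobenius case (beta = 2); the shapes below are hypothetical, just to illustrate that the dominant cost is the matrix products:

```python
import numpy as np

# Hypothetical shapes, just for illustration: f features, n examples,
# k components in the target factorization.
rng = np.random.RandomState(0)
f, n, k = 300, 400, 40
X = rng.rand(f, n)
W = rng.rand(f, k)
H = rng.rand(k, n)

# One multiplicative update of H for the Frobenius case (beta = 2):
# H <- H * (W.T X) / (W.T W H).  Everything is either a Hadamard
# (element-wise) operation or a matrix product, so the per-update cost
# is dominated by the O(f * n * k) matrix products.
H *= np.dot(W.T, X) / (np.dot(np.dot(W.T, W), H) + 1e-9)
print(H.shape)  # (40, 400)
```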

I've attached the current version of the code.

Currently the code can be used in the following manner:
- import BetaNMF class:
  from nmf import BetaNMF

- instantiate a BetaNMF object:
  nmf = BetaNMF(X, k, beta=1.2)
  # X is the data matrix (each example is a column), k is the size of the target dictionary

- perform one or more iterations (for small data, tens of iterations are OK; for bigger data, hundreds are needed):
      nmf.factorize(iterations=50)

- to get the reconstruction error (the beta-divergence between the data and its reconstruction), just call:
  nmf.error()

I hope these details will be helpful.
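For reference, the steps above chain together as follows. Since the attached file may not be on the path, this sketch inlines a stripped-down stand-in for the BetaNMF class (heuristic updates for beta = 2 only, random init); the real class supports any beta and several update rules:

```python
import numpy as np

class TinyBetaNMF(object):
    """Stripped-down stand-in for the attached BetaNMF (beta = 2 only)."""

    def __init__(self, data, num_bases):
        self.X = data
        f, n = data.shape  # nb features, nb examples
        self.W = np.random.random((f, num_bases))
        self.H = np.random.random((num_bases, n))

    def factorize(self, iterations=1):
        eps = 1e-9
        for _ in range(iterations):
            # Heuristic (multiplicative) updates, Frobenius case.
            WH = np.dot(self.W, self.H)
            self.W *= np.dot(self.X, self.H.T) / (np.dot(WH, self.H.T) + eps)
            WH = np.dot(self.W, self.H)
            self.H *= np.dot(self.W.T, self.X) / (np.dot(self.W.T, WH) + eps)

    def error(self):
        # beta-divergence at beta = 2: half the squared Frobenius norm.
        return 0.5 * np.sum((self.X - np.dot(self.W, self.H)) ** 2)

X = np.random.random((50, 80))   # data matrix, each example is a column
nmf = TinyBetaNMF(X, 5)          # k = 5 components
before = nmf.error()
nmf.factorize(iterations=50)
after = nmf.error()
print(after < before)  # the multiplicative updates decrease the error
```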

Olivier
# -*- coding: utf-8 -*-


__author__ = 'Olivier Mangin <[email protected]>'
__date__ = '06/2011'


"""Set of tools to perform Non-negative Matrix Factorization with optional
sparsness constraints.

Algorithms described in:
    Lee, D. D., & Seung, H. S. (2001). Algorithms for Non-negative Matrix
    Factorization. Advances in Neural Information Processing Systems (pp. 556-562). MIT Press.

    Hoyer, P. O. (2004). Non-negative Matrix Factorization with Sparseness
    Constraints. Journal of Machine Learning Research, 5, 1457–1469.

    Févotte, C., & Idier, J. (2011). Algorithms for Nonnegative Matrix
    Factorization with the β-divergence. Neural Computation, 23(9),
    2421-2456. doi:10.1162/NECO_a_00168
"""


import numpy as np


EPSILON = 1.e-9

# Norms (equivalent in beta values) :
FRO = 2
DKL = 1
IS = 0

# Beta NMF update methods
GRADIENT = 0
HEURISTIC = 1
MAXMIN = 2
UPDATES = {'gradient': GRADIENT, 'heuristic': HEURISTIC, 'maxmin': MAXMIN}


def frobenius_norm(M):
    """Returns the forbenius norm of the matrix M.
    """
    return np.sqrt(np.sum(M ** 2))

def beta_divergence(beta, X, Y, weights = 1.):
    """Returns the beta-divergence from matrix X to Y.
    This is computed value-wise and then summed over all coefficients.
    """
    if beta == 0:
        q = np.divide(X, Y + EPSILON)
        v = q - np.log(q) - 1
    elif beta == 1:
        v = np.multiply(X, np.log(np.divide(X, Y + EPSILON))) - X + Y
    else:
        v = X ** beta 
        v += (beta - 1) * ((Y + EPSILON) ** beta)
        v -= beta * np.multiply(X, (Y + EPSILON) ** (beta - 1))
        v /= beta * (beta - 1)
    return np.sum(weights * v)

def der_beta_div_der_y(beta, X, Y):
    """Returns the result of the elemtent-wise application to matrices X and Y
    of the partial derivative of the beta-divergence, regarding second
    variable.
    """
    if beta == 0:
        # d_beta'(x|y) = 1 / y - x / y**2
        Yinv = 1. / (Y + EPSILON)
        return Yinv - X * (Yinv ** 2)
    elif beta == 1:
        # d_beta'(x|y) = 1 - x / y
        return 1. - X / (Y + EPSILON)
    else:
        # d_beta'(x|y) = y ** (beta - 1) - x * (y ** (beta - 2))
        return (EPSILON + Y) ** (beta - 1) - X * ((Y + EPSILON) ** (beta - 2))

def grad_W(beta, X, W, H, weights = 1.):
    """Returns the gradient (in form of a matrix) of the loss function
    for given values of X, W, H.
    """
    return np.dot(weights * der_beta_div_der_y(beta, X, np.dot(W, H)), H.T)

def grad_H(beta, X, W, H, weights = 1.):
    """Returns the gradient (in form of a matrix) of the loss function
    for given values of X, W, H.
    """
    return np.dot(W.T, weights * der_beta_div_der_y(beta, X, np.dot(W, H)))
    # Could also be defined as grad_W(beta, X.T, H.T, W.T).T

def heuristic_W(beta, X, W, H, weights = 1.):
    reconstr = weights * np.dot(W, H)
    numerator = np.dot(X * ((reconstr + EPSILON) ** (beta - 2)), H.T)
    return np.divide(numerator, np.dot((reconstr + EPSILON) ** (beta - 1), H.T))

def heuristic_H(beta, X, W, H, weights = 1.):
    reconstr = weights * np.dot(W, H)
    numerator = np.dot(W.T, X * ((reconstr + EPSILON) ** (beta - 2)))
    return np.divide(numerator, np.dot(W.T, (reconstr + EPSILON) ** (beta - 1)))
    # Same remark as for grad_H: could also be defined as heuristic_W(beta, X.T, H.T, W.T).T

def gamma_exponent(beta):
    """Exponent used for heuristic update from [Fevotte2011].
    """
    if beta < 1.:
        return 1. / (2. - beta)
    elif beta > 2:
        return 1. / (beta - 1)
    else:
        return 1.


class BetaNMF:
    """Class to compute Non negative factorization with beta divergence cost.

    @param data: numpy array, shape (data dimension, nb examples)
    @param num_bases: inner dimension of the factorization
    @param custom_init: if specified, custom values for W and H initialization,
        as a pair (W, H)
    @param weights: weights on the cost function, used as coefficients for each
        element of a matrix of the same shape as data. If a smaller dimension is
        provided, it behaves like numpy.multiply (broadcasting).
    """

    def __init__(self, data, num_bases, beta = 2, custom_init = None, weights = 1.):
        self.X = data
        (self.f, self.n) = data.shape # Nb features, Nb examples
        self.k = num_bases
        self.beta = beta
        self.H = None
        self.W = None
        self.weights = weights
        if custom_init is None:
            self.init()
        else:
            self.W, self.H = custom_init

    def init(self):
        """Init W and H.
        """
        self.W = np.random.random((self.f, self.k))
        self.H = np.random.random((self.k, self.n))

    def factorize(self, update = 'heuristic',
            iterations = 1, eta = 0.1, subit = 10):
        """Perform updates on W and H to achieve facorization of X as WH.
        Different update rules are available:
        - 'gradient': simple alternate projected gradient descent
            on beta-divergence,
        - 'maxmin': Maximization-Minimization updates (see [Fevotte2011]),
        - 'heuristic': (default) Heuristic updates (see [Fevotte2011]).
        """
        try:
            up = UPDATES[update]
        except KeyError:
            raise ValueError("Update should be in %r." % list(UPDATES.keys()))
        for _ in range(iterations):
            if up == GRADIENT:
                # Alternate projected gradient updates
                self.grad_update_W(eta, subit)
                self.grad_update_H(eta, subit)
            elif up == MAXMIN:
                # Maximization-Minimization update
                self.max_min_update_W()
                self.max_min_update_H()
            elif up == HEURISTIC:
                # Heuristic update
                self.heuristic_update_W()
                self.heuristic_update_H()

    # Update rules

    def grad_update_W(self, eta, subit):
        for _ in range(subit):
            gradW = grad_W(self.beta, self.X, self.W, self.H, weights=self.weights)
            self.W -= eta / (1 - eta) * gradW
            self.W *= (1 - eta) * (self.W > 0)

    def grad_update_H(self, eta, subit):
        for _ in range(subit):
            gradH = grad_H(self.beta, self.X, self.W, self.H, weights=self.weights)
            self.H -= eta / (1 - eta) * gradH
            self.H *= (1 - eta) * (self.H > 0)

    def max_min_update_W(self):
        self.W *= (heuristic_W(self.beta, self.X, self.W, self.H, weights=self.weights)
                ** gamma_exponent(self.beta))

    def max_min_update_H(self):
        self.H *= (heuristic_H(self.beta, self.X, self.W, self.H, weights=self.weights)
                ** gamma_exponent(self.beta))
        # Gamma exponent could be computed at init. Does it really matter?

    def heuristic_update_W(self):
        self.W *= heuristic_W(self.beta, self.X, self.W, self.H, weights=self.weights)

    def heuristic_update_H(self):
        self.H *= heuristic_H(self.beta, self.X, self.W, self.H, weights=self.weights)

    # Errors and performance estimations

    def error(self):
        return beta_divergence(self.beta, self.X, np.dot(self.W, self.H), weights=self.weights)

    # Projections

    def scale(self, factors):
        """Scale W columns and H rows inversely, according to the given coefficients.
        """
        factors = np.array(factors)[np.newaxis, :]
        self.W *= factors
        self.H /= factors.T + EPSILON
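As a quick sanity check of the code above: at beta = 2 the beta-divergence reduces to half the squared Frobenius norm of X - Y (up to the EPSILON smoothing). A standalone check, with the general-beta branch of beta_divergence inlined:

```python
import numpy as np

EPSILON = 1.e-9

def beta_divergence(beta, X, Y):
    # General-beta branch, as in the module above (beta not in {0, 1}).
    v = X ** beta
    v += (beta - 1) * ((Y + EPSILON) ** beta)
    v -= beta * np.multiply(X, (Y + EPSILON) ** (beta - 1))
    v /= beta * (beta - 1)
    return np.sum(v)

rng = np.random.RandomState(0)
X = rng.rand(20, 30)
Y = rng.rand(20, 30)

d2 = beta_divergence(2, X, Y)
fro = 0.5 * np.sum((X - Y) ** 2)
print(abs(d2 - fro))  # negligible: agreement up to EPSILON terms
```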
------------------------------------------------------------------------------
_______________________________________________
Scikit-learn-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
