math-scala dssvd follows the same algorithm as MR ssvd, correct? Looking at the code against the algorithm outlined at the bottom of http://mahout.apache.org/users/dim-reduction/ssvd.html, it seems the same, but I wanted to make sure I'm not missing anything before I put the following doc up (with the algorithm taken from the MR ssvd page):

# Distributed Stochastic Singular Value Decomposition


## Intro

Mahout has a distributed implementation of Stochastic Singular Value Decomposition [1].

## Modified SSVD Algorithm

Given an `\(m\times n\)` matrix `\(\mathbf{A}\)`, a target rank `\(k\in\mathbb{N}_{1}\)`, an oversampling parameter `\(p\in\mathbb{N}_{1}\)`, and the number of additional power iterations `\(q\in\mathbb{N}_{0}\)`, this procedure computes an `\(m\times\left(k+p\right)\)` SVD `\(\mathbf{A\approx U}\boldsymbol{\Sigma}\mathbf{V}^{\top}\)`:

1. Create seed for random `\(n\times\left(k+p\right)\)` matrix `\(\boldsymbol{\Omega}\)`. The seed defines matrix `\(\mathbf{\Omega}\)` using Gaussian unit vectors per one of the suggestions in [Halko, Martinsson, Tropp].

2. `\(\mathbf{Y=A\boldsymbol{\Omega}},\,\mathbf{Y}\in\mathbb{R}^{m\times\left(k+p\right)}\)`

3. Column-orthonormalize `\(\mathbf{Y}\rightarrow\mathbf{Q}\)` by computing thin decomposition `\(\mathbf{Y}=\mathbf{Q}\mathbf{R}\)`. Also, `\(\mathbf{Q}\in\mathbb{R}^{m\times\left(k+p\right)},\,\mathbf{R}\in\mathbb{R}^{\left(k+p\right)\times\left(k+p\right)}\)`; denoted as `\(\mathbf{Q}=\mbox{qr}\left(\mathbf{Y}\right).\mathbf{Q}\)`.

4. `\(\mathbf{B}_{0}=\mathbf{Q}^{\top}\mathbf{A}:\,\,\mathbf{B}\in\mathbb{R}^{\left(k+p\right)\times n}\)`.

5. If `\(q>0\)`, repeat for `\(i=1..q\)`: `\(\mathbf{B}_{i}^{\top}=\mathbf{A}^{\top}\mbox{qr}\left(\mathbf{A}\mathbf{B}_{i-1}^{\top}\right).\mathbf{Q}\)` (power iteration step).

6. Compute the eigensolution of the small Hermitian matrix `\(\mathbf{B}_{q}\mathbf{B}_{q}^{\top}=\mathbf{\hat{U}}\boldsymbol{\Lambda}\mathbf{\hat{U}}^{\top}\)`, `\(\mathbf{B}_{q}\mathbf{B}_{q}^{\top}\in\mathbb{R}^{\left(k+p\right)\times\left(k+p\right)}\)`.

7. Singular values `\(\boldsymbol{\Sigma}=\boldsymbol{\Lambda}^{0.5}\)`, or, in other words, `\(\sigma_{i}=\sqrt{\lambda_{i}}\)`.

8. If needed, compute `\(\mathbf{U}=\mathbf{Q}\hat{\mathbf{U}}\)`.

9. If needed, compute `\(\mathbf{V}=\mathbf{B}_{q}^{\top}\hat{\mathbf{U}}\boldsymbol{\Sigma}^{-1}\)`. Another way is `\(\mathbf{V}=\mathbf{A}^{\top}\mathbf{U}\boldsymbol{\Sigma}^{-1}\)`.
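
For intuition, these steps can be traced in-core with Mahout's R-like DSL before looking at the distributed code. The sketch below is illustrative only: it assumes the in-core helpers `qr`, `eigen`, `diagv` and `Matrices.symmetricUniformView` from `org.apache.mahout.math.scalabindings`, uses a fixed seed, and is not the distributed implementation shown in the next section.

    import org.apache.mahout.math._
    import org.apache.mahout.math.scalabindings._
    import org.apache.mahout.math.scalabindings.RLikeOps._

    // Illustrative in-core sketch of steps 1-9 (assumed in-core helpers; not the distributed code).
    def ssvdSketch(a: Matrix, k: Int, p: Int = 15, q: Int = 0): (Matrix, Matrix, Vector) = {
      val r = k + p

      // Steps 1-2: Y = A * Omega, with Omega an n x (k+p) random matrix defined by a seed.
      val omega = Matrices.symmetricUniformView(a.ncol, r, 1234)
      val y = a %*% omega

      // Step 3: column-orthonormalize Y via thin QR, keeping Q.
      var qm = qr(y)._1

      // Step 4: keep B transposed, i.e. B' = A' * Q.
      var bt = a.t %*% qm

      // Step 5: power iterations.
      for (i <- 0 until q) {
        qm = qr(a %*% bt)._1
        bt = a.t %*% qm
      }

      // Steps 6-7: eigen-decompose the small (k+p) x (k+p) matrix B B' (= B'.t * B'),
      // then take square roots of the eigenvalues as singular values.
      val (uHat, lambda) = eigen(bt.t %*% bt)
      val sigma = lambda.sqrt

      // Steps 8-9: U = Q * UHat, V = B' * UHat * Sigma^-1; return only the first k columns.
      val u = qm %*% uHat
      val v = bt %*% uHat %*% diagv(1 /: sigma)

      (u(::, 0 until k), v(::, 0 until k), sigma(0 until k))
    }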




## Implementation

Mahout `dssvd(...)` is implemented in the Mahout `math-scala` algebraic optimizer, which translates Mahout's R-like linear algebra operators into a physical plan for both the Spark and H2O distributed engines:

    def dssvd[K: ClassTag](drmA: DrmLike[K], k: Int, p: Int = 15, q: Int = 0):
        (DrmLike[K], DrmLike[Int], Vector) = {

      val drmAcp = drmA.checkpoint()

      val m = drmAcp.nrow
      val n = drmAcp.ncol
      assert(k <= (m min n), "k cannot be greater than smaller of m, n.")
      val pfxed = safeToNonNegInt((m min n) - k min p)

      // Actual decomposition rank
      val r = k + pfxed

      // We represent Omega by its seed.
      val omegaSeed = RandomUtils.getRandom().nextInt()

      // Compute Y = A*Omega.
      var drmY = drmAcp.mapBlock(ncol = r) {
        case (keys, blockA) =>
          val blockY = blockA %*% Matrices.symmetricUniformView(n, r, omegaSeed)
          keys -> blockY
      }

      var drmQ = dqrThin(drmY.checkpoint())._1

      // Checkpoint Q if last iteration
      if (q == 0) drmQ = drmQ.checkpoint()

      var drmBt = drmAcp.t %*% drmQ

      // Checkpoint B' if last iteration
      if (q == 0) drmBt = drmBt.checkpoint()

      for (i <- 0 until q) {
        drmY = drmAcp %*% drmBt
        drmQ = dqrThin(drmY.checkpoint())._1

        // Checkpoint Q if last iteration
        if (i == q - 1) drmQ = drmQ.checkpoint()

        drmBt = drmAcp.t %*% drmQ

        // Checkpoint B' if last iteration
        if (i == q - 1) drmBt = drmBt.checkpoint()
      }

      val (inCoreUHat, d) = eigen(drmBt.t %*% drmBt)
      val s = d.sqrt

      // Since neither drmU nor drmV are actually computed until actually used,
      // we don't need flags instructing whether or not to compute the U, V outputs.
      val drmU = drmQ %*% inCoreUHat
      val drmV = drmBt %*% (inCoreUHat %*%: diagv(1 /: s))

      (drmU(::, 0 until k), drmV(::, 0 until k), s(0 until k))
    }

Note: As a side effect of checkpointing, U and V values are returned as logical operators (i.e. they are neither checkpointed nor computed). Therefore there is no physical work actually done to compute `\(\mathbf{U}\)` or `\(\mathbf{V}\)` until they are used in a subsequent expression.
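
For example, in a hypothetical snippet (assuming the imports shown in the Usage section below), the returned factors stay logical until an action such as `collect` forces evaluation:

    val (drmU, drmV, s) = dssvd(drmA, k = 40, q = 1)

    // drmU and drmV are still logical operators here; no physical work has been scheduled.
    val drmUtU = drmU.t %*% drmU

    // Collecting the small k x k product triggers the physical plan that computes U.
    val inCoreUtU = drmUtU.collect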


## Usage

The Scala `dssvd(...)` method can easily be called in any Spark or H2O application built with the `math-scala` library and the corresponding `Spark` or `H2O` engine module, as follows:

    import org.apache.mahout.math._
    import org.apache.mahout.math.decompositions._


    val (drmU, drmV, s) = dssvd(drmA, k = 40, q = 1)
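
A more complete, self-contained driver for the Spark engine might look like the sketch below. This is a hypothetical example: the small input matrix, the `local[*]` master URL, and the parameter values are made up for illustration, and it assumes the Spark bindings' `mahoutSparkContext` and `drmParallelize` helpers:

    import org.apache.mahout.math._
    import org.apache.mahout.math.scalabindings._
    import org.apache.mahout.math.drm._
    import org.apache.mahout.math.drm.RLikeDrmOps._
    import org.apache.mahout.math.decompositions._
    import org.apache.mahout.sparkbindings._

    // Hypothetical local Spark context for experimentation.
    implicit val ctx = mahoutSparkContext(masterUrl = "local[*]", appName = "dssvd-example")

    // A small made-up dense matrix, distributed as a DRM with Int row keys.
    val inCoreA = dense((1, 2, 3), (3, 4, 5), (5, 6, 7), (7, 8, 10))
    val drmA = drmParallelize(inCoreA, numPartitions = 2)

    // Rank-2 factors with a little oversampling and one power iteration.
    val (drmU, drmV, s) = dssvd(drmA, k = 2, p = 1, q = 1)

    // Materialize the small factors to inspect them.
    println(drmU.collect)
    println(drmV.collect)
    println(s)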


## References

[1]: [Mahout Scala and Mahout Spark Bindings for Linear Algebra Subroutines](http://mahout.apache.org/users/sparkbindings/ScalaSparkBindings.pdf)

[2]: [Halko, Martinsson, Tropp](http://arxiv.org/abs/0909.4061)

[3]: [Mahout Spark and Scala Bindings](http://mahout.apache.org/users/sparkbindings/home.html)

[4]: [Randomized methods for computing low-rank approximations of matrices](http://amath.colorado.edu/faculty/martinss/Pubs/2012_halko_dissertation.pdf)


