I think there's a typo in the package name under "usage". It should be o.a.m.math.decompositions:
import org.decompsitions._

On Fri, Mar 27, 2015 at 4:07 PM, Andrew Palumbo <ap....@outlook.com> wrote:

> I'm going to put a link under "algorithms" to the algorithm grid, and
> leave the actual content in the same place.
>
> On 03/27/2015 06:58 PM, Andrew Palumbo wrote:
>
>> On 03/27/2015 06:46 PM, Dmitriy Lyubimov wrote:
>>
>>> Note also that all these related beasts come in pairs (in-core input <->
>>> distributed input):
>>>
>>> ssvd - dssvd
>>> spca - dspca
>>
>> yeah I've been thinking that I'd give a less detailed description of them
>> (the in-core algos) in an overview page (which would include the basics
>> and operators, etc.). Probably makes sense to reference them here as well.
>> I'd like to get most of the manual easily viewable on different pages.
>>
>>> Yes. Except it doesn't follow same parallel reordered Givens QR but uses
>>> Cholesky QR (which we call "thin QR") as an easy-to-implement shortcut.
>>> But this page makes no mention of QR specifics i think
>>
>> right.. no QR specifics, just the dqrThin(...) call in the code. I'm
>> going to put the link directly below the Cholesky QR link so that will
>> tie together well.
>>
>>> On Fri, Mar 27, 2015 at 3:45 PM, Dmitriy Lyubimov <dlie...@gmail.com>
>>> wrote:
>>>
>>>> But MR version of SSVD is more stable because of the QR differences.
>>>>
>>>> On Fri, Mar 27, 2015 at 3:44 PM, Dmitriy Lyubimov <dlie...@gmail.com>
>>>> wrote:
>>>>
>>>>> Yes. Except it doesn't follow same parallel reordered Givens QR but
>>>>> uses Cholesky QR (which we call "thin QR") as an easy-to-implement
>>>>> shortcut. But this page makes no mention of QR specifics i think
>>>>>
>>>>> On Fri, Mar 27, 2015 at 12:57 PM, Andrew Palumbo <ap....@outlook.com>
>>>>> wrote:
>>>>>
>>>>>> math-scala dssvd follows the same algorithm as MR ssvd, correct?
>>>>>> Looking at the code against the algorithm outlined at the bottom of
>>>>>> http://mahout.apache.org/users/dim-reduction/ssvd.html it seems the
>>>>>> same, but I wanted to make sure I'm not missing anything before I put
>>>>>> the following doc up (with the algorithm taken from the MR ssvd page):
>>>>>>
>>>>>> # Distributed Stochastic Singular Value Decomposition
>>>>>>
>>>>>> ## Intro
>>>>>>
>>>>>> Mahout has a distributed implementation of Stochastic Singular Value
>>>>>> Decomposition [1].
>>>>>>
>>>>>> ## Modified SSVD Algorithm
>>>>>>
>>>>>> Given an `\(m\times n\)` matrix `\(\mathbf{A}\)`, a target rank
>>>>>> `\(k\in\mathbb{N}_{1}\)`, an oversampling parameter
>>>>>> `\(p\in\mathbb{N}_{1}\)`, and the number of additional power
>>>>>> iterations `\(q\in\mathbb{N}_{0}\)`, this procedure computes an
>>>>>> `\(m\times\left(k+p\right)\)` SVD
>>>>>> `\(\mathbf{A\approx U}\boldsymbol{\Sigma}\mathbf{V}^{\top}\)`:
>>>>>>
>>>>>> 1. Create seed for random `\(n\times\left(k+p\right)\)` matrix
>>>>>> `\(\boldsymbol{\Omega}\)`. The seed defines matrix
>>>>>> `\(\boldsymbol{\Omega}\)` using Gaussian unit vectors per one of the
>>>>>> suggestions in [Halko, Martinsson, Tropp].
>>>>>>
>>>>>> 2. `\(\mathbf{Y}=\mathbf{A}\boldsymbol{\Omega},\,\mathbf{Y}\in\mathbb{R}^{m\times\left(k+p\right)}\)`
>>>>>>
>>>>>> 3. Column-orthonormalize `\(\mathbf{Y}\rightarrow\mathbf{Q}\)` by
>>>>>> computing thin decomposition `\(\mathbf{Y}=\mathbf{Q}\mathbf{R}\)`.
>>>>>> Also, `\(\mathbf{Q}\in\mathbb{R}^{m\times\left(k+p\right)},\,\mathbf{R}\in\mathbb{R}^{\left(k+p\right)\times\left(k+p\right)}\)`;
>>>>>> denoted as `\(\mathbf{Q}=\mbox{qr}\left(\mathbf{Y}\right).\mathbf{Q}\)`.
>>>>>>
>>>>>> 4. `\(\mathbf{B}_{0}=\mathbf{Q}^{\top}\mathbf{A}:\,\,\mathbf{B}\in\mathbb{R}^{\left(k+p\right)\times n}\)`.
>>>>>>
>>>>>> 5.
>>>>>> If `\(q>0\)` repeat: for `\(i=1..q\)`:
>>>>>> `\(\mathbf{B}_{i}^{\top}=\mathbf{A}^{\top}\mbox{qr}\left(\mathbf{A}\mathbf{B}_{i-1}^{\top}\right).\mathbf{Q}\)`
>>>>>> (power iterations step).
>>>>>>
>>>>>> 6. Compute Eigensolution of a small Hermitian
>>>>>> `\(\mathbf{B}_{q}\mathbf{B}_{q}^{\top}=\hat{\mathbf{U}}\boldsymbol{\Lambda}\hat{\mathbf{U}}^{\top}\)`,
>>>>>> `\(\mathbf{B}_{q}\mathbf{B}_{q}^{\top}\in\mathbb{R}^{\left(k+p\right)\times\left(k+p\right)}\)`.
>>>>>>
>>>>>> 7. Singular values `\(\boldsymbol{\Sigma}=\boldsymbol{\Lambda}^{0.5}\)`,
>>>>>> or, in other words, `\(\sigma_{i}=\sqrt{\lambda_{i}}\)`.
>>>>>>
>>>>>> 8. If needed, compute `\(\mathbf{U}=\mathbf{Q}\hat{\mathbf{U}}\)`.
>>>>>>
>>>>>> 9. If needed, compute
>>>>>> `\(\mathbf{V}=\mathbf{B}_{q}^{\top}\hat{\mathbf{U}}\boldsymbol{\Sigma}^{-1}\)`.
>>>>>> Another way is
>>>>>> `\(\mathbf{V}=\mathbf{A}^{\top}\mathbf{U}\boldsymbol{\Sigma}^{-1}\)`.
>>>>>>
>>>>>> ## Implementation
>>>>>>
>>>>>> Mahout `dssvd(...)` is implemented in the Mahout `math-scala`
>>>>>> algebraic optimizer, which translates Mahout's R-like linear algebra
>>>>>> operators into a physical plan for both the Spark and H2O distributed
>>>>>> engines.
>>>>>>
>>>>>>     def dssvd[K: ClassTag](drmA: DrmLike[K], k: Int, p: Int = 15, q: Int = 0):
>>>>>>         (DrmLike[K], DrmLike[Int], Vector) = {
>>>>>>
>>>>>>       val drmAcp = drmA.checkpoint()
>>>>>>
>>>>>>       val m = drmAcp.nrow
>>>>>>       val n = drmAcp.ncol
>>>>>>       assert(k <= (m min n), "k cannot be greater than smaller of m, n.")
>>>>>>       val pfxed = safeToNonNegInt((m min n) - k min p)
>>>>>>
>>>>>>       // Actual decomposition rank
>>>>>>       val r = k + pfxed
>>>>>>
>>>>>>       // We represent Omega by its seed.
>>>>>>       val omegaSeed = RandomUtils.getRandom().nextInt()
>>>>>>
>>>>>>       // Compute Y = A*Omega.
>>>>>>       var drmY = drmAcp.mapBlock(ncol = r) {
>>>>>>         case (keys, blockA) =>
>>>>>>           val blockY = blockA %*% Matrices.symmetricUniformView(n, r, omegaSeed)
>>>>>>           keys -> blockY
>>>>>>       }
>>>>>>
>>>>>>       var drmQ = dqrThin(drmY.checkpoint())._1
>>>>>>
>>>>>>       // Checkpoint Q if last iteration
>>>>>>       if (q == 0) drmQ = drmQ.checkpoint()
>>>>>>
>>>>>>       var drmBt = drmAcp.t %*% drmQ
>>>>>>
>>>>>>       // Checkpoint B' if last iteration
>>>>>>       if (q == 0) drmBt = drmBt.checkpoint()
>>>>>>
>>>>>>       for (i <- 0 until q) {
>>>>>>         drmY = drmAcp %*% drmBt
>>>>>>         drmQ = dqrThin(drmY.checkpoint())._1
>>>>>>
>>>>>>         // Checkpoint Q if last iteration
>>>>>>         if (i == q - 1) drmQ = drmQ.checkpoint()
>>>>>>
>>>>>>         drmBt = drmAcp.t %*% drmQ
>>>>>>
>>>>>>         // Checkpoint B' if last iteration
>>>>>>         if (i == q - 1) drmBt = drmBt.checkpoint()
>>>>>>       }
>>>>>>
>>>>>>       val (inCoreUHat, d) = eigen(drmBt.t %*% drmBt)
>>>>>>       val s = d.sqrt
>>>>>>
>>>>>>       // Since neither drmU nor drmV is actually computed until used,
>>>>>>       // we don't need flags instructing us to compute (or not compute)
>>>>>>       // either of the U, V outputs.
>>>>>>       val drmU = drmQ %*% inCoreUHat
>>>>>>       val drmV = drmBt %*% (inCoreUHat %*%: diagv(1 /: s))
>>>>>>
>>>>>>       (drmU(::, 0 until k), drmV(::, 0 until k), s(0 until k))
>>>>>>     }
>>>>>>
>>>>>> Note: As a side effect of checkpointing, U and V values are returned
>>>>>> as logical operators (i.e. they are neither checkpointed nor
>>>>>> computed). Therefore there is no physical work actually done to
>>>>>> compute `\(\mathbf{U}\)` or `\(\mathbf{V}\)` until they are used in a
>>>>>> subsequent expression.
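[Editor's note: to make the algebra above easier to follow, here is an illustrative NumPy translation of the draft's steps 1-9. It is a sketch, not the Mahout implementation; the function name `ssvd` and all variable names are chosen here for illustration, and NumPy is assumed to be available.]

```python
# Illustrative NumPy sketch of the modified SSVD steps 1-9 (not Mahout code).
import numpy as np

def ssvd(A, k, p=5, q=1, seed=0):
    m, n = A.shape
    r = k + p
    # Step 1: the seed fully defines the random n x (k+p) matrix Omega.
    rng = np.random.default_rng(seed)
    Omega = rng.standard_normal((n, r))
    # Step 2: Y = A * Omega.
    Y = A @ Omega
    # Step 3: column-orthonormalize Y via thin QR, keeping only Q.
    Q, _ = np.linalg.qr(Y)
    # Step 4: B0 = Q' * A.
    B = Q.T @ A
    # Step 5: optional power iterations, B_i = qr(A * B_{i-1}')​.Q' * A.
    for _ in range(q):
        Q, _ = np.linalg.qr(A @ B.T)
        B = Q.T @ A
    # Step 6: eigensolution of the small (k+p) x (k+p) Hermitian B * B'.
    lam, Uhat = np.linalg.eigh(B @ B.T)
    order = np.argsort(lam)[::-1]      # eigh returns eigenvalues ascending
    lam, Uhat = lam[order], Uhat[:, order]
    # Step 7: singular values sigma_i = sqrt(lambda_i).
    s = np.sqrt(np.maximum(lam, 0.0))
    # Step 8: U = Q * Uhat.
    U = Q @ Uhat
    # Step 9: V = B' * Uhat * Sigma^-1, truncated to the leading k columns.
    V = (B.T @ Uhat)[:, :k] / s[:k]
    return U[:, :k], s[:k], V
```

On an exactly low-rank input, `U @ np.diag(s) @ V.T` recovers `A` essentially to machine precision; the oversampling `p` and power-iteration count `q` play the same roles as the parameters of `dssvd(...)`.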
>>>>>>
>>>>>> ## Usage
>>>>>>
>>>>>> The Scala `dssvd(...)` method can easily be called in any Spark or
>>>>>> H2O application built with the `math-scala` library and the
>>>>>> corresponding `Spark` or `H2O` engine module as follows:
>>>>>>
>>>>>>     import org.apache.mahout.math._
>>>>>>     import org.decompsitions._
>>>>>>
>>>>>>     val (drmU, drmV, s) = dssvd(drmA, k = 40, q = 1)
>>>>>>
>>>>>> ## References
>>>>>>
>>>>>> [1]: [Mahout Scala and Mahout Spark Bindings for Linear Algebra
>>>>>> Subroutines](http://mahout.apache.org/users/sparkbindings/ScalaSparkBindings.pdf)
>>>>>>
>>>>>> [2]: [Halko, Martinsson, Tropp](http://arxiv.org/abs/0909.4061)
>>>>>>
>>>>>> [3]: [Mahout Spark and Scala Bindings](http://mahout.apache.org/users/sparkbindings/home.html)
>>>>>>
>>>>>> [4]: [Randomized methods for computing low-rank approximations of
>>>>>> matrices](http://amath.colorado.edu/faculty/martinss/Pubs/2012_halko_dissertation.pdf)
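[Editor's note: one subtle line in the quoted Scala listing is `val pfxed = safeToNonNegInt((m min n) - k min p)`. Because alphanumeric infix methods such as `min` bind more loosely than `-` in Scala, this clamps the oversampling so the working rank `r = k + pfxed` never exceeds `min(m, n)`. A minimal pure-Python re-derivation of that clamp, with the helper name `effective_rank` invented here for illustration:]

```python
# Hypothetical helper mirroring the rank clamp in the dssvd listing above:
# pfxed = safeToNonNegInt((m min n) - k min p),
# i.e. pfxed = max(0, min(min(m, n) - k, p)).
def effective_rank(m, n, k, p):
    """Actual decomposition rank r = k + clamped oversampling."""
    assert k <= min(m, n), "k cannot be greater than smaller of m, n."
    pfxed = max(0, min(min(m, n) - k, p))
    return k + pfxed
```

For example, with m = 1000, n = 100, k = 40, p = 15 the full oversampling fits and r = 55; with k = 95 only 5 columns of headroom remain, so r = 100 rather than 110.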