it is possible to say

    import o.a.m.math._
    import decompositions._

and then the second line will be resolved as o.a.m.math.decompositions automatically.
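Spelled out with full package names (a minimal sketch; it just assumes the Mahout math-scala artifacts are on the classpath):

    // Importing the parent package brings its member package `decompositions`
    // into scope, so the second, shorter import resolves relative to it.
    import org.apache.mahout.math._
    import decompositions._   // == org.apache.mahout.math.decompositions._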
On Fri, Mar 27, 2015 at 4:09 PM, Dmitriy Lyubimov <[email protected]> wrote:

> i think there's a typo in the package name under "usage". It should be
> o.a.m.math.decompositions i think
>
>     import org.decompsitions._
>
>
> On Fri, Mar 27, 2015 at 4:07 PM, Andrew Palumbo <[email protected]> wrote:
>
>> I'm going to put a link under "algorithms" to the algorithm grid, and
>> leave the actual content in the same place.
>>
>>
>> On 03/27/2015 06:58 PM, Andrew Palumbo wrote:
>>
>>> On 03/27/2015 06:46 PM, Dmitriy Lyubimov wrote:
>>>
>>>> Note also that all these related beasts come in pairs (in-core input <->
>>>> distributed input):
>>>>
>>>>     ssvd - dssvd
>>>>     spca - dspca
>>>>
>>> yeah I've been thinking that I'd give a less detailed description of them
>>> (the in-core algos) on an overview page (which would include the basics,
>>> operators, etc.). Probably makes sense to reference them here as well. I'd
>>> like to get most of the manual easily viewable on different pages.
>>>
>>>> Yes. Except it doesn't follow the same parallel reordered Givens QR but
>>>> uses Cholesky QR (which we call "thin QR") as an easy-to-implement
>>>> shortcut. But this page makes no mention of QR specifics i think
>>>
>>> right.. no QR specifics, just the dqrThin(...) call in the code. I'm going
>>> to put the link directly below the Cholesky QR link so that will tie
>>> together well.
>>>
>>>
>>>> On Fri, Mar 27, 2015 at 3:45 PM, Dmitriy Lyubimov <[email protected]>
>>>> wrote:
>>>>
>>>>> But the MR version of SSVD is more stable because of the QR differences.
>>>>>
>>>>> On Fri, Mar 27, 2015 at 3:44 PM, Dmitriy Lyubimov <[email protected]>
>>>>> wrote:
>>>>>
>>>>>> Yes. Except it doesn't follow the same parallel reordered Givens QR but
>>>>>> uses Cholesky QR (which we call "thin QR") as an easy-to-implement
>>>>>> shortcut. But this page makes no mention of QR specifics i think
>>>>>>
>>>>>> On Fri, Mar 27, 2015 at 12:57 PM, Andrew Palumbo <[email protected]>
>>>>>> wrote:
>>>>>>
>>>>>>> math-scala dssvd follows the same algorithm as MR ssvd, correct?
>>>>>>> Looking at the code against the algorithm outlined at the bottom of
>>>>>>> http://mahout.apache.org/users/dim-reduction/ssvd.html it seems the
>>>>>>> same, but I wanted to make sure I'm not missing anything before I put
>>>>>>> the following doc up (with the algorithm taken from the MR ssvd page):
>>>>>>>
>>>>>>> # Distributed Stochastic Singular Value Decomposition
>>>>>>>
>>>>>>> ## Intro
>>>>>>>
>>>>>>> Mahout has a distributed implementation of Stochastic Singular Value
>>>>>>> Decomposition [1].
>>>>>>>
>>>>>>> ## Modified SSVD Algorithm
>>>>>>>
>>>>>>> Given an `\(m\times n\)` matrix `\(\mathbf{A}\)`, a target rank
>>>>>>> `\(k\in\mathbb{N}_{1}\)`, an oversampling parameter
>>>>>>> `\(p\in\mathbb{N}_{1}\)`, and the number of additional power iterations
>>>>>>> `\(q\in\mathbb{N}_{0}\)`, this procedure computes an
>>>>>>> `\(m\times\left(k+p\right)\)` SVD
>>>>>>> `\(\mathbf{A\approx U}\boldsymbol{\Sigma}\mathbf{V}^{\top}\)`:
>>>>>>>
>>>>>>> 1. Create a seed for the random `\(n\times\left(k+p\right)\)` matrix
>>>>>>> `\(\boldsymbol{\Omega}\)`. The seed defines matrix `\(\mathbf{\Omega}\)`
>>>>>>> using Gaussian unit vectors per one of the suggestions in
>>>>>>> [Halko, Martinsson, Tropp].
>>>>>>>
>>>>>>> 2. `\(\mathbf{Y=A\boldsymbol{\Omega}},\,\mathbf{Y}\in\mathbb{R}^{m\times\left(k+p\right)}\)`
>>>>>>>
>>>>>>> 3. Column-orthonormalize `\(\mathbf{Y}\rightarrow\mathbf{Q}\)` by
>>>>>>> computing the thin decomposition `\(\mathbf{Y}=\mathbf{Q}\mathbf{R}\)`.
>>>>>>> Also, `\(\mathbf{Q}\in\mathbb{R}^{m\times\left(k+p\right)},\,\mathbf{R}\in\mathbb{R}^{\left(k+p\right)\times\left(k+p\right)}\)`;
>>>>>>> denoted as `\(\mathbf{Q}=\mbox{qr}\left(\mathbf{Y}\right).\mathbf{Q}\)`.
>>>>>>>
>>>>>>> 4. `\(\mathbf{B}_{0}=\mathbf{Q}^{\top}\mathbf{A}:\,\,\mathbf{B}\in\mathbb{R}^{\left(k+p\right)\times n}\)`.
>>>>>>>
>>>>>>> 5. If `\(q>0\)`, repeat for `\(i=1..q\)`:
>>>>>>> `\(\mathbf{B}_{i}^{\top}=\mathbf{A}^{\top}\mbox{qr}\left(\mathbf{A}\mathbf{B}_{i-1}^{\top}\right).\mathbf{Q}\)`
>>>>>>> (power iterations step).
>>>>>>>
>>>>>>> 6. Compute the eigensolution of the small Hermitian
>>>>>>> `\(\mathbf{B}_{q}\mathbf{B}_{q}^{\top}=\mathbf{\hat{U}}\boldsymbol{\Lambda}\mathbf{\hat{U}}^{\top}\)`,
>>>>>>> `\(\mathbf{B}_{q}\mathbf{B}_{q}^{\top}\in\mathbb{R}^{\left(k+p\right)\times\left(k+p\right)}\)`.
>>>>>>>
>>>>>>> 7. Singular values `\(\boldsymbol{\Sigma}=\boldsymbol{\Lambda}^{0.5}\)`,
>>>>>>> or, in other words, `\(\sigma_{i}=\sqrt{\lambda_{i}}\)`.
>>>>>>>
>>>>>>> 8. If needed, compute `\(\mathbf{U}=\mathbf{Q}\hat{\mathbf{U}}\)`.
>>>>>>>
>>>>>>> 9. If needed, compute
>>>>>>> `\(\mathbf{V}=\mathbf{B}_{q}^{\top}\hat{\mathbf{U}}\boldsymbol{\Sigma}^{-1}\)`.
>>>>>>> Another way is
>>>>>>> `\(\mathbf{V}=\mathbf{A}^{\top}\mathbf{U}\boldsymbol{\Sigma}^{-1}\)`.
>>>>>>>
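>>>>>>> A brief justification for steps 6-9 (not on the MR page; the standard
>>>>>>> argument from [2]): writing the thin SVD of the small matrix as
>>>>>>> `\(\mathbf{B}_{q}=\hat{\mathbf{U}}\boldsymbol{\Sigma}\mathbf{V}^{\top}\)`
>>>>>>> gives
>>>>>>> `\(\mathbf{B}_{q}\mathbf{B}_{q}^{\top}=\hat{\mathbf{U}}\boldsymbol{\Sigma}^{2}\hat{\mathbf{U}}^{\top}\)`,
>>>>>>> so the eigensolution of step 6 recovers `\(\hat{\mathbf{U}}\)` and
>>>>>>> `\(\boldsymbol{\Lambda}=\boldsymbol{\Sigma}^{2}\)` (hence step 7).
>>>>>>> Since `\(\mathbf{B}_{q}=\mathbf{Q}^{\top}\mathbf{A}\)` and
>>>>>>> `\(\mathbf{A}\approx\mathbf{Q}\mathbf{Q}^{\top}\mathbf{A}=\mathbf{Q}\mathbf{B}_{q}=\left(\mathbf{Q}\hat{\mathbf{U}}\right)\boldsymbol{\Sigma}\mathbf{V}^{\top}\)`,
>>>>>>> steps 8 and 9 follow; solving
>>>>>>> `\(\mathbf{B}_{q}=\hat{\mathbf{U}}\boldsymbol{\Sigma}\mathbf{V}^{\top}\)`
>>>>>>> for `\(\mathbf{V}\)` yields
>>>>>>> `\(\mathbf{V}=\mathbf{B}_{q}^{\top}\hat{\mathbf{U}}\boldsymbol{\Sigma}^{-1}\)`.
>>>>>>>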
>>>>>>> ## Implementation
>>>>>>>
>>>>>>> Mahout `dssvd(...)` is implemented in the Mahout `math-scala` algebraic
>>>>>>> optimizer, which translates Mahout's R-like linear algebra operators
>>>>>>> into a physical plan for both the Spark and H2O distributed engines.
>>>>>>>
>>>>>>>     def dssvd[K: ClassTag](drmA: DrmLike[K], k: Int, p: Int = 15, q: Int = 0):
>>>>>>>         (DrmLike[K], DrmLike[Int], Vector) = {
>>>>>>>
>>>>>>>       val drmAcp = drmA.checkpoint()
>>>>>>>
>>>>>>>       val m = drmAcp.nrow
>>>>>>>       val n = drmAcp.ncol
>>>>>>>       assert(k <= (m min n), "k cannot be greater than smaller of m, n.")
>>>>>>>       val pfxed = safeToNonNegInt((m min n) - k min p)
>>>>>>>
>>>>>>>       // Actual decomposition rank
>>>>>>>       val r = k + pfxed
>>>>>>>
>>>>>>>       // We represent Omega by its seed.
>>>>>>>       val omegaSeed = RandomUtils.getRandom().nextInt()
>>>>>>>
>>>>>>>       // Compute Y = A*Omega.
>>>>>>>       var drmY = drmAcp.mapBlock(ncol = r) {
>>>>>>>         case (keys, blockA) =>
>>>>>>>           val blockY = blockA %*% Matrices.symmetricUniformView(n, r, omegaSeed)
>>>>>>>           keys -> blockY
>>>>>>>       }
>>>>>>>
>>>>>>>       var drmQ = dqrThin(drmY.checkpoint())._1
>>>>>>>
>>>>>>>       // Checkpoint Q if last iteration
>>>>>>>       if (q == 0) drmQ = drmQ.checkpoint()
>>>>>>>
>>>>>>>       var drmBt = drmAcp.t %*% drmQ
>>>>>>>
>>>>>>>       // Checkpoint B' if last iteration
>>>>>>>       if (q == 0) drmBt = drmBt.checkpoint()
>>>>>>>
>>>>>>>       for (i <- 0 until q) {
>>>>>>>         drmY = drmAcp %*% drmBt
>>>>>>>         drmQ = dqrThin(drmY.checkpoint())._1
>>>>>>>
>>>>>>>         // Checkpoint Q if last iteration
>>>>>>>         if (i == q - 1) drmQ = drmQ.checkpoint()
>>>>>>>
>>>>>>>         drmBt = drmAcp.t %*% drmQ
>>>>>>>
>>>>>>>         // Checkpoint B' if last iteration
>>>>>>>         if (i == q - 1) drmBt = drmBt.checkpoint()
>>>>>>>       }
>>>>>>>
>>>>>>>       val (inCoreUHat, d) = eigen(drmBt.t %*% drmBt)
>>>>>>>       val s = d.sqrt
>>>>>>>
>>>>>>>       // Since neither drmU nor drmV is actually computed until used,
>>>>>>>       // we don't need flags instructing whether or not to compute the
>>>>>>>       // U, V outputs.
>>>>>>>       val drmU = drmQ %*% inCoreUHat
>>>>>>>       val drmV = drmBt %*% (inCoreUHat %*%: diagv(1 /: s))
>>>>>>>
>>>>>>>       (drmU(::, 0 until k), drmV(::, 0 until k), s(0 until k))
>>>>>>>     }
>>>>>>>
>>>>>>> Note: As a side effect of checkpointing, U and V values are returned as
>>>>>>> logical operators (i.e. they are neither checkpointed nor computed).
>>>>>>> Therefore there is no physical work actually done to compute
>>>>>>> `\(\mathbf{U}\)` or `\(\mathbf{V}\)` until they are used in a subsequent
>>>>>>> expression.
>>>>>>>
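>>>>>>> For example (a minimal sketch only; the output path is hypothetical and
>>>>>>> the DRM `collect` / `dfsWrite` helpers are assumed from the drm package,
>>>>>>> see [3]):
>>>>>>>
>>>>>>>     // drmU and drmV are still just logical plans at this point.
>>>>>>>     val (drmU, drmV, s) = dssvd(drmA, k = 40, q = 1)
>>>>>>>
>>>>>>>     // Physical work is triggered only when a result is demanded:
>>>>>>>     val inCoreU = drmU.collect          // forces evaluation of U
>>>>>>>     drmV.dfsWrite("/path/to/V")         // hypothetical path; forces V
>>>>>>>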
>>>>>>> ## Usage
>>>>>>>
>>>>>>> The Scala `dssvd(...)` method can easily be called in any Spark or H2O
>>>>>>> application built with the `math-scala` library and the corresponding
>>>>>>> `Spark` or `H2O` engine module as follows:
>>>>>>>
>>>>>>>     import org.apache.mahout.math._
>>>>>>>     import org.decompsitions._
>>>>>>>
>>>>>>>     val (drmU, drmV, s) = dssvd(drmA, k = 40, q = 1)
>>>>>>>
>>>>>>> ## References
>>>>>>>
>>>>>>> [1]: [Mahout Scala and Mahout Spark Bindings for Linear Algebra
>>>>>>> Subroutines](http://mahout.apache.org/users/sparkbindings/ScalaSparkBindings.pdf)
>>>>>>>
>>>>>>> [2]: [Halko, Martinsson, Tropp](http://arxiv.org/abs/0909.4061)
>>>>>>>
>>>>>>> [3]: [Mahout Spark and Scala Bindings](http://mahout.apache.org/users/sparkbindings/home.html)
>>>>>>>
>>>>>>> [4]: [Randomized methods for computing low-rank approximations of
>>>>>>> matrices](http://amath.colorado.edu/faculty/martinss/Pubs/2012_halko_dissertation.pdf)