I think it makes sense in case anybody just tries straight from there. So just
import drm._

-------- Original message --------
From: Dmitriy Lyubimov <[email protected]>
Date: 03/27/2015 7:23 PM (GMT-05:00)
To: [email protected]
Subject: Re: math-scala dssvd docs

We probably do need drms since we are passing in DRM types, right?

On Fri, Mar 27, 2015 at 4:16 PM, Andrew Palumbo <[email protected]> wrote:
> oops- you're right.. I'll fix that.
>
> We don't need
>
> import org.apache.mahout.math.drm._
>
> there, right?
>
> On 03/27/2015 07:09 PM, Dmitriy Lyubimov wrote:
>> I think there's a typo in the package name under "Usage". It should be
>> o.a.m.math.decompositions, I think:
>>
>> import org.decompsitions._
>>
>> On Fri, Mar 27, 2015 at 4:07 PM, Andrew Palumbo <[email protected]> wrote:
>>> I'm going to put a link under "Algorithms" to the algorithm grid, and
>>> leave the actual content in the same place.
>>>
>>> On 03/27/2015 06:58 PM, Andrew Palumbo wrote:
>>>> On 03/27/2015 06:46 PM, Dmitriy Lyubimov wrote:
>>>>> Note also that all these related beasts come in pairs (in-core input
>>>>> <-> distributed input):
>>>>>
>>>>> ssvd - dssvd
>>>>> spca - dspca
>>>>
>>>> Yeah, I've been thinking that I'd give a less detailed description of
>>>> them (the in-core algos) in an overview page (which would include the
>>>> basics and operators, etc.). It probably makes sense to reference them
>>>> here as well. I'd like to get most of the manual easily viewable on
>>>> different pages.
>>>>
>>>>> Yes. Except it doesn't follow the same parallel reordered Givens QR,
>>>>> but uses Cholesky QR (which we call "thin QR") as an easy-to-implement
>>>>> shortcut. But this page makes no mention of QR specifics, I think.
>>>>
>>>> Right.. no QR specifics, just the dqrThin(...) call in the code. I'm
>>>> going to put the link directly below the Cholesky QR link so that will
>>>> tie together well.
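The Cholesky ("thin") QR mentioned in the thread is simple enough to sketch. The following is a minimal plain-Python illustration, not Mahout's `dqrThin` implementation, and every helper name in it is invented for the example: factor the small Gram matrix `Y^T Y = L L^T` by Cholesky, take `R = L^T`, and recover `Q` by triangular solves against each row of `Y`.

```python
import math

def matmul(A, B):
    """Dense product of two matrices stored as lists of rows."""
    k, m = len(B), len(B[0])
    return [[sum(row[t] * B[t][j] for t in range(k)) for j in range(m)]
            for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def cholesky(G):
    """Lower-triangular L with G = L L^T for symmetric positive-definite G."""
    n = len(G)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][t] * L[j][t] for t in range(j))
            if i == j:
                L[i][j] = math.sqrt(G[i][i] - s)
            else:
                L[i][j] = (G[i][j] - s) / L[j][j]
    return L

def thin_qr(Y):
    """Cholesky ("thin") QR of a tall skinny Y: returns (Q, R) with Y = Q R."""
    G = matmul(transpose(Y), Y)      # small Gram matrix Y^T Y
    L = cholesky(G)                  # Y^T Y = L L^T, so R = L^T
    Q = []
    for row in Y:                    # solve q R = y for each row of Y
        q = [0.0] * len(L)
        for j in range(len(L)):
            s = row[j] - sum(q[t] * L[j][t] for t in range(j))
            q[j] = s / L[j][j]
        Q.append(q)
    return Q, transpose(L)
```

The appeal, as in the distributed setting, is that only the tiny `(k+p) x (k+p)` Gram matrix needs a sequential factorization; everything touching the tall dimension is embarrassingly parallel.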
>>>>> On Fri, Mar 27, 2015 at 3:45 PM, Dmitriy Lyubimov <[email protected]> wrote:
>>>>>> But the MR version of SSVD is more stable because of the QR
>>>>>> differences.
>>>>>>
>>>>>> On Fri, Mar 27, 2015 at 3:44 PM, Dmitriy Lyubimov <[email protected]> wrote:
>>>>>>> Yes. Except it doesn't follow the same parallel reordered Givens QR,
>>>>>>> but uses Cholesky QR (which we call "thin QR") as an easy-to-implement
>>>>>>> shortcut. But this page makes no mention of QR specifics, I think.
>>>>>>>
>>>>>>> On Fri, Mar 27, 2015 at 12:57 PM, Andrew Palumbo <[email protected]> wrote:
>>>>>>>> math-scala dssvd follows the same algorithm as MR ssvd, correct?
>>>>>>>> Looking at the code against the algorithm outlined at the bottom of
>>>>>>>> http://mahout.apache.org/users/dim-reduction/ssvd.html it seems the
>>>>>>>> same, but I wanted to make sure I'm not missing anything before I
>>>>>>>> put the following doc up (with the algorithm taken from the MR ssvd
>>>>>>>> page):
>>>>>>>>
>>>>>>>> # Distributed Stochastic Singular Value Decomposition
>>>>>>>>
>>>>>>>> ## Intro
>>>>>>>>
>>>>>>>> Mahout has a distributed implementation of Stochastic Singular
>>>>>>>> Value Decomposition [1].
>>>>>>>>
>>>>>>>> ## Modified SSVD Algorithm
>>>>>>>>
>>>>>>>> Given an `\(m\times n\)` matrix `\(\mathbf{A}\)`, a target rank
>>>>>>>> `\(k\in\mathbb{N}_{1}\)`, an oversampling parameter
>>>>>>>> `\(p\in\mathbb{N}_{1}\)`, and the number of additional power
>>>>>>>> iterations `\(q\in\mathbb{N}_{0}\)`, this procedure computes an
>>>>>>>> `\(m\times\left(k+p\right)\)` SVD
>>>>>>>> `\(\mathbf{A\approx U}\boldsymbol{\Sigma}\mathbf{V}^{\top}\)`:
>>>>>>>>
>>>>>>>> 1. Create a seed for the random `\(n\times\left(k+p\right)\)`
>>>>>>>> matrix `\(\boldsymbol{\Omega}\)`. The seed defines matrix
>>>>>>>> `\(\boldsymbol{\Omega}\)` using Gaussian unit vectors per one of
>>>>>>>> the suggestions in [Halko, Martinsson, Tropp].
>>>>>>>>
>>>>>>>> 2. `\(\mathbf{Y=A\boldsymbol{\Omega}},\,\mathbf{Y}\in\mathbb{R}^{m\times\left(k+p\right)}\)`
>>>>>>>>
>>>>>>>> 3. Column-orthonormalize `\(\mathbf{Y}\rightarrow\mathbf{Q}\)` by
>>>>>>>> computing thin decomposition `\(\mathbf{Y}=\mathbf{Q}\mathbf{R}\)`.
>>>>>>>> Also, `\(\mathbf{Q}\in\mathbb{R}^{m\times\left(k+p\right)},\,\mathbf{R}\in\mathbb{R}^{\left(k+p\right)\times\left(k+p\right)}\)`;
>>>>>>>> denoted as `\(\mathbf{Q}=\mbox{qr}\left(\mathbf{Y}\right).\mathbf{Q}\)`.
>>>>>>>>
>>>>>>>> 4. `\(\mathbf{B}_{0}=\mathbf{Q}^{\top}\mathbf{A}:\,\,\mathbf{B}\in\mathbb{R}^{\left(k+p\right)\times n}\)`.
>>>>>>>>
>>>>>>>> 5. If `\(q>0\)`, repeat for `\(i=1..q\)`:
>>>>>>>> `\(\mathbf{B}_{i}^{\top}=\mathbf{A}^{\top}\mbox{qr}\left(\mathbf{A}\mathbf{B}_{i-1}^{\top}\right).\mathbf{Q}\)`
>>>>>>>> (power iterations step).
>>>>>>>>
>>>>>>>> 6. Compute the eigensolution of the small Hermitian
>>>>>>>> `\(\mathbf{B}_{q}\mathbf{B}_{q}^{\top}=\mathbf{\hat{U}}\boldsymbol{\Lambda}\mathbf{\hat{U}}^{\top}\)`,
>>>>>>>> `\(\mathbf{B}_{q}\mathbf{B}_{q}^{\top}\in\mathbb{R}^{\left(k+p\right)\times\left(k+p\right)}\)`.
>>>>>>>>
>>>>>>>> 7. Singular values `\(\boldsymbol{\Sigma}=\boldsymbol{\Lambda}^{0.5}\)`,
>>>>>>>> or, in other words, `\(s_{i}=\sqrt{\lambda_{i}}\)`.
>>>>>>>>
>>>>>>>> 8. If needed, compute `\(\mathbf{U}=\mathbf{Q}\hat{\mathbf{U}}\)`.
>>>>>>>>
>>>>>>>> 9. If needed, compute `\(\mathbf{V}=\mathbf{B}_{q}^{\top}\hat{\mathbf{U}}\boldsymbol{\Sigma}^{-1}\)`.
>>>>>>>> Another way is `\(\mathbf{V}=\mathbf{A}^{\top}\mathbf{U}\boldsymbol{\Sigma}^{-1}\)`.
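The range-finding stage above (steps 1-4) can be sketched in plain Python. This is only an illustration under simplifying assumptions, not Mahout code: matrices are dense lists of rows, a modified Gram-Schmidt stands in for the distributed thin QR, and every helper name is invented. It shows why, for a matrix of exact rank at most `k`, the projection `Q Q^T A = Q B` reproduces `A`.

```python
import random

def matmul(A, B):
    """Dense product of two lists-of-rows matrices."""
    k, m = len(B), len(B[0])
    return [[sum(row[t] * B[t][j] for t in range(k)) for j in range(m)]
            for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def gram_schmidt(Y):
    """Column-orthonormalize Y (modified Gram-Schmidt), standing in for
    qr(Y).Q; columns that are numerically dependent are dropped."""
    basis = []
    for v in transpose(Y):
        v = list(v)
        for u in basis:
            c = sum(vi * ui for vi, ui in zip(v, u))
            v = [vi - c * ui for vi, ui in zip(v, u)]
        nrm = sum(vi * vi for vi in v) ** 0.5
        if nrm > 1e-10:
            basis.append([vi / nrm for vi in v])
    return transpose(basis)

def range_finder(A, k, p=2, seed=0):
    """Steps 1-4: Omega defined by its seed, Y = A*Omega, Q = qr(Y).Q,
    B = Q^T A."""
    rng = random.Random(seed)            # step 1: Omega is just a seed
    n, r = len(A[0]), k + p
    Omega = [[rng.gauss(0.0, 1.0) for _ in range(r)] for _ in range(n)]
    Y = matmul(A, Omega)                 # step 2
    Q = gram_schmidt(Y)                  # step 3
    B = matmul(transpose(Q), A)          # step 4
    return Q, B
```

With probability one a Gaussian `Omega` hits the full column space of a low-rank `A`, so `Q` spans `range(A)` exactly; the power iterations in step 5 only matter when the trailing singular values decay slowly.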
>>>>>>>>
>>>>>>>> ## Implementation
>>>>>>>>
>>>>>>>> Mahout `dssvd(...)` is implemented in the Mahout `math-scala`
>>>>>>>> algebraic optimizer, which translates Mahout's R-like linear
>>>>>>>> algebra operators into a physical plan for both the Spark and H2O
>>>>>>>> distributed engines.
>>>>>>>>
>>>>>>>>     def dssvd[K: ClassTag](drmA: DrmLike[K], k: Int, p: Int = 15, q: Int = 0):
>>>>>>>>         (DrmLike[K], DrmLike[Int], Vector) = {
>>>>>>>>
>>>>>>>>       val drmAcp = drmA.checkpoint()
>>>>>>>>
>>>>>>>>       val m = drmAcp.nrow
>>>>>>>>       val n = drmAcp.ncol
>>>>>>>>       assert(k <= (m min n), "k cannot be greater than smaller of m, n.")
>>>>>>>>       val pfxed = safeToNonNegInt((m min n) - k min p)
>>>>>>>>
>>>>>>>>       // Actual decomposition rank
>>>>>>>>       val r = k + pfxed
>>>>>>>>
>>>>>>>>       // We represent Omega by its seed.
>>>>>>>>       val omegaSeed = RandomUtils.getRandom().nextInt()
>>>>>>>>
>>>>>>>>       // Compute Y = A*Omega.
>>>>>>>>       var drmY = drmAcp.mapBlock(ncol = r) {
>>>>>>>>         case (keys, blockA) =>
>>>>>>>>           val blockY = blockA %*% Matrices.symmetricUniformView(n, r, omegaSeed)
>>>>>>>>           keys -> blockY
>>>>>>>>       }
>>>>>>>>
>>>>>>>>       var drmQ = dqrThin(drmY.checkpoint())._1
>>>>>>>>
>>>>>>>>       // Checkpoint Q if last iteration
>>>>>>>>       if (q == 0) drmQ = drmQ.checkpoint()
>>>>>>>>
>>>>>>>>       var drmBt = drmAcp.t %*% drmQ
>>>>>>>>
>>>>>>>>       // Checkpoint B' if last iteration
>>>>>>>>       if (q == 0) drmBt = drmBt.checkpoint()
>>>>>>>>
>>>>>>>>       for (i <- 0 until q) {
>>>>>>>>         drmY = drmAcp %*% drmBt
>>>>>>>>         drmQ = dqrThin(drmY.checkpoint())._1
>>>>>>>>
>>>>>>>>         // Checkpoint Q if last iteration
>>>>>>>>         if (i == q - 1) drmQ = drmQ.checkpoint()
>>>>>>>>
>>>>>>>>         drmBt = drmAcp.t %*% drmQ
>>>>>>>>
>>>>>>>>         // Checkpoint B' if last iteration
>>>>>>>>         if (i == q - 1) drmBt = drmBt.checkpoint()
>>>>>>>>       }
>>>>>>>>
>>>>>>>>       val (inCoreUHat, d) = eigen(drmBt.t %*% drmBt)
>>>>>>>>       val s = d.sqrt
>>>>>>>>
>>>>>>>>       // Since neither drmU nor drmV are actually computed until
>>>>>>>>       // actually used, we don't need flags instructing whether or
>>>>>>>>       // not to compute either of the U, V outputs
>>>>>>>>       val drmU = drmQ %*% inCoreUHat
>>>>>>>>       val drmV = drmBt %*% (inCoreUHat %*%: diagv(1 /: s))
>>>>>>>>
>>>>>>>>       (drmU(::, 0 until k), drmV(::, 0 until k), s(0 until k))
>>>>>>>>     }
>>>>>>>>
>>>>>>>> Note: As a side effect of checkpointing, U and V values are
>>>>>>>> returned as logical operators (i.e. they are neither checkpointed
>>>>>>>> nor computed). Therefore there is no physical work actually done to
>>>>>>>> compute `\(\mathbf{U}\)` or `\(\mathbf{V}\)` until they are used in
>>>>>>>> a subsequent expression.
>>>>>>>>
>>>>>>>> ## Usage
>>>>>>>>
>>>>>>>> The Scala `dssvd(...)` method can easily be called in any Spark or
>>>>>>>> H2O application built with the `math-scala` library and the
>>>>>>>> corresponding `Spark` or `H2O` engine module as follows:
>>>>>>>>
>>>>>>>>     import org.apache.mahout.math._
>>>>>>>>     import org.decompsitions._
>>>>>>>>
>>>>>>>>     val (drmU, drmV, s) = dssvd(drmA, k = 40, q = 1)
>>>>>>>>
>>>>>>>> ## References
>>>>>>>>
>>>>>>>> [1]: [Mahout Scala and Mahout Spark Bindings for Linear Algebra
>>>>>>>> Subroutines](http://mahout.apache.org/users/sparkbindings/ScalaSparkBindings.pdf)
>>>>>>>>
>>>>>>>> [2]: [Halko, Martinsson, Tropp](http://arxiv.org/abs/0909.4061)
>>>>>>>>
>>>>>>>> [3]: [Mahout Spark and Scala Bindings](http://mahout.apache.org/users/sparkbindings/home.html)
>>>>>>>>
>>>>>>>> [4]: [Randomized methods for computing low-rank approximations of
>>>>>>>> matrices](http://amath.colorado.edu/faculty/martinss/Pubs/2012_halko_dissertation.pdf)
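The finishing stage of the draft's algorithm (steps 6-9) is small enough to run in core. The sketch below is plain Python and illustrative only: a cyclic Jacobi eigensolver stands in for Mahout's `eigen`, and all helper names are invented. It eigendecomposes `B B^T`, takes singular values as square roots of the eigenvalues, and assembles `U` and `V`. For a small test, passing `Q = I` and `B = A` turns it into a plain SVD of `A`.

```python
import math

def matmul(A, B):
    """Dense product of two lists-of-rows matrices."""
    k, m = len(B), len(B[0])
    return [[sum(row[t] * B[t][j] for t in range(k)) for j in range(m)]
            for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def jacobi_eigen(S, sweeps=100, tol=1e-20):
    """Eigendecomposition of a small symmetric matrix by cyclic Jacobi
    rotations: returns (V, w) with S = V diag(w) V^T, eigenvectors in
    the columns of V."""
    n = len(S)
    A = [row[:] for row in S]
    V = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(sweeps):
        off = sum(A[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if A[p][q] == 0.0:
                    continue
                # Rotation angle that zeroes the (p, q) entry.
                theta = 0.5 * math.atan2(2.0 * A[p][q], A[q][q] - A[p][p])
                c, s = math.cos(theta), math.sin(theta)
                for k in range(n):            # rotate rows p and q
                    apk, aqk = A[p][k], A[q][k]
                    A[p][k] = c * apk - s * aqk
                    A[q][k] = s * apk + c * aqk
                for k in range(n):            # rotate columns p and q
                    akp, akq = A[k][p], A[k][q]
                    A[k][p] = c * akp - s * akq
                    A[k][q] = s * akp + c * akq
                for k in range(n):            # accumulate eigenvectors
                    vkp, vkq = V[k][p], V[k][q]
                    V[k][p] = c * vkp - s * vkq
                    V[k][q] = s * vkp + c * vkq
    return V, [A[i][i] for i in range(n)]

def finish_svd(Q, B, k):
    """Steps 6-9: eigendecompose B B^T, then form U, s, V truncated to k."""
    BBt = matmul(B, transpose(B))
    Uhat, lam = jacobi_eigen(BBt)                       # step 6
    order = sorted(range(len(lam)), key=lambda i: -lam[i])
    s = [math.sqrt(max(lam[i], 0.0)) for i in order]    # step 7
    Uhat = [[row[i] for i in order] for row in Uhat]
    U = matmul(Q, Uhat)                                 # step 8
    BtU = matmul(transpose(B), Uhat)                    # step 9: V = B^T Uhat Sigma^-1
    V = [[row[j] / s[j] if s[j] > 1e-12 else 0.0 for j in range(len(s))]
         for row in BtU]
    return [row[:k] for row in U], s[:k], [row[:k] for row in V]
```

Only the `(k+p) x (k+p)` eigenproblem is sequential here, which mirrors why the distributed implementation can hand it to an in-core `eigen` call.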
