Hi Janardhan,

By tall-skinny matrix I mean that the number of columns should be reasonably small (it doesn't matter how many rows there are). When you do the QR decomposition, R will be a square (upper triangular) matrix whose dimension equals the number of columns of matrix A. But then R has to fit in the driver's memory, and that's why the number of columns of A cannot be very large (a 10k x 10k dense matrix of doubles is close to 1 GB). Once you have R, a local SVD implementation is needed to compute the SVD of R. Now, this is not a very general method, but I think it is good enough for most cases.
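For reference, here is a minimal numpy sketch of the QR-then-SVD-of-R idea (illustrative only, not SystemML/DML code; the function name tall_skinny_svd is made up). It just shows the algebra: A = Q R, R = Ur S Vt, so A = (Q Ur) S Vt, and only the small n x n factor R ever needs a local SVD. (The 1 GB figure above is roughly 10,000^2 doubles at 8 bytes each, about 800 MB.)

import numpy as np

def tall_skinny_svd(A):
    """SVD of a tall-skinny matrix A (m >> n) via QR:
    A = Q R, R = Ur S Vt  =>  A = (Q Ur) S Vt."""
    Q, R = np.linalg.qr(A)          # reduced QR: R is n x n, small enough for the driver
    Ur, S, Vt = np.linalg.svd(R)    # local SVD of the small R factor
    return Q @ Ur, S, Vt            # left singular vectors, singular values, V^T

# Example: a 100000 x 50 tall-skinny matrix
A = np.random.rand(100000, 50)
U, S, Vt = tall_skinny_svd(A)
print(np.allclose(A, U * S @ Vt))   # reconstruction check -> True

In SystemML the QR step would be the distributed part and only the SVD of R would run locally (or on a GPU via cuSolver, per the discussion below).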
imran

On Fri, Jul 28, 2017 at 10:37 PM, Janardhan Pulivarthi <
janardhan.pulivar...@gmail.com> wrote:

> Thanks, Imran. As per the paper, at first we perform a QR decomposition of
> the input matrix (A), from which we obtain R. And then, we compute svd(R)
> using the builtin local function (??). I'll try this.
>
> Tall-skinny matrix: so, do we have a problem with square matrices? Or do
> we have to partition the matrix into tall-skinny matrices if we have a
> square one?
>
> Thanks,
> Janardhan
>
> On Fri, Jul 28, 2017 at 11:52 PM, Imran Younus <imranyou...@gmail.com>
> wrote:
>
> > Just to clarify one thing. For the QR-based method, you can assume that
> > the R matrix is small enough to fit in driver memory and then perform
> > SVD on the driver. That means your actual matrix has to be a tall-skinny
> > matrix.
> >
> > imran
> >
> > On Fri, Jul 28, 2017 at 11:15 AM, Imran Younus <imranyou...@gmail.com>
> > wrote:
> >
> > > Janardhan,
> > >
> > > The papers you're referring to may not be relevant. The first paper,
> > > as far as I can tell, is about updating an existing SVD decomposition
> > > as new data comes in. The 3rd paper in this list is the one I used,
> > > but that method is not good.
> > >
> > > There is also a method that uses QR decomposition and then calculates
> > > the SVD from the R matrix. Please have a look at equation 1.3 in this
> > > paper:
> > >
> > > http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.127.115&rank=1
> > >
> > > I think this is worth trying out. The distributed QR is already
> > > implemented in SystemML, so it may be quick to try out.
> > >
> > > imran
> > >
> > > On Fri, Jul 28, 2017 at 10:10 AM, Janardhan Pulivarthi <
> > > janardhan.pulivar...@gmail.com> wrote:
> > >
> > > > Hi Nakul & all the committers,
> > > >
> > > > Till now I am halfway through the literature. But, for now, a couple
> > > > of things to mention: in SVD there are three stages
> > > > 1. Bidiagonal reduction step
> > > > 2. Computation of the singular values
> > > > 3. Computation of the singular vectors
> > > >
> > > > Of these three, the *bidiagonal reduction* step is very expensive,
> > > > so our focus is on this (when considering GPU, at times where
> > > > handling it with the CPU is infeasible).
> > > >
> > > > About the literature:
> > > >
> > > > - I took some time to go through "A Stable and Fast Algorithm for
> > > >   Updating the Singular Value Decomposition" by Gu & Eisenstat, to
> > > >   understand the numerical stability and round-off errors when we
> > > >   are partitioning the matrix in this distributed algorithm. The
> > > >   authors assure that each component computed will be of high
> > > >   absolute accuracy, and also that the properties of the resultant
> > > >   matrix do not conflict with those of the parent matrix. [pdf
> > > >   <http://www.cs.yale.edu/publications/techreports/tr966.pdf>]
> > > >
> > > > - "High performance bidiagonal reduction using tile algorithms on
> > > >   homogeneous multicore clusters", by Ltaief et al. This paper
> > > >   focuses mainly on the first stage and discusses a good deal about
> > > >   tile algorithms and their runtime implementations (although
> > > >   off-topic, I read this just to understand). [pdf
> > > >   <http://www.netlib.org/lapack/lawnspdf/lawn247.pdf>]
> > > >
> > > > - "A distributed and incremental SVD algorithm for agglomerative
> > > >   data analysis on large networks", by Iwen & Ong. *Please go
> > > >   through* (a) Table 1 and Table 2, and (b) Appendix A: Raw Data
> > > >   from Numerical Experiments. [pdf
> > > >   <https://arxiv.org/pdf/1601.07010.pdf>]
> > > >
> > > > Thanks,
> > > > Janardhan
> > > >
> > > > On Wed, Jul 26, 2017 at 12:29 AM, Nakul Jindal <naku...@gmail.com>
> > > > wrote:
> > > >
> > > > > Hi Janardhan,
> > > > >
> > > > > The images you've used as attachments haven't reached my inbox.
> > > > > Could you please send them to me directly, rather than through the
> > > > > dev mailing list? (Or upload them to an image hosting site like
> > > > > imgur and paste the links in the email.)
> > > > >
> > > > > I would like to point out that my knowledge of machine learning is
> > > > > limited. Still, how would you want to test the algorithm?
> > > > >
> > > > > Sparse matrices in SystemML (in Spark execution mode):
> > > > > Sparse matrix support in SystemML is implemented at a block level.
> > > > > Each (potentially giant) matrix is stored as blocks (in Spark
> > > > > execution mode). The matrix itself doesn't store knowledge of
> > > > > whether it is sparse or not; each block does. Each block of this
> > > > > giant matrix can be stored in dense or sparse format, depending on
> > > > > how many non-zeroes are in that block. The sparsity of a block is
> > > > > recomputed every so often, and the block representation can be
> > > > > converted from dense to sparse and vice versa. When operations are
> > > > > performed on blocks, internally, a sparse-aware operation may be
> > > > > performed, depending on how the blocks are stored.
> > > > >
> > > > > (One of the other contributors can clarify if I've missed
> > > > > something or have said something wrong.)
> > > > >
> > > > > Given this context, can you please think about whether missing
> > > > > sparse matrix support is still a problem?
> > > > >
> > > > > -Nakul
> > > > >
> > > > > On Tue, Jul 25, 2017 at 11:14 AM, Janardhan Pulivarthi <
> > > > > janardhan.pulivar...@gmail.com> wrote:
> > > > >
> > > > > > Hi Nakul,
> > > > > >
> > > > > > Thanks for explaining to me the pros and cons of the two
> > > > > > approaches. For now, I have gone through the paper carefully
> > > > > > over a couple of days and found the following interesting
> > > > > > things.
> > > > > >
> > > > > > 1. This is the algorithm we implemented.
> > > > > >
> > > > > > 2. In the above algorithm the input matrix A is approximated by
> > > > > > another matrix B, with the relation shown in (3) and an error of
> > > > > > chi(p, i), which the author argues will be within an acceptable
> > > > > > limit. So, we can go with this algorithm.
> > > > > >
> > > > > > But, one bad thing is that the author is not sure whether the
> > > > > > algorithm supports sparse matrices or not. So, we may need to
> > > > > > test that here.
> > > > > >
> > > > > > For the time being we need to test the present algorithm
> > > > > > implemented by Imran Younus again. Can you help me with the
> > > > > > testing?
> > > > > >
> > > > > > Thanks,
> > > > > > Janardhan
> > > > > >
> > > > > > On Fri, Jul 21, 2017 at 6:07 AM, Nakul Jindal <naku...@gmail.com>
> > > > > > wrote:
> > > > > >
> > > > > > > Hi Janardhan,
> > > > > > >
> > > > > > > How will a GPU implementation help scale distributed SVD:
> > > > > > >
> > > > > > > Imran implemented an algorithm he found out about in the paper
> > > > > > > "A Distributed and Incremental SVD Algorithm for Agglomerative
> > > > > > > Data Analysis on Large Networks" (
> > > > > > > https://github.com/apache/systemml/pull/273/files#diff-488f06e290f7a54db2e125f7bc608971R27
> > > > > > > ).
> > > > > > > The idea there was to build up a distributed SVD using
> > > > > > > invocations of svd on your local machine. He tried to achieve
> > > > > > > the multi-level parallelism through the parfor construct. Each
> > > > > > > local invocation of svd was done using the Apache Commons Math
> > > > > > > library. If each invocation of this local svd can instead be
> > > > > > > done on a GPU, the overall wall time for the distributed
> > > > > > > version would be decreased.
> > > > > > >
> > > > > > > Users may not always have a GPU. In that case, the svd falls
> > > > > > > back to the Apache Commons Math implementation. But if they
> > > > > > > do, and if we have an "svd" builtin function, then it would be
> > > > > > > easier to take advantage of the GPU.
> > > > > > >
> > > > > > > The problem with scalable svd in DML is due to spark backend
> > > > > > > issues, otherwise there is no problem scaling w/o a local
> > > > > > > svd():
> > > > > > >
> > > > > > > There may be spark backend issues, and more may come to light
> > > > > > > as more workloads are executed on SystemML.
> > > > > > > For any given operation, we can implement it as a DML-bodied
> > > > > > > function or a builtin function.
> > > > > > >
> > > > > > > For DML-bodied functions:
> > > > > > > Pros:
> > > > > > > - The SystemML optimizer can be applied to it
> > > > > > > - Distribution of SVD is then taken care of by SystemML. It
> > > > > > >   will generate and run the spark primitives needed.
> > > > > > > Cons:
> > > > > > > - Implementing SVD, whether in DML or C, is a fair amount of
> > > > > > >   work
> > > > > > > - There would not be a straightforward call to the svd GPU
> > > > > > >   library. In fact, each of the linear algebra primitives
> > > > > > >   would be accelerated on the GPU, but not the entire
> > > > > > >   operation itself. This would involve many more JNI calls.
> > > > > > >
> > > > > > > For builtin functions:
> > > > > > > Pros:
> > > > > > > - Use can be made of GPU libraries (cuSolver) and CPU
> > > > > > >   libraries (Apache Commons Math); these are already
> > > > > > >   optimized (in the case of the GPU)
> > > > > > > - If a better SVD implementation is available via a library,
> > > > > > >   that can easily be plugged in.
> > > > > > > Cons:
> > > > > > > - Would have to come up with an algorithm to implement
> > > > > > >   distributed SVD with spark primitives
> > > > > > >
> > > > > > > Pick your battle.
> > > > > > >
> > > > > > > Maybe we could try another algorithm for scalable svd():
> > > > > > >
> > > > > > > Sure. But before you do that, it may be worth our while to
> > > > > > > understand what exactly is misleading about the paper that
> > > > > > > Imran talks about.
> > > > > > >
> > > > > > > -Nakul
> > > > > > >
> > > > > > > On Thu, Jul 20, 2017 at 4:02 PM, Janardhan Pulivarthi <
> > > > > > > janardhan.pulivar...@gmail.com> wrote:
> > > > > > >
> > > > > > > > Hi Nakul,
> > > > > > > >
> > > > > > > > Can you help me understand how GPU implementations help
> > > > > > > > scale it. Does the user always have GPUs in use when using
> > > > > > > > this fn, or is it an optional feature?
> > > > > > > >
> > > > > > > > The problem with implementing the scalable svd() in DML is
> > > > > > > > due to the spark backend issues, otherwise there is no
> > > > > > > > problem scaling w/o a local svd() function.
> > > > > > > >
> > > > > > > > Maybe we could try another algorithm for the scalable
> > > > > > > > svd(), if the present algorithm is misleading as Imran
> > > > > > > > Younus pointed out.
> > > > > > > >
> > > > > > > > Thank you,
> > > > > > > > Janardhan
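For completeness, here is a rough numpy sketch of the block-wise merge idea behind the distributed/incremental SVD discussed above (illustrative only; the actual implementation in PR 273 is written in DML and parallelizes the per-block SVDs with parfor, as Nakul describes). The point is that the SVD of each row block can be computed independently, and only the small S_i * Vt_i factors need to be stacked and decomposed again; this sketch recovers the singular values and right singular vectors, which is one merge level of the agglomerative scheme.

import numpy as np

def blockwise_svd(A, num_blocks=4):
    """One merge level of the incremental/agglomerative SVD idea:
    SVD each row block, stack the S_i * Vt_i factors, SVD the stack.
    Returns singular values and right singular vectors of A."""
    blocks = np.array_split(A, num_blocks, axis=0)
    partials = []
    for Ai in blocks:                       # in SystemML this loop would be a parfor
        Ui, Si, Vti = np.linalg.svd(Ai, full_matrices=False)
        partials.append(Si[:, None] * Vti)  # keep only the small n x n factor
    stacked = np.vstack(partials)           # (num_blocks * n) x n, still skinny
    _, S, Vt = np.linalg.svd(stacked, full_matrices=False)
    return S, Vt

A = np.random.rand(20000, 30)
S_blk, Vt_blk = blockwise_svd(A)
S_ref = np.linalg.svd(A, compute_uv=False)
print(np.allclose(S_blk, S_ref))            # singular values match -> True

This works because A^T A equals the sum of the A_i^T A_i, which is exactly stacked^T stacked; whether it behaves well for sparse blocks is the open question raised in the thread.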