Re: svd( ) implementation

Imran Younus Fri, 28 Jul 2017 11:22:33 -0700

Just to clarify one thing. For QR based, method, you can assume that R
matrix is small enough to fit on driver memory and them perform SVD on the
driver. That means your actual matrix has to tall-skinny matrix.


imran

On Fri, Jul 28, 2017 at 11:15 AM, Imran Younus <imranyou...@gmail.com>
wrote:

> Janardhan,
>
> The papers you're referring may not be relevant. The first paper, as far
> as I can tell, is about updating an existing svd decomposition as new data
> comes in. The 3rd paper in this list is the one I used, but that method is
> not good.
>
> There is also a method that uses QR decomposition and then calculates SVD
> from R matrix. Please have a look at equation 1.3 in this paper:
>
> http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.127.115&rank=1
>
> I think this is worth trying out. The distributed QR is already
> implemented in SystemlML, so it may quick to try out.
>
> imran
>
>
>
> On Fri, Jul 28, 2017 at 10:10 AM, Janardhan Pulivarthi <
> janardhan.pulivar...@gmail.com> wrote:
>
>> Hi Nakul & all the committers,
>>
>> Till now I am half way through the literature. But, for now a couple of
>> things to mention, in SVD there are three stages
>>   1. Bidiagonal reduction step
>>   2. Computation of the singular values
>>   3. Computation of the singular vectors
>>
>> of these three, The* Bidiagonal reduction* step is very expensive, so is
>> our focus on this( when considering GPU, at times where handling with CPU
>> is infeasible).
>>
>> About literature:
>>
>>    - I took some time to go through " A Stable and Fast Algorithm for
>>    Updating the Singular Value Decomposition" by "Gu & Stanley", to
>> understand
>>    the numerical stability and round-off errors when we are partitioning
>> the
>>    matrix in this distributed algorithm. The author has assured that each
>>    component computed will be of high absolute accuracy. And also, the
>>    properties that the resultant matrix support do not have any conflicts
>> with
>>    parent matrix. [pdf
>>    <http://www.cs.yale.edu/publications/techreports/tr966.pdf>]
>>
>>
>>    - "High performance bidiagonal reduction using the tile algorithms on
>>    homogeneous multicore clusters ", by "Ltaief et. al", this paper has
>>    focused on the first stage mainly and has discussed a good about tile
>>    algorithms and their runtime implementations.(although off-topic, I
>> read
>>    this just to understand.) [pdf
>>    <http://www.netlib.org/lapack/lawnspdf/lawn247.pdf>]
>>
>>
>>    -  "A distributed and incremental svd algorithm for agglomerative data
>>    analysis on large networks", by "Iwen & Ong", *Please go through* the
>>    (a). TABLE 1, TABLE 2 . (b). APPENDIX A. RAW DATA FROM NUMERICAL
>>    EXPERIMENTS. [pdf <https://arxiv.org/pdf/1601.07010.pdf>]
>>
>> Thanks,
>>
>> Janardhan
>>
>> On Wed, Jul 26, 2017 at 12:29 AM, Nakul Jindal <naku...@gmail.com> wrote:
>>
>> > Hi Janardhan,
>> >
>> > The images you've used as attachments haven't reached my inbox.
>> > Could you please send them to me directly, rather than through the dev
>> > mailing list.
>> > (Or upload it to a image hosting site like imgur and paste the links in
>> the
>> > email)
>> >
>> > I would like to point out that my knowledge of machine learning is
>> limited.
>> > Still, how would you want to test the algorithm?
>> >
>> >
>> > Sparse matrices in SystemML (in Spark Execution Mode)
>> > Sparse matrix support in SystemML is implemented at a block level. Each
>> > (potentially giant) matrix is stored as blocks (in Spark execution
>> mode).
>> > The matrix itself doesn't store knowledge of whether it is sparse or
>> not.
>> > Each block does. Each block of this giant matrix can be stored in dense
>> or
>> > spare format, depending on how many non-zeroes are in that block.
>> > The sparsity of a block is recomputed every so often, and the block
>> > representation can be converted from dense to sparse and vice-versa.
>> > When operations are performed on blocks, internally, a sparse aware
>> > operation maybe performed, depending on how the blocks are stored.
>> >
>> > (One of the other contributors can clarify, if I've missed something or
>> > have said something wrong).
>> >
>> > Given this context, can you please think about whether missing sparse
>> > matrix support is still a problem?
>> >
>> >
>> > -Nakul
>> >
>> >
>> >
>> >
>> >
>> >
>> >
>> > On Tue, Jul 25, 2017 at 11:14 AM, Janardhan Pulivarthi <
>> > janardhan.pulivar...@gmail.com> wrote:
>> >
>> > > Hi Nakul,
>> > >
>> > > Thanks for explaining me about pros and cons of the two approaches.
>> For
>> > > now, I have gone through the paper carefully over a couple of days and
>> > > found the following interesting things.
>> > >
>> > > 1. This is the algorithm we implemented.
>> > >
>> > > 
>> > > 2. In the above algorithm the input matrix A is approximated to
>> another
>> > > matrix B with the following relation with the error of chi(p, i) [ as
>> > shown
>> > > in (3) ] which the author argues will be in an acceptable limit. So,
>> we
>> > can
>> > > go with this algorithm.
>> > >
>> > >
>> > > But, one bad thing is that author is not sure about whether the
>> algorithm
>> > > supports the sparse matrices or not. So, we may need to test it here.
>> > >
>> > > For the time being we need to test the present algorithm implemented
>> by
>> > > Imran Younus again. Can you help me with the testing?
>> > >
>> > > Thanks,
>> > > Janardhan
>> > > 
>> > >
>> > > On Fri, Jul 21, 2017 at 6:07 AM, Nakul Jindal <naku...@gmail.com>
>> wrote:
>> > >
>> > >> Hi Janardhan,
>> > >>
>> > >>
>> > >>
>> > >>
>> > >> How will GPU implementation help scale distributed SVD:
>> > >>
>> > >> Imran implemented an algorithm he found out about in the paper "A
>> > >> Distributed and Incremental SVD Algorithm for Agglomerative Data
>> > Analysis
>> > >> on Large Networks" (
>> > >> https://github.com/apache/systemml/pull/273/files#diff-488f0
>> > >> 6e290f7a54db2e125f7bc608971R27
>> > >> ).
>> > >> The idea there was to build up a distributed SVD using invocations of
>> > svd
>> > >> on your local machine. He tried to achieve the multi-level
>> parallelism
>> > >> through the parfor construct.
>> > >> Each local invocation of svd was done using the Apache Commons Math
>> > >> library.
>> > >> If each invocation of this local svd can instead be done on a GPU,
>> the
>> > >> overall wall time for the distributed version would be decreased.
>> > >>
>> > >> Users may not always have a GPU. In that case, the svd falls back to
>> the
>> > >> Apache Comons Math implementation. But if they do and if we have a
>> "svd"
>> > >> builtin function, then it would be easier to take advantage of the
>> GPU.
>> > >>
>> > >>
>> > >>
>> > >>
>> > >>
>> > >>
>> > >> Problem with scalable svd in dml is due to spark backed issues,
>> > otherwise
>> > >> there is not problem scaling w/o a local svd():
>> > >>
>> > >> There maybe spark backend issues and more may come to light and more
>> > >> workloads are executed on SystemML.
>> > >> For any given operation - we can implement it as a DML bodied
>> function
>> > or
>> > >> a
>> > >> builtin function.
>> > >> For DML Bodied functions:
>> > >> Pros:
>> > >> - The SystemML optimizer can be applied to it
>> > >> - Distribution of SVD is then taken care of by SystemML. It will
>> > generate
>> > >> and run the spark primitives needed.
>> > >>
>> > >> Cons:
>> > >> - Implementing SVD, whether in DML or C, is a fair amount of work
>> > >> - There would not be a straightforward call to the svd gpu library.
>> In
>> > >> fact, each of the linear algebra primitives would be accelerated on
>> the
>> > >> GPU, but not the entire operation itself. This would involve many
>> more
>> > JNI
>> > >> calls.
>> > >>
>> > >> For builtin functions:
>> > >> Pros:
>> > >> - Use of GPU libraries (cuSolver) and CPU libraries (Apache Commons
>> > Math)
>> > >> can be made, these are already optimized (in case of the GPU)
>> > >> - If a better SVD implementation is available via a library, that can
>> > >> easily be plugged in.
>> > >>
>> > >> Cons:
>> > >> - Would have to come up with an algorithm to implement distributed
>> SVD
>> > >> with
>> > >> spark primitives
>> > >>
>> > >> Pick your battle.
>> > >>
>> > >>
>> > >>
>> > >>
>> > >>
>> > >>
>> > >> Maybe we could try another algorithm for scalable svd() :
>> > >>
>> > >> Sure. But before you do that, it may be worth our while to understand
>> > what
>> > >> is exactly misleading about the paper that Imran talks about.
>> > >>
>> > >>
>> > >>
>> > >>
>> > >>
>> > >> -Nakul
>> > >>
>> > >>
>> > >>
>> > >>
>> > >>
>> > >>
>> > >>
>> > >> On Thu, Jul 20, 2017 at 4:02 PM, Janardhan Pulivarthi <
>> > >> janardhan.pulivar...@gmail.com> wrote:
>> > >>
>> > >> > Hi Nakul,
>> > >> >
>> > >> > Can you help me understand how gpu implementations help scale it.
>> > >> Whether
>> > >> > the user always have GPUs in use when using this fn or is it an
>> > optional
>> > >> > feature.
>> > >>
>> > >>
>> > >> > The problem with implementing the scalable svd() in dml is due to
>> the
>> > >> spark
>> > >> > backend issues, otherwise there is no problem scaling w/o a local
>> > svd()
>> > >> > function.
>> > >> >
>> > >> > May be we could try another algorithm for the scalable svd( ), if
>> the
>> > >> > present algorithm is misleading as Imran Younus pointed out.
>> > >> >
>> > >> > Thank you,
>> > >> > Janardhan
>> > >> >
>> > >>
>> > >
>> > >
>> >
>>
>
>

Re: svd( ) implementation

Reply via email to