Janardhan,

The papers you're referring to may not be relevant. The first paper, as far as I can
tell, is about updating an existing SVD as new data comes in. The third paper in the
list is the one I used, but that method is not good.

There is also a method that uses a QR decomposition and then calculates the SVD from
the R matrix. Please have a look at equation 1.3 in this paper:
http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.127.115&rank=1
I think this is worth trying out. The distributed QR is already implemented in
SystemML, so it may be quick to try out.

imran
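A rough sketch of that QR-then-SVD idea in NumPy (illustrative only; svd_via_qr is a
hypothetical helper, not the SystemML/DML implementation): take a thin QR of A,
compute the SVD of the small R factor, and recover the left singular vectors of A as
Q times U_R.

    import numpy as np

    def svd_via_qr(A):
        # Thin QR: A = Q R, with Q (m x n) orthonormal and R (n x n) upper triangular
        Q, R = np.linalg.qr(A, mode='reduced')
        # SVD of the small R factor: R = U_r * diag(s) * Vt
        U_r, s, Vt = np.linalg.svd(R, full_matrices=False)
        # A = (Q U_r) * diag(s) * Vt, so Q @ U_r holds the left singular vectors of A
        return Q @ U_r, s, Vt

    # Quick check on a tall-skinny matrix
    A = np.random.rand(1000, 50)
    U, s, Vt = svd_via_qr(A)
    print(np.allclose(A, U @ np.diag(s) @ Vt))   # True, up to round-off

The appeal is that once the (already distributed) QR has produced R, the SVD proper
only has to be computed on the small R factor.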
On Fri, Jul 28, 2017 at 10:10 AM, Janardhan Pulivarthi <janardhan.pulivar...@gmail.com> wrote:

> Hi Nakul & all the committers,
>
> So far I am about halfway through the literature. But for now, a couple of
> things to mention: in SVD there are three stages
> 1. Bidiagonal reduction step
> 2. Computation of the singular values
> 3. Computation of the singular vectors
>
> Of these three, the *bidiagonal reduction* step is by far the most expensive,
> so our focus is on this (when considering the GPU, for cases where handling it
> on the CPU is infeasible).
>
> About the literature:
>
> - I took some time to go through "A Stable and Fast Algorithm for Updating the
> Singular Value Decomposition" by Gu & Eisenstat, to understand the numerical
> stability and round-off errors when we partition the matrix in this
> distributed algorithm. The authors show that each computed component has high
> absolute accuracy, and that the properties of the resulting matrix do not
> conflict with those of the parent matrix. [pdf
> <http://www.cs.yale.edu/publications/techreports/tr966.pdf>]
>
> - "High performance bidiagonal reduction using the tile algorithms on
> homogeneous multicore clusters", by Ltaief et al. This paper focuses mainly on
> the first stage and gives a good discussion of tile algorithms and their
> runtime implementations. (Although off-topic, I read this just to understand.)
> [pdf <http://www.netlib.org/lapack/lawnspdf/lawn247.pdf>]
>
> - "A distributed and incremental SVD algorithm for agglomerative data analysis
> on large networks", by Iwen & Ong. *Please go through* (a) TABLE 1 and
> TABLE 2, and (b) APPENDIX A, RAW DATA FROM NUMERICAL EXPERIMENTS.
> [pdf <https://arxiv.org/pdf/1601.07010.pdf>]
>
> Thanks,
> Janardhan
>
> On Wed, Jul 26, 2017 at 12:29 AM, Nakul Jindal <naku...@gmail.com> wrote:
>
> > Hi Janardhan,
> >
> > The images you've used as attachments haven't reached my inbox.
> > Could you please send them to me directly, rather than through the dev
> > mailing list. (Or upload them to an image hosting site like imgur and paste
> > the links in the email.)
> >
> > I would like to point out that my knowledge of machine learning is limited.
> > Still, how would you want to test the algorithm?
> >
> > Sparse matrices in SystemML (in Spark execution mode):
> > Sparse matrix support in SystemML is implemented at the block level. Each
> > (potentially giant) matrix is stored as blocks (in Spark execution mode).
> > The matrix itself doesn't store knowledge of whether it is sparse or not;
> > each block does. Each block of this giant matrix can be stored in dense or
> > sparse format, depending on how many non-zeroes are in that block. The
> > sparsity of a block is recomputed every so often, and the block
> > representation can be converted from dense to sparse and vice versa. When
> > operations are performed on blocks, a sparse-aware operation may be
> > performed internally, depending on how the blocks are stored.
> >
> > (One of the other contributors can clarify if I've missed something or have
> > said something wrong.)
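To make the block-level idea above concrete, here is a toy sketch in Python
(illustrative only; the names, threshold value, and storage layout are assumptions,
not SystemML's actual MatrixBlock internals): each block inspects its non-zero count
and picks a dense or sparse representation accordingly.

    SPARSITY_THRESHOLD = 0.4   # assumed cut-off, purely for illustration

    class Block:
        """Toy block that stores its cells dense or sparse based on non-zero fraction."""
        def __init__(self, dense_rows):
            self.rows, self.cols = len(dense_rows), len(dense_rows[0])
            nnz = sum(1 for row in dense_rows for v in row if v != 0)
            if nnz / (self.rows * self.cols) < SPARSITY_THRESHOLD:
                # Sparse format: keep only the non-zero cells
                self.data = {(i, j): v for i, row in enumerate(dense_rows)
                                       for j, v in enumerate(row) if v != 0}
                self.is_sparse = True
            else:
                # Dense format: keep the full 2-D array
                self.data = dense_rows
                self.is_sparse = False

    # A mostly-zero block ends up sparse, a full block stays dense
    print(Block([[0, 0, 3], [0, 0, 0], [0, 0, 0]]).is_sparse)   # True
    print(Block([[1, 2], [3, 4]]).is_sparse)                    # False

As described above, SystemML additionally re-examines blocks as they change and
converts between the two formats when the non-zero count crosses the threshold.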
> > Given this context, can you please think about whether missing sparse matrix
> > support is still a problem?
> >
> > -Nakul
> >
> > On Tue, Jul 25, 2017 at 11:14 AM, Janardhan Pulivarthi <janardhan.pulivar...@gmail.com> wrote:
> >
> > > Hi Nakul,
> > >
> > > Thanks for explaining the pros and cons of the two approaches. I have now
> > > gone through the paper carefully over a couple of days and found the
> > > following interesting things.
> > >
> > > 1. This is the algorithm we implemented.
> > >
> > > 2. In the above algorithm the input matrix A is approximated by another
> > > matrix B, with the relation shown in (3) and an error of chi(p, i), which
> > > the author argues will be within an acceptable limit. So we can go with
> > > this algorithm.
> > >
> > > But one bad thing is that the author is not sure whether the algorithm
> > > supports sparse matrices or not, so we may need to test that here.
> > >
> > > For the time being we need to test the present algorithm implemented by
> > > Imran Younus again. Can you help me with the testing?
> > >
> > > Thanks,
> > > Janardhan
> > >
> > > On Fri, Jul 21, 2017 at 6:07 AM, Nakul Jindal <naku...@gmail.com> wrote:
> > >
> > >> Hi Janardhan,
> > >>
> > >> How will a GPU implementation help scale distributed SVD:
> > >>
> > >> Imran implemented an algorithm he found in the paper "A Distributed and
> > >> Incremental SVD Algorithm for Agglomerative Data Analysis on Large
> > >> Networks" (
> > >> https://github.com/apache/systemml/pull/273/files#diff-488f06e290f7a54db2e125f7bc608971R27
> > >> ).
> > >> The idea there was to build up a distributed SVD using invocations of svd
> > >> on your local machine. He tried to achieve the multi-level parallelism
> > >> through the parfor construct. Each local invocation of svd was done using
> > >> the Apache Commons Math library.
> > >> If each invocation of this local svd can instead be done on a GPU, the
> > >> overall wall time for the distributed version would be decreased.
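For reference, a rough one-level sketch of that merge-local-SVDs idea in NumPy
(illustrative only; local_svd and merged_svd are hypothetical helpers, and the actual
PR does this in DML with parfor and Apache Commons Math): split A column-wise, take a
truncated SVD of each block, and recover the overall left singular vectors and values
from the concatenated U_i * diag(s_i) factors.

    import numpy as np

    def local_svd(block, k):
        # Stand-in for the per-block svd call (Commons Math today, possibly GPU later)
        U, s, Vt = np.linalg.svd(block, full_matrices=False)
        return U[:, :k], s[:k]

    def merged_svd(A, num_blocks, k):
        # The per-block SVDs are independent, so they could run inside a parfor
        blocks = np.array_split(A, num_blocks, axis=1)
        partials = [U @ np.diag(s) for U, s in (local_svd(B, k) for B in blocks)]
        # Merge step: the SVD of [U_1*S_1 | ... | U_M*S_M] approximates that of A
        U, s, _ = np.linalg.svd(np.hstack(partials), full_matrices=False)
        return U[:, :k], s[:k]

    # Check on a low-rank matrix: the merged singular values match a direct SVD
    A = np.random.rand(500, 8) @ np.random.rand(8, 300)
    U, s = merged_svd(A, num_blocks=4, k=10)
    print(np.allclose(s[:8], np.linalg.svd(A, compute_uv=False)[:8]))   # True

The paper's scheme applies this kind of merge hierarchically over several levels; the
sketch shows a single merge only, to keep it short.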
> > >> Users may not always have a GPU. In that case, the svd falls back to the
> > >> Apache Commons Math implementation. But if they do, and if we have an
> > >> "svd" builtin function, then it would be easier to take advantage of the
> > >> GPU.
> > >>
> > >> Problem with scalable svd in DML is due to Spark backend issues,
> > >> otherwise there is no problem scaling w/o a local svd():
> > >>
> > >> There may be Spark backend issues, and more may come to light as more
> > >> workloads are executed on SystemML.
> > >> For any given operation, we can implement it as a DML-bodied function or
> > >> a builtin function.
> > >>
> > >> For DML-bodied functions:
> > >> Pros:
> > >> - The SystemML optimizer can be applied to it
> > >> - Distribution of SVD is then taken care of by SystemML. It will generate
> > >> and run the Spark primitives needed.
> > >>
> > >> Cons:
> > >> - Implementing SVD, whether in DML or C, is a fair amount of work
> > >> - There would not be a straightforward call to the svd GPU library.
> > >> Instead, each of the linear algebra primitives would be accelerated on
> > >> the GPU, but not the entire operation itself. This would involve many
> > >> more JNI calls.
> > >>
> > >> For builtin functions:
> > >> Pros:
> > >> - Use of GPU libraries (cuSolver) and CPU libraries (Apache Commons Math)
> > >> can be made; these are already optimized (in the case of the GPU)
> > >> - If a better SVD implementation is available via a library, it can
> > >> easily be plugged in.
> > >>
> > >> Cons:
> > >> - Would have to come up with an algorithm to implement distributed SVD
> > >> with Spark primitives
> > >>
> > >> Pick your battle.
> > >>
> > >> Maybe we could try another algorithm for scalable svd():
> > >>
> > >> Sure. But before you do that, it may be worth our while to understand
> > >> what exactly is misleading about the paper that Imran talks about.
> > >>
> > >> -Nakul
> > >>
> > >> On Thu, Jul 20, 2017 at 4:02 PM, Janardhan Pulivarthi <janardhan.pulivar...@gmail.com> wrote:
> > >>
> > >> > Hi Nakul,
> > >> >
> > >> > Can you help me understand how GPU implementations help scale it? Do
> > >> > users always have GPUs available when using this function, or is it an
> > >> > optional feature?
> > >> >
> > >> > The problem with implementing the scalable svd() in DML is due to the
> > >> > Spark backend issues; otherwise there is no problem scaling without a
> > >> > local svd() function.
> > >> >
> > >> > Maybe we could try another algorithm for the scalable svd(), if the
> > >> > present algorithm is misleading, as Imran Younus pointed out.
> > >> >
> > >> > Thank you,
> > >> > Janardhan