Just to clarify one thing. For QR based, method, you can assume that R matrix is small enough to fit on driver memory and them perform SVD on the driver. That means your actual matrix has to tall-skinny matrix.
imran On Fri, Jul 28, 2017 at 11:15 AM, Imran Younus <imranyou...@gmail.com> wrote: > Janardhan, > > The papers you're referring may not be relevant. The first paper, as far > as I can tell, is about updating an existing svd decomposition as new data > comes in. The 3rd paper in this list is the one I used, but that method is > not good. > > There is also a method that uses QR decomposition and then calculates SVD > from R matrix. Please have a look at equation 1.3 in this paper: > > http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.127.115&rank=1 > > I think this is worth trying out. The distributed QR is already > implemented in SystemlML, so it may quick to try out. > > imran > > > > On Fri, Jul 28, 2017 at 10:10 AM, Janardhan Pulivarthi < > janardhan.pulivar...@gmail.com> wrote: > >> Hi Nakul & all the committers, >> >> Till now I am half way through the literature. But, for now a couple of >> things to mention, in SVD there are three stages >> 1. Bidiagonal reduction step >> 2. Computation of the singular values >> 3. Computation of the singular vectors >> >> of these three, The* Bidiagonal reduction* step is very expensive, so is >> our focus on this( when considering GPU, at times where handling with CPU >> is infeasible). >> >> About literature: >> >> - I took some time to go through " A Stable and Fast Algorithm for >> Updating the Singular Value Decomposition" by "Gu & Stanley", to >> understand >> the numerical stability and round-off errors when we are partitioning >> the >> matrix in this distributed algorithm. The author has assured that each >> component computed will be of high absolute accuracy. And also, the >> properties that the resultant matrix support do not have any conflicts >> with >> parent matrix. [pdf >> <http://www.cs.yale.edu/publications/techreports/tr966.pdf>] >> >> >> - "High performance bidiagonal reduction using the tile algorithms on >> homogeneous multicore clusters ", by "Ltaief et. al", this paper has >> focused on the first stage mainly and has discussed a good about tile >> algorithms and their runtime implementations.(although off-topic, I >> read >> this just to understand.) [pdf >> <http://www.netlib.org/lapack/lawnspdf/lawn247.pdf>] >> >> >> - "A distributed and incremental svd algorithm for agglomerative data >> analysis on large networks", by "Iwen & Ong", *Please go through* the >> (a). TABLE 1, TABLE 2 . (b). APPENDIX A. RAW DATA FROM NUMERICAL >> EXPERIMENTS. [pdf <https://arxiv.org/pdf/1601.07010.pdf>] >> >> Thanks, >> >> Janardhan >> >> On Wed, Jul 26, 2017 at 12:29 AM, Nakul Jindal <naku...@gmail.com> wrote: >> >> > Hi Janardhan, >> > >> > The images you've used as attachments haven't reached my inbox. >> > Could you please send them to me directly, rather than through the dev >> > mailing list. >> > (Or upload it to a image hosting site like imgur and paste the links in >> the >> > email) >> > >> > I would like to point out that my knowledge of machine learning is >> limited. >> > Still, how would you want to test the algorithm? >> > >> > >> > Sparse matrices in SystemML (in Spark Execution Mode) >> > Sparse matrix support in SystemML is implemented at a block level. Each >> > (potentially giant) matrix is stored as blocks (in Spark execution >> mode). >> > The matrix itself doesn't store knowledge of whether it is sparse or >> not. >> > Each block does. Each block of this giant matrix can be stored in dense >> or >> > spare format, depending on how many non-zeroes are in that block. >> > The sparsity of a block is recomputed every so often, and the block >> > representation can be converted from dense to sparse and vice-versa. >> > When operations are performed on blocks, internally, a sparse aware >> > operation maybe performed, depending on how the blocks are stored. >> > >> > (One of the other contributors can clarify, if I've missed something or >> > have said something wrong). >> > >> > Given this context, can you please think about whether missing sparse >> > matrix support is still a problem? >> > >> > >> > -Nakul >> > >> > >> > >> > >> > >> > >> > >> > On Tue, Jul 25, 2017 at 11:14 AM, Janardhan Pulivarthi < >> > janardhan.pulivar...@gmail.com> wrote: >> > >> > > Hi Nakul, >> > > >> > > Thanks for explaining me about pros and cons of the two approaches. >> For >> > > now, I have gone through the paper carefully over a couple of days and >> > > found the following interesting things. >> > > >> > > 1. This is the algorithm we implemented. >> > > >> > > >> > > 2. In the above algorithm the input matrix A is approximated to >> another >> > > matrix B with the following relation with the error of chi(p, i) [ as >> > shown >> > > in (3) ] which the author argues will be in an acceptable limit. So, >> we >> > can >> > > go with this algorithm. >> > > >> > > >> > > But, one bad thing is that author is not sure about whether the >> algorithm >> > > supports the sparse matrices or not. So, we may need to test it here. >> > > >> > > For the time being we need to test the present algorithm implemented >> by >> > > Imran Younus again. Can you help me with the testing? >> > > >> > > Thanks, >> > > Janardhan >> > > >> > > >> > > On Fri, Jul 21, 2017 at 6:07 AM, Nakul Jindal <naku...@gmail.com> >> wrote: >> > > >> > >> Hi Janardhan, >> > >> >> > >> >> > >> >> > >> >> > >> How will GPU implementation help scale distributed SVD: >> > >> >> > >> Imran implemented an algorithm he found out about in the paper "A >> > >> Distributed and Incremental SVD Algorithm for Agglomerative Data >> > Analysis >> > >> on Large Networks" ( >> > >> https://github.com/apache/systemml/pull/273/files#diff-488f0 >> > >> 6e290f7a54db2e125f7bc608971R27 >> > >> ). >> > >> The idea there was to build up a distributed SVD using invocations of >> > svd >> > >> on your local machine. He tried to achieve the multi-level >> parallelism >> > >> through the parfor construct. >> > >> Each local invocation of svd was done using the Apache Commons Math >> > >> library. >> > >> If each invocation of this local svd can instead be done on a GPU, >> the >> > >> overall wall time for the distributed version would be decreased. >> > >> >> > >> Users may not always have a GPU. In that case, the svd falls back to >> the >> > >> Apache Comons Math implementation. But if they do and if we have a >> "svd" >> > >> builtin function, then it would be easier to take advantage of the >> GPU. >> > >> >> > >> >> > >> >> > >> >> > >> >> > >> >> > >> Problem with scalable svd in dml is due to spark backed issues, >> > otherwise >> > >> there is not problem scaling w/o a local svd(): >> > >> >> > >> There maybe spark backend issues and more may come to light and more >> > >> workloads are executed on SystemML. >> > >> For any given operation - we can implement it as a DML bodied >> function >> > or >> > >> a >> > >> builtin function. >> > >> For DML Bodied functions: >> > >> Pros: >> > >> - The SystemML optimizer can be applied to it >> > >> - Distribution of SVD is then taken care of by SystemML. It will >> > generate >> > >> and run the spark primitives needed. >> > >> >> > >> Cons: >> > >> - Implementing SVD, whether in DML or C, is a fair amount of work >> > >> - There would not be a straightforward call to the svd gpu library. >> In >> > >> fact, each of the linear algebra primitives would be accelerated on >> the >> > >> GPU, but not the entire operation itself. This would involve many >> more >> > JNI >> > >> calls. >> > >> >> > >> For builtin functions: >> > >> Pros: >> > >> - Use of GPU libraries (cuSolver) and CPU libraries (Apache Commons >> > Math) >> > >> can be made, these are already optimized (in case of the GPU) >> > >> - If a better SVD implementation is available via a library, that can >> > >> easily be plugged in. >> > >> >> > >> Cons: >> > >> - Would have to come up with an algorithm to implement distributed >> SVD >> > >> with >> > >> spark primitives >> > >> >> > >> Pick your battle. >> > >> >> > >> >> > >> >> > >> >> > >> >> > >> >> > >> Maybe we could try another algorithm for scalable svd() : >> > >> >> > >> Sure. But before you do that, it may be worth our while to understand >> > what >> > >> is exactly misleading about the paper that Imran talks about. >> > >> >> > >> >> > >> >> > >> >> > >> >> > >> -Nakul >> > >> >> > >> >> > >> >> > >> >> > >> >> > >> >> > >> >> > >> On Thu, Jul 20, 2017 at 4:02 PM, Janardhan Pulivarthi < >> > >> janardhan.pulivar...@gmail.com> wrote: >> > >> >> > >> > Hi Nakul, >> > >> > >> > >> > Can you help me understand how gpu implementations help scale it. >> > >> Whether >> > >> > the user always have GPUs in use when using this fn or is it an >> > optional >> > >> > feature. >> > >> >> > >> >> > >> > The problem with implementing the scalable svd() in dml is due to >> the >> > >> spark >> > >> > backend issues, otherwise there is no problem scaling w/o a local >> > svd() >> > >> > function. >> > >> > >> > >> > May be we could try another algorithm for the scalable svd( ), if >> the >> > >> > present algorithm is misleading as Imran Younus pointed out. >> > >> > >> > >> > Thank you, >> > >> > Janardhan >> > >> > >> > >> >> > > >> > > >> > >> > >