Hi Janardhan,

By tall-skinny matrix I mean that the number of columns should be reasonably small (it doesn't matter how many rows there are). When you do the QR decomposition, R will be a square (upper triangular) matrix whose dimension equals the number of columns of matrix A. But then R has to fit in the driver's memory, and that's why the number of columns of A cannot be very large (a 10k x 10k dense matrix of doubles is close to 1 GB). Once you have R, a local SVD implementation is needed to compute the SVD of R. Now, this is not a very general method, but I think it is good enough for most cases.
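For reference, here is a minimal numpy sketch of the QR-then-SVD-of-R idea (illustrative only, not SystemML/DML code; the function name tall_skinny_svd is made up). It just shows the algebra: A = Q R, R = Ur S Vt, so A = (Q Ur) S Vt, and only the small n x n factor R ever needs a local SVD. (The 1 GB figure above is roughly 10,000^2 doubles at 8 bytes each, about 800 MB.)

import numpy as np

def tall_skinny_svd(A):
    """SVD of a tall-skinny matrix A (m >> n) via QR:
    A = Q R, R = Ur S Vt  =>  A = (Q Ur) S Vt."""
    Q, R = np.linalg.qr(A)          # reduced QR: R is n x n, small enough for the driver
    Ur, S, Vt = np.linalg.svd(R)    # local SVD of the small R factor
    return Q @ Ur, S, Vt            # left singular vectors, singular values, V^T

# Example: a 100000 x 50 tall-skinny matrix
A = np.random.rand(100000, 50)
U, S, Vt = tall_skinny_svd(A)
print(np.allclose(A, U * S @ Vt))   # reconstruction check -> True

In SystemML the QR step would be the distributed part and only the SVD of R would run locally (or on a GPU via cuSolver, per the discussion below).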
imran

On Fri, Jul 28, 2017 at 10:37 PM, Janardhan Pulivarthi <
janardhan.pulivar...@gmail.com> wrote:

> Thanks, Imran. As per the paper, at first we perform a QR decomposition of
> the input matrix (A), from which we obtain R. And then, we compute svd(R)
> using the builtin local function (??). I'll try this.
>
> Tall-skinny matrix: so, do we have a problem with square matrices? Or do
> we have to partition the matrix into tall-skinny matrices if we have a
> square one?
>
> Thanks,
> Janardhan
>
> On Fri, Jul 28, 2017 at 11:52 PM, Imran Younus <imranyou...@gmail.com>
> wrote:
>
> > Just to clarify one thing. For the QR-based method, you can assume that
> > the R matrix is small enough to fit in driver memory and then perform
> > SVD on the driver. That means your actual matrix has to be a tall-skinny
> > matrix.
> >
> > imran
> >
> > On Fri, Jul 28, 2017 at 11:15 AM, Imran Younus <imranyou...@gmail.com>
> > wrote:
> >
> > > Janardhan,
> > >
> > > The papers you're referring to may not be relevant. The first paper,
> > > as far as I can tell, is about updating an existing SVD decomposition
> > > as new data comes in. The 3rd paper in this list is the one I used,
> > > but that method is not good.
> > >
> > > There is also a method that uses QR decomposition and then calculates
> > > the SVD from the R matrix. Please have a look at equation 1.3 in this
> > > paper:
> > >
> > > http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.127.115&rank=1
> > >
> > > I think this is worth trying out. The distributed QR is already
> > > implemented in SystemML, so it may be quick to try out.
> > >
> > > imran
> > >
> > > On Fri, Jul 28, 2017 at 10:10 AM, Janardhan Pulivarthi <
> > > janardhan.pulivar...@gmail.com> wrote:
> > >
> > > > Hi Nakul & all the committers,
> > > >
> > > > Till now I am halfway through the literature. But, for now, a couple
> > > > of things to mention: in SVD there are three stages
> > > > 1. Bidiagonal reduction step
> > > > 2. Computation of the singular values
> > > > 3. Computation of the singular vectors
> > > >
> > > > Of these three, the *bidiagonal reduction* step is very expensive,
> > > > so our focus is on this (when considering GPU, at times where
> > > > handling it with the CPU is infeasible).
> > > >
> > > > About the literature:
> > > >
> > > > - I took some time to go through "A Stable and Fast Algorithm for
> > > >   Updating the Singular Value Decomposition" by Gu & Eisenstat, to
> > > >   understand the numerical stability and round-off errors when we
> > > >   are partitioning the matrix in this distributed algorithm. The
> > > >   authors assure that each component computed will be of high
> > > >   absolute accuracy, and also that the properties of the resultant
> > > >   matrix do not conflict with those of the parent matrix. [pdf
> > > >   <http://www.cs.yale.edu/publications/techreports/tr966.pdf>]
> > > >
> > > > - "High performance bidiagonal reduction using tile algorithms on
> > > >   homogeneous multicore clusters", by Ltaief et al. This paper
> > > >   focuses mainly on the first stage and discusses a good deal about
> > > >   tile algorithms and their runtime implementations (although
> > > >   off-topic, I read this just to understand). [pdf
> > > >   <http://www.netlib.org/lapack/lawnspdf/lawn247.pdf>]
> > > >
> > > > - "A distributed and incremental SVD algorithm for agglomerative
> > > >   data analysis on large networks", by Iwen & Ong. *Please go
> > > >   through* (a) Table 1 and Table 2, and (b) Appendix A: Raw Data
> > > >   from Numerical Experiments. [pdf
> > > >   <https://arxiv.org/pdf/1601.07010.pdf>]
> > > >
> > > > Thanks,
> > > > Janardhan
> > > >
> > > > On Wed, Jul 26, 2017 at 12:29 AM, Nakul Jindal <naku...@gmail.com>
> > > > wrote:
> > > >
> > > > > Hi Janardhan,
> > > > >
> > > > > The images you've used as attachments haven't reached my inbox.
> > > > > Could you please send them to me directly, rather than through the
> > > > > dev mailing list? (Or upload them to an image hosting site like
> > > > > imgur and paste the links in the email.)
> > > > >
> > > > > I would like to point out that my knowledge of machine learning is
> > > > > limited. Still, how would you want to test the algorithm?
> > > > >
> > > > > Sparse matrices in SystemML (in Spark execution mode):
> > > > > Sparse matrix support in SystemML is implemented at a block level.
> > > > > Each (potentially giant) matrix is stored as blocks (in Spark
> > > > > execution mode). The matrix itself doesn't store knowledge of
> > > > > whether it is sparse or not; each block does. Each block of this
> > > > > giant matrix can be stored in dense or sparse format, depending on
> > > > > how many non-zeroes are in that block. The sparsity of a block is
> > > > > recomputed every so often, and the block representation can be
> > > > > converted from dense to sparse and vice versa. When operations are
> > > > > performed on blocks, internally, a sparse-aware operation may be
> > > > > performed, depending on how the blocks are stored.
> > > > >
> > > > > (One of the other contributors can clarify if I've missed
> > > > > something or have said something wrong.)
> > > > >
> > > > > Given this context, can you please think about whether missing
> > > > > sparse matrix support is still a problem?
> > > > >
> > > > > -Nakul
> > > > >
> > > > > On Tue, Jul 25, 2017 at 11:14 AM, Janardhan Pulivarthi <
> > > > > janardhan.pulivar...@gmail.com> wrote:
> > > > >
> > > > > > Hi Nakul,
> > > > > >
> > > > > > Thanks for explaining to me the pros and cons of the two
> > > > > > approaches. For now, I have gone through the paper carefully
> > > > > > over a couple of days and found the following interesting
> > > > > > things.
> > > > > >
> > > > > > 1. This is the algorithm we implemented.
> > > > > >
> > > > > > 2. In the above algorithm the input matrix A is approximated by
> > > > > > another matrix B, with the relation shown in (3) and an error of
> > > > > > chi(p, i), which the author argues will be within an acceptable
> > > > > > limit. So, we can go with this algorithm.
> > > > > >
> > > > > > But, one bad thing is that the author is not sure whether the
> > > > > > algorithm supports sparse matrices or not. So, we may need to
> > > > > > test that here.
> > > > > >
> > > > > > For the time being we need to test the present algorithm
> > > > > > implemented by Imran Younus again. Can you help me with the
> > > > > > testing?
> > > > > >
> > > > > > Thanks,
> > > > > > Janardhan
> > > > > >
> > > > > > On Fri, Jul 21, 2017 at 6:07 AM, Nakul Jindal <naku...@gmail.com>
> > > > > > wrote:
> > > > > >
> > > > > > > Hi Janardhan,
> > > > > > >
> > > > > > > How will a GPU implementation help scale distributed SVD:
> > > > > > >
> > > > > > > Imran implemented an algorithm he found out about in the paper
> > > > > > > "A Distributed and Incremental SVD Algorithm for Agglomerative
> > > > > > > Data Analysis on Large Networks" (
> > > > > > > https://github.com/apache/systemml/pull/273/files#diff-488f06e290f7a54db2e125f7bc608971R27
> > > > > > > ).
> > > > > > > The idea there was to build up a distributed SVD using
> > > > > > > invocations of svd on your local machine. He tried to achieve
> > > > > > > the multi-level parallelism through the parfor construct. Each
> > > > > > > local invocation of svd was done using the Apache Commons Math
> > > > > > > library. If each invocation of this local svd can instead be
> > > > > > > done on a GPU, the overall wall time for the distributed
> > > > > > > version would be decreased.
> > > > > > >
> > > > > > > Users may not always have a GPU. In that case, the svd falls
> > > > > > > back to the Apache Commons Math implementation. But if they
> > > > > > > do, and if we have an "svd" builtin function, then it would be
> > > > > > > easier to take advantage of the GPU.
> > > > > > >
> > > > > > > The problem with scalable svd in DML is due to spark backend
> > > > > > > issues, otherwise there is no problem scaling w/o a local
> > > > > > > svd():
> > > > > > >
> > > > > > > There may be spark backend issues, and more may come to light
> > > > > > > as more workloads are executed on SystemML.
> > > > > > > For any given operation, we can implement it as a DML-bodied
> > > > > > > function or a builtin function.
> > > > > > >
> > > > > > > For DML-bodied functions:
> > > > > > > Pros:
> > > > > > > - The SystemML optimizer can be applied to it
> > > > > > > - Distribution of SVD is then taken care of by SystemML. It
> > > > > > >   will generate and run the spark primitives needed.
> > > > > > > Cons:
> > > > > > > - Implementing SVD, whether in DML or C, is a fair amount of
> > > > > > >   work
> > > > > > > - There would not be a straightforward call to the svd GPU
> > > > > > >   library. In fact, each of the linear algebra primitives
> > > > > > >   would be accelerated on the GPU, but not the entire
> > > > > > >   operation itself. This would involve many more JNI calls.
> > > > > > >
> > > > > > > For builtin functions:
> > > > > > > Pros:
> > > > > > > - Use can be made of GPU libraries (cuSolver) and CPU
> > > > > > >   libraries (Apache Commons Math); these are already
> > > > > > >   optimized (in the case of the GPU)
> > > > > > > - If a better SVD implementation is available via a library,
> > > > > > >   that can easily be plugged in.
> > > > > > > Cons:
> > > > > > > - Would have to come up with an algorithm to implement
> > > > > > >   distributed SVD with spark primitives
> > > > > > >
> > > > > > > Pick your battle.
> > > > > > >
> > > > > > > Maybe we could try another algorithm for scalable svd():
> > > > > > >
> > > > > > > Sure. But before you do that, it may be worth our while to
> > > > > > > understand what exactly is misleading about the paper that
> > > > > > > Imran talks about.
> > > > > > >
> > > > > > > -Nakul
> > > > > > >
> > > > > > > On Thu, Jul 20, 2017 at 4:02 PM, Janardhan Pulivarthi <
> > > > > > > janardhan.pulivar...@gmail.com> wrote:
> > > > > > >
> > > > > > > > Hi Nakul,
> > > > > > > >
> > > > > > > > Can you help me understand how GPU implementations help
> > > > > > > > scale it. Does the user always have GPUs in use when using
> > > > > > > > this fn, or is it an optional feature?
> > > > > > > >
> > > > > > > > The problem with implementing the scalable svd() in DML is
> > > > > > > > due to the spark backend issues, otherwise there is no
> > > > > > > > problem scaling w/o a local svd() function.
> > > > > > > >
> > > > > > > > Maybe we could try another algorithm for the scalable
> > > > > > > > svd(), if the present algorithm is misleading as Imran
> > > > > > > > Younus pointed out.
> > > > > > > >
> > > > > > > > Thank you,
> > > > > > > > Janardhan
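For completeness, here is a rough numpy sketch of the block-wise merge idea behind the distributed/incremental SVD discussed above (illustrative only; the actual implementation in PR 273 is written in DML and parallelizes the per-block SVDs with parfor, as Nakul describes). The point is that the SVD of each row block can be computed independently, and only the small S_i * Vt_i factors need to be stacked and decomposed again; this sketch recovers the singular values and right singular vectors, which is one merge level of the agglomerative scheme.

import numpy as np

def blockwise_svd(A, num_blocks=4):
    """One merge level of the incremental/agglomerative SVD idea:
    SVD each row block, stack the S_i * Vt_i factors, SVD the stack.
    Returns singular values and right singular vectors of A."""
    blocks = np.array_split(A, num_blocks, axis=0)
    partials = []
    for Ai in blocks:                       # in SystemML this loop would be a parfor
        Ui, Si, Vti = np.linalg.svd(Ai, full_matrices=False)
        partials.append(Si[:, None] * Vti)  # keep only the small n x n factor
    stacked = np.vstack(partials)           # (num_blocks * n) x n, still skinny
    _, S, Vt = np.linalg.svd(stacked, full_matrices=False)
    return S, Vt

A = np.random.rand(20000, 30)
S_blk, Vt_blk = blockwise_svd(A)
S_ref = np.linalg.svd(A, compute_uv=False)
print(np.allclose(S_blk, S_ref))            # singular values match -> True

This works because A^T A equals the sum of the A_i^T A_i, which is exactly stacked^T stacked; whether it behaves well for sparse blocks is the open question raised in the thread.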