There's been some work at the AMPLab on a distributed matrix library on top of Spark; see [1]. In particular, the repo contains a couple of factorization algorithms.

[1] https://github.com/amplab/ml-matrix

Zongheng

On Mon Nov 17 2014 at 7:34:17 PM liaoyuxi <liaoy...@huawei.com> wrote:

> Hi,
> Matrix computation is critical for algorithm efficiency like least square,
> Kalman filter and so on.
> For now, the mllib module offers limited linear algebra on matrix,
> especially for distributed matrix.
>
> We have been working on establishing distributed matrix computation APIs
> based on data structures in MLlib.
> The main idea is to partition the matrix into sub-blocks, based on the
> strategy in the following paper.
> http://www.cs.berkeley.edu/~odedsc/papers/bfsdfs-mm-ipdps13.pdf
> In our experiment, it's communication-optimal.
> But operations like factorization may not be appropriate to carry out in
> blocks.
>
> Any suggestions and guidance are welcome.
>
> Thanks,
> Yuxi
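
To make the sub-block idea from the quoted message concrete, here is a minimal single-machine sketch in plain Python. It only illustrates partitioning two matrices into blocks and multiplying them block by block; it is not the BFS/DFS strategy from the linked paper, and it does not use the MLlib or ml-matrix APIs. The names split_blocks, block_multiply and assemble, and the block size, are hypothetical and chosen for illustration only.

```python
def split_blocks(matrix, block_size):
    """Partition a dense matrix (list of rows) into sub-blocks.

    Returns a dict keyed by (block_row, block_col). Edge blocks may be
    smaller when block_size does not divide the matrix dimensions.
    """
    n_rows, n_cols = len(matrix), len(matrix[0])
    blocks = {}
    for i in range(0, n_rows, block_size):
        for j in range(0, n_cols, block_size):
            blocks[(i // block_size, j // block_size)] = [
                row[j:j + block_size] for row in matrix[i:i + block_size]
            ]
    return blocks


def multiply_dense(a, b):
    """Plain dense product of two small blocks."""
    inner = len(b)
    return [
        [sum(a[r][k] * b[k][c] for k in range(inner)) for c in range(len(b[0]))]
        for r in range(len(a))
    ]


def add_dense(a, b):
    """Element-wise sum of two blocks of the same shape."""
    return [[x + y for x, y in zip(row_a, row_b)] for row_a, row_b in zip(a, b)]


def block_multiply(a_blocks, b_blocks):
    """Compute C[i, j] = sum over k of A[i, k] * B[k, j] on sub-blocks."""
    result = {}
    for (i, k), a_blk in a_blocks.items():
        for (k2, j), b_blk in b_blocks.items():
            if k != k2:
                continue  # only blocks sharing the inner index contribute
            product = multiply_dense(a_blk, b_blk)
            if (i, j) in result:
                result[(i, j)] = add_dense(result[(i, j)], product)
            else:
                result[(i, j)] = product
    return result


def assemble(blocks):
    """Stitch a dict of sub-blocks back into one dense matrix."""
    n_block_rows = max(i for i, _ in blocks) + 1
    n_block_cols = max(j for _, j in blocks) + 1
    rows = []
    for i in range(n_block_rows):
        for r in range(len(blocks[(i, 0)])):
            row = []
            for j in range(n_block_cols):
                row.extend(blocks[(i, j)][r])
            rows.append(row)
    return rows


# Small usage example: a 3x3 product computed through 2x2 (and smaller) blocks.
A = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
C = assemble(block_multiply(split_blocks(A, 2), split_blocks(A, 2)))
```

In a distributed setting each (block_row, block_col) key would typically live on some worker, and the pairing on the shared inner index k is where the communication cost comes from. That pairing is the part a communication-avoiding strategy tries to schedule well; this sketch makes no attempt at that.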