Re: matrix computation in Spark
Hey Yuxi,

We have also implemented a distributed matrix multiplication library at PasaLab. The repo is hosted here: https://github.com/PasaLab/marlin . We implemented three distributed matrix multiplication algorithms on Spark. As we see it, a communication-optimal algorithm is not always optimal in total cost. Thus, besides the CARMA matrix multiplication you mentioned, we also implemented block-splitting matrix multiplication and broadcast matrix multiplication. These are more efficient than CARMA matrix multiplication in some situations, for example when a large matrix multiplies a small matrix.

Actually, we shared this work at the Spark Meetup@Beijing on October 26th ( http://www.meetup.com/spark-user-beijing-Meetup/events/210422112/ ). The slides can be downloaded from the archive here: http://pan.baidu.com/s/1dDoyHX3#path=%252Fmeetup-3rd

Best,
Rong

2014-11-18 13:11 GMT+08:00 顾荣:

> Hey Yuxi,
>
> We have also implemented a distributed matrix multiplication library at
> PasaLab. The repo is hosted here: https://github.com/PasaLab/marlin . We
> implemented three distributed matrix multiplication algorithms on Spark.
> As we see it, a communication-optimal algorithm is not always optimal in
> total cost. Thus, besides the CARMA matrix multiplication you mentioned,
> we also implemented block-splitting matrix multiplication and broadcast
> matrix multiplication. These are more efficient than CARMA matrix
> multiplication in some situations, for example when a large matrix
> multiplies a small matrix.
>
> Actually, we shared this work at the Spark Meetup@Beijing on October 26th
> ( http://www.meetup.com/spark-user-beijing-Meetup/events/210422112/ ).
> The slides are also attached to this mail.
>
> Best,
> Rong
>
> 2014-11-18 11:36 GMT+08:00 Zongheng Yang:
>
>> There's been some work at the AMPLab on a distributed matrix library on
>> top of Spark; see [1]. In particular, the repo contains a couple of
>> factorization algorithms.
>>
>> [1] https://github.com/amplab/ml-matrix
>>
>> Zongheng
>>
>> On Mon, Nov 17, 2014 at 7:34:17 PM, liaoyuxi wrote:
>>
>> > Hi,
>> > Matrix computation is critical for the efficiency of algorithms such
>> > as least squares, the Kalman filter, and so on.
>> > For now, the mllib module offers limited linear algebra on matrices,
>> > especially distributed matrices.
>> >
>> > We have been working on establishing distributed matrix computation
>> > APIs based on the data structures in MLlib.
>> > The main idea is to partition the matrix into sub-blocks, based on the
>> > strategy in the following paper:
>> > http://www.cs.berkeley.edu/~odedsc/papers/bfsdfs-mm-ipdps13.pdf
>> > In our experiments, it is communication-optimal.
>> > But operations like factorization may not be appropriate to carry out
>> > in blocks.
>> >
>> > Any suggestions and guidance are welcome.
>> >
>> > Thanks,
>> > Yuxi

--
Rong Gu
Department of Computer Science and Technology
State Key Laboratory for Novel Software Technology
Nanjing University
Phone: +86 15850682791
Email: gurongwal...@gmail.com
Homepage: http://pasa-bigdata.nju.edu.cn/people/ronggu/
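To make the trade-off in this thread concrete, here is a toy sketch of the broadcast multiplication Rong describes (pure Python with hypothetical helper names; a local simulation of the idea, not Marlin's actual API): when the right-hand matrix is small, it is shipped whole to every partition of the large matrix, and each partition computes its slice of the product locally, so the large matrix is never shuffled.

```python
# Sketch of broadcast matrix multiplication: the large matrix A is
# split into row blocks (standing in for RDD partitions); the small
# matrix B is "broadcast", i.e. every partition multiplies against the
# same local copy, so A itself never moves across the network.

def mat_mul(a, b):
    # Plain local multiply of dense row-major matrices.
    b_cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in b_cols]
            for row in a]

def broadcast_multiply(a_row_blocks, b_small):
    # Each partition's work is independent: multiply its row block by
    # the broadcast copy of B.
    return [mat_mul(block, b_small) for block in a_row_blocks]

a_blocks = [[[1, 2], [3, 4]],   # partition 0: rows 0-1 of A
            [[5, 6]]]           # partition 1: row 2 of A
b = [[1, 0], [0, 2]]            # small matrix, cheap to broadcast
print(broadcast_multiply(a_blocks, b))
# [[[1, 4], [3, 8]], [[5, 12]]]
```

Because only B crosses the network, this approach wins over communication-optimal schemes like CARMA precisely in the large-matrix-times-small-matrix case the email mentions.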
Re: [mllib] Add multiplying large scale matrices
Hi All,

Sorry for my late reply!

Yu Ishikawa, thanks for your interest in the Saury project. You are welcome to try it out; if you have questions about it, please email me. We keep improving performance and adding features to the project.

Xiangrui, thanks for your encouragement. If you have any problems with my CSDN reports, please feel free to contact me. We have some design notes for Saury on our lab's private JIRA, but they are in Chinese; I will translate them into English and share them with you in the next few days. Actually, I also surveyed the related algorithms and systems before we started the Saury project. The survey is attached to this email, not in the CSDN reports. We had also considered the 2.5D algorithm for reducing communication. However, at that time MLlib did not have a distributed block matrix representation, so we decided to first implement distributed matrix multiplication on the IndexedRowMatrix, as time was limited for the Summer Code project. Also, as far as we know, nobody had tried that at the time. Adopting the 2.5D algorithm to reduce network communication is on our roadmap, and we are planning to work on it next.

Best,
Rong

2014-09-08 15:31 GMT+08:00 Xiangrui Meng:

> Sorry for my late reply! I'm also very interested in the implementation
> of distributed matrix multiplication. As Shivaram mentioned, the
> communication is the concern here. But maybe we can start with a
> reasonable implementation and then iterate on its performance. It would
> be great if eventually we can implement an algorithm close to the 2.5D
> algorithm (http://www.netlib.org/lapack/lawnspdf/lawn248.pdf).
>
> I created two JIRAs for this topic:
>
> 1. Distributed block matrix:
>    https://issues.apache.org/jira/browse/SPARK-3434
> 2. Distributed matrix multiplication:
>    https://issues.apache.org/jira/browse/SPARK-3435
>
> We can move our discussion there.
>
> Rong, I'm really happy to see the Saury project.
> It would be great if you can share your design and experience (maybe on
> the JIRA page so it is easier to track). I will read the reports on CSDN
> and ping you if I run into problems. Thanks!
>
> Best,
> Xiangrui
>
> On Sat, Sep 6, 2014 at 1:28 AM, Yu Ishikawa wrote:
> > Hi Rong,
> >
> > Great job! Thank you for letting me know about your work.
> > I will read the source code of Saury later.
> >
> > Although the AMPLab is working on implementing this, would you like to
> > merge it into Spark?
> >
> > Best,
> >
> > -- Yu Ishikawa
> >
> > --
> > View this message in context:
> > http://apache-spark-developers-list.1001551.n3.nabble.com/mllib-Add-multiplying-large-scale-matrices-tp8291p8310.html
> > Sent from the Apache Spark Developers List mailing list archive at
> > Nabble.com.
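As a rough illustration of what a distributed block matrix multiply (the subject of SPARK-3434/3435 above) computes, here is a local pure-Python simulation (hypothetical helper names, no Spark): both matrices are cut into a square grid of sub-blocks, and C[i][j] is accumulated as the sum over k of A[i][k] * B[k][j]. In the distributed version, each block product would run as an independent task on a different node.

```python
# Local simulation of block-partitioned matrix multiplication.

def split_blocks(m, bs):
    # Cut a dense n x n row-major matrix into a grid of bs x bs blocks.
    n = len(m)
    return [[[row[j:j + bs] for row in m[i:i + bs]]
             for j in range(0, n, bs)]
            for i in range(0, n, bs)]

def mat_mul(a, b):
    # Local dense multiply of one pair of blocks.
    b_cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in b_cols]
            for row in a]

def mat_add(a, b):
    # Element-wise sum of two same-shaped blocks.
    return [[x + y for x, y in zip(r1, r2)] for r1, r2 in zip(a, b)]

def block_multiply(a_blocks, b_blocks):
    # C[i][j] = sum over k of A[i][k] @ B[k][j]; each block product is
    # an independent unit of work in the distributed setting.
    g = len(a_blocks)
    c = []
    for i in range(g):
        c_row = []
        for j in range(g):
            acc = mat_mul(a_blocks[i][0], b_blocks[0][j])
            for k in range(1, g):
                acc = mat_add(acc, mat_mul(a_blocks[i][k], b_blocks[k][j]))
            c_row.append(acc)
        c.append(c_row)
    return c

a = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]  # identity
b = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
c = block_multiply(split_blocks(a, 2), split_blocks(b, 2))
print(c[0][0])  # top-left 2x2 block of A @ B; equals B's since A = I
# [[1, 2], [5, 6]]
```

On a real cluster, each per-block `mat_mul` call is where a native linear algebra kernel (e.g. BLAS, as the Saury announcement notes) would do the heavy lifting.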
Re: [mllib] Add multiplying large scale matrices
Missed the dev list in the last email, so I am resending it; please ignore the duplicate.

2014-09-06 11:22 GMT+08:00 顾荣:

> Hi All,
>
> This is RongGu from PasaLab at Nanjing University, China. Actually, we
> have been working on a distributed matrix operations library on Spark
> this summer. It is a Summer Code project hosted by CSDN and Intel Lab
> (http://code.csdn.net/os_camp/8/proposals/26). Previously, the codebase
> of the project was hosted on CSDN's code platform
> (https://code.csdn.net/u014252240/sparkmatrixlib) and we have been
> writing weekly reports on the blog (http://blog.csdn.net/u014252240).
>
> Now the project has come to an end, and I have moved it to GitHub these
> days. Please see the link here: https://github.com/PasaLab/saury . We
> have named the project Saury and provide documentation to help people
> understand it better.
>
> Technically, we implement the matrix manipulation on Spark with block
> matrix parallel algorithms to distribute large-scale matrix computation
> among cluster nodes. Also, we take advantage of the native linear
> algebra library (e.g. BLAS) on each worker node to accelerate the
> computing process. That really makes a difference! See the preliminary
> performance evaluation report at
> https://github.com/PasaLab/saury/wiki/Performance-comparison-on-matrices-multiply
>
> Currently, we are working on adding more advanced matrix manipulation
> algorithms to Saury, such as matrix factorization and diagonalization
> algorithms. In fact, Saury now contains an alpha version of a
> distributed LU factorization implementation. Also, we are trying to use
> Tachyon to hold and share the matrix data across the cluster at higher
> speed.
>
> Best,
> Rong
>
> 2014-09-06 1:29 GMT+08:00 Jeremy Freeman:
>
>> Hey all,
>>
>> Definitely agreed this would be nice! In our own work we've done
>> element-wise addition, subtraction, and scalar multiplication of
>> similarly partitioned matrices very efficiently with zipping. We've
>> also done matrix-matrix multiplication with zipping, but that only
>> works in certain circumstances, and it's otherwise very
>> communication-intensive (as Shivaram says). Another tricky thing with
>> addition/subtraction is how to handle sparse vs. dense arrays.
>>
>> Would be happy to contribute anything we did, but it's definitely worth
>> first knowing what progress has been made at the AMPLab.
>>
>> -- Jeremy
>>
>> --
>> jeremy freeman, phd
>> neuroscientist
>> @thefreemanlab
>>
>> On Sep 5, 2014, at 12:23 PM, Patrick Wendell wrote:
>>
>> > Hey there,
>> >
>> > I believe this is on the roadmap for the next release, 1.2, but
>> > Xiangrui can comment on this.
>> >
>> > - Patrick
>> >
>> > On Fri, Sep 5, 2014 at 9:18 AM, Yu Ishikawa wrote:
>> >> Hi Evan,
>> >>
>> >> That sounds interesting.
>> >>
>> >> Here is the ticket I created:
>> >> https://issues.apache.org/jira/browse/SPARK-3416
>> >>
>> >> Thanks,
>> >>
>> >> --
>> >> View this message in context:
>> >> http://apache-spark-developers-list.1001551.n3.nabble.com/mllib-Add-multiplying-large-scale-matrices-tp8291p8296.html
>> >> Sent from the Apache Spark Developers List mailing list archive at
>> >> Nabble.com.
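Jeremy's zip-based element-wise operations can be sketched in a few lines (pure Python lists standing in for RDD partitions; the function name is hypothetical): when two matrices are partitioned identically, zipping pairs partition i of A with partition i of B, so every addition happens locally and nothing is shuffled.

```python
# Sketch of element-wise addition of two identically partitioned
# matrices via zipping (mirroring RDD.zip semantics on Spark).

def zip_add(parts_a, parts_b):
    # Pair up corresponding partitions, then add co-located rows
    # element-wise; no data moves between partitions.
    return [[[x + y for x, y in zip(row_a, row_b)]
             for row_a, row_b in zip(pa, pb)]
            for pa, pb in zip(parts_a, parts_b)]

a_parts = [[[1, 2]], [[3, 4]]]      # two partitions, one row each
b_parts = [[[10, 20]], [[30, 40]]]
print(zip_add(a_parts, b_parts))    # [[[11, 22]], [[33, 44]]]
```

Subtraction and scalar multiplication follow the same pattern; general matrix-matrix multiplication does not line up partition-for-partition, which is why it is the communication-heavy case in this thread.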