Re: [mllib] Add multiplying large scale matrices
Hi All,

Sorry for my late reply!

Yu Ishikawa, thanks for your interest in the Saury project. You are welcome to try it out. If you have questions about it, please email me. We keep improving its performance and adding features.

Xiangrui, thanks for your encouragement. If you have any problems with my CSDN reports, please feel free to contact me. We wrote some design documents for Saury on our lab's private JIRA, which is in Chinese. I will translate them into English and share them with you in the next few days.

Actually, I also surveyed the related algorithms and systems before we started the Saury project. The survey is attached to this email; it is not part of the CSDN reports. We also considered the 2.5D algorithm for reducing communication. However, at that time MLlib did not have a distributed block matrix representation, so we decided to first implement distributed matrix multiplication on IndexedRowMatrix, because time was limited for the Summer Code project. Also, as far as we know, nobody had tried that at the time. Adopting the 2.5D algorithm to reduce network communication is on our roadmap, and we plan to work on it in the coming days.

Best,
Rong

--
Rong Gu
Department of Computer Science and Technology
State Key Laboratory for Novel Software Technology
Nanjing University
Phone: +86 15850682791
Email: gurongwal...@gmail.com
Homepage: http://pasa-bigdata.nju.edu.cn/people/ronggu/

-
To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
For additional commands, e-mail: dev-h...@spark.apache.org
Re: [mllib] Add multiplying large scale matrices
Hi Xiangrui Meng,

Thank you for your comment and for creating the tickets. I will fold the ticket I created into yours: I will close my ticket and then link it to yours.

Best,
Yu Ishikawa

--
View this message in context: http://apache-spark-developers-list.1001551.n3.nabble.com/mllib-Add-multiplying-large-scale-matrices-tp8291p8333.html
Sent from the Apache Spark Developers List mailing list archive at Nabble.com.
Re: [mllib] Add multiplying large scale matrices
Sorry for my late reply! I'm also very interested in the implementation of distributed matrix multiplication. As Shivaram mentioned, communication is the concern here. But maybe we can start with a reasonable implementation and then iterate on its performance. It would be great if we could eventually implement an algorithm close to the 2.5D algorithm (http://www.netlib.org/lapack/lawnspdf/lawn248.pdf).

I created two JIRAs for this topic:

1. Distributed block matrix: https://issues.apache.org/jira/browse/SPARK-3434
2. Distributed matrix multiplication: https://issues.apache.org/jira/browse/SPARK-3435

We can move our discussion there.

Rong, I'm really happy to see the Saury project. It would be great if you could share your design and experience (maybe on the JIRA page so it is easier to track). I will read the reports on CSDN and ping you if I run into problems. Thanks!

Best,
Xiangrui
Re: [mllib] Add multiplying large scale matrices
Hi Rong,

Great job! Thank you for letting me know about your work. I will read the source code of Saury later.

Although AMPLab is also working on an implementation, would you like to merge it into Spark?

Best,

-- Yu Ishikawa
Re: [mllib] Add multiplying large scale matrices
Hi Jeremy,

Great work! I'm interested in what you have done. If your code is on GitHub, could you let me know?

-- Yu Ishikawa
Re: [mllib] Add multiplying large scale matrices
My last email missed the dev list, so I am resending it. Please ignore the duplicate.

2014-09-06 11:22 GMT+08:00 Rong Gu (顾荣):

> Hi All,
>
> This is Rong Gu from PasaLab at Nanjing University, China. We have been working on a distributed matrix operations library on Spark this summer. It is a Summer Code project hosted by CSDN and Intel Lab (http://code.csdn.net/os_camp/8/proposals/26). Previously, the codebase of the project was hosted on CSDN's code platform (https://code.csdn.net/u014252240/sparkmatrixlib), and we have been writing weekly reports on the blog (http://blog.csdn.net/u014252240).
>
> The project has now come to an end, and I recently moved it to GitHub. Please see https://github.com/PasaLab/saury. We named the project Saury and provide documentation to help people get to know it better.
>
> Technically, we implement matrix manipulation on Spark with block matrix parallel algorithms to distribute large-scale matrix computation among cluster nodes. We also take advantage of the native linear algebra library (e.g., BLAS) on each worker node to accelerate the computation. That really makes a difference! See the preliminary performance evaluation report at https://github.com/PasaLab/saury/wiki/Performance-comparison-on-matrices-multiply
>
> Currently, we are working on adding more advanced matrix manipulation algorithms to Saury, such as matrix factorization and diagonalization algorithms. In fact, Saury already contains an alpha version of a distributed LU factorization. We are also trying to use Tachyon to hold and share the matrix data across the cluster at higher speed.
>
> Best,
> Rong

--
Rong Gu
Department of Computer Science and Technology
State Key Laboratory for Novel Software Technology
Nanjing University
Phone: +86 15850682791
Email: gurongwal...@gmail.com
Homepage: http://pasa-bigdata.nju.edu.cn/people/ronggu/
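The block-matrix approach Rong describes can be illustrated with a minimal single-process sketch. This is not Saury's code: the names `split_blocks`, `local_multiply`, `local_add`, `block_multiply`, and `assemble` are hypothetical, the dictionary of blocks only stands in for blocks distributed over cluster nodes, and `local_multiply` stands in for the native BLAS call each worker would make.

```python
def split_blocks(matrix, block_size):
    """Cut a dense matrix (list of rows) into blocks keyed by (block_row, block_col)."""
    n_rows, n_cols = len(matrix), len(matrix[0])
    return {
        (i // block_size, j // block_size):
            [row[j:j + block_size] for row in matrix[i:i + block_size]]
        for i in range(0, n_rows, block_size)
        for j in range(0, n_cols, block_size)
    }


def local_multiply(x, y):
    """Dense product of two small local blocks (placeholder for a BLAS gemm call)."""
    return [[sum(x[i][k] * y[k][j] for k in range(len(y)))
             for j in range(len(y[0]))]
            for i in range(len(x))]


def local_add(x, y):
    """Element-wise sum of two blocks of the same shape."""
    return [[p + q for p, q in zip(rx, ry)] for rx, ry in zip(x, y)]


def block_multiply(blocks_a, blocks_b, grid_rows, grid_inner, grid_cols):
    """C[i, j] = sum over k of A[i, k] * B[k, j], computed block by block."""
    result = {}
    for i in range(grid_rows):
        for j in range(grid_cols):
            acc = None
            for k in range(grid_inner):
                partial = local_multiply(blocks_a[(i, k)], blocks_b[(k, j)])
                acc = partial if acc is None else local_add(acc, partial)
            result[(i, j)] = acc
    return result


def assemble(blocks, grid_rows, grid_cols):
    """Stitch a grid of blocks back into one dense matrix."""
    rows = []
    for i in range(grid_rows):
        block_row = [blocks[(i, j)] for j in range(grid_cols)]
        for r in range(len(block_row[0])):
            rows.append([value for block in block_row for value in block[r]])
    return rows


a = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
b = [[1, 0, 2, 0], [0, 1, 0, 2], [3, 0, 1, 0], [0, 3, 0, 1]]
blocks_a = split_blocks(a, 2)
blocks_b = split_blocks(b, 2)
product = assemble(block_multiply(blocks_a, blocks_b, 2, 2, 2), 2, 2)
```

In a distributed setting, each `(i, j)` output block could be produced by a separate task, which is where the per-node native library would do the heavy lifting.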
Re: [mllib] Add multiplying large scale matrices
Hey all,

Definitely agreed this would be nice! In our own work we've done element-wise addition, subtraction, and scalar multiplication of similarly partitioned matrices very efficiently with zipping. We've also done matrix-matrix multiplication with zipping, but that only works in certain circumstances, and it's otherwise very communication intensive (as Shivaram says). Another tricky thing with addition / subtraction is how to handle sparse vs. dense arrays.

We would be happy to contribute anything we did, but it is definitely worth first knowing what progress has been made at the AMPLab.

-- Jeremy

-
jeremy freeman, phd
neuroscientist
@thefreemanlab
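The zipping idea for element-wise operations can be sketched as follows. This is a single-process illustration under stated assumptions, not Jeremy's code: lists of rows stand in for partitions, and `partition_rows` and `zip_add` are hypothetical helper names. The key requirement is that both matrices are partitioned identically, so matching partitions can be paired without moving any data.

```python
def partition_rows(matrix, num_partitions):
    """Split rows into contiguous chunks, mimicking partitions of a distributed matrix."""
    chunk = max(1, -(-len(matrix) // num_partitions))  # ceiling division
    return [matrix[i:i + chunk] for i in range(0, len(matrix), chunk)]


def zip_add(parts_a, parts_b):
    """Add two identically partitioned matrices partition by partition."""
    if len(parts_a) != len(parts_b):
        raise ValueError("matrices must have the same number of partitions")
    result = []
    for part_a, part_b in zip(parts_a, parts_b):
        if len(part_a) != len(part_b):
            raise ValueError("matching partitions must hold the same number of rows")
        # Each pair of partitions is combined locally; no rows cross partitions.
        result.append([[x + y for x, y in zip(row_a, row_b)]
                       for row_a, row_b in zip(part_a, part_b)])
    return result


a = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]
b = [[10.0, 20.0], [30.0, 40.0], [50.0, 60.0], [70.0, 80.0]]
summed = zip_add(partition_rows(a, 2), partition_rows(b, 2))
```

Subtraction and scalar multiplication follow the same pattern with a different per-element operation; the sketch handles dense rows only and leaves the sparse vs. dense question open.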
Re: [mllib] Add multiplying large scale matrices
Hey There,

I believe this is on the roadmap for the upcoming 1.2 release. But Xiangrui can comment on this.

- Patrick
Re: [mllib] Add multiplying large scale matrices
FWIW, matrix multiplication is extremely communication intensive when you have two row-partitioned matrices, and there are often other ways to solve problems. Regardless, it would be good to have a more complete matrix library, and it would be good to contribute some of the stuff we have done in the AMPLab to MLlib.

Shivaram
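A rough sketch of why two row-partitioned matrices are expensive to multiply: row i of the product is a weighted sum of all rows of B, so every partition of A needs every row of B. The code below is a single-process illustration only. `multiply_row_partitioned` is a hypothetical name, and it assumes partition p of A sits on the same node as partition p of B, so only rows of B held elsewhere count as remote fetches.

```python
def multiply_row_partitioned(parts_a, parts_b):
    """Return (product partitions, number of B rows fetched from other partitions)."""
    owners, rows_b = [], []
    for p, part in enumerate(parts_b):
        for row in part:
            owners.append(p)   # which partition holds this row of B
            rows_b.append(row)
    n_cols = len(rows_b[0])

    remote_fetches = 0
    product = []
    for p, part in enumerate(parts_a):
        # Partition p must obtain every row of B it does not already hold.
        remote_fetches += sum(1 for owner in owners if owner != p)
        out_part = []
        for row_a in part:
            acc = [0.0] * n_cols
            for k, a_ik in enumerate(row_a):
                for j in range(n_cols):
                    acc[j] += a_ik * rows_b[k][j]
            out_part.append(acc)
        product.append(out_part)
    return product, remote_fetches


parts_a = [[[1, 2, 0, 0], [0, 1, 0, 0]], [[0, 0, 1, 0], [1, 0, 0, 1]]]
parts_b = [[[1, 0], [0, 1]], [[2, 2], [3, 3]]]
product, remote_fetches = multiply_row_partitioned(parts_a, parts_b)
```

With P equally sized partitions and n rows in B, this naive scheme fetches (P - 1) * n rows in total, which grows with the cluster size rather than shrinking.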
Re: [mllib] Add multiplying large scale matrices
Hi Evan,

That sounds interesting.

Here is the ticket which I created.
https://issues.apache.org/jira/browse/SPARK-3416

thanks,
Re: [mllib] Add multiplying large scale matrices
There's some work on this going on in the AMP Lab. Create a ticket and we can update with our progress so that we don't duplicate effort.
Re: [mllib] Add multiplying large scale matrices
Hi RJ,

Thank you for your comment. I am interested in having other matrix operations too. I will create a JIRA issue first.

thanks,
Re: [mllib] Add multiplying large scale matrices
I think it would be interesting to have a variety of matrix operations (multiplication, addition / subtraction, powers, scalar multiply, etc.) available in Spark. Diagonalization may be more difficult but iterative approximation approaches may be quite amenable.

--
em rnowl...@gmail.com
c 954.496.2314
[mllib] Add multiplying large scale matrices
Hi all,

It seems that there is a method to multiply a RowMatrix by a (local) Matrix. However, there is no method in Spark to multiply one large-scale matrix by another. It would be helpful. Does anyone have a plan to add multiplication of large-scale matrices? Or shouldn't we support it in Spark?

thanks,
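To make the distinction concrete, here is a minimal sketch of the easy case mentioned above, a row-distributed matrix times a small local matrix. It does not use Spark or the MLlib API; lists of rows stand in for partitions, and `multiply_rows_by_local` is a hypothetical name. Because the local matrix fits on every node, each row can be multiplied independently and no rows need to move between partitions.

```python
def multiply_rows_by_local(partitions, local):
    """Multiply every row of a row-partitioned matrix by a small local matrix."""
    n_inner, n_cols = len(local), len(local[0])

    def times_local(row):
        if len(row) != n_inner:
            raise ValueError("row length must match the local matrix's row count")
        return [sum(row[k] * local[k][j] for k in range(n_inner))
                for j in range(n_cols)]

    # The partition structure is preserved; only the row width changes.
    return [[times_local(row) for row in part] for part in partitions]


partitions = [[[1, 2], [3, 4]], [[5, 6]]]
local = [[1, 0, 1], [0, 1, 1]]
result = multiply_rows_by_local(partitions, local)
```

When the right-hand matrix is itself too large to copy to every node, this shortcut no longer applies, which is the case the question is about.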