Re: matrix computation in spark

2014-11-17 Thread
Hey Yuxi,

We have also implemented a distributed matrix multiplication library at
PasaLab. The repo is hosted here: https://github.com/PasaLab/marlin . We
implemented three distributed matrix multiplication algorithms on Spark. In
our experience, a communication-optimal algorithm is not always optimal
overall. Thus, besides the CARMA matrix multiplication you mentioned, we also
implemented Block-splitting matrix multiplication and Broadcast matrix
multiplication. These are more efficient than CARMA in some situations, for
example when multiplying a large matrix by a small one.
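To illustrate why broadcasting can win when a large matrix is multiplied by a small one, here is a minimal single-process sketch. It assumes Python lists stand in for the row partitions of a distributed matrix and that the small matrix fits in memory on every worker. The function names are hypothetical and are not Marlin's API: only the small matrix B is shipped to each partition, and the rows of A never move.

```python
from typing import List

Matrix = List[List[float]]


def local_matmul(a: Matrix, b: Matrix) -> Matrix:
    """Naive dense product of two in-memory matrices (stand-in for a BLAS call)."""
    inner = len(b)
    cols = len(b[0]) if b else 0
    return [[sum(row[k] * b[k][j] for k in range(inner)) for j in range(cols)]
            for row in a]


def broadcast_multiply(a_partitions: List[Matrix], b_small: Matrix) -> List[Matrix]:
    """Hypothetical broadcast multiplication: each row partition of the large
    matrix A is multiplied locally by a full copy of the small matrix B.
    On Spark, B would be shipped once per executor (e.g. as a broadcast
    variable), so the rows of A are never shuffled."""
    return [local_matmul(part, b_small) for part in a_partitions]


# A is 4x2, split into two row partitions; B is a small 2x2 matrix.
a_parts = [[[1.0, 2.0], [3.0, 4.0]], [[5.0, 6.0], [7.0, 8.0]]]
b = [[1.0, 0.0], [0.0, 2.0]]
print(broadcast_multiply(a_parts, b))
# [[[1.0, 4.0], [3.0, 8.0]], [[5.0, 12.0], [7.0, 16.0]]]
```

The trade-off is that every worker must hold all of B, so this only pays off when one operand is small; block-splitting and CARMA avoid that requirement at the cost of shuffling blocks.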

We also presented this work at the Spark Meetup@Beijing on October 26th
(http://www.meetup.com/spark-user-beijing-Meetup/events/210422112/ ). The
slides can be downloaded from the archive here:
http://pan.baidu.com/s/1dDoyHX3#path=%252Fmeetup-3rd

Best,
Rong

2014-11-18 13:11 GMT+08:00 顾荣 :

> Hey Yuxi,
>
> We have also implemented a distributed matrix multiplication library at
> PasaLab. The repo is hosted here: https://github.com/PasaLab/marlin . We
> implemented three distributed matrix multiplication algorithms on Spark. In
> our experience, a communication-optimal algorithm is not always optimal
> overall. Thus, besides the CARMA matrix multiplication you mentioned, we
> also implemented Block-splitting matrix multiplication and Broadcast matrix
> multiplication. These are more efficient than CARMA in some situations,
> for example when multiplying a large matrix by a small one.
>
> We also presented this work at the Spark Meetup@Beijing on October
> 26th (http://www.meetup.com/spark-user-beijing-Meetup/events/210422112/
> ). The slides are also attached to this mail.
>
> Best,
> Rong
>
> 2014-11-18 11:36 GMT+08:00 Zongheng Yang :
>
>> There's been some work at the AMPLab on a distributed matrix library on
>> top of Spark; see here [1]. In particular, the repo contains a couple of
>> factorization algorithms.
>>
>> [1] https://github.com/amplab/ml-matrix
>>
>> Zongheng
>>
>> On Mon Nov 17 2014 at 7:34:17 PM liaoyuxi  wrote:
>>
>> > Hi,
>> > Matrix computation is critical to the efficiency of algorithms such as
>> > least squares, the Kalman filter, and so on.
>> > For now, MLlib offers only limited linear algebra on matrices,
>> > especially distributed matrices.
>> >
>> > We have been working on establishing distributed matrix computation APIs
>> > based on data structures in MLlib.
>> > The main idea is to partition the matrix into sub-blocks, based on the
>> > strategy in the following paper.
>> > http://www.cs.berkeley.edu/~odedsc/papers/bfsdfs-mm-ipdps13.pdf
>> > In our experiments, it is communication-optimal.
>> > However, operations like factorization may not be suitable to carry
>> > out in blocks.
>> >
>> > Any suggestions and guidance are welcome.
>> >
>> > Thanks,
>> > Yuxi
>> >
>> >
>>
>
>
>
> --
> --
> Rong Gu
> Department of Computer Science and Technology
> State Key Laboratory for Novel Software Technology
> Nanjing University
> Phone: +86 15850682791
> Email: gurongwal...@gmail.com
> Homepage: http://pasa-bigdata.nju.edu.cn/people/ronggu/
>




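Yuxi's plan partitions the matrices into sub-blocks following the CARMA paper linked in the thread. As a rough sketch of that idea (our reading of the paper, not code from Yuxi's or Marlin's implementation), the recursion below halves the largest of the three dimensions m, k, n until each of P processors (assumed to be a power of two) owns one sub-problem. Only the breadth-first splitting steps are modeled, with no memory limit. Splitting k yields partial results for the same output entries, which on a cluster would need a reduction. Everything runs in one process, and the function names are hypothetical.

```python
from typing import Dict, List, Tuple

Range = Tuple[int, int]  # half-open index range [start, end)
Task = Tuple[Range, Range, Range]  # (rows of A/C, shared k range, cols of B/C)


def carma_split(rows: Range, inner: Range, cols: Range, procs: int) -> List[Task]:
    """Recursively halve the largest of the three dimensions (m, k, n) until
    each of `procs` processors owns one sub-problem (BFS steps only).
    Assumes `procs` is a power of two."""
    if procs == 1:
        return [(rows, inner, cols)]
    dims: Dict[str, Range] = {"m": rows, "k": inner, "n": cols}
    widest = max(dims, key=lambda d: dims[d][1] - dims[d][0])
    lo, hi = dims[widest]
    mid = (lo + hi) // 2
    tasks: List[Task] = []
    for half in ((lo, mid), (mid, hi)):
        sub = dict(dims)
        sub[widest] = half
        tasks.extend(carma_split(sub["m"], sub["k"], sub["n"], procs // 2))
    return tasks


def carma_multiply(a: List[List[float]], b: List[List[float]], procs: int) -> List[List[float]]:
    """Compute C = A * B by running each processor's task in turn.
    Tasks produced by splitting k add partial sums into the same C entries;
    accumulating them here plays the role of a cluster-wide reduction."""
    m, k, n = len(a), len(b), len(b[0])
    c = [[0.0] * n for _ in range(m)]
    for (r0, r1), (k0, k1), (c0, c1) in carma_split((0, m), (0, k), (0, n), procs):
        for i in range(r0, r1):
            for j in range(c0, c1):
                c[i][j] += sum(a[i][t] * b[t][j] for t in range(k0, k1))
    return c


# A is 4x2 and B is 2x3; with 4 processors, m is split first, then n.
A = [[1, 2], [3, 4], [5, 6], [7, 8]]
B = [[1, 0, 2], [0, 1, 3]]
print(carma_split((0, 4), (0, 2), (0, 3), 4))
# [((0, 2), (0, 2), (0, 1)), ((0, 2), (0, 2), (1, 3)), ((2, 4), (0, 2), (0, 1)), ((2, 4), (0, 2), (1, 3))]
print(carma_multiply(A, B, 4))
# [[1.0, 2.0, 8.0], [3.0, 4.0, 18.0], [5.0, 6.0, 28.0], [7.0, 8.0, 38.0]]
```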

Re: [mllib] Add multiplying large scale matrices

2014-09-09 Thread
Hi All,

Sorry for my late reply!

Yu Ishikawa, thanks for your interest in the Saury project. You are welcome to
try it out. If you have questions about it, please email me. We are
continuing to improve performance and add features to the project.

Xiangrui, thanks for your encouragement. If you have any problems with my
CSDN reports, please feel free to contact me. We wrote some design documents
for Saury on our lab's private JIRA, in Chinese. I will translate them into
English and share them with you in the next few days. I also surveyed the
related algorithms and systems before we started the Saury project. The
survey is attached to this email; it is not in the CSDN reports. We also
considered the 2.5D algorithm for reducing communication. However, at that
time, MLlib did not have a distributed block matrix representation, so we
decided to first implement distributed matrix multiplication on
IndexedRowMatrix, as time was limited for the Summer Code project. As far as
we know, nobody had tried that at the time. Adopting the 2.5D algorithm to
reduce network communication is on our roadmap, and we plan to work on it
soon.
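The 2.5D algorithm trades extra memory for less communication. As a rough summary of the asymptotic costs usually quoted for it (see the LAPACK Working Note linked in this thread; constants omitted), for n×n matrices on P processors that keep c replicated copies of the data, the memory per processor M and the words communicated per processor W are:

```latex
M = O\!\left(\frac{c\,n^2}{P}\right), \qquad
W = O\!\left(\frac{n^2}{\sqrt{c\,P}}\right), \qquad
1 \le c \le P^{1/3}
```

With c = 1 this reduces to the 2D bound W = O(n^2/\sqrt{P}); with c = P^{1/3} it reaches the 3D bound W = O(n^2/P^{2/3}), since \sqrt{P^{1/3} \cdot P} = P^{2/3}.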

Best,
Rong


2014-09-08 15:31 GMT+08:00 Xiangrui Meng :

> Sorry for my late reply! I'm also very interested in the
> implementation of distributed matrix multiplication. As Shivaram
> mentioned, the communication is the concern here. But maybe we can
> start with a reasonable implementation and then iterate on its
> performance. It would be great if eventually we can implement an
> algorithm close to the 2.5D algorithm
> (http://www.netlib.org/lapack/lawnspdf/lawn248.pdf).
>
> I created two JIRAs for this topic:
>
> 1. Distributed block matrix:
> https://issues.apache.org/jira/browse/SPARK-3434
> 2. Distributed matrix multiplication:
> https://issues.apache.org/jira/browse/SPARK-3435
>
> We can move our discussion there.
>
> Rong, I'm really happy to see the Saury project. It would be great if
> you can share your design and experience (maybe on the JIRA page so it
> is easier to track). I will read the reports on CSDN and ping you if I
> run into problems. Thanks!
>
> Best,
> Xiangrui
>
> On Sat, Sep 6, 2014 at 1:28 AM, Yu Ishikawa
>  wrote:
> > Hi Rong,
> >
> > Great job! Thank you for letting me know about your work.
> > I will read the source code of Saury later.
> >
> > Although AMPLab is working on implementing this, would you like to merge
> > Saury into Spark?
> >
> > Best,
> >
> > -- Yu Ishikawa
> >
> >
> >
> >
> >
> > -
> > To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
> > For additional commands, e-mail: dev-h...@spark.apache.org
> >
>
>
>




Re: [mllib] Add multiplying large scale matrices

2014-09-05 Thread
I missed the dev list in my last email, so I am resending it. Please ignore
the duplicate.

2014-09-06 11:22 GMT+08:00 顾荣 :

> Hi All,
>
> This is Rong Gu from PasaLab at Nanjing University, China. We have been
> working on a distributed matrix operations library on Spark this summer.
> It is a Summer Code project hosted by CSDN and Intel Lab (
> http://code.csdn.net/os_camp/8/proposals/26). Previously, the project's
> codebase was hosted on CSDN's code platform (
> https://code.csdn.net/u014252240/sparkmatrixlib), and we have been writing
> weekly reports on the blog (http://blog.csdn.net/u014252240).
>
> Now that the project has come to an end, I have moved it to GitHub:
> https://github.com/PasaLab/saury . We named the project Saury and provide
> documentation to help people understand it better.
>
> Technically, we implement matrix operations on Spark with block matrix
> parallel algorithms that distribute large-scale matrix computation among
> cluster nodes. We also take advantage of a native linear algebra library
> (e.g. BLAS) on each worker node to accelerate the computation. That really
> makes a difference! See the preliminary performance evaluation report at
> https://github.com/PasaLab/saury/wiki/Performance-comparison-on-matrices-multiply
>
> Currently, we are working on adding more advanced matrix algorithms to
> Saury, such as matrix factorization and diagonalization. In fact, Saury
> already contains an alpha version of distributed LU factorization. We are
> also trying to use Tachyon to hold and share matrix data across the
> cluster for faster access.
>
> Best,
> Rong
>
>
>
> 2014-09-06 1:29 GMT+08:00 Jeremy Freeman :
>
>> Hey all,
>>
>> Definitely agreed this would be nice! In our own work we've done
>> element-wise addition, subtraction, and scalar multiplication of similarly
>> partitioned matrices very efficiently with zipping. We've also done
>> matrix-matrix multiplication with zipping, but that only works in certain
>> circumstances, and it's otherwise very communication intensive (as Shivaram
>> says). Another tricky thing with addition / subtraction is how to handle
>> sparse vs. dense arrays.
>>
>> Would be happy to contribute anything we did, but definitely first worth
>> knowing what progress has been made from the AMPLab.
>>
>> -- Jeremy
>>
>> -
>> jeremy freeman, phd
>> neuroscientist
>> @thefreemanlab
>>
>> On Sep 5, 2014, at 12:23 PM, Patrick Wendell  wrote:
>>
>> > Hey There,
>> >
>> > I believe this is on the roadmap for the next release (1.2), but
>> > Xiangrui can comment on this.
>> >
>> > - Patrick
>> >
>> > On Fri, Sep 5, 2014 at 9:18 AM, Yu Ishikawa
>> >  wrote:
>> >> Hi Evan,
>> >>
>> >> That sounds interesting.
>> >>
>> >> Here is the ticket which I created.
>> >> https://issues.apache.org/jira/browse/SPARK-3416
>> >>
>> >> thanks,
>> >>
>> >>
>> >>
>> >>
>> >
>> >
>>
>>
>
>
>

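The Saury announcement in this thread describes block-matrix parallel algorithms that distribute the computation across nodes, with a native library such as BLAS doing the work on each worker. A minimal single-process sketch of one common pattern for this follows. It is an assumption for illustration, not Saury's actual code: dicts keyed by (block_row, block_col) stand in for a distributed block matrix, A[i, k] is paired with every B[k, j] (the analogue of a join on k), each pair is multiplied locally, and the partial products are summed per output block (the analogue of a reduce by key). All names are hypothetical.

```python
from collections import defaultdict
from functools import reduce
from typing import Dict, List, Tuple

Block = List[List[float]]
BlockMatrix = Dict[Tuple[int, int], Block]  # (block_row, block_col) -> dense block


def local_matmul(x: Block, y: Block) -> Block:
    """Dense product of two blocks; on a worker this is where a native BLAS gemm would go."""
    return [[sum(row[t] * y[t][j] for t in range(len(y))) for j in range(len(y[0]))]
            for row in x]


def add_blocks(x: Block, y: Block) -> Block:
    """Element-wise sum of two equally shaped blocks."""
    return [[p + q for p, q in zip(rx, ry)] for rx, ry in zip(x, y)]


def block_multiply(a: BlockMatrix, b: BlockMatrix) -> BlockMatrix:
    """C[i, j] = sum over k of A[i, k] * B[k, j], computed block by block."""
    # Group B's blocks by their block-row index k (the join key).
    b_by_k: Dict[int, List[Tuple[int, Block]]] = defaultdict(list)
    for (k, j), blk in b.items():
        b_by_k[k].append((j, blk))
    # "Join" step: pair A[i, k] with every B[k, j] and multiply locally.
    partial: Dict[Tuple[int, int], List[Block]] = defaultdict(list)
    for (i, k), a_blk in a.items():
        for j, b_blk in b_by_k.get(k, []):
            partial[(i, j)].append(local_matmul(a_blk, b_blk))
    # "Reduce" step: sum the partial products that belong to the same C block.
    return {key: reduce(add_blocks, prods) for key, prods in partial.items()}


# [[1, 2], [3, 4]] x [[5, 6], [7, 8]], each stored as four 1x1 blocks.
A = {(0, 0): [[1]], (0, 1): [[2]], (1, 0): [[3]], (1, 1): [[4]]}
B = {(0, 0): [[5]], (0, 1): [[6]], (1, 0): [[7]], (1, 1): [[8]]}
print(sorted(block_multiply(A, B).items()))
# [((0, 0), [[19]]), ((0, 1), [[22]]), ((1, 0), [[43]]), ((1, 1), [[50]])]
```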

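Jeremy's message in this thread describes element-wise addition of similarly partitioned matrices by zipping, and notes that mixing sparse and dense arrays is tricky. A minimal single-process sketch of that idea follows, assuming lists stand in for partitions and a dict {column: value} stands in for a sparse row (names are hypothetical). The sparse/dense policy here, keeping the result sparse only when both inputs are sparse, is one simple choice and not necessarily what Jeremy's group did. Subtraction and scalar multiplication follow the same per-partition pattern.

```python
from typing import Dict, List, Union

Row = Union[List[float], Dict[int, float]]  # dense list, or sparse {column: value}


def to_dense(row: Row, width: int) -> List[float]:
    """Expand a sparse row to a dense list; dense rows are copied as-is."""
    if isinstance(row, dict):
        dense = [0.0] * width
        for col, val in row.items():
            dense[col] = val
        return dense
    return list(row)


def add_rows(x: Row, y: Row, width: int) -> Row:
    """Element-wise sum of two rows; the result stays sparse only if both inputs are sparse."""
    if isinstance(x, dict) and isinstance(y, dict):
        out = dict(x)
        for col, val in y.items():
            out[col] = out.get(col, 0.0) + val
        return out
    return [p + q for p, q in zip(to_dense(x, width), to_dense(y, width))]


def zip_add(a_parts: List[List[Row]], b_parts: List[List[Row]], width: int) -> List[List[Row]]:
    """A + B for two row matrices split into identically sized partitions.
    Partitions, and rows inside them, are paired by position (like zipping two
    RDDs), so nothing is shuffled, but the layouts must match exactly."""
    if len(a_parts) != len(b_parts):
        raise ValueError("matrices must have the same number of partitions")
    result = []
    for pa, pb in zip(a_parts, b_parts):
        if len(pa) != len(pb):
            raise ValueError("paired partitions must hold the same number of rows")
        result.append([add_rows(x, y, width) for x, y in zip(pa, pb)])
    return result


# Two 2x3 matrices, one row per partition, mixing dense and sparse rows.
a = [[[1.0, 2.0, 0.0]], [{0: 1.0}]]
b = [[{2: 5.0}], [{0: 2.0, 1: 3.0}]]
print(zip_add(a, b, width=3))
# [[[1.0, 2.0, 5.0]], [{0: 3.0, 1: 3.0}]]
```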
