Re: [mllib] Add multiplying large scale matrices

2014-09-09 Thread 顾荣
Hi All,

Sorry for my late reply!

Yu Ishikawa, thanks for your interest in the Saury project. You are welcome to
try it out. If you have questions about it, please email me. We keep
improving performance and adding features to the project.

Xiangrui, thanks for your encouragement. If you have any problems with my
CSDN reports, please feel free to contact me. We wrote up some of the design
for Saury on our lab's private JIRA, which is in Chinese. I will translate it
into English and share it with you in the next few days. Actually, I also
surveyed the related algorithms/systems before we started the Saury project.
The survey is attached to this email; it is not in the CSDN reports. We also
considered the 2.5D algorithm for reducing communication. However, at that
time, MLlib did not have a distributed block matrix representation. So we
decided to first implement distributed matrix multiplication on the
IndexRowMatrix, as time was limited for the Summer Code project. Also, as far
as we know, nobody had tried that at that time. Adopting the 2.5D
algorithm to reduce network communication is on our roadmap, and we
plan to work on it in the coming days.

Best,
Rong




-- 
--
Rong Gu
Department of Computer Science and Technology
State Key Laboratory for Novel Software Technology
Nanjing University
Phone: +86 15850682791
Email: gurongwal...@gmail.com
Homepage: http://pasa-bigdata.nju.edu.cn/people/ronggu/

-
To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
For additional commands, e-mail: dev-h...@spark.apache.org

Re: [mllib] Add multiplying large scale matrices

2014-09-08 Thread Yu Ishikawa
Hi Xiangrui Meng,

Thank you for your comment and for creating the tickets.

The ticket I created will be superseded by your tickets.
I will close my ticket and then link it to yours later.

Best,
Yu Ishikawa



--
View this message in context: 
http://apache-spark-developers-list.1001551.n3.nabble.com/mllib-Add-multiplying-large-scale-matrices-tp8291p8333.html
Sent from the Apache Spark Developers List mailing list archive at Nabble.com.




Re: [mllib] Add multiplying large scale matrices

2014-09-08 Thread Xiangrui Meng
Sorry for my late reply! I'm also very interested in the
implementation of distributed matrix multiplication. As Shivaram
mentioned, the communication is the concern here. But maybe we can
start with a reasonable implementation and then iterate on its
performance. It would be great if eventually we can implement an
algorithm close to the 2.5D algorithm
(http://www.netlib.org/lapack/lawnspdf/lawn248.pdf).

I created two JIRAs for this topic:

1. Distributed block matrix: https://issues.apache.org/jira/browse/SPARK-3434
2. Distributed matrix multiplication:
https://issues.apache.org/jira/browse/SPARK-3435

We can move our discussion there.

Rong, I'm really happy to see the Saury project. It would be great if
you can share your design and experience (maybe on the JIRA page so it
is easier to track). I will read the reports on CSDN and ping you if I
run into problems. Thanks!

Best,
Xiangrui




Re: [mllib] Add multiplying large scale matrices

2014-09-06 Thread Yu Ishikawa
Hi Rong, 

Great job! Thank you for letting me know about your work.
I will read the source code of Saury later.

Although the AMPLab is also working on implementing these operations, would
you like to merge it into Spark?

Best,

-- Yu Ishikawa




--
View this message in context: 
http://apache-spark-developers-list.1001551.n3.nabble.com/mllib-Add-multiplying-large-scale-matrices-tp8291p8310.html
Sent from the Apache Spark Developers List mailing list archive at Nabble.com.




Re: [mllib] Add multiplying large scale matrices

2014-09-06 Thread Yu Ishikawa
Hi Jeremy,

Great work!

I'm interested in your work. If your code is on GitHub, could you let
me know where?

-- Yu Ishikawa



--
View this message in context: 
http://apache-spark-developers-list.1001551.n3.nabble.com/mllib-Add-multiplying-large-scale-matrices-tp8291p8309.html
Sent from the Apache Spark Developers List mailing list archive at Nabble.com.




Re: [mllib] Add multiplying large scale matrices

2014-09-05 Thread 顾荣
I missed the dev list in my last email, so I am resending it. Please ignore
the duplicate.

2014-09-06 11:22 GMT+08:00 顾荣 :

> Hi All,
>
> This is Rong Gu from PasaLab at Nanjing University, China. We have
> been working on a distributed matrix operations library on Spark this
> summer. It is a Summer Code project hosted by CSDN and Intel Lab (
> http://code.csdn.net/os_camp/8/proposals/26). Previously, the codebase of
> the project was hosted on CSDN's code platform (
> https://code.csdn.net/u014252240/sparkmatrixlib) and we have been writing
> weekly reports on the blog (http://blog.csdn.net/u014252240).
>
> The project has now come to an end, and I have recently moved it to GitHub.
> Please see https://github.com/PasaLab/saury .
> We named the project Saury and provide documents to help people understand
> it better.
>
> Technically, we implement matrix manipulation on Spark with block
> matrix parallel algorithms to distribute large-scale matrix computation
> among cluster nodes. We also take advantage of a native linear algebra
> library (e.g., BLAS) on each worker node to accelerate the computation.
> That really makes a difference! See the preliminary performance evaluation
> report at
> https://github.com/PasaLab/saury/wiki/Performance-comparison-on-matrices-multiply
>
> Currently, we are working on adding more advanced matrix manipulation
> algorithms to Saury, such as matrix factorization and diagonalization
> algorithms. In fact, Saury already contains an alpha version of a
> distributed LU factorization. We are also trying to use Tachyon to
> hold and share the matrix data across the cluster more quickly.
>
> Best,
> Rong
>
> --
> --
> Rong Gu
> Department of Computer Science and Technology
> State Key Laboratory for Novel Software Technology
> Nanjing University
> Email: gurongwal...@gmail.com
> Homepage: http://pasa-bigdata.nju.edu.cn/people/ronggu/
>
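
For readers who want a concrete picture of the block approach Rong describes
in this message, here is a minimal single-machine sketch in Python. It is not
Saury's code: the helper names (split_blocks, local_multiply, local_add,
block_multiply) are made up for illustration, and the pure-Python
local_multiply only stands in for the native BLAS call that would run on each
worker node.

```python
def split_blocks(matrix, block_size):
    """Cut a dense matrix into square-ish blocks keyed by (block_row, block_col)."""
    n_rows, n_cols = len(matrix), len(matrix[0])
    blocks = {}
    for i in range(0, n_rows, block_size):
        for j in range(0, n_cols, block_size):
            blocks[(i // block_size, j // block_size)] = [
                row[j:j + block_size] for row in matrix[i:i + block_size]
            ]
    return blocks


def local_multiply(a, b):
    """Multiply two small dense blocks (stand-in for a native BLAS gemm)."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def local_add(a, b):
    """Add two blocks of the same shape element by element."""
    return [[x + y for x, y in zip(row_a, row_b)] for row_a, row_b in zip(a, b)]


def block_multiply(blocks_a, blocks_b):
    """Compute C(i, j) = sum over k of A(i, k) * B(k, j), block by block."""
    result = {}
    for (i, k), block_a in blocks_a.items():
        for (k2, j), block_b in blocks_b.items():
            if k != k2:
                continue  # only blocks sharing the inner index k contribute
            partial = local_multiply(block_a, block_b)
            if (i, j) in result:
                result[(i, j)] = local_add(result[(i, j)], partial)
            else:
                result[(i, j)] = partial
    return result


a = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
b = [[1, 0, 2, 0], [0, 1, 0, 2], [3, 0, 1, 0], [0, 3, 0, 1]]
blocks_c = block_multiply(split_blocks(a, 2), split_blocks(b, 2))
print(blocks_c[(0, 0)])
```

In a distributed setting each (i, k) / (k, j) pair of blocks would be sent to
one worker, multiplied there, and the partial products summed per (i, j); the
loops above only mimic that data flow on one machine.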


Re: [mllib] Add multiplying large scale matrices

2014-09-05 Thread Jeremy Freeman
Hey all, 

Definitely agreed this would be nice! In our own work we've done element-wise 
addition, subtraction, and scalar multiplication of similarly partitioned 
matrices very efficiently with zipping. We've also done matrix-matrix 
multiplication with zipping, but that only works in certain circumstances, and 
it's otherwise very communication intensive (as Shivaram says). Another tricky 
thing with addition / subtraction is how to handle sparse vs. dense arrays.

Would be happy to contribute anything we did, but definitely first worth 
knowing what progress has been made from the AMPLab.

-- Jeremy

-
jeremy freeman, phd
neuroscientist
@thefreemanlab
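
A minimal sketch of the zipping idea described in this message, in plain
Python rather than Spark. The partition layout (a list of partitions, each a
list of dense rows) and the function names zip_add and scale are assumptions
for illustration, not Jeremy's code, and sparse rows are not handled.

```python
def zip_add(partitions_a, partitions_b):
    """Element-wise addition of two identically partitioned matrices."""
    result = []
    # Partitions are paired by position, so no data moves between partitions.
    for part_a, part_b in zip(partitions_a, partitions_b):
        result.append([
            [x + y for x, y in zip(row_a, row_b)]
            for row_a, row_b in zip(part_a, part_b)
        ])
    return result


def scale(partitions, scalar):
    """Scalar multiplication; each partition is processed independently."""
    return [[[scalar * x for x in row] for row in part] for part in partitions]


# Two 3x2 matrices, both split into the same two partitions (2 rows + 1 row).
a = [[[1, 2], [3, 4]], [[5, 6]]]
b = [[[10, 20], [30, 40]], [[50, 60]]]

total = zip_add(a, b)
doubled = scale(a, 2)
print(total)
print(doubled)
```

In Spark itself, RDD.zip has the same precondition: both RDDs need the same
number of partitions and the same number of elements in each partition.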




Re: [mllib] Add multiplying large scale matrices

2014-09-05 Thread Patrick Wendell
Hey There,

I believe this is on the roadmap for the next release, 1.2. But
Xiangrui can comment on this.

- Patrick




Re: [mllib] Add multiplying large scale matrices

2014-09-05 Thread Shivaram Venkataraman
FWIW matrix multiplication is extremely communication intensive when
you have two row-partitioned matrices, and there are often other ways
to solve problems. Regardless, it would be good to have a more
complete matrix library, and it would be good to contribute some of the
stuff we have done in the AMPLab to MLlib.

Shivaram
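
A back-of-the-envelope sketch of the communication cost mentioned in this
message. It assumes dense matrices and a naive scheme in which each partition
holding a row block of A fetches every row of B it does not already own
(row i of A*B depends on all rows of B). The function name and the even
row split are assumptions for illustration; this is not how any particular
library schedules the work.

```python
def rows_fetched_per_partition(num_rows_b, num_partitions):
    """Rows of B each partition must pull from other partitions."""
    # Spread the rows of B over the partitions as evenly as possible.
    base, extra = divmod(num_rows_b, num_partitions)
    owned = [base + (1 if p < extra else 0) for p in range(num_partitions)]
    # Every partition needs all of B, minus the rows it already holds.
    return [num_rows_b - rows for rows in owned]


fetched = rows_fetched_per_partition(num_rows_b=1000, num_partitions=4)
total_rows_moved = sum(fetched)
print(fetched, total_rows_moved)
```

Under these assumptions the total traffic is (P - 1) times the size of B for
P partitions, so adding workers increases the data moved instead of reducing it.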




Re: [mllib] Add multiplying large scale matrices

2014-09-05 Thread Yu Ishikawa
Hi Evan, 

That sounds interesting.

Here is the ticket I created:
https://issues.apache.org/jira/browse/SPARK-3416

thanks,



--
View this message in context: 
http://apache-spark-developers-list.1001551.n3.nabble.com/mllib-Add-multiplying-large-scale-matrices-tp8291p8296.html
Sent from the Apache Spark Developers List mailing list archive at Nabble.com.




Re: [mllib] Add multiplying large scale matrices

2014-09-05 Thread Evan R. Sparks
There's some work on this going on in the AMP Lab. Create a ticket and we
can update with our progress so that we don't duplicate effort.




Re: [mllib] Add multiplying large scale matrices

2014-09-05 Thread Yu Ishikawa
Hi RJ,

Thank you for your comment. I am interested in having other matrix
operations too.
I will create a JIRA issue first.

thanks,



--
View this message in context: 
http://apache-spark-developers-list.1001551.n3.nabble.com/mllib-Add-multiplying-large-scale-matrices-tp8291p8293.html
Sent from the Apache Spark Developers List mailing list archive at Nabble.com.




Re: [mllib] Add multiplying large scale matrices

2014-09-05 Thread RJ Nowling
I think it would be interesting to have a variety of matrix operations
(multiplication, addition / subtraction, powers, scalar multiply, etc.)
available in Spark.

Diagonalization may be more difficult, but iterative approximation
approaches may be quite amenable to a distributed implementation.



-- 
em rnowl...@gmail.com
c 954.496.2314
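
One example of the iterative approximation approaches mentioned in this
message is power iteration, sketched below in plain Python. It is chosen here
only as an illustration (the message does not name a specific method), and the
function names are made up. It estimates the dominant eigenvalue using nothing
but repeated matrix-vector products, which is the operation a distributed
matrix would need to provide.

```python
import math


def mat_vec(matrix, vec):
    """Dense matrix-vector product; each row can be handled independently."""
    return [sum(a * x for a, x in zip(row, vec)) for row in matrix]


def power_iteration(matrix, num_iters=100):
    """Estimate the dominant eigenvalue and eigenvector of a square matrix."""
    vec = [1.0] * len(matrix)
    eigenvalue = 0.0
    for _ in range(num_iters):
        product = mat_vec(matrix, vec)
        norm = math.sqrt(sum(x * x for x in product))
        vec = [x / norm for x in product]  # keep the iterate at unit length
        # Rayleigh quotient of the unit vector gives the eigenvalue estimate.
        eigenvalue = sum(v * w for v, w in zip(vec, mat_vec(matrix, vec)))
    return eigenvalue, vec


matrix = [[2.0, 1.0], [1.0, 3.0]]
eigenvalue, eigenvector = power_iteration(matrix)
print(eigenvalue, eigenvector)
```

For this symmetric 2x2 example the estimate converges to (5 + sqrt(5)) / 2,
the larger of the two eigenvalues.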


[mllib] Add multiplying large scale matrices

2014-09-05 Thread Yu Ishikawa
Hi all, 

It seems that there is a method to multiply a RowMatrix by a (local)
Matrix.
However, there is no method in Spark to multiply one large-scale
(distributed) matrix by another.
It would be helpful. Does anyone plan to add multiplication of large-scale
matrices?
Or should we not support it in Spark?

thanks,
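
To make the distinction in this message concrete, here is a minimal sketch in
plain Python of the case that is already covered: a row-partitioned matrix
times a small local matrix. The function name and the list-of-partitions
layout are assumptions for illustration and do not show MLlib's internals.
The point is that every row can be multiplied on its own once the local
matrix has been copied to each partition, which is not true when the right-hand
matrix is itself distributed.

```python
def multiply_rows_by_local(partitions, local):
    """Multiply a row-partitioned matrix by a small local matrix."""
    n_inner, n_cols = len(local), len(local[0])
    result = []
    for rows in partitions:  # on a cluster, partitions would run in parallel
        result.append([
            [sum(row[k] * local[k][j] for k in range(n_inner)) for j in range(n_cols)]
            for row in rows
        ])
    return result


partitions = [[[1, 2], [3, 4]], [[5, 6]]]  # a 3x2 matrix split into two partitions
local = [[1, 0, 1], [0, 1, 1]]             # a 2x3 local matrix
product = multiply_rows_by_local(partitions, local)
print(product)
```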



--
View this message in context: 
http://apache-spark-developers-list.1001551.n3.nabble.com/mllib-Add-multiplying-large-scale-matrices-tp8291.html
Sent from the Apache Spark Developers List mailing list archive at Nabble.com.
