matrix computation in spark

2014-11-17 Thread liaoyuxi
Hi,
Matrix computation is critical to the efficiency of algorithms such as least 
squares, the Kalman filter, and so on.
For now, the MLlib module offers only limited linear algebra on matrices, 
especially distributed matrices.

We have been working on distributed matrix computation APIs built on the data 
structures in MLlib.
The main idea is to partition the matrix into sub-blocks, following the 
strategy in this paper:
http://www.cs.berkeley.edu/~odedsc/papers/bfsdfs-mm-ipdps13.pdf
In our experiments it is communication-optimal.
However, operations such as factorization may not be appropriate to carry out 
block-wise.
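To make the blocking idea concrete, here is a minimal single-machine sketch of block-partitioned multiplication in pure Python (plain lists rather than Spark RDDs; the block size and helper names are illustrative, not part of any MLlib API):

```python
# Minimal sketch: multiply two square matrices by partitioning them into
# b x b sub-blocks, the same idea a distributed implementation would use
# with one block per partition.

def split_blocks(m, b):
    """Split a square matrix (list of lists) into a grid of b x b blocks."""
    n = len(m)
    g = n // b  # number of blocks per dimension (assumes b divides n)
    return {(i, j): [[m[i*b + r][j*b + c] for c in range(b)] for r in range(b)]
            for i in range(g) for j in range(g)}

def block_multiply(A, B, n, b):
    """C = A * B computed block-wise: C[i,j] = sum_k A[i,k] * B[k,j]."""
    g = n // b
    Ab, Bb = split_blocks(A, b), split_blocks(B, b)
    C = [[0] * n for _ in range(n)]
    for i in range(g):
        for j in range(g):
            for k in range(g):
                a, bl = Ab[(i, k)], Bb[(k, j)]
                for r in range(b):
                    for c in range(b):
                        C[i*b + r][j*b + c] += sum(a[r][t] * bl[t][c]
                                                   for t in range(b))
    return C
```

In a distributed setting, each (i, j, k) block product would be one task, and the choice of block grid controls the communication volume that the paper above optimizes.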

Any suggestions and guidance are welcome.

Thanks,
Yuxi



Re: matrix computation in spark

2014-11-17 Thread Zongheng Yang
There's been some work at the AMPLab on a distributed matrix library on top
of Spark; see here [1]. In particular, the repo contains a couple
factorization algorithms.

[1] https://github.com/amplab/ml-matrix

Zongheng





Re: matrix computation in spark

2014-11-17 Thread 顾荣
Hey Yuxi,

We have also implemented a distributed matrix multiplication library at
PasaLab. The repo is hosted at https://github.com/PasaLab/marlin . We
implemented three distributed matrix multiplication algorithms on Spark. In
our experience, communication-optimal does not always mean optimal overall.
Thus, besides the CARMA matrix multiplication you mentioned, we also
implemented block-splitting matrix multiplication and broadcast matrix
multiplication. They are more efficient than CARMA matrix multiplication in
some situations, for example when a large matrix is multiplied by a small one.
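The broadcast case is simple to sketch: when B is small, every worker can hold a full copy of it, so each row block of A is multiplied locally with no shuffle. A pure-Python illustration of the idea (the partitioning scheme and function names are illustrative, not Marlin's actual API):

```python
# Broadcast matrix multiplication, sketched on plain lists: A is split into
# row blocks (one per "partition"); the small matrix B is copied ("broadcast")
# to each block, and every block computes its slice of C independently.

def matmul(a, b):
    """Plain dense multiply of two lists-of-lists."""
    n, k, m = len(a), len(b), len(b[0])
    return [[sum(a[i][t] * b[t][j] for t in range(k)) for j in range(m)]
            for i in range(n)]

def broadcast_multiply(A, B, num_partitions):
    """C = A * B with A split row-wise; B is shipped whole to each partition."""
    rows_per_part = (len(A) + num_partitions - 1) // num_partitions
    row_blocks = [A[i:i + rows_per_part] for i in range(0, len(A), rows_per_part)]
    # Each "task" sees only its row block plus the broadcast copy of B.
    result_blocks = [matmul(block, B) for block in row_blocks]
    return [row for block in result_blocks for row in block]
```

Because no block of A ever moves and only B is replicated, the communication cost is proportional to the size of B times the number of partitions, which is why this wins when B is small.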

We shared this work at the Spark Meetup in Beijing on October 26th (
http://www.meetup.com/spark-user-beijing-Meetup/events/210422112/ ). The
slides can be downloaded from the archive here:
http://pan.baidu.com/s/1dDoyHX3#path=%252Fmeetup-3rd

Best,
Rong

-- 
--
Rong Gu
Department of Computer Science and Technology
State Key Laboratory for Novel Software Technology
Nanjing University
Phone: +86 15850682791
Email: gurongwal...@gmail.com
Homepage: http://pasa-bigdata.nju.edu.cn/people/ronggu/


Re: matrix computation in spark

2014-11-17 Thread liaoyuxi
Hi,
I checked the work on ml-matrix. For now, it doesn't include matrix 
multiplication or LU decomposition. What's your plan? Could we contribute our 
work for these parts?
Also, in ml-matrix the number of row/column blocks is chosen manually; as we 
mentioned, the CARMA method in the paper chooses the partitioning to be 
communication-optimal.
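For reference, CARMA's rule is to recursively split the largest of the three dimensions of C = A (m x k) * B (k x n) in half at each level, rather than fixing the block grid by hand. A toy sketch of just that dimension-selection recursion (the function name is illustrative; this shows the recursion shape, not the full algorithm):

```python
# Toy sketch of CARMA-style partitioning: at each of log2(p) levels, halve
# the currently largest dimension. Splitting m or n divides the output C;
# splitting k divides the inner sum and requires a later reduction.

def carma_splits(m, k, n, procs):
    """Return the sequence of dimensions split while dividing work over procs."""
    splits = []
    while procs > 1:
        largest = max((m, 'm'), (k, 'k'), (n, 'n'))[1]
        if largest == 'm':
            m //= 2
        elif largest == 'k':
            k //= 2
        else:
            n //= 2
        splits.append(largest)
        procs //= 2
    return splits
```

Always attacking the largest dimension is what yields the communication bound in the paper; a manually fixed row/column block count cannot adapt this way when the matrix shape is very skewed.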

From: Zongheng Yang [mailto:zonghen...@gmail.com]
Sent: November 18, 2014 11:37
To: liaoyuxi; d...@spark.incubator.apache.org
Cc: Shivaram Venkataraman
Subject: Re: matrix computation in spark



Re: matrix computation in spark

2014-11-17 Thread Reza Zadeh
Hi Yuxi,

We are integrating the ml-matrix from the AMPlab repo into MLlib, tracked
by this JIRA: https://issues.apache.org/jira/browse/SPARK-3434

We already have matrix multiply but are missing LU decomposition. Could you
please watch that JIRA? Once the initial design is in, we can sync on how to
contribute LU decomposition.

Let's move the discussion to the JIRA.

Thanks!
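For context, this is the single-machine kernel in question: a minimal Doolittle-style LU factorization without pivoting, in pure Python. A production version for MLlib would need pivoting and a blocked, distributed formulation, so treat this purely as an illustration of the operation being discussed:

```python
# Minimal Doolittle LU factorization without pivoting: A = L * U, where L is
# unit lower triangular and U is upper triangular. Operates on lists of lists.

def lu_decompose(A):
    n = len(A)
    L = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    U = [[0.0] * n for _ in range(n)]
    for i in range(n):
        # Row i of U: subtract the contributions of the rows already factored.
        for j in range(i, n):
            U[i][j] = A[i][j] - sum(L[i][t] * U[t][j] for t in range(i))
        # Column i of L below the diagonal (assumes U[i][i] != 0: no pivoting).
        for j in range(i + 1, n):
            L[j][i] = (A[j][i] - sum(L[j][t] * U[t][i] for t in range(i))) / U[i][i]
    return L, U
```

The data dependence here (each row of U needs all previous rows of L and U) is exactly what makes a block-wise distributed version nontrivial, per Yuxi's earlier point about factorizations.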
