BlockMatrix.multiply is the suggested method of multiplying two large
matrices. Is there a reason that you didn't use BlockMatrices?

You can load the matrices and convert to and from RowMatrix. If the data is
in sparse (i, j, v) format, you can instead load it as a CoordinateMatrix,
convert to a BlockMatrix to multiply, and convert back to a CoordinateMatrix
to save the result.
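A minimal sketch of that CoordinateMatrix -> BlockMatrix -> CoordinateMatrix
round trip might look like the following (the RDD names, the delimiter, and
the output path are placeholders; it assumes each input line is an
"i,j,v" triple):

    import org.apache.spark.mllib.linalg.distributed.{CoordinateMatrix, MatrixEntry}

    // Parse each "i,j,v" line into a MatrixEntry (file1/file2: RDD[String]).
    def toEntries(lines: org.apache.spark.rdd.RDD[String], delimiter: String) =
      lines.map { line =>
        val Array(i, j, v) = line.split(delimiter)
        MatrixEntry(i.toLong, j.toLong, v.toDouble)
      }

    val blockA = new CoordinateMatrix(toEntries(file1, delimiter)).toBlockMatrix().cache()
    val blockB = new CoordinateMatrix(toEntries(file2, delimiter)).toBlockMatrix().cache()

    // Distributed block-wise multiplication; no single executor needs the
    // full matrix in memory.
    val product = blockA.multiply(blockB)

    // Convert back and write the nonzero entries out as "i,j,v" text.
    product.toCoordinateMatrix().entries
      .map(e => s"${e.i}$delimiter${e.j}$delimiter${e.value}")
      .saveAsTextFile(outputPath)

Caching the two BlockMatrices before the multiply avoids recomputing the
parse/convert stages, since multiply may evaluate its inputs more than once.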

Thanks,
Burak

On Wed, Jan 13, 2016 at 8:16 PM, Devi P.V <devip2...@gmail.com> wrote:

> I want to multiply two large matrices (from CSV files) using Spark and
> Scala and save the output. I use the following code:
>
>   val rows=file1.coalesce(1,false).map(x=>{
>       val line=x.split(delimiter).map(_.toDouble)
>       Vectors.sparse(line.length,
>         line.zipWithIndex.map(e => (e._2, e._1)).filter(_._2 != 0.0))
>
>     })
>
>     val rmat = new RowMatrix(rows)
>
>     val dm=file2.coalesce(1,false).map(x=>{
>       val line=x.split(delimiter).map(_.toDouble)
>       Vectors.dense(line)
>     })
>
>     val ma = dm.map(_.toArray).take(dm.count.toInt)
>     val localMat = Matrices.dense( dm.count.toInt,
>       dm.take(1)(0).size,
>
>       transpose(ma).flatten)
>
>     // Multiply two matrices
>     val s=rmat.multiply(localMat).rows
>
>     s.map(x=>x.toArray.mkString(delimiter)).saveAsTextFile(OutputPath)
>
>   }
>
>   def transpose(m: Array[Array[Double]]): Array[Array[Double]] = {
>     (for {
>       c <- m(0).indices
>     } yield m.map(_(c)) ).toArray
>   }
>
> When I save the output it takes a long time and the output file is very
> large. What is the optimized way to multiply two large matrices and save
> the output to a text file?
>
