I want to multiply two large matrices (read from CSV files) using Spark and Scala
and save the output. I use the following code:

  val rows = file1.coalesce(1, false).map { x =>
    val line = x.split(delimiter).map(_.toDouble)
    // Keep only the non-zero entries as (index, value) pairs.
    Vectors.sparse(line.length,
      line.zipWithIndex.filter(_._1 != 0.0).map(e => (e._2, e._1)))
  }

  val rmat = new RowMatrix(rows)

  val dm = file2.coalesce(1, false).map { x =>
    Vectors.dense(x.split(delimiter).map(_.toDouble))
  }

  // Collect the second matrix to the driver. The original
  // take(dm.count.toInt) triggered an extra job just to get the count;
  // collect() does the same thing in one pass.
  val ma = dm.map(_.toArray).collect()

  // Matrices.dense expects its values in column-major order,
  // hence the transpose before flattening.
  val localMat = Matrices.dense(ma.length, ma(0).length, transpose(ma).flatten)

  // Multiply the distributed matrix by the local one.
  val s = rmat.multiply(localMat).rows

  s.map(x => x.toArray.mkString(delimiter)).saveAsTextFile(OutputPath)
}

  def transpose(m: Array[Array[Double]]): Array[Array[Double]] = {
    (for {
      c <- m(0).indices
    } yield m.map(_(c)) ).toArray
  }
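For reference, `Matrices.dense` takes its values in column-major order, which is why the code above transposes the row-wise array before flattening. A minimal, Spark-free sketch of that layout step (the object name `ColumnMajor` is just for illustration):

```scala
// A 2x3 matrix stored as rows:
//   1 2 3
//   4 5 6
// Matrices.dense(2, 3, values) would expect values in column-major order:
//   Array(1, 4, 2, 5, 3, 6)
object ColumnMajor {
  def transpose(m: Array[Array[Double]]): Array[Array[Double]] =
    (for (c <- m(0).indices) yield m.map(_(c))).toArray

  def toColumnMajor(rows: Array[Array[Double]]): Array[Double] =
    transpose(rows).flatten

  def main(args: Array[String]): Unit = {
    val rows = Array(Array(1.0, 2.0, 3.0), Array(4.0, 5.0, 6.0))
    println(toColumnMajor(rows).mkString(", "))
    // prints 1.0, 4.0, 2.0, 5.0, 3.0, 6.0
  }
}
```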

Saving the result takes a long time and the output file is very large. Also,
collecting the second matrix to the driver limits how large it can be, and
coalesce(1, false) forces everything into a single partition, so the job runs
without parallelism. What is an optimized way to multiply two large matrices
and save the output to a text file?
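One common alternative is to keep both operands distributed and use BlockMatrix.multiply from org.apache.spark.mllib.linalg.distributed, instead of collecting the second matrix to the driver. A sketch under that assumption (file names, delimiter, and output path are placeholders matching the question):

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.mllib.linalg.distributed.{CoordinateMatrix, MatrixEntry}

object BlockMultiply {
  // Parse a numeric CSV into MatrixEntry records (row, col, value),
  // dropping zeros so the representation stays sparse.
  def entries(sc: SparkContext, path: String, delimiter: String) =
    sc.textFile(path).zipWithIndex.flatMap { case (line, i) =>
      line.split(delimiter).zipWithIndex.collect {
        case (v, j) if v.toDouble != 0.0 => MatrixEntry(i, j, v.toDouble)
      }
    }

  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("BlockMultiply"))
    val delimiter = ","

    val a = new CoordinateMatrix(entries(sc, "file1.csv", delimiter)).toBlockMatrix().cache()
    val b = new CoordinateMatrix(entries(sc, "file2.csv", delimiter)).toBlockMatrix().cache()

    // Block-distributed product; neither operand is collected to the driver,
    // and the work is spread across partitions instead of one.
    val product = a.multiply(b)

    product.toCoordinateMatrix().entries
      .map(e => s"${e.i}${delimiter}${e.j}${delimiter}${e.value}")
      .saveAsTextFile("output")

    sc.stop()
  }
}
```

Writing the result as (row, col, value) triples also keeps the output sparse; if a dense row-per-line file is required, the product can be converted back with toIndexedRowMatrix instead.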
