Repository: spark
Updated Branches:
  refs/heads/branch-1.5 7e5a90651 -> 13f0f4892


[SPARK-11507][MLLIB] add compact in Matrices fromBreeze

jira: https://issues.apache.org/jira/browse/SPARK-11507
"In certain situations when adding two block matrices, I get an error regarding 
colPtr and the operation fails. External issue URL includes full error and code 
for reproducing the problem."

root cause: colPtrs.last does NOT always equal data.length in a breeze 
CSCMatrix, which fails the require in SparseMatrix.

easy step to repro:
```
import breeze.linalg.{CSCMatrix, Matrix => BM}
import org.apache.spark.mllib.linalg.Matrices

val m1: BM[Double] = new CSCMatrix[Double](
  Array(1.0, 1, 1), 3, 3, Array(0, 1, 2, 3), Array(0, 1, 2))
val m2: BM[Double] = new CSCMatrix[Double](
  Array(1.0, 2, 2, 4), 3, 3, Array(0, 0, 2, 4), Array(1, 2, 1, 2))
val sum = m1 + m2
Matrices.fromBreeze(sum)
```

Solution: According to the code in 
[CSCMatrix](https://github.com/scalanlp/breeze/blob/28000a7b901bc3cfbbbf5c0bce1d0a5dda8281b0/math/src/main/scala/breeze/linalg/CSCMatrix.scala),
 a CSCMatrix in breeze can have extra zeros at the end of its data array. 
Invoking compact makes the matrix satisfy the require in SparseMatrix. This 
should add limited overhead, as the matrix is copied and compacted only when 
colPtrs.last differs from data.length.
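
The idea can be sketched without Breeze or Spark. In the minimal example below, `CscArrays`, `strictCheck` and `CompactSketch` are hypothetical names made up for illustration, not the Breeze or Spark classes. The stored values match the sum from the repro above; the two unused trailing slots are an assumption about how the padding might look.

```scala
// Hypothetical stand-in for a CSC matrix; NOT the Breeze or Spark class.
final case class CscArrays(
    rows: Int,
    cols: Int,
    colPtrs: Array[Int],
    rowIndices: Array[Int],
    data: Array[Double]) {

  // Number of entries actually in use, as recorded by the last column pointer.
  def used: Int = colPtrs.last

  // True when the backing arrays carry no unused trailing slots.
  def isCompact: Boolean = used == data.length && used == rowIndices.length

  // Return this instance if already compact, otherwise a trimmed copy.
  def compacted: CscArrays =
    if (isCompact) this
    else copy(rowIndices = rowIndices.take(used), data = data.take(used))
}

object CompactSketch {
  // Mirrors the kind of check described above: reject trailing padding.
  def strictCheck(m: CscArrays): Unit =
    require(m.colPtrs.last == m.data.length,
      s"colPtrs.last (${m.colPtrs.last}) != data.length (${m.data.length})")

  def main(args: Array[String]): Unit = {
    // 3x3 matrix with 5 stored entries but room for 7 (illustrative padding).
    val padded = CscArrays(3, 3,
      Array(0, 1, 3, 5),
      Array(0, 1, 2, 1, 2, 0, 0),
      Array(1.0, 2.0, 2.0, 2.0, 5.0, 0.0, 0.0))

    val failsBefore = scala.util.Try(strictCheck(padded)).isFailure
    val fixed = padded.compacted
    strictCheck(fixed) // passes once the arrays are trimmed
    println(s"failed before compact: $failsBefore, data length after: ${fixed.data.length}")
  }
}
```

Returning the same instance when the arrays are already compact reflects the "only when necessary" point: the copy is paid for only in the padded case.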

Author: Yuhao Yang <hhb...@gmail.com>

Closes #9520 from hhbyyh/matricesFromBreeze.

(cherry picked from commit ca458618d8ee659ffa9a081083cd475a440fa8ff)
Signed-off-by: Joseph K. Bradley <jos...@databricks.com>

Conflicts:
        mllib/src/test/scala/org/apache/spark/mllib/linalg/MatricesSuite.scala


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/13f0f489
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/13f0f489
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/13f0f489

Branch: refs/heads/branch-1.5
Commit: 13f0f4892c5e9dab0b1758748250826f705ad915
Parents: 7e5a906
Author: Yuhao Yang <hhb...@gmail.com>
Authored: Wed Mar 30 15:58:19 2016 -0700
Committer: Joseph K. Bradley <jos...@databricks.com>
Committed: Wed Mar 30 16:15:19 2016 -0700

----------------------------------------------------------------------
 .../scala/org/apache/spark/mllib/linalg/Matrices.scala  | 10 +++++++++-
 .../org/apache/spark/mllib/linalg/MatricesSuite.scala   | 12 ++++++++++++
 2 files changed, 21 insertions(+), 1 deletion(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/spark/blob/13f0f489/mllib/src/main/scala/org/apache/spark/mllib/linalg/Matrices.scala
----------------------------------------------------------------------
diff --git a/mllib/src/main/scala/org/apache/spark/mllib/linalg/Matrices.scala b/mllib/src/main/scala/org/apache/spark/mllib/linalg/Matrices.scala
index 013c511..7ba5e44 100644
--- a/mllib/src/main/scala/org/apache/spark/mllib/linalg/Matrices.scala
+++ b/mllib/src/main/scala/org/apache/spark/mllib/linalg/Matrices.scala
@@ -874,8 +874,16 @@ object Matrices {
       case dm: BDM[Double] =>
         new DenseMatrix(dm.rows, dm.cols, dm.data, dm.isTranspose)
       case sm: BSM[Double] =>
+        // Spark-11507. work around breeze issue 479.
+        val mat = if (sm.colPtrs.last != sm.data.length) {
+          val matCopy = sm.copy
+          matCopy.compact()
+          matCopy
+        } else {
+          sm
+        }
         // There is no isTranspose flag for sparse matrices in Breeze
-        new SparseMatrix(sm.rows, sm.cols, sm.colPtrs, sm.rowIndices, sm.data)
+        new SparseMatrix(mat.rows, mat.cols, mat.colPtrs, mat.rowIndices, mat.data)
       case _ =>
         throw new UnsupportedOperationException(
           s"Do not support conversion from type ${breeze.getClass.getName}.")

http://git-wip-us.apache.org/repos/asf/spark/blob/13f0f489/mllib/src/test/scala/org/apache/spark/mllib/linalg/MatricesSuite.scala
----------------------------------------------------------------------
diff --git a/mllib/src/test/scala/org/apache/spark/mllib/linalg/MatricesSuite.scala b/mllib/src/test/scala/org/apache/spark/mllib/linalg/MatricesSuite.scala
index 5e28167..0fb6bd0 100644
--- a/mllib/src/test/scala/org/apache/spark/mllib/linalg/MatricesSuite.scala
+++ b/mllib/src/test/scala/org/apache/spark/mllib/linalg/MatricesSuite.scala
@@ -19,6 +19,7 @@ package org.apache.spark.mllib.linalg
 
 import java.util.Random
 
+import breeze.linalg.{CSCMatrix, Matrix => BM}
 import org.mockito.Mockito.when
 import org.scalatest.mock.MockitoSugar._
 import scala.collection.mutable.{Map => MutableMap}
@@ -487,4 +488,15 @@ class MatricesSuite extends SparkFunSuite {
     assert(sm1.numNonzeros === 1)
     assert(sm1.numActives === 3)
   }
+
+  test("fromBreeze with sparse matrix") {
+    // colPtr.last does NOT always equal to values.length in breeze SCSMatrix and
+    // invocation of compact() may be necessary. Refer to SPARK-11507
+    val bm1: BM[Double] = new CSCMatrix[Double](
+      Array(1.0, 1, 1), 3, 3, Array(0, 1, 2, 3), Array(0, 1, 2))
+    val bm2: BM[Double] = new CSCMatrix[Double](
+      Array(1.0, 2, 2, 4), 3, 3, Array(0, 0, 2, 4), Array(1, 2, 1, 2))
+    val sum = bm1 + bm2
+    Matrices.fromBreeze(sum)
+  }
 }

