Repository: spark
Updated Branches:
  refs/heads/master a2b3b6762 -> 06dda1d58


[SPARK-20687][MLLIB] mllib.Matrices.fromBreeze may crash when converting from 
Breeze sparse matrix

## What changes were proposed in this pull request?

When two Breeze SparseMatrices are operated, the result matrix may contain 
provisional 0 values extra in rowIndices and data arrays. This causes an 
incoherence with the colPtrs data, but Breeze get away with this incoherence by 
keeping a counter of the valid data.

In spark, when this matrices are converted to SparseMatrices, Sparks relies 
solely on rowIndices, data, and colPtrs, but these might be incorrect because 
of breeze internal hacks. Therefore, we need to slice both rowIndices and data, 
using their counter of active data

This method is at least called by BlockMatrix when performing distributed block 
operations, causing exceptions on valid operations.

See 
http://stackoverflow.com/questions/33528555/error-thrown-when-using-blockmatrix-add

## How was this patch tested?

Added a test to MatricesSuite that verifies that the conversions are valid and 
that code doesn't crash. Originally the same code would crash on Spark.

Bugfix for https://issues.apache.org/jira/browse/SPARK-20687

Author: Ignacio Bermudez <ignaciobermu...@gmail.com>
Author: Ignacio Bermudez Corrales <icorra...@splunk.com>

Closes #17940 from ghoto/bug-fix/SPARK-20687.


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/06dda1d5
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/06dda1d5
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/06dda1d5

Branch: refs/heads/master
Commit: 06dda1d58f8670e996921e935d5f5402d664699e
Parents: a2b3b67
Author: Ignacio Bermudez <ignaciobermu...@gmail.com>
Authored: Mon May 22 10:27:28 2017 +0100
Committer: Sean Owen <so...@cloudera.com>
Committed: Mon May 22 10:27:28 2017 +0100

----------------------------------------------------------------------
 .../apache/spark/mllib/linalg/Matrices.scala    | 11 ++++++++++-
 .../spark/mllib/linalg/MatricesSuite.scala      | 20 ++++++++++++++++++++
 2 files changed, 30 insertions(+), 1 deletion(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/spark/blob/06dda1d5/mllib/src/main/scala/org/apache/spark/mllib/linalg/Matrices.scala
----------------------------------------------------------------------
diff --git a/mllib/src/main/scala/org/apache/spark/mllib/linalg/Matrices.scala 
b/mllib/src/main/scala/org/apache/spark/mllib/linalg/Matrices.scala
index 6c39fe5..2b2b5fe 100644
--- a/mllib/src/main/scala/org/apache/spark/mllib/linalg/Matrices.scala
+++ b/mllib/src/main/scala/org/apache/spark/mllib/linalg/Matrices.scala
@@ -992,7 +992,16 @@ object Matrices {
         new DenseMatrix(dm.rows, dm.cols, dm.data, dm.isTranspose)
       case sm: BSM[Double] =>
         // There is no isTranspose flag for sparse matrices in Breeze
-        new SparseMatrix(sm.rows, sm.cols, sm.colPtrs, sm.rowIndices, sm.data)
+        val nsm = if (sm.rowIndices.length > sm.activeSize) {
+          // This sparse matrix has trailing zeros.
+          // Remove them by compacting the matrix.
+          val csm = sm.copy
+          csm.compact()
+          csm
+        } else {
+          sm
+        }
+        new SparseMatrix(nsm.rows, nsm.cols, nsm.colPtrs, nsm.rowIndices, 
nsm.data)
       case _ =>
         throw new UnsupportedOperationException(
           s"Do not support conversion from type ${breeze.getClass.getName}.")

http://git-wip-us.apache.org/repos/asf/spark/blob/06dda1d5/mllib/src/test/scala/org/apache/spark/mllib/linalg/MatricesSuite.scala
----------------------------------------------------------------------
diff --git 
a/mllib/src/test/scala/org/apache/spark/mllib/linalg/MatricesSuite.scala 
b/mllib/src/test/scala/org/apache/spark/mllib/linalg/MatricesSuite.scala
index 5637569..93c00d8 100644
--- a/mllib/src/test/scala/org/apache/spark/mllib/linalg/MatricesSuite.scala
+++ b/mllib/src/test/scala/org/apache/spark/mllib/linalg/MatricesSuite.scala
@@ -513,6 +513,26 @@ class MatricesSuite extends SparkFunSuite {
     Matrices.fromBreeze(sum)
   }
 
+  test("Test FromBreeze when Breeze.CSCMatrix.rowIndices has trailing zeros. - 
SPARK-20687") {
+    // (2, 0, 0)
+    // (2, 0, 0)
+    val mat1Brz = Matrices.sparse(2, 3, Array(0, 2, 2, 2), Array(0, 1), 
Array(2, 2)).asBreeze
+    // (2, 1E-15, 1E-15)
+    // (2, 1E-15, 1E-15)
+    val mat2Brz = Matrices.sparse(2, 3,
+      Array(0, 2, 4, 6),
+      Array(0, 0, 0, 1, 1, 1),
+      Array(2, 1E-15, 1E-15, 2, 1E-15, 1E-15)).asBreeze
+    val t1Brz = mat1Brz - mat2Brz
+    val t2Brz = mat2Brz - mat1Brz
+    // The following operations raise exceptions on un-patch 
Matrices.fromBreeze
+    val t1 = Matrices.fromBreeze(t1Brz)
+    val t2 = Matrices.fromBreeze(t2Brz)
+    // t1 == t1Brz && t2 == t2Brz
+    assert((t1.asBreeze - t1Brz).iterator.map((x) => math.abs(x._2)).sum < 
1E-15)
+    assert((t2.asBreeze - t2Brz).iterator.map((x) => math.abs(x._2)).sum < 
1E-15)
+  }
+
   test("row/col iterator") {
     val dm = new DenseMatrix(3, 2, Array(0, 1, 2, 3, 4, 0))
     val sm = dm.toSparse


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org

Reply via email to