[ https://issues.apache.org/jira/browse/SPARK-20687?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16004975#comment-16004975 ]

Ignacio Bermudez Corrales commented on SPARK-20687:
---------------------------------------------------

When you perform operations like addition or subtraction between two 
mllib.linalg.distributed.BlockMatrix instances whose blocks are stored as sparse 
matrices, the blocks are operated on in Breeze and then converted back to mllib 
Matrices. This conversion back sometimes crashes, even though the resulting matrix 
is valid, because Matrices.fromBreeze does not correctly extract the data held in 
the Breeze CSC matrix.
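
To make the failure mode concrete, this is roughly what seems to happen in the 
conversion as far as I could tell while debugging (a sketch only, not a verbatim 
copy of the Spark source; the helper name is mine):

{code:title=sketch of the failing conversion|borderStyle=solid}
import breeze.linalg.{CSCMatrix => BSM}
import org.apache.spark.mllib.linalg.SparseMatrix

// Sketch only: after Breeze adds provisional zeros, the CSC arrays can be longer
// than the number of stored elements, i.e. sm.data.length > sm.colPtrs.last.
def naiveFromBreeze(sm: BSM[Double]): SparseMatrix = {
  // Passing the over-sized data/rowIndices straight through is what trips
  // "requirement failed: The last value of colPtrs must equal the number of elements."
  new SparseMatrix(sm.rows, sm.cols, sm.colPtrs, sm.rowIndices, sm.data)
}
{code}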

Unfortunately, I'm not able to share code with the actual block matrices, but I 
can show a backtrace. I debugged the crash manually and found the culprit, which 
is why I posted a much simpler snippet in the description that reproduces the 
error.

The snippet that triggers the crash is in BlockMatrix.scala, lines 374-379:

{code:title=BlockMatrix.scala (blockMap)|borderStyle=solid}
          } else if (b.isEmpty) {
            new MatrixBlock((blockRowIndex, blockColIndex), a.head)
          } else {
            val result = binMap(a.head.asBreeze, b.head.asBreeze)
            new MatrixBlock((blockRowIndex, blockColIndex), Matrices.fromBreeze(result)) // <-- not able to get results
          }
{code}


The stack trace after the operation between two Spark block matrices:

{code:text}
Job aborted due to stage failure: Task 0 in stage 31.0 failed 1 times, most recent failure: Lost task 0.0 in stage 31.0 (TID 34, localhost, executor driver): java.lang.IllegalArgumentException: requirement failed: The last value of colPtrs must equal the number of elements. values.length: 28, colPtrs.last: 15
        at scala.Predef$.require(Predef.scala:224)
        at org.apache.spark.mllib.linalg.SparseMatrix.<init>(Matrices.scala:590)
        at org.apache.spark.mllib.linalg.SparseMatrix.<init>(Matrices.scala:618)
        at org.apache.spark.mllib.linalg.Matrices$.fromBreeze(Matrices.scala:995)
        at org.apache.spark.mllib.linalg.distributed.BlockMatrix$$anonfun$10.apply(BlockMatrix.scala:378)
        at org.apache.spark.mllib.linalg.distributed.BlockMatrix$$anonfun$10.apply(BlockMatrix.scala:365)
        at scala.collection.Iterator$$anon$11.next(Iterator.scala:409)
        at scala.collection.Iterator$$anon$11.next(Iterator.scala:409)
        at scala.collection.Iterator$class.foreach(Iterator.scala:893)
        at scala.collection.AbstractIterator.foreach(Iterator.scala:1336)
        at scala.collection.TraversableOnce$class.foldLeft(TraversableOnce.scala:157)
        at scala.collection.AbstractIterator.foldLeft(Iterator.scala:1336)
        at scala.collection.TraversableOnce$class.fold(TraversableOnce.scala:212)
        at scala.collection.AbstractIterator.fold(Iterator.scala:1336)
        at org.apache.spark.rdd.RDD$$anonfun$fold$1$$anonfun$20.apply(RDD.scala:1087)
        at org.apache.spark.rdd.RDD$$anonfun$fold$1$$anonfun$20.apply(RDD.scala:1087)
        at org.apache.spark.SparkContext$$anonfun$33.apply(SparkContext.scala:2119)
        at org.apache.spark.SparkContext$$anonfun$33.apply(SparkContext.scala:2119)
        at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
        at org.apache.spark.scheduler.Task.run(Task.scala:108)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:335)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)
{code}
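
For what it's worth, the direction for a fix seems to be to drop the provisional 
entries before constructing the SparseMatrix. Below is only a sketch (the helper 
name is mine; it assumes the genuine entries occupy the first colPtrs.last 
positions of data and rowIndices, which matches what I saw while debugging):

{code:title=possible guard before constructing SparseMatrix|borderStyle=solid}
import breeze.linalg.{CSCMatrix => BSM}
import org.apache.spark.mllib.linalg.SparseMatrix

// Hypothetical helper: trim the provisional entries Breeze may leave at the
// end of data/rowIndices, then build the mllib SparseMatrix.
def fromBreezeCompacted(sm: BSM[Double]): SparseMatrix = {
  val nnz = sm.colPtrs.last  // number of genuinely stored elements
  if (sm.data.length == nnz) {
    new SparseMatrix(sm.rows, sm.cols, sm.colPtrs, sm.rowIndices, sm.data)
  } else {
    new SparseMatrix(sm.rows, sm.cols, sm.colPtrs,
      sm.rowIndices.take(nnz), sm.data.take(nnz))
  }
}
{code}

If Breeze's CSCMatrix also offers a compact() that trims its arrays to the active 
size, copying the matrix and compacting it before the conversion should achieve 
the same thing.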


> mllib.Matrices.fromBreeze may crash when converting breeze CSCMatrix
> --------------------------------------------------------------------
>
>                 Key: SPARK-20687
>                 URL: https://issues.apache.org/jira/browse/SPARK-20687
>             Project: Spark
>          Issue Type: Bug
>          Components: MLlib
>    Affects Versions: 2.1.1
>            Reporter: Ignacio Bermudez Corrales
>            Priority: Minor
>
> Conversion of Breeze sparse matrices to Matrix is broken when the matrices are 
> the product of certain operations. I think this problem is caused by the update 
> method in Breeze CSCMatrix, which adds provisional zeros to the data array for 
> efficiency.
> This bug is serious and may affect at least BlockMatrix addition and 
> subtraction.
> http://stackoverflow.com/questions/33528555/error-thrown-when-using-blockmatrix-add/43883458#43883458
> The following code reproduces the bug (check test("breeze conversion bug")):
> https://github.com/ghoto/spark/blob/test-bug/CSCMatrixBreeze/mllib/src/test/scala/org/apache/spark/mllib/linalg/MatricesSuite.scala
> {code:title=MatricesSuite.scala|borderStyle=solid}
>   test("breeze conversion bug") {
>     // (2, 0, 0)
>     // (2, 0, 0)
>     val mat1Brz = Matrices.sparse(2, 3, Array(0, 2, 2, 2), Array(0, 1), Array(2, 2)).asBreeze
>     // (2, 1E-15, 1E-15)
>     // (2, 1E-15, 1E-15)
>     val mat2Brz = Matrices.sparse(2, 3, Array(0, 2, 4, 6), Array(0, 0, 0, 1, 1, 1), Array(2, 1E-15, 1E-15, 2, 1E-15, 1E-15)).asBreeze
>     // The following shouldn't break
>     val t01 = mat1Brz - mat1Brz
>     val t02 = mat2Brz - mat2Brz
>     val t02Brz = Matrices.fromBreeze(t02)
>     val t01Brz = Matrices.fromBreeze(t01)
>     val t1Brz = mat1Brz - mat2Brz
>     val t2Brz = mat2Brz - mat1Brz
>     // The following ones should break
>     val t1 = Matrices.fromBreeze(t1Brz)
>     val t2 = Matrices.fromBreeze(t2Brz)
>   }
> {code}


