Hi, I'm trying to use the distributed matrix data structure BlockMatrix (Spark 1.5.0, Scala), and I'm running into an issue when adding two block matrices together (error attached below).
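To make the setup concrete, here is roughly the shape of my code. This is a sketch: the dimensions, block sizes, and entry values are placeholders for illustration; the real code builds the entries from my data.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.mllib.linalg.distributed.{CoordinateMatrix, MatrixEntry}

object BlockMatrixAddRepro {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setAppName("blockmatrix-add").setMaster("local[*]"))

    // Placeholder sizes -- the real matrices are built from my data.
    val nRows = 1000L
    val nCols = 1000L
    val rowsPerBlock = 100
    val colsPerBlock = 100

    val entriesA = sc.parallelize(Seq(MatrixEntry(0, 0, 1.0), MatrixEntry(5, 7, 2.0)))
    val entriesB = sc.parallelize(Seq(MatrixEntry(0, 0, 3.0), MatrixEntry(9, 9, 4.0)))

    // Same nRows/nCols and the same block sizes for both matrices.
    val matA = new CoordinateMatrix(entriesA, nRows, nCols)
      .toBlockMatrix(rowsPerBlock, colsPerBlock)
    val matB = new CoordinateMatrix(entriesB, nRows, nCols)
      .toBlockMatrix(rowsPerBlock, colsPerBlock)

    // The add itself is lazy; the exception surfaces once an action
    // forces the summed blocks to be computed.
    val sum = matA.add(matB)
    sum.blocks.count()

    sc.stop()
  }
}
```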
I construct each matrix by building a collection of MatrixEntry objects, wrapping that in a CoordinateMatrix (specifying nRows and nCols), and then calling the CoordinateMatrix method toBlockMatrix(rowsPerBlock, colsPerBlock). Both matrices use the same rowsPerBlock/colsPerBlock values. Unfortunately, when I call BlockMatrix.add I get:

15/11/04 10:17:27 ERROR executor.Executor: Exception in task 0.0 in stage 11.0 (TID 30)
java.lang.IllegalArgumentException: requirement failed: The last value of colPtrs must equal the number of elements. values.length: 9164, colPtrs.last: 5118
	at scala.Predef$.require(Predef.scala:233)
	at org.apache.spark.mllib.linalg.SparseMatrix.<init>(Matrices.scala:373)
	at org.apache.spark.mllib.linalg.SparseMatrix.<init>(Matrices.scala:400)
	at org.apache.spark.mllib.linalg.Matrices$.fromBreeze(Matrices.scala:701)
	at org.apache.spark.mllib.linalg.distributed.BlockMatrix$$anonfun$5.apply(BlockMatrix.scala:321)
	at org.apache.spark.mllib.linalg.distributed.BlockMatrix$$anonfun$5.apply(BlockMatrix.scala:310)
	at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
	at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371)
	at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
	at org.apache.spark.util.collection.ExternalSorter.insertAll(ExternalSorter.scala:202)
	at org.apache.spark.shuffle.sort.SortShuffleWriter.write(SortShuffleWriter.scala:56)
	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68)
	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
	at org.apache.spark.scheduler.Task.run(Task.scala:64)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:203)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
	at java.lang.Thread.run(Thread.java:745)
15/11/04 10:17:27 INFO scheduler.TaskSetManager: Starting task 2.0 in stage 11.0 (TID 32, localhost, PROCESS_LOCAL, 2171 bytes)
15/11/04 10:17:27 INFO executor.Executor: Running task 2.0 in stage 11.0 (TID 32)
15/11/04 10:17:27 WARN scheduler.TaskSetManager: Lost task 0.0 in stage 11.0 (TID 30, localhost): java.lang.IllegalArgumentException: requirement failed: The last value of colPtrs must equal the number of elements. values.length: 9164, colPtrs.last: 5118
	[same stack trace as above]
15/11/04 10:17:27 ERROR scheduler.TaskSetManager: Task 0 in stage 11.0 failed 1 times; aborting job
15/11/04 10:17:27 INFO storage.ShuffleBlockFetcherIterator: Getting 4 non-empty blocks out of 4 blocks
15/11/04 10:17:27 INFO storage.ShuffleBlockFetcherIterator: Started 0 remote fetches in 0 ms
15/11/04 10:17:27 INFO storage.ShuffleBlockFetcherIterator: Getting 2 non-empty blocks out of 4 blocks
15/11/04 10:17:27 INFO storage.ShuffleBlockFetcherIterator: Started 0 remote fetches in 1 ms
15/11/04 10:17:27 INFO scheduler.TaskSchedulerImpl: Cancelling stage 11
15/11/04 10:17:27 INFO executor.Executor: Executor is trying to kill task 1.0 in stage 11.0 (TID 31)
15/11/04 10:17:27 INFO scheduler.TaskSchedulerImpl: Stage 11 was cancelled
15/11/04 10:17:27 INFO executor.Executor: Executor is trying to kill task 2.0 in stage 11.0 (TID 32)
15/11/04 10:17:27 INFO scheduler.DAGScheduler: Stage 11 (map at kmv.scala:26) failed in 0.114 s
15/11/04 10:17:27 INFO executor.Executor: Executor killed task 2.0 in stage 11.0 (TID 32)
15/11/04 10:17:27 INFO scheduler.DAGScheduler: Job 2 failed: reduce at CoordinateMatrix.scala:143, took 6.046350 s
Exception in thread "main" org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 11.0 failed 1 times, most recent failure: Lost task 0.0 in stage 11.0 (TID 30, localhost): java.lang.IllegalArgumentException: requirement failed: The last value of colPtrs must equal the number of elements. values.length: 9164, colPtrs.last: 5118
	[same stack trace as above]
Driver stacktrace:
	at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1204)
	at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1193)
	at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1192)
	at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
	at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
	at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1192)
	at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:693)
	at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:693)
	at scala.Option.foreach(Option.scala:236)
	at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:693)
	at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1393)
	at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1354)
	at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
15/11/04 10:17:27 WARN scheduler.TaskSetManager: Lost task 2.0 in stage 11.0 (TID 32, localhost): TaskKilled (killed intentionally)
15/11/04 10:17:27 INFO executor.Executor: Executor killed task 1.0 in stage 11.0 (TID 31)
15/11/04 10:17:27 WARN scheduler.TaskSetManager: Lost task 1.0 in stage 11.0 (TID 31, localhost): TaskKilled (killed intentionally)
15/11/04 10:17:27 INFO scheduler.TaskSchedulerImpl: Removed TaskSet 11.0, whose tasks have all completed, from pool

Thanks for any help!

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Problem-using-BlockMatrix-add-tp25273.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.