[ https://issues.apache.org/jira/browse/SYSTEMML-1762?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Fei Hu updated SYSTEMML-1762:
-----------------------------
Description:
When running the [distributed MNIST LeNet example|https://github.com/apache/systemml/blob/master/scripts/nn/examples/mnist_lenet_distrib_sgd.dml], the script works in hybrid mode. In Spark mode, however, it fails with {{java.lang.NullPointerException}} and {{java.lang.ArrayIndexOutOfBoundsException: 1000}} while reshaping a matrix. The affected functions are {{org.apache.sysml.runtime.matrix.data.LibMatrixReorg#reshapeSparse}} and {{org.apache.sysml.runtime.matrix.data.LibMatrixReorg#reshapeDense(org.apache.sysml.runtime.matrix.data.MatrixBlock, long, long, java.util.HashMap<org.apache.sysml.runtime.matrix.data.MatrixIndexes,org.apache.sysml.runtime.matrix.data.MatrixBlock>, long, long, long, long, int, int, boolean)}}.

The cause is that the output block index computed by {{org.apache.sysml.runtime.matrix.data.LibMatrixReorg#computeResultBlockIndex}} does not match the keys in the {{HashMap<MatrixIndexes,MatrixBlock> rix}}, so {{MatrixBlock out = rix.get(ixtmp)}} can return {{null}}.

To reproduce the error, run the distributed MNIST example with the attached Scala file {{MNIST_Distrib_Sgd.scala}}.

If code is added to skip the null output block returned by {{MatrixBlock out = rix.get(ixtmp)}}, the distributed MNIST example does run in Spark mode, but the results may be incorrect.

was:
When running the [distributed MNIST LeNet example|https://github.com/apache/systemml/blob/master/scripts/nn/examples/mnist_lenet_distrib_sgd.dml], the script works in hybrid mode. In Spark mode, however, it fails with {{java.lang.NullPointerException}} and {{java.lang.ArrayIndexOutOfBoundsException: 1000}} while reshaping a sparse matrix. The affected function is {{org.apache.sysml.runtime.matrix.data.LibMatrixReorg#reshapeSparse}}.

The cause is that the output block index computed by {{org.apache.sysml.runtime.matrix.data.LibMatrixReorg#computeResultBlockIndex}} does not match the keys in the {{HashMap<MatrixIndexes,MatrixBlock> rix}}.

To reproduce the error, run the distributed MNIST example with the attached Scala file {{MNIST_Distrib_Sgd.scala}}.

If code is added to skip the null output block returned by {{MatrixBlock out = rix.get(ixtmp)}}, the distributed MNIST example does run in Spark mode, but the results may be incorrect.

> Fix the matrix reshape function for the Spark mode
> --------------------------------------------------
>
>                 Key: SYSTEMML-1762
>                 URL: https://issues.apache.org/jira/browse/SYSTEMML-1762
>             Project: SystemML
>          Issue Type: Bug
>          Components: Algorithms, ParFor, Runtime
>            Reporter: Fei Hu
>            Assignee: Fei Hu
>         Attachments: MNIST_Distrib_Sgd.scala
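To make the index mismatch in this issue concrete, here is a minimal, hypothetical Java sketch. It is not SystemML code and does not reproduce {{computeResultBlockIndex}} or the eventual fix. It assumes a row-wise reshape, 1-based block indexes, a simplified {{BlockIndex}} class standing in for {{MatrixIndexes}}, and an integer cell counter standing in for each output {{MatrixBlock}}; the names {{resultBlockIndex}}, {{createResultBlocks}}, {{scatter}} and {{reshapeBlock}} are all invented for illustration. The point it shows is that the pre-allocated output map and the per-cell lookups must use the same index function; when a key is missing, the sketch fails with the offending index instead of a {{NullPointerException}}.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch, not SystemML code: row-wise reshape of one input block,
// with output blocks keyed by 1-based (row block, column block) indexes.
final class ReshapeIndexSketch {

    // Simplified stand-in for MatrixIndexes; equals/hashCode make it a valid HashMap key.
    static final class BlockIndex {
        final long row;
        final long col;

        BlockIndex(long row, long col) {
            this.row = row;
            this.col = col;
        }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof BlockIndex)) return false;
            BlockIndex b = (BlockIndex) o;
            return row == b.row && col == b.col;
        }

        @Override
        public int hashCode() {
            return 31 * Long.hashCode(row) + Long.hashCode(col);
        }

        @Override
        public String toString() {
            return "(" + row + "," + col + ")";
        }
    }

    // Output block of the 0-based global cell (ai, aj) when a matrix with `cols`
    // columns is reshaped row-wise into a matrix with `cols2` columns.
    static BlockIndex resultBlockIndex(long ai, long aj, long cols, long cols2, int brlen, int bclen) {
        long linear = ai * cols + aj; // row-major position of the cell
        long ci = linear / cols2;     // 0-based row in the reshaped matrix
        long cj = linear % cols2;     // 0-based column in the reshaped matrix
        return new BlockIndex(ci / brlen + 1, cj / bclen + 1);
    }

    // Pre-allocate one (empty) output entry per block touched by the input block,
    // using the SAME index function that the scatter step uses for lookups.
    static Map<BlockIndex, Integer> createResultBlocks(long rowOff, long colOff, int nRows, int nCols,
                                                       long cols, long cols2, int brlen, int bclen) {
        Map<BlockIndex, Integer> rix = new HashMap<>();
        for (int i = 0; i < nRows; i++)
            for (int j = 0; j < nCols; j++)
                rix.putIfAbsent(resultBlockIndex(rowOff + i, colOff + j, cols, cols2, brlen, bclen), 0);
        return rix;
    }

    // Route every cell to its output block; each map value counts the cells received.
    static Map<BlockIndex, Integer> scatter(Map<BlockIndex, Integer> rix, long rowOff, long colOff,
                                            int nRows, int nCols, long cols, long cols2, int brlen, int bclen) {
        for (int i = 0; i < nRows; i++)
            for (int j = 0; j < nCols; j++) {
                BlockIndex ix = resultBlockIndex(rowOff + i, colOff + j, cols, cols2, brlen, bclen);
                Integer count = rix.get(ix);
                if (count == null) // report the missing key instead of failing later with an NPE
                    throw new IllegalStateException("no output block for index " + ix);
                rix.put(ix, count + 1);
            }
        return rix;
    }

    // Convenience wrapper: allocate consistent output blocks, then scatter into them.
    static Map<BlockIndex, Integer> reshapeBlock(long rowOff, long colOff, int nRows, int nCols,
                                                 long cols, long cols2, int brlen, int bclen) {
        Map<BlockIndex, Integer> rix = createResultBlocks(rowOff, colOff, nRows, nCols, cols, cols2, brlen, bclen);
        return scatter(rix, rowOff, colOff, nRows, nCols, cols, cols2, brlen, bclen);
    }

    public static void main(String[] args) {
        // Top-left 2x2 block of a 4x6 matrix reshaped to 8x3, using 2x2 blocks.
        System.out.println(reshapeBlock(0, 0, 2, 2, 6, 3, 2, 2));
    }
}
```

In this sketch, skipping a {{null}} lookup instead of throwing would silently drop the affected cells from the output, which is one way a workaround like the one described above could let the job finish while producing wrong values.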