[ 
https://issues.apache.org/jira/browse/SYSTEMML-1774?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fei Hu updated SYSTEMML-1774:
-----------------------------
    Description: When running the [distributed MNIST LeNet example | 
https://github.com/apache/systemml/blob/master/scripts/nn/examples/mnist_lenet_distrib_sgd.dml],
 each mini-batch could ideally run in parallel without interaction. We tried to 
force {{REMOTE_SPARK}} mode by changing {{parfor (j in 1:parallel_batches)}} at 
line 137 of {{nn/examples/mnist_lenet_distrib_sgd.dml}} to {{parfor (j in 
1:parallel_batches, mode=REMOTE_SPARK, opt=CONSTRAINED)}}, but got errors such 
as {{org.apache.sysml.runtime.DMLRuntimeException: Not supported: Instructions 
of type other than CP instructions}}. More log information can be found in the 
comments below. For example, the convolutional layer needs to randomly generate 
some matrices, but SystemML chooses {{RandSPInstruction}} instead of 
{{DataGenCPInstruction}}, possibly because it cannot determine the number of 
rows of the matrix. 
For this distributed MNIST LeNet example, using CP instructions may achieve 
better performance.   (was: When running the  [distributed MNIST LeNet example 
| 
https://github.com/apache/systemml/blob/master/scripts/nn/examples/mnist_lenet_distrib_sgd.dml],
 each mini-batch could ideally run in parallel without interaction. We try to 
force {{parfor (j in 1:parallel_batches)}} at line 137 of 
{{nn/examples/mnist_lenet_distrib_sgd.dml}} to be {{parfor (j in 
1:parallel_batches, mode=REMOTE_SPARK, opt=CONSTRAINED)}} use {{REMOTE_SPARK}} 
mode, but got some errors about {{org.apache.sysml.runtime.DMLRuntimeException: 
Not supported: Instructions of type other than CP instructions}}. More log 
information can be found at the following comments. One example of the errors 
is that at the convolution layer, we need to randomly generate a matrix, but 
SystemML choose {{RandSPInstruction}} instead of {{DataGenCPInstruction}}, 
which may be because SystemML could not determine the row number of the matrix. 
For this distributed MNIST LeNet  example, using CPInstruction may achieve 
better performance. )

> Improve Parfor parallelism for deep learning
> --------------------------------------------
>
>                 Key: SYSTEMML-1774
>                 URL: https://issues.apache.org/jira/browse/SYSTEMML-1774
>             Project: SystemML
>          Issue Type: Improvement
>          Components: Algorithms, Compiler, ParFor
>    Affects Versions: SystemML 1.0
>            Reporter: Fei Hu
>              Labels: deeplearning
>
> When running the [distributed MNIST LeNet example | 
> https://github.com/apache/systemml/blob/master/scripts/nn/examples/mnist_lenet_distrib_sgd.dml],
>  each mini-batch could ideally run in parallel without interaction. We tried 
> to force {{REMOTE_SPARK}} mode by changing {{parfor (j in 
> 1:parallel_batches)}} at line 137 of 
> {{nn/examples/mnist_lenet_distrib_sgd.dml}} to {{parfor (j in 
> 1:parallel_batches, mode=REMOTE_SPARK, opt=CONSTRAINED)}}, but got errors 
> such as {{org.apache.sysml.runtime.DMLRuntimeException: Not supported: 
> Instructions of type other than CP instructions}}. More log information can 
> be found in the comments below. For example, the convolutional layer needs to 
> randomly generate some matrices, but SystemML chooses {{RandSPInstruction}} 
> instead of {{DataGenCPInstruction}}, possibly because it cannot determine the 
> number of rows of the matrix. For this distributed MNIST LeNet example, using 
> CP instructions may achieve better performance. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
