[ https://issues.apache.org/jira/browse/SYSTEMML-1140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15847664#comment-15847664 ]

Niketan Pansare commented on SYSTEMML-1140:
-------------------------------------------

Sorry, I forgot to update this JIRA with a series of improvements related to this 
PR:
1. Many CP convolution operators now have sparse support (except im2col). 
However, since cuDNN does not have a sparse equivalent, we only support dense 
convolution on the GPU.
2. Fused operators such as relu_maxpooling and relu_backward have been added to 
reduce the conversion overhead of sparsity-introducing operators such as relu. 
In fact, the performance of relu_maxpooling in CP is exactly the same as that of 
maxpooling, making relu a no-op in the fused implementation :) A short sketch of 
why this works is included below.
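
To illustrate point 2, here is a minimal NumPy sketch (my own illustration of 
the algebra, not the SystemML operator code): relu is monotone, so it commutes 
with the max taken inside each pooling window, and the fused operator only has 
to clamp the much smaller pooled output.

import numpy as np

def relu(x):
    return np.maximum(x, 0)

def maxpool2d(x, k=2):
    # Naive non-overlapping k x k max pooling on an (H, W) matrix.
    h, w = x.shape
    return x[:h - h % k, :w - w % k].reshape(h // k, k, w // k, k).max(axis=(1, 3))

rng = np.random.default_rng(0)
x = rng.standard_normal((8, 8))

# maxpool(relu(X)) == relu(maxpool(X)) because relu is monotone, so a fused
# relu_maxpooling can run plain max pooling and apply relu only to the pooled
# result, which is why its runtime matches that of maxpooling alone.
assert np.allclose(maxpool2d(relu(x)), relu(maxpool2d(x)))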

[~mboehm7] I used Mike's LeNet script on the MNIST dataset as an example. Please 
see 
https://github.com/apache/incubator-systemml/blob/master/scripts/staging/SystemML-NN/examples/Example%20-%20MNIST%20LeNet.ipynb
Here are the cache statistics from a sample run after adding the above-mentioned 
fused operators (date: Jan 13th, 2017):

Cache hits (Mem, WB, FS, HDFS): 1096424/0/0/2.
Cache writes (WB, FS, HDFS): 603950/15/8.
Cache times (ACQr/m, RLS, EXP): 3.659/0.456/273.799/1.275 sec.

I have seen anywhere between 250 and 500 seconds spent in cache times; in the 
run above, almost all of it is in the release (RLS) component.
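
For reference, here is a rough sketch of how one can collect these statistics 
from the Python MLContext API (this assumes a running SparkContext sc; the toy 
DML script below is just a stand-in, not the LeNet notebook):

# Assumes pyspark with the systemml package installed and a SparkContext sc.
from systemml import MLContext, dml

ml = MLContext(sc)
ml.setStatistics(True)   # enable runtime statistics, incl. the cache section

toy = dml("""
X = rand(rows=1000, cols=784)
print(sum(X %*% t(X)))
""")
ml.execute(toy)          # cache hits/writes/times are printed after execution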

You can also use Mike's Breast Cancer Project as an example workload.

> Sparse/Caching performance bugs related to deep learning scripts
> ----------------------------------------------------------------
>
>                 Key: SYSTEMML-1140
>                 URL: https://issues.apache.org/jira/browse/SYSTEMML-1140
>             Project: SystemML
>          Issue Type: Bug
>    Affects Versions: SystemML 1.0
>            Reporter: Niketan Pansare
>            Priority: Blocker
>
> We have identified two performance bugs that frequently occur in deep 
> learning scripts.
> First, we repeatedly perform unnecessary conversions to sparse format, even 
> though operations such as matrix multiplication (including BLAS and CuBLAS) 
> are optimized for dense data.
>       
> Second, even with a large memory budget, we sometimes spend almost 20-30% of 
> the time in caching.
> [~mboehm7] [~reinwald] [~mwdus...@us.ibm.com] I am labeling this bug as a 
> blocker for SystemML 1.0. Please feel free to assign this issue to yourself.



