[jira] [Commented] (SYSTEMML-1140) Sparse/Caching performance bugs related to deep learning scripts
[ https://issues.apache.org/jira/browse/SYSTEMML-1140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15847738#comment-15847738 ]

Mike Dusenberry commented on SYSTEMML-1140:
-------------------------------------------

Awesome, thanks [~mboehm7]. I agree with [~niketanpansare] that this will greatly help with the breast cancer project as well.

> Sparse/Caching performance bugs related to deep learning scripts
>
> Key: SYSTEMML-1140
> URL: https://issues.apache.org/jira/browse/SYSTEMML-1140
> Project: SystemML
> Issue Type: Bug
> Affects Versions: SystemML 1.0
> Reporter: Niketan Pansare
> Priority: Blocker
>
> We have identified two performance bugs that frequently occur in deep learning scripts.
>
> First, we repeatedly perform unnecessary conversions to sparse format, even though operations such as matrix multiplication (including BLAS and CuBLAS) are optimized for dense data.
>
> Second, even with a large memory budget, we sometimes spend almost 20-30% of execution time in caching.
>
> [~mboehm7] [~reinwald] [~mwdus...@us.ibm.com] I am labeling this bug as a blocker for SystemML 1.0. Please feel free to assign this issue to yourself.
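To make the first bug concrete, here is a minimal NumPy/SciPy timing sketch (an illustration, not SystemML internals; the shapes and seed are arbitrary) of why round-tripping a moderately sparse matrix, such as a relu output with roughly 50% non-zeros, through a sparse format around a dense-optimized matrix multiply is wasted work:

{code:python}
# Illustrative sketch, not SystemML code: a relu-like matrix at ~50%
# non-zeros is converted to CSR before a matmul, mimicking an unnecessary
# dense->sparse conversion in front of a dense-optimized kernel.
import time
import numpy as np
from scipy import sparse

rng = np.random.default_rng(1)
X = np.maximum(rng.standard_normal((2000, 2000)), 0)  # relu-like, ~50% zeros
W = rng.standard_normal((2000, 2000))

t0 = time.perf_counter()
Y_dense = X @ W                   # stays dense: hits the BLAS-optimized path
t_dense = time.perf_counter() - t0

t0 = time.perf_counter()
Xs = sparse.csr_matrix(X)         # unnecessary dense->sparse conversion
Y_sparse = Xs @ W                 # sparse-dense matmul, slow at ~50% nnz
t_sparse = time.perf_counter() - t0

assert np.allclose(Y_dense, Y_sparse)
print(f"dense: {t_dense:.3f}s  sparse round-trip: {t_sparse:.3f}s")
{code}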
[jira] [Commented] (SYSTEMML-1140) Sparse/Caching performance bugs related to deep learning scripts
[ https://issues.apache.org/jira/browse/SYSTEMML-1140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15847691#comment-15847691 ]

Niketan Pansare commented on SYSTEMML-1140:
-------------------------------------------

Thanks [~mboehm7]. That will also help speed up the breast cancer use case :)
[jira] [Commented] (SYSTEMML-1140) Sparse/Caching performance bugs related to deep learning scripts
[ https://issues.apache.org/jira/browse/SYSTEMML-1140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15847687#comment-15847687 ]

Matthias Boehm commented on SYSTEMML-1140:
------------------------------------------

OK, thanks. As it's on the algorithm side, it will take a while, but I'll look into it next week.
[jira] [Commented] (SYSTEMML-1140) Sparse/Caching performance bugs related to deep learning scripts
[ https://issues.apache.org/jira/browse/SYSTEMML-1140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15847664#comment-15847664 ]

Niketan Pansare commented on SYSTEMML-1140:
-------------------------------------------

Sorry, I forgot to update this JIRA with a series of improvements related to this PR:

1. Many CP convolution operators now have sparse support (except im2col). However, since CuDNN does not have a sparse equivalent, we only support dense convolution on the GPU.
2. Fused operators such as relu_maxpooling and relu_backward have been added to reduce the conversion overhead of sparsity-introducing operators such as relu. In fact, the performance of relu_maxpooling is exactly the same as that of maxpooling in CP, making relu a no-op in the fused implementation :)

[~mboehm7] I used Mike's LeNet script with the MNIST dataset as an example. Please see https://github.com/apache/incubator-systemml/blob/master/scripts/staging/SystemML-NN/examples/Example%20-%20MNIST%20LeNet.ipynb

Here are the cache statistics from a sample run after adding the above-mentioned fused operators (date: Jan 13th, 2017):

Cache hits (Mem, WB, FS, HDFS): 1096424/0/0/2.
Cache writes (WB, FS, HDFS): 603950/15/8.
Cache times (ACQr/m, RLS, EXP): 3.659/0.456/273.799/1.275 sec.

I have seen anywhere between 250 and 500 seconds spent in cache times. You can also use Mike's Breast Cancer Project as an example workload.
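A side note on why relu becomes a no-op inside the fused relu_maxpooling operator: max is monotone, so max-pooling commutes with relu, and the relu can be applied to the much smaller pooled output instead of materializing a sparsity-introducing intermediate the size of the input. A minimal NumPy sketch of that identity (the 2x2 pooling helper and shapes are illustrative, not SystemML code):

{code:python}
# Illustrative sketch: maxpool(relu(X)) == relu(maxpool(X)) because max is
# monotone, so a fused operator can defer relu to the small pooled output.
import numpy as np

rng = np.random.default_rng(42)
X = rng.standard_normal((8, 8))   # zero-centered activations, ~50% negative

def relu(a):
    return np.maximum(a, 0)

def maxpool2x2(a):
    h, w = a.shape
    return a.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

naive = maxpool2x2(relu(X))   # materializes a half-empty 8x8 intermediate
fused = relu(maxpool2x2(X))   # relu applied to the 4x4 pooled output only
assert np.allclose(naive, fused)
{code}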
[jira] [Commented] (SYSTEMML-1140) Sparse/Caching performance bugs related to deep learning scripts
[ https://issues.apache.org/jira/browse/SYSTEMML-1140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15805317#comment-15805317 ]

Mike Dusenberry commented on SYSTEMML-1140:
-------------------------------------------

Copying from GitHub [PR 329|https://github.com/apache/incubator-systemml/pull/329]:

I think that given the newer and expanded range of ML workloads now being run on SystemML, we should go back through the system and challenge previous assumptions, such as the sparsity threshold. We've already started this by breaking previous assumptions of tall-skinny matrices, for example. A big issue we're seeing now is models that contain several computational transformations ("layers") rather than just a couple. Let's pause [...] and focus on [...] a more general approach of reevaluating previous assumptions in the system. Sparse support, caching, and native BLAS come to mind immediately.
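For context on the sparsity-threshold assumption being challenged here: the system picks a dense or sparse representation by comparing a matrix's fraction of non-zeros against a fixed turn point, and relu outputs in deep learning workloads hover right around any such fixed value, so small shifts in the activation distribution flip the representation and trigger the conversions described above. A minimal sketch of such a policy (the 0.4 turn point and the helper function are illustrative assumptions, not the actual SystemML heuristic, which also weighs memory estimates):

{code:python}
# Illustrative sketch of a fixed sparsity-threshold policy; the 0.4 value
# is an assumption for illustration, not the actual SystemML constant.
import numpy as np

SPARSITY_TURN_POINT = 0.4  # store sparse iff nnz fraction falls below this

def choose_format(X: np.ndarray) -> str:
    sparsity = np.count_nonzero(X) / X.size  # fraction of non-zeros
    return "sparse" if sparsity < SPARSITY_TURN_POINT else "dense"

rng = np.random.default_rng(0)
A = rng.standard_normal((1000, 1000))
print(choose_format(A))                       # "dense": ~100% non-zero
print(choose_format(np.maximum(A, 0)))        # "dense": relu leaves ~50% nnz
print(choose_format(np.maximum(A - 0.5, 0)))  # "sparse": ~31% non-zero
{code}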