[jira] [Updated] (SYSTEMML-633) Improve Left-Indexing Performance with (Nested) Parfor Loops in UDFs
[ https://issues.apache.org/jira/browse/SYSTEMML-633?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mike Dusenberry updated SYSTEMML-633: - Attachment: time_06.11.16.txt log_06.11.16.txt perf-tests.tar.gz @mboehm I ran the experiment again from [commit c76b01a753837150c590c79557acdccb9d756a7e | https://github.com/apache/incubator-systemml/commit/c76b01a753837150c590c79557acdccb9d756a7e] on the same server, with the same singlenode execution mode. The performance is similar, and it does not appear to be applying the update-in-place rule. I also tried without singlenode flagged, but the performance was worse due to MR jobs, despite increasing the amount of memory excessively. I've attached {{time_06.11.16.txt}} with the timings, and {{log_06.11.16.txt}} with the full log of all tests. I also included {{perf-tests.tar.gz}}, which is a full tar archive that has everything needed to reproduce the results, minus the JAR file. Based on upload size limits, I couldn't include the standalone SystemML JAR file -- just build from the above commit and drop the standalone JAR into the {{perf-tests}} folder. For Python, you'll just need to quickly pip install TensorFlow with [these directions | https://www.tensorflow.org/versions/r0.9/get_started/os_setup.html#pip-installation]. Execute the experiments with {{run.sh}}. 
> Improve Left-Indexing Performance with (Nested) Parfor Loops in UDFs
>
> Key: SYSTEMML-633
> URL: https://issues.apache.org/jira/browse/SYSTEMML-633
> Project: SystemML
> Issue Type: Improvement
> Components: ParFor
> Reporter: Mike Dusenberry
> Priority: Blocker
> Attachments: Im2colWrapper.java, log.txt, log.txt, log_06.11.16.txt, perf-dml.dml, perf-tests.tar.gz, perf-tf.py, perf.sh, run.sh, systemml-nn-05.16.16.zip, systemml-nn.zip, time.txt, time_06.11.16.txt
>
> In the experimental deep learning DML library I've been building ([https://github.com/dusenberrymw/systemml-nn|https://github.com/dusenberrymw/systemml-nn]), I've experienced severe bottlenecks due to *left-indexing* in parfor loops. Here, I will highlight a few particular instances with simplified examples, but the same issue is shared across many areas of the library, particularly in the convolution and max pooling layers, and is exaggerated in real use-cases.
>
> *Quick note* on setup for any of the below experiments. Please grab a copy of the above repo (particularly the {{nn}} directory), and run any experiments with the {{nn}} package available at the base directory of the experiment.
>
> Scenario: *Convolution*
> * In the library above, the forward pass of the convolution function ([{{conv::forward(...)}} | https://github.com/dusenberrymw/systemml-nn/blob/f6d3e077ae3c303eb8426b31329d3734e3483d5f/nn/layers/conv.dml#L8] in {{nn/layers/conv.dml}}) essentially accepts a matrix {{X}} of images, a matrix of weights {{W}}, and several other parameters corresponding to image sizes, filter sizes, etc. It then loops through the images with a {{parfor}} loop, and for each image it pads the image with {{util::pad_image}}, extracts "patches" of the image into columns of a matrix in a sliding fashion across the image with {{util::im2col}}, performs a matrix multiplication between the matrix of patch columns and the weight matrix, and then saves the result into a matrix defined outside of the parfor loop using left-indexing.
> * Left-indexing has been identified as the bottleneck by a wide margin.
> * Left-indexing is used in the main {{conv::forward(...)}} function in the [last line in the parfor loop|https://github.com/dusenberrymw/systemml-nn/blob/f6d3e077ae3c303eb8426b31329d3734e3483d5f/nn/layers/conv.dml#L61], in the [{{util::pad_image(...)}}|https://github.com/dusenberrymw/systemml-nn/blob/f6d3e077ae3c303eb8426b31329d3734e3483d5f/nn/util.dml#L196] function used by {{conv::forward(...)}}, as well as in the [{{util::im2col(...)}}|https://github.com/dusenberrymw/systemml-nn/blob/f6d3e077ae3c303eb8426b31329d3734e3483d5f/nn/util.dml#L96] function used by {{conv::forward(...)}}.
> * Test script (assuming the {{nn}} package is available):
> ** {{speed-633.dml}}
> {code}
> source("nn/layers/conv.dml") as conv
> source("nn/util.dml") as util
> # Generate data
> N = 64 # num examples
> C = 30 # num channels
> Hin = 28 # input height
> Win = 28 # input width
> F = 20 # num filters
> Hf = 3 # filter height
> Wf = 3 # filter width
> stride = 1
> pad = 1
> X = rand(rows=N, cols=C*Hin*Win)
> # Create layer
> [W, b] = conv::init(F, C, Hf, Wf)
> # Forward
> [out, Hout, Wout] = conv::forward(X, W, b, C, Hin, Win, Hf, Wf, stride, stride, pad, pad)
> print("Out: " + nrow(out) + "x" + ncol
[jira] [Comment Edited] (SYSTEMML-633) Improve Left-Indexing Performance with (Nested) Parfor Loops in UDFs
[ https://issues.apache.org/jira/browse/SYSTEMML-633?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15328735#comment-15328735 ] Mike Dusenberry edited comment on SYSTEMML-633 at 6/14/16 1:00 AM: --- [~mboehm7] I ran the experiment again from [commit c76b01a753837150c590c79557acdccb9d756a7e | https://github.com/apache/incubator-systemml/commit/c76b01a753837150c590c79557acdccb9d756a7e] on the same server, with the same singlenode execution mode. The performance is similar, and it does not appear to be applying the update-in-place rule. I also tried without singlenode flagged, but the performance was worse due to MR jobs, despite increasing the amount of memory excessively. I've attached {{time_06.11.16.txt}} with the timings, and {{log_06.11.16.txt}} with the full log of all tests. I also included {{perf-tests.tar.gz}}, which is a full tar archive that has everything needed to reproduce the results, minus the JAR file. Based on upload size limits, I couldn't include the standalone SystemML JAR file -- just build from the above commit and drop the standalone JAR into the {{perf-tests}} folder. For Python, you'll just need to quickly pip install TensorFlow with [these directions | https://www.tensorflow.org/versions/r0.9/get_started/os_setup.html#pip-installation]. Execute the experiments with {{run.sh}}. was (Author: mwdus...@us.ibm.com): @mboehm I ran the experiment again from [commit c76b01a753837150c590c79557acdccb9d756a7e | https://github.com/apache/incubator-systemml/commit/c76b01a753837150c590c79557acdccb9d756a7e] on the same server, with the same singlenode execution mode. The performance is similar, and it does not appear to be applying the update-in-place rule. I also tried without singlenode flagged, but the performance was worse due to MR jobs, despite increasing the amount of memory excessively. I've attached {{time_06.11.16.txt}} with the timings, and {{log_06.11.16.txt}} with the full log of all tests. 
I also included {{perf-tests.tar.gz}}, which is a full tar archive that has everything needed to reproduce the results, minus the JAR file. Based on upload size limits, I couldn't include the standalone SystemML JAR file -- just build from the above commit and drop the standalone JAR into the {{perf-tests}} folder. For Python, you'll just need to quickly pip install TensorFlow with [these directions | https://www.tensorflow.org/versions/r0.9/get_started/os_setup.html#pip-installation]. Execute the experiments with {{run.sh}}.
[jira] [Commented] (SYSTEMML-760) PYDML save function ijv and mm formats use 1-based indexing
[ https://issues.apache.org/jira/browse/SYSTEMML-760?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15328718#comment-15328718 ]

Deron Eriksson commented on SYSTEMML-760:
---

Thank you [~mboehm7]. Keeping these as 1-based is probably best since, as you point out, it would be nice for DML and PYDML to be able to read/write the same files without modifications. We probably need something in the documentation for Python users, since they need to be aware that these two formats use 1-based indexing.

I have to say, though, that for myself as a developer who is naive about R/Python/ML, having the ijv format index from 1 rather than 0 for PYDML is rather unexpected. However, I think the benefits of allowing both DML and PYDML to use the same file formats outweigh this.

Does anyone else have any feedback WRT this issue? [~mwdus...@us.ibm.com]?

I will close this issue tomorrow unless someone feels there is an issue here.

> PYDML save function ijv and mm formats use 1-based indexing
> ---
>
> Key: SYSTEMML-760
> URL: https://issues.apache.org/jira/browse/SYSTEMML-760
> Project: SystemML
> Issue Type: Task
> Components: APIs
> Reporter: Deron Eriksson
>
> PYDML now uses 0-based indexing rather than 1-based indexing. Saving to ijv (text) and mm (matrix market) formats uses 1-based indexing.
> The following code:
> {code}
> m = full("1 2 3 0 0 0 7 8 9 0 0 0", rows=4, cols=3)
> save(m, "m.txt", format="text")
> save(m, "m.mm", format="mm")
> {code}
> generates:
> m.txt:
> {code}
> 1 1 1.0
> 1 2 2.0
> 1 3 3.0
> 3 1 7.0
> 3 2 8.0
> 3 3 9.0
> {code}
> and
> m.mm:
> {code}
> %%MatrixMarket matrix coordinate real general
> 4 3 6
> 1 1 1.0
> 1 2 2.0
> 1 3 3.0
> 3 1 7.0
> 3 2 8.0
> 3 3 9.0
> {code}
> 0-based indexing for m.txt would be:
> {code}
> 0 0 1.0
> 0 1 2.0
> 0 2 3.0
> 2 0 7.0
> 2 1 8.0
> 2 2 9.0
> {code}
> A similar situation would exist for the m.mm file.
> Note: The reading of the matrices should also be 0-based if PYDML is 0-based.
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
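To make the indexing difference in SYSTEMML-760 concrete, here is a minimal Python sketch of how the ijv (text) lines shown in the issue could be produced. It is only an illustration, not SystemML's writer: the helper name `to_ijv_lines` and its `index_base` parameter are hypothetical, and the matrix is a plain list of lists rather than a PYDML matrix.

```python
def to_ijv_lines(matrix, index_base=1):
    """Return 'row col value' lines for the nonzero cells of a list-of-lists matrix.

    index_base=1 mirrors the 1-based output reported in the issue;
    index_base=0 gives the 0-based variant the reporter expected.
    (Hypothetical helper, not part of SystemML.)
    """
    lines = []
    for i, row in enumerate(matrix):
        for j, value in enumerate(row):
            if value != 0:  # ijv is a sparse format: zero cells are skipped
                lines.append(f"{i + index_base} {j + index_base} {float(value)}")
    return lines


# Same 4x3 matrix as in the issue: rows 2 and 4 are all zeros.
m = [[1, 2, 3], [0, 0, 0], [7, 8, 9], [0, 0, 0]]
one_based = to_ijv_lines(m)
zero_based = to_ijv_lines(m, index_base=0)
print("\n".join(one_based))
```

With the default base the lines match the `m.txt` listing from the issue; passing `index_base=0` yields the alternative listing the issue describes.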
[jira] [Commented] (SYSTEMML-760) PYDML save function ijv and mm formats use 1-based indexing
[ https://issues.apache.org/jira/browse/SYSTEMML-760?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15328629#comment-15328629 ]

Matthias Boehm commented on SYSTEMML-760:
---

Unfortunately, there is nothing we can do here because we want to (1) keep format consistency (matrix market is an external format with 1-based indexing, text is directly derived from it), and (2) ensure input/output compatibility, independent of the frontend syntax used (a user should be free to produce a file with pydml and read with dml).
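The compatibility argument above implies that a 0-based frontend has to shift indices when it reads such a file. The sketch below shows that idea in plain Python for the Matrix Market coordinate layout (comment lines starting with `%`, then a `rows cols nnz` size line, then 1-based entries). The function name `read_mm_coordinate` is hypothetical and this is not how SystemML itself reads files.

```python
def read_mm_coordinate(text):
    """Parse Matrix Market coordinate text into a dense list of lists.

    File indices are 1-based, so 1 is subtracted to get 0-based
    Python positions. (Hypothetical helper, not part of SystemML.)
    """
    # Drop blank lines and '%' header/comment lines.
    data_lines = [ln for ln in text.splitlines()
                  if ln.strip() and not ln.startswith("%")]
    rows, cols, nnz = (int(tok) for tok in data_lines[0].split())
    dense = [[0.0] * cols for _ in range(rows)]
    for ln in data_lines[1:1 + nnz]:
        i, j, value = ln.split()
        dense[int(i) - 1][int(j) - 1] = float(value)  # 1-based file -> 0-based memory
    return dense


# Contents of m.mm as listed in the issue.
MM_TEXT = """%%MatrixMarket matrix coordinate real general
4 3 6
1 1 1.0
1 2 2.0
1 3 3.0
3 1 7.0
3 2 8.0
3 3 9.0
"""
dense = read_mm_coordinate(MM_TEXT)
print(dense[2][0])
```

Keeping the shift at the read/write boundary is what lets the file stay identical for both frontends while each language indexes the in-memory matrix in its own way.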
[jira] [Commented] (SYSTEMML-760) PYDML save function ijv and mm formats use 1-based indexing
[ https://issues.apache.org/jira/browse/SYSTEMML-760?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15328201#comment-15328201 ]

Deron Eriksson commented on SYSTEMML-760:
---

cc [~mwdus...@us.ibm.com]
[jira] [Commented] (SYSTEMML-760) PYDML save function ijv and mm formats use 1-based indexing
[ https://issues.apache.org/jira/browse/SYSTEMML-760?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15328063#comment-15328063 ]

Deron Eriksson commented on SYSTEMML-760:
---

I don't know how the binary format is implemented, but it should also be checked.
[jira] [Created] (SYSTEMML-760) PYDML save function ijv and mm formats use 1-based indexing
Deron Eriksson created SYSTEMML-760:
---

Summary: PYDML save function ijv and mm formats use 1-based indexing
Key: SYSTEMML-760
URL: https://issues.apache.org/jira/browse/SYSTEMML-760
Project: SystemML
Issue Type: Task
Components: APIs
Reporter: Deron Eriksson

PYDML now uses 0-based indexing rather than 1-based indexing. Saving to ijv (text) and mm (matrix market) formats uses 1-based indexing.

The following code:
{code}
m = full("1 2 3 0 0 0 7 8 9 0 0 0", rows=4, cols=3)
save(m, "m.txt", format="text")
save(m, "m.mm", format="mm")
{code}
generates:

m.txt:
{code}
1 1 1.0
1 2 2.0
1 3 3.0
3 1 7.0
3 2 8.0
3 3 9.0
{code}
and

m.mm:
{code}
%%MatrixMarket matrix coordinate real general
4 3 6
1 1 1.0
1 2 2.0
1 3 3.0
3 1 7.0
3 2 8.0
3 3 9.0
{code}
0-based indexing for m.txt would be:
{code}
0 0 1.0
0 1 2.0
0 2 3.0
2 0 7.0
2 1 8.0
2 2 9.0
{code}
A similar situation would exist for the m.mm file.

Note: The reading of the matrices should also be 0-based if PYDML is 0-based.
[jira] [Created] (SYSTEMML-759) Update Beginner's Guide for toString and PYDML 0-based indexing
Deron Eriksson created SYSTEMML-759:
---

Summary: Update Beginner's Guide for toString and PYDML 0-based indexing
Key: SYSTEMML-759
URL: https://issues.apache.org/jira/browse/SYSTEMML-759
Project: SystemML
Issue Type: Task
Reporter: Deron Eriksson
Assignee: Deron Eriksson
Priority: Minor
[jira] [Updated] (SYSTEMML-759) Update Beginner's Guide for toString and PYDML 0-based indexing
[ https://issues.apache.org/jira/browse/SYSTEMML-759?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Deron Eriksson updated SYSTEMML-759:

Description:
The toString function has been added to DML/PYDML, the use of which should be included in the Beginner's Guide.
The guide needs to be updated for the PYDML 0-based indexing.
[jira] [Created] (SYSTEMML-758) PYDML toString sparse displays 1 vs 0 indexing
Deron Eriksson created SYSTEMML-758:
---

Summary: PYDML toString sparse displays 1 vs 0 indexing
Key: SYSTEMML-758
URL: https://issues.apache.org/jira/browse/SYSTEMML-758
Project: SystemML
Issue Type: Bug
Components: APIs
Reporter: Deron Eriksson
Priority: Minor

Recently PYDML changed from 1-based indexing to 0-based indexing for matrices. The toString sparse display needs to be changed from 1-based to 0-based indexing.
{code}
m = full("1 2 3 4 5 6 7 8 9 10 11 12", rows=4, cols=3)
print(toString(m, sparse=True))
{code}
displays
{code}
1 1 1.000
1 2 2.000
1 3 3.000
2 1 4.000
2 2 5.000
2 3 6.000
3 1 7.000
3 2 8.000
3 3 9.000
4 1 10.000
4 2 11.000
4 3 12.000
{code}
The first line should be
{code}
0 0 1.000
{code}
and the other numbers should be updated accordingly.
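As a rough illustration of what "updated accordingly" means for the display in SYSTEMML-758, the Python sketch below shifts the row and column of every line in the 1-based listing by one and leaves the value text untouched. The helper `shift_ijv_indices` is hypothetical; it post-processes text and says nothing about how toString would be changed internally.

```python
def shift_ijv_indices(text, delta=-1):
    """Shift the row and column index of each 'row col value' line by delta.

    The value column is passed through as text so its formatting
    (for example three decimals) is preserved.
    (Hypothetical helper, not part of SystemML.)
    """
    shifted = []
    for line in text.splitlines():
        i, j, value = line.split()
        shifted.append(f"{int(i) + delta} {int(j) + delta} {value}")
    return "\n".join(shifted)


# The 1-based sparse display reported in the issue.
ONE_BASED_DISPLAY = """1 1 1.000
1 2 2.000
1 3 3.000
2 1 4.000
2 2 5.000
2 3 6.000
3 1 7.000
3 2 8.000
3 3 9.000
4 1 10.000
4 2 11.000
4 3 12.000"""
zero_based_display = shift_ijv_indices(ONE_BASED_DISPLAY)
print(zero_based_display)
```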