[jira] [Updated] (SYSTEMML-633) Improve Left-Indexing Performance with (Nested) Parfor Loops in UDFs

2016-06-13 Thread Mike Dusenberry (JIRA)

 [ 
https://issues.apache.org/jira/browse/SYSTEMML-633?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mike Dusenberry updated SYSTEMML-633:
-
Attachment: time_06.11.16.txt
log_06.11.16.txt
perf-tests.tar.gz

@mboehm I ran the experiment again from [commit 
c76b01a753837150c590c79557acdccb9d756a7e | 
https://github.com/apache/incubator-systemml/commit/c76b01a753837150c590c79557acdccb9d756a7e]
 on the same server, in the same singlenode execution mode.  Performance is 
similar, and the update-in-place rule does not appear to be applied.  I also 
tried without the singlenode flag, but performance was worse because of MR 
jobs, even after greatly increasing the available memory.
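
To illustrate why the update-in-place rule matters here, the following is a minimal Python sketch. It is not SystemML's runtime: it assumes that, without update-in-place, every left-indexing assignment materializes a full copy of the target matrix, whereas with it only the assigned row is written. The function names are hypothetical.

```python
# Illustrative cost model only; this is not how SystemML is implemented.

def left_index_copy(target, row, values):
    """Assumed behavior without update-in-place: copy the whole target, then assign."""
    result = [r[:] for r in target]              # full copy of every row
    result[row] = list(values)
    cells_touched = sum(len(r) for r in target)  # cost of the copy
    return result, cells_touched


def left_index_in_place(target, row, values):
    """Assumed behavior with update-in-place: overwrite one row of the existing matrix."""
    target[row] = list(values)
    return target, len(values)


def fill(n_rows, n_cols, assign):
    """Fill an n_rows x n_cols matrix one row at a time, as a loop body would."""
    out = [[0.0] * n_cols for _ in range(n_rows)]
    total = 0
    for i in range(n_rows):
        out, touched = assign(out, i, [float(i)] * n_cols)
        total += touched
    return out, total


copy_out, copy_cost = fill(64, 100, left_index_copy)
inplace_out, inplace_cost = fill(64, 100, left_index_in_place)
print(copy_cost, inplace_cost)  # 409600 6400
```

Under this assumption the copying variant touches 64 times as many cells for the same result, which is in line with left-indexing dominating the runtime.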

I've attached {{time_06.11.16.txt}} with the timings, and {{log_06.11.16.txt}} 
with the full log of all tests.

I also included {{perf-tests.tar.gz}}, which is a full tar archive that has 
everything needed to reproduce the results, minus the JAR file.  Based on 
upload size limits, I couldn't include the standalone SystemML JAR file -- just 
build from the above commit and drop the standalone JAR into the {{perf-tests}} 
folder.  For Python, you'll just need to quickly pip install TensorFlow with 
[these directions | 
https://www.tensorflow.org/versions/r0.9/get_started/os_setup.html#pip-installation].
   Execute the experiments with {{run.sh}}.

> Improve Left-Indexing Performance with (Nested) Parfor Loops in UDFs
> 
>
> Key: SYSTEMML-633
> URL: https://issues.apache.org/jira/browse/SYSTEMML-633
> Project: SystemML
>  Issue Type: Improvement
>  Components: ParFor
>Reporter: Mike Dusenberry
>Priority: Blocker
> Attachments: Im2colWrapper.java, log.txt, log.txt, log_06.11.16.txt, 
> perf-dml.dml, perf-tests.tar.gz, perf-tf.py, perf.sh, run.sh, 
> systemml-nn-05.16.16.zip, systemml-nn.zip, time.txt, time_06.11.16.txt
>
>
> In the experimental deep learning DML library I've been building 
> ([https://github.com/dusenberrymw/systemml-nn|https://github.com/dusenberrymw/systemml-nn]),
>  I've experienced severe bottlenecks due to *left-indexing* in parfor loops.  
> Here, I will highlight a few particular instances with simplified examples, 
> but the same issue is shared across many areas of the library, particularly 
> in the convolution and max pooling layers, and is exaggerated in real 
> use-cases.
> *Quick note* on setup for any of the below experiments.  Please grab a copy 
> of the above repo (particularly the {{nn}} directory), and run any 
> experiments with the {{nn}} package available at the base directory of the 
> experiment.
> Scenario: *Convolution*
> * In the library above, the forward pass of the convolution function 
> ([{{conv::forward(...)}} | 
> https://github.com/dusenberrymw/systemml-nn/blob/f6d3e077ae3c303eb8426b31329d3734e3483d5f/nn/layers/conv.dml#L8]
>  in {{nn/layers/conv.dml}}) essentially accepts a matrix {{X}} of images, a 
> matrix of weights {{W}}, and several other parameters corresponding to image 
> sizes, filter sizes, etc.  It then loops through the images with a {{parfor}} 
> loop, and for each image it pads the image with {{util::pad_image}}, extracts 
> "patches" of the image into columns of a matrix in a sliding fashion across 
> the image with {{util::im2col}}, performs a matrix multiplication between the 
> matrix of patch columns and the weight matrix, and then saves the result into 
> a matrix defined outside of the parfor loop using left-indexing.
> * Left-indexing has been identified as the bottleneck by a wide margin.
> * Left-indexing is used in the main {{conv::forward(...)}} function in the 
> [last line in the parfor 
> loop|https://github.com/dusenberrymw/systemml-nn/blob/f6d3e077ae3c303eb8426b31329d3734e3483d5f/nn/layers/conv.dml#L61],
>  in the 
> [{{util::pad_image(...)}}|https://github.com/dusenberrymw/systemml-nn/blob/f6d3e077ae3c303eb8426b31329d3734e3483d5f/nn/util.dml#L196]
>  function used by {{conv::forward(...)}}, as well as in the 
> [{{util::im2col(...)}}|https://github.com/dusenberrymw/systemml-nn/blob/f6d3e077ae3c303eb8426b31329d3734e3483d5f/nn/util.dml#L96]
>  function used by {{conv::forward(...)}}.
> * Test script (assuming the {{nn}} package is available):
> ** {{speed-633.dml}} {code}
> source("nn/layers/conv.dml") as conv
> source("nn/util.dml") as util
> # Generate data
> N = 64  # num examples
> C = 30  # num channels
> Hin = 28  # input height
> Win = 28  # input width
> F = 20  # num filters
> Hf = 3  # filter height
> Wf = 3  # filter width
> stride = 1
> pad = 1
> X = rand(rows=N, cols=C*Hin*Win)
> # Create layer
> [W, b] = conv::init(F, C, Hf, Wf)
> # Forward
> [out, Hout, Wout] = conv::forward(X, W, b, C, Hin, Win, Hf, Wf, stride, stride, pad, pad)
> print("Out: " + nrow(out) + "x" + ncol
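
The forward pass described in the convolution scenario above (pad, im2col, matrix multiply, left-index into a preallocated output) can be sketched in plain Python as follows. This is a rough illustration, not the library's code: the flattened row layout of each image, the F x (C*Hf*Wf) weight layout, and all function bodies are assumptions made for the sketch, and the sequential `for` loop stands in for the `parfor` loop.

```python
def pad_image(img, C, Hin, Win, pad):
    """Zero-pad a flattened C x Hin x Win image on all four sides."""
    Hp, Wp = Hin + 2 * pad, Win + 2 * pad
    out = [0.0] * (C * Hp * Wp)
    for c in range(C):
        for h in range(Hin):
            for w in range(Win):
                out[c * Hp * Wp + (h + pad) * Wp + (w + pad)] = img[c * Hin * Win + h * Win + w]
    return out


def im2col(img, C, H, W, Hf, Wf, stride):
    """Extract sliding patches into columns: (C*Hf*Wf) rows x (Hout*Wout) columns."""
    Hout = (H - Hf) // stride + 1
    Wout = (W - Wf) // stride + 1
    cols = [[0.0] * (Hout * Wout) for _ in range(C * Hf * Wf)]
    for ho in range(Hout):
        for wo in range(Wout):
            for c in range(C):
                for hf in range(Hf):
                    for wf in range(Wf):
                        src = c * H * W + (ho * stride + hf) * W + (wo * stride + wf)
                        cols[c * Hf * Wf + hf * Wf + wf][ho * Wout + wo] = img[src]
    return cols


def conv_forward(X, Wt, b, C, Hin, Win, Hf, Wf, stride, pad):
    """X: N rows of length C*Hin*Win; Wt: F rows of length C*Hf*Wf; b: F biases."""
    N, F = len(X), len(Wt)
    Hp, Wp = Hin + 2 * pad, Win + 2 * pad
    Hout = (Hp - Hf) // stride + 1
    Wout = (Wp - Wf) // stride + 1
    # Output matrix is allocated outside the loop, as in the description.
    out = [[0.0] * (F * Hout * Wout) for _ in range(N)]
    for n in range(N):  # a parfor loop in the DML version
        padded = pad_image(X[n], C, Hin, Win, pad)
        cols = im2col(padded, C, Hp, Wp, Hf, Wf, stride)
        row = []
        for f in range(F):  # matrix multiply: Wt (F x K) times cols (K x P)
            for p in range(Hout * Wout):
                row.append(sum(Wt[f][k] * cols[k][p] for k in range(C * Hf * Wf)) + b[f])
        out[n] = row  # the left-indexing step: one result row per image
    return out, Hout, Wout


# One 3x3 single-channel image of ones, one 3x3 filter of ones, pad 1, stride 1.
out, Hout, Wout = conv_forward([[1.0] * 9], [[1.0] * 9], [0.0], 1, 3, 3, 3, 3, 1, 1)
print(Hout, Wout, out[0])
```

The line `out[n] = row` is the step that corresponds to left-indexing into the matrix defined outside the loop.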

[jira] [Comment Edited] (SYSTEMML-633) Improve Left-Indexing Performance with (Nested) Parfor Loops in UDFs

2016-06-13 Thread Mike Dusenberry (JIRA)

[ 
https://issues.apache.org/jira/browse/SYSTEMML-633?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15328735#comment-15328735
 ] 

Mike Dusenberry edited comment on SYSTEMML-633 at 6/14/16 1:00 AM:
---

[~mboehm7] I ran the experiment again from [commit 
c76b01a753837150c590c79557acdccb9d756a7e | 
https://github.com/apache/incubator-systemml/commit/c76b01a753837150c590c79557acdccb9d756a7e]
 on the same server, in the same singlenode execution mode.  Performance is 
similar, and the update-in-place rule does not appear to be applied.  I also 
tried without the singlenode flag, but performance was worse because of MR 
jobs, even after greatly increasing the available memory.

I've attached {{time_06.11.16.txt}} with the timings, and {{log_06.11.16.txt}} 
with the full log of all tests.

I also included {{perf-tests.tar.gz}}, which is a full tar archive that has 
everything needed to reproduce the results, minus the JAR file.  Based on 
upload size limits, I couldn't include the standalone SystemML JAR file -- just 
build from the above commit and drop the standalone JAR into the {{perf-tests}} 
folder.  For Python, you'll just need to quickly pip install TensorFlow with 
[these directions | 
https://www.tensorflow.org/versions/r0.9/get_started/os_setup.html#pip-installation].
   Execute the experiments with {{run.sh}}.




[jira] [Commented] (SYSTEMML-760) PYDML save function ijv and mm formats use 1-based indexing

2016-06-13 Thread Deron Eriksson (JIRA)

[ 
https://issues.apache.org/jira/browse/SYSTEMML-760?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15328718#comment-15328718
 ] 

Deron Eriksson commented on SYSTEMML-760:
-

Thank you [~mboehm7]. Keeping these as 1-based is probably best since, as you 
point out, it would be nice for DML and PYDML to be able to read/write the same 
files without modifications. We probably need something in the documentation 
for Python users, since they need to be aware that these two formats use 
1-based indexing.

I have to say, though, that for me as a developer who is naive about 
R/Python/ML, having the ijv format index from 1 rather than 0 for PYDML is 
rather unexpected. However, I think the benefits of allowing both DML and PYDML 
to use the same file formats outweigh this.

Does anyone else have any feedback WRT this issue? [~mwdus...@us.ibm.com]?

I will close this issue tomorrow unless someone feels there is an issue here.



> PYDML save function ijv and mm formats use 1-based indexing
> ---
>
> Key: SYSTEMML-760
> URL: https://issues.apache.org/jira/browse/SYSTEMML-760
> Project: SystemML
>  Issue Type: Task
>  Components: APIs
>Reporter: Deron Eriksson
>
> PYDML now uses 0-based indexing rather than 1-based indexing. Saving to ijv 
> (text) and mm (matrix market) formats uses 1-based indices.
> The following code:
> {code}
> m = full("1 2 3 0 0 0 7 8 9 0 0 0", rows=4, cols=3)
> save(m, "m.txt", format="text")
> save(m, "m.mm", format="mm")
> {code}
> generates:
> m.txt:
> {code}
> 1 1 1.0
> 1 2 2.0
> 1 3 3.0
> 3 1 7.0
> 3 2 8.0
> 3 3 9.0
> {code}
> and
> m.mm:
> {code}
> %%MatrixMarket matrix coordinate real general
> 4 3 6
> 1 1 1.0
> 1 2 2.0
> 1 3 3.0
> 3 1 7.0
> 3 2 8.0
> 3 3 9.0
> {code}
> 0-based indexing for m.txt would be:
> {code}
> 0 0 1.0
> 0 1 2.0
> 0 2 3.0
> 2 0 7.0
> 2 1 8.0
> 2 2 9.0
> {code}
> A similar situation would exist for the m.mm file.
> Note: The reading of the matrices should also be 0-based if PYDML is 0-based.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (SYSTEMML-760) PYDML save function ijv and mm formats use 1-based indexing

2016-06-13 Thread Matthias Boehm (JIRA)

[ 
https://issues.apache.org/jira/browse/SYSTEMML-760?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15328629#comment-15328629
 ] 

Matthias Boehm commented on SYSTEMML-760:
-

Unfortunately, there is nothing we can do here because we want to (1) keep 
format consistency (Matrix Market is an external format with 1-based indexing, 
and text is directly derived from it), and (2) ensure input/output 
compatibility, independent of the frontend syntax used (a user should be free 
to produce a file with pydml and read it with dml).  
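
As a rough illustration of this point, a reader can keep the file format 1-based while exposing 0-based indices in memory. The sketch below is plain Python, not SystemML code; `read_mm_coordinate` is a hypothetical helper that handles only the real, general coordinate layout shown in the issue description.

```python
def read_mm_coordinate(text):
    """Parse Matrix Market coordinate text into (rows, cols, cells).

    The file uses 1-based indices; the returned dict is keyed by 0-based (i, j).
    """
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    if not lines or not lines[0].startswith("%%MatrixMarket"):
        raise ValueError("missing Matrix Market banner")
    body = [ln for ln in lines[1:] if not ln.startswith("%")]  # skip comment lines
    rows, cols, nnz = (int(tok) for tok in body[0].split())
    cells = {}
    for ln in body[1:]:
        i, j, v = ln.split()
        cells[(int(i) - 1, int(j) - 1)] = float(v)  # shift to 0-based on read
    if len(cells) != nnz:
        raise ValueError("entry count does not match the size line")
    return rows, cols, cells


MM_TEXT = """%%MatrixMarket matrix coordinate real general
4 3 6
1 1 1.0
1 2 2.0
1 3 3.0
3 1 7.0
3 2 8.0
3 3 9.0
"""

rows, cols, cells = read_mm_coordinate(MM_TEXT)
print(rows, cols, cells[(0, 0)], cells[(2, 2)])  # 4 3 1.0 9.0
```

With the shift confined to the reader and writer, the same file stays readable by any tool that expects standard Matrix Market indices.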






[jira] [Commented] (SYSTEMML-760) PYDML save function ijv and mm formats use 1-based indexing

2016-06-13 Thread Deron Eriksson (JIRA)

[ 
https://issues.apache.org/jira/browse/SYSTEMML-760?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15328201#comment-15328201
 ] 

Deron Eriksson commented on SYSTEMML-760:
-

cc [~mwdus...@us.ibm.com]






[jira] [Commented] (SYSTEMML-760) PYDML save function ijv and mm formats use 1-based indexing

2016-06-13 Thread Deron Eriksson (JIRA)

[ 
https://issues.apache.org/jira/browse/SYSTEMML-760?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15328063#comment-15328063
 ] 

Deron Eriksson commented on SYSTEMML-760:
-

I don't know how the binary format is implemented, but it should also be 
checked.







[jira] [Created] (SYSTEMML-760) PYDML save function ijv and mm formats use 1-based indexing

2016-06-13 Thread Deron Eriksson (JIRA)
Deron Eriksson created SYSTEMML-760:
---

 Summary: PYDML save function ijv and mm formats use 1-based 
indexing
 Key: SYSTEMML-760
 URL: https://issues.apache.org/jira/browse/SYSTEMML-760
 Project: SystemML
  Issue Type: Task
  Components: APIs
Reporter: Deron Eriksson


PYDML now uses 0-based indexing rather than 1-based indexing. Saving to ijv 
(text) and mm (matrix market) formats uses 1-based indices.

The following code:
{code}
m = full("1 2 3 0 0 0 7 8 9 0 0 0", rows=4, cols=3)
save(m, "m.txt", format="text")
save(m, "m.mm", format="mm")
{code}
generates:
m.txt:
{code}
1 1 1.0
1 2 2.0
1 3 3.0
3 1 7.0
3 2 8.0
3 3 9.0
{code}
and
m.mm:
{code}
%%MatrixMarket matrix coordinate real general
4 3 6
1 1 1.0
1 2 2.0
1 3 3.0
3 1 7.0
3 2 8.0
3 3 9.0
{code}

0-based indexing for m.txt would be:
{code}
0 0 1.0
0 1 2.0
0 2 3.0
2 0 7.0
2 1 8.0
2 2 9.0
{code}
A similar situation would exist for the m.mm file.

Note: The reading of the matrices should also be 0-based if PYDML is 0-based.
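
The difference between the two outputs above comes down to the offset added to each row and column index when the nonzero cells are written. A minimal Python sketch, with `to_ijv` as a hypothetical helper rather than anything in SystemML:

```python
def to_ijv(matrix, base=1):
    """Return 'i j v' lines for the nonzero cells of a dense matrix.

    base=1 reproduces the current output; base=0 is the 0-based alternative.
    """
    lines = []
    for i, row in enumerate(matrix):
        for j, value in enumerate(row):
            if value != 0:
                lines.append(f"{i + base} {j + base} {float(value)}")
    return lines


# Same 4x3 matrix as in the example above.
m = [[1, 2, 3], [0, 0, 0], [7, 8, 9], [0, 0, 0]]
one_based = to_ijv(m, base=1)
zero_based = to_ijv(m, base=0)
print("\n".join(one_based))
print("\n".join(zero_based))
```

A reader would need to apply the same base in reverse, which is why reading and writing have to agree.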






[jira] [Created] (SYSTEMML-759) Update Beginner's Guide for toString and PYDML 0-based indexing

2016-06-13 Thread Deron Eriksson (JIRA)
Deron Eriksson created SYSTEMML-759:
---

 Summary: Update Beginner's Guide for toString and PYDML 0-based 
indexing
 Key: SYSTEMML-759
 URL: https://issues.apache.org/jira/browse/SYSTEMML-759
 Project: SystemML
  Issue Type: Task
Reporter: Deron Eriksson
Assignee: Deron Eriksson
Priority: Minor








[jira] [Updated] (SYSTEMML-759) Update Beginner's Guide for toString and PYDML 0-based indexing

2016-06-13 Thread Deron Eriksson (JIRA)

 [ 
https://issues.apache.org/jira/browse/SYSTEMML-759?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Deron Eriksson updated SYSTEMML-759:

Description: 
The toString function has been added to DML/PYDML, the use of which should be 
included in the Beginner's Guide.

The guide needs to be updated for the PYDML 0-based indexing.


> Update Beginner's Guide for toString and PYDML 0-based indexing
> ---
>
> Key: SYSTEMML-759
> URL: https://issues.apache.org/jira/browse/SYSTEMML-759
> Project: SystemML
>  Issue Type: Task
>Reporter: Deron Eriksson
>Assignee: Deron Eriksson
>Priority: Minor
>
> The toString function has been added to DML/PYDML, the use of which should be 
> included in the Beginner's Guide.
> The guide needs to be updated for the PYDML 0-based indexing.





[jira] [Created] (SYSTEMML-758) PYDML toString sparse displays 1 vs 0 indexing

2016-06-13 Thread Deron Eriksson (JIRA)
Deron Eriksson created SYSTEMML-758:
---

 Summary: PYDML toString sparse displays 1 vs 0 indexing
 Key: SYSTEMML-758
 URL: https://issues.apache.org/jira/browse/SYSTEMML-758
 Project: SystemML
  Issue Type: Bug
  Components: APIs
Reporter: Deron Eriksson
Priority: Minor


Recently PYDML changed from 1-based indexing to 0-based indexing for matrices. 
The toString sparse display needs to be changed from 1-based to 0-based 
indexing.

{code}
m = full("1 2 3 4 5 6 7 8 9 10 11 12", rows=4, cols=3)
print(toString(m, sparse=True))
{code}
displays
{code}
1 1 1.000
1 2 2.000
1 3 3.000
2 1 4.000
2 2 5.000
2 3 6.000
3 1 7.000
3 2 8.000
3 3 9.000
4 1 10.000
4 2 11.000
4 3 12.000
{code}

The first line should be
{code}
0 0 1.000
{code}
and the other numbers should be updated accordingly.
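
For reference, the expected 0-based display can be derived from the current 1-based output by subtracting one from both indices on every line. The Python sketch below does only that; `shift_to_zero_based` is a hypothetical helper, not part of SystemML.

```python
def shift_to_zero_based(lines):
    """Subtract 1 from the row and column index of each 'i j v' line."""
    shifted = []
    for line in lines:
        i, j, value = line.split()
        shifted.append(f"{int(i) - 1} {int(j) - 1} {value}")  # value text is kept as-is
    return shifted


# Rebuild the 1-based display shown above for the 4x3 matrix holding 1..12.
one_based = [f"{r} {c} {(r - 1) * 3 + c:.3f}" for r in range(1, 5) for c in range(1, 4)]
zero_based = shift_to_zero_based(one_based)
print("\n".join(zero_based))
```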



