[jira] [Commented] (SYSTEMML-1561) Improve constant folding during compilation

2017-05-09 Thread Mike Dusenberry (JIRA)

[ 
https://issues.apache.org/jira/browse/SYSTEMML-1561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16003710#comment-16003710
 ] 

Mike Dusenberry commented on SYSTEMML-1561:
---

Well, I tried logging the rewrites with {{ProgramRewriter.LDEBUG = true}} 
enabled and log4j set to DEBUG, but it only displayed the common subexpression 
elimination rewrites during the second chance pass.  Looking into it further, 
rewrites like constant folding don't seem to ever emit debug logging, so I 
don't think the log is showing the whole picture.  Regardless, here's the 
trace (look for the {{ABOUT TO START STATIC REWRITE + IPA SECOND 
CHANCE}} section).
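
(For context, the log4j side of that setup was roughly the following; the 
logger name is an assumption based on the {{org.apache.sysml}} package layout 
and may need adjusting for other builds:)

{code}
# log4j.properties (sketch): route SystemML compiler logging to the console at DEBUG
log4j.rootLogger=ERROR, console
log4j.logger.org.apache.sysml=DEBUG
log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n
{code}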

{code}
17/05/09 15:50:35 DEBUG DMLScript:
DML config:
INFO: localtmpdir: /tmp/systemml
INFO: scratch: scratch_space
INFO: optlevel: 2
INFO: numreducers: 10
INFO: defaultblocksize: 1000
INFO: dml.yarn.appmaster: false
INFO: dml.yarn.appmaster.mem: 2048
INFO: dml.yarn.mapreduce.mem: -1
INFO: cp.parallel.matrixmult: true
INFO: cp.parallel.textio: true
INFO: native.blas: auto
INFO: compressed.linalg: false
INFO: codegen.enabled: false
INFO: codegen.literals: 1
INFO: codegen.plancache: true
INFO: systemml.stats.extraGPU: false
INFO: systemml.stats.extraDNN: false

17/05/09 15:50:35 DEBUG DMLScript: Looking for the following file in the local 
file system: ./nn/examples/mnist_lenet.dml
17/05/09 15:50:35 DEBUG DMLScript: Looking for the following file in the local 
file system: ./nn/layers/affine.dml
17/05/09 15:50:35 DEBUG DMLScript: Looking for the following file in the local 
file system: ./nn/layers/conv2d_builtin.dml
17/05/09 15:50:35 DEBUG DMLScript: Looking for the following file in the local 
file system: ./nn/layers/cross_entropy_loss.dml
17/05/09 15:50:35 DEBUG DMLScript: Looking for the following file in the local 
file system: ./nn/layers/dropout.dml
17/05/09 15:50:35 DEBUG DMLScript: Looking for the following file in the local 
file system: ./nn/layers/l2_reg.dml
17/05/09 15:50:35 DEBUG DMLScript: Looking for the following file in the local 
file system: ./nn/layers/max_pool2d_builtin.dml
17/05/09 15:50:35 DEBUG DMLScript: Looking for the following file in the local 
file system: ./nn/layers/relu.dml
17/05/09 15:50:35 DEBUG DMLScript: Looking for the following file in the local 
file system: ./nn/layers/softmax.dml
17/05/09 15:50:35 DEBUG DMLScript: Looking for the following file in the local 
file system: ./nn/optim/sgd_nesterov.dml
17/05/09 15:50:36 DEBUG MRConfigurationNames: Hadoop build version: 2.6.5 from 
e8c9fe0b4c252caf2ebf1464220599650f119997 by sjlee source checksum 
f05c9fa095a395faa9db9f7ba5d754
17/05/09 15:50:36 DEBUG MRConfigurationNames: Using hadoop 2.x configuration 
properties.
17/05/09 15:50:36 DEBUG MutableMetricsFactory: field 
org.apache.hadoop.metrics2.lib.MutableRate 
org.apache.hadoop.security.UserGroupInformation$UgiMetrics.loginSuccess with 
annotation @org.apache.hadoop.metrics2.annotation.Metric(sampleName=Ops, 
about=, always=false, type=DEFAULT, value=[Rate of successful kerberos logins 
and latency (milliseconds)], valueName=Time)
17/05/09 15:50:36 DEBUG MutableMetricsFactory: field 
org.apache.hadoop.metrics2.lib.MutableRate 
org.apache.hadoop.security.UserGroupInformation$UgiMetrics.loginFailure with 
annotation @org.apache.hadoop.metrics2.annotation.Metric(sampleName=Ops, 
about=, always=false, type=DEFAULT, value=[Rate of failed kerberos logins and 
latency (milliseconds)], valueName=Time)
17/05/09 15:50:36 DEBUG MutableMetricsFactory: field 
org.apache.hadoop.metrics2.lib.MutableRate 
org.apache.hadoop.security.UserGroupInformation$UgiMetrics.getGroups with 
annotation @org.apache.hadoop.metrics2.annotation.Metric(sampleName=Ops, 
about=, always=false, type=DEFAULT, value=[GetGroups], valueName=Time)
17/05/09 15:50:36 DEBUG MetricsSystemImpl: UgiMetrics, User and group related 
metrics
17/05/09 15:50:36 DEBUG KerberosName: Kerberos krb5 configuration not found, 
setting default realm to empty
17/05/09 15:50:36 DEBUG Groups:  Creating new Groups object
17/05/09 15:50:36 DEBUG NativeCodeLoader: Trying to load the custom-built 
native-hadoop library...
17/05/09 15:50:36 DEBUG NativeCodeLoader: Failed to load native-hadoop with 
error: java.lang.UnsatisfiedLinkError: no hadoop in java.library.path
17/05/09 15:50:36 DEBUG NativeCodeLoader: 
java.library.path=/Users/mwdusenb/Library/Java/Extensions:/Library/Java/Extensions:/Network/Library/Java/Extensions:/System/Library/Java/Extensions:/usr/lib/java:.
17/05/09 15:50:36 WARN NativeCodeLoader: Unable to load native-hadoop library 
for your platform... using builtin-java classes where applicable
17/05/09 15:50:36 DEBUG PerformanceAdvisory: Falling back to shell based
17/05/09 15:50:36 DEBUG JniBasedUnixGroupsMappingWithFallback: Group mapping 
impl=org.apache.hadoop.security.ShellBasedUnixGroupsMapping
17/05/09 15:50:36 DEBUG Groups: Group mapping 
{code}

[jira] [Commented] (SYSTEMML-1561) Improve constant folding during compilation

2017-05-09 Thread Matthias Boehm (JIRA)

[ 
https://issues.apache.org/jira/browse/SYSTEMML-1561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16003628#comment-16003628
 ] 

Matthias Boehm commented on SYSTEMML-1561:
--

and it's great to see that the recompilation times are still in a reasonable 
range: 5978 DAGs in 3.2s (roughly 0.5ms per DAG) - generally, we try to keep 
recompilation of average 
DAGs at around 1ms. 

> Improve constant folding during compilation
> ---
>
> Key: SYSTEMML-1561
> URL: https://issues.apache.org/jira/browse/SYSTEMML-1561
> Project: SystemML
>  Issue Type: Improvement
>Reporter: Mike Dusenberry
>Assignee: Mike Dusenberry
> Fix For: SystemML 1.0
>
> Attachments: scenario1_plan.txt, scenario1.py, scenario2_plan.txt, 
> scenario2.py
>
>
> In our `nn` library, our convolution and pooling layers have to pass around 
> the spatial dimensions (height and width) of the images that are stretched 
> out into rows of the input/output matrices.  These output dimensions are 
> computed within the forward functions of the above layers as small scalar 
> equations.  From a mathematical standpoint, these sizes can be determined at 
> compile time, and it is nice to have these size equations in DML (vs. hiding 
> them inside the engine within built-in functions).  However, we do not 
> currently evaluate these expressions during compilation, and thus we are left 
> with unknown sizes even during recompilation.  This naturally leads to max 
> memory estimates and thus often leads to unnecessary distributed runtime ops 
> rather than simple CP ones.
> I have two related scenarios for which this is a problem.  They both involve 
> the {{Houtc1}} & {{Woutc1}} values that are returned from a 
> `conv2d::forward(...)` function.  These represent the spatial dimensions of 
> the volume with each of the rows of the output {{outc1}} of the function, and 
> the third dimension is {{F1}}.  Thus, {{outc1}} has a number of columns equal 
> to {{F1*Houtc1*Woutc1}}.
> In the first scenario ({{scenario1.py}}), a random matrix {{doutc1}} is 
> created that should have the same dimensions as {{outc1}}.  For the columns, 
> if I use {{cols=ncol(outc1)}} in this rand statement, the size will be 
> propagated and CP ops will be compiled and run.  If I instead use 
> {{cols=F1*Houtc1*Woutc1}}, the size will forever be unknown, even during 
> recompilation, and thus Spark ops will be compiled and run.  I have included 
> the recompile hops plan ({{scenario1_plan.txt}}).
> In the second scenario ({{scenario2.py}}), a {{max_pool2d::forward(...)}} 
> function is inserted after the {{conv2d::forward(...)}} function that 
> requires the {{Houtc1}} and {{Woutc1}} variables to be supplied as arguments. 
>  Since those latter variables are not executed during compilation time, the 
> max pooling sizes remain unknown, even during recompilation, and thus Spark 
> ops will be compiled and run.  I have included the recompile hops plan 
> ({{scenario2_plan.txt}}).
> We should either improve or fix our constant folding rewrites so that these 
> scenarios are fixed, as they are necessary for performant deep learning 
> applications.  Note too that this issue will be present in other non-deep 
> learning scenarios as well.
> Mailing list thread: 
> https://www.mail-archive.com/dev@systemml.incubator.apache.org/msg01657.html
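>
> (Editorial aside: the scalar size equations in question are the standard 
> convolution output-dimension formulas. A minimal sketch in plain Python, 
> with illustrative parameter names rather than the exact DML variables:)
>
> {code}
> # Sketch of the scalar size equations computed in conv2d::forward /
> # max_pool2d::forward.  Names (Hin, Hf, stride, pad) are illustrative.
> def conv2d_out_dim(Hin, Hf, stride, pad):
>     # standard convolution output size: floor((Hin + 2*pad - Hf) / stride) + 1
>     return (Hin + 2 * pad - Hf) // stride + 1
>
> # e.g. 28x28 MNIST images, 5x5 filters, stride 1, padding 2 -> 28x28 output,
> # so outc1 has F1*Houtc1*Woutc1 columns, all computable at compile time.
> Houtc1 = conv2d_out_dim(28, 5, 1, 2)
> Woutc1 = conv2d_out_dim(28, 5, 1, 2)
> {code}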



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (SYSTEMML-1561) Improve constant folding during compilation

2017-05-09 Thread Matthias Boehm (JIRA)

[ 
https://issues.apache.org/jira/browse/SYSTEMML-1561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16003618#comment-16003618
 ] 

Matthias Boehm commented on SYSTEMML-1561:
--

that's awesome - just one question: do we understand what reduced the number of 
cache writes to HDFS (export) from 2100 to 8?


[jira] [Commented] (SYSTEMML-1561) Improve constant folding during compilation

2017-05-09 Thread Mike Dusenberry (JIRA)

[ 
https://issues.apache.org/jira/browse/SYSTEMML-1561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16003546#comment-16003546
 ] 

Mike Dusenberry commented on SYSTEMML-1561:
---

As I noted on SYSTEMML-1566, I ran experiments again using (1) the commit 
before the IPA scalar replacement update, (2) the commit with the IPA scalar 
replacement update, and (3) the proposed commit with the updated constant 
folding (which relies on the IPA update for usefulness), and measured the 
following results:

commit 2c5c3b14e1906cda70ae1581b19a5e908b3ab329 (pre IPA update)
{code}
17/05/05 14:39:49 INFO ScriptExecutorUtils: SystemML Statistics:
Total elapsed time: 712.183 sec.
Total compilation time: 1.996 sec.
Total execution time:   710.187 sec.
Number of compiled Spark inst:  134.
Number of executed Spark inst:  2513.
Cache hits (Mem, WB, FS, HDFS): 153624/0/0/862.
Cache writes (WB, FS, HDFS):79043/0/2170.
Cache times (ACQr/m, RLS, EXP): 32.052/0.038/5.508/55.790 sec.
HOP DAGs recompiled (PRED, SB): 0/5979.
HOP DAGs recompile time:3.670 sec.
Functions recompiled:   10.
Functions recompile time:   0.082 sec.
Spark ctx create time (lazy):   0.959 sec.
Spark trans counts (par,bc,col):347/1649/862.
Spark trans times (par,bc,col): 0.671/25.076/31.988 secs.
Total JIT compile time: 118.9 sec.
Total JVM GC count: 267.
Total JVM GC time:  7.523 sec.
Heavy hitter instructions (name, time, count):
-- 1)   train   671.994 sec 1
-- 2)   conv2d_bias_add 198.398 sec 3298
-- 3)   maxpooling_backward 174.666 sec 1720
-- 4)   predict 140.782 sec 9
-- 5)   sp_mapmm94.035 sec  1649
-- 6)   conv2d_backward_filter  63.328 sec  1720
-- 7)   sp_sel+ 39.259 sec  860
-- 8)   ba+*18.615 sec  5089
-- 9)   +*  16.627 sec  10320
-- 10)  conv2d_backward_data14.297 sec  860
{code}

commit abc9686fbaaa11c12cfa02c49c7675165acdf176 (w/ IPA update)
{code}
17/05/05 15:05:16 INFO ScriptExecutorUtils: SystemML Statistics:
Total elapsed time: 673.900 sec.
Total compilation time: 1.938 sec.
Total execution time:   671.962 sec.
Number of compiled Spark inst:  128.
Number of executed Spark inst:  2513.
Cache hits (Mem, WB, FS, HDFS): 153645/0/0/862.
Cache writes (WB, FS, HDFS):79043/0/2149.
Cache times (ACQr/m, RLS, EXP): 31.568/0.038/4.639/54.790 sec.
HOP DAGs recompiled (PRED, SB): 0/5978.
HOP DAGs recompile time:3.705 sec.
Functions recompiled:   10.
Functions recompile time:   0.068 sec.
Spark ctx create time (lazy):   0.948 sec.
Spark trans counts (par,bc,col):368/1649/862.
Spark trans times (par,bc,col): 0.689/26.035/31.503 secs.
Total JIT compile time: 111.921 sec.
Total JVM GC count: 265.
Total JVM GC time:  7.118 sec.
Heavy hitter instructions (name, time, count):
-- 1)   train   634.306 sec 1
-- 2)   conv2d_bias_add 190.557 sec 3298
-- 3)   maxpooling_backward 141.588 sec 1720
-- 4)   predict 135.222 sec 9
-- 5)   sp_mapmm94.025 sec  1649
-- 6)   conv2d_backward_filter  66.058 sec  1720
-- 7)   sp_sel+ 39.204 sec  860
-- 8)   +*  18.272 sec  10320
-- 9)   ba+*15.804 sec  5089
-- 10)  conv2d_backward_data13.627 sec  860
{code}

w/ updated constant folding
{code}
17/05/05 15:15:19 INFO ScriptExecutorUtils: SystemML Statistics:
Total elapsed time: 405.615 sec.
Total compilation time: 2.070 sec.
Total execution time:   403.545 sec.
Number of compiled Spark inst:  139.
Number of executed Spark inst:  793.
Cache hits (Mem, WB, FS, HDFS): 156654/0/0/2.
Cache writes (WB, FS, HDFS):79043/0/8.
Cache times (ACQr/m, RLS, EXP): 3.467/0.043/3.566/1.175 sec.
HOP DAGs recompiled (PRED, SB): 0/5978.
HOP DAGs recompile time:3.178 sec.
Functions recompiled:   10.
Functions recompile time:   0.072 sec.
Spark ctx create time (lazy):   1.024 sec.
Spark trans counts (par,bc,col):789/789/2.
Spark trans times (par,bc,col): 0.982/0.299/3.418 secs.
Total JIT compile time: 145.368 sec.
Total JVM GC count: 438.
Total JVM GC time:  8.992 sec.
Heavy hitter instructions (name, time, count):
-- 1)   train   370.373 sec 1
-- 2)   conv2d_bias_add 178.914 sec 3298
-- 3)   predict 116.145 sec 9
-- 4)   conv2d_backward_filter  55.582 sec  1720
-- 5)   +*  18.948 sec  10320
-- 6)   sel+18.238 sec  3369
-- 7)   ba+*16.171 sec  5949
-- 8)   conv2d_backward_data15.038 sec  860
-- 9)   sp_mapmm13.980 sec  789
-- 10)  relu_maxpooling 12.415 sec  3298
{code}

With the IPA scalar replacement + constant folding updates, we've gained an 
additional ~300s, for a ~1.75x speedup in this scenario.


[jira] [Commented] (SYSTEMML-1561) Improve constant folding during compilation

2017-05-09 Thread Mike Dusenberry (JIRA)

[ 
https://issues.apache.org/jira/browse/SYSTEMML-1561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16003330#comment-16003330
 ] 

Mike Dusenberry commented on SYSTEMML-1561:
---

[PR 484 | https://github.com/apache/incubator-systemml/pull/484] submitted.  
[~mboehm7] Can you please review when you get a chance?


[jira] [Commented] (SYSTEMML-1561) Improve constant folding during compilation

2017-05-03 Thread Matthias Boehm (JIRA)

[ 
https://issues.apache.org/jira/browse/SYSTEMML-1561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15996123#comment-15996123
 ] 

Matthias Boehm commented on SYSTEMML-1561:
--

sounds great - a second chance would be useful for many other scenarios too. 
The 2x runtime improvement is a bit surprising though because very similar 
rewrites would be performed during dynamic recompilation (except constant 
folding, which is covered by size expression over sub dags of scalar operations 
with symbol table inputs) and dynamic recompilation itself was not the 
bottleneck. I would be very interested to know where this is coming from, maybe 
some cascade of other rewrites/fused operators? You can set 
{{ProgramRewriter.LDEBUG = true}} to see the applied simplification rewrites 
along with line numbers where they originate from. 

For your PR, if you want to ensure that future compiler modifications preserve 
this behavior, please add a test into {{functions.recompile}} or 
{{functions.misc}}, similar to other size-dependent rewrites - the easiest way 
is to construct a case, where without size propagation we would compile/execute 
distributed operations and simply compare the number of compiled/executed Spark 
instructions with expected values.


[jira] [Commented] (SYSTEMML-1561) Improve constant folding during compilation

2017-05-03 Thread Mike Dusenberry (JIRA)

[ 
https://issues.apache.org/jira/browse/SYSTEMML-1561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15995732#comment-15995732
 ] 

Mike Dusenberry commented on SYSTEMML-1561:
---

Just FYI, I'm making some progress on this.  Essentially, by rerunning static 
rewrites + IPA again immediately after the initial IPA pass as a kind of 
"second chance", we're able to apply this constant folding rewrite for this 
"second chance", we're able to apply this constant folding rewrite for this 
scenario.  This makes sense because during the initial static rewrite pass, we 
can't apply constant folding to the {{Hout}}, {{Wout}}, etc. DAGs due to the 
leaf nodes being scalar transient reads.  After IPA with the new scalar 
replacement, these DAGs will become entirely operations on literal leaf nodes, 
and thus eligible for constant folding.  Then, after that second pass of static 
rewrites, we can benefit from IPA again by being able to now perform scalar 
replacement for functions/other DAGs that consume the {{Hout}}, {{Wout}}, etc. 
DAGs, which are now literals.  In terms of performance, I'm seeing the 
execution time cut in half (~500s faster) for SYSTEMML-1566.  I can open a PR 
soon.
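
(Editorial aside: to illustrate the mechanics with a toy sketch - this is 
plain Python, not SystemML's actual Hop rewrite code - constant folding can 
only collapse a scalar sub-DAG once every leaf is a literal, which is exactly 
what the scalar replacement in the first IPA pass establishes:)

{code}
# Toy constant folder over a scalar expression DAG (illustrative only; the
# real rewrite operates on SystemML Hops, not these classes).
class Lit:                     # literal leaf -- foldable
    def __init__(self, v): self.v = v
class TRead:                   # transient read leaf -- blocks folding
    def __init__(self, name): self.name = name
class BinOp:
    def __init__(self, op, l, r): self.op, self.l, self.r = op, l, r

OPS = {'+': lambda a, b: a + b, '-': lambda a, b: a - b,
       '*': lambda a, b: a * b, '/': lambda a, b: a // b}

def fold(node):
    """Bottom-up folding: a BinOp collapses to a Lit iff both children fold to Lits."""
    if isinstance(node, BinOp):
        l, r = fold(node.l), fold(node.r)
        if isinstance(l, Lit) and isinstance(r, Lit):
            return Lit(OPS[node.op](l.v, r.v))
        return BinOp(node.op, l, r)
    return node

# Hout = (Hin + 2*pad - Hf) / stride + 1 with Hin still a transient read:
# folding gets stuck, just like the first static rewrite pass.
hout = BinOp('+', BinOp('/', BinOp('-', BinOp('+', TRead('Hin'), Lit(4)),
                                  Lit(5)), Lit(1)), Lit(1))
assert not isinstance(fold(hout), Lit)

# After IPA scalar replacement substitutes Hin -> Lit(28), the whole DAG folds.
hout2 = BinOp('+', BinOp('/', BinOp('-', BinOp('+', Lit(28), Lit(4)),
                                   Lit(5)), Lit(1)), Lit(1))
assert fold(hout2).v == 28
{code}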


[jira] [Commented] (SYSTEMML-1561) Improve constant folding during compilation

2017-04-28 Thread Mike Dusenberry (JIRA)

[ 
https://issues.apache.org/jira/browse/SYSTEMML-1561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15988311#comment-15988311
 ] 

Mike Dusenberry commented on SYSTEMML-1561:
---

Yeah, that's a good point; in these cases the functions are indeed inlined.  
Also, to be clear, the IPA scalar propagation causes the scalar leaf nodes of 
the {{Hout}} or {{Wout}} sub-dags to be replaced with literals, but {{Hout}} 
and {{Wout}} themselves are still not evaluated.  I.e., for each there is still 
a dag of basic scalar operations that needs to be evaluated, but the good news 
is that it is not dependent on anything except for literals at the ends.
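
(Editorial aside: a toy illustration of that leaf substitution - plain Python, 
not the actual Hop classes - showing that IPA scalar replacement changes only 
the leaves, while the expression DAG itself still awaits evaluation:)

{code}
# Toy illustration: IPA-style scalar replacement swaps named scalar leaves for
# literals, but does NOT evaluate the expression -- the DAG structure remains.
# (Names and representation are illustrative, not SystemML internals.)
def replace_scalars(expr, symtab):
    """Expressions are nested tuples (op, left, right); leaves are names or numbers."""
    if isinstance(expr, tuple):
        op, l, r = expr
        return (op, replace_scalars(l, symtab), replace_scalars(r, symtab))
    return symtab.get(expr, expr)   # name -> literal if known, else unchanged

# Hout = (Hin + 2*pad - Hf) / stride + 1, with Hin and stride known from the
# symbol table:
hout = ('+', ('/', ('-', ('+', 'Hin', 4), 5), 'stride'), 1)
after_ipa = replace_scalars(hout, {'Hin': 28, 'stride': 1})
# Still a DAG of operations -- only the leaves changed; a subsequent constant
# folding pass is what collapses it to a single literal.
{code}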

> Improve constant folding during compilation
> ---
>
> Key: SYSTEMML-1561
> URL: https://issues.apache.org/jira/browse/SYSTEMML-1561
> Project: SystemML
>  Issue Type: Improvement
>Reporter: Mike Dusenberry
> Attachments: scenario1_plan.txt, scenario1.py, scenario2_plan.txt, 
> scenario2.py
>
>
> In our `nn` library, our convolution and pooling layers have to pass around 
> the spatial dimensions (height and width) of the images that are stretched 
> out into rows of the input/output matrices.  These output dimensions are 
> computed within the forward functions of the above layers as small scalar 
> equations.  From a mathematical standpoint, these sizes can be determined at 
> compile time, and it is nice to have these size equations in DML (vs. hiding 
> them inside the engine within built-in functions).  However, we do not 
> currently evaluate these expressions during compilation, and thus we are left 
> with unknown sizes even during recompilation.  This naturally leads to 
> worst-case (max) memory estimates and thus often to unnecessary distributed 
> runtime ops rather than simple CP ones.
> I have two related scenarios for which this is a problem.  They both involve 
> the {{Houtc1}} & {{Woutc1}} values that are returned from a 
> `conv2d::forward(...)` function.  These represent the spatial dimensions of 
> the volume associated with each row of the output {{outc1}} of the function, 
> and the third dimension is {{F1}}.  Thus, {{outc1}} has a number of columns 
> equal to {{F1*Houtc1*Woutc1}}.
> In the first scenario ({{scenario1.py}}), a random matrix {{doutc1}} is 
> created that should have the same dimensions as {{outc1}}.  For the columns, 
> if I use {{cols=ncol(outc1)}} in this rand statement, the size will be 
> propagated and CP ops will be compiled and run.  If I instead use 
> {{cols=F1*Houtc1*Woutc1}}, the size will forever be unknown, even during 
> recompilation, and thus Spark ops will be compiled and run.  I have included 
> the recompile hops plan ({{scenario1_plan.txt}}).
> In the second scenario ({{scenario2.py}}), a {{max_pool2d::forward(...)}} 
> function is inserted after the {{conv2d::forward(...)}} function that 
> requires the {{Houtc1}} and {{Woutc1}} variables to be supplied as arguments. 
>  Since those latter variables are not evaluated at compilation time, the 
> max pooling sizes remain unknown, even during recompilation, and thus Spark 
> ops will be compiled and run.  I have included the recompile hops plan 
> ({{scenario2_plan.txt}}).
> We should either improve or fix our constant folding rewrites so that these 
> scenarios are handled, as this is necessary for performant deep learning 
> applications.  Note that this issue will be present in other non-deep 
> learning scenarios as well.
> Mailing list thread: 
> https://www.mail-archive.com/dev@systemml.incubator.apache.org/msg01657.html
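For readers of this archive, here is a minimal sketch of what constant folding of a scalar size expression such as {{F1*Houtc1*Woutc1}} amounts to. This is a hypothetical Python illustration; the class and function names are invented and do not mirror SystemML's actual Hop rewrite framework.

```python
# Toy constant folding over a scalar expression tree: any subtree whose
# operands are all literals is collapsed into a single literal node.
# Illustrative only; not SystemML's rewrite code.

class Lit:
    def __init__(self, value):
        self.value = value

class BinOp:
    def __init__(self, op, left, right):
        self.op, self.left, self.right = op, left, right

def fold(node):
    """Recursively fold subtrees whose operands are all literals."""
    if isinstance(node, Lit):
        return node
    left, right = fold(node.left), fold(node.right)
    if isinstance(left, Lit) and isinstance(right, Lit):
        if node.op == "*":
            return Lit(left.value * right.value)
        if node.op == "+":
            return Lit(left.value + right.value)
    return BinOp(node.op, left, right)

# e.g. F1=32, Houtc1=28, Woutc1=28 folds to the single literal 25088,
# which would make the column count of outc1 a known compile-time size.
expr = BinOp("*", BinOp("*", Lit(32), Lit(28)), Lit(28))
folded = fold(expr)
```

The point of the issue is that when such an expression is *not* folded (e.g. because its inputs are only known after function inlining), the size stays unknown and worst-case estimates kick in.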



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (SYSTEMML-1561) Improve constant folding during compilation

2017-04-27 Thread Matthias Boehm (JIRA)

[ 
https://issues.apache.org/jira/browse/SYSTEMML-1561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15988262#comment-15988262
 ] 

Matthias Boehm commented on SYSTEMML-1561:
--

So my guess is that this issue will be resolved once we rework scalar 
propagation into functions and across the entire program.
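Roughly, such inter-procedural scalar propagation means that when a call site passes only compile-time constants, the callee's scalar outputs can themselves be evaluated at compile time. A hypothetical Python sketch (SystemML's IPA operates on Hop DAGs, not Python; the conv output-size formula below is the standard one used by the `nn` forward functions):

```python
# Toy inter-procedural scalar propagation: evaluate a callee's scalar
# output when all arguments at the call site are literals.
# Illustrative only; not SystemML's actual IPA implementation.

def conv2d_out_dim(h_in, pad, kernel, stride):
    # Standard convolution output-size formula.
    return (h_in + 2 * pad - kernel) // stride + 1

def propagate_call(func, args):
    """If all arguments are compile-time constants, evaluate the call;
    otherwise the result (and any size derived from it) stays unknown."""
    if all(isinstance(a, int) for a in args):
        return func(*args)
    return None

# All literals at the call site -> Hout becomes a known compile-time size.
Hout = propagate_call(conv2d_out_dim, [28, 2, 5, 1])
```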

> Improve constant folding during compilation
> ---
>
> Key: SYSTEMML-1561
> URL: https://issues.apache.org/jira/browse/SYSTEMML-1561
> Project: SystemML
>  Issue Type: Improvement
>Reporter: Mike Dusenberry
> Attachments: scenario1_plan.txt, scenario1.py, scenario2_plan.txt, 
> scenario2.py
>
>





[jira] [Commented] (SYSTEMML-1561) Improve constant folding during compilation

2017-04-27 Thread Matthias Boehm (JIRA)

[ 
https://issues.apache.org/jira/browse/SYSTEMML-1561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15988258#comment-15988258
 ] 

Matthias Boehm commented on SYSTEMML-1561:
--

ok, just to clarify: even the recompile explain output does not show the 
worst-case size estimates and computed size expressions, as they are only 
transiently inferred and used for memory estimates. For example, given a right 
indexing B = A[x:y, z] with unknown scalars x, y, and z, we would still use a 
worst-case estimate of nrow(A) x 1 for B. Anyway, if you've seen that the 
extended scalar propagation solves it, then that's fine, but it likely only 
works because this forward function is inlined.
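The worst-case estimate described above can be sketched as follows (a hypothetical Python illustration; the function name is invented and this is not SystemML's estimator code):

```python
# Worst-case size estimation for right indexing B = A[x:y, z] with
# possibly-unknown scalar bounds: without x and y, the only safe upper
# bound on rows is nrow(A); a single column index z always yields 1 column.
# Illustrative sketch only.

def worst_case_rows(nrow_A, x=None, y=None):
    """Upper bound on rows of A[x:y, z] when bounds may be unknown."""
    if x is not None and y is not None:
        return y - x + 1        # exact when both bounds are known
    return nrow_A               # otherwise assume the full row range

rows = worst_case_rows(1000)    # x, y unknown -> worst case 1000
cols = 1                        # single column index z -> exactly 1
```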






[jira] [Commented] (SYSTEMML-1561) Improve constant folding during compilation

2017-04-27 Thread Matthias Boehm (JIRA)

[ 
https://issues.apache.org/jira/browse/SYSTEMML-1561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15986181#comment-15986181
 ] 

Matthias Boehm commented on SYSTEMML-1561:
--

Sorry, I don't have a lot of free cycles right now, but I could look into it 
at the end of next week. [~niketanpansare] it would be good if you could have 
a detailed look.

Generally, I think this is probably just a misunderstanding. We perform 
constant folding during initial compilation but not during dynamic 
recompilation, and this scenario seems (without a closer look) to require the 
latter. During dynamic recompilation, we compute worst-case size estimates, 
including the evaluation of scalar subtrees, but these estimates are simply 
not exposed in our explain output. However, of course there might be issues. 







[jira] [Commented] (SYSTEMML-1561) Improve constant folding during compilation

2017-04-26 Thread Niketan Pansare (JIRA)

[ 
https://issues.apache.org/jira/browse/SYSTEMML-1561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15985362#comment-15985362
 ] 

Niketan Pansare commented on SYSTEMML-1561:
---

[~mwdus...@us.ibm.com] I am pretty swamped until mid-June with conferences and 
other features. 






[jira] [Commented] (SYSTEMML-1561) Improve constant folding during compilation

2017-04-26 Thread Mike Dusenberry (JIRA)

[ 
https://issues.apache.org/jira/browse/SYSTEMML-1561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15985353#comment-15985353
 ] 

Mike Dusenberry commented on SYSTEMML-1561:
---

Yeah, let's just fix this issue properly, and then there will be no need for 
hacks in Caffe2DML or anywhere else!  [~niketanpansare], [~mboehm7] Can either 
of you work on this?






[jira] [Commented] (SYSTEMML-1561) Improve constant folding during compilation

2017-04-26 Thread Niketan Pansare (JIRA)

[ 
https://issues.apache.org/jira/browse/SYSTEMML-1561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15985257#comment-15985257
 ] 

Niketan Pansare commented on SYSTEMML-1561:
---

[~mwdus...@us.ibm.com] We definitely need to address this issue. As an FYI, 
Caffe2DML addresses this issue by computing the shapes at compile time rather 
than runtime.






[jira] [Commented] (SYSTEMML-1561) Improve constant folding during compilation

2017-04-25 Thread Mike Dusenberry (JIRA)

[ 
https://issues.apache.org/jira/browse/SYSTEMML-1561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15983477#comment-15983477
 ] 

Mike Dusenberry commented on SYSTEMML-1561:
---

cc [~niketanpansare]






[jira] [Commented] (SYSTEMML-1561) Improve constant folding during compilation

2017-04-24 Thread Mike Dusenberry (JIRA)

[ 
https://issues.apache.org/jira/browse/SYSTEMML-1561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15982214#comment-15982214
 ] 

Mike Dusenberry commented on SYSTEMML-1561:
---

cc [~mboehm7]



