[jira] [Commented] (SYSTEMML-1561) Improve constant folding during compilation
[ https://issues.apache.org/jira/browse/SYSTEMML-1561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16003710#comment-16003710 ] Mike Dusenberry commented on SYSTEMML-1561: --- Well, I tried logging the rewrites with {{ProgramRewriter.LDEBUG = true}} enabled and log4j set to DEBUG, but it only displayed the common subexpression elimination rewrites during the second chance pass. Looking into it further, rewrites like constant folding never seem to emit debug logging, so I don't think the log is showing the whole picture. Regardless, here's the trace (look for the {{ABOUT TO START STATIC REWRITE + IPA SECOND CHANCE}} section).
{code}
17/05/09 15:50:35 DEBUG DMLScript: DML config:
INFO: localtmpdir: /tmp/systemml
INFO: scratch: scratch_space
INFO: optlevel: 2
INFO: numreducers: 10
INFO: defaultblocksize: 1000
INFO: dml.yarn.appmaster: false
INFO: dml.yarn.appmaster.mem: 2048
INFO: dml.yarn.mapreduce.mem: -1
INFO: cp.parallel.matrixmult: true
INFO: cp.parallel.textio: true
INFO: native.blas: auto
INFO: compressed.linalg: false
INFO: codegen.enabled: false
INFO: codegen.literals: 1
INFO: codegen.plancache: true
INFO: systemml.stats.extraGPU: false
INFO: systemml.stats.extraDNN: false
17/05/09 15:50:35 DEBUG DMLScript: Looking for the following file in the local file system: ./nn/examples/mnist_lenet.dml
17/05/09 15:50:35 DEBUG DMLScript: Looking for the following file in the local file system: ./nn/layers/affine.dml
17/05/09 15:50:35 DEBUG DMLScript: Looking for the following file in the local file system: ./nn/layers/conv2d_builtin.dml
17/05/09 15:50:35 DEBUG DMLScript: Looking for the following file in the local file system: ./nn/layers/cross_entropy_loss.dml
17/05/09 15:50:35 DEBUG DMLScript: Looking for the following file in the local file system: ./nn/layers/dropout.dml
17/05/09 15:50:35 DEBUG DMLScript: Looking for the following file in the local file system: ./nn/layers/l2_reg.dml
17/05/09 15:50:35 DEBUG DMLScript: Looking for the following file in the local file system: ./nn/layers/max_pool2d_builtin.dml
17/05/09 15:50:35 DEBUG DMLScript: Looking for the following file in the local file system: ./nn/layers/relu.dml
17/05/09 15:50:35 DEBUG DMLScript: Looking for the following file in the local file system: ./nn/layers/softmax.dml
17/05/09 15:50:35 DEBUG DMLScript: Looking for the following file in the local file system: ./nn/optim/sgd_nesterov.dml
17/05/09 15:50:36 DEBUG MRConfigurationNames: Hadoop build version: 2.6.5 from e8c9fe0b4c252caf2ebf1464220599650f119997 by sjlee source checksum f05c9fa095a395faa9db9f7ba5d754
17/05/09 15:50:36 DEBUG MRConfigurationNames: Using hadoop 2.x configuration properties.
17/05/09 15:50:36 DEBUG MutableMetricsFactory: field org.apache.hadoop.metrics2.lib.MutableRate org.apache.hadoop.security.UserGroupInformation$UgiMetrics.loginSuccess with annotation @org.apache.hadoop.metrics2.annotation.Metric(sampleName=Ops, about=, always=false, type=DEFAULT, value=[Rate of successful kerberos logins and latency (milliseconds)], valueName=Time)
17/05/09 15:50:36 DEBUG MutableMetricsFactory: field org.apache.hadoop.metrics2.lib.MutableRate org.apache.hadoop.security.UserGroupInformation$UgiMetrics.loginFailure with annotation @org.apache.hadoop.metrics2.annotation.Metric(sampleName=Ops, about=, always=false, type=DEFAULT, value=[Rate of failed kerberos logins and latency (milliseconds)], valueName=Time)
17/05/09 15:50:36 DEBUG MutableMetricsFactory: field org.apache.hadoop.metrics2.lib.MutableRate org.apache.hadoop.security.UserGroupInformation$UgiMetrics.getGroups with annotation @org.apache.hadoop.metrics2.annotation.Metric(sampleName=Ops, about=, always=false, type=DEFAULT, value=[GetGroups], valueName=Time)
17/05/09 15:50:36 DEBUG MetricsSystemImpl: UgiMetrics, User and group related metrics
17/05/09 15:50:36 DEBUG KerberosName: Kerberos krb5 configuration not found, setting default realm to empty
17/05/09 15:50:36 DEBUG Groups: Creating new Groups object
17/05/09 15:50:36 DEBUG NativeCodeLoader: Trying to load the custom-built native-hadoop library...
17/05/09 15:50:36 DEBUG NativeCodeLoader: Failed to load native-hadoop with error: java.lang.UnsatisfiedLinkError: no hadoop in java.library.path
17/05/09 15:50:36 DEBUG NativeCodeLoader: java.library.path=/Users/mwdusenb/Library/Java/Extensions:/Library/Java/Extensions:/Network/Library/Java/Extensions:/System/Library/Java/Extensions:/usr/lib/java:.
17/05/09 15:50:36 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
17/05/09 15:50:36 DEBUG PerformanceAdvisory: Falling back to shell based
17/05/09 15:50:36 DEBUG JniBasedUnixGroupsMappingWithFallback: Group mapping impl=org.apache.hadoop.security.ShellBasedUnixGroupsMapping
17/05/09 15:50:36 DEBUG Groups: Group mapping
{code}
[jira] [Commented] (SYSTEMML-1561) Improve constant folding during compilation
[ https://issues.apache.org/jira/browse/SYSTEMML-1561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16003628#comment-16003628 ] Matthias Boehm commented on SYSTEMML-1561: -- and it's great to see that the recompilation times are still in a reasonable range: 5978 DAGs in 3.2s - generally, we try to keep recompilation of average DAGs at around 1ms.

> Improve constant folding during compilation
> ---
>
> Key: SYSTEMML-1561
> URL: https://issues.apache.org/jira/browse/SYSTEMML-1561
> Project: SystemML
> Issue Type: Improvement
> Reporter: Mike Dusenberry
> Assignee: Mike Dusenberry
> Fix For: SystemML 1.0
>
> Attachments: scenario1_plan.txt, scenario1.py, scenario2_plan.txt, scenario2.py
>
> In our `nn` library, our convolution and pooling layers have to pass around the spatial dimensions (height and width) of the images that are stretched out into rows of the input/output matrices. These output dimensions are computed within the forward functions of the above layers as small scalar equations. From a mathematical standpoint, these sizes can be determined at compile time, and it is nice to have these size equations in DML (vs. hiding them inside the engine within built-in functions). However, we do not currently evaluate these expressions during compilation, and thus we are left with unknown sizes even during recompilation. This naturally leads to max memory estimates and thus often leads to unnecessary distributed runtime ops rather than simple CP ones.
> I have two related scenarios for which this is a problem. They both involve the {{Houtc1}} & {{Woutc1}} values that are returned from a `conv2d::forward(...)` function. These represent the spatial dimensions of the volume within each of the rows of the output {{outc1}} of the function, and the third dimension is {{F1}}. Thus, {{outc1}} has a number of columns equal to {{F1*Houtc1*Woutc1}}.
> In the first scenario ({{scenario1.py}}), a random matrix {{doutc1}} is created that should have the same dimensions as {{outc1}}. For the columns, if I use {{cols=ncol(outc1)}} in this rand statement, the size will be propagated and CP ops will be compiled and run. If I instead use {{cols=F1*Houtc1*Woutc1}}, the size will forever be unknown, even during recompilation, and thus Spark ops will be compiled and run. I have included the recompile hops plan ({{scenario1_plan.txt}}).
> In the second scenario ({{scenario2.py}}), a {{max_pool2d::forward(...)}} function is inserted after the {{conv2d::forward(...)}} function that requires the {{Houtc1}} and {{Woutc1}} variables to be supplied as arguments. Since those latter variables are not evaluated during compilation, the max pooling sizes remain unknown, even during recompilation, and thus Spark ops will be compiled and run. I have included the recompile hops plan ({{scenario2_plan.txt}}).
> We should either improve or fix our constant folding rewrites so that these scenarios are fixed, as they are necessary for performant deep learning applications. Note too that this issue will be present in other non-deep learning scenarios as well.
> Mailing list thread: https://www.mail-archive.com/dev@systemml.incubator.apache.org/msg01657.html

-- This message was sent by Atlassian JIRA (v6.3.15#6346)
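The size expressions at issue ({{Hout}}, {{Wout}}, etc.) are small scalar DAGs, so the desired rewrite amounts to collapsing any operator whose inputs are all literals. A minimal, hypothetical sketch in Python (not SystemML's actual rewrite code; `Node`, `Lit`, and `fold` are invented names for illustration):

```python
# Hypothetical sketch of constant folding over a scalar expression DAG.
# Node, Lit, and fold are invented names, not SystemML internals.
import operator

OPS = {"+": operator.add, "-": operator.sub,
       "*": operator.mul, "//": operator.floordiv}

class Node:
    def __init__(self, op, *children):
        self.op, self.children = op, children

class Lit(Node):
    def __init__(self, value):
        super().__init__("lit")
        self.value = value

def fold(node):
    """Bottom-up: collapse any operator whose inputs are all literals."""
    if isinstance(node, Lit):
        return node
    children = [fold(c) for c in node.children]
    if all(isinstance(c, Lit) for c in children):
        return Lit(OPS[node.op](children[0].value, children[1].value))
    return Node(node.op, *children)

# Hout = (Hin + 2*pad - Hf) // stride + 1, the kind of size equation computed
# in conv2d::forward, with all leaves already literals (post scalar propagation).
Hin, pad, Hf, stride = Lit(28), Lit(2), Lit(5), Lit(1)
hout = Node("+",
            Node("//",
                 Node("-", Node("+", Hin, Node("*", Lit(2), pad)), Hf),
                 stride),
            Lit(1))
folded = fold(hout)
print(folded.value)  # 28: the whole sub-DAG collapses to a single literal
```

With a known {{Hout}} like this, a downstream {{rand(..., cols=F1*Hout*Wout)}} would get a literal column count instead of an unknown.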
[jira] [Commented] (SYSTEMML-1561) Improve constant folding during compilation
[ https://issues.apache.org/jira/browse/SYSTEMML-1561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16003618#comment-16003618 ] Matthias Boehm commented on SYSTEMML-1561: -- that's awesome - just one question: do we understand what reduced the number of cache writes to HDFS (export) from 2100 to 8?
[jira] [Commented] (SYSTEMML-1561) Improve constant folding during compilation
[ https://issues.apache.org/jira/browse/SYSTEMML-1561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16003546#comment-16003546 ] Mike Dusenberry commented on SYSTEMML-1561: --- As I noted on SYSTEMML-1566, I ran experiments again using (1) the commit before the IPA scalar replacement update, (2) the commit with the IPA scalar replacement update, and (3) the proposed commit with the updated constant folding (which relies on the IPA update for usefulness), and measured the following results:

commit 2c5c3b14e1906cda70ae1581b19a5e908b3ab329 (pre IPA update)
{code}
17/05/05 14:39:49 INFO ScriptExecutorUtils: SystemML Statistics:
Total elapsed time: 712.183 sec.
Total compilation time: 1.996 sec.
Total execution time: 710.187 sec.
Number of compiled Spark inst: 134.
Number of executed Spark inst: 2513.
Cache hits (Mem, WB, FS, HDFS): 153624/0/0/862.
Cache writes (WB, FS, HDFS): 79043/0/2170.
Cache times (ACQr/m, RLS, EXP): 32.052/0.038/5.508/55.790 sec.
HOP DAGs recompiled (PRED, SB): 0/5979.
HOP DAGs recompile time: 3.670 sec.
Functions recompiled: 10.
Functions recompile time: 0.082 sec.
Spark ctx create time (lazy): 0.959 sec.
Spark trans counts (par,bc,col): 347/1649/862.
Spark trans times (par,bc,col): 0.671/25.076/31.988 secs.
Total JIT compile time: 118.9 sec.
Total JVM GC count: 267.
Total JVM GC time: 7.523 sec.
Heavy hitter instructions (name, time, count):
-- 1) train 671.994 sec 1
-- 2) conv2d_bias_add 198.398 sec 3298
-- 3) maxpooling_backward 174.666 sec 1720
-- 4) predict 140.782 sec 9
-- 5) sp_mapmm 94.035 sec 1649
-- 6) conv2d_backward_filter 63.328 sec 1720
-- 7) sp_sel+ 39.259 sec 860
-- 8) ba+* 18.615 sec 5089
-- 9) +* 16.627 sec 10320
-- 10) conv2d_backward_data 14.297 sec 860
{code}

commit abc9686fbaaa11c12cfa02c49c7675165acdf176 (w/ IPA update)
{code}
17/05/05 15:05:16 INFO ScriptExecutorUtils: SystemML Statistics:
Total elapsed time: 673.900 sec.
Total compilation time: 1.938 sec.
Total execution time: 671.962 sec.
Number of compiled Spark inst: 128.
Number of executed Spark inst: 2513.
Cache hits (Mem, WB, FS, HDFS): 153645/0/0/862.
Cache writes (WB, FS, HDFS): 79043/0/2149.
Cache times (ACQr/m, RLS, EXP): 31.568/0.038/4.639/54.790 sec.
HOP DAGs recompiled (PRED, SB): 0/5978.
HOP DAGs recompile time: 3.705 sec.
Functions recompiled: 10.
Functions recompile time: 0.068 sec.
Spark ctx create time (lazy): 0.948 sec.
Spark trans counts (par,bc,col): 368/1649/862.
Spark trans times (par,bc,col): 0.689/26.035/31.503 secs.
Total JIT compile time: 111.921 sec.
Total JVM GC count: 265.
Total JVM GC time: 7.118 sec.
Heavy hitter instructions (name, time, count):
-- 1) train 634.306 sec 1
-- 2) conv2d_bias_add 190.557 sec 3298
-- 3) maxpooling_backward 141.588 sec 1720
-- 4) predict 135.222 sec 9
-- 5) sp_mapmm 94.025 sec 1649
-- 6) conv2d_backward_filter 66.058 sec 1720
-- 7) sp_sel+ 39.204 sec 860
-- 8) +* 18.272 sec 10320
-- 9) ba+* 15.804 sec 5089
-- 10) conv2d_backward_data 13.627 sec 860
{code}

w/ updated constant folding
{code}
17/05/05 15:15:19 INFO ScriptExecutorUtils: SystemML Statistics:
Total elapsed time: 405.615 sec.
Total compilation time: 2.070 sec.
Total execution time: 403.545 sec.
Number of compiled Spark inst: 139.
Number of executed Spark inst: 793.
Cache hits (Mem, WB, FS, HDFS): 156654/0/0/2.
Cache writes (WB, FS, HDFS): 79043/0/8.
Cache times (ACQr/m, RLS, EXP): 3.467/0.043/3.566/1.175 sec.
HOP DAGs recompiled (PRED, SB): 0/5978.
HOP DAGs recompile time: 3.178 sec.
Functions recompiled: 10.
Functions recompile time: 0.072 sec.
Spark ctx create time (lazy): 1.024 sec.
Spark trans counts (par,bc,col): 789/789/2.
Spark trans times (par,bc,col): 0.982/0.299/3.418 secs.
Total JIT compile time: 145.368 sec.
Total JVM GC count: 438.
Total JVM GC time: 8.992 sec.
Heavy hitter instructions (name, time, count):
-- 1) train 370.373 sec 1
-- 2) conv2d_bias_add 178.914 sec 3298
-- 3) predict 116.145 sec 9
-- 4) conv2d_backward_filter 55.582 sec 1720
-- 5) +* 18.948 sec 10320
-- 6) sel+ 18.238 sec 3369
-- 7) ba+* 16.171 sec 5949
-- 8) conv2d_backward_data 15.038 sec 860
-- 9) sp_mapmm 13.980 sec 789
-- 10) relu_maxpooling 12.415 sec 3298
{code}

With the IPA scalar replacement + constant folding updates, we've gained an additional ~300s, for a ~1.75x speedup in this scenario.
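The quoted gain and speedup follow directly from the elapsed times of the first and third runs above (712.183 s pre IPA update vs. 405.615 s with both updates):

```python
# Sanity-check the claimed ~300 s gain / ~1.75x speedup from the stats above.
pre_ipa = 712.183       # total elapsed time, commit 2c5c3b1 (pre IPA update)
with_folding = 405.615  # total elapsed time, w/ updated constant folding

gain = pre_ipa - with_folding
speedup = pre_ipa / with_folding
print(f"gain={gain:.1f}s speedup={speedup:.2f}x")  # gain=306.6s speedup=1.76x
```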
[jira] [Commented] (SYSTEMML-1561) Improve constant folding during compilation
[ https://issues.apache.org/jira/browse/SYSTEMML-1561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16003330#comment-16003330 ] Mike Dusenberry commented on SYSTEMML-1561: --- [PR 484 | https://github.com/apache/incubator-systemml/pull/484] submitted. [~mboehm7] Can you please review when you get a chance?
[jira] [Commented] (SYSTEMML-1561) Improve constant folding during compilation
[ https://issues.apache.org/jira/browse/SYSTEMML-1561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15996123#comment-15996123 ] Matthias Boehm commented on SYSTEMML-1561: -- sounds great - a second chance would be useful for many other scenarios too. The 2x runtime improvement is a bit surprising though, because very similar rewrites would be performed during dynamic recompilation (except constant folding, which is covered by size expression evaluation over sub-DAGs of scalar operations with symbol table inputs) and dynamic recompilation itself was not the bottleneck. I would be very interested to know where this is coming from - maybe some cascade of other rewrites/fused operators? You can set {{ProgramRewriter.LDEBUG = true}} to see the applied simplification rewrites along with the line numbers where they originate from. For your PR, if you want to ensure that future compiler modifications preserve this behavior, please add a test into {{functions.recompile}} or {{functions.misc}}, similar to other size-dependent rewrites - the easiest way is to construct a case where, without size propagation, we would compile/execute distributed operations, and simply compare the number of compiled/executed Spark instructions with expected values.
[jira] [Commented] (SYSTEMML-1561) Improve constant folding during compilation
[ https://issues.apache.org/jira/browse/SYSTEMML-1561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15995732#comment-15995732 ] Mike Dusenberry commented on SYSTEMML-1561: --- Just FYI, I'm making some progress on this. Essentially, by rerunning static rewrites + IPA again immediately after the initial IPA pass as a kind of "second chance", we're able to apply this constant folding rewrite for this scenario. This makes sense because during the initial static rewrite pass, we can't apply constant folding to the {{Hout}}, {{Wout}}, etc. DAGs due to the leaf nodes being scalar transient reads. After IPA with the new scalar replacement, these DAGs will consist entirely of operations on literal leaf nodes, and thus be eligible for constant folding. Then, after that second pass of static rewrites, we can benefit from IPA again by now being able to perform scalar replacement for functions/other DAGs that consume the {{Hout}}, {{Wout}}, etc. DAGs, which are now literals. In terms of performance, I'm seeing the execution time cut in half (~500s faster) for SYSTEMML-1566. I can open a PR soon.
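The ordering argument behind the "second chance" pass can be sketched concretely. The following is hypothetical Python, not the actual IPA code: `scalar_replace` stands in for IPA scalar replacement (substituting literals from the symbol table for transient reads), and `fold` for the constant folding rewrite; folding only fires after replacement, which is why running the rewrites a second time helps.

```python
# Hypothetical sketch of why the "second chance" pass helps: constant
# folding can't fire while leaves are transient reads; after IPA scalar
# replacement turns those reads into literals, the same fold succeeds.
def scalar_replace(expr, symtab):
    """Replace transient-read leaves (strings) with symbol-table literals."""
    if isinstance(expr, str):              # a transient read, e.g. "Hin"
        return symtab.get(expr, expr)      # becomes a literal if known
    if not isinstance(expr, tuple):
        return expr                        # already a literal
    op, left, right = expr
    return (op, scalar_replace(left, symtab), scalar_replace(right, symtab))

def fold(expr):
    """Collapse any operator whose two inputs are both numeric literals."""
    if not isinstance(expr, tuple):
        return expr
    op, left, right = expr[0], fold(expr[1]), fold(expr[2])
    if isinstance(left, (int, float)) and isinstance(right, (int, float)):
        return {"+": left + right, "-": left - right, "*": left * right}[op]
    return (op, left, right)

# Hout = Hin - Hf + 1 (a "valid" convolution, for illustration)
hout = ("+", ("-", "Hin", "Hf"), 1)

pass1 = fold(hout)                         # leaves are reads: nothing folds
pass2 = fold(scalar_replace(hout, {"Hin": 28, "Hf": 5}))  # second chance
print(pass1, pass2)  # ('+', ('-', 'Hin', 'Hf'), 1) 24
```

Once `pass2` is a literal, consumers of {{Hout}} can in turn be scalar-replaced, which is the cascading benefit described above.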
[jira] [Commented] (SYSTEMML-1561) Improve constant folding during compilation
[ https://issues.apache.org/jira/browse/SYSTEMML-1561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15988311#comment-15988311 ] Mike Dusenberry commented on SYSTEMML-1561: --- Yeah, that's a good point - in these cases the functions are indeed inlined. Also, to be clear, the IPA scalar propagation causes the scalar leaf nodes of the {{Hout}} or {{Wout}} sub-DAGs to be replaced with literals, but {{Hout}} and {{Wout}} themselves are still not evaluated. I.e., for each there is still a DAG of basic scalar operations that needs to be evaluated, but the good news is that it does not depend on anything except the literals at the leaves.
[jira] [Commented] (SYSTEMML-1561) Improve constant folding during compilation
[ https://issues.apache.org/jira/browse/SYSTEMML-1561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15988262#comment-15988262 ] Matthias Boehm commented on SYSTEMML-1561:
--
So my guess is that this issue will be resolved once we rework the entire scalar propagation into functions and across the entire program.

> Improve constant folding during compilation
> ---
>
> Key: SYSTEMML-1561
> URL: https://issues.apache.org/jira/browse/SYSTEMML-1561
> Project: SystemML
> Issue Type: Improvement
> Reporter: Mike Dusenberry
> Attachments: scenario1_plan.txt, scenario1.py, scenario2_plan.txt, scenario2.py
>
> In our `nn` library, our convolution and pooling layers have to pass around the spatial dimensions (height and width) of the images that are stretched out into rows of the input/output matrices. These output dimensions are computed within the forward functions of the above layers as small scalar equations. From a mathematical standpoint, these sizes can be determined at compile time, and it is nice to have these size equations in DML (vs. hiding them inside the engine within built-in functions). However, we do not currently evaluate these expressions during compilation, and thus we are left with unknown sizes even during recompilation. This naturally leads to maximum memory estimates and thus often to unnecessary distributed runtime ops rather than simple CP ones.
> I have two related scenarios for which this is a problem. They both involve the {{Houtc1}} & {{Woutc1}} values that are returned from a `conv2d::forward(...)` function. These represent the spatial dimensions of the volume within each of the rows of the output {{outc1}} of the function, and the third dimension is {{F1}}. Thus, {{outc1}} has a number of columns equal to {{F1*Houtc1*Woutc1}}.
> In the first scenario ({{scenario1.py}}), a random matrix {{doutc1}} is created that should have the same dimensions as {{outc1}}. For the columns, if I use {{cols=ncol(outc1)}} in this rand statement, the size will be propagated and CP ops will be compiled and run. If I instead use {{cols=F1*Houtc1*Woutc1}}, the size will forever be unknown, even during recompilation, and thus Spark ops will be compiled and run. I have included the recompile hops plan ({{scenario1_plan.txt}}).
> In the second scenario ({{scenario2.py}}), a {{max_pool2d::forward(...)}} function is inserted after the {{conv2d::forward(...)}} function that requires the {{Houtc1}} and {{Woutc1}} variables to be supplied as arguments. Since those latter variables are not evaluated at compile time, the max pooling sizes remain unknown, even during recompilation, and thus Spark ops will be compiled and run. I have included the recompile hops plan ({{scenario2_plan.txt}}).
> We should either improve or fix our constant folding rewrites so that these scenarios are fixed, as they are necessary for performant deep learning applications. Note too that this issue will be present in other non-deep-learning scenarios as well.
> Mailing list thread: https://www.mail-archive.com/dev@systemml.incubator.apache.org/msg01657.html

-- This message was sent by Atlassian JIRA (v6.3.15#6346)
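As a hypothetical illustration (plain Python, not SystemML/DML), the "small scalar equations" the description refers to are just the standard convolution output-size formula; with known input dimensions and hyperparameters they reduce to literals that a compiler could fold. The concrete values (28x28 input, 5x5 filters, pad 2, stride 1, 32 filters) are assumptions for the example, not taken from the issue:

```python
# Standard convolution/pooling output-size formula. Every input is a
# compile-time scalar, so the result is a compile-time scalar too.
def conv2d_out_dim(dim_in, filter_size, pad, stride):
    return (dim_in + 2 * pad - filter_size) // stride + 1

# Hypothetical MNIST-style setup: 28x28 input, 5x5 filters, 'same'
# padding, stride 1, F1 = 32 filters.
Hin, Win, F1 = 28, 28, 32
Houtc1 = conv2d_out_dim(Hin, filter_size=5, pad=2, stride=1)
Woutc1 = conv2d_out_dim(Win, filter_size=5, pad=2, stride=1)

# The number of columns of outc1 is fully determined at compile time.
print(F1 * Houtc1 * Woutc1)  # 25088
```

If the compiler folded these expressions, `cols=F1*Houtc1*Woutc1` would carry the same size information as `cols=ncol(outc1)`.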
[ https://issues.apache.org/jira/browse/SYSTEMML-1561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15988258#comment-15988258 ] Matthias Boehm commented on SYSTEMML-1561:
--
OK, just to clarify: even the recompile explain output does not show the worst-case size estimates and computed size expressions, as they are only transiently inferred and used for memory estimates. For example, given a right indexing B = A[x:y, z] with unknown scalars x, y, and z, we would still use a worst-case estimate of nrow(A) x 1 for B. Anyway, if you've seen that the extended scalar propagation solves it, then that's fine, but it likely only works because this forward function is inlined.
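The worst-case estimate described above can be sketched as follows. This is a hypothetical simplification in plain Python, not SystemML's actual size estimator; the function name and signature are invented for illustration:

```python
# Hypothetical worst-case size estimation for right indexing
# B = A[x:y, z]: if the scalar bounds x and y are unknown, the row
# range can at most cover all of A; a single column index z always
# yields exactly one column, so the worst case is nrow(A) x 1.
def worst_case_rix(nrow_A, ncol_A, x=None, y=None, z=None):
    rows = (y - x + 1) if (x is not None and y is not None) else nrow_A
    cols = 1  # single-column right indexing
    return rows, cols

print(worst_case_rix(1000, 50))        # unknown x, y -> (1000, 1)
print(worst_case_rix(1000, 50, 3, 7))  # known bounds  -> (5, 1)
```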
[ https://issues.apache.org/jira/browse/SYSTEMML-1561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15986181#comment-15986181 ] Matthias Boehm commented on SYSTEMML-1561:
--
Sorry, I don't have a lot of free cycles right now, but I could look into it at the end of next week. [~niketanpansare] it would be good if you could have a detailed look. Generally, I think this is probably just a misunderstanding. We perform constant folding during initial compilation but not during dynamic recompilation, and this scenario seems (without a closer look) to require the latter. During dynamic recompilation, we compute worst-case size estimates, including the evaluation of scalar subtrees, but these estimates are simply not exposed in our explain output. However, of course there might be issues.
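To make the constant-folding rewrite concrete, here is a minimal, hypothetical sketch in plain Python (not SystemML's actual Hop rewrite): any operator whose inputs are all literals is replaced by its evaluated result, while subtrees containing unknown variables are left in place:

```python
import operator

# Binary operators the folder understands.
OPS = {'+': operator.add, '-': operator.sub,
       '*': operator.mul, '/': operator.truediv}

def fold(node):
    # A node is a literal (int/float), a variable name (str),
    # or a tuple (op, left, right).
    if not isinstance(node, tuple):
        return node
    op, left, right = node
    left, right = fold(left), fold(right)
    if isinstance(left, (int, float)) and isinstance(right, (int, float)):
        return OPS[op](left, right)  # all inputs known: fold to a literal
    return (op, left, right)         # otherwise keep the subtree

# (Hin + 2*pad - filter) / stride + 1 with all scalars known:
expr = ('+', ('/', ('-', ('+', 28, ('*', 2, 2)), 5), 1), 1)
print(fold(expr))  # 28.0

# With an unknown variable, only the constant subtree is folded:
print(fold(('*', 'F1', ('*', 3, 4))))  # ('*', 'F1', 12)
```

The point of the issue is that such folding would also need to run (or its results be propagated) during dynamic recompilation for the size expressions to become known.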
[ https://issues.apache.org/jira/browse/SYSTEMML-1561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15985362#comment-15985362 ] Niketan Pansare commented on SYSTEMML-1561:
--
[~mwdus...@us.ibm.com] I am pretty swamped until mid-June with conferences and other features.
[ https://issues.apache.org/jira/browse/SYSTEMML-1561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15985353#comment-15985353 ] Mike Dusenberry commented on SYSTEMML-1561:
--
Yeah, let's just fix this issue properly, and then there's no need for hacks in Caffe2DML or anywhere else! [~niketanpansare], [~mboehm7], can either of you work on this?
[ https://issues.apache.org/jira/browse/SYSTEMML-1561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15985257#comment-15985257 ] Niketan Pansare commented on SYSTEMML-1561:
--
[~mwdus...@us.ibm.com] We definitely need to address this issue. As an FYI, Caffe2DML addresses it by computing the shapes at compile time rather than at runtime.
[ https://issues.apache.org/jira/browse/SYSTEMML-1561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15983477#comment-15983477 ] Mike Dusenberry commented on SYSTEMML-1561:
--
cc [~niketanpansare]
[ https://issues.apache.org/jira/browse/SYSTEMML-1561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15982214#comment-15982214 ] Mike Dusenberry commented on SYSTEMML-1561:
--
cc [~mboehm7]