[ https://issues.apache.org/jira/browse/SYSTEMML-1566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16003543#comment-16003543 ]
Mike Dusenberry edited comment on SYSTEMML-1566 at 5/9/17 9:01 PM: ------------------------------------------------------------------- I ran experiments again using (1) the commit before the IPA scalar replacement update, (2) the commit with the IPA scalar replacement update, and (3) the proposed commit with the updated constant folding (which relies on the IPA update for usefulness), and measured the following results: commit 2c5c3b14e1906cda70ae1581b19a5e908b3ab329 (pre IPA update) {code} 17/05/05 14:39:49 INFO ScriptExecutorUtils: SystemML Statistics: Total elapsed time: 712.183 sec. Total compilation time: 1.996 sec. Total execution time: 710.187 sec. Number of compiled Spark inst: 134. Number of executed Spark inst: 2513. Cache hits (Mem, WB, FS, HDFS): 153624/0/0/862. Cache writes (WB, FS, HDFS): 79043/0/2170. Cache times (ACQr/m, RLS, EXP): 32.052/0.038/5.508/55.790 sec. HOP DAGs recompiled (PRED, SB): 0/5979. HOP DAGs recompile time: 3.670 sec. Functions recompiled: 10. Functions recompile time: 0.082 sec. Spark ctx create time (lazy): 0.959 sec. Spark trans counts (par,bc,col):347/1649/862. Spark trans times (par,bc,col): 0.671/25.076/31.988 secs. Total JIT compile time: 118.9 sec. Total JVM GC count: 267. Total JVM GC time: 7.523 sec. Heavy hitter instructions (name, time, count): -- 1) train 671.994 sec 1 -- 2) conv2d_bias_add 198.398 sec 3298 -- 3) maxpooling_backward 174.666 sec 1720 -- 4) predict 140.782 sec 9 -- 5) sp_mapmm 94.035 sec 1649 -- 6) conv2d_backward_filter 63.328 sec 1720 -- 7) sp_sel+ 39.259 sec 860 -- 8) ba+* 18.615 sec 5089 -- 9) +* 16.627 sec 10320 -- 10) conv2d_backward_data 14.297 sec 860 {code} commit abc9686fbaaa11c12cfa02c49c7675165acdf176 (w/ IPA update) {code} 17/05/05 15:05:16 INFO ScriptExecutorUtils: SystemML Statistics: Total elapsed time: 673.900 sec. Total compilation time: 1.938 sec. Total execution time: 671.962 sec. Number of compiled Spark inst: 128. Number of executed Spark inst: 2513. Cache hits (Mem, WB, FS, HDFS): 153645/0/0/862. Cache writes (WB, FS, HDFS): 79043/0/2149. Cache times (ACQr/m, RLS, EXP): 31.568/0.038/4.639/54.790 sec. HOP DAGs recompiled (PRED, SB): 0/5978. HOP DAGs recompile time: 3.705 sec. Functions recompiled: 10. Functions recompile time: 0.068 sec. Spark ctx create time (lazy): 0.948 sec. Spark trans counts (par,bc,col):368/1649/862. Spark trans times (par,bc,col): 0.689/26.035/31.503 secs. Total JIT compile time: 111.921 sec. Total JVM GC count: 265. Total JVM GC time: 7.118 sec. Heavy hitter instructions (name, time, count): -- 1) train 634.306 sec 1 -- 2) conv2d_bias_add 190.557 sec 3298 -- 3) maxpooling_backward 141.588 sec 1720 -- 4) predict 135.222 sec 9 -- 5) sp_mapmm 94.025 sec 1649 -- 6) conv2d_backward_filter 66.058 sec 1720 -- 7) sp_sel+ 39.204 sec 860 -- 8) +* 18.272 sec 10320 -- 9) ba+* 15.804 sec 5089 -- 10) conv2d_backward_data 13.627 sec 860 {code} w/ updated constant folding {code} 17/05/05 15:15:19 INFO ScriptExecutorUtils: SystemML Statistics: Total elapsed time: 405.615 sec. Total compilation time: 2.070 sec. Total execution time: 403.545 sec. Number of compiled Spark inst: 139. Number of executed Spark inst: 793. Cache hits (Mem, WB, FS, HDFS): 156654/0/0/2. Cache writes (WB, FS, HDFS): 79043/0/8. Cache times (ACQr/m, RLS, EXP): 3.467/0.043/3.566/1.175 sec. HOP DAGs recompiled (PRED, SB): 0/5978. HOP DAGs recompile time: 3.178 sec. Functions recompiled: 10. Functions recompile time: 0.072 sec. Spark ctx create time (lazy): 1.024 sec. Spark trans counts (par,bc,col):789/789/2. Spark trans times (par,bc,col): 0.982/0.299/3.418 secs. Total JIT compile time: 145.368 sec. Total JVM GC count: 438. Total JVM GC time: 8.992 sec. Heavy hitter instructions (name, time, count): -- 1) train 370.373 sec 1 -- 2) conv2d_bias_add 178.914 sec 3298 -- 3) predict 116.145 sec 9 -- 4) conv2d_backward_filter 55.582 sec 1720 -- 5) +* 18.948 sec 10320 -- 6) sel+ 18.238 sec 3369 -- 7) ba+* 16.171 sec 5949 -- 8) conv2d_backward_data 15.038 sec 860 -- 9) sp_mapmm 13.980 sec 789 -- 10) relu_maxpooling 12.415 sec 3298 {code} It appears that there was a bug with the max pooling built-in operator that was fixed since the 0.14 release, which brought the execution time down from ~1000s to ~700s. Then, with the IPA scalar replacement + constant folding updates, we've gained an additional ~300s, for a ~1.75x speedup in this scenario. was (Author: mwdus...@us.ibm.com): I ran experiments again using (1) the commit before the IPA scalar replacement update, (2) the commit with the IPA scalar replacement update, and (3) the proposed commit with the updated constant folding (which relies on the IPA update for usefulness), and measured the following results: commit 2c5c3b14e1906cda70ae1581b19a5e908b3ab329 (pre IPA update) {code} 17/05/05 14:39:49 INFO ScriptExecutorUtils: SystemML Statistics: Total elapsed time: 712.183 sec. Total compilation time: 1.996 sec. Total execution time: 710.187 sec. Number of compiled Spark inst: 134. Number of executed Spark inst: 2513. Cache hits (Mem, WB, FS, HDFS): 153624/0/0/862. Cache writes (WB, FS, HDFS): 79043/0/2170. Cache times (ACQr/m, RLS, EXP): 32.052/0.038/5.508/55.790 sec. HOP DAGs recompiled (PRED, SB): 0/5979. HOP DAGs recompile time: 3.670 sec. Functions recompiled: 10. Functions recompile time: 0.082 sec. Spark ctx create time (lazy): 0.959 sec. Spark trans counts (par,bc,col):347/1649/862. Spark trans times (par,bc,col): 0.671/25.076/31.988 secs. Total JIT compile time: 118.9 sec. Total JVM GC count: 267. Total JVM GC time: 7.523 sec. Heavy hitter instructions (name, time, count): -- 1) train 671.994 sec 1 -- 2) conv2d_bias_add 198.398 sec 3298 -- 3) maxpooling_backward 174.666 sec 1720 -- 4) predict 140.782 sec 9 -- 5) sp_mapmm 94.035 sec 1649 -- 6) conv2d_backward_filter 63.328 sec 1720 -- 7) sp_sel+ 39.259 sec 860 -- 8) ba+* 18.615 sec 5089 -- 9) +* 16.627 sec 10320 -- 10) conv2d_backward_data 14.297 sec 860 {code} commit abc9686fbaaa11c12cfa02c49c7675165acdf176 (w/ IPA update) {code} 17/05/05 15:05:16 INFO ScriptExecutorUtils: SystemML Statistics: Total elapsed time: 673.900 sec. Total compilation time: 1.938 sec. Total execution time: 671.962 sec. Number of compiled Spark inst: 128. Number of executed Spark inst: 2513. Cache hits (Mem, WB, FS, HDFS): 153645/0/0/862. Cache writes (WB, FS, HDFS): 79043/0/2149. Cache times (ACQr/m, RLS, EXP): 31.568/0.038/4.639/54.790 sec. HOP DAGs recompiled (PRED, SB): 0/5978. HOP DAGs recompile time: 3.705 sec. Functions recompiled: 10. Functions recompile time: 0.068 sec. Spark ctx create time (lazy): 0.948 sec. Spark trans counts (par,bc,col):368/1649/862. Spark trans times (par,bc,col): 0.689/26.035/31.503 secs. Total JIT compile time: 111.921 sec. Total JVM GC count: 265. Total JVM GC time: 7.118 sec. Heavy hitter instructions (name, time, count): -- 1) train 634.306 sec 1 -- 2) conv2d_bias_add 190.557 sec 3298 -- 3) maxpooling_backward 141.588 sec 1720 -- 4) predict 135.222 sec 9 -- 5) sp_mapmm 94.025 sec 1649 -- 6) conv2d_backward_filter 66.058 sec 1720 -- 7) sp_sel+ 39.204 sec 860 -- 8) +* 18.272 sec 10320 -- 9) ba+* 15.804 sec 5089 -- 10) conv2d_backward_data 13.627 sec 860 {code} w/ updated constant folding {code} 17/05/05 15:15:19 INFO ScriptExecutorUtils: SystemML Statistics: Total elapsed time: 405.615 sec. Total compilation time: 2.070 sec. Total execution time: 403.545 sec. Number of compiled Spark inst: 139. Number of executed Spark inst: 793. Cache hits (Mem, WB, FS, HDFS): 156654/0/0/2. Cache writes (WB, FS, HDFS): 79043/0/8. Cache times (ACQr/m, RLS, EXP): 3.467/0.043/3.566/1.175 sec. HOP DAGs recompiled (PRED, SB): 0/5978. HOP DAGs recompile time: 3.178 sec. Functions recompiled: 10. Functions recompile time: 0.072 sec. Spark ctx create time (lazy): 1.024 sec. Spark trans counts (par,bc,col):789/789/2. Spark trans times (par,bc,col): 0.982/0.299/3.418 secs. Total JIT compile time: 145.368 sec. Total JVM GC count: 438. Total JVM GC time: 8.992 sec. Heavy hitter instructions (name, time, count): -- 1) train 370.373 sec 1 -- 2) conv2d_bias_add 178.914 sec 3298 -- 3) predict 116.145 sec 9 -- 4) conv2d_backward_filter 55.582 sec 1720 -- 5) +* 18.948 sec 10320 -- 6) sel+ 18.238 sec 3369 -- 7) ba+* 16.171 sec 5949 -- 8) conv2d_backward_data 15.038 sec 860 -- 9) sp_mapmm 13.980 sec 789 -- 10) relu_maxpooling 12.415 sec 3298 {code} It appears that there was a bug with the max pooling built-in operator that was fixed since the 0.14 release, which brought the execution time down from ~1000s to ~700s. Then, with the IPA scalar replacement + constant folding updates, we've gained an addition ~300s, for a ~1.75x speedup in this scenario. > Possible regression from 0.13 -> 0.14 for MNIST LeNet script > ------------------------------------------------------------ > > Key: SYSTEMML-1566 > URL: https://issues.apache.org/jira/browse/SYSTEMML-1566 > Project: SystemML > Issue Type: Bug > Reporter: Mike Dusenberry > Fix For: SystemML 1.0 > > Attachments: explain.txt, stats.txt > > > For the 0.14 release testing, I tried out the [MNIST LeNet example | > https://github.com/apache/incubator-systemml/blob/master/scripts/nn/examples/mnist_lenet-train.dml] > on both 0.13 and 0.14 and noticed a possible regression. Basically, on 0.14 > the script took longer to run and had 2513 Spark instructions executed, while > on 0.13 only 864 Spark instructions were executed. This was run locally on a > laptop using the 2 instructions at the top of the script (and copied below). > I've also attached the stats and runtime explain logs. > 1. Download data > {code} > nn/examples/get_mnist_data.sh > {code} > 2. Execute from the {{scripts}} directory. > {code} > spark-submit --master local[*] --driver-memory 10G --conf > spark.driver.maxResultSize=0 --conf spark.rpc.message.maxSize=128 > SystemML.jar -f nn/examples/mnist_lenet-train.dml -stats -explain -nvargs > train=nn/examples/data/mnist/mnist_train.csv > test=nn/examples/data/mnist/mnist_test.csv C=1 Hin=28 Win=28 epochs=1 > out_dir=nn/examples/model/mnist_lenet > {code} -- This message was sent by Atlassian JIRA (v6.3.15#6346)