[ https://issues.apache.org/jira/browse/SYSTEMML-1166?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Imran Younus updated SYSTEMML-1166:
-----------------------------------
    Summary: Bad performance of TSMM  (was: Performance issues with tsmm RIGHT)

> Bad performance of TSMM
> -----------------------
>
>                 Key: SYSTEMML-1166
>                 URL: https://issues.apache.org/jira/browse/SYSTEMML-1166
>             Project: SystemML
>          Issue Type: Bug
>          Components: Runtime
>         Environment: spark 1.6.1
>                      centos
>            Reporter: Imran Younus
>
> I'm running the following simple DML code:
> {code}
> N = 30000
> L = Rand(cols=N, max=1, min=-1, pdf="uniform", rows=N)
> LLT = L %*% t(L)
> fn = sum(LLT * LLT)
> print(fn)
> {code}
> Line 3 is translated to {{tsmm RIGHT}}, and it takes almost 30 min to run in both standalone and Spark modes.
> Here is the output from explain and stats:
> {code}
> > /opt/spark-1.6.2-bin-hadoop2.6/bin/spark-submit
> >   --master=spark://rr-ram4.softlayer.com:7077 --executor-memory=20g
> >   --driver-memory=60g --executor-cores=4 --num-executors=20
> >   sysml/target/SystemML.jar -f tsmm_verify.dml -explain -stats
> 17/01/05 18:11:10 INFO api.DMLScript: BEGIN DML run 01/05/2017 18:11:10
> 17/01/05 18:11:10 INFO conf.DMLConfig: Using internal default configuration settings. If you wish to customize any settings, please supply a `SystemML-config.xml` file.
> 17/01/05 18:11:10 INFO api.DMLScript: EXPLAIN (RUNTIME):
> # Memory Budget local/remote = 41216MB/?MB/?MB/?MB
> # Degree of Parallelism (vcores) local/remote = 48/?
> PROGRAM ( size CP/SP = 10/0 )
> --MAIN PROGRAM
> ----GENERIC (lines 8-24) [recompile=false]
> ------CP createvar _mVar1 scratch_space//_p22241_10.168.31.117//_t0/temp1 true MATRIX binaryblock 30000 30000 1000 1000 900000000 copy
> ------CP rand 30000 30000 1000 1000 -1 1 1.0 -1 uniform 1.0 48 _mVar1.MATRIX.DOUBLE
> ------CP createvar _mVar2 scratch_space//_p22241_10.168.31.117//_t0/temp2 true MATRIX binaryblock 30000 30000 1000 1000 -1 copy
> ------CP tsmm _mVar1.MATRIX.DOUBLE _mVar2.MATRIX.DOUBLE RIGHT 48
> ------CP rmvar _mVar1
> ------CP uasqk+ _mVar2.MATRIX.DOUBLE _Var3.SCALAR.DOUBLE 48
> ------CP rmvar _mVar2
> ------CP print _Var3.SCALAR.DOUBLE.false _Var4.SCALAR.STRING
> ------CP rmvar _Var3
> ------CP rmvar _Var4
> 17/01/05 18:11:10 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
> 6.00000141554168E12
> 17/01/05 18:40:38 INFO api.DMLScript: SystemML Statistics:
> Total elapsed time:             1768.639 sec.
> Total compilation time:         0.266 sec.
> Total execution time:           1768.373 sec.
> Number of compiled Spark inst:  0.
> Number of executed Spark inst:  0.
> Cache hits (Mem, WB, FS, HDFS): 2/0/0/0.
> Cache writes (WB, FS, HDFS):    2/1/0.
> Cache times (ACQr/m, RLS, EXP): 0.000/0.001/13.410/0.000 sec.
> HOP DAGs recompiled (PRED, SB): 0/0.
> HOP DAGs recompile time:        0.000 sec.
> Spark ctx create time (lazy):   0.000 sec.
> Spark trans counts (par,bc,col):0/0/0.
> Spark trans times (par,bc,col): 0.000/0.000/0.000 secs.
> Total JIT compile time:         1.345 sec.
> Total JVM GC count:             3.
> Total JVM GC time:              4.428 sec.
> Heavy hitter instructions (name, time, count):
> -- 1) tsmm       1762.014 sec   1
> -- 2) rand       5.210 sec      1
> -- 3) uasqk+     0.541 sec      1
> -- 4) rmvar      0.002 sec      4
> -- 5) print      0.002 sec      1
> -- 6) createvar  0.001 sec      2
> 17/01/05 18:40:38 INFO api.DMLScript: END DML run 01/05/2017 18:40:38
> {code}
> If I calculate {{t(L) %*% L}}, which is {{tsmm LEFT}}, then it takes only 5 min.
> Here is the output of explain and stats in this case:
> {code}
> > /opt/spark-1.6.2-bin-hadoop2.6/bin/spark-submit
> >   --master=spark://rr-ram4.softlayer.com:7077 --executor-memory=20g
> >   --driver-memory=60g --executor-cores=4 --num-executors=20
> >   sysml/target/SystemML.jar -f tsmm_verify.dml -explain -stats
> 17/01/05 18:03:13 INFO api.DMLScript: BEGIN DML run 01/05/2017 18:03:13
> 17/01/05 18:03:13 INFO conf.DMLConfig: Using internal default configuration settings. If you wish to customize any settings, please supply a `SystemML-config.xml` file.
> 17/01/05 18:03:13 INFO api.DMLScript: EXPLAIN (RUNTIME):
> # Memory Budget local/remote = 41216MB/?MB/?MB/?MB
> # Degree of Parallelism (vcores) local/remote = 48/?
> PROGRAM ( size CP/SP = 10/0 )
> --MAIN PROGRAM
> ----GENERIC (lines 8-24) [recompile=false]
> ------CP createvar _mVar1 scratch_space//_p19190_10.168.31.117//_t0/temp1 true MATRIX binaryblock 30000 30000 1000 1000 900000000 copy
> ------CP rand 30000 30000 1000 1000 -1 1 1.0 -1 uniform 1.0 48 _mVar1.MATRIX.DOUBLE
> ------CP createvar _mVar2 scratch_space//_p19190_10.168.31.117//_t0/temp2 true MATRIX binaryblock 30000 30000 1000 1000 -1 copy
> ------CP tsmm _mVar1.MATRIX.DOUBLE _mVar2.MATRIX.DOUBLE LEFT 48
> ------CP rmvar _mVar1
> ------CP uasqk+ _mVar2.MATRIX.DOUBLE _Var3.SCALAR.DOUBLE 48
> ------CP rmvar _mVar2
> ------CP print _Var3.SCALAR.DOUBLE.false _Var4.SCALAR.STRING
> ------CP rmvar _Var3
> ------CP rmvar _Var4
> 17/01/05 18:03:13 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
> 5.999491296172791E12
> 17/01/05 18:09:14 INFO api.DMLScript: SystemML Statistics:
> Total elapsed time:             361.293 sec.
> Total compilation time:         0.267 sec.
> Total execution time:           361.026 sec.
> Number of compiled Spark inst:  0.
> Number of executed Spark inst:  0.
> Cache hits (Mem, WB, FS, HDFS): 2/0/0/0.
> Cache writes (WB, FS, HDFS):    2/1/0.
> Cache times (ACQr/m, RLS, EXP): 0.000/0.001/12.827/0.000 sec.
> HOP DAGs recompiled (PRED, SB): 0/0.
> HOP DAGs recompile time:        0.000 sec.
> Spark ctx create time (lazy):   0.000 sec.
> Spark trans counts (par,bc,col):0/0/0.
> Spark trans times (par,bc,col): 0.000/0.000/0.000 secs.
> Total JIT compile time:         1.32 sec.
> Total JVM GC count:             3.
> Total JVM GC time:              4.271 sec.
> Heavy hitter instructions (name, time, count):
> -- 1) tsmm       354.384 sec    1
> -- 2) rand       5.181 sec      1
> -- 3) uasqk+     0.842 sec      1
> -- 4) rmvar      0.002 sec      4
> -- 5) createvar  0.001 sec      2
> -- 6) print      0.001 sec      1
> 17/01/05 18:09:14 INFO api.DMLScript: END DML run 01/05/2017 18:09:14
> {code}
> The time difference between the two cases is way too large. I'm sure there is some bug in the runtime code that is causing this.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
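Editor's note, not part of the original report: the fast {{tsmm LEFT}} formulation should be usable as a workaround here, because both scripts compute the same scalar. By the cyclic property of the trace, ||L L^T||_F^2 = trace(L L^T L L^T) = trace(L^T L L^T L) = ||L^T L||_F^2, so sum((L %*% t(L))^2) equals sum((t(L) %*% L)^2) up to floating-point error (the two printed results above differ only because each run draws a different random L). A quick numpy sketch of this check, using a small matrix in place of N = 30000:

```python
import numpy as np

# Hypothetical small-scale check of the identity behind the workaround:
# sum((L @ L.T)**2) == sum((L.T @ L)**2), i.e. the Frobenius norms of the
# two tsmm outputs agree, by cyclicity of the trace.
rng = np.random.default_rng(0)
L = rng.uniform(-1.0, 1.0, size=(200, 200))  # stands in for the 30000x30000 Rand matrix

fn_right = np.sum((L @ L.T) ** 2)  # corresponds to tsmm RIGHT: L %*% t(L)
fn_left = np.sum((L.T @ L) ** 2)   # corresponds to tsmm LEFT:  t(L) %*% L

assert np.isclose(fn_right, fn_left)
```

This only sidesteps the slow instruction; the underlying RIGHT-variant performance bug still needs fixing in the runtime.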