[ 
https://issues.apache.org/jira/browse/SYSTEMML-1166?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Imran Younus updated SYSTEMML-1166:
-----------------------------------
    Summary: Bad performance of TSMM  (was: Performance issues with tsmm RIGHT)

> Bad performance of TSMM
> -----------------------
>
>                 Key: SYSTEMML-1166
>                 URL: https://issues.apache.org/jira/browse/SYSTEMML-1166
>             Project: SystemML
>          Issue Type: Bug
>          Components: Runtime
>         Environment: spark 1.6.1
> centos
>            Reporter: Imran Younus
>
> I'm running the following simple dml code:
> {code}
> N = 30000
> L = Rand(cols=N, max=1, min=-1, pdf="uniform", rows=N)
> LLT = L %*% t(L)
> fn = sum(LLT * LLT)
> print(fn)
> {code}
> Line 3 (the {{L %*% t(L)}} multiplication) is compiled to a {{tsmm RIGHT}} 
> instruction, and it takes almost 30 min to run in both standalone and spark 
> modes.
> Here is the output from explain and stats:
> {code}
> > /opt/spark-1.6.2-bin-hadoop2.6/bin/spark-submit 
> > --master=spark://rr-ram4.softlayer.com:7077 --executor-memory=20g 
> > --driver-memory=60g --executor-cores=4 --num-executors=20 
> > sysml/target/SystemML.jar -f tsmm_verify.dml -explain -stats
> 17/01/05 18:11:10 INFO api.DMLScript: BEGIN DML run 01/05/2017 18:11:10
> 17/01/05 18:11:10 INFO conf.DMLConfig: Using internal default configuration 
> settings.  If you wish to customize any settings, please supply a 
> `SystemML-config.xml` file.
> 17/01/05 18:11:10 INFO api.DMLScript: EXPLAIN (RUNTIME):
> # Memory Budget local/remote = 41216MB/?MB/?MB/?MB
> # Degree of Parallelism (vcores) local/remote = 48/?
> PROGRAM ( size CP/SP = 10/0 )
> --MAIN PROGRAM
> ----GENERIC (lines 8-24) [recompile=false]
> ------CP createvar _mVar1 scratch_space//_p22241_10.168.31.117//_t0/temp1 
> true MATRIX binaryblock 30000 30000 1000 1000 900000000 copy
> ------CP rand 30000 30000 1000 1000 -1 1 1.0 -1 uniform 1.0 48 
> _mVar1.MATRIX.DOUBLE
> ------CP createvar _mVar2 scratch_space//_p22241_10.168.31.117//_t0/temp2 
> true MATRIX binaryblock 30000 30000 1000 1000 -1 copy
> ------CP tsmm _mVar1.MATRIX.DOUBLE _mVar2.MATRIX.DOUBLE RIGHT 48
> ------CP rmvar _mVar1
> ------CP uasqk+ _mVar2.MATRIX.DOUBLE _Var3.SCALAR.DOUBLE 48
> ------CP rmvar _mVar2
> ------CP print _Var3.SCALAR.DOUBLE.false _Var4.SCALAR.STRING
> ------CP rmvar _Var3
> ------CP rmvar _Var4
> 17/01/05 18:11:10 WARN util.NativeCodeLoader: Unable to load native-hadoop 
> library for your platform... using builtin-java classes where applicable
> 6.00000141554168E12
> 17/01/05 18:40:38 INFO api.DMLScript: SystemML Statistics:
> Total elapsed time:                         1768.639 sec.
> Total compilation time:                              0.266 sec.
> Total execution time:                              1768.373 sec.
> Number of compiled Spark inst:                             0.
> Number of executed Spark inst:                             0.
> Cache hits (Mem, WB, FS, HDFS):                            2/0/0/0.
> Cache writes (WB, FS, HDFS):                       2/1/0.
> Cache times (ACQr/m, RLS, EXP):                            
> 0.000/0.001/13.410/0.000 sec.
> HOP DAGs recompiled (PRED, SB):                            0/0.
> HOP DAGs recompile time:   0.000 sec.
> Spark ctx create time (lazy):  0.000 sec.
> Spark trans counts (par,bc,col):0/0/0.
> Spark trans times (par,bc,col):       0.000/0.000/0.000 secs.
> Total JIT compile time:                                 1.345 sec.
> Total JVM GC count:                                   3.
> Total JVM GC time:                                            4.428 sec.
> Heavy hitter instructions (name, time, count):
> -- 1)              tsmm         1762.014 sec  1
> -- 2)              rand         5.210 sec     1
> -- 3)              uasqk+       0.541 sec     1
> -- 4)              rmvar        0.002 sec     4
> -- 5)              print        0.002 sec     1
> -- 6)              createvar    0.001 sec     2
> 17/01/05 18:40:38 INFO api.DMLScript: END DML run 01/05/2017 18:40:38
> {code}
> If I calculate {{t(L) %*% L}} which is {{tsmm LEFT}}, then it takes only 5 
> min. Here is the output of explain and stats in this case:
> {code}
> > /opt/spark-1.6.2-bin-hadoop2.6/bin/spark-submit 
> > --master=spark://rr-ram4.softlayer.com:7077 --executor-memory=20g 
> > --driver-memory=60g --executor-cores=4 --num-executors=20 
> > sysml/target/SystemML.jar -f tsmm_verify.dml -explain -stats
> 17/01/05 18:03:13 INFO api.DMLScript: BEGIN DML run 01/05/2017 18:03:13
> 17/01/05 18:03:13 INFO conf.DMLConfig: Using internal default configuration 
> settings.  If you wish to customize any settings, please supply a 
> `SystemML-config.xml` file.
> 17/01/05 18:03:13 INFO api.DMLScript: EXPLAIN (RUNTIME):
> # Memory Budget local/remote = 41216MB/?MB/?MB/?MB
> # Degree of Parallelism (vcores) local/remote = 48/?
> PROGRAM ( size CP/SP = 10/0 )
> --MAIN PROGRAM
> ----GENERIC (lines 8-24) [recompile=false]
> ------CP createvar _mVar1 scratch_space//_p19190_10.168.31.117//_t0/temp1 
> true MATRIX binaryblock 30000 30000 1000 1000 900000000 copy
> ------CP rand 30000 30000 1000 1000 -1 1 1.0 -1 uniform 1.0 48 
> _mVar1.MATRIX.DOUBLE
> ------CP createvar _mVar2 scratch_space//_p19190_10.168.31.117//_t0/temp2 
> true MATRIX binaryblock 30000 30000 1000 1000 -1 copy
> ------CP tsmm _mVar1.MATRIX.DOUBLE _mVar2.MATRIX.DOUBLE LEFT 48
> ------CP rmvar _mVar1
> ------CP uasqk+ _mVar2.MATRIX.DOUBLE _Var3.SCALAR.DOUBLE 48
> ------CP rmvar _mVar2
> ------CP print _Var3.SCALAR.DOUBLE.false _Var4.SCALAR.STRING
> ------CP rmvar _Var3
> ------CP rmvar _Var4
> 17/01/05 18:03:13 WARN util.NativeCodeLoader: Unable to load native-hadoop 
> library for your platform... using builtin-java classes where applicable
> 5.999491296172791E12
> 17/01/05 18:09:14 INFO api.DMLScript: SystemML Statistics:
> Total elapsed time:                         361.293 sec.
> Total compilation time:                             0.267 sec.
> Total execution time:                             361.026 sec.
> Number of compiled Spark inst:                            0.
> Number of executed Spark inst:                            0.
> Cache hits (Mem, WB, FS, HDFS):                           2/0/0/0.
> Cache writes (WB, FS, HDFS):                      2/1/0.
> Cache times (ACQr/m, RLS, EXP):                           
> 0.000/0.001/12.827/0.000 sec.
> HOP DAGs recompiled (PRED, SB):                           0/0.
> HOP DAGs recompile time:   0.000 sec.
> Spark ctx create time (lazy):  0.000 sec.
> Spark trans counts (par,bc,col):0/0/0.
> Spark trans times (par,bc,col):       0.000/0.000/0.000 secs.
> Total JIT compile time:                                 1.32 sec.
> Total JVM GC count:                                  3.
> Total JVM GC time:                                    4.271 sec.
> Heavy hitter instructions (name, time, count):
> -- 1)              tsmm         354.384 sec  1
> -- 2)              rand         5.181 sec    1
> -- 3)              uasqk+       0.842 sec    1
> -- 4)              rmvar        0.002 sec    4
> -- 5)              createvar    0.001 sec    2
> -- 6)              print        0.001 sec    1
> 17/01/05 18:09:14 INFO api.DMLScript: END DML run 01/05/2017 18:09:14
> {code}
> The time difference between the two cases is far too large. I'm sure there 
> is some bug in the runtime code that is causing this.
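As a possible workaround until the RIGHT case is fixed: since {{L %*% t(L)}} and {{t(L) %*% L}} share the same nonzero eigenvalues for a square {{L}}, the final scalar {{sum(LLT * LLT)}} (the squared Frobenius norm) is identical for both products, so the script above could use the much faster {{tsmm LEFT}} variant and get the same answer. A minimal plain-Python sketch (not SystemML code; naive matrix multiply, small sizes only) verifying the identity trace((L L^T)^2) = trace((L^T L)^2):

```python
import random

def matmul(A, B):
    # naive dense multiply: (n x k) times (k x m)
    n, k, m = len(A), len(B), len(B[0])
    return [[sum(A[i][t] * B[t][j] for t in range(k)) for j in range(m)]
            for i in range(n)]

def transpose(A):
    return [list(row) for row in zip(*A)]

def sq_sum(A):
    # squared Frobenius norm: sum of squares of all entries
    return sum(x * x for row in A for x in row)

random.seed(0)
n = 5
L = [[random.uniform(-1, 1) for _ in range(n)] for _ in range(n)]

right = sq_sum(matmul(L, transpose(L)))   # ||L L^T||_F^2  (tsmm RIGHT analogue)
left  = sq_sum(matmul(transpose(L), L))   # ||L^T L||_F^2  (tsmm LEFT analogue)
print(abs(right - left) < 1e-9)
```

The identity follows from the cyclic property of the trace: trace(L L^T L L^T) = trace(L^T L L^T L). This only sidesteps the performance bug for this particular script; the slow {{tsmm RIGHT}} path still needs fixing.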



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
