[ https://issues.apache.org/jira/browse/SYSTEMML-843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15405096#comment-15405096 ]
Imran Younus commented on SYSTEMML-843:
---------------------------------------

[~nakul02] and I investigated the tSNE algorithm further today. After applying the changes suggested by [~mboehm7], there was a huge improvement in the function {{x2p}}. Today we looked at the performance of {{tsne}}, which implements the gradient descent. It takes more than twice as long as the Python implementation (820 sec vs. 390 sec). I've updated the tSNE.dml script on GitHub (https://github.com/apache/incubator-systemml/pull/200). Here are the stats:

{code}
16/08/02 17:17:16 INFO api.DMLScript: SystemML Statistics:
Total elapsed time: 823.878 sec.
Total compilation time: 0.458 sec.
Total execution time: 823.420 sec.
Number of compiled MR Jobs: 1.
Number of executed MR Jobs: 0.
Cache hits (Mem, WB, FS, HDFS): 1299576/0/0/1.
Cache writes (WB, FS, HDFS): 379333/0/2.
Cache times (ACQr/m, RLS, EXP): 0.400/0.079/0.996/0.036 sec.
HOP DAGs recompiled (PRED, SB): 0/2001.
HOP DAGs recompile time: 1.035 sec.
Functions recompiled: 2.
Functions recompile time: 0.035 sec.
Total JIT compile time: 19.167 sec.
Total JVM GC count: 1951.
Total JVM GC time: 13.888 sec.
Heavy hitter instructions (name, time, count):
-- 1) tsne 823.061 sec 1
-- 2) * 244.509 sec 120782
-- 3) / 173.698 sec 173172
-- 4) + 168.255 sec 164652
-- 5) - 91.713 sec 118780
-- 6) tak+* 53.855 sec 58390
-- 7) tsmm 48.542 sec 2001
-- 8) uark+ 27.817 sec 2000
-- 9) x2p 10.541 sec 1
-- 10) ba+* 6.135 sec 2000
16/08/02 17:17:16 INFO api.DMLScript: END DML run 08/02/2016 17:17:16
{code}

Here is the {{tsne}} function:

{code}
tsne = function(matrix[double] X, int reduced_dims, int initial_dims, int perplexity)
  return(matrix[double] Y, matrix[double] C) {
  d = reduced_dims
  n = nrow(X)
  max_iter = 2000
  eta = 500

  P = x2p(X, 1.0e-5, 20.0)
  P = P*4  # scale P up for the first 100 iterations (undone below)
  Y = rand(rows=n, cols=d, pdf="normal")
  C = matrix(0, rows=max_iter, cols=1)

  # n x n mask: 0 on the diagonal, 1 everywhere else
  ZERODIAG = (diag(matrix(-1, rows=n, cols=1)) + 1)

  for (itr in 1:max_iter) {
    D = distance_matrix(Y)
    Z = 1/(D + 1)
    Z = Z * ZERODIAG
    Q = Z/sum(Z)
    W = (P - Q)*Z
    sumW = rowSums(W)
    grad_C = Y * sumW - W %*% Y
    Y = Y - eta*grad_C
    Y = Y - colMeans(Y)  # re-center the embedding

    if (itr%%50 == 0) {
      #C[itr,] = sum(P * log(pmax(P, 1e-12) / pmax(Q, 1e-12)))
      #print(as.scalar(C[itr,1]))
      print(itr)
    }
    if (itr == 100) {
      P = P/4
    }
  }
}
{code}

> leftIndex and cache release extremely slow
> ------------------------------------------
>
>                 Key: SYSTEMML-843
>                 URL: https://issues.apache.org/jira/browse/SYSTEMML-843
>             Project: SystemML
>          Issue Type: Bug
>            Reporter: Imran Younus
>         Attachments: tSNT.tar.gz
>
>
> I'm running the tSNE script in standalone mode with a subset of MNIST data (2500 points). I ran this with and without `-exec singlenode`. Here are the stats:
> (BTW, the same function implemented in python takes less than 10 sec!)
> -> with singlenode flag
> {code}
> ./bin/systemml scripts/staging/tSNE.dml -stats -nvargs X=/home/iyounus/workspace/tsne_python/mnist2500_X.txt Y=Y_out.txt C=C_out.txt
> 16/08/01 16:46:54 INFO api.DMLScript: SystemML Statistics:
> Total elapsed time: 109.667 sec.
> Total compilation time: 0.407 sec.
> Total execution time: 109.260 sec.
> Number of compiled MR Jobs: 0.
> Number of executed MR Jobs: 0.
> Cache hits (Mem, WB, FS, HDFS): 223692/0/0/1.
> Cache writes (WB, FS, HDFS): 80351/0/2.
> Cache times (ACQr/m, RLS, EXP): 0.289/0.015/85.192/0.043 sec.
> HOP DAGs recompiled (PRED, SB): 0/0.
> HOP DAGs recompile time: 0.007 sec.
> Functions recompiled: 1.
> Functions recompile time: 0.039 sec.
> Total JIT compile time: 4.924 sec.
> Total JVM GC count: 312.
> Total JVM GC time: 1.12 sec.
> Heavy hitter instructions (name, time, count):
> -- 1) tsne 109.202 sec 1
> -- 2) x2p 109.189 sec 1
> -- 3) leftIndex 106.728 sec 32136
> -- 4) tsmm 0.564 sec 1
> -- 5) exp 0.376 sec 8034
> -- 6) rangeReIndex 0.201 sec 40170
> -- 7) / 0.183 sec 24103
> -- 8) * 0.161 sec 16069
> -- 9) + 0.144 sec 22840
> -- 10) uak+ 0.106 sec 8036
> 16/08/01 16:46:54 INFO api.DMLScript: END DML run 08/01/2016 16:46:54
> {code}
> -> without singlenode flag
> {code}
> > ./bin/systemml scripts/staging/tSNE.dml -stats -nvargs X=/home/iyounus/workspace/tsne_python/mnist2500_X.txt Y=Y_out.txt C=C_out.txt
> 16/08/01 16:52:59 INFO api.DMLScript: SystemML Statistics:
> Total elapsed time: 127.290 sec.
> Total compilation time: 0.396 sec.
> Total execution time: 126.894 sec.
> Number of compiled MR Jobs: 1.
> Number of executed MR Jobs: 0.
> Cache hits (Mem, WB, FS, HDFS): 223693/0/0/1.
> Cache writes (WB, FS, HDFS): 80352/0/2.
> Cache times (ACQr/m, RLS, EXP): 0.421/0.016/100.974/0.041 sec.
> HOP DAGs recompiled (PRED, SB): 0/0.
> HOP DAGs recompile time: 0.009 sec.
> Functions recompiled: 1.
> Functions recompile time: 0.038 sec.
> Total JIT compile time: 4.835 sec.
> Total JVM GC count: 312.
> Total JVM GC time: 1.226 sec.
> Heavy hitter instructions (name, time, count):
> -- 1) tsne 126.426 sec 1
> -- 2) x2p 126.412 sec 1
> -- 3) leftIndex 123.982 sec 32136
> -- 4) exp 0.427 sec 8034
> -- 5) MR-Job_CSV_REBLOCK 0.412 sec 1
> -- 6) tsmm 0.308 sec 1
> -- 7) rangeReIndex 0.242 sec 40170
> -- 8) / 0.208 sec 24103
> -- 9) + 0.172 sec 22840
> -- 10) * 0.151 sec 16069
> 16/08/01 16:52:59 INFO api.DMLScript: END DML run 08/01/2016 16:52:59
> {code}

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
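
To put the two runs quoted in the issue description side by side, here is a minimal Python sketch that computes what share of the total execution time went to {{leftIndex}} and to cache release (the RLS field of "Cache times"). It is not part of SystemML; the names RUNS, share and summarize are made up for illustration, and the figures are copied from the stats above. The heavy-hitter timings appear to be nested (tsne contains x2p, which contains leftIndex), so the percentages are read against the total and are not meant to be added together.

```python
# Seconds, copied from the -stats output quoted in the issue description.
# The dictionary layout and key names are hypothetical, chosen for this sketch.
RUNS = {
    "with -exec singlenode": {
        "execution": 109.260,
        "leftIndex": 106.728,
        "cache_release": 85.192,
    },
    "without -exec singlenode": {
        "execution": 126.894,
        "leftIndex": 123.982,
        "cache_release": 100.974,
    },
}


def share(part, total):
    """Return part/total as a percentage, rounded to one decimal place."""
    return round(100.0 * part / total, 1)


def summarize(runs):
    """Map each run name to the leftIndex and cache-release shares of execution time."""
    return {
        name: {
            "leftIndex_pct": share(row["leftIndex"], row["execution"]),
            "cache_release_pct": share(row["cache_release"], row["execution"]),
        }
        for name, row in runs.items()
    }


if __name__ == "__main__":
    for name, pct in summarize(RUNS).items():
        print(f"{name}: leftIndex {pct['leftIndex_pct']}%, "
              f"cache release {pct['cache_release_pct']}%")
```

In both runs {{leftIndex}} accounts for nearly all of the execution time, and cache release alone for roughly four fifths of it, which matches the issue title.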