[ 
https://issues.apache.org/jira/browse/SYSTEMML-843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15403525#comment-15403525
 ] 

Matthias Boehm commented on SYSTEMML-843:
-----------------------------------------

With a slight script modification, I get the following:
{code}
Total elapsed time:             5.795 sec.
Total compilation time:         1.476 sec.
Total execution time:           4.319 sec.
Number of compiled MR Jobs:     1.
Number of executed MR Jobs:     0.
Cache hits (Mem, WB, FS, HDFS): 136520/0/0/1.
Cache writes (WB, FS, HDFS):    49016/0/2.
Cache times (ACQr/m, RLS, EXP): 0.884/0.019/0.319/0.314 sec.
HOP DAGs recompiled (PRED, SB): 0/0.
HOP DAGs recompile time:        0.011 sec.
Functions recompiled:           1.
Functions recompile time:       0.049 sec.
Total JIT compile time:         9.145 sec.
Total JVM GC count:             11.
Total JVM GC time:              0.226 sec.
Heavy hitter instructions (name, time, count):
-- 1)   tsne    3.089 sec       1
-- 2)   x2p     3.032 sec       1
-- 3)   MR-Job_CSV_REBLOCK      0.893 sec       1
-- 4)   exp     0.544 sec       8034
-- 5)   /       0.330 sec       24103
-- 6)   write   0.315 sec       2
-- 7)   tsmm    0.281 sec       1
-- 8)   *       0.252 sec       16069
-- 9)   leftIndex       0.252 sec       16468
-- 10)  +       0.204 sec       22840
{code}

Please make the following change for now, which allows update-in-place on P:
{code}
      Di = D[i,];
      while (abs(Hdiff) > tol & itr < 50) {
        Pi = exp(-Di * beta[i,1])
        Pi[1,i] = 0.
        sum_Pi = sum(Pi)

        H = log(sum_Pi) + beta[i,1] * sum(Di * Pi) / sum_Pi
        Pi = Pi/sum_Pi
        Hdiff = as.scalar(H - logU)

        #print("iter : " + itr)

        if (Hdiff > 0.) {
          #print ("in Hdiff > 0.0")
          betamin = as.scalar(beta[i,1])
          if (betamax == INF) {
            beta[i,1] = beta[i,1] * 2.
          } else {
            beta[i,1] = (beta[i,1] + betamax) / 2.
          }
        } else {
          #print("in Hdiff <= 0.0")

          betamax = as.scalar(beta[i,1])
          if (betamin == 0.) {
            beta[i,1] = beta[i,1] / 2.
          } else {
            beta[i,1] = (beta[i,1] + betamin) / 2.
          }
        }
        itr = itr + 1
      }

      P[i,] = Pi;
{code}
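
For readers who want to check the logic outside of DML, the loop above can be sketched in plain Python roughly as follows. This is only an illustrative translation, not the script itself: the function name `row_probabilities`, the default `tol`, and the starting values of `beta`, `betamin`, `betamax` and `Hdiff` are assumptions, since the snippet does not show how they are initialized. The row is built in a local variable and returned, so the caller writes it into P once per row.

```python
import math

def row_probabilities(Di, i, logU, tol=1e-5, max_iter=50):
    """Bisect on beta until the entropy of row i is within tol of logU.

    Di   : distances from point i to every point (one row of D)
    i    : 0-based index of the point itself (the DML snippet is 1-based)
    logU : log of the target perplexity
    """
    # Assumed starting values; the DML snippet does not show them.
    beta, betamin, betamax = 1.0, 0.0, math.inf
    Hdiff, itr = math.inf, 0   # assumed, so the loop body runs at least once
    Pi = [0.0] * len(Di)
    while abs(Hdiff) > tol and itr < max_iter:
        Pi = [math.exp(-d * beta) for d in Di]
        Pi[i] = 0.0                                  # no self-similarity
        sum_Pi = sum(Pi)
        H = math.log(sum_Pi) + beta * sum(d * p for d, p in zip(Di, Pi)) / sum_Pi
        Pi = [p / sum_Pi for p in Pi]                # normalize the row
        Hdiff = H - logU
        if Hdiff > 0.0:
            # Entropy too high: raise beta.
            betamin = beta
            beta = beta * 2.0 if betamax == math.inf else (beta + betamax) / 2.0
        else:
            # Entropy too low: lower beta.
            betamax = beta
            beta = beta / 2.0 if betamin == 0.0 else (beta + betamin) / 2.0
        itr += 1
    return Pi
```

A caller would then do a single assignment per row, for example `P[i] = row_probabilities(D[i], i, logU)`, which mirrors the single `P[i,] = Pi` after the while loop. Like the DML snippet, the sketch does not guard against `sum_Pi` underflowing to zero for very large `beta`.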

> leftIndex and cache release extremely slow
> ------------------------------------------
>
>                 Key: SYSTEMML-843
>                 URL: https://issues.apache.org/jira/browse/SYSTEMML-843
>             Project: SystemML
>          Issue Type: Bug
>            Reporter: Imran Younus
>         Attachments: tSNT.tar.gz
>
>
> I'm running the tSNE script in standalone mode with a subset of MNIST data 
> (2500 points). I ran this with and without  `-exec singlenode`. Here are the 
> stats:
> (BTW, the same function implemented in python takes less than 10 sec!)
> -> with singlenode flag
> {code}
> ./bin/systemml scripts/staging/tSNE.dml -stats -nvargs 
> X=/home/iyounus/workspace/tsne_python/mnist2500_X.txt Y=Y_out.txt C=C_out.txt
> 16/08/01 16:46:54 INFO api.DMLScript: SystemML Statistics:
> Total elapsed time:           109.667 sec.
> Total compilation time:               0.407 sec.
> Total execution time:         109.260 sec.
> Number of compiled MR Jobs:   0.
> Number of executed MR Jobs:   0.
> Cache hits (Mem, WB, FS, HDFS):       223692/0/0/1.
> Cache writes (WB, FS, HDFS):  80351/0/2.
> Cache times (ACQr/m, RLS, EXP):       0.289/0.015/85.192/0.043 sec.
> HOP DAGs recompiled (PRED, SB):       0/0.
> HOP DAGs recompile time:      0.007 sec.
> Functions recompiled:         1.
> Functions recompile time:     0.039 sec.
> Total JIT compile time:               4.924 sec.
> Total JVM GC count:           312.
> Total JVM GC time:            1.12 sec.
> Heavy hitter instructions (name, time, count):
> -- 1)         tsne    109.202 sec     1
> -- 2)         x2p     109.189 sec     1
> -- 3)         leftIndex       106.728 sec     32136
> -- 4)         tsmm    0.564 sec       1
> -- 5)         exp     0.376 sec       8034
> -- 6)         rangeReIndex    0.201 sec       40170
> -- 7)         /       0.183 sec       24103
> -- 8)         *       0.161 sec       16069
> -- 9)         +       0.144 sec       22840
> -- 10)        uak+    0.106 sec       8036
> 16/08/01 16:46:54 INFO api.DMLScript: END DML run 08/01/2016 16:46:54
> {code}
> -> without singlenode flag
> {code}
> ./bin/systemml scripts/staging/tSNE.dml -stats -nvargs 
> X=/home/iyounus/workspace/tsne_python/mnist2500_X.txt Y=Y_out.txt C=C_out.txt
> 16/08/01 16:52:59 INFO api.DMLScript: SystemML Statistics:
> Total elapsed time:           127.290 sec.
> Total compilation time:               0.396 sec.
> Total execution time:         126.894 sec.
> Number of compiled MR Jobs:   1.
> Number of executed MR Jobs:   0.
> Cache hits (Mem, WB, FS, HDFS):       223693/0/0/1.
> Cache writes (WB, FS, HDFS):  80352/0/2.
> Cache times (ACQr/m, RLS, EXP):       0.421/0.016/100.974/0.041 sec.
> HOP DAGs recompiled (PRED, SB):       0/0.
> HOP DAGs recompile time:      0.009 sec.
> Functions recompiled:         1.
> Functions recompile time:     0.038 sec.
> Total JIT compile time:               4.835 sec.
> Total JVM GC count:           312.
> Total JVM GC time:            1.226 sec.
> Heavy hitter instructions (name, time, count):
> -- 1)         tsne    126.426 sec     1
> -- 2)         x2p     126.412 sec     1
> -- 3)         leftIndex       123.982 sec     32136
> -- 4)         exp     0.427 sec       8034
> -- 5)         MR-Job_CSV_REBLOCK      0.412 sec       1
> -- 6)         tsmm    0.308 sec       1
> -- 7)         rangeReIndex    0.242 sec       40170
> -- 8)         /       0.208 sec       24103
> -- 9)         +       0.172 sec       22840
> -- 10)        *       0.151 sec       16069
> 16/08/01 16:52:59 INFO api.DMLScript: END DML run 08/01/2016 16:52:59
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
