[ 
https://issues.apache.org/jira/browse/SYSTEMML-843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15418035#comment-15418035
 ] 

Imran Younus commented on SYSTEMML-843:
---------------------------------------

[~mboehm7] You're right. This is not very good for large numbers of data 
points. My initial intention was to use this for genome data sets, where the 
number of data points is quite small (e.g. 1000 genome data) but each data 
vector can be very large.
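
To illustrate the trade-off, here is a minimal NumPy sketch (not the DML script 
attached to this issue) of the pairwise squared-distance step that exact t-SNE 
starts from. The result is always n x n, so memory and work grow with the 
square of the number of points, while the vector length d only enters through 
one matrix product. The function names and the 50 x 2000 example shape are 
assumptions chosen for illustration.

```python
import numpy as np

def pairwise_sq_dists(X):
    """Squared Euclidean distances between all rows of X: (n x d) -> (n x n)."""
    sq = np.sum(X * X, axis=1)                       # squared row norms, length n
    D = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)  # ||a||^2 + ||b||^2 - 2 a.b
    np.maximum(D, 0.0, out=D)                        # clamp round-off negatives
    return D

def dense_matrix_bytes(n, itemsize=8):
    """Bytes needed for a dense n x n matrix of 8-byte floats."""
    return n * n * itemsize

# Hypothetical "few points, long vectors" shape, standing in for genome data.
rng = np.random.default_rng(0)
X = rng.standard_normal((50, 2000))
D = pairwise_sq_dists(X)
```

With this sketch, the 2500-point MNIST subset needs 2500 * 2500 * 8 bytes 
(about 50 MB) per dense n x n matrix, and that grows quadratically with n no 
matter how long each vector is.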

I do plan to implement the tree version, which requires kNN. That's why I was 
asking about kNN some time ago. There is also a Barnes-Hut version, and as far 
as I understand, one can parallelize/distribute the Barnes-Hut algorithm. I'm 
still in the process of understanding these.



> leftIndex and cache release extremely slow
> ------------------------------------------
>
>                 Key: SYSTEMML-843
>                 URL: https://issues.apache.org/jira/browse/SYSTEMML-843
>             Project: SystemML
>          Issue Type: Bug
>            Reporter: Imran Younus
>         Attachments: tSNT.tar.gz
>
>
> I'm running the tSNE script in standalone mode with a subset of MNIST data 
> (2500 points). I ran this with and without `-exec singlenode`. Here are the 
> stats:
> (BTW, the same function implemented in python takes less than 10 sec!)
> -> with singlenode flag
> {code}
> ./bin/systemml scripts/staging/tSNE.dml -stats -nvargs 
> X=/home/iyounus/workspace/tsne_python/mnist2500_X.txt Y=Y_out.txt C=C_out.txt
> 16/08/01 16:46:54 INFO api.DMLScript: SystemML Statistics:
> Total elapsed time:           109.667 sec.
> Total compilation time:               0.407 sec.
> Total execution time:         109.260 sec.
> Number of compiled MR Jobs:   0.
> Number of executed MR Jobs:   0.
> Cache hits (Mem, WB, FS, HDFS):       223692/0/0/1.
> Cache writes (WB, FS, HDFS):  80351/0/2.
> Cache times (ACQr/m, RLS, EXP):       0.289/0.015/85.192/0.043 sec.
> HOP DAGs recompiled (PRED, SB):       0/0.
> HOP DAGs recompile time:      0.007 sec.
> Functions recompiled:         1.
> Functions recompile time:     0.039 sec.
> Total JIT compile time:               4.924 sec.
> Total JVM GC count:           312.
> Total JVM GC time:            1.12 sec.
> Heavy hitter instructions (name, time, count):
> -- 1)         tsne    109.202 sec     1
> -- 2)         x2p     109.189 sec     1
> -- 3)         leftIndex       106.728 sec     32136
> -- 4)         tsmm    0.564 sec       1
> -- 5)         exp     0.376 sec       8034
> -- 6)         rangeReIndex    0.201 sec       40170
> -- 7)         /       0.183 sec       24103
> -- 8)         *       0.161 sec       16069
> -- 9)         +       0.144 sec       22840
> -- 10)        uak+    0.106 sec       8036
> 16/08/01 16:46:54 INFO api.DMLScript: END DML run 08/01/2016 16:46:54
> {code}
> -> without singlenode flag
> {code}
> > ./bin/systemml scripts/staging/tSNE.dml -stats -nvargs 
> > X=/home/iyounus/workspace/tsne_python/mnist2500_X.txt Y=Y_out.txt 
> > C=C_out.txt
> 16/08/01 16:52:59 INFO api.DMLScript: SystemML Statistics:
> Total elapsed time:           127.290 sec.
> Total compilation time:               0.396 sec.
> Total execution time:         126.894 sec.
> Number of compiled MR Jobs:   1.
> Number of executed MR Jobs:   0.
> Cache hits (Mem, WB, FS, HDFS):       223693/0/0/1.
> Cache writes (WB, FS, HDFS):  80352/0/2.
> Cache times (ACQr/m, RLS, EXP):       0.421/0.016/100.974/0.041 sec.
> HOP DAGs recompiled (PRED, SB):       0/0.
> HOP DAGs recompile time:      0.009 sec.
> Functions recompiled:         1.
> Functions recompile time:     0.038 sec.
> Total JIT compile time:               4.835 sec.
> Total JVM GC count:           312.
> Total JVM GC time:            1.226 sec.
> Heavy hitter instructions (name, time, count):
> -- 1)         tsne    126.426 sec     1
> -- 2)         x2p     126.412 sec     1
> -- 3)         leftIndex       123.982 sec     32136
> -- 4)         exp     0.427 sec       8034
> -- 5)         MR-Job_CSV_REBLOCK      0.412 sec       1
> -- 6)         tsmm    0.308 sec       1
> -- 7)         rangeReIndex    0.242 sec       40170
> -- 8)         /       0.208 sec       24103
> -- 9)         +       0.172 sec       22840
> -- 10)        *       0.151 sec       16069
> 16/08/01 16:52:59 INFO api.DMLScript: END DML run 08/01/2016 16:52:59
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
