[ https://issues.apache.org/jira/browse/SYSTEMML-843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15418035#comment-15418035 ]
Imran Younus commented on SYSTEMML-843:
---------------------------------------

[~mboehm7] You're right. This is not very good for a large number of data points. My initial intention was to use this for genome data sets, where the number of data points is quite small (e.g. 1000 genome data) but each data vector can be very large. I do plan to implement the tree version, which requires kNN. That's why I was asking about kNN some time ago. There is also a Barnes-Hut version and, as far as I understand, one can parallelize/distribute the Barnes-Hut algorithm. I'm still in the process of understanding these.

> leftIndex and cache release extremely slow
> ------------------------------------------
>
> Key: SYSTEMML-843
> URL: https://issues.apache.org/jira/browse/SYSTEMML-843
> Project: SystemML
> Issue Type: Bug
> Reporter: Imran Younus
> Attachments: tSNT.tar.gz
>
>
> I'm running the tSNE script in standalone mode with a subset of MNIST data
> (2500 points). I ran this with and without `-exec singlenode`. Here are the
> stats:
> (BTW, the same function implemented in python takes less than 10 sec!)
> -> with singlenode flag
> {code}
> ./bin/systemml scripts/staging/tSNE.dml -stats -nvargs X=/home/iyounus/workspace/tsne_python/mnist2500_X.txt Y=Y_out.txt C=C_out.txt
> 16/08/01 16:46:54 INFO api.DMLScript: SystemML Statistics:
> Total elapsed time: 109.667 sec.
> Total compilation time: 0.407 sec.
> Total execution time: 109.260 sec.
> Number of compiled MR Jobs: 0.
> Number of executed MR Jobs: 0.
> Cache hits (Mem, WB, FS, HDFS): 223692/0/0/1.
> Cache writes (WB, FS, HDFS): 80351/0/2.
> Cache times (ACQr/m, RLS, EXP): 0.289/0.015/85.192/0.043 sec.
> HOP DAGs recompiled (PRED, SB): 0/0.
> HOP DAGs recompile time: 0.007 sec.
> Functions recompiled: 1.
> Functions recompile time: 0.039 sec.
> Total JIT compile time: 4.924 sec.
> Total JVM GC count: 312.
> Total JVM GC time: 1.12 sec.
> Heavy hitter instructions (name, time, count):
> -- 1) tsne 109.202 sec 1
> -- 2) x2p 109.189 sec 1
> -- 3) leftIndex 106.728 sec 32136
> -- 4) tsmm 0.564 sec 1
> -- 5) exp 0.376 sec 8034
> -- 6) rangeReIndex 0.201 sec 40170
> -- 7) / 0.183 sec 24103
> -- 8) * 0.161 sec 16069
> -- 9) + 0.144 sec 22840
> -- 10) uak+ 0.106 sec 8036
> 16/08/01 16:46:54 INFO api.DMLScript: END DML run 08/01/2016 16:46:54
> {code}
> -> without singlenode flag
> {code}
> ./bin/systemml scripts/staging/tSNE.dml -stats -nvargs X=/home/iyounus/workspace/tsne_python/mnist2500_X.txt Y=Y_out.txt C=C_out.txt
> 16/08/01 16:52:59 INFO api.DMLScript: SystemML Statistics:
> Total elapsed time: 127.290 sec.
> Total compilation time: 0.396 sec.
> Total execution time: 126.894 sec.
> Number of compiled MR Jobs: 1.
> Number of executed MR Jobs: 0.
> Cache hits (Mem, WB, FS, HDFS): 223693/0/0/1.
> Cache writes (WB, FS, HDFS): 80352/0/2.
> Cache times (ACQr/m, RLS, EXP): 0.421/0.016/100.974/0.041 sec.
> HOP DAGs recompiled (PRED, SB): 0/0.
> HOP DAGs recompile time: 0.009 sec.
> Functions recompiled: 1.
> Functions recompile time: 0.038 sec.
> Total JIT compile time: 4.835 sec.
> Total JVM GC count: 312.
> Total JVM GC time: 1.226 sec.
> Heavy hitter instructions (name, time, count):
> -- 1) tsne 126.426 sec 1
> -- 2) x2p 126.412 sec 1
> -- 3) leftIndex 123.982 sec 32136
> -- 4) exp 0.427 sec 8034
> -- 5) MR-Job_CSV_REBLOCK 0.412 sec 1
> -- 6) tsmm 0.308 sec 1
> -- 7) rangeReIndex 0.242 sec 40170
> -- 8) / 0.208 sec 24103
> -- 9) + 0.172 sec 22840
> -- 10) * 0.151 sec 16069
> 16/08/01 16:52:59 INFO api.DMLScript: END DML run 08/01/2016 16:52:59
> {code}

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)