[ https://issues.apache.org/jira/browse/SYSTEMML-1336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15881916#comment-15881916 ]
Matthias Boehm edited comment on SYSTEMML-1336 at 2/24/17 4:32 AM:
-------------------------------------------------------------------

With an improved parfor optimizer that now also considers the what-if scenario of conditional partitioning, the runtime improved substantially.
{code}
Total elapsed time: 110.139 sec.
Total compilation time: 2.600 sec.
Total execution time: 107.539 sec.
Number of compiled Spark inst: 1.
Number of executed Spark inst: 1.
Cache hits (Mem, WB, FS, HDFS): 4/0/0/122.
Cache writes (WB, FS, HDFS): 3/0/2.
Cache times (ACQr/m, RLS, EXP): 10.512/0.001/0.003/0.357 sec.
HOP DAGs recompiled (PRED, SB): 0/0.
HOP DAGs recompile time: 0.000 sec.
Spark ctx create time (lazy): 33.529 sec.
Spark trans counts (par,bc,col): 0/0/0.
Spark trans times (par,bc,col): 0.000/0.000/0.000 secs.
ParFor loops optimized: 1.
ParFor optimize time: 2.070 sec.
ParFor initialize time: 0.000 sec.
ParFor result merge time: 0.863 sec.
ParFor total update in-place: 0/0/0
Total JIT compile time: 30.467 sec.
Total JVM GC count: 7.
Total JVM GC time: 4.105 sec.
Heavy hitter instructions (name, time, count):
-- 1) ParFor-DPESP 93.557 sec 1
-- 2) uacmax 10.189 sec 1
-- 3) write 0.119 sec 1
-- 4) > 0.073 sec 1
-- 5) rand 0.005 sec 1
-- 6) * 0.002 sec 1
-- 7) rmvar 0.000 sec 8
-- 8) createvar 0.000 sec 6
-- 9) uamax 0.000 sec 1
-- 10) assignvar 0.000 sec 5
{code}

was (Author: mboehm7):
With an improved parfor optimizer that now also considers the what-if scenario of conditional partitioning, the runtime improved substantially.
{code}
Total elapsed time: 125.192 sec.
Total compilation time: 2.085 sec.
Total execution time: 123.106 sec.
Number of compiled Spark inst: 1.
Number of executed Spark inst: 1.
Cache hits (Mem, WB, FS, HDFS): 4/0/0/122.
Cache writes (WB, FS, HDFS): 3/0/2.
Cache times (ACQr/m, RLS, EXP): 11.163/0.001/0.004/0.369 sec.
HOP DAGs recompiled (PRED, SB): 0/0.
HOP DAGs recompile time: 0.000 sec.
Spark ctx create time (lazy): 31.174 sec.
Spark trans counts (par,bc,col): 0/0/0.
Spark trans times (par,bc,col): 0.000/0.000/0.000 secs.
ParFor loops optimized: 1.
ParFor optimize time: 2.065 sec.
ParFor initialize time: 0.000 sec.
ParFor result merge time: 0.844 sec.
ParFor total update in-place: 0/0/0
Total JIT compile time: 23.855 sec.
Total JVM GC count: 7.
Total JVM GC time: 4.279 sec.
Heavy hitter instructions (name, time, count):
-- 1) ParFor-DPESP 108.240 sec 1
-- 2) uacmax 10.992 sec 1
-- 3) write 0.115 sec 1
-- 4) > 0.087 sec 1
-- 5) rand 0.005 sec 1
-- 6) * 0.002 sec 1
-- 7) rmvar 0.000 sec 8
-- 8) createvar 0.000 sec 6
-- 9) uamax 0.000 sec 1
-- 10) assignvar 0.000 sec 5
{code}

> Improve parfor exec type selection (w/ potential data partitioning)
> -------------------------------------------------------------------
>
>          Key: SYSTEMML-1336
>          URL: https://issues.apache.org/jira/browse/SYSTEMML-1336
>      Project: SystemML
>   Issue Type: Sub-task
>   Components: Compiler
>     Reporter: Matthias Boehm
>      Fix For: SystemML 1.0
>
>
> This task aims to address suboptimal parfor optimizer choices for partitionable scenarios with large driver memory. Currently, we only apply partitioning if the right indexing operation does not fit into the memory of the driver or remote tasks. The execution type selection is then unaware of potential partitioning and does not revert this decision - this is problematic because the large input likely exceeds the memory budget of remote tasks, ultimately causing the optimizer to fall back to a local parfor with a very small degree of parallelism k.
> On our perftest 8GB Univariate stats scenario (with a 20GB driver, i.e., 14GB memory budget), this led to a local parfor with k=1 and thus, unnecessarily high execution time.
> {code}
> Total elapsed time: 781.233 sec.
> Total compilation time: 2.059 sec.
> Total execution time: 779.175 sec.
> Number of compiled Spark inst: 0.
> Number of executed Spark inst: 0.
> Cache hits (Mem, WB, FS, HDFS): 27904/0/0/2.
> Cache writes (WB, FS, HDFS): 3134/0/1.
> Cache times (ACQr/m, RLS, EXP): 9.200/0.022/0.301/0.300 sec.
> HOP DAGs recompiled (PRED, SB): 0/100.
> HOP DAGs recompile time: 0.238 sec.
> Spark ctx create time (lazy): 0.000 sec.
> Spark trans counts (par,bc,col): 0/0/0.
> Spark trans times (par,bc,col): 0.000/0.000/0.000 secs.
> ParFor loops optimized: 1.
> ParFor optimize time: 1.985 sec.
> ParFor initialize time: 0.007 sec.
> ParFor result merge time: 0.003 sec.
> ParFor total update in-place: 0/0/13900
> Total JIT compile time: 13.542 sec.
> Total JVM GC count: 29.
> Total JVM GC time: 3.49 sec.
> Heavy hitter instructions (name, time, count):
> -- 1) cm 479.000 sec 2700
> -- 2) qsort 228.928 sec 900
> -- 3) qpick 20.598 sec 1800
> -- 4) rangeReIndex 16.051 sec 2999
> -- 5) uamean 12.867 sec 900
> -- 6) uacmax 9.870 sec 1
> -- 7) ctable 3.158 sec 100
> -- 8) uamin 2.589 sec 1000
> -- 9) uamax 2.560 sec 1101
> -- 10) write 0.300 sec 1
> {code}

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
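For context, the decision logic described in the issue can be sketched roughly as follows. This is a minimal, hypothetical illustration in Java - the names (`chooseExecTypeOld`, `chooseExecTypeNew`, `rixMem`, budget parameters) are not SystemML's actual optimizer API, and the memory values are illustrative only:

```java
public class ParForExecTypeSketch {
    enum ExecType { LOCAL, REMOTE_SPARK }

    // Old heuristic: exec type selection looks only at the unpartitioned
    // memory estimate of the right indexing operation. A large input that
    // exceeds the remote task budget forces a local parfor (small k).
    static ExecType chooseExecTypeOld(double rixMem, double remoteBudget) {
        return rixMem <= remoteBudget ? ExecType.REMOTE_SPARK : ExecType.LOCAL;
    }

    // Improved heuristic: additionally evaluate the what-if scenario of
    // conditional partitioning - if a partitioned slice (e.g., one column
    // per task) would fit the remote budget, keep the remote exec type
    // and apply partitioning instead of falling back to a local parfor.
    static ExecType chooseExecTypeNew(double rixMem, double remoteBudget,
                                      double partitionedSliceMem) {
        if (rixMem <= remoteBudget || partitionedSliceMem <= remoteBudget)
            return ExecType.REMOTE_SPARK;
        return ExecType.LOCAL;
    }

    public static void main(String[] args) {
        double rixMem = 8e9;        // ~8GB input, as in the perftest scenario
        double remoteBudget = 2e9;  // illustrative remote task memory budget
        double sliceMem = 80e6;     // illustrative per-column partition size
        System.out.println(chooseExecTypeOld(rixMem, remoteBudget));
        System.out.println(chooseExecTypeNew(rixMem, remoteBudget, sliceMem));
    }
}
```

Under these (made-up) numbers, the old heuristic picks LOCAL while the what-if check picks REMOTE_SPARK with partitioning, which mirrors the ParFor-DPESP heavy hitter in the improved runs above.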