[ https://issues.apache.org/jira/browse/SYSTEMML-1336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15881916#comment-15881916 ]

Matthias Boehm edited comment on SYSTEMML-1336 at 2/24/17 4:32 AM:
-------------------------------------------------------------------

With an improved parfor optimizer that now also considers the what-if scenario 
of conditional partitioning, the runtime improved substantially; see the sketch 
and updated stats below.
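
For context, the affected pattern is a columnwise parfor over a large input, 
where each iteration right-indexes a single column. A minimal DML sketch 
(hypothetical illustration, loosely modeled on the perftest Univariate-stats 
scenario, not its actual script):

{code}
# hypothetical columnwise parfor pattern (illustration only)
X = read($X);                          # large dense input, e.g., the 8GB scenario
R = matrix(0, rows=1, cols=ncol(X));
parfor (i in 1:ncol(X)) {
  Xi = X[, i];                         # right indexing; candidate for column-wise partitioning
  R[1, i] = sum(Xi ^ 2);               # placeholder per-column statistic
}
write(R, $O);
{code}

With conditional partitioning taken into account during execution type 
selection, such a loop can run as a remote parfor over column partitions 
instead of falling back to a local parfor with k=1.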

{code}
Total elapsed time:             110.139 sec.
Total compilation time:         2.600 sec.
Total execution time:           107.539 sec.
Number of compiled Spark inst:  1.
Number of executed Spark inst:  1.
Cache hits (Mem, WB, FS, HDFS): 4/0/0/122.
Cache writes (WB, FS, HDFS):    3/0/2.
Cache times (ACQr/m, RLS, EXP): 10.512/0.001/0.003/0.357 sec.
HOP DAGs recompiled (PRED, SB): 0/0.
HOP DAGs recompile time:        0.000 sec.
Spark ctx create time (lazy):   33.529 sec.
Spark trans counts (par,bc,col):0/0/0.
Spark trans times (par,bc,col): 0.000/0.000/0.000 secs.
ParFor loops optimized:         1.
ParFor optimize time:           2.070 sec.
ParFor initialize time:         0.000 sec.
ParFor result merge time:       0.863 sec.
ParFor total update in-place:   0/0/0
Total JIT compile time:         30.467 sec.
Total JVM GC count:             7.
Total JVM GC time:              4.105 sec.
Heavy hitter instructions (name, time, count):
-- 1)   ParFor-DPESP    93.557 sec      1
-- 2)   uacmax  10.189 sec      1
-- 3)   write   0.119 sec       1
-- 4)   >       0.073 sec       1
-- 5)   rand    0.005 sec       1
-- 6)   *       0.002 sec       1
-- 7)   rmvar   0.000 sec       8
-- 8)   createvar       0.000 sec       6
-- 9)   uamax   0.000 sec       1
-- 10)  assignvar       0.000 sec       5
{code}


was (Author: mboehm7):
With an improved parfor optimizer that now also considers the what-if scenario 
of conditional partitioning, the runtime improved substantially.

{code}
Total elapsed time:             125.192 sec.
Total compilation time:         2.085 sec.
Total execution time:           123.106 sec.
Number of compiled Spark inst:  1.
Number of executed Spark inst:  1.
Cache hits (Mem, WB, FS, HDFS): 4/0/0/122.
Cache writes (WB, FS, HDFS):    3/0/2.
Cache times (ACQr/m, RLS, EXP): 11.163/0.001/0.004/0.369 sec.
HOP DAGs recompiled (PRED, SB): 0/0.
HOP DAGs recompile time:        0.000 sec.
Spark ctx create time (lazy):   31.174 sec.
Spark trans counts (par,bc,col):0/0/0.
Spark trans times (par,bc,col): 0.000/0.000/0.000 secs.
ParFor loops optimized:         1.
ParFor optimize time:           2.065 sec.
ParFor initialize time:         0.000 sec.
ParFor result merge time:       0.844 sec.
ParFor total update in-place:   0/0/0
Total JIT compile time:         23.855 sec.
Total JVM GC count:             7.
Total JVM GC time:              4.279 sec.
Heavy hitter instructions (name, time, count):
-- 1)   ParFor-DPESP    108.240 sec     1
-- 2)   uacmax  10.992 sec      1
-- 3)   write   0.115 sec       1
-- 4)   >       0.087 sec       1
-- 5)   rand    0.005 sec       1
-- 6)   *       0.002 sec       1
-- 7)   rmvar   0.000 sec       8
-- 8)   createvar       0.000 sec       6
-- 9)   uamax   0.000 sec       1
-- 10)  assignvar       0.000 sec       5
{code}

> Improve parfor exec type selection (w/ potential data partitioning)
> -------------------------------------------------------------------
>
>                 Key: SYSTEMML-1336
>                 URL: https://issues.apache.org/jira/browse/SYSTEMML-1336
>             Project: SystemML
>          Issue Type: Sub-task
>          Components: Compiler
>            Reporter: Matthias Boehm
>             Fix For: SystemML 1.0
>
>
> This task aims to address suboptimal parfor optimizer choices for 
> partitionable scenarios with large driver memory. Currently, we apply 
> partitioning only if the right indexing operation does not fit into the 
> memory of the driver or remote tasks. The execution type selection is then 
> unaware of potential partitioning and does not revisit this decision. This 
> is problematic because the large, unpartitioned input likely exceeds the 
> memory budget of remote tasks, ultimately causing the optimizer to fall back 
> to a local parfor with a very small degree of parallelism k.
> On our perftest 8GB Univariate stats scenario (with a 20GB driver, i.e., a 
> 14GB memory budget), this led to a local parfor with k=1 and thus 
> unnecessarily high execution time.
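> To make the k=1 fallback concrete, here is a simplified cost view (an 
> assumed illustration, not the optimizer's exact formula), using the numbers 
> above:
> {code}
> driver memory budget:  14GB (~70% of the 20GB heap)
> per-worker estimate:   ~8GB (full unpartitioned input per indexing operation)
> local parallelism:     k = floor(14GB / 8GB) = 1
> {code}
>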
> {code}
> Total elapsed time:           781.233 sec.
> Total compilation time:               2.059 sec.
> Total execution time:         779.175 sec.
> Number of compiled Spark inst:        0.
> Number of executed Spark inst:        0.
> Cache hits (Mem, WB, FS, HDFS):       27904/0/0/2.
> Cache writes (WB, FS, HDFS):  3134/0/1.
> Cache times (ACQr/m, RLS, EXP):       9.200/0.022/0.301/0.300 sec.
> HOP DAGs recompiled (PRED, SB):       0/100.
> HOP DAGs recompile time:      0.238 sec.
> Spark ctx create time (lazy): 0.000 sec.
> Spark trans counts (par,bc,col):0/0/0.
> Spark trans times (par,bc,col):       0.000/0.000/0.000 secs.
> ParFor loops optimized:               1.
> ParFor optimize time:         1.985 sec.
> ParFor initialize time:               0.007 sec.
> ParFor result merge time:     0.003 sec.
> ParFor total update in-place: 0/0/13900
> Total JIT compile time:               13.542 sec.
> Total JVM GC count:           29.
> Total JVM GC time:            3.49 sec.
> Heavy hitter instructions (name, time, count):
> -- 1)         cm      479.000 sec     2700
> -- 2)         qsort   228.928 sec     900
> -- 3)         qpick   20.598 sec      1800
> -- 4)         rangeReIndex    16.051 sec      2999
> -- 5)         uamean  12.867 sec      900
> -- 6)         uacmax  9.870 sec       1
> -- 7)         ctable  3.158 sec       100
> -- 8)         uamin   2.589 sec       1000
> -- 9)         uamax   2.560 sec       1101
> -- 10)        write   0.300 sec       1
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
