[ https://issues.apache.org/jira/browse/SYSTEMML-455?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Glenn Weidner updated SYSTEMML-455:
-----------------------------------
    Fix Version/s:     (was: SystemML 1.0)
                   SystemML 0.14

> OOM CP transpose in Spark hybrid mode 
> --------------------------------------
>
>                 Key: SYSTEMML-455
>                 URL: https://issues.apache.org/jira/browse/SYSTEMML-455
>             Project: SystemML
>          Issue Type: Bug
>          Components: Compiler
>            Reporter: Matthias Boehm
>            Assignee: Matthias Boehm
>             Fix For: SystemML 0.14
>
>
> The following data generation script failed with OOM in hybrid_spark 
> execution mode (config: 20GB driver memory), whereas the same script ran 
> fine with the same memory budget in hybrid_mr execution mode.
> {code}
> n = 30000;
> B = Rand (rows = n, cols = n, min = -1, max = 1, pdf = "uniform", seed = 1234);
> v = exp (Rand (rows = n, cols = 1, min = -3, max = 3, pdf = "uniform", seed = 5678));
> A = t(B) %*% (B * v);
> write(A, "./tmp/A", format="binary");
> {code}
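The hop memory estimates below follow directly from the dense matrix sizes. As a back-of-envelope check (an illustrative sketch, not SystemML's actual estimator), a dense matrix of doubles occupies rows * cols * 8 bytes, and a CP transpose materializes input and output blocks simultaneously:

```python
# Back-of-envelope check of the hop memory estimates (illustrative sketch,
# not SystemML's actual estimator): dense doubles take rows * cols * 8 bytes.
def dense_mb(rows, cols):
    return rows * cols * 8 / (1 << 20)

b_mb = dense_mb(30000, 30000)   # one 30000 x 30000 dense matrix
transpose_mb = 2 * b_mb         # CP transpose holds input and output at once
print(round(b_mb))              # -> 6866, matching the dg(rand) estimate below
print(round(transpose_mb))      # -> 13733, matching the r(t) estimate below
```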
> The resulting hop explain output is as follows:
> {code}
> # Memory Budget local/remote = 13739MB/184320MB/8602MB
> # Degree of Parallelism (vcores) local/remote = 16/120
> PROGRAM
> --MAIN PROGRAM
> ----GENERIC (lines 4-12) [recompile=true]
> ------(10) dg(rand) [30000,30000,1000,1000,900000000] [0,0,6866 -> 6866MB], CP
> ------(21) r(t) (10) [30000,30000,1000,1000,900000000] [6866,0,6866 -> 13733MB], CP
> ------(19) dg(rand) [30000,1,1000,1000,30000] [0,0,0 -> 0MB], CP
> ------(20) u(exp) (19) [30000,1,1000,1000,-1] [0,0,0 -> 0MB], CP
> ------(22) b(*) (10,20) [30000,30000,1000,1000,-1] [6867,0,6866 -> 13733MB], CP
> ------(23) ba(+*) (21,22) [30000,30000,1000,1000,-1] [13733,6866,6866 -> 27466MB], SPARK
> ------(28) PWrite A (23) [30000,30000,1000,1000,-1] [6866,0,0 -> 6866MB], CP
> {code}
> The script fails at CP transpose with
> {code}
> Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
>         at org.apache.sysml.runtime.matrix.data.MatrixBlock.allocateDenseBlock(MatrixBlock.java:414)
>         at org.apache.sysml.runtime.matrix.data.LibMatrixReorg.transposeDenseToDense(LibMatrixReorg.java:752)
>         at org.apache.sysml.runtime.matrix.data.LibMatrixReorg.transpose(LibMatrixReorg.java:136)
>         at org.apache.sysml.runtime.matrix.data.LibMatrixReorg.reorg(LibMatrixReorg.java:105)
>         at org.apache.sysml.runtime.matrix.data.MatrixBlock.reorgOperations(MatrixBlock.java:3458)
>         at org.apache.sysml.runtime.instructions.cp.ReorgCPInstruction.processInstruction(ReorgCPInstruction.java:129)
> {code}
> It's noteworthy that the failing CP instruction requires 13733MB against a 
> memory budget of 13739MB. The current guess is that Spark itself incurs 
> substantial memory overhead, which eventually leads to the OOM; we should 
> adjust our memory budget in Spark execution modes to account for this 
> overhead.
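The proposed adjustment could be sketched as follows (a minimal illustration of the idea only; the overhead fraction and function name are assumptions, not SystemML's actual constants or API):

```python
# Hypothetical sketch of the proposed fix: in Spark execution modes, shrink
# the CP memory budget by a reserve fraction for Spark's own overhead.
# SPARK_OVERHEAD_FRACTION is an assumed value, not SystemML's constant.
SPARK_OVERHEAD_FRACTION = 0.1

def adjusted_budget_mb(heap_budget_mb, spark_mode):
    # In Spark modes, only a fraction of the heap is treated as CP budget.
    if spark_mode:
        return heap_budget_mb * (1 - SPARK_OVERHEAD_FRACTION)
    return heap_budget_mb

# With the reported 13739MB budget, the 13733MB transpose would then exceed
# the adjusted budget and no longer be compiled to a CP instruction:
print(adjusted_budget_mb(13739, True) < 13733)  # -> True
```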



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
