[jira] [Updated] (SPARK-18218) Optimize BlockMatrix multiplication, which may cause OOM and low parallelism usage problem in several cases

Burak Yavuz (JIRA) Thu, 26 Jan 2017 20:12:06 -0800

     [ 
https://issues.apache.org/jira/browse/SPARK-18218?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Burak Yavuz updated SPARK-18218:
--------------------------------
    Assignee: Weichen Xu

> Optimize BlockMatrix multiplication, which may cause OOM and low parallelism 
> usage problem in several cases
> -----------------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-18218
>                 URL: https://issues.apache.org/jira/browse/SPARK-18218
>             Project: Spark
>          Issue Type: Improvement
>          Components: ML, MLlib
>    Affects Versions: 2.1.0
>            Reporter: Weichen Xu
>            Assignee: Weichen Xu
>             Fix For: 2.2.0
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> After I take a deep look into `BlockMatrix.multiply` implementation, I found 
> that current implementation may cause some problem in special cases.
> Now let me use an extreme case to represent it:
> Suppose we have two blockMatrix A and B
> A has 10000 blocks, numRowBlocks = 1,  numColBlocks = 10000
> B also has 10000 blocks, numRowBlocks = 10000,  numColBlocks = 1
> Now if we call A.mulitiply(B), no matter how A and B is partitioned,
> the resultPartitioner will always contains only one partition,
> this muliplication implementation will shuffle 10000 * 10000 blocks into one 
> reducer, this will cause the parallism became 1, 
> what's worse, because `RDD.cogroup` will load the total group element into 
> memory, now at reducer-side, 10000 * 10000 blocks will be loaded into memory, 
> because they are all shuffled into the same group. It will easily cause 
> executor OOM.
> The above case is a little extreme, but other case, such as M*N dimensions 
> matrix A multiply N*P dimensions matrix B, when N is much larger than M and 
> P, we met the similar problem.
> The multiplication implementation do not handle the task partition properly, 
> it will cause:
> 1. when the middle dimension N is too large, it will cause reducer OOM.
> 2. even if OOM do not occur, it will still cause parallism too low.
> 3. when N is much large than M and P, and matrix A and B have many 
> partitions, it will cause too many partition on M and P dimension, it will 
> cause much larger shuffled data size.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org

[jira] [Updated] (SPARK-18218) Optimize BlockMatrix multiplication, which may cause OOM and low parallelism usage problem in several cases

Reply via email to