[ 
https://issues.apache.org/jira/browse/SPARK-21118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16051641#comment-16051641
 ] 

Lorenz Bühmann commented on SPARK-21118:
----------------------------------------

First of all, please use a subject line without typos. I mean "handred" 
and "mitrx multply"? Come on - how can others search for similar problems?!

Secondly, you're using `collect()` for both matrices. That more or less 
defeats the purpose of Spark, since it pulls everything into the driver's 
memory. Of course, for large data this will lead to an OOM. I guess you should 
read more about the principles of Spark.
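
To get a feeling for the numbers, here is a rough back-of-envelope sketch in plain Scala (no Spark needed). The 1024 x 1024 block size and the "every block is present" case are my assumptions, not something stated in the issue, and the object and method names are made up for illustration. It only counts 8 raw bytes per (Int, Int) block key and ignores JVM object overhead, so the real footprint on the driver would be larger.

```scala
// Back-of-envelope estimate only; all names here are hypothetical helpers.
object BlockKeyEstimate {
  // Number of blocks along one dimension (ceiling division).
  def blocksPerDim(dim: Long, perBlock: Long): Long =
    (dim + perBlock - 1) / perBlock

  // Lower bound in bytes for holding every (Int, Int) block key on the driver,
  // assuming all blocks are present and ignoring JVM object overhead.
  def keyBytes(rows: Long, cols: Long, rowsPerBlock: Long, colsPerBlock: Long): Long =
    blocksPerDim(rows, rowsPerBlock) * blocksPerDim(cols, colsPerBlock) * 8L

  def main(args: Array[String]): Unit = {
    val n = 200000000L   // 200 million rows and columns, as in the issue
    val perBlock = 1024L // assumed block size, not taken from the issue
    val perDim = blocksPerDim(n, perBlock)
    val bytes = keyBytes(n, n, perBlock, perBlock)
    println(s"blocks per dimension: $perDim")
    println(s"block keys if fully populated: ${perDim * perDim}")
    println(f"raw key bytes: ${bytes / math.pow(1024, 3)}%.1f GiB")
  }
}
```

Under these assumptions the keys of a single matrix alone come to hundreds of GB, far beyond a 30GB node. A sparse matrix with few populated blocks would of course look very different.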

> OOM with 2 handred million vertex when mitrx multply
> ----------------------------------------------------
>
>                 Key: SPARK-21118
>                 URL: https://issues.apache.org/jira/browse/SPARK-21118
>             Project: Spark
>          Issue Type: Bug
>          Components: MLlib
>    Affects Versions: 2.1.0
>         Environment: YARN cluster, 19 nodes, 30GB per node
>            Reporter: tao
>
> I have 2 matrices, each one 200 million * 200 million.
> I want to multiply them, but the job runs out of memory (OOM).
> I finally found that the OOM occurs in BlockMatrix.simulateMultiply. There is a 
> collect action in this method. 
> The collect returns the whole dataset, which is too large for the driver, so the 
> driver goes OOM.
> class BlockMatrix @Since("1.3.0") (
> private[distributed] def simulateMultiply(
>       other: BlockMatrix,
>       partitioner: GridPartitioner): (BlockDestinations, BlockDestinations) = 
> {
>     val leftMatrix = blockInfo.keys.collect() // blockInfo should already be cached
>     val rightMatrix = other.blocks.keys.collect()
> ......



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

