[ https://issues.apache.org/jira/browse/SPARK-21118?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Sean Owen resolved SPARK-21118.
-------------------------------
    Resolution: Invalid

[~icesxrun] do not reopen this. This doesn't describe a problem with Spark, but a question about usage.

> OOM with 2 handred million vertex when mitrx multply
> ----------------------------------------------------
>
>                 Key: SPARK-21118
>                 URL: https://issues.apache.org/jira/browse/SPARK-21118
>             Project: Spark
>          Issue Type: Bug
>          Components: MLlib
>    Affects Versions: 2.1.0
>         Environment: YARN cluster, 19 nodes, 30 GB per node
>            Reporter: tao
>
> I have two matrices, each 200 million x 200 million.
> I want to multiply them, but the job runs out of memory (OOM).
> I eventually found that the OOM occurs in BlockMatrix.simulateMultiply,
> which contains a collect action.
> The collect returns a result that is too large for the driver, so the
> driver runs out of memory.
> class BlockMatrix @Since("1.3.0") (
>   private[distributed] def simulateMultiply(
>       other: BlockMatrix,
>       partitioner: GridPartitioner): (BlockDestinations, BlockDestinations) = {
>     val leftMatrix = blockInfo.keys.collect() // blockInfo should already be cached
>     val rightMatrix = other.blocks.keys.collect()
>     ......
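
For a sense of scale, here is a rough back-of-the-envelope sketch (not part of the original thread) of how many block keys the two `collect()` calls could bring back to the driver. It assumes square 1024 x 1024 blocks, which is an assumption about how the matrices were built, and it computes the dense upper bound where every block of the grid exists; a sparse matrix would have fewer blocks. `BlockKeyEstimate`, `blocksPerSide` and `maxBlockKeys` are hypothetical names used only for illustration, and the code does not depend on Spark.

```scala
// Back-of-the-envelope estimate in plain Scala; no Spark dependency.
object BlockKeyEstimate {

  /** Blocks along one dimension, rounding up for a partial last block. */
  def blocksPerSide(dim: Long, blockSize: Int): Long =
    (dim + blockSize - 1) / blockSize

  /** Upper bound on block keys: assumes every block of the grid is present. */
  def maxBlockKeys(rows: Long, cols: Long, rowsPerBlock: Int, colsPerBlock: Int): Long =
    blocksPerSide(rows, rowsPerBlock) * blocksPerSide(cols, colsPerBlock)

  def main(args: Array[String]): Unit = {
    val n = 200000000L    // 200 million rows and columns, from the report
    val blockSize = 1024  // assumption: 1024 x 1024 blocks

    val perSide = blocksPerSide(n, blockSize)
    val keys = maxBlockKeys(n, n, blockSize, blockSize)

    // Each key is an (Int, Int) pair: 8 bytes of raw payload,
    // ignoring JVM object overhead, which would make it larger.
    val rawBytes = keys * 8L

    println(s"blocks per side:          $perSide")
    println(s"block keys (dense bound): $keys")
    println(s"raw key payload in bytes: $rawBytes")
  }
}
```

Under these assumptions the dense bound is on the order of 10^10 keys per matrix, which is far more than a driver on a 30 GB node could hold. The real figure depends on how many blocks are actually populated and on the block size chosen.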