[ 
https://issues.apache.org/jira/browse/MAHOUT-1707?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pat Ferrel updated MAHOUT-1707:
-------------------------------
    Description: 
java.lang.OutOfMemoryError: Java heap space

The code has an unnecessary .collect(), forcing all interaction data into 
the memory of the client/driver. Increasing the executor memory will not help 
with this.

Remove this line and rebuild Mahout.
https://github.com/apache/mahout/blob/mahout-0.10.x/spark/src/main/scala/org/apache/mahout/drivers/TextDelimitedReaderWriter.scala#L157

The errant line reads:

    interactions.collect()

This pulls all of the user action data into driver memory, which is what 
exhausts the heap. Removing it should allow for better Spark memory management.
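
For anyone unsure why the stray call matters, here is a minimal sketch of the 
difference between collecting an RDD and running a distributed action on it. 
It is not the Mahout code: the object name CollectSketch, the parseLine 
helper, the sample records, and the local master setting are all made up for 
illustration. It assumes only the standard Spark RDD API.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

// Hypothetical illustration, not taken from TextDelimitedReaderWriter.scala.
object CollectSketch {

  // Hypothetical parser: assumes each line is "user,item".
  def parseLine(line: String): (String, String) = {
    val fields = line.split(",", 2)
    (fields(0).trim, fields(1).trim)
  }

  def main(args: Array[String]): Unit = {
    // Local master is an assumption so the sketch runs without a cluster.
    val conf = new SparkConf().setAppName("collect-sketch").setMaster("local[*]")
    val sc = new SparkContext(conf)
    try {
      // Tiny stand-in for what would be a very large interaction log.
      val lines: RDD[String] = sc.parallelize(Seq("u1,i1", "u1,i2", "u2,i1"))
      val interactions: RDD[(String, String)] = lines.map(parseLine)

      // Problem pattern: collect() copies every record into the driver's heap,
      // so executor memory settings have no effect on whether it fits.
      // val everything: Array[(String, String)] = interactions.collect()

      // Distributed alternative: the work stays on the executors and only
      // two small counts come back to the driver.
      val userCount = interactions.map(_._1).distinct().count()
      val itemCount = interactions.map(_._2).distinct().count()
      println(s"users=$userCount items=$itemCount")
    } finally {
      sc.stop()
    }
  }
}
```

With the collect() line commented out, the driver only ever holds the two 
counts, which is the behavior the fix is meant to restore.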

  was:
java.lang.OutOfMemoryError: Java heap space

The code has an unnecessary .collect(), forcing all interaction data into 
memory of the client/driver. Increasing the executor memory will not help with 
this.

remove this line and rebuild Mahout.
https://github.com/apache/mahout/blob/mahout-0.10.x/spark/src/main/scala/org/apache/mahout/drivers/TextDelimitedReaderWriter.scala#L157

The errant line reads:

    interactions.collect()

This forces the user action data into memory, a bad thing for memory 
consumption.


> Spark-itemsimilarity uses too much memory
> -----------------------------------------
>
>                 Key: MAHOUT-1707
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-1707
>             Project: Mahout
>          Issue Type: Bug
>          Components: Collaborative Filtering, cooccurrence
>    Affects Versions: 0.10.0
>         Environment: Spark
>            Reporter: Pat Ferrel
>            Assignee: Pat Ferrel
>             Fix For: 0.10.1
>
>
> java.lang.OutOfMemoryError: Java heap space
> The code has an unnecessary .collect(), forcing all interaction data into 
> the memory of the client/driver. Increasing the executor memory will not help 
> with this.
> Remove this line and rebuild Mahout.
> https://github.com/apache/mahout/blob/mahout-0.10.x/spark/src/main/scala/org/apache/mahout/drivers/TextDelimitedReaderWriter.scala#L157
> The errant line reads:
>     interactions.collect()
> This pulls all of the user action data into driver memory, which is what 
> exhausts the heap. Removing it should allow for better Spark memory management.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
