[ 
https://issues.apache.org/jira/browse/SPARK-16440?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15367615#comment-15367615
 ] 

Anthony Truchet commented on SPARK-16440:
-----------------------------------------

Hello Spark developers,

I'm preparing a patch for this issue. This will be my first contribution to 
Spark. I'll do my best to follow the contribution guidelines, but please don't 
hesitate to point out anything I should do differently :-)



> Undeleted broadcast variables in Word2Vec causing OoM for long runs 
> --------------------------------------------------------------------
>
>                 Key: SPARK-16440
>                 URL: https://issues.apache.org/jira/browse/SPARK-16440
>             Project: Spark
>          Issue Type: Bug
>          Components: MLlib
>    Affects Versions: 1.6.0, 1.6.1, 1.6.2, 2.0.0
>            Reporter: Anthony Truchet
>   Original Estimate: 4h
>  Remaining Estimate: 4h
>
> Three broadcast variables created at the beginning of {{Word2Vec.fit()}} are 
> never destroyed or unpersisted. This seems to cause excessive memory 
> consumption on the driver for a job that runs hundreds of successive trainings.
> They are:
> {code}
>     val expTable = sc.broadcast(createExpTable())
>     val bcVocab = sc.broadcast(vocab)
>     val bcVocabHash = sc.broadcast(vocabHash)
> {code}
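
One possible way to avoid this kind of leak in user code is to release each broadcast variable as soon as the job that uses it has finished. The sketch below is only an illustration, not the patch for this issue: {{withBroadcast}}, {{BroadcastCleanupSketch}} and the sample data are hypothetical names introduced here, and it assumes a local Spark installation on the classpath. It relies only on {{SparkContext.broadcast}} and {{Broadcast.unpersist()}}.

```scala
import scala.reflect.ClassTag

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.broadcast.Broadcast

object BroadcastCleanupSketch {

  // Hypothetical helper (not part of Spark): broadcasts `value`, runs `body`,
  // and always unpersists the broadcast afterwards, even if `body` throws.
  // `body` must run its Spark actions to completion before returning,
  // otherwise the broadcast would simply be re-sent on the next use.
  def withBroadcast[T: ClassTag, R](sc: SparkContext, value: T)(body: Broadcast[T] => R): R = {
    val bc = sc.broadcast(value)
    try body(bc)
    finally bc.unpersist() // asks executors to drop their cached copies
  }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("broadcast-cleanup-sketch").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      // Stand-in for the vocabulary hash broadcast in Word2Vec.fit().
      val vocabHash = Map("spark" -> 0, "word2vec" -> 1)

      // The action (reduce) completes inside the block, before cleanup runs.
      val indexSum = withBroadcast(sc, vocabHash) { bc =>
        sc.parallelize(Seq("spark", "word2vec", "unknown"))
          .map(word => bc.value.getOrElse(word, -1))
          .reduce(_ + _)
      }
      println(s"sum of vocabulary indices: $indexSum")
    } finally {
      sc.stop()
    }
  }
}
```

Note that {{unpersist()}} only removes the copies cached on executors. Since the symptom reported here is on the driver, a real fix would probably need to destroy the broadcast variables as well, which the sketch does not attempt.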



