[jira] [Commented] (SPARK-22805) Use aliases for StorageLevel in event logs

2017-12-18 Thread Anthony Truchet (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-22805?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16294727#comment-16294727
 ] 

Anthony Truchet commented on SPARK-22805:
-

As I understand it, this does not break any standard use and saves a lot of log 
volume for 2.1 and 2.2, so the ticket is worth keeping IMHO.
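
For illustration, a minimal Scala sketch (not the actual patch) of the alias 
lookup the ticket suggests: a reverse map from the predefined levels to their 
names, with {{StorageLevel.fromString}} already available for the reading side. 
Levels built with a non-standard replication factor would not be in the map and 
could keep the current verbose JSON form.

{code}
import org.apache.spark.storage.StorageLevel

// Sketch only: map each predefined level to its alias name.
object StorageLevelAliases {
  private val byLevel: Map[StorageLevel, String] = Map(
    StorageLevel.NONE -> "NONE",
    StorageLevel.DISK_ONLY -> "DISK_ONLY",
    StorageLevel.DISK_ONLY_2 -> "DISK_ONLY_2",
    StorageLevel.MEMORY_ONLY -> "MEMORY_ONLY",
    StorageLevel.MEMORY_ONLY_2 -> "MEMORY_ONLY_2",
    StorageLevel.MEMORY_ONLY_SER -> "MEMORY_ONLY_SER",
    StorageLevel.MEMORY_ONLY_SER_2 -> "MEMORY_ONLY_SER_2",
    StorageLevel.MEMORY_AND_DISK -> "MEMORY_AND_DISK",
    StorageLevel.MEMORY_AND_DISK_2 -> "MEMORY_AND_DISK_2",
    StorageLevel.MEMORY_AND_DISK_SER -> "MEMORY_AND_DISK_SER",
    StorageLevel.MEMORY_AND_DISK_SER_2 -> "MEMORY_AND_DISK_SER_2",
    StorageLevel.OFF_HEAP -> "OFF_HEAP"
  )

  // "DISK_ONLY" (9 bytes) instead of the ~80-byte JSON object, when possible.
  def alias(level: StorageLevel): Option[String] = byLevel.get(level)
}
{code}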

> Use aliases for StorageLevel in event logs
> --
>
> Key: SPARK-22805
> URL: https://issues.apache.org/jira/browse/SPARK-22805
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 2.1.2, 2.2.1
>Reporter: Sergei Lebedev
>Priority: Minor
>
> Fact 1: {{StorageLevel}} has a private constructor, therefore the list of 
> predefined levels is not extendable (by users).
> Fact 2: The format of event logs uses a redundant representation for storage 
> levels:
> {code}
> >>> len('{"Use Disk": true, "Use Memory": false, "Deserialized": true, 
> >>> "Replication": 1}')
> 79
> >>> len('DISK_ONLY')
> 9
> {code}
> Fact 3: This leads to excessive log sizes for workloads with lots of 
> partitions, because every partition carries a storage level field that is 
> 60-70 bytes larger than it needs to be.
> Suggested quick win: use the names of the predefined levels to identify them 
> in the event log.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-18838) High latency of event processing for large jobs

2017-11-09 Thread Anthony Truchet (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-18838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16246804#comment-16246804
 ] 

Anthony Truchet commented on SPARK-18838:
-

Agreed, but at the same time this is "almost" a bug when you consider using 
Spark at really large scale...
I tried a naive cherry-pick and obviously there are a lot of non-trivial 
conflicts.
Do you know how to get an overview of the change (and of what has happened in 
the meantime since 2.2) to guide this conflict resolution?
Or maybe you already have some insight about the best way to go about it?

> High latency of event processing for large jobs
> ---
>
> Key: SPARK-18838
> URL: https://issues.apache.org/jira/browse/SPARK-18838
> Project: Spark
>  Issue Type: Improvement
>Affects Versions: 2.0.0
>Reporter: Sital Kedia
>Assignee: Marcelo Vanzin
> Fix For: 2.3.0
>
> Attachments: SparkListernerComputeTime.xlsx, perfResults.pdf
>
>
> Currently we are observing very high event processing delay in the 
> driver's `ListenerBus` for large jobs with many tasks. Many critical 
> components of the scheduler, such as `ExecutorAllocationManager` and 
> `HeartbeatReceiver`, depend on `ListenerBus` events, and this delay might 
> hurt job performance significantly or even fail the job. For example, a 
> significant delay in receiving `SparkListenerTaskStart` might cause the 
> `ExecutorAllocationManager` to mistakenly remove an executor that is not 
> idle.
> The problem is that the event processor in `ListenerBus` is a single thread 
> that loops through all the listeners for each event and processes each event 
> synchronously: 
> https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/scheduler/LiveListenerBus.scala#L94.
> This single-threaded processor often becomes the bottleneck for large jobs. 
> Also, if one of the listeners is very slow, all the listeners pay the price 
> of the delay incurred by the slow listener. In addition, a slow listener can 
> cause events to be dropped from the event queue, which might be fatal to the 
> job.
> To solve the above problems, we propose to get rid of the shared event queue 
> and the single-threaded event processor. Instead, each listener will have its 
> own dedicated single-threaded executor service. Whenever an event is posted, 
> it will be submitted to the executor service of every listener. The 
> single-threaded executor service guarantees in-order processing of the events 
> per listener. The queue used for the executor service will be bounded, to 
> guarantee we do not grow memory indefinitely. The downside of this approach 
> is that a separate event queue per listener will increase the driver memory 
> footprint.
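
For readers of this thread, a minimal, self-contained Scala sketch of the 
per-listener design described above (this is not the code that was merged; the 
Listener trait is a hypothetical stand-in for SparkListenerInterface):

{code}
import java.util.concurrent.{ArrayBlockingQueue, ThreadPoolExecutor, TimeUnit}

// Hypothetical stand-in for SparkListenerInterface.
trait Listener[E] { def onEvent(event: E): Unit }

// One single-threaded, bounded executor per listener: a slow listener only
// delays (or drops) its own events, and per-listener ordering is preserved.
class QueuedListener[E](listener: Listener[E], capacity: Int) {
  private val executor = new ThreadPoolExecutor(
    1, 1, 0L, TimeUnit.MILLISECONDS,
    new ArrayBlockingQueue[Runnable](capacity),    // bounded queue caps driver memory
    new ThreadPoolExecutor.DiscardOldestPolicy())  // drop events rather than grow

  def post(event: E): Unit =
    executor.execute(new Runnable { override def run(): Unit = listener.onEvent(event) })

  def stop(): Unit = {
    executor.shutdown()
    executor.awaitTermination(10, TimeUnit.SECONDS)
  }
}

// Posting an event then means submitting it to every listener's own executor:
//   listeners.foreach(_.post(event))
{code}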



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-18838) High latency of event processing for large jobs

2017-11-01 Thread Anthony Truchet (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-18838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16234891#comment-16234891
 ] 

Anthony Truchet edited comment on SPARK-18838 at 11/1/17 10:42 PM:
---

I'm interested in working on a backport to the Spark 2.2 we are using at Criteo. 
Any interest in an official backport, or other thoughts on that?


was (Author: anthony-truchet):
I'm interested to work on a backport for Spark 2.2 we are using at Criteo. Any 
thoughts on that ?

> High latency of event processing for large jobs
> ---
>
> Key: SPARK-18838
> URL: https://issues.apache.org/jira/browse/SPARK-18838
> Project: Spark
>  Issue Type: Improvement
>Affects Versions: 2.0.0
>Reporter: Sital Kedia
>Assignee: Marcelo Vanzin
>Priority: Major
> Fix For: 2.3.0
>
> Attachments: SparkListernerComputeTime.xlsx, perfResults.pdf
>
>
> Currently we are observing very high event processing delay in the 
> driver's `ListenerBus` for large jobs with many tasks. Many critical 
> components of the scheduler, such as `ExecutorAllocationManager` and 
> `HeartbeatReceiver`, depend on `ListenerBus` events, and this delay might 
> hurt job performance significantly or even fail the job. For example, a 
> significant delay in receiving `SparkListenerTaskStart` might cause the 
> `ExecutorAllocationManager` to mistakenly remove an executor that is not 
> idle.
> The problem is that the event processor in `ListenerBus` is a single thread 
> that loops through all the listeners for each event and processes each event 
> synchronously: 
> https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/scheduler/LiveListenerBus.scala#L94.
> This single-threaded processor often becomes the bottleneck for large jobs. 
> Also, if one of the listeners is very slow, all the listeners pay the price 
> of the delay incurred by the slow listener. In addition, a slow listener can 
> cause events to be dropped from the event queue, which might be fatal to the 
> job.
> To solve the above problems, we propose to get rid of the shared event queue 
> and the single-threaded event processor. Instead, each listener will have its 
> own dedicated single-threaded executor service. Whenever an event is posted, 
> it will be submitted to the executor service of every listener. The 
> single-threaded executor service guarantees in-order processing of the events 
> per listener. The queue used for the executor service will be bounded, to 
> guarantee we do not grow memory indefinitely. The downside of this approach 
> is that a separate event queue per listener will increase the driver memory 
> footprint.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-18838) High latency of event processing for large jobs

2017-11-01 Thread Anthony Truchet (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-18838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16234891#comment-16234891
 ] 

Anthony Truchet commented on SPARK-18838:
-

I'm interested in working on a backport to the Spark 2.2 we are using at Criteo. 
Any thoughts on that?

> High latency of event processing for large jobs
> ---
>
> Key: SPARK-18838
> URL: https://issues.apache.org/jira/browse/SPARK-18838
> Project: Spark
>  Issue Type: Improvement
>Affects Versions: 2.0.0
>Reporter: Sital Kedia
>Assignee: Marcelo Vanzin
>Priority: Major
> Fix For: 2.3.0
>
> Attachments: SparkListernerComputeTime.xlsx, perfResults.pdf
>
>
> Currently we are observing very high event processing delay in the 
> driver's `ListenerBus` for large jobs with many tasks. Many critical 
> components of the scheduler, such as `ExecutorAllocationManager` and 
> `HeartbeatReceiver`, depend on `ListenerBus` events, and this delay might 
> hurt job performance significantly or even fail the job. For example, a 
> significant delay in receiving `SparkListenerTaskStart` might cause the 
> `ExecutorAllocationManager` to mistakenly remove an executor that is not 
> idle.
> The problem is that the event processor in `ListenerBus` is a single thread 
> that loops through all the listeners for each event and processes each event 
> synchronously: 
> https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/scheduler/LiveListenerBus.scala#L94.
> This single-threaded processor often becomes the bottleneck for large jobs. 
> Also, if one of the listeners is very slow, all the listeners pay the price 
> of the delay incurred by the slow listener. In addition, a slow listener can 
> cause events to be dropped from the event queue, which might be fatal to the 
> job.
> To solve the above problems, we propose to get rid of the shared event queue 
> and the single-threaded event processor. Instead, each listener will have its 
> own dedicated single-threaded executor service. Whenever an event is posted, 
> it will be submitted to the executor service of every listener. The 
> single-threaded executor service guarantees in-order processing of the events 
> per listener. The queue used for the executor service will be bounded, to 
> guarantee we do not grow memory indefinitely. The downside of this approach 
> is that a separate event queue per listener will increase the driver memory 
> footprint.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-18838) High latency of event processing for large jobs

2017-07-05 Thread Anthony Truchet (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-18838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16075600#comment-16075600
 ] 

Anthony Truchet commented on SPARK-18838:
-

Hello,

We (the Criteo Predictive Search team, http://labs.criteo.com/about-us/) are 
running into this issue critically, and I'm willing to work on it ASAP.
Is there any update not listed here? Is the plan described in the ticket 
broadly agreed upon?
I'll start diving into the code; in the meantime, all input (design review, 
feedback from experience, ...) is welcome :-)





> High latency of event processing for large jobs
> ---
>
> Key: SPARK-18838
> URL: https://issues.apache.org/jira/browse/SPARK-18838
> Project: Spark
>  Issue Type: Improvement
>Affects Versions: 2.0.0
>Reporter: Sital Kedia
> Attachments: perfResults.pdf, SparkListernerComputeTime.xlsx
>
>
> Currently we are observing very high event processing delay in the 
> driver's `ListenerBus` for large jobs with many tasks. Many critical 
> components of the scheduler, such as `ExecutorAllocationManager` and 
> `HeartbeatReceiver`, depend on `ListenerBus` events, and this delay might 
> hurt job performance significantly or even fail the job. For example, a 
> significant delay in receiving `SparkListenerTaskStart` might cause the 
> `ExecutorAllocationManager` to mistakenly remove an executor that is not 
> idle.
> The problem is that the event processor in `ListenerBus` is a single thread 
> that loops through all the listeners for each event and processes each event 
> synchronously: 
> https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/scheduler/LiveListenerBus.scala#L94.
> This single-threaded processor often becomes the bottleneck for large jobs. 
> Also, if one of the listeners is very slow, all the listeners pay the price 
> of the delay incurred by the slow listener. In addition, a slow listener can 
> cause events to be dropped from the event queue, which might be fatal to the 
> job.
> To solve the above problems, we propose to get rid of the shared event queue 
> and the single-threaded event processor. Instead, each listener will have its 
> own dedicated single-threaded executor service. Whenever an event is posted, 
> it will be submitted to the executor service of every listener. The 
> single-threaded executor service guarantees in-order processing of the events 
> per listener. The queue used for the executor service will be bounded, to 
> guarantee we do not grow memory indefinitely. The downside of this approach 
> is that a separate event queue per listener will increase the driver memory 
> footprint.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-18612) Leaked broadcasted variable Mllib

2016-11-28 Thread Anthony Truchet (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-18612?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15702446#comment-15702446
 ] 

Anthony Truchet commented on SPARK-18612:
-

See related:
* https://issues.apache.org/jira/browse/SPARK-16440
* https://issues.apache.org/jira/browse/SPARK-16696


> Leaked broadcasted variable Mllib
> -
>
> Key: SPARK-18612
> URL: https://issues.apache.org/jira/browse/SPARK-18612
> Project: Spark
>  Issue Type: Improvement
>  Components: MLlib
>Affects Versions: 1.6.3, 2.0.2
>Reporter: Anthony Truchet
>Priority: Trivial
>
> Fix broadcasted variable leaks in MLlib.
> For example, `bcW` in the L-BFGS CostFun.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-18612) Leaked broadcasted variable Mllib

2016-11-28 Thread Anthony Truchet (JIRA)
Anthony Truchet created SPARK-18612:
---

 Summary: Leaked broadcasted variable Mllib
 Key: SPARK-18612
 URL: https://issues.apache.org/jira/browse/SPARK-18612
 Project: Spark
  Issue Type: Bug
  Components: MLlib
Affects Versions: 2.0.2, 1.6.3
Reporter: Anthony Truchet


Fix broadcasted variable leaks in MLlib.

For example, `bcW` in the L-BFGS CostFun.
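
A generic sketch of the kind of fix this calls for (the helper name 
withBroadcast is illustrative, not an existing Spark API): broadcast a value, 
use it, and destroy it in a finally block so the driver-side copy is released 
even when the job fails.

{code}
import scala.reflect.ClassTag

import org.apache.spark.SparkContext
import org.apache.spark.broadcast.Broadcast

// Illustrative helper: the broadcast is destroyed whatever happens in `body`.
def withBroadcast[T: ClassTag, R](sc: SparkContext, value: T)(body: Broadcast[T] => R): R = {
  val bc = sc.broadcast(value)
  try body(bc)
  finally bc.destroy()
}
{code}

In {{LBFGS.CostFun}}, this pattern would wrap the aggregation that consumes 
{{bcW}}.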




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-18471) In treeAggregate, generate (big) zeros instead of sending them.

2016-11-16 Thread Anthony Truchet (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-18471?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15670795#comment-15670795
 ] 

Anthony Truchet commented on SPARK-18471:
-

Sure. But in our use case we do want to aggregate on a DenseVector.

Here is some context: we train a logistic regression on a very big hash space 
and a large volume of data. For each piece of data the features and the 
gradient are sparse, but the aggregate gets denser and denser, up to the point 
where it is almost fully dense (as observed in our current in-house 
implementation).

So we do want to aggregate on a DenseVector, but we do not need nor want to 
send hundreds of MB of zeros as part of the closure.
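
As a small illustration of the densification (plain Scala, illustrative names): 
adding each sample's sparse gradient into a dense accumulator touches only a 
few slots per sample, but over many samples most slots end up non-zero.

{code}
// acc is the dense accumulator; (indices, values) is one sample's sparse gradient.
def addSparseIntoDense(acc: Array[Double], indices: Array[Int], values: Array[Double]): Unit = {
  var i = 0
  while (i < indices.length) {
    acc(indices(i)) += values(i)
    i += 1
  }
}
{code}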

> In treeAggregate, generate (big) zeros instead of sending them.
> ---
>
> Key: SPARK-18471
> URL: https://issues.apache.org/jira/browse/SPARK-18471
> Project: Spark
>  Issue Type: Improvement
>  Components: MLlib, Spark Core
>Reporter: Anthony Truchet
>Priority: Minor
>
> When using an optimization routine like L-BFGS, treeAggregate currently sends 
> the zero vector as part of the closure. This zero can be huge (e.g. ML 
> vectors with millions of zeros) but can easily be generated.
> Several options are possible (upcoming patches to come soon for some of them).
> One is to provide a treeAggregateWithZeroGenerator method (either in core or 
> in MLlib) which wraps treeAggregate in an Option and generates the zero if 
> None.
> Another is to rewrite treeAggregate to wrap an underlying implementation 
> which uses a zero generator directly.
> There might be other, better alternatives we have not spotted...



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-18471) In treeAggregate, generate (big) zeros instead of sending them.

2016-11-16 Thread Anthony Truchet (JIRA)
Anthony Truchet created SPARK-18471:
---

 Summary: In treeAggregate, generate (big) zeros instead of sending 
them.
 Key: SPARK-18471
 URL: https://issues.apache.org/jira/browse/SPARK-18471
 Project: Spark
  Issue Type: Improvement
  Components: MLlib, Spark Core
Reporter: Anthony Truchet
Priority: Minor


When using an optimization routine like L-BFGS, treeAggregate currently sends 
the zero vector as part of the closure. This zero can be huge (e.g. ML vectors 
with millions of zeros) but can easily be generated.

Several options are possible (upcoming patches to come soon for some of them).

One is to provide a treeAggregateWithZeroGenerator method (either in core or in 
MLlib) which wraps treeAggregate in an Option and generates the zero if None.

Another is to rewrite treeAggregate to wrap an underlying implementation which 
uses a zero generator directly.

There might be other, better alternatives we have not spotted...
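
To make the first option concrete, here is a minimal sketch (illustrative 
signature, not an actual Spark API) of a treeAggregateWithZeroGenerator built 
on top of the existing treeAggregate: the accumulator is an Option, the zero 
shipped with the closure is just None, and the real zero is generated lazily 
on each partition.

{code}
import scala.reflect.ClassTag

import org.apache.spark.rdd.RDD

// Sketch only: the big zero value never travels with the closure.
def treeAggregateWithZeroGenerator[T, U: ClassTag](rdd: RDD[T], zeroGen: () => U)(
    seqOp: (U, T) => U, combOp: (U, U) => U, depth: Int = 2): U = {

  val optSeqOp = (acc: Option[U], elem: T) => Some(seqOp(acc.getOrElse(zeroGen()), elem))
  val optCombOp = (a: Option[U], b: Option[U]) => (a, b) match {
    case (Some(x), Some(y)) => Some(combOp(x, y))
    case (None, y)          => y
    case (x, None)          => x
  }
  // The zero sent to the executors is None (a few bytes), not a huge dense vector.
  rdd.treeAggregate(Option.empty[U])(optSeqOp, optCombOp, depth)
    .getOrElse(zeroGen())  // empty RDD: generate the zero on the driver
}
{code}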



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-16440) Undeleted broadcast variables in Word2Vec causing OoM for long runs

2016-07-21 Thread Anthony Truchet (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-16440?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15387342#comment-15387342
 ] 

Anthony Truchet commented on SPARK-16440:
-

Regarding the try/finally: we run numerous trainings within the same Spark 
context, some with a vocabulary so large that they fail (yes, we do try to 
filter out the ones that are too big, but "too big" is difficult to define).

So we are in a context where we do care about resource cleanup in case of 
error, in order to enable thousands of successive trainings, some of which are 
expected to fail.

As for code readability, we can try to refactor the function to reduce the 
nesting or find a "nice" Scala solution: I'll propose a patch and I'll welcome 
any feedback on it.
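
For concreteness, a sketch of the refactoring I have in mind (the broadcast 
names come from the issue description; doFit is a hypothetical stand-in for the 
rest of {{Word2Vec.fit()}}):

{code}
val expTable = sc.broadcast(createExpTable())
val bcVocab = sc.broadcast(vocab)
val bcVocabHash = sc.broadcast(vocabHash)
try {
  doFit(dataset, expTable, bcVocab, bcVocabHash)
} finally {
  // destroy (not just unpersist) so the driver-side copies are released too,
  // even when one of the thousands of successive trainings fails
  expTable.destroy()
  bcVocab.destroy()
  bcVocabHash.destroy()
}
{code}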

> Undeleted broadcast variables in Word2Vec causing OoM for long runs 
> 
>
> Key: SPARK-16440
> URL: https://issues.apache.org/jira/browse/SPARK-16440
> Project: Spark
>  Issue Type: Bug
>  Components: MLlib
>Affects Versions: 1.6.0, 1.6.1, 1.6.2, 2.0.0
>Reporter: Anthony Truchet
>Assignee: Anthony Truchet
> Fix For: 1.6.3, 2.0.1
>
>   Original Estimate: 4h
>  Remaining Estimate: 4h
>
> Three broadcast variables created at the beginning of {{Word2Vec.fit()}} are 
> never deleted nor unpersisted. This seems to cause excessive memory 
> consumption on the driver for a job running hundreds of successive trainings.
> They are 
> {code}
> val expTable = sc.broadcast(createExpTable())
> val bcVocab = sc.broadcast(vocab)
> val bcVocabHash = sc.broadcast(vocabHash)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-16440) Undeleted broadcast variables in Word2Vec causing OoM for long runs

2016-07-19 Thread Anthony Truchet (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-16440?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15384118#comment-15384118
 ] 

Anthony Truchet commented on SPARK-16440:
-

I will, as well as put this in a try/finally to ensure proper deletion even in 
case of errors.

> Undeleted broadcast variables in Word2Vec causing OoM for long runs 
> 
>
> Key: SPARK-16440
> URL: https://issues.apache.org/jira/browse/SPARK-16440
> Project: Spark
>  Issue Type: Bug
>  Components: MLlib
>Affects Versions: 1.6.0, 1.6.1, 1.6.2, 2.0.0
>Reporter: Anthony Truchet
>Assignee: Sean Owen
> Fix For: 1.6.3, 2.0.0
>
>   Original Estimate: 4h
>  Remaining Estimate: 4h
>
> Three broadcast variables created at the beginning of {{Word2Vec.fit()}} are 
> never deleted nor unpersisted. This seems to cause excessive memory 
> consumption on the driver for a job running hundreds of successive trainings.
> They are 
> {code}
> val expTable = sc.broadcast(createExpTable())
> val bcVocab = sc.broadcast(vocab)
> val bcVocabHash = sc.broadcast(vocabHash)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-16440) Undeleted broadcast variables in Word2Vec causing OoM for long runs

2016-07-19 Thread Anthony Truchet (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-16440?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15383967#comment-15383967
 ] 

Anthony Truchet edited comment on SPARK-16440 at 7/19/16 11:21 AM:
---

Thanks for such a quick fix [~srowen]: I was offline for the past week, which is 
why I couldn't submit the patch quickly enough.

I would have {{destroyed}} the variables instead of {{unpersisting}} them, 
though, as the issue was memory consumption on the *driver* side: what am I 
missing that made you choose the latter over the former?


was (Author: anthony-truchet):
Thanks for such a quick fix [~srowen] : I was off-line for the past week that's 
why I couldn't submit the patch quickly enough.

I would have {{destroy}}ed the variable instead of {{unpersist}}ing them though 
as the issues was memory consumption on the driver side: what am I missing 
which made you choose the later over the former ?

> Undeleted broadcast variables in Word2Vec causing OoM for long runs 
> 
>
> Key: SPARK-16440
> URL: https://issues.apache.org/jira/browse/SPARK-16440
> Project: Spark
>  Issue Type: Bug
>  Components: MLlib
>Affects Versions: 1.6.0, 1.6.1, 1.6.2, 2.0.0
>Reporter: Anthony Truchet
>Assignee: Sean Owen
> Fix For: 1.6.3, 2.0.0
>
>   Original Estimate: 4h
>  Remaining Estimate: 4h
>
> Three broadcast variables created at the beginning of {{Word2Vec.fit()}} are 
> never deleted nor unpersisted. This seems to cause excessive memory 
> consumption on the driver for a job running hundreds of successive trainings.
> They are 
> {code}
> val expTable = sc.broadcast(createExpTable())
> val bcVocab = sc.broadcast(vocab)
> val bcVocabHash = sc.broadcast(vocabHash)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-16440) Undeleted broadcast variables in Word2Vec causing OoM for long runs

2016-07-19 Thread Anthony Truchet (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-16440?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15383967#comment-15383967
 ] 

Anthony Truchet commented on SPARK-16440:
-

Thanks for such a quick fix [~srowen]: I was offline for the past week, which is 
why I couldn't submit the patch quickly enough.

I would have {{destroy}}ed the variables instead of {{unpersist}}ing them, 
though, as the issue was memory consumption on the driver side: what am I 
missing that made you choose the latter over the former?
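
For reference, a short sketch of the distinction I am asking about (using 
{{bcVocab}} from the description):

{code}
// unpersist(): drops the cached copies on the executors; the driver keeps its
// copy and the broadcast can still be re-read (and re-shipped) later.
bcVocab.unpersist()

// destroy(): also removes the driver-side data; the broadcast cannot be used
// again, which is what actually frees memory on the driver.
bcVocab.destroy()
{code}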

> Undeleted broadcast variables in Word2Vec causing OoM for long runs 
> 
>
> Key: SPARK-16440
> URL: https://issues.apache.org/jira/browse/SPARK-16440
> Project: Spark
>  Issue Type: Bug
>  Components: MLlib
>Affects Versions: 1.6.0, 1.6.1, 1.6.2, 2.0.0
>Reporter: Anthony Truchet
>Assignee: Sean Owen
> Fix For: 1.6.3, 2.0.0
>
>   Original Estimate: 4h
>  Remaining Estimate: 4h
>
> Three broadcast variables created at the beginning of {{Word2Vec.fit()}} are 
> never deleted nor unpersisted. This seems to cause excessive memory 
> consumption on the driver for a job running hundreds of successive trainings.
> They are 
> {code}
> val expTable = sc.broadcast(createExpTable())
> val bcVocab = sc.broadcast(vocab)
> val bcVocabHash = sc.broadcast(vocabHash)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-16440) Undeleted broadcast variables in Word2Vec causing OoM for long runs

2016-07-08 Thread Anthony Truchet (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-16440?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15367615#comment-15367615
 ] 

Anthony Truchet commented on SPARK-16440:
-

Hello Spark developers,

I'm preparing a patch for this issue. This will be my first contribution to 
Spark. I'll strive to follow the contribution guidelines, but please do not 
hesitate to tell me how to do it better if required :-)



> Undeleted broadcast variables in Word2Vec causing OoM for long runs 
> 
>
> Key: SPARK-16440
> URL: https://issues.apache.org/jira/browse/SPARK-16440
> Project: Spark
>  Issue Type: Bug
>  Components: MLlib
>Affects Versions: 1.6.0, 1.6.1, 1.6.2, 2.0.0
>Reporter: Anthony Truchet
>   Original Estimate: 4h
>  Remaining Estimate: 4h
>
> Three broadcast variables created at the beginning of {{Word2Vec.fit()}} are 
> never deleted nor unpersisted. This seems to cause excessive memory 
> consumption on the driver for a job running hundreds of successive trainings.
> They are 
> {code}
> val expTable = sc.broadcast(createExpTable())
> val bcVocab = sc.broadcast(vocab)
> val bcVocabHash = sc.broadcast(vocabHash)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-16440) Undeleted broadcast variables in Word2Vec causing OoM for long runs

2016-07-08 Thread Anthony Truchet (JIRA)
Anthony Truchet created SPARK-16440:
---

 Summary: Undeleted broadcast variables in Word2Vec causing OoM for 
long runs 
 Key: SPARK-16440
 URL: https://issues.apache.org/jira/browse/SPARK-16440
 Project: Spark
  Issue Type: Bug
  Components: MLlib
Affects Versions: 1.6.2, 1.6.1, 1.6.0, 2.0.0
Reporter: Anthony Truchet


Three broadcast variables created at the beginning of {{Word2Vec.fit()}} are 
never deleted nor unpersisted. This seems to cause excessive memory consumption 
on the driver for a job running hundreds of successive trainings.

They are 
{code}
val expTable = sc.broadcast(createExpTable())
val bcVocab = sc.broadcast(vocab)
val bcVocabHash = sc.broadcast(vocabHash)
{code}





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org