Re: Review Request 25184: Delete framework data in TaskStatus to avoid OOM

Chengwei Yang Thu, 09 Oct 2014 07:01:25 -0700


> On Sept. 27, 2014, 12:47 a.m., Timothy Chen wrote:
> > src/master/master.cpp, line 3181
> > <https://reviews.apache.org/r/25184/diff/2/?file=681985#file681985line3181>
> >
> >     Period in the end of the comment.

I'm not sure I understand you correctly; please correct me if I don't. Did you 
mean that the comment should say how often, and after how long, mesos-master 
gets killed by the OOM killer?

If so, what we observed is that when running Spark jobs, every task stored 
about 17MB of data in TaskStatus, and even a small Spark job consists of 
several thousand tasks, so the job cannot finish if the leading mesos-master 
runs on a machine with less than 10GB of memory.
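
As a rough back-of-the-envelope check of those numbers (a sketch only, not 
code from the patch; the helper names are hypothetical, and I'm assuming 17MB 
means 17 MiB and 10GB means 10 GiB):

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helpers for a rough estimate; not part of Mesos.
constexpr std::uint64_t MiB = 1024ULL * 1024ULL;
constexpr std::uint64_t GiB = 1024ULL * MiB;

// Bytes the master retains if every task's TaskStatus keeps `bytesPerTask`.
std::uint64_t retainedBytes(std::uint64_t tasks, std::uint64_t bytesPerTask) {
  return tasks * bytesPerTask;
}

// Number of whole per-task payloads that fit in `memoryBytes`.
std::uint64_t tasksThatFit(std::uint64_t memoryBytes,
                           std::uint64_t bytesPerTask) {
  assert(bytesPerTask > 0);  // Guard against division by zero.
  return memoryBytes / bytesPerTask;
}
```

Under those assumptions, tasksThatFit(10 * GiB, 17 * MiB) gives 602, so a 
10 GiB machine is exhausted after roughly 600 tasks, and 
retainedBytes(1000, 17 * MiB) is already about 16.6 GiB for only 1000 tasks.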

I'll give a common example OOM scenario in the comment.

- Chengwei

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/25184/#review54695
-----------------------------------------------------------

On Oct. 9, 2014, 10 p.m., Chengwei Yang wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/25184/
> -----------------------------------------------------------
> 
> (Updated Oct. 9, 2014, 10 p.m.)
> 
> 
> Review request for mesos, Adam B and Timothy St. Clair.
> 
> 
> Bugs: MESOS-1746
>     https://issues.apache.org/jira/browse/MESOS-1746
> 
> 
> Repository: mesos-git
> 
> 
> Description
> -------
> 
> A bug was found where Spark uses TaskStatus.data to transfer computed
> results, so the RES memory of mesos-master grows quickly until the process
> is eventually killed by the OOM killer.
> 
> 
> Diffs
> -----
> 
>   src/master/master.cpp cb46cec0674b3aa031450c5b4f48f4f8bb92767d 
> 
> Diff: https://reviews.apache.org/r/25184/diff/
> 
> 
> Testing
> -------
> 
> Tested with Spark.
> 
> 
> Thanks,
> 
> Chengwei Yang
> 
>
