> On Sept. 27, 2014, 12:47 a.m., Timothy Chen wrote:
> > src/master/master.cpp, line 3181
> > <https://reviews.apache.org/r/25184/diff/2/?file=681985#file681985line3181>
> >
> > Period at the end of the comment.
I'm not sure I understand you correctly; please correct me if not. Did you mean it would be better to add a comment on how often, and how soon, mesos-master gets killed by the OOM killer? If so, here is what we observed: when we run Spark jobs, every task stores about 17 MB of data in TaskStatus, and even a small Spark job consists of several thousand tasks. As a result, the job cannot finish if the leading mesos-master runs on a machine with less than 10 GB of memory. I'll add a common example OOM scenario to the comment.

- Chengwei

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/25184/#review54695
-----------------------------------------------------------

On Oct. 9, 2014, 10 p.m., Chengwei Yang wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/25184/
> -----------------------------------------------------------
>
> (Updated Oct. 9, 2014, 10 p.m.)
>
> Review request for mesos, Adam B and Timothy St. Clair.
>
> Bugs: MESOS-1746
>     https://issues.apache.org/jira/browse/MESOS-1746
>
> Repository: mesos-git
>
> Description
> -------
>
> We found a bug: Spark uses TaskStatus.data to transfer computed results,
> and mesos-master's resident (RES) memory keeps growing quickly until the
> process is killed by the OOM killer.
>
> Diffs
> -----
>
>   src/master/master.cpp cb46cec0674b3aa031450c5b4f48f4f8bb92767d
>
> Diff: https://reviews.apache.org/r/25184/diff/
>
> Testing
> -------
>
> Tested with Spark.
>
> Thanks,
>
> Chengwei Yang
>