Hi Gerard,
Isn't this the same issue as this one?
https://issues.apache.org/jira/browse/MESOS-1688

On Mon, Jan 26, 2015 at 9:17 PM, Gerard Maas <gerard.m...@gmail.com> wrote:

> Hi,
>
> We are observing, with some regularity, that our Spark jobs, running as
> Mesos frameworks, are hoarding resources and not releasing them, resulting
> in resource starvation for all jobs running on the Mesos cluster.
>
> For example:
> This is a job that has spark.cores.max = 4 and spark.executor.memory="3g"
>
> ID                    Framework     Host                   CPUs  Mem
> …5050-16506-1146497   FooStreaming  dnode-4.hdfs.private   7     13.4 GB
> …5050-16506-1146495   FooStreaming  dnode-0.hdfs.private   1     6.4 GB
> …5050-16506-1146491   FooStreaming  dnode-5.hdfs.private   7     11.9 GB
> …5050-16506-1146449   FooStreaming  dnode-3.hdfs.private   7     4.9 GB
> …5050-16506-1146247   FooStreaming  dnode-1.hdfs.private   0.5   5.9 GB
> …5050-16506-1146226   FooStreaming  dnode-2.hdfs.private   3     7.9 GB
> …5050-16506-1144069   FooStreaming  dnode-3.hdfs.private   1     8.7 GB
> …5050-16506-1133091   FooStreaming  dnode-5.hdfs.private   1     1.7 GB
> …5050-16506-1133090   FooStreaming  dnode-2.hdfs.private   5     5.2 GB
> …5050-16506-1133089   FooStreaming  dnode-1.hdfs.private   6.5   6.3 GB
> …5050-16506-1133088   FooStreaming  dnode-4.hdfs.private   1     251 MB
> …5050-16506-1133087   FooStreaming  dnode-0.hdfs.private   6.4   6.8 GB
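>
> (For reference, a minimal sketch of how these limits are typically set on a
> Spark 1.1 streaming job; the app name and batch interval below are
> illustrative assumptions, not the actual job code:)
>
>     // Sketch: configure the resource caps mentioned above (Spark 1.1 API)
>     import org.apache.spark.SparkConf
>     import org.apache.spark.streaming.{Seconds, StreamingContext}
>
>     val conf = new SparkConf()
>       .setAppName("FooStreaming")
>       .set("spark.cores.max", "4")         // cap on total cores the framework should hold
>       .set("spark.executor.memory", "3g")  // memory per executor
>     val ssc = new StreamingContext(conf, Seconds(10))  // assumed batch interval
>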
> The only way to release the resources is to manually find the process on
> the cluster and kill it. The jobs are often streaming jobs, but batch jobs
> show this behavior as well; we run more streaming jobs than batch ones, so
> our stats are biased toward streaming.
> Any ideas about what's going on here? Hopefully it's some ugly bug that has
> already been fixed and that will push us to upgrade our infra?
>
> Mesos 0.20 + Marathon 0.7.4 + Spark 1.1.0
>
> -kr, Gerard.
>
