Thanks Vinod.

Yes, I understand that Mesos assumes it's the only process managing
resources; that makes sense.
Looking at the code, and testing it, shows that the agent reports as
available memory the host's total memory minus 1GB (or half the total
memory if the total is below 2GB)
(https://github.com/apache/mesos/blob/master/src/slave/containerizer/containerizer.cpp#L152).
So it basically assumes that the OS doesn't use more than 1GB. If
that's not the case, I guess one can just specify the memory manually
to the agent, so that's fine.
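For reference, the default described above boils down to a couple of
lines. This is a Python sketch of the rule as I read it, not the actual
Mesos code (which is C++); the function names are made up, and reading
MemTotal from /proc/meminfo is an assumption standing in for however
the agent really obtains the host's total memory.

```python
GiB = 1024 ** 3  # Mesos' "1GB" is 1024 MB, i.e. 1 GiB


def default_agent_mem_bytes(total_bytes: int) -> int:
    """Hypothetical helper: leave 1 GiB for the OS, or use only half
    the memory on hosts with less than 2 GiB."""
    if total_bytes >= 2 * GiB:
        return total_bytes - GiB
    return total_bytes // 2


def read_mem_total_bytes(path: str = "/proc/meminfo") -> int:
    """Hypothetical helper: read MemTotal (Linux, reported in kB)."""
    with open(path) as f:
        for line in f:
            if line.startswith("MemTotal:"):
                return int(line.split()[1]) * 1024
    raise ValueError("MemTotal not found in " + path)
```

Note that at exactly 2GB both branches give 1GB, so the boundary case
doesn't matter. If the default doesn't fit a host, the agent's
--resources flag lets you set the memory explicitly (in MB, e.g.
--resources="mem:8192").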

Actually, the reason I was wondering about this is that we recently
had a problem where containers couldn't be destroyed because of tasks
stuck in uninterruptible (D) state, which caused the memory to be
basically leaked, i.e. the agent was advertising the memory as free
while it was still being used by the stuck processes. We ran into a
similar issue with GPUs; it's a known issue
(https://issues.apache.org/jira/browse/MESOS-8038). I posted an
analysis and a potential fix there, so it'd be great if someone could
have a look :).
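To spot such stuck tasks on an agent host, one option is to scan
/proc for processes whose state is D. The sketch below is a
hypothetical diagnostic helper, not something Mesos provides; it
relies only on the /proc/<pid>/stat layout "pid (comm) state ...".

```python
import os
from typing import List, Tuple


def processes_in_d_state(proc_root: str = "/proc") -> List[Tuple[int, str]]:
    """Return (pid, command) for every process in uninterruptible sleep."""
    result = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-PID entries such as "self" or "meminfo"
        try:
            with open(os.path.join(proc_root, entry, "stat")) as f:
                data = f.read()
        except OSError:
            continue  # the process exited, or its stat isn't readable
        # comm is in parentheses and may contain spaces, so locate the
        # last ')'; the state character follows it after one space.
        lpar = data.find("(")
        rpar = data.rfind(")")
        if lpar < 0 or rpar < 0:
            continue
        if data[rpar + 2:rpar + 3] == "D":
            result.append((int(entry), data[lpar + 1:rpar]))
    return sorted(result)
```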

Cheers,

Charles

On Thu, Apr 30, 2020 at 15:36, Vinod Kone <vinodk...@apache.org> wrote:
>
> Mesos assumes that it is the only process managing resources of a box (cpu,
> mem, disk). So if you have out of band processes using up resources it
> won't be reflected in the resource offers and the box can be overcommitted.
> There is no runtime periodic check of available resources, it's only
> calculated once at startup.
>
> Resource detection logic is here:
> https://github.com/apache/mesos/blob/master/src/slave/containerizer/containerizer.cpp#L65
>
> On Thu, Apr 30, 2020 at 8:17 AM Charles-François Natali <cf.nat...@gmail.com>
> wrote:
>
> > Hi,
> >
> > Could someone point me to some code/documentation explaining how the
> > agent's available memory is computed, and when it is refreshed?
> >
> > For example, if I start an agent with some outstanding offers, and
> > then start a process (not as a task managed by Mesos, but as an
> > external process which allocates a lot of memory and actually touches
> > it, rather than just committing it), I can see the machine's available
> > memory go down (as reported by free, and MemAvailable in
> > /proc/meminfo), but the agent doesn't rescind any offer, and never
> > seems to actually refresh it, even after starting/stopping tasks.
> >
> > Cheers,
> >
> > Charles
> >
