Thanks Vinod. Yes, I understand that Mesos assumes it's the only process managing resources; that makes sense. Looking at the code and testing shows that the agent reports as available memory the host's total memory minus 1GB, or half the total memory if the total is below 2GB (https://github.com/apache/mesos/blob/master/src/slave/containerizer/containerizer.cpp#L152). So basically it assumes that the OS doesn't use more than 1GB. I guess if that's not the case, one can just specify the memory manually to the agent, so that's fine.
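To make that rule concrete, here is a minimal Python sketch of the default as I read it. It is not the Mesos code itself: the function name `default_agent_mem` and the `GB` constant are made up for illustration, and I'm assuming binary gigabytes.

```python
GB = 1024 ** 3  # assumption: 1GB means 2^30 bytes


def default_agent_mem(total_bytes: int) -> int:
    """Memory the agent would advertise by default, per the rule described above.

    Hypothetical helper for illustration only; not part of Mesos.
    """
    if total_bytes >= 2 * GB:
        # Large enough host: leave 1GB for the OS and everything else.
        return total_bytes - GB
    # Small host (below 2GB): advertise half of the total memory.
    return total_bytes // 2


if __name__ == "__main__":
    for total in (1 * GB, 2 * GB, 64 * GB):
        advertised = default_agent_mem(total)
        print(f"{total / GB:g}GB host -> {advertised / GB:g}GB advertised")
```

Note that the value only depends on the host's total memory, not on what is currently free, which matches the value being computed once at startup.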
Actually, the reason I was wondering about this is that we recently had a problem where containers couldn't be destroyed because tasks were stuck in uninterruptible (D) state. This effectively leaked the memory: the agent was advertising it as free while it was still being used by the stuck processes.
We ran into a similar issue with GPUs. It's a known issue, https://issues.apache.org/jira/browse/MESOS-8038. I posted an analysis and a potential fix; it'd be great if someone could have a look :).

Cheers,

Charles

On Thu, Apr 30, 2020 at 15:36, Vinod Kone <[email protected]> wrote:
>
> Mesos assumes that it is the only process managing resources of a box (cpu,
> mem, disk). So if you have out of band processes using up resources it
> won't be reflected in the resource offers and the box can be overcommitted.
> There is no runtime periodic check of available resources, it's only
> calculated once at startup.
>
> Resource detection logic is here:
> https://github.com/apache/mesos/blob/master/src/slave/containerizer/containerizer.cpp#L65
>
> On Thu, Apr 30, 2020 at 8:17 AM Charles-François Natali <[email protected]>
> wrote:
>
> > Hi,
> >
> > Could someone point me to some code/documentation explaining how the
> > agent available memory is computed, and when it is refreshed?
> >
> > For example, if I have an agent started, with some outstanding offers,
> > and I then start a process - not as a task managed by Mesos, but as an
> > external process which just allocates a lot of memory - and touches
> > it, not just committed - I can see the machine available memory go
> > down (as reported by free, and MemAvailable in /proc/meminfo), but the
> > agent doesn't rescind any offer, and never seems to actually refresh
> > it - even after starting/stopping tasks.
> >
> > Cheers,
> >
> > Charles
> >
