On Fri, Nov 28, 2014 at 12:49 AM, Michael S. Tsirkin <m...@redhat.com> wrote:
> On Thu, Nov 27, 2014 at 06:00:36PM +0400, Andrey Korolyov wrote:
>> On Thu, Nov 27, 2014 at 3:28 PM, Michael S. Tsirkin <m...@redhat.com> wrote:
>> > On Thu, Nov 27, 2014 at 03:50:11PM +0400, Andrey Korolyov wrote:
>> >> On Thu, Nov 27, 2014 at 2:45 PM, Denis V. Lunev <d...@openvz.org> wrote:
>> >> > Excessive virtio_balloon inflation can cause invocation of the OOM-killer
>> >> > when Linux is under severe memory pressure. Various mechanisms are
>> >> > responsible for correct virtio_balloon memory management. Nevertheless,
>> >> > it is often the case that these control tools do not have enough time to
>> >> > react to a fast-changing memory load. As a result, the OS runs out of
>> >> > memory and invokes the OOM-killer. The balancing of memory by use of the
>> >> > virtio balloon should not cause the termination of processes while there
>> >> > are pages in the balloon. Currently there is no way for the virtio
>> >> > balloon driver to free memory at the last moment before some process
>> >> > gets killed by the OOM-killer.
>> >> >
>> >> > This is not a security breach, as the balloon itself runs inside the
>> >> > guest OS and works in cooperation with the host. Thus such improvements
>> >> > on the guest side should be considered normal.
>> >> >
>> >> > To solve the problem, introduce a virtio_balloon callback which is
>> >> > expected to be called from the OOM notifier call chain in the
>> >> > out_of_memory() function. If the virtio balloon can release some memory,
>> >> > the system will return and retry the allocation that forced the
>> >> > out-of-memory killer to run.
>> >> >
>> >> > This behavior is enabled if and only if the appropriate feature bit is
>> >> > set on the device. It is off by default.
>> >> >
>> >> > This functionality was recently merged into vanilla Linux (actually it
>> >> > is in linux-next at the moment):
>> >> >
>> >> > commit 5a10b7dbf904bfe01bb9fcc6298f7df09eed77d5
>> >> > Author: Raushaniya Maksudova <rmaksud...@parallels.com>
>> >> > Date:   Mon Nov 10 09:36:29 2014 +1030
>> >> >
>> >> > This patch adds the respective control bits to QEMU. It introduces a
>> >> > deflate-on-oom option for the balloon device which does the trick.
>> >> >
>> >> > Signed-off-by: Denis V. Lunev <d...@openvz.org>
>> >> > CC: Raushaniya Maksudova <rmaksud...@parallels.com>
>> >> > CC: Anthony Liguori <aligu...@amazon.com>
>> >> > CC: Michael S. Tsirkin <m...@redhat.com>
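For reference, the guest-side hook added by that linux-next commit registers
an OOM notifier along roughly these lines (an abridged sketch, not the
verbatim driver code; leak_balloon(), update_balloon_size(), the 'nb'
notifier_block member and the oom_pages module parameter are names from
drivers/virtio/virtio_balloon.c):

    #include <linux/oom.h>

    /* Called from the OOM notifier chain in out_of_memory(): deflate the
     * balloon by up to oom_pages pages and report how many were freed. */
    static int virtballoon_oom_notify(struct notifier_block *self,
                                      unsigned long dummy, void *parm)
    {
            struct virtio_balloon *vb =
                    container_of(self, struct virtio_balloon, nb);
            unsigned long *freed = parm;

            /* Deflate only if the host negotiated the feature bit. */
            if (!virtio_has_feature(vb->vdev, VIRTIO_BALLOON_F_DEFLATE_ON_OOM))
                    return NOTIFY_OK;

            *freed += leak_balloon(vb, oom_pages); /* give pages back to guest */
            update_balloon_size(vb);               /* tell the host about it  */
            return NOTIFY_OK;
    }

    /* In virtballoon_probe(): */
    vb->nb.notifier_call = virtballoon_oom_notify;
    err = register_oom_notifier(&vb->nb);

When the notifier chain reports freed pages, out_of_memory() returns early
and the failed allocation is retried instead of a process being killed.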
>> > ...
>> >
>> >> Had you tried this with a system-wide OOM on a real workload? This
>> >> behavior can work perfectly with dedicated memory cgroups, but I'm
>> >> afraid it would be unusable when the entire system stalls and waits
>> >> for a balloon deflation.
>> >
>> > That's really a question about guest drivers though, isn't it?
>> > So you aren't responding to the correct patches, and aren't copying
>> > the correct people.
>> >
>> > --
>> > MST
>>
>> Not entirely; it is a question about host-guest interaction in such a
>> case. If we wait for balloon deflation while an OOM condition exists at
>> the 'root' cg controller level, for certain settings it may well lead to
>> host unresponsiveness. An OOM event in a dedicated cgroup with a strictly
>> defined set of processes inside should be way safer. In other words, even
>> this kind of guest-host interaction can be considered a potential threat
>> to host security, as the return from an attempted balloon deflation may
>> take too much time, and some other host processes can effectively get
>> stuck. I am using a delayed OOM loop via a userspace application,
>> reaching similar goals, but it uses dedicated cgroups explicitly. Please
>> correct me if I am wrong in my assumptions.
>
> ATM the balloon is cooperative anyway: if the guest deflating the balloon
> leads to a host OOM, you have misconfigured your host, or you have
> trusted guests.
>
> We could change this: unmap pages from guest memory on inflate, map them
> back on deflate.
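The delayed OOM loop mentioned above amounts to roughly the following (a
minimal sketch using the cgroup-v1 memory.oom_control eventfd notification;
the per-guest cgroup path vm101 is hypothetical and error handling is
omitted):

    #include <fcntl.h>
    #include <stdint.h>
    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>
    #include <sys/eventfd.h>

    int main(void)
    {
            const char *grp = "/sys/fs/cgroup/memory/vm101"; /* hypothetical */
            char buf[128];
            uint64_t cnt;

            int efd = eventfd(0, 0);

            snprintf(buf, sizeof(buf), "%s/memory.oom_control", grp);
            int oom_fd = open(buf, O_RDONLY);

            /* Arm the notification: "<eventfd> <fd of memory.oom_control>". */
            snprintf(buf, sizeof(buf), "%s/cgroup.event_control", grp);
            int ctl_fd = open(buf, O_WRONLY);
            snprintf(buf, sizeof(buf), "%d %d", efd, oom_fd);
            write(ctl_fd, buf, strlen(buf));

            for (;;) {
                    read(efd, &cnt, sizeof(cnt)); /* blocks until OOM in group */
                    /* React here, e.g. ask the management layer to deflate
                     * the guest's balloon, then let allocations retry. */
            }
    }

Writing 1 to memory.oom_control beforehand disables the in-kernel killer for
that group, so its tasks pause under OOM until the watcher frees memory; the
stall stays confined to that cgroup rather than the whole host.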
// Sorry for the bad grammar in the previous message, I was distracted at
// the time.

Yes, exactly. In the previous message I meant just a regular (probably
untrusted) guest, which can either behave badly or have a driver that fails
to respond in time. For that case I have no idea how an increased delay in
the return from the OOM handler will affect the hypervisor when no separate
control groups are set up and memory pressure is high enough, but I do not
expect anything good.
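On the QEMU side, the patch under review plausibly boils down to exposing
the feature bit as an off-by-default device property; a sketch in the style
of hw/virtio/virtio-balloon.c (the host_features field name follows the
version eventually merged, so treat it as an assumption here):

    /* Off by default: the guest only sees VIRTIO_BALLOON_F_DEFLATE_ON_OOM
     * when the property is explicitly enabled. */
    static Property virtio_balloon_properties[] = {
        DEFINE_PROP_BIT("deflate-on-oom", VirtIOBalloon, host_features,
                        VIRTIO_BALLOON_F_DEFLATE_ON_OOM, false),
        DEFINE_PROP_END_OF_LIST(),
    };

A host would then opt in per device, e.g. with
-device virtio-balloon-pci,deflate-on-oom=on; without the flag the guest
never negotiates the bit, the OOM notifier above stays inert, and today's
purely cooperative behavior is preserved.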