Re: [Openstack-operators] Memory usage of guest vms, ballooning and nova
On 03/23/2017 01:01 PM, Jean-Philippe Methot wrote:
> Hi,
>
> Lately, on my production openstack Newton setup, I've run into a situation that defies my assumptions regarding memory management on Openstack compute nodes and I've been looking for explanations. Basically, we had a VM with a flavor that limited it to 96 GB of ram, which, to be quite honest, we never thought we could ever reach. This is a very important VM where we wanted to avoid running out of memory at all costs. The VM itself generally uses about 12 GB of ram.
>
> We were surprised when we noticed yesterday that this VM, which has been running for several months, was using all its 96 GB on the compute host. Despite that, in the guest, the OS was indicating a memory usage of about 12 GB. The only explanation I see for this is that at some point in time, the host had to allocate all the 96GB of ram to the VM process and it never took back the allocated ram. This prevented the creation of more guests on the node as it was showing it didn't have enough memory left.
>
> Now, I was under the assumption that memory ballooning was integrated into nova and that the amount of memory allocated to a specific guest would deflate once that guest did not need the memory. After verification, I've found blueprints for it, but I see no trace of any implementation anywhere.
>
> I also notice that on most of our compute nodes, the amount of ram used is much lower than the amount of ram allocated to VMs, which I do believe is normal.
>
> So basically, my question is, how does openstack actually manage ram allocation? Will it ever take back the unused ram of a guest process? Can I force it to take back that ram?

Basically, you are using a hammer as a screwdriver.

The tool that Nova gives you to prevent other VMs from consuming memory allocated to another VM is called the ram_allocation_ratio. By default, this is set to 1.5, meaning that if you have 100GB of RAM on a compute host, you can allocate VMs that would consume up to 150GB of RAM.

For your VM that has 12GB of RAM used but 96GB allocated, you do not want to do that. Instead, give that VM around 16GB of memory and set your compute host's ram_allocation_ratio (in nova.conf) to 1.0. Instances on that compute host will then not be able to consume more RAM than is available on the host.

Best,
-jay

___
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators
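The ratio arithmetic in the reply above can be written out as a small sketch. The function names are made up for illustration and are not part of Nova; the only inputs are the host's physical RAM and the ram_allocation_ratio value.

```python
def allocatable_ram_gb(host_ram_gb: float, ram_allocation_ratio: float) -> float:
    """RAM (GB) that may be promised to instances on one host."""
    return host_ram_gb * ram_allocation_ratio


def overcommitted_gb(host_ram_gb: float, ram_allocation_ratio: float) -> float:
    """RAM (GB) promised to instances beyond what the host physically has."""
    promised = allocatable_ram_gb(host_ram_gb, ram_allocation_ratio)
    return max(0.0, promised - host_ram_gb)


if __name__ == "__main__":
    # Compare the default ratio with the 1.0 setting suggested above.
    for ratio in (1.5, 1.0):
        print(ratio, allocatable_ram_gb(100, ratio), overcommitted_gb(100, ratio))
```

With a ratio of 1.0 nothing is promised beyond physical RAM, which is the point of the suggestion: size the flavor to what the guest needs instead of relying on overcommit.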
Re: [Openstack-operators] Memory usage of guest vms, ballooning and nova
- Original Message -
> From: "Jean-Philippe Methot"
> To: "Edmund Rhudy"
> Cc: openstack-operators@lists.openstack.org
> Sent: Thursday, March 23, 2017 3:49:26 PM
> Subject: Re: [Openstack-operators] Memory usage of guest vms, ballooning and nova
>
> On 2017-03-23 15:15, Edmund Rhudy (BLOOMBERG/ 120 PARK) wrote:
> > What sort of memory overcommit value are you running Nova with? The scheduler looks at an instance's reservation rather than how much memory is actually being used by QEMU when making a decision, as far as I'm aware (but please correct me if I am wrong on this point). If the HV has 128GB of memory, the instance has a reservation of 96GB, you have 16GB reserved via reserved_host_memory_mb, ram_allocation_ratio is set to 1.0, and you try to launch an instance from a flavor with 32GB of memory, it will fail to pass RamFilter in the scheduler and the scheduler will not consider it a valid host for placement. (I am assuming you are using FilterScheduler still, as I know nothing about the new placement API or what parts of it do and don't work in Newton.)
>
> The overcommit value is set to 1.5 in the scheduler. It's not the scheduler that was preventing the instance from being provisioned, it was qemu returning that there was not enough ram when libvirt was trying to provision the instance (that error was not handled well by openstack, btw, but that's something else). So the instance does pass every filter. It just ends up in error when getting provisioned in the compute node because of a lack of ram, with the actual full error message only visible in the QEMU logs.
>
> > As far as why the memory didn't automatically get reclaimed, maybe KVM will only reclaim empty pages and memory fragmentation in the guest prevented it from doing so? It might also not actively try to reclaim memory unless it comes under pressure to do so, because finding empty pages and returning them to the host may be a somewhat time-consuming operation.
>
> That's entirely possible, but according to the doc, libvirt is supposed to have a memory balloon function that does the operation of reclaiming empty pages from guest processes, or so I understand. Now, how this function works is not exactly clear to me, or even if nova uses it or not. Another user suggested it might not be automatic, which is in accordance with what you're conjecturing.

As a general rule, Libvirt provides an interface to facilitate various actions on the guest, but does not perform them without intervention. That is, it generally needs to be triggered to do something, either by a management layer (OpenStack, oVirt, virt-manager, Boxes, etc.) or by an explicit call from the operator (e.g. via virsh).

In this case, as Chris noted, while the memory stats are exposed by default, and while Libvirt exposes an API for interacting with the balloon, there is no process in Nova currently, or commonly deployed with it, that will actually exercise the ballooning mechanism to expand/contract the memory balloon.

In oVirt/RHEV the traditional way to do it was using Memory Overcommit Manager (MOM) [1] to define and apply policies for managing it. The guest also needs to have a driver for the virtio balloon device, IIRC. Such things have been proposed in the past [2] in OpenStack but never made it to implementation to my knowledge, though as you've discovered it still seems like something that is generally desirable.

Thanks,

Steve

[1] http://www.ovirt.org/develop/projects/mom/
[2] https://blueprints.launchpad.net/nova/+spec/libvirt-memory-ballooning
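To make the idea of a balloon "policy" a bit more concrete, here is a toy sketch of the kind of rule an external manager could apply. It is purely hypothetical: it is not MOM's policy language and nothing like it ships with Nova. The function name, the 25% headroom and the 1024 MB floor are all assumptions chosen for illustration.

```python
def balloon_target_mb(guest_used_mb: int, max_mb: int,
                      min_mb: int = 1024, headroom: float = 0.25) -> int:
    """Pick a balloon target: guest usage plus some headroom, clamped
    between a floor (min_mb) and the flavor size (max_mb)."""
    wanted = int(guest_used_mb * (1 + headroom))
    return max(min_mb, min(max_mb, wanted))


if __name__ == "__main__":
    # The guest from this thread: about 12 GB used inside a 96 GB flavor.
    print(balloon_target_mb(12 * 1024, 96 * 1024))
```

Something would still have to poll the guest's memory stats and push the resulting target to libvirt; that is the piece that, per the message above, nothing in Nova currently does.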
Re: [Openstack-operators] Memory usage of guest vms, ballooning and nova
On 2017-03-23 15:15, Edmund Rhudy (BLOOMBERG/ 120 PARK) wrote:
> What sort of memory overcommit value are you running Nova with? The scheduler looks at an instance's reservation rather than how much memory is actually being used by QEMU when making a decision, as far as I'm aware (but please correct me if I am wrong on this point). If the HV has 128GB of memory, the instance has a reservation of 96GB, you have 16GB reserved via reserved_host_memory_mb, ram_allocation_ratio is set to 1.0, and you try to launch an instance from a flavor with 32GB of memory, it will fail to pass RamFilter in the scheduler and the scheduler will not consider it a valid host for placement. (I am assuming you are using FilterScheduler still, as I know nothing about the new placement API or what parts of it do and don't work in Newton.)

The overcommit value is set to 1.5 in the scheduler. It's not the scheduler that was preventing the instance from being provisioned, it was qemu returning that there was not enough ram when libvirt was trying to provision the instance (that error was not handled well by openstack, btw, but that's something else). So the instance does pass every filter. It just ends up in error when getting provisioned in the compute node because of a lack of ram, with the actual full error message only visible in the QEMU logs.

> As far as why the memory didn't automatically get reclaimed, maybe KVM will only reclaim empty pages and memory fragmentation in the guest prevented it from doing so? It might also not actively try to reclaim memory unless it comes under pressure to do so, because finding empty pages and returning them to the host may be a somewhat time-consuming operation.

That's entirely possible, but according to the doc, libvirt is supposed to have a memory balloon function that does the operation of reclaiming empty pages from guest processes, or so I understand. Now, how this function works is not exactly clear to me, or even if nova uses it or not. Another user suggested it might not be automatic, which is in accordance with what you're conjecturing.
Re: [Openstack-operators] Memory usage of guest vms, ballooning and nova
What sort of memory overcommit value are you running Nova with? The scheduler looks at an instance's reservation rather than how much memory is actually being used by QEMU when making a decision, as far as I'm aware (but please correct me if I am wrong on this point). If the HV has 128GB of memory, the instance has a reservation of 96GB, you have 16GB reserved via reserved_host_memory_mb, ram_allocation_ratio is set to 1.0, and you try to launch an instance from a flavor with 32GB of memory, it will fail to pass RamFilter in the scheduler and the scheduler will not consider it a valid host for placement. (I am assuming you are using FilterScheduler still, as I know nothing about the new placement API or what parts of it do and don't work in Newton.)

As far as why the memory didn't automatically get reclaimed, maybe KVM will only reclaim empty pages and memory fragmentation in the guest prevented it from doing so? It might also not actively try to reclaim memory unless it comes under pressure to do so, because finding empty pages and returning them to the host may be a somewhat time-consuming operation.
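The RamFilter example from the message above (128GB host, 96GB reserved by an instance, 16GB held back via reserved_host_memory_mb, a 32GB flavor requested) can be checked with a simplified model. This is a sketch of the accounting as described in the thread, not Nova's actual RamFilter code, and the function and parameter names are assumptions.

```python
MB_PER_GB = 1024


def passes_ram_filter(total_mb: int, reserved_by_instances_mb: int,
                      reserved_host_memory_mb: int,
                      ram_allocation_ratio: float, requested_mb: int) -> bool:
    """Simplified check: the host passes if the flavor fits under
    total * ratio after instance reservations and host-reserved memory.
    Actual QEMU memory usage plays no part in the decision."""
    limit_mb = total_mb * ram_allocation_ratio
    committed_mb = reserved_by_instances_mb + reserved_host_memory_mb
    return limit_mb - committed_mb >= requested_mb


if __name__ == "__main__":
    host = dict(total_mb=128 * MB_PER_GB,
                reserved_by_instances_mb=96 * MB_PER_GB,
                reserved_host_memory_mb=16 * MB_PER_GB,
                requested_mb=32 * MB_PER_GB)
    for ratio in (1.0, 1.5):
        print(ratio, passes_ram_filter(ram_allocation_ratio=ratio, **host))
```

Under this model the 32GB request is rejected at a ratio of 1.0 (only 16GB of headroom remain) and accepted at 1.5, which is consistent with the report elsewhere in the thread that, with overcommit at 1.5, the instance passed every filter and only failed later on the compute node.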
Re: [Openstack-operators] Memory usage of guest vms, ballooning and nova
Hi,

This is indeed linux, CentOS 7 to be more precise, using qemu-kvm as hypervisor. The used ram was in the used column. While we have made adjustments by moving and resizing the specific guest that was using 96 GB (verified in top), the ram usage is still fairly high for the amount of allocated ram.

Currently the ram usage looks like this:

              total        used        free      shared  buff/cache   available
Mem:           251G        190G         60G         42M        670M         60G
Swap:          952M        707M        245M

I have 188.5GB of ram allocated to 22 instances on this node. I believe it's unrealistic to think that all these 22 instances have cached/are using up all their ram at this time.

On 2017-03-23 13:07, Kris G. Lindgren wrote:
> Sorry for the super stupid question.
>
> But if this is linux are you sure that the memory is not actually being consumed via buffers/cache?
>
> free -m
>               total        used        free      shared  buff/cache   available
> Mem:         128751       27708        2796        4099       98246       96156
> Swap:          8191           0        8191
>
> Shows that of 128GB 27GB is used, but buffers/cache consumes 98GB of ram.
>
> ___
> Kris Lindgren
> Senior Linux Systems Engineer
> GoDaddy

-- 
Jean-Philippe Méthot
Openstack system administrator
PlanetHoster inc.
www.planethoster.net
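The used versus buff/cache distinction raised in the quoted question can be checked programmatically. Below is a minimal sketch that parses `free -m` style output with the six-column layout shown in this thread; the function names are made up, and it assumes plain integer values (it would not parse human-readable output such as 251G).

```python
def parse_free(output: str) -> dict:
    """Parse `free -m` output into {"mem": {...}, "swap": {...}}."""
    lines = [line for line in output.splitlines() if line.strip()]
    header = lines[0].split()  # total, used, free, shared, buff/cache, available
    rows = {}
    for line in lines[1:]:
        label, *values = line.split()
        # The Swap row has fewer columns; zip() stops at the shortest input.
        rows[label.rstrip(":").lower()] = dict(zip(header, map(int, values)))
    return rows


def cache_share(mem: dict) -> float:
    """Fraction of total RAM held by buffers/cache."""
    return mem["buff/cache"] / mem["total"]


SAMPLE = """\
              total        used        free      shared  buff/cache   available
Mem:         128751       27708        2796        4099       98246       96156
Swap:          8191           0        8191
"""

if __name__ == "__main__":
    mem = parse_free(SAMPLE)["mem"]
    print(mem["used"], mem["buff/cache"], round(cache_share(mem), 2))
```

On the sample host most of the RAM sits in buff/cache, which the kernel can reclaim. On the node described in this message buff/cache is only 670M, so the 190G really is in the used column.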
Re: [Openstack-operators] Memory usage of guest vms, ballooning and nova
On 03/23/2017 11:01 AM, Jean-Philippe Methot wrote:
> So basically, my question is, how does openstack actually manage ram allocation? Will it ever take back the unused ram of a guest process? Can I force it to take back that ram?

I don't think nova will automatically reclaim memory.

I'm pretty sure that if you have CONF.libvirt.mem_stats_period_seconds set (which it is by default) then you can manually tell libvirt to reclaim some memory via the "virsh setmem" command.

Chris
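For anyone wanting to script the "virsh setmem" suggestion above, here is a rough sketch that only builds the command line. It assumes virsh's documented behaviour that a bare size is interpreted in KiB and that --live targets the running domain. The domain name is a made-up placeholder, and whether the guest actually gives memory back depends on it having a working balloon driver. Nova is not told about the change, so its own accounting still uses the flavor size.

```python
import subprocess

KIB_PER_GIB = 1024 * 1024


def build_setmem_command(domain: str, target_gib: float) -> list:
    """Return the argv for `virsh setmem <domain> <size-in-KiB> --live`."""
    target_kib = int(target_gib * KIB_PER_GIB)
    return ["virsh", "setmem", domain, str(target_kib), "--live"]


def shrink_guest(domain: str, target_gib: float) -> None:
    """Run the command on the compute host; raises if virsh reports failure."""
    subprocess.run(build_setmem_command(domain, target_gib), check=True)


if __name__ == "__main__":
    # Hypothetical libvirt domain name; only prints the command, does not run it.
    print(" ".join(build_setmem_command("instance-0000002a", 16)))
```

Keeping command construction separate from execution makes it easy to review what would be run before touching a production guest.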
[Openstack-operators] Memory usage of guest vms, ballooning and nova
Hi,

Lately, on my production openstack Newton setup, I've run into a situation that defies my assumptions regarding memory management on Openstack compute nodes and I've been looking for explanations. Basically, we had a VM with a flavor that limited it to 96 GB of ram, which, to be quite honest, we never thought we could ever reach. This is a very important VM where we wanted to avoid running out of memory at all costs. The VM itself generally uses about 12 GB of ram.

We were surprised when we noticed yesterday that this VM, which has been running for several months, was using all its 96 GB on the compute host. Despite that, in the guest, the OS was indicating a memory usage of about 12 GB. The only explanation I see for this is that at some point in time, the host had to allocate all the 96GB of ram to the VM process and it never took back the allocated ram. This prevented the creation of more guests on the node as it was showing it didn't have enough memory left.

Now, I was under the assumption that memory ballooning was integrated into nova and that the amount of memory allocated to a specific guest would deflate once that guest did not need the memory. After verification, I've found blueprints for it, but I see no trace of any implementation anywhere.

I also notice that on most of our compute nodes, the amount of ram used is much lower than the amount of ram allocated to VMs, which I do believe is normal.

So basically, my question is, how does openstack actually manage ram allocation? Will it ever take back the unused ram of a guest process? Can I force it to take back that ram?

-- 
Jean-Philippe Méthot
Openstack system administrator
PlanetHoster inc.
www.planethoster.net