On 6/9/2016 6:15 AM, Paul Michali wrote:


On Wed, Jun 8, 2016 at 11:21 PM Chris Friesen
<chris.frie...@windriver.com> wrote:

    On 06/03/2016 12:03 PM, Paul Michali wrote:
    > Thanks for the link Tim!
    >
    > Right now, I have two things I'm unsure about...
    >
    > One is that I had 1945 huge pages left (of size 2048k) and tried
    to create a VM
    > with a small flavor (2GB), which should need 1024 pages, but Nova
    indicated that
    > it wasn't able to find a host (and QEMU reported an allocation issue).
    >
    > The other is that VMs are not being evenly distributed on my two
    NUMA nodes, and
    > instead, are getting created all on one NUMA node. Not sure if
    that is expected
    > (and setting mem_page_size to 2048 is the proper way).


    Just in case you haven't figured out the problem...

    Have you checked the per-host-numa-node 2MB huge page availability
    on your host?
      If it's uneven then that might explain what you're seeing.


These are the observations/questions I have:

1) On the host, I was seeing 32768 huge pages of 2MB size. When I
created VMs (Cirros) using the small flavor, each VM was getting created
on NUMA node 0. Once half of the available pages were used, I could no
longer create any VMs (QEMU saying no space). I'd like to understand why
the assignment was always going to node 0, and to confirm that the huge
pages are divided among the available NUMA nodes.
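
For what it's worth, here is a rough sketch (untested) of one way to
check the per-NUMA-node totals and free counts from the host; it assumes
the standard Linux sysfs layout for 2 MB huge pages:

# Sketch: print total and free 2 MB huge pages per NUMA node.
# Assumes the usual sysfs layout; adjust hugepages-2048kB for other sizes.
import glob
import os

for node_dir in sorted(glob.glob("/sys/devices/system/node/node[0-9]*")):
    pool = os.path.join(node_dir, "hugepages", "hugepages-2048kB")
    if not os.path.isdir(pool):
        continue  # no 2 MB pool configured on this node
    with open(os.path.join(pool, "nr_hugepages")) as f:
        total = int(f.read())
    with open(os.path.join(pool, "free_hugepages")) as f:
        free = int(f.read())
    print("%s: %d total, %d free" % (os.path.basename(node_dir), total, free))

If node 0's pool is much smaller (or more depleted) than node 1's, that
would line up with Chris's suggestion above.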

2) I changed mem_page_size from 1024 to 2048 in the flavor, and then
newly created VMs were being evenly assigned to the two NUMA nodes, each
using 1024 huge pages. At this point I could get past the halfway mark,
but when there were 1945 pages left, it failed to create a VM. Did it
fail because mem_page_size was 2048 and the available pages were 1945,
even though we were only requesting 1024 pages?
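
My working assumption (happy to be corrected) is that the whole guest
has to be backed from a single NUMA node's pool, so the host-wide free
count isn't what matters. A toy example of how 1945 free pages can still
fail a request, using a purely hypothetical split across the two nodes:

# Toy example, not Nova's code: the guest's memory must come from one node's pool.
free_per_node = {0: 950, 1: 995}      # hypothetical split of the 1945 free pages
pages_needed = 2048 * 1024 // 2048    # 2 GB flavor / 2048 kB pages = 1024 pages

if not any(free >= pages_needed for free in free_per_node.values()):
    print("no single node has %d free pages -> NoValidHost" % pages_needed)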

3) Related to #2, is there a relationship between mem_page_size, the
allocation of VMs to NUMA nodes, and the flavor size? IOW, if I use the
medium flavor (4GB), will I need a larger mem_page_size? (I'll try this
variation as soon as I can.) This gets back to understanding how the
scheduler decides where to place the VMs.
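
My (possibly wrong) understanding is that mem_page_size only sets the
granularity of the backing pages, so a bigger flavor just needs more
pages of the same size rather than a larger page size. The arithmetic
I'm assuming (again, just a sketch):

# Sketch: pages required per flavor at a fixed 2048 kB page size.
PAGE_KB = 2048

def pages_needed(flavor_ram_mb):
    return flavor_ram_mb * 1024 // PAGE_KB

print(pages_needed(2048))   # small flavor, 2 GB  -> 1024 pages
print(pages_needed(4096))   # medium flavor, 4 GB -> 2048 pages

So a medium guest would need 2048 pages from whichever node it lands on,
which would narrow the placement choices further as the pools drain.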

4) When the VM create failed due to QEMU failing allocation, the VM went
to error state. I deleted the VM, but the neutron port was still there,
and there were no log messages indicating that a request was made to
delete the port. Is this expected (that the user would have to manually
clean up the port)?

When you hit this case, can you check whether instance.host is set in the database before deleting the instance? I'm guessing the instance never got assigned a host because it eventually ended up with NoValidHost, so when you go to delete it there is no compute to send the delete to. The delete then happens from the compute API, and we don't have the host binding details needed to delete the port.

Although, when the spawn failed on the compute in the first place, we should have deallocated any networking that was created before kicking back to the scheduler - unless we don't go back to the scheduler when the instance is set to ERROR state.

A bug report with a stacktrace of the failure scenario where the instance goes to ERROR state, plus the n-cpu logs, would probably help.
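
Something like this would show whether the instance ever got a host binding without digging into the DB directly; it's just a sketch - the auth URL and credentials are placeholders, and you need admin to see the host attribute:

# Sketch: check whether an instance has a host binding via the compute API.
# Placeholder credentials/URL; requires admin to see OS-EXT-SRV-ATTR:host.
from keystoneauth1.identity import v3
from keystoneauth1 import session
from novaclient import client

auth = v3.Password(auth_url="http://controller:5000/v3",   # placeholder
                   username="admin", password="secret",    # placeholders
                   project_name="admin",
                   user_domain_name="Default",
                   project_domain_name="Default")
nova = client.Client("2.1", session=session.Session(auth=auth))

server = nova.servers.get("INSTANCE_UUID")                 # placeholder
print(getattr(server, "OS-EXT-SRV-ATTR:host", None))       # empty/None -> never bound to a compute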


5) A coworker hit the problem mentioned in #1, with exhaustion at the
halfway point. If she deletes a VM and then changes the flavor's
mem_page_size to 2048, should Nova start assigning all new VMs to the
other NUMA node until that pool of huge pages is down to the level of
NUMA node 0's, or will it alternate between the available NUMA nodes
(and run out when node 0's pool is exhausted)?

Thanks in advance!

PCM




    Chris


--

Thanks,

Matt Riedemann

