Andi Kleen wrote:
> I was more thinking about some heuristic that checks when a page
> is first mapped into user space. The only problem is that it is zeroed
> through the direct mapping before that, but perhaps there is a way
> around it. That's one of the rare cases where 32-bit highmem actually
> makes things easier.
> It might also be easier on some OSes other than Linux, which don't
> use the direct mapping as aggressively.
>> In the context of kvm, the mmap() calls happen before the guest ever

> The mmap call doesn't matter at all; what matters is when the
> page is allocated.


The page is allocated at an uninteresting point in time. For example, the boot loader allocates a bunch of pages.

>> executes. First access happens somewhat later, but still we cannot count
>> on the majority of accesses coming from the same cpu as the first access.

> It is a reasonable heuristic. It's just like the rather
> successful default local allocation heuristic the native kernel uses.

It's very different. The kernel expects that an application which touched page X on node Y will continue using page X on node Y, and because applications know this, they are written to that assumption. In a virtualization context, however, the guest kernel expects page X to belong to whatever node the SRAT table points at, regardless of which cpu touched it first.

Guest kernels behave differently from applications, because real hardware doesn't allocate pages dynamically the way the kernel can for applications.

(btw, what do you do with cpu-less nodes? I think some sgi hardware has them)
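
To make that first-touch contract concrete, here is a minimal userspace sketch (the choice of CPU 0 and the region size are arbitrary): pin a thread, then touch the memory from it, and the kernel's default local allocation policy is expected to place the pages on that CPU's node.

/* Minimal sketch of the first-touch contract an application relies on. */
#define _GNU_SOURCE
#include <pthread.h>
#include <sched.h>
#include <string.h>
#include <sys/mman.h>

#define REGION (16UL << 20)     /* 16 MiB, arbitrary */

int main(void)
{
    cpu_set_t set;
    char *p;

    /* Pin ourselves to CPU 0; its node becomes "local". */
    CPU_ZERO(&set);
    CPU_SET(0, &set);
    pthread_setaffinity_np(pthread_self(), sizeof(set), &set);

    p = mmap(NULL, REGION, PROT_READ | PROT_WRITE,
             MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return 1;

    /* First touch from the pinned thread: the application now relies
     * on these pages staying on CPU 0's node. */
    memset(p, 1, REGION);
    return 0;
}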

>>> The alternative is to keep your own pools and allocate from the
>>> correct pool, but then you either need pinning or getcpu()
>>
>> This is meaningless in the kvm context. Other than small bits of memory
>> needed for I/O and shadow page tables, the bulk of memory is allocated
>> once.

> Mapped once. Anyway, that could be changed too if there was a need.


Mapped once and allocated once (not at the same time, but fairly close).

We can't change it without changing the guest.
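
For reference, a sketch of the per-node pool alternative quoted above; the pool structure and refill path are hypothetical, and getcpu() is assumed to be the glibc wrapper. It also shows why you need pinning or getcpu(): without pinning, even the node getcpu() reports can be stale by the time the allocation is used.

/* Hypothetical per-node pools, one free list per node. */
#define _GNU_SOURCE
#include <sched.h>
#include <stddef.h>

#define MAX_NODES 64

struct page_pool {
    void **free_pages;      /* hypothetical per-node free list */
    size_t nr_free;
};

static struct page_pool pools[MAX_NODES];

static void *pool_alloc_local(void)
{
    unsigned int cpu, node;
    struct page_pool *pool;

    /* Ask which node we are on right now; can be stale immediately. */
    if (getcpu(&cpu, &node) < 0 || node >= MAX_NODES)
        node = 0;

    pool = &pools[node];
    if (pool->nr_free == 0)
        return NULL;        /* refilling from node 'node' elided */
    return pool->free_pages[--pool->nr_free];
}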

>>> Basic algorithm:
>>> - If the guest touches a virtual node that is the same as the local
>>>   node of the current vcpu, assume it's a local allocation.
>>
>> The guest is not making the same assumption; lying to the guest is

> Huh? Pretty much all NUMA-aware OSes should. Linux definitely will.


No. Linux will assume a page belongs to the node the SRAT table says it belongs to. Whether the first access comes from the local node depends on the workload. If the first application that runs accesses all memory from a single cpu, we will allocate all memory from one node, and that is wrong.
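
To make the disagreement concrete, here is a compilable userspace model of that basic algorithm; every type and field is a hypothetical stand-in, not a KVM interface.

#include <stddef.h>

struct vm {
    int *gpa_node;       /* virtual node per guest page, from the SRAT layout */
    int *vnode_to_hnode; /* static virtual-node -> host-node map */
};

struct vcpu {
    struct vm *vm;
    int vnode;           /* virtual node this vcpu belongs to */
    int cur_host_node;   /* host node it happens to run on right now */
};

/* Pick the host node to back guest page 'gfn' with on its first fault. */
int pick_backing_node(struct vcpu *vcpu, size_t gfn)
{
    int vnode = vcpu->vm->gpa_node[gfn];

    /* The guest touched memory of its own virtual node: assume a local
     * allocation and use whatever host node the vcpu is running on. */
    if (vnode == vcpu->vnode)
        return vcpu->cur_host_node;

    /* Otherwise fall back to the static mapping. */
    return vcpu->vm->vnode_to_hnode[vnode];
}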

>> (2) even without npt/ept, we have no idea how often mappings are used and
>> by which cpu. finding out is expensive.

> You see a fault on the first mapping. That fault is on the CPU that
> did the access. Therefore you know which one it was.

It's meaningless information. First access means nothing. And again, the guest doesn't expect the page to move to the node where it touched it.

(we also see first access with ept)
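
For reference, a minimal runnable sketch of the mechanism in question, observing the first-touch CPU from a fault; whether that datum is worth anything is exactly what's in dispute. (Calling mprotect() and sched_getcpu() from a signal handler is fine for a demo, not for production code.)

#define _GNU_SOURCE
#include <sched.h>
#include <signal.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

static long page_size;
static volatile int first_touch_cpu = -1;

static void on_fault(int sig, siginfo_t *si, void *uc)
{
    (void)sig; (void)uc;
    first_touch_cpu = sched_getcpu();   /* the CPU that did the access */
    mprotect((void *)((uintptr_t)si->si_addr & ~(uintptr_t)(page_size - 1)),
             page_size, PROT_READ | PROT_WRITE);
}

int main(void)
{
    struct sigaction sa;
    char *page;

    page_size = sysconf(_SC_PAGESIZE);
    memset(&sa, 0, sizeof(sa));
    sa.sa_sigaction = on_fault;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGSEGV, &sa, NULL);

    /* Map a page with no access so the very first touch faults. */
    page = mmap(NULL, page_size, PROT_NONE,
                MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (page == MAP_FAILED)
        return 1;

    page[0] = 1;    /* faults; the handler records the CPU and unprotects */
    printf("first touch came from CPU %d\n", first_touch_cpu);
    return 0;
}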

>> (3) for many workloads, there are no unused pages. the guest application
>> allocates all memory and manages memory by itself.

> First, a common case of a guest using all memory is the file cache,
> but for NUMA purposes file cache locality typically doesn't matter,
> because it isn't accessed frequently enough for non-locality to be a
> problem. Locality really only matters for mappings that are used often
> by the CPU.
>
> When a single application allocates everything and keeps it, that is
> fine too, because you'll give it approximately local memory on the
> initial setup (assuming the application itself has reasonable NUMA
> behaviour under a first-touch local allocation policy).

Sure, for the simple cases it works. But consider your first example followed by the second (you can even reboot the guest in between; the bad assignment sticks).

And if the vcpu moves for some reason, things get screwed up permanently.

We should try to be predictable, rather than depend on behavior the guest has no real reason to exhibit as long as it follows the hardware specs.


--
error compiling committee.c: too many arguments to function
