On Fri, Aug 01, 2008 at 02:47:39PM -0500, Anthony Liguori wrote:
> Is there a way to detect MMU notifiers from userspace?  I don't think it's 
> currently safe to madvise unconditionally.

There is no way to detect mmu notifiers from userspace (well, strictly
speaking you could check /proc/kallsyms), but the point is that without
mmu notifiers madvise won't be enough (the memory may not be freed
without the other ioctl proposed by Marcelo that also flushes sptes).
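
For what it's worth, the /proc/kallsyms check could look roughly like
the sketch below. The helper names are made up for illustration, and
the symbol name "mmu_notifier_register" is an assumption about what
the host kernel exposes; /proc/kallsyms may also be unreadable or
filtered. Finding the symbol only says the host kernel was built with
mmu notifiers, it doesn't say the kvm module is using them.

```c
#include <stdio.h>
#include <string.h>

/*
 * Return 1 if `symbol` appears as a symbol name in a kallsyms-style
 * stream (lines of the form "<address> <type> <name> [module]"),
 * 0 otherwise.
 */
int kallsyms_has_symbol(FILE *f, const char *symbol)
{
    char line[512];
    char name[256];

    while (fgets(line, sizeof(line), f)) {
        /* Skip the address and the type character, keep the name. */
        if (sscanf(line, "%*s %*s %255s", name) == 1 &&
            strcmp(name, symbol) == 0)
            return 1;
    }
    return 0;
}

/*
 * Heuristic only: 1 if the assumed symbol is listed, 0 if it is not,
 * -1 if /proc/kallsyms can't be opened.
 */
int host_has_mmu_notifiers(void)
{
    FILE *f = fopen("/proc/kallsyms", "r");
    int found;

    if (!f)
        return -1;
    /* Assumed symbol name, not something kvm-userspace relies on. */
    found = kallsyms_has_symbol(f, "mmu_notifier_register");
    fclose(f);
    return found;
}
```

Even when this returns 1, the rest of this mail still applies: the
check doesn't make madvise sufficient on a host where kvm isn't
hooked into the notifiers.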

Especially with the plan to pin pages during memslot allocation (the
next step in the kvm-userland compatibility effort when built against
an old kernel), madvise will be a noop because all memory in the
memslot will remain pinned.

The safety issue with madvise then turned out to be the same issue
that affects all rmap_remove invocations (not just the ones done by
Marcelo's ioctl). Whenever the put_page in rmap_remove happens to be
the last free of the page, we have a tlb race: the other vcpus' tlbs
are flushed only after the page has already been freed by
rmap_remove. This is usually only an issue for smp guests on smp
hosts. So in short, madvise run on a memslot wasn't "less" safe, but
the ioctl proposed to remove sptes so that memory could be released
wasn't capable of fixing this race for madvise either.

mmu notifiers, on the other hand, allow madvise to free all memory
reliably while also fixing this race as a whole (not just for madvise
but for swapping and all other operations too).

The compatibility plan for host kernels without mmu notifiers is
going to prevent swapping entirely. In turn, it will never allow
rmap_remove to perform the last free of the page, which also fixes
the tlb troubles.
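
To make the refcount argument concrete, here is a toy model (plain
userspace C, not kernel code; every name in it is made up for
illustration). It only assumes what is described above: each spte
holds a page reference that rmap_remove drops, and pinning at memslot
allocation takes one extra reference that outlives the sptes.

```c
#include <stdbool.h>

/* Toy stand-in for a host page; this is not the kernel's struct page. */
struct toy_page {
    int refcount;
    bool freed;
};

/* Drop one reference; return true if this call was the last free. */
bool toy_put_page(struct toy_page *p)
{
    if (--p->refcount == 0) {
        p->freed = true;
        return true;
    }
    return false;
}

/* Models the extra reference taken when the memslot is allocated. */
void toy_pin_at_memslot_alloc(struct toy_page *p)
{
    p->refcount++;
}

/* Models the reference held on the page while an spte maps it. */
void toy_map_spte(struct toy_page *p)
{
    p->refcount++;
}

/*
 * Models rmap_remove dropping the spte's reference. A true return is
 * the dangerous case: the page is freed before the other vcpus' tlbs
 * have been flushed.
 */
bool toy_rmap_remove(struct toy_page *p)
{
    return toy_put_page(p);
}
```

Without the pin, the host can drop its own reference first (madvise,
swap) and then toy_rmap_remove returns true. With the pin held, it
can't reach zero there, so the race is gone, and so is any chance of
the memory being released.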

So if you currently just madvise, it's no less safe than not calling
madvise at all (it's not safe either way). But without mmu notifiers
madvise won't remove sptes, so it won't be reliable. This is why
Marcelo once proposed the ioctl to call before/after madvise, but that
ioctl still had the same issue that every other rmap_remove has
without mmu notifiers.

With the next plan of pinning all pages you can still madvise just
fine, and it will be safe too (like every other rmap_remove will be
safe), but it is guaranteed to be a noop because the memslot is
entirely pinned.

So madvise on old kernels is never less safe, but if we want to
provide reliable ballooning on host kernels without mmu notifiers
under the compatibility plan that Avi suggested, we should "trim" the
memslot first (to unpin the pages) rather than just calling madvise;
after that you can munmap the range instead of madvise. Or you can
add a new call that only drops the sptes in the range and unpins the
pages _later_ (that's similar to Marcelo's ioctl, but having the unpin
happen later should fix its previous safety issues). However, if the
spte removal is done without touching the memslot, it will complicate
the memslot semantics quite a bit (the memory mapped by a memslot
would no longer be guaranteed to be entirely pinned, and we'd have to
track the ballooned ranges that aren't pinned separately). So I guess
teaching set_memory_region to trim a memslot (currently it can't)
sounds like one approach that would allow ballooning on kernels
without mmu notifiers, if we go with Avi's backwards compatibility
suggestion of memslot pinning. Then it's up to you whether to munmap
or madvise; it'll be the same at that point.
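
For reference, the userspace side of the two options is tiny. The
sketch below uses hypothetical helper names and assumes the range is
page aligned and backed by a private anonymous mapping. It shows only
the madvise/munmap call itself; the memslot trimming (or spte
dropping) discussed above is the kernel side and isn't shown.

```c
/* Ask glibc for madvise() and MAP_ANONYMOUS if not already enabled. */
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE
#endif
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>

/*
 * Hypothetical helper: tell the host it may free the pages backing a
 * page-aligned range of guest RAM. The mapping stays in place; a later
 * access gets fresh zero-filled pages. Returns 0 on success, -1 with
 * errno set on failure.
 */
int balloon_release_range(void *addr, size_t len)
{
    return madvise(addr, len, MADV_DONTNEED);
}

/*
 * Hypothetical alternative: drop the mapping for the range entirely.
 * Only sensible once the range is no longer covered by a memslot.
 */
int balloon_unmap_range(void *addr, size_t len)
{
    return munmap(addr, len);
}
```

On a host without mmu notifiers the first helper alone leaves the
sptes in place, and with a fully pinned memslot it frees nothing,
which is the whole point of trimming the memslot first.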

It's a bit complicated, so if something isn't clear about what the
issues are without mmu notifiers, don't hesitate to ask for more
details.