On 08/13/2018 12:00 PM, Dr. David Alan Gilbert wrote:
> cc'ing in Mike*2
> * Denis Plotnikov (dplotni...@virtuozzo.com) wrote:
>>
>>
>> On 26.07.2018 12:23, Peter Xu wrote:
>>> On Thu, Jul 26, 2018 at 10:51:33AM +0200, Paolo Bonzini wrote:
>>>> On 25/07/2018 22:04, Andrea Arcangeli wrote:
>>>>>
>>>>> It may look like the uffd-wp model is wish-feature similar to an
>>>>> optimization, but without the uffd-wp model when the WP fault is
>>>>> triggered by kernel code, the sigsegv model falls apart and requires
>>>>> all kind of ad-hoc changes just for this single feature. Plus uffd-wp
>>>>> has other benefits: it makes it all reliable in terms of not
>>>>> increasing the number of vmas in use during the snapshot. Finally it
>>>>> makes it faster too with no mmap_sem for reading and no sigsegv
>>>>> signals.
>>>>>
>>>>> The non cooperative features got merged first because there was much
>>>>> activity on the kernel side on that front, but this is just an ideal
>>>>> time to nail down the remaining issues in uffd-wp I think. That I
>>>>> believe is time better spent than trying to emulate it with sigsegv
>>>>> and changing all drivers to send new events down to qemu specific to
>>>>> the sigsegv handling. We considered this before doing uffd for
>>>>> postcopy too but overall it's unreliable and more work (no single
>>>>> change was then needed to KVM code with uffd to handle postcopy and
>>>>> here it should be the same).
>>>>
>>>> I totally agree. The hard part in userfaultfd was the changes to the
>>>> kernel get_user_pages API, but the payback was huge because _all_ kernel
>>>> uses (KVM, vhost-net, syscalls, etc.) just work with userfaultfd. Going
>>>> back to mprotect would be a huge mistake.
>>>
>>> Thanks for explaining the bits. I'd say I wasn't aware of the
>>> difference before I started the investigation (and only until now I
>>> noticed that major difference between mprotect and userfaultfd). I'm
>>> really glad that it's much clear (at least for me) on which way we
>>> should choose.
>>>
>>> Now I'm thinking whether we can move the userfault write protect work
>>> forward. The latest discussion I saw so far is in 2016, when someone
>>> from Huawei tried to use the write protect feature for that old
>>> version of live snapshot but reported issue:
>>>
>>> https://lists.gnu.org/archive/html/qemu-devel/2016-12/msg01127.html
>>>
>>> Is that the latest status for userfaultfd wr-protect?
>>>
>>> If so, I'm thinking whether I can try to re-verify the work (I tried
>>> his QEMU repository but I failed to compile somehow, so I plan to
>>> write some even simpler code to try) to see whether I can get the same
>>> KVM error he encountered.
>>>
>>> Thoughts?
>>
>> Just to sum up all being said before.
>>
>> Using mprotect is a bad idea because VM's memory can be accessed from the
>> number of places (KVM, vhost, ...) which need their own special care
>> of tracking memory accesses and notifying QEMU which makes the mprotect
>> using unacceptable.
>>
>> Protected memory accesses tracking can be done via userfaultfd's WP mode
>> which isn't available right now.
>>
>> So, the reasonable conclusion is to wait until the WP mode is available and
>> build the background snapshot on top of userfaultfd-wp.
>> But, works on adding the WP-mode is pending for a quite a long time already.
>>
>> Is there any way to estimate when it could be available?
>
> I think a question is whether anyone is actively working on it; I
> suspect really it's on a TODO list rather than moving at the moment.
>
I am not working on it, and it is not on my TODO list. However, if someone
starts making progress I will jump in and work on hugetlbfs support. My
intention would be to not let hugetlbfs support 'fall behind' general uffd
support.

--
Mike Kravetz