Hi David, Andrea and Mike,

The problem I want to discuss is postcopy live migration of a VM backed by 1G hugepages.
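
To give a rough sense of scale for why the page size matters here: with hugetlbfs a faulting vCPU has to wait until the whole hugepage has arrived before it can be placed. Below is a back-of-the-envelope sketch of that wait. The 10 Gbit/s link speed used in the note after it and the helper name page_transfer_ms are only my assumptions for illustration, not measurements, and protocol overhead and latency are ignored.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper (illustration only): milliseconds needed to push
 * one page of page_bytes over a link of link_gbit_per_s (decimal
 * gigabits per second), ignoring protocol overhead and latency.
 */
static double page_transfer_ms(uint64_t page_bytes, double link_gbit_per_s)
{
    assert(link_gbit_per_s > 0.0);

    double bits = (double)page_bytes * 8.0;
    double seconds = bits / (link_gbit_per_s * 1e9);

    return seconds * 1e3;
}
```

Under these assumptions page_transfer_ms(1ULL << 30, 10.0) comes out at roughly 859 ms for one 1G page, while page_transfer_ms(2ULL << 20, 10.0) is roughly 1.7 ms for one 2MB page, a factor of 512 per fault.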

I would like to hear your opinion on the following approach to avoiding this problem.

Once we have an area mmap'ed from 1G hugetlbfs, remap its physical pages through /dev/mem. That gives two kinds of vmas mapped to the same PFNs. Then register userfaultfd on the newly obtained virtual addresses. This could reduce the page granularity and so reduce the downtime per 1G page. So registering userfaultfd at 2MB granularity while the real hugepage is 1G could, I think, help.

The current postcopy implementation in QEMU does not allow live migration from a VM backed by 1G hugepages to a VM backed by 2MB hugepages (sanity checks prevent it).

I also checked that it is possible to remap through /dev/mem and get PFN based vmas, register userfaultfd on them (with an allowance in vma_can_userfault) and finally do UFFDIO_COPY, after allowing PFN based vmas in __mcopy_atomic.

But this approach has a lot of drawbacks.

First of all, it relies on the /dev/mem interface. It needs full access (a kernel without CONFIG_STRICT_DEVMEM) and PAT has to be disabled.

The second drawback: maybe I just didn't find a way to remap hugepages again, but mmap of the /dev/mem character driver maps 4KB pages. I don't know how THP could help here, but madvise with MADV_HUGEPAGE didn't. So 4KB is not exactly what is needed: because of the encapsulation overhead, the total downtime is worse than in the other cases.

It would be great to have an interface for obtaining a new virtual address based on an existing PFN, but for hugepages. Honestly, I can't find another use case for this feature.

--
BR
Alexey