On Fri, Nov 21, 2025 at 05:45:20AM +0000, Michael Kelley wrote: > From: Stanislav Kinsburskii <[email protected]> Sent: > Wednesday, November 19, 2025 4:45 PM > > > > On Tue, Nov 18, 2025 at 04:29:56PM +0000, Michael Kelley wrote: > > > From: Stanislav Kinsburskii <[email protected]> Sent: > > > Monday, November 17, 2025 8:53 AM > > [snip] > > > > > diff --git a/drivers/hv/Kconfig b/drivers/hv/Kconfig > > > > index 0b8c391a0342..5f1637cbb6e3 100644 > > > > --- a/drivers/hv/Kconfig > > > > +++ b/drivers/hv/Kconfig > > > > @@ -75,6 +75,7 @@ config MSHV_ROOT > > > > depends on PAGE_SIZE_4KB > > > > select EVENTFD > > > > select VIRT_XFER_TO_GUEST_WORK > > > > + select HMM_MIRROR > > > > > > Couldn't you also do "select MMU_NOTIFIER" to avoid the #ifdef's > > > and stubs for when it isn't selected? There are other Linux kernel > > > drivers that select it. Or is the intent to allow building an image that > > > doesn't support unpinned memory, and the #ifdef's save space? > > > > > > > That's an interesting question. This driver can function without MMU > > notifiers > > by pinning all memory, which might be advantageous for certain real-time > > applications. > > However, since most other virtualization solutions use MMU_NOTIFIER, there > > doesn't appear to be a strong reason for this driver to deviate. > > I'm not clear on your last sentence. Could you elaborate? >
I meant I'll select MMU_NOTIFIER. > > Right. But even for pinned regions as coded today, is that assumption > correct? Due to memory being fragmented at the time of region creation, > it would be possible that some 2Meg ranges in a region are backed by a large > page, while other 2Meg ranges are not. In that case, a single per-region flag > isn't enough information. Or does the hypercall work OK if the "map huge > page" flag is passed when the range isn't a huge page? I'm not clear on what > the hypercall requires as input. > It is not with THP enabled and missing MAP_HUGETLB mmap flag. I'll fix it in the next revision. > > OK. So what is the impact? Losing the perf benefit of mapping guest > memory in the SLAT as a 2 Meg large page vs. a bunch of individual 4K > pages? Anything else? > I decided to rework it in scope of these the series. > > > > This is possible, if hypervisor returns some invalid GFN. > > But there is also a possibility, that this code can race with region > > removal from a guest. > > I'll address it in the next revision. > > In either of these cases, what happens next? The MSHV_RUN_VP ioctl > will return to user space with the unhandled HVMSG_GPA_INTERCEPT > message. Is there anything user space can do to enable the VP to make > progress past the fault? Or does user space just have to terminate the > guest VM? > I don't think there is much to be done here in kernel. Control will be returned to VMM, which will likely kill the guest. > > > > > > I'm pretty sure region->start_uaddr is always page aligned. But what > > > about range->start and range->end? The code here and below assumes > > > they are page aligned. It also assumes that range->end is greater than > > > range->start so the computation of page_count doesn't wrap and so > > > page_count is >= 1. I don't know whether checks for these assumptions > > > are appropriate. > > > > > > > There is a check for memory region size to be non-zero and page aligned > > in mshv_partition_ioct_set_memory function, which is the only caller for > > memory region creation. And region start is defined in PFNs. > > > > Right -- no disagreement that the region start and size are page aligned > and non-zero. But what about the range that is being invalidated? > (i.e., range->start and range->end) The values in that range are coming > from the mm subsystem, and aren't governed by how a region is created. > If that range is a subset of the MSHV region, then > mshv_region_internal_invalidate() will be operating on whatever subset > was provided in the 'range' argument. > > mshv_region_interval_invalidate() is ultimately called from > mmu_notifier_invalidate_range_start(), which has about 30 different > callers in the kernel, mostly in the mm subsystem. It wasn't > clear to me what rules, if any, those 30 callers are following when they > set up the range to be invalidated. > > Michael I see what you mean. Yes, these values are guarantee to be page aligned. And it must be as anything else can't be invalidated. Thanks, Stanislav
