From: Yu Zhang <[email protected]> Sent: Friday, January 9, 2026 9:07 PM
>
> On Thu, Jan 08, 2026 at 06:47:44PM +0000, Michael Kelley wrote:
> > From: Yu Zhang <[email protected]> Sent: Monday, December 8, 2025 9:11 PM
> > >
> >
> > The "Subject:" line prefix for this patch should probably be "Drivers: hv:"
> > to be consistent with most other changes to this source code file.
> >
> > > Previously, the allocation of per-CPU output argument pages was restricted
> > > to root partitions or those operating in VTL mode.
> > >
> > > Remove this restriction to support guest IOMMU related hypercalls, which
> > > require valid output pages to function correctly.
> >
> > The thinking here isn't quite correct. Just because a hypercall produces
> > output doesn't mean that Linux needs to allocate a page for the output
> > that is separate from the input. It's perfectly OK to use the same page
> > for both input and output, as long as the two areas don't overlap. Yes,
> > the page is called "hyperv_pcpu_input_arg", but that's a historical
> > artifact from before the time it was realized that the same page can be
> > used for both input and output.
> >
> > Of course, if there's ever a hypercall that needs lots of input and lots
> > of output such that the combined size doesn't fit in a single page, then
> > separate input and output pages will be needed. But I'm skeptical that
> > will ever happen. Rep hypercalls could have large amounts of input and/or
> > output, but I'd venture that the rep count can always be managed so
> > everything fits in a single page.
> >
>
> Thanks, Michael.
>
> Is there an existing hypercall precedent that reuses the input page for
> output? I believe reusing the input page should be acceptable, at least
> for pvIOMMU's hypercalls, but I will confirm these interfaces with the
> Hyper-V team.

See hv_pci_read_mmio() for a precedent in current kernel code. There's also
hv_get_partition_id() which uses hyperv_pcpu_input_page for the hypercall
output. But in this case, there is no input, so input and output aren't
actually sharing the page.

In the kernel 6.13 and earlier, get_vtl() used the hyperv_pcpu_input_page
for both input and output, but it did it wrong because the input and output
areas overlapped. While overlap worked because the hypercall is a simple
"one-shot" operation (i.e., read the input, then write the output), it's not
legal according to the TLFS. When the illegal overlap was fixed in commit
07412e1f163d, the developer decided to allocate the hyperv_pcpu_output_page
for VTL2 images, so the fix uses separate pages for the input and output.

There was extensive discussion of the tradeoffs in allocating the output
page for VTL2. In my view it was an unnecessary use of memory, but the
developer preferred to do it for consistency, and I didn't press the
argument because it was limited to VTL2. Similarly, I won't press the
argument here if folks really want to always allocate the output page. My
only request is that the commit message not be misleading about the reason.

See https://elixir.bootlin.com/linux/v6.13/source/arch/x86/hyperv/hv_init.c#L416
for the older get_vtl() code that puts the input and output in the same
page, but improperly overlaps.

>
> > >
> > > While unconditionally allocating per-CPU output pages scales with the
> > > number of vCPUs, and potentially adding overhead for guests that may not
> > > utilize the IOMMU, this change anticipates that future hypercalls from
> > > child partitions may also require these output pages.
> >
> > I've heard the argument that the amount of overhead is modest relative to
> > the overall amount of memory that is typically in a VM, particularly VMs
> > with high vCPU counts. And I don't disagree. But on the flip side, why tie
> > up memory when there's no need to do so? I'd argue for dropping this
> > patch, and changing the two hypercall call sites in Patch 5 to just use
> > part of the so-called hypercall input page for the output as well. It's
> > only a one-line change in each hypercall call site.
> >
>
> I share your concern about unconditionally allocating a separate output page
> for each vCPU. And if reusing the input page isn't accepted by the Hyper-V
> team, perhaps we could gate the allocation by checking
> IS_ENABLED(CONFIG_HYPERV_PVIOMMU) in hv_output_page_exist()?

Yes, that's doable, though I hope it doesn't come to that. At some point
the additional complexity starts to favor just allocating the output
page. :-)

Michael
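
For illustration only, here is a minimal userspace sketch of the single-page
approach discussed above: the output area is carved out of the same page as
the input, placed after it so the two never overlap. This is not code from
the patch series or from the kernel. Every demo_* name is hypothetical,
demo_hypercall() merely stands in for a real hypercall, and the 4 KiB page
size and 8-byte alignment of the output area are assumptions of the sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define DEMO_PAGE_SIZE 4096u	/* assumed page size */
#define DEMO_ARG_ALIGN 8u	/* assumed alignment for the output area */

/* Hypothetical argument layouts; not real Hyper-V structures. */
struct demo_input {
	uint64_t gpa;
	uint32_t size;
	uint32_t reserved;
};

struct demo_output {
	uint8_t data[64];
};

/* Both areas must fit in the one shared page. */
static_assert(sizeof(struct demo_input) + sizeof(struct demo_output)
	      <= DEMO_PAGE_SIZE, "input + output must fit in one page");

/* Stand-in for the per-CPU "input" page: one page-aligned page. */
static void *demo_alloc_page(void)
{
	return aligned_alloc(DEMO_PAGE_SIZE, DEMO_PAGE_SIZE);
}

/*
 * Return the output area inside the same page, starting at the first
 * aligned offset after the input, or NULL if it would not fit.
 */
static void *demo_output_area(void *page, size_t in_size, size_t out_size)
{
	size_t off = (in_size + DEMO_ARG_ALIGN - 1) & ~(size_t)(DEMO_ARG_ALIGN - 1);

	if (off + out_size > DEMO_PAGE_SIZE)
		return NULL;
	return (uint8_t *)page + off;
}

/* Fake "hypercall": reads the input, then fills the output with a pattern. */
static int demo_hypercall(const struct demo_input *in, struct demo_output *out)
{
	memset(out->data, 0, sizeof(out->data));
	for (uint32_t i = 0; i < in->size && i < sizeof(out->data); i++)
		out->data[i] = (uint8_t)(in->gpa + i);
	return 0;
}

/* Call site: input at offset 0, output later in the same page. */
static int demo_read(void *page, uint64_t gpa, uint32_t size, uint8_t *buf)
{
	struct demo_input *in = page;
	struct demo_output *out =
		demo_output_area(page, sizeof(*in), sizeof(*out));

	if (!out || size > sizeof(out->data))
		return -1;

	in->gpa = gpa;
	in->size = size;
	in->reserved = 0;

	if (demo_hypercall(in, out) != 0)
		return -1;

	memcpy(buf, out->data, size);
	return 0;
}
```

A caller would obtain the page once with demo_alloc_page() and pass it to
demo_read(). The only difference from a two-page variant is where "out"
points, which is the sense in which the change is small at each call site.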
