On Fri, May 15, 2026 at 07:35:45PM -0300, Jason Gunthorpe wrote:
> On Tue, May 12, 2026 at 12:24:08AM +0800, Yu Zhang wrote:
> > +static inline u16 hv_iommu_fill_iova_list(union hv_iommu_flush_va
> > *iova_list,
> > + unsigned long start,
> > + unsigned long end)
> > +{
> > + unsigned long start_pfn = start >> PAGE_SHIFT;
> > + unsigned long end_pfn = PAGE_ALIGN(end) >> PAGE_SHIFT;
> > + unsigned long nr_pages = end_pfn - start_pfn;
> > + u16 count = 0;
> > +
> > + while (nr_pages > 0) {
> > + unsigned long flush_pages;
> > + int order;
> > + unsigned long pfn_align;
> > + unsigned long size_align;
> > +
> > + if (count >= HV_IOMMU_MAX_FLUSH_VA_COUNT) {
> > + count = HV_IOMMU_FLUSH_VA_OVERFLOW;
> > + break;
> > + }
> > +
> > + if (start_pfn)
> > + pfn_align = __ffs(start_pfn);
> > + else
> > + pfn_align = BITS_PER_LONG - 1;
> > +
> > + size_align = __fls(nr_pages);
> > + order = min(pfn_align, size_align);
> > + iova_list[count].page_mask_shift = order;
> > + iova_list[count].page_number = start_pfn;
> > +
> > + flush_pages = 1UL << order;
> > + start_pfn += flush_pages;
> > + nr_pages -= flush_pages;
> > + count++;
> > + }
>
> This seems like a really silly hypervisor interface. Why doesn't it
> just accept a normal range? Splitting it into power of two aligned
> ranges is very inefficient.
Fair point. I'm not sure how much flexibility we have to change
this hypercall interface at the moment - it predates the pvIOMMU
work and may have other consumers beyond Linux guest. On the other
hand, having the guest specify 2^N-aligned blocks does save the
hypervisor from having to decompose ranges itself before issuing
hardware invalidation commands - the guest-provided entries can be
fed to the HW more or less directly.
That said, the way I'm currently using this interface may be
more precise than necessary. Maybe we have 2 options:
1) Current approach: decompose the range into multiple exact
2^N-aligned blocks with no over-flush, but at the cost of
more complex calculations and more entries.
2) Follow what Intel/AMD drivers do: find a single minimal
2^N-aligned block that covers the entire range, but may
over-flush.
Any preference?
@Michael, since you've also been reviewing this patch, I'd
appreciate your thoughts on the above as well. :)
Yu