On Thu, Jun 12, 2025 at 10:53:34AM -0700, Nicolin Chen wrote:
> On Thu, Jun 12, 2025 at 12:42:42PM -0300, Jason Gunthorpe wrote:
> > On Thu, Jun 12, 2025 at 05:23:01PM +0200, Thomas Weißschuh wrote:
> > > On Thu, Jun 12, 2025 at 11:58:01AM -0300, Jason Gunthorpe wrote:
> > > > On Thu, Jun 12, 2025 at 04:27:41PM +0200, Thomas Weißschuh wrote:
> > > > >
> > > > > If the assumption is that this is most likely a kernel bug,
> > > > > shouldn't it be fixed properly rather than worked around?
> > > > > After all the job of a selftest is to detect bugs to be fixed.
> > > >
> > > > I investigated the history for a bit and it seems likely we cannot
> > > > change the kernel here. Call it an undocumented "feature".
> > >
> > > I looked a bit and it seems to be mentioned in mmap(2):
> > >
> > >   For mmap(), offset must be a multiple of the underlying huge page size.
> > >   The system automatically aligns length to be a multiple of the
> > >   underlying huge page size.
> >
> > Oh there you go then :) Horrible design. No way for userspace to know
> > what the rounded up length actually was and thus no way for
> > userspace to unmap it.
>
> OK. I think we would have to skip those cases then.

Or.. maybe we could just allocate a huge page:

@@ -2022,7 +2023,19 @@ FIXTURE_SETUP(iommufd_dirty_tracking)
 	self->fd = open("/dev/iommu", O_RDWR);
 	ASSERT_NE(-1, self->fd);
 
-	rc = posix_memalign(&self->buffer, HUGEPAGE_SIZE, variant->buffer_size);
+	if (variant->hugepages) {
+		/*
+		 * Allocation must be aligned to the HUGEPAGE_SIZE, because the
+		 * following mmap() will automatically align the length to be a
+		 * multiple of the underlying huge page size. Failing to do the
+		 * same at this allocation will result in a memory overwrite by
+		 * the mmap().
+		 */
+		size = __ALIGN_KERNEL(variant->buffer_size, HUGEPAGE_SIZE);
+	} else {
+		size = variant->buffer_size;
+	}
+	rc = posix_memalign(&self->buffer, HUGEPAGE_SIZE, size);
 	if (rc || !self->buffer) {
 		SKIP(return, "Skipping buffer_size=%lu due to errno=%d",
 		     variant->buffer_size, rc);

It can just upsize the allocation, i.e. the test case will only use the
first 64MB or 128MB out of the reserved 512MB huge page.

Thanks
Nicolin
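
P.S. For anyone who wants to try the size rounding from the hunk above
outside the selftest, here is a rough userspace sketch. It is not the
selftest code: align_up(), alloc_test_buffer() and EXAMPLE_HUGEPAGE_SIZE
are made-up names, and the 512MB value is simply assumed from the huge
page size mentioned above rather than read from the running system.

```c
#include <stddef.h>
#include <stdlib.h>

/* Assumption: 512MB huge pages, as in the case discussed above. */
#define EXAMPLE_HUGEPAGE_SIZE (512UL << 20)

/* Round len up to the next multiple of align (align must be a power of two). */
static size_t align_up(size_t len, size_t align)
{
	return (len + align - 1) & ~(align - 1);
}

/*
 * Hypothetical helper: allocate a huge-page-aligned buffer. When
 * hugepages is set, the length is rounded up as well, so a later
 * hugetlb mmap() over the buffer cannot extend past the allocation.
 * Returns NULL on failure; on success *alloc_size holds the real length.
 */
static void *alloc_test_buffer(size_t buffer_size, int hugepages,
			       size_t *alloc_size)
{
	void *buf = NULL;
	size_t size = buffer_size;

	if (hugepages)
		size = align_up(buffer_size, EXAMPLE_HUGEPAGE_SIZE);

	if (posix_memalign(&buf, EXAMPLE_HUGEPAGE_SIZE, size))
		return NULL;

	*alloc_size = size;
	return buf;
}
```

With these assumed values, a 64MB or 128MB request with hugepages set
turns into a 512MB allocation, while the non-hugepage case keeps the
requested size.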