On Mon, Sep 08, 2025 at 09:55:26AM -0300, Jason Gunthorpe wrote: > On Mon, Sep 08, 2025 at 12:10:37PM +0100, Lorenzo Stoakes wrote: > > We have introduced the f_op->mmap_prepare hook to allow for setting up a > > VMA far earlier in the process of mapping memory, reducing problematic > > error handling paths, but this does not provide what all > > drivers/filesystems need. > > > > In order to supply this, and to be able to move forward with removing > > f_op->mmap altogether, introduce f_op->mmap_complete. > > > > This hook is called once the VMA is fully mapped and everything is done, > > however with the mmap write lock and VMA write locks held. > > > > The hook is then provided with a fully initialised VMA which it can do what > > it needs with, though the mmap and VMA write locks must remain held > > throughout. > > > > It is not intended that the VMA be modified at this point, attempts to do > > so will end in tears. > > The commit message should call out if this has fixed the race > condition with unmap mapping range and prepopulation in mmap()..
To be claer, this isn't the intent of the series, the intent is to make it possible for mmap_prepare to replace mmap. This is just a bonus :) Looking at the discussion in [0] it seems the issue was that .mmap() is called before the vma is actually correctly inserted into the maple tree. This is no longer the case, we call .mmap_complete() once the VMA is fully established, but before releasing the VMA/mmap write lock. This should, presumably, resolve the race as stated? I can add some blurb about this yes. [0]:https://lore.kernel.org/linux-mm/20250801162930.gb184...@nvidia.com/ > > > @@ -793,6 +793,11 @@ struct vm_area_desc { > > /* Write-only fields. */ > > const struct vm_operations_struct *vm_ops; > > void *private_data; > > + /* > > + * A user-defined field, value will be passed to mmap_complete, > > + * mmap_abort. > > + */ > > + void *mmap_context; > > Seem strange, private_data and mmap_context? Something actually needs > both? We are now doing something _new_ - we're splitting an operation that was never split before. Before a hook implementor could rely on there being state throughout the _entire_ operation. But now they can't. And they may already be putting context into private_data, which then gets put into vma->vm_private_data for a VMA added to the maple tree and made accessible. So it is appropriate and convenient to allow for the transfer of state between the two, and I already implement logic that does this. > > Jason Cheers, Lorenzo