On Thu, Jul 21, 2022 at 11:44:11AM +0200, David Hildenbrand wrote:
> On 06.07.22 10:20, Chao Peng wrote:
> > Normally, a write to unallocated space of a file or the hole of a sparse
> > file automatically causes space allocation, for memfd, this equals to
> > memory allocation. This new seal prevents such automatically allocating,
> > either this is from a direct write() or a write on the previously
> > mmap-ed area. The seal does not prevent fallocate() so an explicit
> > fallocate() can still cause allocating and can be used to reserve
> > memory.
> >
> > This is used to prevent unintentional allocation from userspace on a
> > stray or careless write and any intentional allocation should use an
> > explicit fallocate(). One of the main usecases is to avoid memory double
> > allocation for confidential computing usage where we use two memfds to
> > back guest memory and at a single point only one memfd is alive and we
> > want to prevent memory allocation for the other memfd which may have
> > been mmap-ed previously. More discussion can be found at:
> >
> > https://lkml.org/lkml/2022/6/14/1255
> >
> > Suggested-by: Sean Christopherson <sea...@google.com>
> > Signed-off-by: Chao Peng <chao.p.p...@linux.intel.com>
> > ---
> >  include/uapi/linux/fcntl.h |  1 +
> >  mm/memfd.c                 |  3 ++-
> >  mm/shmem.c                 | 16 ++++++++++++++--
> >  3 files changed, 17 insertions(+), 3 deletions(-)
> >
> > diff --git a/include/uapi/linux/fcntl.h b/include/uapi/linux/fcntl.h
> > index 2f86b2ad6d7e..98bdabc8e309 100644
> > --- a/include/uapi/linux/fcntl.h
> > +++ b/include/uapi/linux/fcntl.h
> > @@ -43,6 +43,7 @@
> >  #define F_SEAL_GROW            0x0004  /* prevent file from growing */
> >  #define F_SEAL_WRITE           0x0008  /* prevent writes */
> >  #define F_SEAL_FUTURE_WRITE    0x0010  /* prevent future writes while mapped */
> > +#define F_SEAL_AUTO_ALLOCATE   0x0020  /* prevent allocation for writes */
>
> Why only "on writes" and not "on reads".
> IIRC, shmem doesn't support the shared zeropage, so you'll simply allocate
> a new page via read() or on read faults.

Right, it also prevents read faults.

>
> Also, I *think* you can place pages via userfaultfd into shmem. Not sure
> if that would count "auto alloc", but it would certainly bypass fallocate().

Userfaultfd sounds interesting, I will investigate it further. But from a
quick look, it seems to fault to userspace only on write/read faults, not on
write()? It also seems to operate on VMAs, and userfaultfd_register() takes
mmap_lock, which is what we want to avoid for frequent register/unregister
during private/shared memory conversion.

Chao
>
> --
> Thanks,
>
> David / dhildenb
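
For reference, the behaviour the commit message describes (a write into a
hole of a memfd implicitly allocates memory, while fallocate() reserves it
explicitly) can be observed from userspace via st_blocks. The following is
only a rough sketch: the helper names backing_blocks(), make_sparse_memfd(),
touch_via_mmap() and reserve() are hypothetical, made up for illustration.
It does not apply the proposed F_SEAL_AUTO_ALLOCATE, which exists only with
this patch applied.

```c
#define _GNU_SOURCE
#include <sys/mman.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: number of 512-byte blocks backing fd, or -1 on error. */
static long backing_blocks(int fd)
{
    struct stat st;

    return fstat(fd, &st) == 0 ? (long)st.st_blocks : -1;
}

/* Hypothetical helper: create a memfd of `size` bytes that is one big hole. */
static int make_sparse_memfd(off_t size)
{
    int fd = memfd_create("auto-alloc-demo", MFD_CLOEXEC);

    if (fd < 0)
        return -1;
    if (ftruncate(fd, size) != 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Implicit allocation: store one byte into a hole through a shared mapping. */
static int touch_via_mmap(int fd, size_t size, size_t offset)
{
    char *p = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);

    if (p == MAP_FAILED)
        return -1;
    p[offset] = 1; /* write fault: shmem backs this offset with a page */
    return munmap(p, size);
}

/* Explicit allocation: reserve [offset, offset + len) up front. */
static int reserve(int fd, off_t offset, off_t len)
{
    return fallocate(fd, 0, offset, len);
}
```

Calling make_sparse_memfd(), then touch_via_mmap() at offset 0, then
reserve() on a distant range should show backing_blocks() starting at zero
and growing after each step. With the seal from this patch set, the
touch_via_mmap() step is the one that would be refused, while reserve()
would keep working.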