On Fri 05-02-21 08:54:31, Christian König wrote:
> On 05.02.21 at 01:32, Hugh Dickins wrote:
> > On Thu, 4 Feb 2021, Michal Hocko wrote:
> > > On Thu 04-02-21 17:32:20, Christian Koenig wrote:
> > > > Hi Michal,
> > > >
> > > > as requested in the other mail thread, the following sample code
> > > > brings my test system down within seconds.
> > > >
> > > > The issue is that the memory allocated for the file descriptor is not
> > > > accounted to the process allocating it, so the OOM killer picks whatever
> > > > process it thinks is good but never my small test program.
> > > >
> > > > Since memfd_create() doesn't need any special permission this is a
> > > > rather nice denial of service and, as far as I can see, also works
> > > > with a standard Ubuntu 5.4.0-65-generic kernel.
> > > Thanks for following up. This is really nasty but now that I am looking
> > > at it more closely, this is not really different from tmpfs in general.
> > > You are free to create files and eat the memory without being accounted
> > > for that memory because that is not seen as your memory from the system
> > > POV. You would have to map that memory to be part of your RSS.
>
> I mostly agree. The big difference is that tmpfs is only available when
> mounted.
>
> And tmpfs can be restricted in size per mount point as well as per user
> quotas IIRC. Looking at my desktop system those restrictions are actually
> exactly what I see there.

I cannot find anything about per-user quotas for tmpfs in the tmpfs man
page. Or maybe I am looking at the wrong layer and there is generic
handling somewhere in the vfs core?

> But memfd_create() is just free for all, you don't have any size limit nor
> access restriction as far as I can see.

Yes, this is unfortunate and a design decision that should have been
considered when the syscall was introduced. But that boat sailed long
ago, and it cannot be changed now without risking userspace breakage.

> > > The only existing protection right now is to use the memory cgroup
> > > controller because the tmpfs memory is accounted to the process which
> > > faults the memory in (or writes to the file).
>
> Agreed, but having to rely on cgroup is not really satisfying when you have
> to maintain a hardened server.

Yes, I do recognize the pain. The only other way to mitigate the risk is
to deny the syscall to untrusted users in a hardened environment.
You should be very strict about tmpfs usage there already.
-- 
Michal Hocko
SUSE Labs