On 05.02.21 at 01:32, Hugh Dickins wrote:
On Thu, 4 Feb 2021, Michal Hocko wrote:
On Thu 04-02-21 17:32:20, Christian Koenig wrote:
Hi Michal,

as requested in the other mail thread, the following sample code brings my
test system down within seconds.

The issue is that the memory allocated for the file descriptor is not
accounted to the process allocating it, so the OOM killer picks whatever
process it thinks is good but never my small test program.

Since memfd_create() doesn't need any special permission, this is a rather
nice denial of service and, as far as I can see, also works with a standard
Ubuntu 5.4.0-65-generic kernel.
Thanks for following up. This is really nasty but now that I am looking
at it more closely, this is not really different from tmpfs in general.
You are free to create files and eat the memory without being accounted
for that memory because that is not seen as your memory from the system
POV. You would have to map that memory for it to be part of your rss.
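
The accounting point above can be observed with a small, harmless experiment. The following is only a sketch, not the sample code referred to in this thread: it writes a few MiB into a memfd and compares the process's VmRSS before and after. The helper names (vm_rss_kib, memfd_demo), the memfd name "accounting-demo" and the 8 MiB size are made up for illustration; it assumes Linux and Python 3.8 or newer, where os.memfd_create() is available.

```python
import os

def vm_rss_kib():
    """Return this process's VmRSS in KiB from /proc/self/status, or None."""
    try:
        with open("/proc/self/status") as f:
            for line in f:
                if line.startswith("VmRSS:"):
                    return int(line.split()[1])
    except OSError:
        pass
    return None

def memfd_demo(total_bytes, chunk=64 * 1024):
    """Write total_bytes into a memfd without mapping it.

    Returns (file_size, rss_before_kib, rss_after_kib). The RSS values are
    sampled while the fd is still open, so the shmem pages still exist.
    """
    rss_before = vm_rss_kib()
    fd = os.memfd_create("accounting-demo")  # hypothetical name, any string works
    try:
        buf = b"\xa5" * chunk
        written = 0
        while written < total_bytes:
            # os.write() may write fewer bytes than requested, so count them.
            written += os.write(fd, buf[: total_bytes - written])
        size = os.fstat(fd).st_size
        rss_after = vm_rss_kib()
    finally:
        os.close(fd)  # last reference gone, the kernel frees the pages
    return size, rss_before, rss_after

if __name__ == "__main__" and hasattr(os, "memfd_create"):
    size, before, after = memfd_demo(8 * 1024 * 1024)
    print(f"memfd size: {size} bytes, VmRSS before/after: {before}/{after} KiB")
```

On a typical system I would expect the file size to be the full 8 MiB while VmRSS moves by far less, since the pages are never mapped into the process. That is an expectation, not a measured result.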

I mostly agree. The big difference is that tmpfs is only available when mounted.

And tmpfs can be restricted in size per mount point as well as by per-user quotas, IIRC. Looking at my desktop system, those restrictions are exactly what I see there.

But memfd_create() is just a free-for-all: there is neither a size limit nor an access restriction, as far as I can see.
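
To check which size restriction a given tmpfs mount actually carries, something like the following sketch could be used. It relies only on os.statvfs(); the function name fs_limit_bytes and the default path /dev/shm are assumptions for illustration, and the path may not exist on every system.

```python
import os

def fs_limit_bytes(path="/dev/shm"):
    """Return (total_bytes, free_bytes) for the filesystem at path.

    Returns None if the path cannot be queried (missing, no permission).
    For a tmpfs mount, total_bytes reflects the size= mount option.
    """
    try:
        st = os.statvfs(path)
    except OSError:
        return None
    return st.f_blocks * st.f_frsize, st.f_bfree * st.f_frsize

if __name__ == "__main__":
    limits = fs_limit_bytes()
    if limits is None:
        print("/dev/shm is not available here")
    else:
        total, free = limits
        print(f"/dev/shm: {total} bytes total, {free} bytes free")
```

A memfd has no mount point to query this way, which is the difference being pointed out here.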

The only existing protection right now is to use the memory cgroup
controller because the tmpfs memory is accounted to the process which
faults the memory in (or writes to the file).

Agreed, but having to rely on cgroup is not really satisfying when you have to maintain a hardened server.
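
For reference, whether a process currently runs under a memcg limit can be checked roughly as follows. This is a sketch that assumes a unified cgroup v2 hierarchy mounted at /sys/fs/cgroup; on cgroup v1 or hybrid setups the paths differ. The function names are made up for illustration.

```python
import os

def cgroup_v2_path(proc_cgroup_text):
    """Extract the cgroup v2 path from the contents of /proc/<pid>/cgroup.

    The v2 entry has hierarchy ID 0 and an empty controller list, e.g.
    "0::/user.slice/session-1.scope". Returns None if there is no such entry.
    """
    for line in proc_cgroup_text.splitlines():
        parts = line.split(":", 2)
        if len(parts) == 3 and parts[0] == "0" and parts[1] == "":
            return parts[2]
    return None

def memory_max(root="/sys/fs/cgroup"):
    """Return the memory.max value of our own cgroup as a string, or None.

    The value is either a byte count or the literal "max" (no limit).
    The root cgroup has no memory.max file, in which case None is returned.
    """
    try:
        with open("/proc/self/cgroup") as f:
            path = cgroup_v2_path(f.read())
        if path is None:
            return None
        with open(os.path.join(root, path.lstrip("/"), "memory.max")) as f:
            return f.read().strip()
    except OSError:
        return None

if __name__ == "__main__":
    print("memory.max for this process:", memory_max())
```

A result of "max" or None means there is no memcg limit that would contain the memfd pages charged to this process.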

I am not sure there is a good way to handle this in general,
unfortunately. Shmem is just tricky (e.g. how do you deal with
leftovers after the fd is closed?). Maybe memfd_create can be more clever
and account memory to all owners of the fd but even that sounds far from
trivial from the accounting POV. It is true that tmpfs can at least
control who can write to it which is not the case for memfd but then we
hit the backward compatibility wall.
Yes, no solution satisfactory, and memcg best, but don't forget
echo 2 >/proc/sys/vm/overcommit_memory
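
To see which overcommit mode a machine is currently in before changing it, a small read-only sketch like this one could help. The mode descriptions are paraphrased from the kernel's overcommit documentation; the function names are hypothetical.

```python
OVERCOMMIT_MODES = {
    0: "heuristic overcommit (the default)",
    1: "always overcommit, never refuse",
    2: "strict accounting, commit limit = swap + overcommit_ratio% of RAM",
}

def describe_overcommit(raw):
    """Map the raw contents of vm.overcommit_memory to a description."""
    mode = int(raw.strip())
    return OVERCOMMIT_MODES.get(mode, f"unknown mode {mode}")

def current_overcommit(path="/proc/sys/vm/overcommit_memory"):
    """Return a description of the current mode, or None if unreadable."""
    try:
        with open(path) as f:
            return describe_overcommit(f.read())
    except (OSError, ValueError):
        return None

if __name__ == "__main__":
    print("vm.overcommit_memory:", current_overcommit())
```

Note that mode 2 affects every allocation on the system, so it is a system-wide policy choice rather than a targeted fix for memfd.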

Good point as well.

Regards,
Christian.


Hugh