Hannes Frederic Sowa <han...@stressinduktion.org> writes:

> On 18.05.2016 01:12, Eric W. Biederman wrote:
>> 
>> While reviewing the filesystems that set FS_USERNS_MOUNT I spotted the
>> bpf filesystem.  Looking at the code I saw a broken usage of mount_ns
>> with current->nsproxy->mnt_ns. As the code does not acquire a reference
>> to the mount namespace it can not possibly be correct to store the mount
>> namespace on the superblock as it does.
>> 
>> Replace mount_ns with mount_nodev so that each mount of the bpf
>> filesystem returns a distinct instance, and the code is not utterly
>> broken.
>> 
>> Fixes: b2197755b263 ("bpf: add support for persistent maps/progs")
>> Signed-off-by: "Eric W. Biederman" <ebied...@xmission.com>
>> ---
>> 
>> No one should care about this change, as userspace typically only mounts
>> things once and does not depend on things in one mount not showing up
>> in another.  Can someone who actually uses the bpf filesystem please
>> verify this.
>> 
>> This needs to be fixed as the existing code is broken beyond words that
>> I know how to express.
>
> The idea is to have the bpf filesystem as a singleton per mnt-namespace
> to prevent endless instances being created and kernel resources being
> hogged by pinning them to hard to discover bpf mounts.

There is no method in the kernel to support a singleton per mount
namespace.  Mount propagation ruins that idea, and in most recent
distros mount propagation is enabled by default (it is something you can
opt out of later but not opt into later).

In general, convention is a much better defense against endless
instances.

Having just fought a similar fight with devpts (because things went
horribly wrong), you are much better off telling people to be careful
how they use things than trying to stop them from using things wrong.
Especially if we are still at the "the idea is" stage rather than a
stage where changing this will actually break deployed implementations.

> Do you see any problem with adding appropriate reference counts?

Honestly my head hurts thinking about it.  Technically reference counts
would fix one aspect of it, but the whole situation really sucks.
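
To make the "one aspect" concrete, here is a minimal userspace sketch of
the lifetime problem only.  It is not kernel code: toy_ns, toy_sb and
dangling_after_owner_exit() are hypothetical names invented for
illustration, and the object is marked dead rather than freed so the
result can be inspected safely.  It says nothing about mount propagation,
which a reference count does not address.

```c
#include <assert.h>

/* Toy stand-in for a refcounted object such as a mount namespace.
 * All names here are hypothetical; this is not kernel code. */
struct toy_ns {
    int refcount;
    int alive;  /* cleared instead of freeing, so the demo can inspect it */
};

static void toy_ns_get(struct toy_ns *ns)
{
    assert(ns->alive);
    ns->refcount++;
}

static void toy_ns_put(struct toy_ns *ns)
{
    assert(ns->refcount > 0);
    if (--ns->refcount == 0)
        ns->alive = 0;  /* a real implementation would free the object here */
}

/* Toy stand-in for a superblock that remembers a namespace pointer. */
struct toy_sb {
    struct toy_ns *ns;
};

/* Returns 1 if the superblock is left pointing at a dead namespace
 * after the original owner drops its reference, 0 otherwise. */
int dangling_after_owner_exit(int take_reference)
{
    struct toy_ns ns = { .refcount = 1, .alive = 1 };  /* owner's reference */
    struct toy_sb sb = { .ns = &ns };                  /* pointer is stored */

    if (take_reference)
        toy_ns_get(sb.ns);  /* the superblock pins the namespace */

    toy_ns_put(&ns);        /* the owner goes away */

    return !sb.ns->alive;
}
```

Calling dangling_after_owner_exit(0) models storing the pointer without
taking a reference, and dangling_after_owner_exit(1) models taking one
first.  Even in the second case the object merely stays alive; whether
keying a filesystem instance on it makes sense is a separate question.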

Especially in a world of mount propagation, where these mounts propagate
between mount namespaces and where people choose to share or not based on
criteria other than the mount namespace, attempting a one-fs-per-mount-
namespace policy is just bizarre, bordering on completely broken.
Even if implemented correctly.

Filesystems do not know and should not care about the mount namespace
they are mounted in.  These are and should remain independent
concepts.  Your implementation and attempted semantics violate that
horribly, and I can't see a way to achieve what you were trying to
achieve.  The VFS just doesn't work that way.

Eric
