Re: Thinking outside the box on file systems

Kyle Moffett Wed, 15 Aug 2007 14:21:23 -0700

Al Viro added to the CC, since he's one of the experts on this stuff and will probably whack me with a LART for explaining it all wrong, or something. :-D


On Aug 15, 2007, at 16:38:36, Phillip Susi wrote:

Kyle Moffett wrote:
We've *always* had to do this; that's what "chmod -R" or "setfacl -R" are for :-D. The major problem is that the locking and lookup overhead gets really significant if you have to look at the entire directory tree in order to determine the permissions for one single object. I definitely agree that we need better GUIs for managing file permissions, but I don't see how you could modify the kernel in this case to do what you want.
I am well aware of that, I'm simply saying that sucks. Doing a recursive chmod or setfacl on a large directory tree is slow as all hell.


Doing it in the kernel won't make it any faster.

As for hard links, your access would depend on which name you use to access the file. The file itself may still have an acl that grants or denies access to people no matter what name they use, but if it allows inheritance, then which name you access it by will modify the effective acl that it gets.

You can't safely preserve POSIX semantics that way. For example, even without *ANY* ability to read /etc/shadow, I can easily "ln /etc/shadow /tmp/shadow", assuming they are on the same filesystem. If the /etc/shadow permissions depend on inherited ACLs to enforce access then that one little command just made your shadow file world-readable/writeable. Oops.
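The point that a hard link is just another name for the same object can be checked from user space. The following is a minimal sketch, not anything from the kernel: the helper name `hardlink_shares_metadata` and the scratch paths under /tmp are made up for illustration. It creates a file with mode 0600, hard-links it under a second name, and compares what stat() reports through each name.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: returns 1 if a file and a hard link to it report
 * the same device, inode and mode, 0 if they differ, -1 on setup error. */
int hardlink_shares_metadata(void)
{
    char dir[] = "/tmp/hardlink-demo-XXXXXX";   /* scratch dir, assumed writable */
    char orig[256], alias[256];
    struct stat a, b;
    int fd, n, result = -1;

    if (mkdtemp(dir) == NULL)
        return -1;
    n = snprintf(orig, sizeof orig, "%s/secret", dir);
    assert(n > 0 && (size_t)n < sizeof orig);
    n = snprintf(alias, sizeof alias, "%s/alias", dir);
    assert(n > 0 && (size_t)n < sizeof alias);

    fd = open(orig, O_CREAT | O_EXCL | O_WRONLY, 0600);
    if (fd >= 0) {
        close(fd);
        /* Second name for the same inode; no new permissions are created. */
        if (link(orig, alias) == 0) {
            if (stat(orig, &a) == 0 && stat(alias, &b) == 0)
                result = (a.st_dev == b.st_dev &&
                          a.st_ino == b.st_ino &&
                          a.st_mode == b.st_mode);
            unlink(alias);
        }
        unlink(orig);
    }
    rmdir(dir);
    return result;
}
```

On a filesystem that supports hard links the helper is expected to return 1: owner, mode and ACLs live with the inode, so neither name carries permissions of its own. A path-inherited scheme would have to give up exactly that property.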


Think about it this way:

Permissions depend on *what* something is, not *where* it is. Under Linux you can leave the digital equivalent of a $10,000 piece of jewelry lying around in /var/www and not have to worry about it being compromised as long as you set your permissions properly (not that I recommend it). Moving the piece of jewelry around your house does not change what it *is* (and by extension does not change the protection required on it), any more than "ln /etc/shadow /tmp/shadow" (or "mv") changes what *it* is. If your /house is really extraordinarily secure then you could leave the jewelry lying around as /house/gems.bin with permissions 0777, but if somebody had a back-door to /house (an open fd, a careless typo, etc) then you'd have the same issues.

As for performance implications, I hardly think it is worrisome. Each directory in the path has to be looked up anyhow so you already have their acls, so when you finally reach the file, you just have to take the union of the acls encountered on the way. Should only be a few cpu cycles.

Not necessarily. When I do "vim some-file-in-current-directory", for example, the kernel does *NOT* look up the path of my current directory. It does (in pseudocode):


if (starts_with_slash(filename)) {
        entry = task->root;     /* absolute path: start at the task's root */
} else {
        entry = task->cwd;      /* relative path: start at the task's cwd */
}
while (have_components_left(filename))
        entry = lookup_next_component(entry, filename);
return entry;

That's not even paying attention to functions like "fchdir" or their interactions with "chroot" and namespaces. I can probably have an open directory handle to a volume in a completely different namespace, a volume which isn't even *MOUNTED* in my current fs namespace. Using that file-handle I believe I can "fchdir", "openat", etc, in a completely different namespace. I can do the same thing with a chroot, except there I can even "escape":

  /* Switch into chroot.  Doesn't drop root privs */
  chdir("/some/dir/somewhere");
  chroot(".");

  /* Malicious code later on */
  chdir("/");
  chroot("another_dir");
  chdir("../../../../../../../../..");
  chroot(".");
  /* Now I'm back in the real root filesystem */
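The earlier point about open directory handles (lookups start from whatever directory reference the task holds, not from a path string) can also be shown without root privileges. This is a rough sketch under stated assumptions: `dirfd_survives_rename` and the scratch names under /tmp are hypothetical, chosen only for this example. It opens a directory, renames that directory, and then uses openat() on the old handle.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: returns 1 if a file can still be opened through a
 * directory handle after the directory was renamed, 0 if not, -1 if the
 * scratch setup failed. */
int dirfd_survives_rename(void)
{
    char base[] = "/tmp/dirfd-demo-XXXXXX";     /* scratch dir, assumed writable */
    char before[256], after[256];
    int dirfd = -1, fd, n, result = -1;

    if (mkdtemp(base) == NULL)
        return -1;
    n = snprintf(before, sizeof before, "%s/before", base);
    assert(n > 0 && (size_t)n < sizeof before);
    n = snprintf(after, sizeof after, "%s/after", base);
    assert(n > 0 && (size_t)n < sizeof after);

    if (mkdir(before, 0700) == 0)
        dirfd = open(before, O_RDONLY | O_DIRECTORY);
    if (dirfd >= 0) {
        /* Create a file inside the directory via the handle. */
        fd = openat(dirfd, "data", O_CREAT | O_WRONLY, 0600);
        if (fd >= 0) {
            close(fd);
            /* Move the directory; the open handle is not affected. */
            if (rename(before, after) == 0) {
                fd = openat(dirfd, "data", O_RDONLY);
                result = (fd >= 0);
                if (fd >= 0)
                    close(fd);
            }
            unlinkat(dirfd, "data", 0);
        }
        close(dirfd);
    }
    rmdir(before);      /* one of these two fails harmlessly, */
    rmdir(after);       /* depending on whether rename() ran  */
    rmdir(base);
    return result;
}
```

The second openat() never mentions "before" or "after"; the lookup starts at the handle and walks forward from there. That is why any scheme that derives permissions from the path has to either cache context in the handle or walk backwards from it.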

There's also the question of what to do about namespaces and bind mounts. If I run "mount --bind /foo /home/foo", then do I get different file permissions depending on what path I access the file by? What if I then run chroot("/foo"), do I get different file permissions then? What if I have two namespaces, each with their own root filesystem (say "root1" and "root2"), and I mount the other namespace's root filesystem in a subdir of each:
  NS1:  mount /dev/root2 /otherns
  NS2:  mount /dev/root1 /otherns
Now I have the following paths to the same file, do these get different permissions or the same?
  NS1:/foo == NS2:/otherns/foo
  NS2:/bar == NS1:/otherns/bar
If you answered that they get different permissions, then how do you handle the massive locking performance penalty due to the extra lookups? If you answered "same permissions", then how do you explain the obvious discrepancy to the admin?
Good question. I would say the bind mount should have a flag to specify. That way the admin can choose where it should inherit from. I also don't see where this locking penalty is.

The locking penalty is because the path-lookup is *not* implied. The above chroot example shows that in detail. If you have to do the lookup in *reverse* on every open operation then you have to either:

(A) Store lots of security context with every open directory (cwd included). When a directory you have open is moved, you still have full access to everything inside it since your handle's data hasn't changed.

OR

(B) Do a reverse-lookup on every operation, which means you are taking locks in the reverse order from the normal lookup order and you're either very inefficient and slow or you're deadlocky. This is also vulnerable to confusion when the "path" isn't visible from your namespace/chroot at all. This also causes problems with overmounted directories:

        [EMAIL PROTECTED]:/# mkdir -p /foo/bar
        [EMAIL PROTECTED]:/# cd /foo/bar
        [EMAIL PROTECTED]:/foo/bar# mount -t tmpfs tmpfs /foo
        [EMAIL PROTECTED]:/foo/bar# touch /foo/bar/baz
        touch: cannot touch `/foo/bar/baz': No such file or directory
        [EMAIL PROTECTED]:/foo/bar# pwd
        /foo/bar
        [EMAIL PROTECTED]:/foo/bar# ls -al
        total 4
        drwxr-xr-x 2 root root 4096 2007-08-15 17:04 .
        drwxrwxrwt 2 root root   40 2007-08-15 17:04 ..

The idea is nice, but as soon as you add multiple namespaces, chroots, or bind mounts the concept breaks down. For security-sensitive things like file permissions, you really do want determinate behavior regardless of the path used to access the data.
How does it break down? Chroots have absolutely no impact at all, and the bind mounts/namespaces can be handled as I mentioned above. If you really want to be sure of the effective permissions on the file, then you simply flag it to not inherit from its parent or use an inherited rights mask to block the specific inherited permissions you want.

See above. Linux has a very *very* powerful VFS nowadays. It also has support for shared subtrees across private namespaces, allowing things like polyinstantiation. For example you can configure a Linux box so every user gets their own empty /tmp directory created and mounted on their namespace's /tmp when they log in, but all the rest of the system mounts are all shared.


I'll be offline for a couple days, we can chat more when I get back.

Cheers,
Kyle Moffett

