Re: [PATCH][RFC] Simple tamper-proof device filesystem.
Hello.

Indan Zupancic wrote:
> > It seems to me that the alternatives you are proposing include modification of userland applications. But my assumption is that "Don't require modification of userland applications".
>
> If you want a secure system it isn't that unreasonable to expect applications to not do brain dead things, so not requiring any modifications or config changes seems a bit optimistic to me.

It depends. Some users have to continue using brain-dead legacy applications without modification because ...

  the application's source code is not available.
  the distributor no longer supports the application.
  the application is too difficult/complicated to reconstruct.

For cases where you can expect that applications won't do brain-dead things and/or the applications can be reconstructed, your approach is OK.

> > In other words, I want to implement without asking applications to use /dev/dynamic/ or something. This filesystem is intended to provide support for legacy applications. (In fact, this filesystem in TOMOYO Linux is for kernel 2.4.30/2.6.11 and later.)
>
> Legacy applications should cope with a static /dev/. What is the advantage of your filesystem compared to a static /dev/?

I assume a static /dev/ means a /dev/ directory in 2.4 kernels. This filesystem's advantages:

(1) Can guarantee filename/attribute pairs.

A process with root privilege can do

  mv /dev/hda1 /dev/hda1.tmp; mv /dev/hda2 /dev/hda1; mv /dev/hda1.tmp /dev/hda2

if /dev is in the / partition or is a devfs partition, whereas a process with root privilege cannot do the same if /dev is this filesystem, unless granted by the configuration file. So you can guarantee that /dev/hda1 is block-3-1 and /dev/hda2 is block-3-2 (e.g. "mount /dev/hda1 /home" won't mount the block-3-2 partition on /home).

(2) Can keep nodes that need not be deleted/modified read-only.
A process with root privilege can delete /dev/null on the / partition or on a devfs partition, whereas a process with root privilege cannot delete /dev/null on this filesystem unless granted by the configuration file. So you can guarantee that a node which need not be deleted/modified won't be deleted/modified (e.g. /dev/null is always there with the char-1-3 attribute).

(3) Can hide unwanted device nodes.

A process with root privilege can create new nodes on the / partition or on devfs, whereas a process with root privilege cannot create new nodes on this filesystem that are not specified in the configuration file. So you can expose specific nodes selectively (e.g. allow accessing /dev/hda1, but forbid accessing /dev/hda2).

> > Use of a tiny daemon that communicates with udev is not sufficient. The udev is not the only application that modifies /dev files.
>
> Oh, it isn't? Which other applications do modify /dev files? I'd like to hear about a few, no matter how obscure or proprietary. And please tell how many of those will stop working with a static /dev with all nodes they might create already existing.

I don't know. I'm not using rare software.

> > At least, the tiny daemon should communicate with the kernel so that all requests are checked by the tiny daemon.
>
> No, why should the kernel be involved? The tiny daemon would be the only one allowed to modify /dev/, so all mknod commands will be done by it. Of course it means that you might need to modify the two or three apps wanting to create device nodes, or you can make an LD_PRELOAD lib that intercepts mknod commands and sends them to the daemon.

No. The kernel must be involved.

Suppose the tiny daemon is the only one allowed to modify /dev/.

  "foo" requests "mknod /dev/null" from a chroot() environment.
  "bar" requests "mknod /dev/null" from a clone(CLONE_FS) + mount() environment.

How can the daemon know where to create the node? How can the daemon determine whether the requested pathname is in the /dev directory or not?
The process that requests mknod and the process that performs mknod are not always using the same / directory. The daemon must not forbid creation of /dev/null if the realpath() is /tmp/dev/null (i.e. "mknod /dev/null" after "chroot /tmp"), because the daemon is not asked to manage the /tmp/dev directory.

Who can guarantee that the daemon can access all namespaces? The process that requests mknod and the process that performs mknod are not always using the same namespace.

If "foo" or "bar" is a statically linked or suid-root application (where LD_PRELOAD is ignored), it would attempt to create device nodes directly (i.e. call sys_mknod() instead of communicating with the daemon) and abort due to failure. This applies not only to applications that want to create device nodes in /dev/, but also to all applications that want to modify entries in /dev/.

From the beginning, the kernel is deeply involved, because in-kernel MAC is essential to ensure that only the tiny daemon can modify /dev/. Why not do this filename/attribute checking in the kernel too?
Re: [PATCH][RFC] Simple tamper-proof device filesystem.
Hi,

On Fri, January 11, 2008 09:46, Tetsuo Handa wrote:
> It depends. Some users have to continue using brain dead legacy applications without modification because ...
> the application's source code is not available.

Source isn't needed, as long as the vendor has it.

> the distributor no longer supports the application.

Then why should anyone else support it?

> the application is too difficult/complicated to reconstruct.

Then you can't trust it and it shouldn't have permission to do potentially dangerous things in /dev/ either. Even if you can contain the device node creation, it most likely does other potentially dangerous things too. As a whole it can't be trusted.

> I assume a static /dev/ means a /dev/ directory in 2.4 kernels. This filesystem's advantage:

I'm not talking about devfs, I'm talking about a real static /dev. I'm using it now and it works fine (I let udev manage /udev/ to see what it's doing).

> (1) Can guarantee filename/attribute pairs.

Wrong. All nodes are created and thus there's never a need to create new nodes. So /dev/ can't be modified by anyone. This works because all nodes that anyone might want to create already exist.

> (2) Can keep nodes that needn't to be deleted/modified for read-only.

This would also be true for all nodes in a static /dev, I think.

> (3) Can hide unwanted device nodes.

In a static /dev you only create the nodes you want. It's true that it can't hide nodes for hardware that doesn't exist (other than deleting the nodes manually), but that was the norm for years before the whole dynamic /dev thing caught on.

> I don't know. I'm not using rare software.

It doesn't have to be rare, anything is fine. You don't know anything other than udev? (And shell commands like mknod etc.) Then why all the talk about mysterious apps that might need to do all kinds of crazy things in /dev?

> No. The kernel must be involved.

> Who can guarantee that the daemon can access all namespaces? The process who requests mknod and the process who performs mknod are not always using the same namespace.

This is true on a theoretical level. But practically I think you can either run multiple daemons, one for each namespace where you want to control /dev/, or if you really want one daemon you can pass the directory fd to it where the node should be created and use mknodat(). I believe that crosses namespaces correctly. If the daemon can't be contacted or doesn't want to do a mknod for you, the preloaded lib can fall back to doing the mknod itself, though normally that would be disallowed by MAC.

But I think that the chance that any process needs to create device nodes in a chroot is at the level of fairy existence.

> If foo or bar is a statically linked or suid-root application (where LD_PRELOAD is ignored), they would attempt to create device nodes directly (i.e. call sys_mknod() instead of communicating with the daemon) and abort due to failure. Not only applications who wants to create device nodes in /dev/, but also all applications who wants to modify entries in /dev/.

If the preloaded library is setuid, it will also work for setuid programs. It's true that it won't work for statically linked apps, but so what? Device node creating apps are rare enough, let alone the ones that are also statically linked. Nice theoretical problem, but I don't think anyone will care in practice.

> From the beginning, the kernel is deeply involved because in-kernel MAC is essential to realize only the tiny daemon can modify /dev/. Why not do this filename/attribute checking in the kernel too?

That only the tiny daemon can modify /dev/ is done with MAC rules, the ones that should already be the default for all applications except udev. For the kernel nothing changes.

> > The amount of code will be the current parsing code + a few hundred lines of code, including the preloaded library.
>
> You will be bothered with "what is the realpath of /dev/null?" and "how can I reach the realpath?" because you have to manage namespace information.

Or ignore the problem and see if it's a real problem or a nice theoretical case. And when it turns out to be a real problem, there are probably ways to fix it (see above). But you know exactly what is needed only after problems do turn up.

> OK. I'll consider adding this feature. But I'd like to use approach (B) to keep the advantage (3).
>
> (A) White-listing + Black-listing approach. Permit any operations if the filename didn't appear in the configuration file.
>
> (B) White-listing + Wild-card approach. Support wildcard and permit only operations if the filename-with-wildcard/attributes-with-wildcard appeared in the configuration file.

With this the filesystem at least adds some unique abilities. Whether anyone really needs it, and where/how it should be implemented, is another matter. Without it, it's a glorified and complicated drop-in replacement for a static /dev/.

Regards,

Indan
Re: [PATCH][RFC] Simple tamper-proof device filesystem.
On Fri, Jan 11, 2008 at 11:05:07PM +0900, Tetsuo Handa wrote:
> It is not only mknod() but also rename()/unlink()/link()/mount(bind) etc. that may cause filename/attribute mismatches. How can the daemon know whether a request is trying to manipulate nodes in the /dev directory or not?
>
> If mount --bind /dev/ /var/dir/ is used, the daemon must check the filename/attribute pair when mknod(/var/dir/null) is requested, because permitting the request will modify the /dev state. If mount --bind /dev/ /var/dir/ is not used, the daemon must not check the filename/attribute pair when mknod(/var/dir/null) is requested, because permitting the request will not modify the /dev state.
>
> What does the daemon do? Does it receive requests from the LD_PRELOAD library over a UNIX domain socket, check the filename/attribute pair, and issue mknodat()/renameat()/unlinkat()/linkat() etc. when the combination is appropriate?
>
> What does the LD_PRELOAD library do? Does it intercept all pathname-related syscalls (except open()), resolve the directory components, determine whether the request is trying to manipulate nodes in the /dev directory, and forward the request to the daemon over a UNIX domain socket?
>
> "Make the daemon and the LD_PRELOAD library bug-and-race free and develop the MAC policy for the daemon and the LD_PRELOAD library" and "Make this filesystem bug-and-race free". Which one is easier?

I think a good question is: What kind of idiot wrote a program that thinks it is allowed to go messing with the contents of /dev? There simply can't be a good reason for an application to do that. Device nodes should match up with devices, so as long as the device nodes exist for all your devices, everything should just work and no one should ever have a reason to go changing things.

Perhaps the real solution is a preload library that blocks the idiotic program from touching anything in /dev with anything other than open/close/read/write.

Of course it could also help to simply tell people what this stupid program is actually doing and why it should be allowed to mess in places it doesn't belong.

--
Len Sorensen

-
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH][RFC] Simple tamper-proof device filesystem.
Hello.

Indan Zupancic wrote:
> That only the tiny daemon can modify /dev/ is done with MAC rules, the ones that should be the default for all applications except udev by default already. For the kernel nothing changes.

OK. You assume the use of a MAC with sufficiently fine-grained access control.

> Wrong. All nodes are created and thus there's never a need to create new nodes. So /dev/ can't be modified by anyone. This works because all nodes that anyone might want to create already exist.

"Already exist" is not enough. These nodes have to be deletable if requested by an appropriate process. These nodes have to be protected by MAC from direct calls to mknod()/rename()/unlink()/link()/mount() etc.

> This is true on a theoretical level. But practically I think you can either run multiple daemons, one for each namespace where you want to control /dev/,

What if the daemon does not exist in that namespace?

> or if you really want one daemon you can pass the directory fd to it where the node should be created and use mknodat(). I believe that crosses namespaces correctly.

The fd passed to mknodat() only makes resolution start from the specified directory instead of the current directory. The object obtained by resolving the rest of the pathname depends on the / of the calling process. If /var/jail/dev/dyndev/link is a symlink to /dev, then resolving mknodat(fd_for_/var/jail/, dev/dyndev/link/node) yields /var/jail/dev/node for a process in chroot(/var/jail/) + chdir(/), and /dev/node for a process that is not. If the requesting process is in the chroot() but the daemon is not, the daemon will create nodes in the wrong location.

So you let the LD_PRELOAD library resolve all directory components before passing the fd to the daemon over a UNIX domain socket, so that the daemon won't create nodes in the wrong location. OK. That looks workable, although I'm not taking race conditions into account.
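To make the chroot example above concrete, here is a minimal sketch that models pathname resolution in pure Python. It is only an illustration under stated assumptions: the symlink table, the `resolve` function and its parameters are hypothetical, all paths are host-side absolute paths, and the model ignores mount points and races entirely. It shows that the same relative path, walked from the same starting directory, ends up in a different place depending on the resolver's root.

```python
import posixpath

# Hypothetical symlink table, keyed by the host-side path of the link.
SYMLINKS = {"/var/jail/dev/dyndev/link": "/dev"}


def resolve(start_dir, relpath, root, symlinks=SYMLINKS, max_links=40):
    """Walk relpath from start_dir as a process whose root is `root` would.

    An absolute symlink target restarts the walk at the process's own
    root, which is why the result differs inside and outside a chroot.
    """
    current = start_dir
    pending = [c for c in relpath.split("/") if c]
    links_followed = 0
    while pending:
        comp = pending.pop(0)
        if comp == ".":
            continue
        if comp == "..":
            # Never climb above the process's root.
            if current != root:
                current = posixpath.dirname(current)
            continue
        candidate = posixpath.join(current, comp)
        target = symlinks.get(candidate)
        if target is None:
            current = candidate
            continue
        links_followed += 1
        if links_followed > max_links:
            raise OSError("too many levels of symbolic links")
        if target.startswith("/"):
            current = root
        pending = [c for c in target.split("/") if c] + pending
    return current


# Requesting process, chrooted into /var/jail:
print(resolve("/var/jail", "dev/dyndev/link/node", root="/var/jail"))
# Daemon outside the chroot, holding an fd for the same directory:
print(resolve("/var/jail", "dev/dyndev/link/node", root="/"))
```

The first call prints /var/jail/dev/node and the second prints /dev/node, matching the two outcomes described above.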
> But I think that the chance that any process needs to create device nodes in a chroot is at the level of fairy existence.

It is not only mknod() but also rename()/unlink()/link()/mount(bind) etc. that may cause filename/attribute mismatches. How can the daemon know whether a request is trying to manipulate nodes in the /dev directory or not?

If mount --bind /dev/ /var/dir/ is used, the daemon must check the filename/attribute pair when mknod(/var/dir/null) is requested, because permitting the request will modify the /dev state. If mount --bind /dev/ /var/dir/ is not used, the daemon must not check the filename/attribute pair when mknod(/var/dir/null) is requested, because permitting the request will not modify the /dev state.

What does the daemon do? Does it receive requests from the LD_PRELOAD library over a UNIX domain socket, check the filename/attribute pair, and issue mknodat()/renameat()/unlinkat()/linkat() etc. when the combination is appropriate?

What does the LD_PRELOAD library do? Does it intercept all pathname-related syscalls (except open()), resolve the directory components, determine whether the request is trying to manipulate nodes in the /dev directory, and forward the request to the daemon over a UNIX domain socket?

"Make the daemon and the LD_PRELOAD library bug-and-race free and develop the MAC policy for the daemon and the LD_PRELOAD library" and "Make this filesystem bug-and-race free". Which one is easier?

Regards.
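Whichever side ends up doing it (a daemon or the filesystem), the filename/attribute check itself is small. The following is a minimal sketch, not the patch's actual code or configuration format: `NodeSpec`, `POLICY`, `may_mknod` and `may_rename` are hypothetical names, and the only facts taken from the thread are the device numbers (hda1 is block 3:1, hda2 is block 3:2, null is char 1:3).

```python
from typing import NamedTuple


class NodeSpec(NamedTuple):
    kind: str   # "b" for block device, "c" for character device
    major: int
    minor: int


# Hypothetical whitelist: a name may only ever carry the listed attributes.
POLICY = {
    "hda1": NodeSpec("b", 3, 1),
    "hda2": NodeSpec("b", 3, 2),
    "null": NodeSpec("c", 1, 3),
}


def may_mknod(name, spec, policy=POLICY):
    """Allow creation only if the name is listed with exactly these attributes."""
    return policy.get(name) == spec


def may_rename(old_name, new_name, nodes, policy=POLICY):
    """Allow a rename only if the node's attributes are valid under the new name.

    `nodes` maps existing names to the NodeSpec they currently carry.
    """
    spec = nodes.get(old_name)
    return spec is not None and policy.get(new_name) == spec


existing = {"hda1": POLICY["hda1"], "hda2": POLICY["hda2"]}
print(may_mknod("null", NodeSpec("c", 1, 3)))      # True
print(may_mknod("null", NodeSpec("c", 1, 5)))      # False
print(may_rename("hda1", "hda1.tmp", existing))    # False
print(may_rename("hda2", "hda1", existing))        # False
```

Under this sketch the three-step swap of /dev/hda1 and /dev/hda2 fails at its first rename, because hda1.tmp is not a listed name. The hard part argued over in this thread is not this check but deciding reliably which requests it applies to.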
Re: [PATCH][RFC] Simple tamper-proof device filesystem.
On Thu, January 10, 2008 05:57, Tetsuo Handa wrote:
> It seems to me that the alternatives you are proposing include modification of userland applications. But my assumption is that "Don't require modification of userland applications".

If you want a secure system it isn't that unreasonable to expect applications to not do brain dead things, so not requiring any modifications or config changes seems a bit optimistic to me.

> In other words, I want to implement without asking applications to use /dev/dynamic/ or something. This filesystem is intended to provide support for legacy applications. (In fact, this filesystem in TOMOYO Linux is for kernel 2.4.30/2.6.11 and later.)

Legacy applications should cope with a static /dev/. What is the advantage of your filesystem compared to a static /dev/?

> Use of a tiny daemon that communicates with udev is not sufficient. The udev is not the only application that modifies /dev files.

Oh, it isn't? Which other applications do modify /dev files? I'd like to hear about a few, no matter how obscure or proprietary. And please tell how many of those will stop working with a static /dev with all nodes they might create already existing.

> At least, the tiny daemon should communicate with the kernel so that all requests are checked by the tiny daemon.

No, why should the kernel be involved? The tiny daemon would be the only one allowed to modify /dev/, so all mknod commands will be done by it. Of course it means that you might need to modify the two or three apps wanting to create device nodes, or you can make an LD_PRELOAD lib that intercepts mknod commands and sends them to the daemon.

The amount of code will be the current parsing code + a few hundred lines of code, including the preloaded library.

> But use of the tiny daemon (which is a process running in userland) causes a lot of troubles.

No, it doesn't, and most of those problems are true for all programs that access /dev!
If those are straced or whatever they can be forced to open the wrong file, practically breaking the filename/attribute pairs. So all the security you think you need for the daemon process is the same security you already need for all processes anyway, to protect them against each other.

> > If an administrator wants something else than 3 or 5, you're breaking something.
>
> That's the fate of white-list based access control. Does this filesystem sound too strict to support dynamic device? May be this filesystem should be able to permit creation of device nodes that are not listed in the policy file.

Actually, I assumed that was the case, because if it's strictly white-list based it's almost the same as a static /dev with some nodes hidden. Without that it has even less value, because it just complicates matters compared to a normal static dev. I thought it checked that if a device name was in the list, it had the correct attributes, and that it was free to create nodes with unrestricted names.

> From your next posting:
> > But I think doing more is getting ridiculous, because if a process can create a device node, it can also access it and do whatever harm could be done by the confusion caused by unexpected name/attribute pairs.
>
> FYI. Being able to create a device node is different from being able to access it and do whatever harm. You will need read and/or write permission to open that device.

Yes, but as the process creates the device it can also choose the file mode and probably also the ownership. And as it creates a new file there likely aren't strict MAC rules in place restricting the process from reading or writing to it. So yes, you're right, but in practice it isn't that easy to close that hole, especially not if the application isn't very clean and single purpose. If it creates the node it probably wants to use it too, and that means read/write access. Even if it can live without it, it could give access to the node to another process and let the other process do the dirty work.
Very tricky.

Greetings,

Indan
Re: [PATCH][RFC] Simple tamper-proof device filesystem.
Hello,

On Wed, January 9, 2008 05:39, Tetsuo Handa wrote:
> Hello.
>
> Indan Zupancic wrote:
> > I think you focus too much on your way of enforcing filename/attributes pairs.
>
> So?

So that you miss alternatives and don't see the bigger picture.

> > The same can be achieved by creating the device nodes with expected attributes, and preventing processes from changing those files.
>
> The device nodes have to be deletable if some process (including udev) needs to delete. Thus, you cannot unconditionally prevent processes from changing those files.
>
> > This is because expected combinations are known beforehand.
>
> Yes.
>
> > And once those files are present, the MAC system used doesn't have to have special device nodes attributes support. Protecting those files is enough to guarantee filename/attributes pairs.
>
> If MAC system needn't to support this filesystem's functionality, who creates those files with guarantee of expected attributes? The udev does? If udev is exploited, who can guarantee?

The person that would write the config file for your fs, the one who wants that guarantee.

> > No, this is because rename permission was given for files that it shouldn't have had.
>
> Do you think all MAC implementation have the same granularity and functionalities? I don't think so. Not all MAC implementation can control with such granularity. This filesystem is designed to be combined with any MAC, although the MAC used with this filesystem should be able to restrict namespace manipulation requests so that this filesystem can remain /dev and visible to userland applications.

Good point, but I assume they all have at least directory granularity, and then /dev/ can be static and udev and others can have free rein in e.g. /dev/dynamic/. Just use subdirs for the dynamic stuff and this granularity problem is, with slight inconvenience, solved.
> > Either you want a process to manage device names and attributes, and then you give it permission to do that, or you want to enforce certain filename/attribute pairs and then you just do it yourself.
>
> If I modify udev to enforce certain filename/attribute pairs and the modified udev was exploited, who can guarantee? "Don't trust userland application" is the basis of restricting access in kernel space. If you can trust userland application, you don't need in-kernel access control.

Funny, I thought that it was in the kernel because that's the way to protect processes against each other, the fs against processes, and for performance reasons. Exploits are in code, and where that code is doesn't matter that much, either kernel or userspace, though if it's exploitable you'd rather not have it in the kernel. So I think it's more secure if the checking is done by udev than in a special filesystem, even if that means that you're screwed if udev is exploited. Of course you fully trust your own code, naturally.

A tiny daemon that communicates with udev, does the checking you have now, and creates the node if it is OK is really not much more code than your fs, so it is as hard to exploit too. Then if udev is hacked you have the same guarantee as you have now.

I can think of more alternatives that are as secure or more secure than the current solution.

> > Will your filesystem prevent the trivial case of
> >
> >   rm /dev/hda1
> >   ln -s /dev/hda2 /dev/hda1
>
> Of course. To permit the above operation, the following permissions are needed.
>
>   hda1 660 0 6 2 b 3 1
>   hda1 777 0 0 33 l .

Yes, I should've read the code before asking that, instead of the other way round.

> > Rename permission can be given for /dev in general, but prohibited for certain files in /dev, the ones you want to have specific attributes. It isn't all or nothing.
>
> Do you think all MAC implementation can prohibit renaming for certain files in /dev ?
> > > It's "forbid modifying certain nodes that process needn't to modify" versus "forbid breaking filename/attribute pairs of certain nodes".
> >
> > Both have the same effect, except that the first one is generic and can be done by existing MAC systems, while the second one needs a special filesystem and a handful of MAC rules to make it effective.
>
> Do you think all MAC implementation can do? I think the first one is implementation specific and the second one is generic.

Protecting certain files from being modified seems to me more generic than enforcing filename/attributes pairs on device nodes. And if they can't do it, surely they can do it per directory, and then using subdirs solves it.

> > It doesn't matter where they are, it's that a different fs than yours could be mounted over it. You say a MAC can prevent that from happening, but a MAC can also prevent all processes except for udev from modifying /dev.
>
> But MAC cannot prevent udev from modifying /dev . And what if exploited? Not all MAC can enforce access control over all processes with the granularity you are talking. And what if a process that cannot be
Re: [PATCH][RFC] Simple tamper-proof device filesystem.
Quoting Indan Zupancic ([EMAIL PROTECTED]):
> Hello,
>
> On Wed, January 9, 2008 05:39, Tetsuo Handa wrote:
> > Hello.
> >
> > Indan Zupancic wrote:
> > > I think you focus too much on your way of enforcing filename/attributes pairs.
> >
> > So?
>
> So that you miss alternatives and don't see the bigger picture.

These emails again are getting really long, but I think the gist of Indan's suggestion can be concisely summarized:

To confine process P3 to /dev/hda2 being 'b 3 2', create /dev/p3, launch P3 in a new mounts namespace, mount --bind /dev/p3 /dev, exec what you want P3 running, and have MAC prevent umount /dev/p3.

This is a neat idea, but Tetsuo's rebuttal is that P3 may be legacy code needing to create or delete /dev/floppy, where -EPERM confuses P3 and prevents it from working correctly.

Indan's idea is interesting and I like it, but is there an answer to Tetsuo's problem with it?

thanks,
-serge

PS - Indan, you also said in essence "if P3 can be trusted to create /dev/floppy why can't it be trusted to create /dev/hda1". I trust that, phrased that way, the question answers itself?
Re: [PATCH][RFC] Simple tamper-proof device filesystem.
On Thu, January 10, 2008 00:08, Serge E. Hallyn wrote:
> These emails again are getting really long, but I think the gist of Indan's suggestion can be concisely summarized:

No worry, I wasn't planning on extending it, I've said what I have to say. Except...

> To confine process P3 to /dev/hda2 being 'b 3 2', create /dev/p3, launch P3 in a new mounts namespace, mount --bind /dev/p3 /dev, exec what you want P3 running, and have MAC prevent umount /dev/p3.
>
> This is a neat idea, but Tetsuo's rebuttal is that P3 may be legacy code needing to create or delete /dev/floppy, where -EPERM confuses P3 and prevents it from working correctly.
>
> Indan's idea is interesting and I like it, but is there an answer to Tetsuo's problem with it?

...that I didn't mean that, but something simpler:

  /dev/ directory protected from any modifications by MAC,
  /dev/* all the nodes that need to have guaranteed name/attribute pairs, like /dev/null, /dev/zero, /dev/random, etc.

and:

  /dev/dynamic/ being a dir where apps that really need to create/modify device nodes can do whatever they want to do. It can be multiple dirs too, like /dev/snd/, /dev/input/ etc.

I guess this covers about 96% of the use cases of this tamper-proof dev fs. You can think of unlikely cases that aren't solved by this, but those can be solved in another way if really wanted (like a checking daemon, a modified udev, or a shadow /dev/, to name a few).

But I think doing more is getting ridiculous, because if a process can create a device node, it can also access it and do whatever harm could be done by the confusion caused by unexpected name/attribute pairs.

As for information snooping, that's mostly about /dev/null or other things that are known beforehand.

> PS - Indan, you also said in essence "if P3 can be trusted to create /dev/floppy why can't it be trusted to create /dev/hda1". I trust that, phrased that way, the question answers itself?

Not exactly.
If there's a process that dynamically creates certain device nodes, and it wants to create one that doesn't fit the rules, you can't know if it's wrong or if your rules are wrong. The process has a certain policy of naming/creating the devices, but you also have a policy on the kernel side with this fs. If they mismatch you don't know which one is right.

If you trust a process to create /dev/hd*, you can also trust it to create the proper /dev/hdXn; there is no need to verify that /dev/hda1 is really 3 1.

The whole thing about filename/attribute pairs is that it's about what applications expect. There aren't many expectations about dynamically created device nodes which might not always be there, because their names aren't stable.

The use case for this fs is a malicious app that can create device nodes, and we're worried about mismatching name/attribute pairs. Not about our data, or anything else. Call me an optimist, but I think you don't need to worry about name/attribute pairs.

Greetings,

Indan
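The layout proposed in this message (a protected /dev/ plus a few freely writable subdirectories) amounts to a path-prefix rule. A minimal sketch of such a rule follows; the `WRITABLE_SUBDIRS` tuple and `may_modify` are hypothetical names, and a real MAC policy would express this in its own rule language rather than in Python. Note that the check is purely lexical: it normalizes ".." components but knows nothing about symlinks, bind mounts or namespaces, which is exactly the gap debated elsewhere in this thread.

```python
import posixpath

# Hypothetical list of subdirectories where node creation/modification is free.
WRITABLE_SUBDIRS = ("/dev/dynamic", "/dev/snd", "/dev/input")


def may_modify(path, writable=WRITABLE_SUBDIRS):
    """Return True only for entries strictly inside a writable subdirectory.

    Everything else under /dev/, including the writable directories
    themselves, counts as protected.
    """
    norm = posixpath.normpath(path)
    return any(norm.startswith(d + "/") for d in writable)


print(may_modify("/dev/dynamic/foo"))       # True
print(may_modify("/dev/null"))              # False
print(may_modify("/dev/dynamic/../null"))   # False
```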
Re: [PATCH][RFC] Simple tamper-proof device filesystem.
Hello.

Indan Zupancic wrote:
> Good point, but I assume they all have at least a directory granularity, and then /dev/ can be static and udev and other can have free rein in e.g. /dev/dynamic/. Just use subdirs for the dynamic stuff and this granularity problem is, with slight inconvenience, solved.

It seems to me that the alternatives you are proposing include modification of userland applications. But my assumption is "Don't require modification of userland applications". In other words, I want to implement this without asking applications to use /dev/dynamic/ or something. This filesystem is intended to provide support for legacy applications. (In fact, this filesystem in TOMOYO Linux is for kernel 2.4.30/2.6.11 and later.)

> Exploits are in code, and where that code is doesn't matter that much, either kernel or userspace, though if it's exploitable you'll rather not have it in the kernel. So I think it's more secure if the checking would be done by udev than in a special filesystem, even if that means that you're screwed if udev is exploited. Of course you fully trust your own code, naturally.

I'm keeping the mechanism as simple as possible so that there is little room (e.g. a buffer overflow) for running exploits.

> A tiny daemon that communicates with udev and does the checking you have now, and if ok it creates the node is really not much more code than your fs, so as hard to exploit too. Then if udev is hacked you have the same guarantee as you have now.

Use of a tiny daemon that communicates with udev is not sufficient. The udev is not the only application that modifies /dev files. At least, the tiny daemon should communicate with the kernel so that all requests are checked by the tiny daemon. But use of the tiny daemon (which is a process running in userland) causes a lot of trouble. See the block after the -- boundary -- of this posting. My assumption is "Don't require userland process's assistance", as written at "Why not use FUSE?".
> Protecting certain files from being modified seems to me more generic than enforcing filename/attributes pairs on device nodes.

OK. You are saying that from the point of view of "what it can". I thought you were saying that enforcing filename/attributes pairs from outside this filesystem (e.g. by MAC) is more flexible than doing it in this filesystem.

> > rm -f /dev/either-null-or-zero
>
> as said before, if this is possible then the MAC config used is wrong. Exactly the same as for your filesystem with
>
>   mknod /dev/tmp1 c 1 X
>   mount --bind /dev/tmp1 /dev/either-null-or-zero
>
> and you count on the MAC to prevent that.

An administrator asks MAC to prevent processes (except specific processes that need to do "rm -f /dev/either-null-or-zero") from doing "rm -f /dev/either-null-or-zero". An administrator asks this filesystem to prevent processes from doing "mknod /dev/tmp1 c 1 X". An administrator asks MAC to prevent processes from doing "mount --bind /dev/tmp1 /dev/either-null-or-zero".

> And as for that app, if you trust it to create device nodes, why don't you trust it to make the right nodes too?

If that app has a bug that triggers "mknod /dev/either-null-or-zero 1$REPLY" instead of "mknod /dev/either-null-or-zero $REPLY" under an unexpected circumstance, it will create unwanted nodes. Thus I don't trust the app.

> If an administrator wants something else than 3 or 5, you're breaking something.

That's the fate of white-list based access control. Does this filesystem sound too strict to support dynamic devices? Maybe this filesystem should be able to permit creation of device nodes that are not listed in the policy file.

> > Can SELinux guarantee the same result as my filesystem even if udev or administrative programs have to be able to modify /dev ?
>
> More, because your filesystem doesn't guarantee anything at all on its own. But assuming the MAC is decent enough to protect your fs from being bypassed, I'm sure it can do what's needed fine without your fs. I can't answer for SELinux because I don't know it well.
But I trust it can protect files and/or directories, and that's all that's needed to achieve the same end result. I don't know SELinux well, but as far as seeing an example (found by Googling selinux allow mknod) allow udev_t self:capability { chown dac_override dac_read_search fowner fsetid sys_admin sys_nice mknod net_raw net_admin sys_rawio }; I can't find a place to specify filename/attributes pairs in this syntax. So, if the process who is permitted to create device nodes misbehaves, it will generate unexpected filename/attribute pairs. I think SELinux can't guarantee the same result as my filesystem. You seem to assume that the in-kernel implementation is suddenly guaranteed bugfree. I keep the implementation as simple as possible. From your next posting: But I think doing more is getting ridiculous, because if a process can create a device node, it can also access it and do whatever harm could
Re: [PATCH][RFC] Simple tamper-proof device filesystem.
Hello. Indan Zupancic wrote: I want to use this filesystem in case where a process with root privilege was hijacked but the behavior of the hijacked process is still restricted by MAC. 1) If the behaviour can be controlled, why can't the process be disallowed to change anything badly in /dev? Like disallowing anything from modifying existing nodes that weren't created by that process. That would have practically the same effect as your filesystem, won't it? MAC system can prevent hijacked processes from changing anything badly in /dev . But MAC system can't prevent hijacked processes from doing mv /dev/hda1 /dev/hda1.tmp; mv /dev/hda2 /dev/hda1; mv /dev/hda1.tmp /dev/hda2 if permissions to rename device nodes in /dev are given to hijacked processes. This is because MAC implementation doesn't check filename/attribute pairs. But this filesystem can prevent hijacked processes from doing mv /dev/hda1 /dev/hda1.tmp; mv /dev/hda2 /dev/hda1; mv /dev/hda1.tmp /dev/hda2 even if permissions to rename device nodes in /dev are given to hijacked processes. This filesystem is not designed to forbid modifying nodes if that process needn't to modify nodes. This filesystem is designed to forbid breaking filename/attribute pairs of nodes even if that process need to (or permitted to) modify nodes. Or phrased differently, if the MAC system used can't protect /dev, it won't be able to protect other directories either, and if it can't protect e.g. my homedir, doesn't it make the whole MAC system ineffective? And if the MAC system used is ineffective, your filesystem is useless and you've bigger problems to fix. You can use nodev mount option to prevent attackers from opening device files. You can use MAC system to prevent attackers from mounting partitions (other than /dev partition) without nodev option. 2) The MAC system may not be able to guarantee certain combinations of device names and properties, but isn't that policy that shouldn't be in the kernel anyway? 
But if it is, shouldn't all device nodes be checked? That is, shouldn't it be a global check instead of a filesystem specific one? I think the reason why MAC system doesn't handle filename/attributes pairs is that: Filename and its attributes pairs are conventionally considered as constant and reliable. It makes the MAC's policy syntax complicated to describe this attribute enforcement information in MAC's policy. Thus, this should be a global check. But usually device nodes are only in /dev . 3) Code efficiency. Thousand lines of code just to close one very specific attack, which can be done in lots of different other ways that all need to be prevented by the MAC system. (mounting over it, intercepting open calls, duping the fd, etc.) Is it worth it? This filesystem is doing what MAC system is not doing. So, please don't complain about inability of this filesystem to close all attacks. You can use MAC system to prevent attackers from mounting other filesystem over this filesystem. The filename/attribute pairs are something like system call entry tables. The application will go wrong if __NR_read is mapped to sys_write() and __NR_write is mapped to sys_read(). Userland applications access special functionalities (e.g. /dev/zero and /dev/random) by name (i.e. syscall numbers). Therefore, keeping the filename/attribute pairs tamper-proof is important. You recognize that there is a threat that device nodes may have irregular attribute (e.g. /dev/null existing as a regular file), do you? You don't deny implementing mechanisms somehow to avoid such threat, do you? OK. Then the matter is the comparison of code efficiency. This patch is less than 1100 lines in total. Large part of this patch is for parsing and managing policy file. If you try to extend every MAC implementation (SELinux, SMACK, AppArmor, TOMOYO) so that they can handle filename/attributes pairs (i.e. 
expand policy file's syntax and both in-kernel and userland data structures, manage strings with variant length and non-printable characters etc.), I think that modification exceeds this patch. I think guaranteeing filename/attribute pairs in filesystem layer can keep MAC system implementation simple and compact. http://www.mail-archive.com/linux-fsdevel@vger.kernel.org/msg10653.html Thank you. - To unsubscribe from this list: send the line unsubscribe linux-fsdevel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
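The filename/attribute guarantee argued for above can be modelled in a few lines. This is only a toy userland sketch of the idea, not code from the patch: the `rules` table, `pair_is_allowed()` and `rename_is_allowed()` are hypothetical names, and how the patch itself hooks rename is not shown in this posting. A node may carry a name only if the policy lists that name with the same type and device numbers, so every step of the `mv` swap is refused.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Attributes that identify a device node (hypothetical layout). */
struct node_attr {
	char type;              /* 'b' = block, 'c' = character */
	unsigned int dev_major;
	unsigned int dev_minor;
};

/* One permitted filename/attribute pair. */
struct pair_rule {
	const char *name;
	struct node_attr attr;
};

/* Example whitelist, using the pairs named in this thread. */
static const struct pair_rule rules[] = {
	{ "hda1", { 'b', 3, 1 } },
	{ "hda2", { 'b', 3, 2 } },
	{ "null", { 'c', 1, 3 } },
};

/* Return 1 if a node with these attributes may exist under this name. */
static int pair_is_allowed(const char *name, const struct node_attr *attr)
{
	size_t i;

	assert(name != NULL && attr != NULL);
	for (i = 0; i < sizeof(rules) / sizeof(rules[0]); i++) {
		if (strcmp(rules[i].name, name) == 0 &&
		    rules[i].attr.type == attr->type &&
		    rules[i].attr.dev_major == attr->dev_major &&
		    rules[i].attr.dev_minor == attr->dev_minor)
			return 1;
	}
	return 0;
}

/* A rename is acceptable only if the node fits the rule for its new name. */
static int rename_is_allowed(const struct node_attr *attr, const char *new_name)
{
	return pair_is_allowed(new_name, attr);
}
```

With this table, renaming the block-3-1 node to hda1.tmp fails because no rule names hda1.tmp, and renaming the block-3-2 node to hda1 fails because the attributes do not match the rule for hda1.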
Re: [PATCH][RFC] Simple tamper-proof device filesystem.
Hello.

[EMAIL PROTECTED] wrote:
> Ouch. The .c files should generally be built into their own .o files and then the Makefile should do something like
>
>   obj-$(CONFIG_SYAORAN) += syaoran.o
>
> unless there's *really* good reasons for including .c files (such as an otherwise-messy variable-namespace issue or similar).

Yes. The final implementation will become so. This is a temporary hack to keep all functions and variables static.

> Also, has this been double-checked to Do The Right Thing if you have *two* instances of ramfs mounted, one with Syaoran and one without?

Yes. The memory for the superblock is allocated for each instance. Thus, mounting one as syaoran and the other as tmpfs won't cause problems.

> (incidentally, all of these should probably be abstracted into a helper function that's 'static inline' so we have just one #ifdef in the definition in a .h file, and none in open .c code).

Oh, good idea.

> Similarly for other places you have #ifdef CONFIG_ in ramfs .c code - see if you can abstract it out.

This patch replaces the previous patch, and this patch modifies only tmpfs (mm/shmem*) files. I'm no longer modifying ramfs (fs/ramfs/*) files.

> > +/*
> > + * Original tmpfs doesn't set ramfs_dir_inode_operations.setattr field.
> > + * Now I'm setting the field to share tmpfs/rootfs/syaoran code.
>
> Question for the audience: *should* ramfs set that field so setattr works on ramfs (even if it's just a stub similar to the SELinux fscontext= mount stuff)?
>
> Question for Tetsuo: What happens to this code if somebody actually does the above change?

Please forget this question. I'm no longer setting the ramfs_dir_inode_operations.setattr field.

> > + Applications using well-known device locations under /dev
> > + get the device they want (e.g. an application that accesses
> > + /dev/null can always get a character special device
> > + with major=1 and minor=3).
>
> This should say "will always get", not "can always", as this code will mandate, rather than just make possible.

OK.
> > + The list of possible combinations of filename and its attributes
> > + that can exist on this filesystem is defined at mount time
> > + using a configuration file.
>
> The format of this file needs to be documented.

Yes. It is a line-by-line processable format defined as:

  filename permission owner group flags type [ symlink_data | major minor ]

where flags is a bitwise combination of

  *  1: Allow creation of the file.
  *  2: Allow deletion of the file.
  *  4: Allow changing permissions of the file.
  *  8: Allow changing owner or group of the file.
  * 16: For internal use. Remembers whether this file is opened or not.
  * 32: Don't create this file at mount time.

and here are some example entries:

  pts      755 0 0  0 d
  shm      755 0 0  0 d
  fd       777 0 0  0 l /proc/self/fd
  stdin    777 0 0  0 l /proc/self/fd/0
  stdout   777 0 0  0 l /proc/self/fd/1
  stderr   777 0 0  0 l /proc/self/fd/2
  null     666 0 0  0 c 1 3
  zero     666 0 0  0 c 1 5
  random   644 0 0  0 c 1 8
  urandom  644 0 0  0 c 1 9
  tty      666 0 0  0 c 5 0
  tty0     600 0 0 12 c 4 0
  cdrom    777 0 0  3 l /dev/scd0
  console  600 0 0  1 c 5 1
  hda      660 0 6  0 b 3 0
  hda1     660 0 6  0 b 3 1
  initctl  600 0 0  3 p
  log      666 0 0 15 s
  rtc      644 0 0  0 c 10 135
  ptmx     666 0 0  0 c 5 2
  ram      777 0 0  3 l /dev/ram0
  ram0     660 0 6  0 b 1 0
  ram1     660 0 6  0 b 1 1
  sda      660 0 6  0 b 8 0
  initrd   660 0 6  1 b 1 250

Full documentation of this filesystem is at
http://tomoyo.sourceforge.jp/en/1.5.x/policy-syaoran.html

> I'm not terribly thrilled by the idea of passing a file to be read by the kernel, but I also understand that if it isn't done before mount, you have a race condition between the mount and the load.

What race condition is possible? Are you worrying that the file gets modified while reading?

> Perhaps write some configfs code so that you can 'mount /configfs; cat config.file > /configfs/syaoran; mount -t syaoran'?

If you worry that the file gets modified while reading in kernel space, you will also worry that the file gets modified while doing cat config.file > /configfs/syaoran.
To use configfs (or whatever approach that is done before mount syscall), some tag for
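The line format described in this message is simple enough to parse with sscanf(). The following is a minimal userland sketch under stated assumptions, not the in-kernel parser from the patch: `struct policy_entry` and `parse_policy_line()` are hypothetical names, only the type letters that appear in the example entries (d, l, c, b, p, s) are accepted, and any escaping of special characters in filenames is ignored.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* One parsed policy line (hypothetical structure, for illustration only). */
struct policy_entry {
	char name[64];
	unsigned int perm;       /* permission bits, written in octal */
	unsigned int uid;
	unsigned int gid;
	unsigned int flags;      /* bitwise combination of 1, 2, 4, 8, 16, 32 */
	char type;               /* d, l, c, b, p or s */
	char symlink[128];       /* only for type 'l' */
	unsigned int dev_major;  /* only for types 'c' and 'b' */
	unsigned int dev_minor;
};

/* Parse "filename permission owner group flags type [data]".
 * Returns 0 on success and -1 on a malformed line. */
static int parse_policy_line(const char *line, struct policy_entry *e)
{
	int used = 0;

	assert(line != NULL && e != NULL);
	memset(e, 0, sizeof(*e));
	if (sscanf(line, "%63s %o %u %u %u %c%n", e->name, &e->perm,
		   &e->uid, &e->gid, &e->flags, &e->type, &used) != 6)
		return -1;
	line += used;   /* continue after the type letter */

	switch (e->type) {
	case 'c':
	case 'b':
		/* device nodes carry "major minor" */
		return sscanf(line, "%u %u", &e->dev_major,
			      &e->dev_minor) == 2 ? 0 : -1;
	case 'l':
		/* symlinks carry their target */
		return sscanf(line, "%127s", e->symlink) == 1 ? 0 : -1;
	case 'd':
	case 'p':
	case 's':
		return 0;
	default:
		return -1;
	}
}
```

A line such as "null 666 0 0 0 c 1 3" yields a character device entry with major 1 and minor 3, while a line whose name and permission have run together (for example "null666 0 0 0 c 1 3") is rejected because too few fields remain.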
Re: [PATCH][RFC] Simple tamper-proof device filesystem.
Hi Tetsuo, I think you focus too much on your way of enforcing filename/attributes pairs. The same can be achieved by creating the device nodes with expected attributes, and preventing processes from changing those files. This because expected combinations are known beforehand. And once those files are present, the MAC system used doesn't have to have special device nodes attributes support. Protecting those files is enough to guarantee filename/attributes pairs. On Tue, January 8, 2008 14:50, Tetsuo Handa wrote: Hello. Indan Zupancic wrote: I want to use this filesystem in case where a process with root privilege was hijacked but the behavior of the hijacked process is still restricted by MAC. 1) If the behaviour can be controlled, why can't the process be disallowed to change anything badly in /dev? Like disallowing anything from modifying existing nodes that weren't created by that process. That would have practically the same effect as your filesystem, won't it? MAC system can prevent hijacked processes from changing anything badly in /dev . But MAC system can't prevent hijacked processes from doing mv /dev/hda1 /dev/hda1.tmp; mv /dev/hda2 /dev/hda1; mv /dev/hda1.tmp /dev/hda2 if permissions to rename device nodes in /dev are given to hijacked processes. This is because MAC implementation doesn't check filename/attribute pairs. No, this is because rename permission was given for files that it shouldn't had. Either you want a process to manage device names and attributes, and then you give it permission to do that, or you want to enforce certain filename/attribute pairs and then you just do it yourself. Will your filesystem prevent the trivial case of rm /dev/hda1 ln -s /dev/hda2 /dev/hda1 But this filesystem can prevent hijacked processes from doing mv /dev/hda1 /dev/hda1.tmp; mv /dev/hda2 /dev/hda1; mv /dev/hda1.tmp /dev/hda2 even if permissions to rename device nodes in /dev are given to hijacked processes. 
Rename permission can be given for /dev in general, but prohibited for certain files in /dev, the ones you want to have specific attributes. It isn't all or nothing. This filesystem is not designed to forbid modifying nodes if that process needn't to modify nodes. This filesystem is designed to forbid breaking filename/attribute pairs of nodes even if that process need to (or permitted to) modify nodes. It's forbid modifying certain nodes that process needn't to modify versus forbid breaking filename/attribute pairs of certain nodes. Both have the same effect, except that the first one is generic and can be done by existing MAC systems, while the second one needs a special filesystem and a handful of MAC rules to make it effective. 2) The MAC system may not be able to guarantee certain combinations of device names and properties, but isn't that policy that shouldn't be in the kernel anyway? But if it is, shouldn't all device nodes be checked? That is, shouldn't it be a global check instead of a filesystem specific one? I think the reason why MAC system doesn't handle filename/attributes pairs is that: Filename and its attributes pairs are conventionally considered as constant and reliable. It makes the MAC's policy syntax complicated to describe this attribute enforcement information in MAC's policy. Thus, this should be a global check. But usually device nodes are only in /dev . It doesn't matter where they are, it's that a different fs than yours could be mounted over it. You say a MAC can prevent that from happening, but a MAC can also prevent all processes except for udev from modifying /dev. Done globally instead of as a filesystem it can actually guarantee name/attr pairs, now it can't even do that on its own. 3) Code efficiency. Thousand lines of code just to close one very specific attack, which can be done in lots of different other ways that all need to be prevented by the MAC system. (mounting over it, intercepting open calls, duping the fd, etc.) 
Is it worth it? This filesystem is doing what MAC system is not doing. So, please don't complain about inability of this filesystem to close all attacks. I don't. What I complain about is that it's too specific and does it one chosen job badly. It lacks abstraction. As far as I can see any decent MAC can achieve the same end result as your filesystem, without directly enforcing name/attr pairs. You can use MAC system to prevent attackers from mounting other filesystem over this filesystem. The filename/attribute pairs are something like system call entry tables. The application will go wrong if __NR_read is mapped to sys_write() and __NR_write is mapped to sys_read(). Userland applications access special functionalities (e.g. /dev/zero and /dev/random) by name (i.e. syscall numbers). Therefore, keeping the filename/attribute pairs tamper-proof is important. You recognize that there is a threat that device
Re: [PATCH][RFC] Simple tamper-proof device filesystem.
Hello.

Indan Zupancic wrote:
> I think you focus too much on your way of enforcing filename/attributes pairs.

So?

> The same can be achieved by creating the device nodes with expected attributes, and preventing processes from changing those files.

The device nodes have to be deletable if some process (including udev) needs to delete them. Thus, you cannot unconditionally prevent processes from changing those files.

> This because expected combinations are known beforehand.

Yes.

> And once those files are present, the MAC system used doesn't have to have special device nodes attributes support. Protecting those files is enough to guarantee filename/attributes pairs.

If the MAC system needn't support this filesystem's functionality, who creates those files with a guarantee of the expected attributes? Does udev? If udev is exploited, who can guarantee it?

> No, this is because rename permission was given for files that it shouldn't had.

Do you think all MAC implementations have the same granularity and functionality? I don't think so. Not all MAC implementations can control with such granularity. This filesystem is designed to be combined with any MAC, although the MAC used with this filesystem should be able to restrict namespace manipulation requests so that this filesystem can remain mounted on /dev and visible to userland applications.

> Either you want a process to manage device names and attributes, and then you give it permission to do that, or you want to enforce certain filename/attribute pairs and then you just do it yourself.

If I modify udev to enforce certain filename/attribute pairs and the modified udev is exploited, who can guarantee them? "Don't trust userland applications" is the basis of restricting access in kernel space. If you can trust userland applications, you don't need in-kernel access control.

> Will your filesystem prevent the trivial case of
>
>   rm /dev/hda1
>   ln -s /dev/hda2 /dev/hda1

Of course. To permit the above operation, the following permissions are needed.

  hda1 660 0 6  2 b 3 1
  hda1 777 0 0 33 l .

> Rename permission can be given for /dev in general, but prohibited for certain files in /dev, the ones you want to have specific attributes. It isn't all or nothing.

Do you think all MAC implementations can prohibit renaming for certain files in /dev ?

> It's "forbid modifying certain nodes that process needn't to modify" versus "forbid breaking filename/attribute pairs of certain nodes". Both have the same effect, except that the first one is generic and can be done by existing MAC systems, while the second one needs a special filesystem and a handful of MAC rules to make it effective.

Do you think all MAC implementations can do that? I think the first one is implementation specific and the second one is generic.

> It doesn't matter where they are, it's that a different fs than yours could be mounted over it. You say a MAC can prevent that from happening, but a MAC can also prevent all processes except for udev from modifying /dev.

But MAC cannot prevent udev from modifying /dev . And what if udev is exploited? Not all MACs can enforce access control over all processes with the granularity you are talking about. And what if a process exists that cannot be controlled with your boolean level granularity (e.g. an administrator running his/her administrative applications that require modification of /dev )?

A crazy example of an administrative application: (Please don't say "Don't use such a crazy application.")

  #! /bin/sh
  rm -f /dev/either-null-or-zero
  read
  mknod /dev/either-null-or-zero c 1 $REPLY
  echo "Administrative task finished successfully." | mail root

This filesystem can guarantee that /dev/either-null-or-zero is either char-1-3 or char-1-5 by using a policy

  either-null-or-zero 666 0 0  3 c 1 3
  either-null-or-zero 666 0 0 35 c 1 5

The boolean level granularity (e.g. forbid all processes except for udev , and modify udev to perform name/attribute pair enforcement) is not generic. Userland applications sometimes misbehave. I assume the kernel doesn't misbehave.
If you doubt my assumption, you have to doubt the in-kernel MAC implementation too.

> I don't. What I complain about is that it's too specific and does it one chosen job badly. It lacks abstraction. As far as I can see any decent MAC can achieve the same end result as your filesystem, without directly enforcing name/attr pairs.

Can SELinux guarantee the same result as my filesystem even if udev or administrative programs have to be able to modify /dev ?

> The thing is, all special device nodes that are expected to exist by applications are known beforehand.

Yes.

> Thus they can be created statically and can be protected against any modifications with any MAC system.

But sometimes some modifications need to be permitted. Who can guarantee that there is no application (other than udev)
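The flags column in the policy lines quoted in this message is a bit mask, so the entries read more easily once decoded. The sketch below is illustrative only: the bit values come from the format description posted earlier in the thread, while `describe_flags()` and the short labels are hypothetical names chosen here.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Bit values as documented in the thread; the labels are made up here. */
static const struct {
	unsigned int bit;
	const char *label;
} flag_labels[] = {
	{  1, "create" },
	{  2, "delete" },
	{  4, "chmod" },
	{  8, "chown" },
	{ 16, "internal-open" },
	{ 32, "no-create-at-mount" },
};

/* Write a comma-separated list of the bits set in flags into buf. */
static void describe_flags(unsigned int flags, char *buf, size_t len)
{
	size_t i;

	assert(buf != NULL && len > 0);
	buf[0] = '\0';
	for (i = 0; i < sizeof(flag_labels) / sizeof(flag_labels[0]); i++) {
		if (!(flags & flag_labels[i].bit))
			continue;
		if (buf[0] != '\0')
			strncat(buf, ",", len - strlen(buf) - 1);
		strncat(buf, flag_labels[i].label, len - strlen(buf) - 1);
	}
}
```

Read this way, flags 2 on the block-3-1 hda1 entry means the node may be deleted, and flags 33 on the symlink entry means a symlink named hda1 may be created but is not created at mount time. Likewise 3 and 35 on the two either-null-or-zero entries allow creating and deleting either variant, with only the char-1-3 one present at mount time.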
Re: [PATCH][RFC] Simple tamper-proof device filesystem.
Hello.

[EMAIL PROTECTED] wrote:
> Good summary - probably should add that to the patch, drop it into Documentation/syaoran-config.txt or similar...

I see.

> Modification while reading *is* an issue, but can probably be worked around with some clever locking. The race condition I was thinking of was if you had the mount and the policy load be 2 separate events, you could see:
>
> (a) issue mount request
> (b) do something malicious in /dev while..
> (c) load the policy that would have prevented (b).
>
> This is partly why SELinux has init load the policy *very* early on, before any other userspace have had a chance to run and do things that would have been prevented by policy.

So, you suggested loading the policy before the mount() request so that this filesystem can prevent attackers from doing something malicious, by minimizing the latency between the userland process's call of mount() and the nodes becoming visible to userland processes (i.e. implementing it as a non-blocking operation). I didn't take such cases into account. My assumed usage of this filesystem is to run a script like

  #!/bin/sh
  mount -t syaoran -o accept=/etc/ccs/syaoran.conf none /dev
  exec /sbin/init "$@"

by passing init=/path/to/this/script on the kernel command line so that /sbin/init can create /dev/initlog on this filesystem. If you mount this filesystem after /sbin/init starts, it will shadow /dev/initctl opened by /sbin/init .

> Which basically ends up meaning that anybody who can trick the mount into happening can reset the permitted list and create (for example) a mode 666 entry for a hard drive, and go scribbling around at will. Note that you don't seem to do any sanity checking on the path (for instance, that each component is owned by root, and not world-writable) - so anybody who finds a way to get the mount to happen can supply their own list in /home/joeuser/blat or /tmp/surprise-mount-list or wherever.

I assume that being able to reach this location means the caller of mount() is root.
But are the patches to allow mount() by non-root in progress? http://lkml.org/lkml/2008/1/8/131 Maybe I should add some sanity checking on the path.

Thank you.
Re: [PATCH][RFC] Simple tamper-proof device filesystem.
Hello. Changes from previous posting: (1) I rebased this patch using tmpfs. I didn't know I was making this patch using ramfs... This patch is for 2.6.24-rc6-mm1. Regards. -- Subject: Simple tamper-proof device filesystem. The goal of this filesystem is to guarantee that applications using well-known device locations under /dev get the device they want (e.g. an application that accesses /dev/null can always get a character special device with major=1 and minor=3). This idea sounds silly? Indeed, if you think the root can do whatever he/she wants do do. But this filesystem makes sense when used with access control mechanisms like MAC (mandatory access control). I want to use this filesystem in case where a process with root privilege was hijacked but the behavior of the hijacked process is still restricted by MAC. Why not use FUSE? Because /dev has to be available through the lifetime of the kernel. It is not acceptable if /dev stops working due to SIGKILL or OOM-killer. Why not use SELinux? Because SELinux doesn't guarantee filename and its attribute. As far as I know, no MAC implementation can handle filename and its attribute. I guess this is because Filename and its attributes pairs are conventionally considered as constant and reliable. It makes the MAC's policy syntax complicated to describe this attribute enforcement information in MAC's policy. I want to add functionality that the MACs are missing. Instead of adding this functionality per MAC, I propose to add it as ground work, to be combined with any MAC. Why not drop CAP_MKNOD? Dropping CAP_MKNOD is not enough for emulating this filesystem because a process can still rename()/unlink() to break filename and its attributes handling (e.g. mv /dev/sda1 /dev/sda1.tmp; mv /dev/sda2 /dev/sda1; mv /dev/sda1.tmp /dev/sda2 or unlink /dev/null; touch /dev/null ). 
This time, I'm implementing this filesystem as an extension to tmpfs because what this filesystem does is nothing but check filename and its attributes in addition to what tmpfs does.

Signed-off-by: Tetsuo Handa [EMAIL PROTECTED]
---
 fs/Kconfig               |   18 +
 include/linux/shmem_fs.h |    5
 mm/shmem.c               |  124 +++
 mm/shmem_mac.h           |   57 +
 mm/shmem_mac_debug.c     |  183 +
 mm/shmem_mac_init.c      |  486 +++
 mm/shmem_mac_main.c      |  205 +++
 7 files changed, 1077 insertions(+), 1 deletion(-)

--- linux-2.6-mm.orig/mm/shmem.c
+++ linux-2.6-mm/mm/shmem.c
@@ -736,11 +736,39 @@ static void shmem_truncate(struct inode
 	shmem_truncate_range(inode, inode->i_size, (loff_t)-1);
 }
 
+#ifdef CONFIG_SYAORAN
+#include "shmem_mac.h"
+#include "shmem_mac_init.c"
+#include "shmem_mac_main.c"
+#include "shmem_mac_debug.c"
+
+static bool with_mac(struct super_block *sb)
+{
+	return sb->s_type == &syaoran_fs_type;
+}
+#else
+static inline bool with_mac(struct super_block *sb)
+{
+	return 0;
+}
+#endif
+
 static int shmem_notify_change(struct dentry *dentry, struct iattr *attr)
 {
 	struct inode *inode = dentry->d_inode;
 	struct page *page = NULL;
 	int error;
+#ifdef CONFIG_SYAORAN
+	if (with_mac(inode->i_sb)) {
+		unsigned int flags = 0;
+		if (attr->ia_valid & (ATTR_UID | ATTR_GID))
+			flags |= MAY_CHOWN;
+		if (attr->ia_valid & ATTR_MODE)
+			flags |= MAY_CHMOD;
+		if (syaoran_may_modify_node(dentry, flags))
+			return -EPERM;
+	}
+#endif
 
 	if (S_ISREG(inode->i_mode) && (attr->ia_valid & ATTR_SIZE)) {
 		if (attr->ia_size < inode->i_size) {
@@ -1515,6 +1543,10 @@ shmem_get_inode(struct super_block *sb,
 		default:
 			inode->i_op = &shmem_special_inode_operations;
 			init_special_inode(inode, mode, dev);
+#ifdef CONFIG_SYAORAN
+			if (with_mac(sb))
+				init_syaoran_inode(inode, mode);
+#endif
 			break;
 		case S_IFREG:
 			inode->i_op = &shmem_inode_operations;
@@ -1739,8 +1771,15 @@ static int shmem_statfs(struct dentry *d
 static int
 shmem_mknod(struct inode *dir, struct dentry *dentry, int mode, dev_t dev)
 {
-	struct inode *inode = shmem_get_inode(dir->i_sb, mode, dev);
+	struct inode *inode;
 	int error = -ENOSPC;
 
+#ifdef CONFIG_SYAORAN
+	if (with_mac(dir->i_sb)) {
+		if (syaoran_may_create_node(dentry, mode, dev) < 0)
+			return -EPERM;
+	}
+#endif
+	inode = shmem_get_inode(dir->i_sb, mode, dev);
 	if (inode) {
 		error = security_inode_init_security(inode, dir, NULL, NULL,
@@ -1792,6 +1831,13 @@ static int shmem_link(struct dentry *old
 {
 	struct inode *inode = old_dentry->d_inode;
 	int ret;
+#ifdef
[PATCH][RFC] Simple tamper-proof device filesystem.
Hello. Changes from previous posting: (1) Added kernel config so that users can choose whether to compile this filesystem or not. I didn't receive any ACK/NACK regarding whether I'm permitted to implement this filesystem as an extension to tmpfs or not. So, I continued implementing this filesystem as an extension to tmpfs. (2) Removed indirect grabbing of blkdev_open() and chrdev_open(). The previous posting was using indirect approach to call blkdev_open() and chrdev_open() so that users can compile this filesystem as a module without exporting blkdev_open() from fs/block_dev.c and chrdev_open() from fs/char_dev.c . But since tmpfs cannot be compiled as a module, I changed it to direct accessing. (3) Splitted single file into three files. syaoran_init.c: initialization part syaoran_main.c: access control part syaoran_debug.c: taking snapshot part This patch is for 2.6.24-rc6-mm1. Regards. -- Subject: Simple tamper-proof device filesystem. The goal of this filesystem is to guarantee that applications using well-known device locations under /dev get the device they want (e.g. an application that accesses /dev/null can always get a character special device with major=1 and minor=3). This idea sounds silly? Indeed, if you think the root can do whatever he/she wants do do. But this filesystem makes sense when used with access control mechanisms like MAC (mandatory access control). I want to use this filesystem in case where a process with root privilege was hijacked but the behavior of the hijacked process is still restricted by MAC. Why not use FUSE? Because /dev has to be available through the lifetime of the kernel. It is not acceptable if /dev stops working due to SIGKILL or OOM-killer. Why not use SELinux? Because SELinux doesn't guarantee filename and its attribute. As far as I know, no MAC implementation can handle filename and its attribute. I guess this is because Filename and its attributes pairs are conventionally considered as constant and reliable. 
It makes the MAC's policy syntax complicated to describe this attribute enforcement information in MAC's policy. I want to add functionality that the MACs are missing. Instead of adding this functionality per MAC, I propose to add it as ground work, to be combined with any MAC.

Why not drop CAP_MKNOD?

Dropping CAP_MKNOD is not enough for emulating this filesystem because a process can still rename()/unlink() to break filename and its attributes handling (e.g. mv /dev/sda1 /dev/sda1.tmp; mv /dev/sda2 /dev/sda1; mv /dev/sda1.tmp /dev/sda2 or unlink /dev/null; touch /dev/null ).

This time, I'm implementing this filesystem as an extension to tmpfs because what this filesystem does is nothing but check filename and its attributes in addition to what tmpfs does.

Signed-off-by: Tetsuo Handa [EMAIL PROTECTED]
---
 fs/Kconfig               |   18 +
 fs/ramfs/inode.c         |  177 ++
 fs/ramfs/syaoran.h       |   75 ++
 fs/ramfs/syaoran_debug.c |  183 +++
 fs/ramfs/syaoran_init.c  |  568 +++
 fs/ramfs/syaoran_main.c  |  207 +
 6 files changed, 1222 insertions(+), 6 deletions(-)

--- linux-2.6-mm.orig/fs/ramfs/inode.c
+++ linux-2.6-mm/fs/ramfs/inode.c
@@ -36,6 +36,20 @@
 #include <asm/uaccess.h>
 #include "internal.h"
 
+static struct inode *__ramfs_get_inode(struct super_block *sb, int mode,
+				       dev_t dev, bool tmpfs_with_mac);
+
+#define TMPFS_WITH_MAC    1
+#define TMPFS_WITHOUT_MAC 0
+#include <linux/quotaops.h>
+
+#ifdef CONFIG_SYAORAN
+#include "syaoran.h"
+#include "syaoran_init.c"
+#include "syaoran_main.c"
+#include "syaoran_debug.c"
+#endif
+
 /* some random number */
 #define RAMFS_MAGIC	0x858458f6
 
@@ -51,6 +65,12 @@ static struct backing_dev_info ramfs_bac
 
 struct inode *ramfs_get_inode(struct super_block *sb, int mode, dev_t dev)
 {
+	return __ramfs_get_inode(sb, mode, dev, TMPFS_WITHOUT_MAC);
+}
+
+static struct inode *__ramfs_get_inode(struct super_block *sb, int mode,
+				       dev_t dev, const bool tmpfs_with_mac)
+{
 	struct inode * inode = new_inode(sb);
 
 	if (inode) {
@@ -65,10 +85,18 @@ struct inode *ramfs_get_inode(struct sup
 		switch (mode & S_IFMT) {
 		default:
 			init_special_inode(inode, mode, dev);
+#ifdef CONFIG_SYAORAN
+			if (tmpfs_with_mac)
+				init_syaoran_inode(inode, mode);
+#endif
 			break;
 		case S_IFREG:
 			inode->i_op = &ramfs_file_inode_operations;
 			inode->i_fop = &ramfs_file_operations;
+#ifdef CONFIG_SYAORAN
+			if (tmpfs_with_mac)
+				init_syaoran_inode(inode, mode);
+#endif
 			break;
Re: [PATCH][RFC] Simple tamper-proof device filesystem.
Hello, On Sun, Jan 06, 2008 at 03:20:00PM +0900, Tetsuo Handa wrote: Hello. Changes from previous posting: (1) Added kernel config so that users can choose whether to compile this filesystem or not. I didn't receive any ACK/NACK regarding whether I'm permitted to implement this filesystem as an extension to tmpfs or not. So, I continued implementing this filesystem as an extension to tmpfs. (2) Removed indirect grabbing of blkdev_open() and chrdev_open(). The previous posting was using indirect approach to call blkdev_open() and chrdev_open() so that users can compile this filesystem as a module without exporting blkdev_open() from fs/block_dev.c and chrdev_open() from fs/char_dev.c . But since tmpfs cannot be compiled as a module, I changed it to direct accessing. (3) Splitted single file into three files. syaoran_init.c: initialization part syaoran_main.c: access control part syaoran_debug.c: taking snapshot part This patch is for 2.6.24-rc6-mm1. Regards. -- Subject: Simple tamper-proof device filesystem. The goal of this filesystem is to guarantee that applications using well-known device locations under /dev get the device they want (e.g. an application that accesses /dev/null can always get a character special device with major=1 and minor=3). This idea sounds silly? Indeed, if you think the root can do whatever he/she wants do do. But this filesystem makes sense when used with access control mechanisms like MAC (mandatory access control). I want to use this filesystem in case where a process with root privilege was hijacked but the behavior of the hijacked process is still restricted by MAC. Why not use FUSE? Because /dev has to be available through the lifetime of the kernel. It is not acceptable if /dev stops working due to SIGKILL or OOM-killer. Why not use SELinux? Because SELinux doesn't guarantee filename and its attribute. As far as I know, no MAC implementation can handle filename and its attribute. 
> I guess this is because filename/attribute pairs are conventionally considered constant and reliable. Describing this attribute enforcement in a MAC's policy would make the policy syntax complicated. I want to add functionality that the MACs are missing. Instead of adding this functionality per MAC, I propose to add it as groundwork, to be combined with any MAC.
>
> Why not drop CAP_MKNOD? Dropping CAP_MKNOD is not enough to emulate this filesystem because a process can still use rename()/unlink() to break the filename/attribute pairing (e.g. "mv /dev/sda1 /dev/sda1.tmp; mv /dev/sda2 /dev/sda1; mv /dev/sda1.tmp /dev/sda2" or "unlink /dev/null; touch /dev/null").
>
> This time, I'm implementing this filesystem as an extension to tmpfs because all it does, in addition to what tmpfs does, is check filenames and their attributes.
>
> Signed-off-by: Tetsuo Handa [EMAIL PROTECTED]
> ---
>  fs/Kconfig               |   18 +
>  fs/ramfs/inode.c         |  177 ++
>  fs/ramfs/syaoran.h       |   75 ++
>  fs/ramfs/syaoran_debug.c |  183 +++
>  fs/ramfs/syaoran_init.c  |  568 +++
>  fs/ramfs/syaoran_main.c  |  207 +
>  6 files changed, 1222 insertions(+), 6 deletions(-)

Your patch is very confusing. In your description, as well as in the comments, you talk about tmpfs, but your patch does not touch even one line of tmpfs and only changes ramfs. Even your variables and arguments refer to tmpfs. The Kconfig entry indicates that the feature depends on TMPFS too.

Judging from the following comment:

    * Original tmpfs doesn't set ramfs_dir_inode_operations.setattr field.

I suspect that you are confusing the two filesystems:

- ramfs is in fs/ramfs and is always compiled in; you cannot disable it.
- tmpfs is in mm/shmem.c and is optional. It also supports options that ramfs does not (e.g. size), and its data may be swapped.

Please understand that I'm not discussing the usefulness of your patch, I'm just trying to avoid a huge confusion.
Regards,
Willy

> --- linux-2.6-mm.orig/fs/ramfs/inode.c
> +++ linux-2.6-mm/fs/ramfs/inode.c
> @@ -36,6 +36,20 @@
>  #include <asm/uaccess.h>
>  #include "internal.h"
>
> +static struct inode *__ramfs_get_inode(struct super_block *sb, int mode,
> +				       dev_t dev, bool tmpfs_with_mac);
> +
> +#define TMPFS_WITH_MAC 1
> +#define TMPFS_WITHOUT_MAC 0
> +#include <linux/quotaops.h>
> +
> +#ifdef CONFIG_SYAORAN
> +#include "syaoran.h"
> +#include "syaoran_init.c"
> +#include "syaoran_main.c"
> +#include "syaoran_debug.c"
> +#endif
> +
>  /* some random number */
>  #define RAMFS_MAGIC	0x858458f6
>
> @@ -51,6 +65,12 @@ static struct backing_dev_info ramfs_bac
>
>  struct inode *ramfs_get_inode(struct super_block *sb, int mode, dev_t dev)
>  {
> +	return
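As a rough illustration of the guarantee described in the posting above (an application that opens /dev/null should get a character device with major=1 and minor=3), a userspace check might look like the sketch below. The helper names node_matches() and path_is_device() are hypothetical and are not part of the patch. The sketch can only observe a node's attributes after the fact, whereas the proposed filesystem is meant to enforce the pairing inside the kernel.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/stat.h>
#include <sys/sysmacros.h>
#include <sys/types.h>

/*
 * Hypothetical helper: true if `st` describes a node of the expected
 * file type (e.g. S_IFCHR or S_IFBLK) with the expected device numbers.
 */
bool node_matches(const struct stat *st, mode_t type,
                  unsigned int maj, unsigned int min)
{
    assert(st != NULL);
    if ((st->st_mode & S_IFMT) != type)
        return false;
    return major(st->st_rdev) == maj && minor(st->st_rdev) == min;
}

/*
 * Hypothetical helper: look at `path` without following symlinks and
 * compare it with the expected type/major/minor triple.
 */
bool path_is_device(const char *path, mode_t type,
                    unsigned int maj, unsigned int min)
{
    struct stat st;

    assert(path != NULL);
    if (lstat(path, &st) != 0)
        return false;   /* missing or inaccessible counts as a mismatch */
    return node_matches(&st, type, maj, min);
}
```

A caller would use it as path_is_device("/dev/null", S_IFCHR, 1, 3). After the "unlink /dev/null; touch /dev/null" sequence from the posting, the same call would return false, because the replacement is a regular file.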
Re: [PATCH][RFC] Simple tamper-proof device filesystem.
Hello.

Willy Tarreau wrote:
> Your patch is very confusing. In your description, as well as in the comments, you talk about tmpfs, but your patch does not touch even one line of tmpfs and only changes ramfs. Even your variables and arguments refer to tmpfs. The Kconfig entry indicates that the feature depends on TMPFS too.
>
> Judging from the following comment:
>
>     * Original tmpfs doesn't set ramfs_dir_inode_operations.setattr field.
>
> I suspect that you are confusing the two filesystems:
>
> - ramfs is in fs/ramfs and is always compiled in; you cannot disable it.
> - tmpfs is in mm/shmem.c and is optional. It also supports options that ramfs does not (e.g. size), and its data may be swapped.
>
> Please understand that I'm not discussing the usefulness of your patch, I'm just trying to avoid a huge confusion.

Oh, I thought that the filesystem mounted by "mount -t tmpfs none /tmp" is tmpfs and that the source code of tmpfs is located in the fs/ramfs directory.

So, I should write the description as an extension to ramfs rather than an extension to tmpfs. I'll fix it in the next posting.

Thank you.
Re: [PATCH][RFC] Simple tamper-proof device filesystem.
On Sun, Jan 06, 2008 at 04:36:06PM +0900, Tetsuo Handa wrote:
> Hello.
>
> Willy Tarreau wrote:
> > Your patch is very confusing. In your description, as well as in the comments, you talk about tmpfs, but your patch does not touch even one line of tmpfs and only changes ramfs. Even your variables and arguments refer to tmpfs. The Kconfig entry indicates that the feature depends on TMPFS too.
> >
> > Judging from the following comment:
> >
> >     * Original tmpfs doesn't set ramfs_dir_inode_operations.setattr field.
> >
> > I suspect that you are confusing the two filesystems:
> >
> > - ramfs is in fs/ramfs and is always compiled in; you cannot disable it.
> > - tmpfs is in mm/shmem.c and is optional. It also supports options that ramfs does not (e.g. size), and its data may be swapped.
> >
> > Please understand that I'm not discussing the usefulness of your patch, I'm just trying to avoid a huge confusion.
>
> Oh, I thought the filesystem mounted by "mount -t tmpfs none /tmp" is tmpfs

Yes, that is a tmpfs.

> and the source code of tmpfs is located in fs/ramfs directory.

No, ramfs is what you get by "mount -t ramfs none /tmp" :-)

You will notice that "df" will not report your ramfs by default because it reports zero blocks. But "mount" or "df /tmp" will report it.

> So, I should write the description as an extension to ramfs rather than an extension to tmpfs.

And please also fix the comments, macros and variable names in the code, as they are what confused me first.

> I'll fix it in next posting.

Thanks,
Willy
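To check from a program which of the two filesystems is actually mounted at a path, one option is to compare the f_type field returned by statfs(2) with the filesystem magic numbers. The sketch below assumes the values from <linux/magic.h>: 0x858458f6 for ramfs (the same RAMFS_MAGIC that appears in the quoted patch) and 0x01021994 for tmpfs. The enum and function names are hypothetical.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/vfs.h>

/* Magic numbers as found in <linux/magic.h>, redefined under local names. */
#define MAGIC_RAMFS 0x858458f6UL
#define MAGIC_TMPFS 0x01021994UL

enum fs_kind { FS_RAMFS, FS_TMPFS, FS_OTHER, FS_ERROR };

/* Map a filesystem magic number to one of the kinds of interest. */
enum fs_kind fs_kind_from_magic(unsigned long magic)
{
    if (magic == MAGIC_RAMFS)
        return FS_RAMFS;
    if (magic == MAGIC_TMPFS)
        return FS_TMPFS;
    return FS_OTHER;
}

/* Ask the kernel which filesystem backs `path`. */
enum fs_kind fs_kind_of_path(const char *path)
{
    struct statfs sfs;

    assert(path != NULL);
    if (statfs(path, &sfs) != 0)
        return FS_ERROR;
    /* f_type may be a signed type; keep only the low 32 bits. */
    return fs_kind_from_magic((unsigned long)sfs.f_type & 0xffffffffUL);
}
```

With this, fs_kind_of_path("/tmp") would be expected to yield FS_TMPFS after "mount -t tmpfs none /tmp" and FS_RAMFS after "mount -t ramfs none /tmp".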
[PATCH][RFC] Simple tamper-proof device filesystem.
Hello.

Thank you for joining the discussion of the previous posting (starting from http://lkml.org/lkml/2007/12/16/23 ). The previous posting was a feasibility test to find out whether this kind of trivial filesystem is acceptable for mainline. Now it seems that there is a small chance of acceptance, so I rebased the patch onto the -mm tree.

Regards.
--
Subject: Simple tamper-proof device filesystem.

The goal of this filesystem is to guarantee that applications using well-known device locations under /dev get the device they want (e.g. an application that accesses /dev/null can always get a character special device with major=1 and minor=3).

Does this idea sound silly? It does, if you think that root can do whatever he/she wants to do. But this filesystem makes sense when used with access control mechanisms like MAC (mandatory access control). I want to use this filesystem in the case where a process with root privilege was hijacked but the behavior of the hijacked process is still restricted by MAC.

Why not use FUSE? Because /dev has to be available throughout the lifetime of the kernel. It is not acceptable if /dev stops working due to SIGKILL or the OOM-killer.

Why not use SELinux? Because SELinux doesn't guarantee the pairing of a filename and its attributes. As far as I know, no MAC implementation can handle a filename together with its attributes. I guess this is because filename/attribute pairs are conventionally considered constant and reliable. Describing this attribute enforcement in a MAC's policy would make the policy syntax complicated. I want to add functionality that the MACs are missing. Instead of adding this functionality per MAC, I propose to add it as groundwork, to be combined with any MAC.

Why not drop CAP_MKNOD? Dropping CAP_MKNOD is not enough to emulate this filesystem because a process can still use rename()/unlink() to break the filename/attribute pairing (e.g. "mv /dev/sda1 /dev/sda1.tmp; mv /dev/sda2 /dev/sda1; mv /dev/sda1.tmp /dev/sda2" or "unlink /dev/null; touch /dev/null").

This time, I'm implementing this filesystem as an extension to tmpfs because all it does, in addition to what tmpfs does, is check filenames and their attributes.

Signed-off-by: Tetsuo Handa [EMAIL PROTECTED]
---
 fs/ramfs/inode.c   |  101 -
 fs/ramfs/syaoran.h | 1066 +
 2 files changed, 1160 insertions(+), 7 deletions(-)

--- linux-2.6-mm.orig/fs/ramfs/inode.c
+++ linux-2.6-mm/fs/ramfs/inode.c
@@ -35,6 +35,7 @@
 #include <linux/sched.h>
 #include <asm/uaccess.h>
 #include "internal.h"
+#include "syaoran.h"
 
 /* some random number */
 #define RAMFS_MAGIC	0x858458f6
@@ -49,7 +50,8 @@ static struct backing_dev_info ramfs_bac
 			  BDI_CAP_READ_MAP | BDI_CAP_WRITE_MAP | BDI_CAP_EXEC_MAP,
 };
 
-struct inode *ramfs_get_inode(struct super_block *sb, int mode, dev_t dev)
+struct inode *__ramfs_get_inode(struct super_block *sb, int mode, dev_t dev,
+				const int mac)
 {
 	struct inode * inode = new_inode(sb);
 
@@ -65,10 +67,19 @@ struct inode *ramfs_get_inode(struct sup
 		switch (mode & S_IFMT) {
 		default:
 			init_special_inode(inode, mode, dev);
+			if (mac) {
+				if (S_ISBLK(mode))
+					inode->i_fop = &wrapped_def_blk_fops;
+				else if (S_ISCHR(mode))
+					inode->i_fop = &wrapped_def_chr_fops;
+				inode->i_op = &syaoran_file_inode_operations;
+			}
 			break;
 		case S_IFREG:
 			inode->i_op = &ramfs_file_inode_operations;
 			inode->i_fop = &ramfs_file_operations;
+			if (mac)
+				inode->i_op = &syaoran_file_inode_operations;
 			break;
 		case S_IFDIR:
 			inode->i_op = &ramfs_dir_inode_operations;
@@ -79,12 +90,19 @@ struct inode *ramfs_get_inode(struct sup
 			break;
 		case S_IFLNK:
 			inode->i_op = &page_symlink_inode_operations;
+			if (mac)
+				inode->i_op = &syaoran_symlink_inode_operations;
 			break;
 		}
 	}
 	return inode;
 }
 
+struct inode *ramfs_get_inode(struct super_block *sb, int mode, dev_t dev)
+{
+	return __ramfs_get_inode(sb, mode, dev, 0);
+}
+
 /*
  * File creation. Allocate an inode, and we're done..
  */
@@ -92,9 +110,17 @@ struct inode *ramfs_get_inode(struct sup
 static int
 ramfs_mknod(struct inode *dir, struct dentry *dentry, int mode, dev_t dev)
 {
-	struct inode * inode = ramfs_get_inode(dir->i_sb, mode, dev);
+	struct inode *inode;
 	int
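The structure of the hunks above (a core initializer that takes an extra "mac" flag, plus a thin wrapper that keeps the old signature and passes 0) can be sketched in plain userspace C as follows. This is only an illustration of the pattern, not the patch's code: all type and function names are hypothetical stand-ins for the kernel's inode and operation tables, and the directory case is left unchanged because it is not visible in the quoted hunks.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins for kernel types; every name here is hypothetical. */
enum node_type { NODE_REG, NODE_DIR, NODE_LNK, NODE_SPECIAL };

struct node_ops {
    const char *label;              /* only used to tell the tables apart */
};

const struct node_ops plain_file_ops      = { "plain file" };
const struct node_ops plain_dir_ops       = { "plain dir" };
const struct node_ops plain_symlink_ops   = { "plain symlink" };
const struct node_ops checked_file_ops    = { "checked file" };
const struct node_ops checked_symlink_ops = { "checked symlink" };

struct node {
    enum node_type type;
    const struct node_ops *ops;     /* NULL means "keep the defaults" */
};

/* Core initializer: a nonzero `mac` selects the checking tables. */
void node_init_checked(struct node *n, enum node_type type, int mac)
{
    assert(n != NULL);
    n->type = type;
    n->ops = NULL;
    switch (type) {
    case NODE_REG:
        n->ops = mac ? &checked_file_ops : &plain_file_ops;
        break;
    case NODE_DIR:
        /* Not shown in the quoted hunks, so left untouched here. */
        n->ops = &plain_dir_ops;
        break;
    case NODE_LNK:
        n->ops = mac ? &checked_symlink_ops : &plain_symlink_ops;
        break;
    case NODE_SPECIAL:
        /* Device nodes keep default ops unless checking is requested. */
        if (mac)
            n->ops = &checked_file_ops;
        break;
    }
}

/* Thin wrapper: existing callers keep the old interface and get mac = 0. */
void node_init(struct node *n, enum node_type type)
{
    node_init_checked(n, type, 0);
}
```

The point of the wrapper is that existing callers do not need to change, while new callers that want the additional checks call the core function with a nonzero flag.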