Re: [PATCH] fs: Correct SuS compliance for open of large file without options
On Sep 27, 2007, at 17:34:45, Greg KH wrote:
On Thu, Sep 27, 2007 at 02:37:42PM -0400, Theodore Tso wrote:

The fact that sysfs is all laid out in a directory, but for which some directories/symlinks are OK to use, and some are NOT OK to use --- is why I call the sysfs interface an open pit.

And because of the original design mistakes, we have only been able to change things for the better in a slow manner. We have had userspace programs fixed up for _years_ before we are able to make the corresponding changes in the kernel, so as to not break the distros that are slow to upgrade packages and kernels (like Debian.)

Hey! No poking fingers at Debian here; it's been *MUCH* improved lately. I far more frequently have problems with boxes still running some ancient release of RHEL-4 or something than I do with those running Debian stable (virtually always the latest Debian stable).

Cheers, Kyle Moffett

- To unsubscribe from this list: send the line unsubscribe linux-fsdevel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 10/25] Unionfs: add un/likely conditionals on copyup ops
On Sep 26, 2007, at 09:40:20, Erez Zadok wrote:
In message [EMAIL PROTECTED], Kok, Auke writes:

I've been told several times that adding these is almost always bogus - either it messes up the CPU branch prediction or the compiler/CPU just does a lot better at finding the right way without these hints. Adding them as a blanket seems rather strange. Have you got any numbers that this really improves performance?

Auke, that's a good question, but I found it hard to find any info about it. There's no discussion on it in Documentation/, and very little I could find elsewhere. I did see one url explaining what un/likely does precisely, but no guidelines. My understanding is that it can improve performance, as long as it's used carefully (otherwise it may hurt performance).

Hmm, even still I agree with Auke, you probably use it too much.

Recently we've done a full audit of the entire code, and added un/likely where we felt that the chance of succeeding is 95% or better (e.g., error conditions that should rarely happen, and such).

Actually due to the performance penalty on some systems I think you only want to use it if the chance of succeeding is 99% or better, as the benefit if predicted is a cycle or two and the harm if mispredicted can be more than 50 cycles, depending on the CPU. You should also remember that in filesystems many failures are triggered by things like the ld.so library searches, where it literally calls access() 20 different times on various possible paths for library files, failing the first 19. It does this once for each necessary library.

Typically you only want to add unlikely() or likely() for about 2 reasons:
(A) It's a hot path and the unlikely case is just going to burn a bunch of CPU anyways
(B) It really is extremely unlikely that it fails (Think physical hardware failure)

Anything else is just bogus.
Cheers, Kyle Moffett
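
To make the two cases in this thread concrete, here is a minimal sketch in C. The likely()/unlikely() macros have the same shape as the kernel's, built on the GCC/Clang builtin __builtin_expect(); everything else (fake_disk, read_block, find_entry) is invented for illustration and is not taken from Unionfs. Only the hardware-failure check gets a hint. The lookup miss, which can easily be the common outcome during something like an ld.so path search, is deliberately left alone. As a rough back-of-envelope check, taking the figures quoted in the thread at face value (about 2 cycles gained when the hint is right, about 50 lost when it is wrong), a hint only breaks even when it is right more than 50/52, or roughly 96%, of the time, which fits the 99% rule of thumb.

```c
#include <stddef.h>
#include <string.h>

/* Same shape as the kernel's definitions; needs GCC or Clang. */
#define likely(x)   __builtin_expect(!!(x), 1)
#define unlikely(x) __builtin_expect(!!(x), 0)

/* Hypothetical device: 'failed' stands in for a physical hardware fault. */
struct fake_disk {
    int failed;
    char data[64];
};

/* Case (B): a hardware fault really is extremely unlikely, so hint it. */
int read_block(const struct fake_disk *d, char *out, size_t len)
{
    if (unlikely(d->failed))
        return -1;
    if (len > sizeof(d->data))
        len = sizeof(d->data);
    memcpy(out, d->data, len);
    return 0;
}

/*
 * A miss here is an ordinary outcome (think of a library search probing
 * many directories), so the branch is deliberately left without a hint.
 */
int find_entry(const char *const *names, size_t n, const char *key)
{
    for (size_t i = 0; i < n; i++) {
        if (strcmp(names[i], key) == 0)
            return (int)i;
    }
    return -1;
}
```

Whether a hint like the one in read_block() pays off on a given CPU is something only measurement can settle; the sketch shows where a hint is defensible, not that it is a win.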
Re: [00/41] Large Blocksize Support V7 (adds memmap support)
On Sep 23, 2007, at 02:22:12, Goswin von Brederlow wrote:
[EMAIL PROTECTED] (Mel Gorman) writes:
On (16/09/07 23:58), Goswin von Brederlow didst pronounce:

But when you already have say 10% of the ram in mixed groups then it is a sign the external fragmentation happens and some time should be spent on moving movable objects.

I'll play around with it on the side and see what sort of results I get. I won't be pushing anything any time soon in relation to this though. For now, I don't intend to fiddle more with grouping pages by mobility for something that may or may not be of benefit to a feature that hasn't been widely tested with what exists today.

I watched the videos you posted. A nice and quite clear improvement with and without your logic. Kudos. When you play around with it may I suggest a change to the display of the memory information. I think it would be valuable to use a Hilbert Curve to arrange the pages into pixels. Like this:

    # #      0 3
    # #
    ###      1 2

    ### ###  0 1 E F
      # #
    ### ###  3 2 D C
    #     #
    # ### #  4 7 8 B
    # # # #
    ### ###  5 6 9 A

Here's an excellent example of an 0-255 numbered hilbert curve used to enumerate the various top-level allocations of IPv4 space: http://xkcd.com/195/

Cheers, Kyle Moffett
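
For anyone wanting to try the suggested display, the index-to-pixel mapping can be sketched as below. This is the widely used iterative Hilbert-curve conversion, not code from the fragmentation-visualisation tool discussed in the thread; the function names are made up for this note, x is taken to be the column and y the row (counting down from the top), and the grid side n is assumed to be a power of two. With those conventions it reproduces the 2x2 and 4x4 numberings drawn in the message.

```c
/* Rotate/flip a quadrant so the sub-curve inside it is oriented correctly. */
static void hilbert_rot(unsigned s, unsigned *x, unsigned *y,
                        unsigned rx, unsigned ry)
{
    if (ry == 0) {
        if (rx == 1) {
            *x = s - 1 - *x;
            *y = s - 1 - *y;
        }
        /* Swap x and y. */
        unsigned t = *x;
        *x = *y;
        *y = t;
    }
}

/*
 * Map curve index d (0 .. n*n-1) to column *x and row *y on an n-by-n
 * grid. n must be a power of two.
 */
void hilbert_d2xy(unsigned n, unsigned d, unsigned *x, unsigned *y)
{
    unsigned t = d;

    *x = 0;
    *y = 0;
    for (unsigned s = 1; s < n; s *= 2) {
        unsigned rx = 1 & (t / 2);
        unsigned ry = 1 & (t ^ rx);

        hilbert_rot(s, x, y, rx, ry);
        *x += s * rx;
        *y += s * ry;
        t /= 4;
    }
}
```

Feeding page frame numbers through hilbert_d2xy() keeps physically adjacent pages close together on screen, which is the property that makes the layout attractive for spotting fragmentation.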
Re: Distributed storage. Move away from char device ioctls.
-accessed inode objects and creates non-fragmented copies before deleting the old ones. There's a lot of other technical details which would need resolution in an actual implementation, but this is enough of a summary to give you the gist of the concept. Most likely there will be some major flaw which makes it impossible to produce reliably, but the concept contains the things I would be interested in for a real networked filesystem.

Cheers, Kyle Moffett
Re: Adding a security parameter to VFS functions
On Aug 16, 2007, at 18:57:24, Linus Torvalds wrote:
On Wed, 15 Aug 2007, David Howells wrote:

Would you object greatly to functions like vfs_mkdir() gaining a security parameter? What I'm thinking of is this:

    int vfs_mkdir(struct inode *dir, struct dentry *dentry, int mode, struct security *security)

I personally consider this an affront to everything that is decent. Why the *hell* would mkdir() be so magical as to need something like that?

Not speaking directly for David, but I believe the reason is for background kernel code which needs to do filesystem access during a thread's execution with *completely* different security context from that of the thread. Examples should be reasonably obvious; kNFSd is one, but it also includes anything where the kernel would poke directly into the filesystem, such as network filesystem cachefiles.

Make it something sane, like a struct nameidata instead, and make it at least try to look like the path creation that is done by open(). Or create a struct file * or something. I can imagine having mkdir() being passed similar data as open() (ie lookup()), but I cannot _possibly_ imagine it ever being valid to pass in something totally made-up to just mkdir(), and nothing else. There's something fundamentally wrong there.

I would offer the suggestion of using the described struct security in-place in the task struct, in place of using all those fields individually. That would be, in effect, the default security context for any given task, if NULL is passed to the appropriate vfs function. For CacheFiles and kNFSd, they could each allocate their own during initialization or new-connection and pass that to any mkdir(), etc that they do on behalf of a given client.

What makes mkdir() so magical? Also, what about all the other ops? Why is mkdir() special, but not mknod()? Why is mkdir() special, but not rmdir()? Really, none of this seems to make any sense unless you describe what is so magical about mkdir().

I think mkdir() was just an example he was using, probably because it was the first VFS call he needed to set a security context on. Next would come anything which CacheFiles or NFSd call on the underlying filesystem.

Cheers, Kyle Moffett
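
The "NULL means use the task's own context" idea from this message can be sketched in a few lines of C. None of this is kernel code: struct security and its fields, struct task, and effective_security() are all placeholders invented here to show the shape of the fallback, under the assumption that the per-task credentials have been gathered into one embedded structure.

```c
#include <stddef.h>

/* Hypothetical bundle of the per-task credential fields. */
struct security {
    unsigned int fsuid;
    unsigned int fsgid;
};

/* Stand-in for the task struct, with the bundle embedded in place. */
struct task {
    struct security sec;
};

/*
 * Pick the context a VFS operation should run under: the explicit one
 * if the caller (say, a file server acting for a client) supplied it,
 * otherwise the calling task's own default.
 */
const struct security *effective_security(const struct task *tsk,
                                          const struct security *override)
{
    return override != NULL ? override : &tsk->sec;
}
```

Ordinary callers would pass NULL and see no change in behaviour; only code acting on behalf of someone else would allocate and pass its own context.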
Re: [AppArmor 00/44] AppArmor security module overview
On Jun 26, 2007, at 22:24:03, John Johansen wrote:

other issues that have been raised are:
- the use of d_path to generate the pathname used for mediation when a file is opened.
- Generating the pathname using a reverse walk is considered ugly

A little more than ugly. In this basic concurrent rename() and path-lookup load:

    mkdir -p /a/b/0
    mkdir -p /a/b/2
    mkdir -p /c
    touch /a/b/0/1
    cd /a/b
    while true; do mv 0/1 2/3; mv 2/3 0/1; done
    cd /
    while true; do mv a/b c/d; mv c/d a/b; done
    while true; do cat a/b/0/1; done
    while true; do cat a/b/2/3; done
    while true; do cat c/d/0/1; done
    while true; do cat c/d/2/3; done

I seem to recall you could actually end up racing and building a path to the file in those directories as a/d/0/3 or some other path at which it never even remotely existed. I'd love to be wrong, but I can't help but see this problem in any reverse-pathname-generation proposal which gets the locking right.

Cheers, Kyle Moffett
Re: [RFC] fsblock
On Jun 26, 2007, at 07:14:14, Nick Piggin wrote:
On Tue, Jun 26, 2007 at 07:23:09PM +1000, David Chinner wrote:

Can we call it a block mapping layer or something like that? e.g. struct blkmap?

I'm not fixed on fsblock, but blkmap doesn't grab me either. It is a map from the pagecache to the block layer, but blkmap sounds like it is a map from the block to somewhere. fsblkmap ;)

vmblock? pgblock?

Cheers, Kyle Moffett
Re: Versioning file system
On Jun 19, 2007, at 03:58:57, Bron Gondwana wrote: On Mon, Jun 18, 2007 at 11:10:42PM -0400, Kyle Moffett wrote: On Jun 18, 2007, at 13:56:05, Bryan Henderson wrote: The question remains is where to implement versioning: directly in individual filesystems or in the vfs code so all filesystems can use it? Or not in the kernel at all. I've been doing versioning of the types I described for years with user space code and I don't remember feeling that I compromised in order not to involve the kernel. What I think would be particularly interesting in this domain is something similar in concept to GIT, except in a file-system: [...snip...] It can work, but there's one big pain at the file level: no mmap. IMHO it's actually not that bad. The gitfs would divide larger files up into manageable chunks (say 4MB) which could be quickly SHA-1ed. When a file is mmapped and partially modified, the SHA-1 would be marked as locally invalid, but since mmap() loses most consistency guarantees that's OK. A time or writeout based commit scheme might still freeze, SHA-1, and write-out the page at regular intervals without the program's knowledge, but since you only have to SHA-1 the relatively-small 4MB chunk (which is about to hit disk anyways), it's not a significant time penalty. Even if under memory pressure and swapping data out to disk you don't have to update the SHA-1 and create a new commit as long as you keep a reference to the object stored in the volume header somewhere and maintain the SHA-1 out-of-date bit. A program which carefully uses msync() would be fine, of course (with proper configuration) as that would create a new commit as appropriate. Since mmap() is poorly defined on network filesystems in the absence of msync(), I don't see that such behaviour would be a problem. And it certainly would be fine on local filesystems as there you can just stuff the SHA-1 out-of-date bit and a reference to the parent commit and path in the object itself. 
Then you just need to keep a useful reference to that object in a table somewhere in the volume and you're set.

If you don't want to support mmap it can work reasonably happily, though you may want to keep your sha1 (or other digest) state as well as the final digest so you can cheaply calculate the digest for a small append without walking the entire file. You may also want to keep state checkpoints every so often along a big file so that truncates don't cost too much to recalculate.

That may be worth it even if the file is divided into 4MB chunks (or other configurable value), but it would need benchmarking.

Luckily in a userspace VFS that's only accessed via FTP and DAV we can support a limited set of operations (basically create, append, read, delete). You don't get that luxury for a general purpose filesystem, and that's the problem. There will always be particular usage patterns (especially something that mmaps or seeks and touches all over the place like a loopback mounted filesystem or a database file) that just don't work for file-level sha1s.

I'd think that loopback-mounted filesystems wouldn't be that difficult:
1) Set the SHA-1 block size appropriately to divide the big file into a bunch of little manageable files. Could conceivably be multi-layered like directories, depending on the size of the file.
2) Mark the file as exempt from normal commits (IE: without special syscalls or fsync/msync() on the file itself, it is never updated in the tree objects).
3) Set up the loopback device to call the gitfs commit code when it receives barriers or flushes from the parent filesystem.

And database files aren't a big issue. I have yet to see a networked filesystem on which you could stick a MySQL database from one node and expect to get useful/recent read results from other nodes. If you really wanted something like that for such a gitfs, you could just add code to MySQL to create a gitfs commit every N transactions and not otherwise.
The best part is: that would make online MySQL backups from another node trivial! Just pick any arbitrary appropriate commit object and mount that object, then cp -a mysql_db_dir mysql_backup_dir. That's not to say it wouldn't have a performance penalty, but for some people the performance penalty might be worth it. Oh, and for those programs which want multi-master replication, this makes it ten times easier: 1) Put each master-server on a different gitfs branch 2) Write your program as gitfs aware. Make it create gitfs commits at appropriate times (so the data is accessible from other nodes). 3) Come up with a useful non-interactive database-file merge algorithm. Useful examples of different kinds of merge engines may be found in the git project. This should take $BASE_VERSION, $NEWVERSION1, $NEWVERSION2, and produce a $MERGEDVERSION. A good algorithm
Re: Versioning file system
erased object you would use a History archived object with a little bit of string data to indicate which volume it's stored on (and where on the volume). When you stick that volume into the system you could easily tell the kernel to use it as an alternate for the given storage group.

Q. What enforces data integrity?
A. Ensure that a new tree object and its associated sub objects are on disk before you delete the old one. Doesn't need any actual full syncs at all, just barriers. If you replace the tree object before write-out is complete then just skip writing the old one and write the new one in its place.

Q. What consists of a commit?
A. Anything the administrator wants to define it as. Useful algorithms include: Once per x Mbyte of page dirtying, Once per 5 min, Only when sync() or fsync() are called, Only when gitfs-commit is called. You could even combine them: Every x Mbyte of page dirtying or every 5 minutes, whichever is shorter (or longer, depending on admin requirements). There would also be appropriate syscalls to trigger appropriate git-like behavior. Network-accessible gitfs would want to have mechanisms to trigger commits based on activity on other systems (needs more thought).

Q. How do you access old versions?
A. Mount another instance of the filesystem with an SHA-1 ID, a tag-name, or a branch-name in a special mount option. Should be user accessible with some restrictions (needs more thought).

Q. How do you deal with conflicts on networked filesystems?
A. Once again, however the administrator wants to deal with them. Options:
1) Forcibly create a new branch for the conflicted tree.
2) Attempt to merge changes using the standard git-merge semantics
3) Merge independent changes to different files and pick one for changes to the same file
4) Your Algorithm Here(TM). GIT makes it easy to extend conflict-resolution.

Q. How do you deal with little scattered changes in big (or sparse) files?
A. Two questions, two answers: For sparse files, git would need extending to understand (and hash) the nature of the sparse-ness. For big files, you should be able to introduce a compound-file datatype and configure git to deal with specific X-Mbyte chunks of it independently. This might not be a bad idea for native git as well. Would need system-specific configuration.

Q. How do you prevent massive data consumption by spurious tiny changes?
A. You have a few options:
1) Configure your commit algorithm as above to not commit so often
2) Configure a stepped commit-discard algorithm as described above in the How do you delete things question
3) Archive unused data to secondary storage more often

Q. What about all the unanswered questions?
A. These are all the ones I could think of off the top of my head but there are at least a hundred more. I'm pretty sure these are some of the most significant ones.

Q. That's a great idea and I'll implement it right away!
A. Yay! (but that's not a question :-D) Good luck and happy hacking.

Q. That's a stupid idea and would never ever work!
A. Thanks for your useful input! (but that's not a question either) I'm sure anybody who takes up a project like this will consider such opinions.

Q. *flamage*
A. I'm glad you have such strong opinions, feel free to continue to spam my /dev/null device (and that's also not a question).

All opinions and comments welcomed.

Cheers, Kyle Moffett
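
The combined commit trigger described in the "What consists of a commit?" answer (every x Mbyte of dirtying or every 5 minutes, whichever is shorter or longer) could be sketched roughly as follows. This is only an illustration of the decision logic: no gitfs exists, and the structure names, field names, and the require_both switch are all assumptions made up here.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical administrator-chosen policy. */
struct commit_policy {
    uint64_t dirty_limit_bytes; /* "once per x Mbyte of page dirtying" */
    uint64_t interval_secs;     /* "once per 5 min" would be 300 */
    bool require_both;          /* false: whichever comes first ("shorter");
                                   true: wait for both ("longer") */
};

/* Hypothetical per-volume bookkeeping since the last commit. */
struct commit_state {
    uint64_t dirty_bytes;
    uint64_t last_commit_secs;
};

/* Decide whether a new commit should be created at time now_secs. */
bool should_commit(const struct commit_policy *p,
                   const struct commit_state *s, uint64_t now_secs)
{
    bool size_hit = s->dirty_bytes >= p->dirty_limit_bytes;
    bool time_hit = now_secs - s->last_commit_secs >= p->interval_secs;

    return p->require_both ? (size_hit && time_hit)
                           : (size_hit || time_hit);
}
```

Explicit triggers such as sync(), fsync() or a dedicated commit call would simply bypass this check and commit unconditionally.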
Re: Versioning file system
On Jun 18, 2007, at 17:24:23, Brad Boyer wrote:
On Tue, Jun 19, 2007 at 12:26:57AM +0200, Jörn Engel wrote:

Pointless here means that _I_ don't see the point. Maybe there are valid uses for extended attributes. If there are, no one has explained them to me yet.

The users of extended attributes that I've dealt with are ACL support and SELinux. These both use extended attributes under the covers. It's just not immediately obvious if you aren't looking.

Yeah, extended attributes are typically used for exactly that: attributes like labels, permissions, encoding, cached file-type, DOS/Windows/Mac metadata, etc. Sometimes people suggest sticking icons in there, but that's probably a bad idea. At most stick an icon label attribute which refers to a file /usr/share/icons/by_attr/$ICON_LABEL.png. If you're trying to put more than 256 bytes of data in an extended attribute then you're probably doing something wrong. They're very good for cached attributes (like file-type) where you don't care if the data is lost by tar, and they're reasonable for security-related attributes where you don't want attribute-unaware programs trying to save and restore them (like SELinux labels).

Cheers, Kyle Moffett
Re: [AppArmor 39/45] AppArmor: Profile loading and manipulation, pathname matching
On Jun 09, 2007, at 01:18:40, [EMAIL PROTECTED] wrote:

SELinux is like a default allow IPS system, you have to describe EVERYTHING to the system so that it knows what to allow and what to stop.

WRONG. You clearly don't understand SELinux at all. Try booting in enforcing mode with an empty policy file (well, not quite empty, there are a few mandatory labels you have to create before it's a valid policy file). /sbin/init will load the initial policy, attempt to re-exec() itself... and promptly grind to a halt. End-of-story.

Typical targeted policies leave all user logins as unrestricted, adding security for daemons but not getting in the way of users who would otherwise turn SELinux off. On the other hand, a targeted policy has a trusted type for user logins which is explicitly allowed access to everything.

That said, if you actually want your system to *work* with any default-deny policy then you have to describe EVERYTHING anyways. How exactly do you expect AppArmor to work if you don't allow users to run /bin/passwd, for example?

Cheers, Kyle Moffett
Re: [AppArmor 39/45] AppArmor: Profile loading and manipulation, pathname matching
On Jun 09, 2007, at 12:46:40, [EMAIL PROTECTED] wrote: On Sat, 9 Jun 2007, Kyle Moffett wrote: Typical targetted policies leave all user logins as unrestricted, adding security for daemons but not getting in the way of users who would otherwise turn SELinux off. On the other hand, a targeted policy has a trusted type for user logins which is explicitly allowed access to everything. Ok, it sounds as if I did misunderstand SELinux. I thought that by labeling the individual files you couldn't do the 'only restrict apache' type of thing. That said, if you actually want your system to *work* with any default-deny policy then you have to describe EVERYTHING anyways. How exactly do you expect AppArmor to work if you don't allow users to run /bin/passwd, for example. for AA you don't try to define permissions for every executable, and ones that you don't define policy are unrestricted. so as I understand this with SELinux you will have lots of labels around your system (more as you lock down the system more) you need to define policy so that your unrestricted users must have access to every label, and every time you create a new label you need to go back to all your policies to see if the new label needs to be allowed from that policy Actually, it's easier than that. There are type attributes which may be assigned to an arbitrary set of types, and each type field in an access rule may use either a type or an attribute. So you don't actually need to modify existing rules when adding new types, you just add the appropriate existing attributes to your new type. For example, you could set up a logfile attribute which allows logrotate to archive old versions and allows audit-admin users to modify/delete them, then whenever you need to add a new logfile you just declare the my_foo_log_t type to have the logfile attribute. 
On the other hand, I seem to recall that typical targeted policies don't grant most of the additional access via access rules, they instead add a special case to the fundamental constraints in the policy (IE: If the subject type has the trusted attribute then skip some of the other type-based checks).

Cheers, Kyle Moffett
Re: [AppArmor 39/45] AppArmor: Profile loading and manipulation, pathname matching
On Jun 09, 2007, at 13:32:05, [EMAIL PROTECTED] wrote: On Sat, 9 Jun 2007, Kyle Moffett wrote: On Jun 09, 2007, at 12:46:40, [EMAIL PROTECTED] wrote: so as I understand this with SELinux you will have lots of labels around your system (more as you lock down the system more) you need to define policy so that your unrestricted users must have access to every label, and every time you create a new label you need to go back to all your policies to see if the new label needs to be allowed from that policy Actually, it's easier than that. There are type attributes which may be assigned to an arbitrary set of types, and each type field in an access rule may use either a type or an attribute. So you don't actually need to modify existing rules when adding new types, you just add the appropriate existing attributes to your new type. For example, you could set up a logfile attribute which allows logrotate to archive old versions and allows audit- admin users to modify/delete them, then whenever you need to add a new logfile you just declare the my_foo_log_t type to have the logfile attribute. isn't this just the flip side of the same problem? every time you define a new attribute you need to go through all the files and decide if the new attribute needs to be given to that file. No you don't, you can add attributes to a type after-the-fact. In concept this problem is very similar to programming: You have various documented interfaces used by different policy files to interact with each other. As long as your policy files conform to the documented interfaces then you *DONT* have to manually inspect each file because you can make basic assumptions. On the other hand, when you break that interface contract you will get very unexpected results. For the above example: My syslog policy file would create a logfile attribute and types for /var/log/auth/auth.log, /var/log/kern/kern.log, and /var/log/ messages. 
It would also create a logdaemon attribute which has automatic type transitions to create files in different /var/log/* directories. Finally, it would allow the syslogd type to create and append to its specific file types for auth.log, kern.log, and messages.

My logrotate policy file would depend on the syslog policy and would declare the logrotate daemon type as a logdaemon, and additionally allow logrotate to read, rename, append, and delete logfile types. Since logrotate is a logdaemon, it already has the appropriate type transitions for new types.

My samba policy file would depend on the syslog policy and would declare the samba daemon type as a logdaemon and the /var/log/samba/* type as a logfile. Then it would add a type transition rule so when logdaemon creates new files in samba_log_dir_t, they have the appropriate samba_log_t label. Finally, samba would allow itself to append to samba_log_t files.

Note that now when logrotate runs and rotates files in /var/log/samba, it will automatically create the new files with type samba_log_t, even though there are no *direct* associations between those types.

If the syslog policy file was poorly written it could seriously adversely affect the security of the system, but hopefully that's obvious :-D. Policy development is _hard_, it's a whole separate state-machine and pseudo-programming-language that should mostly be left to security professionals or very experienced developers/sysadmins.

Cheers, Kyle Moffett
Re: [AppArmor 01/41] Pass struct vfsmount to the inode_create LSM hook
On May 28, 2007, at 06:41:11, Toshiharu Harada wrote: 2007/5/27, Kyle Moffett [EMAIL PROTECTED]: If you can't properly manage your labels, then how do you expect any security at all? Please read my message again. I didn't say, This can never be achieved. I said, This can not be easily achieved. So you said (data labels) can not be easily achieved. My question for you is: How do you manage secure UNIX systems without standard UNIX permission bits? Also: If you have problems with data labels then what makes pathname based labels easier? If there is something that could be done to improve SELinux and make it more readily configurable then it should probably be done. Permission bits can be checked easily with ls command but assuring the correctness of labels are not that easy. I'll try to explain. The correctness of the permission bit for a given file can be judged solely by the result of ls command. The correctness of the label, on the other hand, can't be judged without understanding of whole policy including domain transitions. (see the attached figure) I can imagine that once one get the complete SELinux policy, then it is able to modify and maintain it. That's why there are a number of efforts to make modular SELinux policies. A good SELinux policy provides a few core system types and labels which a policy developer needs to understand, as well as some good macros to simplify the human-editable policy files. For instance, in my customized policy a daemon run by an initscript which reads a single config file in /etc needs this policy (Note that I use _d as a suffix for process domains instead of the usual _t): initrc_daemon(foo_exec_t, foo_d) daemon_config(foo_d, foo_conf_t) Add maybe 2 lines for network port access, another 2 for database files in /var, plus maybe an iptables rule or two in your firewall file. I don't say making a complete SELinux policy is impossible, and actually you said you did it. 
But to be frank, I don't think you are the average level user at all. ;-) Average users are not supposed to be writing security policy. To be honest, even average-level system administrators should not be writing security policy. It's OK for such sysadmins to tweak existing policy to give access to additional web-docs or such, but only expert sysadmin/developers or security professionals should be writing security policy. It's just too damn easy to get completely wrong. I'm very interested in how you can know that you have the correct object labeling (this is my point). Could you tell? I know that I have the correct object labeling because: Do you mind if I add this? 0) I understood the default policy and perfectly understand the every behavior of my system. this is where the difficulties exist. You don't have to understand the entire default policy; that's the point of modular policy. You only have to understand how to _use_ the interfaces of the system policy (which are documented) and how the particular daemon policy is supposed to work. The people developing the core system policy need to understand the inner workings of said policy, but they don't need to understand how the rest of the system works. The core functionality behind this separation is macro interfaces and attributes. By grouping types with attributes it is possible for arbitrary daemon types to categorize themselves under access rules defined by the base policy, and with interfaces the daemons don't really even need to know what those attributes are called. I don't deny DAC at all. If we deny DAC, we can't live with Linux it's the base. MAC can be used to cover the shortages of DAC and Linux's simple user model, that's it. From security point of view, simplicity is always the virtue and the way to go. Inode combined label is guaranteed to be a single at any point time. This is the most noticeable advantage of label-based security. 
I would argue that pathname-based security breaks the simplicity is the best virtue (of a security system) paradigm, because it attributes multiple potentially-conflicting labels to the same piece I have a question for you. With current implementation of SELinux, only one label can be assigned. But there are cases that one object can be used in different context, so I think it might help if SELinux would allow objects to have multiple labels. (I'm not talking about conflicts here) What do you think? This is the whole advantage of SELinux type attributes: you can define a type var_foo_t which has a specific list of attributes; rules which accept type specifiers can also accept attribute specifiers as well. If what you want is a label which may be accessed in two different ways, then you declare attributes for each access method and declare a type which has the attributes filetype, access1, and access2 (assuming access1 and access2
Re: [AppArmor 01/41] Pass struct vfsmount to the inode_create LSM hook
On May 28, 2007, at 16:38:38, Pavel Machek wrote:
Kyle Moffett wrote:

I am of the opinion that adding a name parameter to the file/directory create actions would be useful. For example, with such support you could actually specify a type-transition rule conditional on a specific name or substring:

    named_type_transition sshd_t tmp_t:sock_file prefix ssh- ssh_sock_t;

Useful options for matching would be prefix, suffix, substr (start,len). regex would be nice but is sorta computationally intensive and would be likely to cause more problems than it solves.

Could someone implement this? AFAICT that prevents SELinux from being superset of AppArmor... Doing this should be significantly simpler than whole AA, and hopefully it will end up less ugly, too.

Really it would need to extend all action-match items with new named_ equivalents, and most callbacks would need to be extended to pass in an object name, if available. On the other hand, with such support implemented then the AppArmor policy compilation tools could be transformed into a simple SELinux policy generator. I estimate that the number of new lines of kernel code for such a modified SELinux would be 100x less than the kernel code in AppArmor.

Cheers, Kyle Moffett
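
The name-matching half of the proposed named_type_transition rule is cheap to implement, which is part of the argument above. A rough sketch in C follows; it is not SELinux code, the type and function names are invented, and the reading of substr as "the pattern must appear at byte offset start" is just one plausible interpretation of substr (start,len), with len implied by the pattern length.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical match kinds, mirroring prefix / suffix / substr. */
enum name_match_kind { MATCH_PREFIX, MATCH_SUFFIX, MATCH_SUBSTR };

struct name_match {
    enum name_match_kind kind;
    const char *pattern;
    size_t start;   /* byte offset, used only by MATCH_SUBSTR */
};

/* Return true if the final path component 'name' satisfies the rule. */
bool name_matches(const struct name_match *m, const char *name)
{
    size_t nlen = strlen(name);
    size_t plen = strlen(m->pattern);

    switch (m->kind) {
    case MATCH_PREFIX:
        return nlen >= plen && memcmp(name, m->pattern, plen) == 0;
    case MATCH_SUFFIX:
        return nlen >= plen &&
               memcmp(name + nlen - plen, m->pattern, plen) == 0;
    case MATCH_SUBSTR:
        return m->start <= nlen && nlen - m->start >= plen &&
               memcmp(name + m->start, m->pattern, plen) == 0;
    }
    return false;
}
```

Each check is a bounded memcmp() on a single path component, with no backtracking and no allocation, which is why these forms look far more attractive for an in-kernel hook than regex matching.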
Re: [AppArmor 01/41] Pass struct vfsmount to the inode_create LSM hook
CC trimmed to remove a few poor overloaded inboxes from this tangent.

On May 27, 2007, at 04:34:10, Cliffe wrote:
> Kyle wrote:
>> On the other hand, if you actually want to protect the _data_, then tagging the _name_ is flawed; tag the *DATA* instead.
>
> Would it make sense to label the data (resource) with a list of paths (names) that can be used to access it?
>
> Therefore the data would be protected against being accessed via alternative arbitrary names. This may be a simple label to maintain and (possibly to) enforce, allowing path based confinement to protect a resource. This may allow for the benefits of pathname based confinement while avoiding some of its problems.

The primary problem with that is that "mv somefile otherfile" must change the labels, which means that every process that issues a rename() syscall needs to have special handling of labels. The other problem is that many of the features and capabilities of SELinux get left by the wayside. On an SELinux system 90% of the programs don't need to be modified to understand labels, since the policy can define automatic label transitions. SELinux also allows you to have conditional label privileges based on boolean variables, something that cannot be done if the privileges themselves are stored in the filesystem. Finally, such an approach does not allow you to differentiate between programs.

Cheers,
Kyle Moffett
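The rename objection can be made concrete with a toy model. This is a hedged Python sketch, not any real LSM interface: `LabeledFile`, `may_access` and `rename` are hypothetical names, and the "namespace" is just a dictionary from path to file object.

```python
class LabeledFile:
    """File data carrying a label that lists the paths allowed to reach it."""

    def __init__(self, data, allowed_paths):
        self.data = data
        self.allowed_paths = set(allowed_paths)


def may_access(namespace, path):
    """Allow access only if the path resolves and the file's label lists that path."""
    f = namespace.get(path)
    return f is not None and path in f.allowed_paths


def rename(namespace, old, new, fix_label=False):
    """Move a file to a new name; optionally rewrite its path-list label as well."""
    f = namespace.pop(old)
    namespace[new] = f
    if fix_label:
        f.allowed_paths.discard(old)
        f.allowed_paths.add(new)


ns = {"somefile": LabeledFile(b"secret", ["somefile"])}
rename(ns, "somefile", "otherfile")      # a plain mv that knows nothing about labels
print(may_access(ns, "otherfile"))       # the label still names the old path

ns = {"somefile": LabeledFile(b"secret", ["somefile"])}
rename(ns, "somefile", "otherfile", fix_label=True)
print(may_access(ns, "otherfile"))       # access survives only with the extra label work
```

In this model the label goes stale on every rename unless the renaming code takes on the extra `fix_label` step, which is the special handling the reply objects to.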
Re: [AppArmor 01/41] Pass struct vfsmount to the inode_create LSM hook
On May 27, 2007, at 03:25:27, Toshiharu Harada wrote:
> 2007/5/27, Kyle Moffett [EMAIL PROTECTED]:
>> On May 26, 2007, at 19:08:56, Toshiharu Harada wrote:
>>> 2007/5/27, James Morris [EMAIL PROTECTED]:
>>>> On Sat, 26 May 2007, Kyle Moffett wrote:
>>>>> AppArmor). On the other hand, if you actually want to protect the _data_, then tagging the _name_ is flawed; tag the *DATA* instead.
>>>>
>>>> Bingo. (This is how traditional Unix DAC has always functioned, and is what SELinux does: object labeling).
>>>
>>> Object labeling (or labeled security) looks simple and straight forward way, but it's not.
>>>
>>> (1) Object labeling has a assumption that labels are always properly defined and maintained. This can not be easily achieved.
>>
>> That's a circular argument, and a fairly trivial one at that.
>
> Sorry Kyle, I don't think it's a trivial one. The opposite.

How is that argument not trivially circular? "Foo has an assumption that foo-property is always properly defined and maintained." That could be said about *anything*:
  * Unix permissions have an assumption that mode bits are always properly defined and maintained
  * Apache .htaccess security has an assumption that .htaccess files are always properly defined and maintained.
  * Functional email communication has an assumption that the email servers are always properly defined and maintained

>> If you can't properly manage your labels, then how do you expect any security at all?
>
> Please read my message again. I didn't say, "This can never be achieved". I said, "This can not be easily achieved".

So you said (data labels) "can not be easily achieved". My question for you is: How do you manage secure UNIX systems without standard UNIX permission bits? Also: If you have problems with data labels then what makes pathname based labels easier? If there is something that could be done to improve SELinux and make it more readily configurable then it should probably be done.

>> If you can't achieve the first with reasonable security, then you probably can't achieve the second either.
>> Also, if you can't manage correct object labeling then I'm very interested in how you are maintaining secure Linux systems without standard DAC.
>
> I'm very interested in how you can know that you have the correct object labeling (this is my point). Could you tell?

I know that I have the correct object labeling because:
  1) I rewrote/modified the default policy to be extremely strict on the system where I wanted the extra security and hassle.
  2) I ensured that the type transitions were in place for almost everything that needed to be done to administer the system.
  3) I wrote a file-contexts file and relabeled *once*
  4) I loaded the customized policy plus policy for restorecon and relabeled for the last time
  5) I reloaded the customized policy without restorecon privileges and without the ability to reload the policy again.
  6) I never reboot the system without enforcing mode.
  7) If there are unexpected errors or files have incorrect labels, I have to get the security auditor to log in on the affected system and relabel the problematic files manually (rare occurrence which requires excessive amounts of paperwork).

>>> (2) Also, assigning a label is something like inventing and assigning a *new* name (label name) to objects which can cause flaws.
>>
>> I don't understand how assigning new attributes to objects can cause flaws, nor what flaws those might be; could you elaborate further? In particular, I don't see how this is really all that more complicated than defining additional access control in apache .htaccess files. The principle is the same: by stacking multiple independent security-verification mechanisms (Classical UNIX DAC and Apache permissions) you can increase security, albeit at an increased management cost. You might also note that .htaccess files are yet another form of successful label-based security; the security context for a directory depends on the .htaccess label file found within.
>> The *exact* same principles apply to SELinux: you add additional attributes backed by a simple and powerful state-machine. The cross-checks are lower-level than those from .htaccess files, but the principles are the same.
>
> I don't deny DAC at all. If we deny DAC, we can't live with Linux it's the base. MAC can be used to cover the shortages of DAC and Linux's simple user model, that's it.
>
> From security point of view, simplicity is always the virtue and the way to go. Inode combined label is guaranteed to be a single at any point time. This is the most noticeable advantage of label-based security.

I would argue that pathname-based security breaks the "simplicity is the best virtue (of a security system)" paradigm, because it attributes multiple potentially-conflicting labels to the same piece of data. It also cannot protect the secrecy of specific *data* as well as SELinux can. For example
Re: Pass struct vfsmount to the inode_create LSM hook
On May 26, 2007, at 10:44:46, Tetsuo Handa wrote:
> Andreas Gruenbacher wrote:
>> Tetsuo Handa wrote:
>>> Therefore, TOMOYO Linux checks the combination of filename and argv[0] passed to execve().
>>
>> So you are indeed trying to control the value of argv[0]? Well, good luck with that, but it's totally insane. You are guaranteed to break some applications.
>
> TOMOYO Linux restricts argv[0] using allow_argv0 syntax.
>
>   "allow_argv0 /bin/bash -bash" to allow passing "/bin/bash" to filename and "-bash" to argv[0].
>   "allow_argv0 /bin/gzip gunzip" to allow passing "/bin/gzip" to filename and "gunzip" to argv[0].
>   "allow_argv0 /sbin/busybox cat" to allow passing "/sbin/busybox" to filename and "cat" to argv[0].
>
> No need to use allow_argv0 syntax if the basename of filename and basename of argv[0] are the same (i.e. "allow_argv0 /bin/bash bash" is not required).
>
> TOMOYO Linux doesn't unconditionally forbid passing different values for filename and argv[0]. TOMOYO Linux allows passing different values for filename and argv[0] only if it is allowed by allow_argv0 syntax.
>
> Could you please explain me why this approach breaks applications?

One of my servers runs 3 different instances of the kadmind Kerberos daemon, one for each realm which I need to be able to modify/change-passwords/etc. In order to differentiate and stop/restart the appropriate daemon, I have a simple starter script which runs each kadmind process with a unique name derived from the realm (EG: "kadmind(EXAMPLE.COM)", "kadmind(OTHER.EXAMPLE.COM)"). Since this is a Kerberos server I use a very strict SELinux-based policy, yet my management tools need to be able to easily add and remove realms in a secure fashion.

It sounds like TOMOYO Linux would not be able to handle this situation at all; I would either have to completely turn off that security feature and lose most of the functionality of TOMOYO Linux, or hard-code the list of realms into the policy file and have to completely reload policy every time I need to add/remove realms (big gaping security hole).

Cheers,
Kyle Moffett
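The rule as Tetsuo describes it in this thread, and the kadmind case that trips over it, can be sketched as follows. This is a minimal Python model of the description only, not TOMOYO Linux's implementation: `argv0_permitted` is a hypothetical name, the rule set is modelled as a set of (filename, argv[0]) pairs, and `/usr/sbin/kadmind` is an assumed install path that the thread does not give.

```python
import posixpath


def argv0_permitted(filename, argv0, allow_argv0):
    """Model of the described check on the (filename, argv[0]) pair given to execve().

    allow_argv0 is a set of (filename, name) pairs, one per "allow_argv0" line.
    """
    exe_name = posixpath.basename(filename)
    claimed = posixpath.basename(argv0)
    if claimed == exe_name:
        # Matching basenames need no explicit rule.
        return True
    # Any other combination must be listed explicitly.
    return (filename, claimed) in allow_argv0


rules = {
    ("/bin/bash", "-bash"),
    ("/bin/gzip", "gunzip"),
    ("/sbin/busybox", "cat"),
}

print(argv0_permitted("/bin/bash", "bash", rules))
print(argv0_permitted("/bin/gzip", "gunzip", rules))
# Per-realm process names are not in the static list, so they are rejected.
print(argv0_permitted("/usr/sbin/kadmind", "kadmind(EXAMPLE.COM)", rules))
```

Under this model every new realm needs a new pair in the rule set, which is the policy-reload cost described in the reply.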
Re: [AppArmor 01/41] Pass struct vfsmount to the inode_create LSM hook
On May 26, 2007, at 19:08:56, Toshiharu Harada wrote:
> 2007/5/27, James Morris [EMAIL PROTECTED]:
>> On Sat, 26 May 2007, Kyle Moffett wrote:
>>> AppArmor). On the other hand, if you actually want to protect the _data_, then tagging the _name_ is flawed; tag the *DATA* instead.
>>
>> Bingo. (This is how traditional Unix DAC has always functioned, and is what SELinux does: object labeling).
>
> Object labeling (or labeled security) looks simple and straight forward way, but it's not.
>
> (1) Object labeling has a assumption that labels are always properly defined and maintained. This can not be easily achieved.

That's a circular argument, and a fairly trivial one at that. If you can't properly manage your labels, then how do you expect any security at all? If you can't manage your labels, then pathname-based security won't work either. This is analogous to saying "Pathname-based security has an assumption that path-permissions are always properly defined and maintained", which is equally obvious. If you can't achieve the first with reasonable security, then you probably can't achieve the second either. Also, if you can't manage correct object labeling then I'm very interested in how you are maintaining secure Linux systems without standard DAC.

> (2) Also, assigning a label is something like inventing and assigning a *new* name (label name) to objects which can cause flaws.

I don't understand how assigning new attributes to objects can cause flaws, nor what flaws those might be; could you elaborate further? In particular, I don't see how this is really all that more complicated than defining additional access control in apache .htaccess files. The principle is the same: by stacking multiple independent security-verification mechanisms (Classical UNIX DAC and Apache permissions) you can increase security, albeit at an increased management cost.

You might also note that .htaccess files are yet another form of successful label-based security; the security context for a directory depends on the .htaccess label file found within. The *exact* same principles apply to SELinux: you add additional attributes backed by a simple and powerful state-machine. The cross-checks are lower-level than those from .htaccess files, but the principles are the same.

Cheers,
Kyle Moffett
Re: [AppArmor 01/41] Pass struct vfsmount to the inode_create LSM hook
On May 26, 2007, at 22:37:02, [EMAIL PROTECTED] wrote:
> On Sat, 26 May 2007 22:10:34 EDT, Kyle Moffett said:
>> On May 26, 2007, at 19:08:56, Toshiharu Harada wrote:
>>> (1) Object labeling has a assumption that labels are always properly defined and maintained. This can not be easily achieved.
>>
>> That's a circular argument, and a fairly trivial one at that. If you can't properly manage your labels, then how do you expect any security at all?
>
> Unfortunately, it's not at all as simple as all that. Toshiharu is quite correct that it isn't always easy to actually implement. Consider how many ad-croc usages of 'restorecon' are needed to get a Fedora SELinux box through rc.sysinit:

While I don't think restorecon is really necessary to properly boot SELinux-enabled (I've got a Debian box with some heavily customized policy which does so just fine), I am of the opinion that adding a name parameter to the file/directory create actions would be useful. For example, with such support you could actually specify a type-transition rule conditional on a specific name or substring:

  named_type_transition sshd_t tmp_t:sock_file prefix ssh- ssh_sock_t;

Useful options for matching would be prefix, suffix, substr(start,len). regex would be nice but is sorta computationally intensive and would be likely to cause more problems than it solves.

> /sbin/restorecon -R /dev 2>/dev/null
> [ -n "$SELINUX_STATE" ] && restorecon /dev/mapper /dev/mapper/control >/dev/null 2>&1

These can go away if you modify your policy and pass "-o fscontext=system_u:object_r:dev_t" to the mount command for the /dev tmpfs, changing both the filesystem and the default directory labels from the default system_u:object_r:tmpfs_t. This will work as long as your policy files contain appropriate transitions from the dev_t type.

> REBOOTFLAG=`restorecon -v /sbin/init`
> restorecon /etc/mtab /etc/ld.so.cache /etc/blkid/blkid.tab /etc/resolv.conf >/dev/null 2>&1
> [ -n "$SELINUX_STATE" ] && restorecon /tmp
> [ -n "$SELINUX_STATE" ] && restorecon /tmp/.ICE-unix >/dev/null 2>&1

I believe these are to handle rebooting from non-SELinux mode. There are two solutions to this kind of problem:
  (1) Failing the boot if the labels are wrong
  (2) Fixing the labels (and rebooting if necessary)

It would appear that FC does the latter, although for certain high-security systems (such as firewalls handling classified/confidential data), the first option is the only acceptable one.

> [ -n "$SELINUX_STATE" ] && restorecon /dev/pts >/dev/null 2>&1

I think this can be handled with some combination of appropriate SELinux policy and mount options. At least, I don't need this in the boot scripts on my heavily customized Debian SELinux box.

> [ -n "$SELINUX_STATE" -a -e $path ] && restorecon -R $path

I don't know what the point of this generic line is; but I certainly don't have anything of the sort on my test system, and it appears to work just fine.

> And that's just getting the system up to single-user. Things like sendmail and sshd require more restorecon handholding in their rc.init files. Or just look at the creeping horror that is 'restorecond' (in particular, consider that the default restorecond.conf contains the strings '~/public_html' and '~/.mozilla/plugins/libflashplayer.so'. Yee. Frikkin. Hah. ;)

Part of the reason that Fedora has a large quantity of that restorecon and restorecond crap is that there is a certain amount of broken binary software needing executable stack/heap (such as flashplayer), programs without comprehensive or complete policies, or programs which by definition need extra support for SELinux. For example, to really complete the SELinux model you need editors and tools which understand security labels the same way they understand UNIX permissions.
With a bit of vim scripting I can probably make it run external commands to read file labels before it reads the file itself and modify /proc/self/attr/fscreate before writing out the file, similar to the way it would keep track of the standard DAC permissions on files it modifies.

There's also the fact that corporate products have fixed release schedules so remaining bugs in each release tend to get papered over instead of fixed properly (such as the restorecon in FC init-scripts). I haven't seen many problems with the SELinux model which aren't associated with working around buggy software or missing features.

Cheers,
Kyle Moffett
Re: [PATCH] AFS: Implement file locking
On May 25, 2007, at 22:23:42, J. Bruce Fields wrote:
> On Thu, May 24, 2007 at 05:55:54PM +0100, David Howells wrote:
>> +	/* only whole-file locks are supported */
>> +	if (fl->fl_start != 0 || fl->fl_end != OFFSET_MAX)
>> +		return -EINVAL;
>
> Do you allow upgrades and downgrades? (Just curious.)

I was actually under the impression that OpenAFS had support for byte-range locking (as well as lock upgrade/downgrade); though IIRC there was some secondary protocol. That's probably why the support is so basic at the moment; David's getting the basics in first and the more complicated stuff can come later.

Cheers,
Kyle Moffett
Re: [AppArmor 01/41] Pass struct vfsmount to the inode_create LSM hook
On May 24, 2007, at 14:58:41, Casey Schaufler wrote:
> On Fedora zcat, gzip and gunzip are all links to the same file. I can imagine (although it is a bit of a stretch) allowing a set of users access to gunzip but not gzip (or the other way around).

That is a COMPLETE straw-man argument. I can override your check with this absolutely trivial perl code:

  exec { "/usr/bin/gunzip" } "gzip", "-9", "some/file/to.gz";

Pathname-based checks are pretty fundamentally insecure. If you want to protect a name, then you should tag the name with security attributes (IE: AppArmor). On the other hand, if you actually want to protect the _data_, then tagging the _name_ is flawed; tag the *DATA* instead.

Cheers,
Kyle Moffett
Re: [PATCH] LogFS take three
On May 17, 2007, at 13:45:33, Evgeniy Polyakov wrote:
> On Thu, May 17, 2007 at 07:26:07PM +0200, Jan Engelhardt ([EMAIL PROTECTED]) wrote:
>>> My plan was to move this code to lib/ sooner or later. If you consider it useful in its current state, I can do it immediately. And if someone else merged a superior btree library I'd happily remove mine and use the new one instead. Opinions?
>>
>> Why would we need another btree, when there is lib/rbtree.c? Or does yours do something fundamentally different?
>
> It is not red-black tree, it is b+ tree.

It might be better to use the prefix "bptree" to help prevent confusion. A quick google search on "bp-tree" reveals only the perl B+-tree module "Tree::BPTree", a U-Maryland Java CS project on B+-trees, and a news article about a BP tree-top protest.

Cheers,
Kyle Moffett
Re: [RFC/PATCH] revokeat/frevoke system calls V5
On Feb 26, 2007, at 13:46:21, H. Peter Anvin wrote:
> Alan wrote:
>>> I'm not sure. Turning, for example, the statat(dir_fd, name == NULL) error case into fstat(dir_fd) sounds like a way for apps, admittedly buggy ones, to be surprised. Maybe libc would be expected to catch the error before performing the shared system call?
>>
>> At that point would it not be cheaper to have two system calls, the table cost isn't very large.
>
> It's not just the table, though, you need two entry points, but even that isn't really all that big either, I guess.

Well, I suppose there are multiple possibilities for consolidation:

  frevokeat(fd, "/foo/bar/baz") = normal frevokeat
  frevokeat(-1, "/foo/bar/baz") = revoke("/foo/bar/baz");
  frevokeat(fd, NULL) = frevoke(fd);

Neither of those would ordinarily be considered to do anything useful and for new syscalls I can't see the possibility of breaking existing programs. On the other hand, it's not like we have any problems with the syscall tables getting too large.

Cheers,
Kyle Moffett
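The three-way consolidation suggested in this message amounts to a small dispatch on the argument pair. The Python sketch below only models that dispatch: it performs no revocation, `dispatch_frevokeat` is a hypothetical name, and rejecting the case where both the descriptor and the path are missing is an assumption, since the thread does not say what should happen then.

```python
from typing import Optional

NO_FD = -1  # the "no descriptor" value used in the message


def dispatch_frevokeat(fd: int, path: Optional[str]) -> tuple:
    """Return which operation a combined frevokeat(fd, path) call would stand for."""
    if path is None:
        if fd == NO_FD:
            # Assumption: with neither a descriptor nor a path there is nothing to revoke.
            raise ValueError("need a descriptor, a path, or both")
        return ("frevoke", fd)
    if fd == NO_FD:
        return ("revoke", path)
    return ("frevokeat", fd, path)


print(dispatch_frevokeat(3, "/foo/bar/baz"))
print(dispatch_frevokeat(NO_FD, "/foo/bar/baz"))
print(dispatch_frevokeat(3, None))
```

The two special cases use argument values that would otherwise be errors, which is why the message argues that a new syscall could adopt them without breaking existing programs.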