Re: [RFD] BIO_RW_BARRIER - what it means for devices, filesystems, and dm/md.
Hi,

--On 28 May 2007 12:45:59 PM +1000 David Chinner <[EMAIL PROTECTED]> wrote:
> On Mon, May 28, 2007 at 11:30:32AM +1000, Neil Brown wrote:
> > Thanks everyone for your input. There were some very valuable
> > observations in the various emails. I will try to pull most of it
> > together and bring out what seem to be the important points.
> >
> > 1/ A BIO_RW_BARRIER request should never fail with -EOPNOTSUP.
>
> Sounds good to me, but how do we test to see if the underlying
> device supports barriers? Do we just assume that they do and
> only change behaviour if -o nobarrier is specified in the mount
> options?

I would assume so. Then when the block layer finds that they aren't
supported and does non-barrier ones instead, it could report a message.
We (XFS) can't take much other course of action, I guess, and we aren't
doing much now other than not requesting them any more and printing an
error message.

> > 2/ Maybe barriers provide stronger semantics than are required.
> >
> > All write requests are synchronised around a barrier write. This is
> > often more than is required and apparently can cause a measurable
> > slowdown.
> >
> > Also the FUA for the actual commit write might not be needed. It is
> > important for consistency that the preceding writes are in safe
> > storage before the commit write, but it is not so important that the
> > commit write is immediately safe on storage. That isn't needed until
> > a 'sync' or 'fsync' or similar.
>
> The use of barriers in XFS assumes the commit write to be on stable
> storage before it returns. One of the ordering guarantees that we
> need is that the transaction (commit write) is on disk before the
> metadata block containing the change in the transaction is written
> to disk, and the current barrier behaviour gives us that.

Yep, and that one is what we want the FUA for - for the write into the
log. I'm taking it that the FUA write will just guarantee that that
particular write has made it to disk on I/O completion (and no write
cache flush is done).

> The other XFS constraint is that we know when the metadata hits the
> disk so that we can move the tail of the log.

And that is what we are effectively getting from the pre-write-flush
part of the barrier. It would ensure that any metadata not yet on disk
would be on disk before we overwrite the tail of the log.

If we could determine cases when we don't have to worry about
overwriting the tail of the log, then it would be good if we could just
do FUA writes for constraint 1 above. Is that possible?

--Tim
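To make the distinction concrete, here is a minimal sketch of what a commit
write looks like on the current (~2.6.21/22) bio interface, where
BIO_RW_BARRIER is what currently buys both guarantees: the pre-flush
(earlier writes stable first) and the stable-on-completion commit write.
This is illustrative only, not XFS code, and the helper name is made up:

#include <linux/bio.h>
#include <linux/blkdev.h>
#include <linux/completion.h>
#include <linux/fs.h>
#include <linux/mm.h>

/* 2.6.21/22-era bi_end_io signature: (bio, bytes_done, error) */
static int barrier_write_end_io(struct bio *bio, unsigned int bytes_done,
                                int error)
{
        if (bio->bi_size)
                return 1;                       /* not fully done yet */
        complete((struct completion *)bio->bi_private);
        return 0;
}

/* Write one page at 'sector' as a barrier write and wait for it. */
static int submit_barrier_write(struct block_device *bdev,
                                struct page *page, sector_t sector)
{
        DECLARE_COMPLETION_ONSTACK(done);
        struct bio *bio = bio_alloc(GFP_NOIO, 1);
        int ret = 0;

        bio->bi_bdev    = bdev;
        bio->bi_sector  = sector;
        bio->bi_end_io  = barrier_write_end_io;
        bio->bi_private = &done;
        bio_add_page(bio, page, PAGE_SIZE, 0);

        submit_bio(WRITE_BARRIER, bio);         /* WRITE | BIO_RW_BARRIER */
        wait_for_completion(&done);

        if (bio_flagged(bio, BIO_EOPNOTSUPP))
                ret = -EOPNOTSUPP;              /* device can't do barriers */
        else if (!test_bit(BIO_UPTODATE, &bio->bi_flags))
                ret = -EIO;
        bio_put(bio);
        return ret;
}

A hypothetical FUA-only write for the log would keep the "this write is
stable on completion" half but drop the implied flush of everything before
it - which is exactly the part that the second constraint (knowing when
metadata hit the disk before the log tail is overwritten) still relies on.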
Re: [RFD] BIO_RW_BARRIER - what it means for devices, filesystems, and dm/md.
On Mon, May 28, 2007 at 12:57:53PM +1000, Neil Brown wrote:
> On Monday May 28, [EMAIL PROTECTED] wrote:
> > On Mon, May 28, 2007 at 11:30:32AM +1000, Neil Brown wrote:
> > > Thanks everyone for your input. There were some very valuable
> > > observations in the various emails.
> > > I will try to pull most of it together and bring out what seem to be
> > > the important points.
> > >
> > > 1/ A BIO_RW_BARRIER request should never fail with -EOPNOTSUP.
> >
> > Sounds good to me, but how do we test to see if the underlying
> > device supports barriers? Do we just assume that they do and
> > only change behaviour if -o nobarrier is specified in the mount
> > options?
>
> What exactly do you want to know, and why do you care?

If someone explicitly mounts "-o barrier" and the underlying device
cannot do it, then we want to issue a warning or reject the mount.

> The idea is that every "struct block_device" supports barriers. If the
> underlying hardware doesn't support them directly, then they get
> simulated by draining the queue and issuing a flush.

Ok. But you also seem to be implying that there will be devices that
cannot support barriers.

Even if all devices do eventually support barriers, it may take some
time before we reach that goal. Why not start by making it easy to
determine what the capabilities of each device are? This can then be
removed once we reach the holy grail.

Cheers,

Dave.
--
Dave Chinner
Principal Engineer
SGI Australian Software Group
Re: [RFD] BIO_RW_BARRIER - what it means for devices, filesystems, and dm/md.
On Monday May 28, [EMAIL PROTECTED] wrote:
> On Mon, May 28, 2007 at 11:30:32AM +1000, Neil Brown wrote:
> >
> > Thanks everyone for your input. There were some very valuable
> > observations in the various emails.
> > I will try to pull most of it together and bring out what seem to be
> > the important points.
> >
> > 1/ A BIO_RW_BARRIER request should never fail with -EOPNOTSUP.
>
> Sounds good to me, but how do we test to see if the underlying
> device supports barriers? Do we just assume that they do and
> only change behaviour if -o nobarrier is specified in the mount
> options?

What exactly do you want to know, and why do you care?

The idea is that every "struct block_device" supports barriers. If the
underlying hardware doesn't support them directly, then they get
simulated by draining the queue and issuing a flush.

Theoretically there could be devices which have a write-back cache that
cannot be flushed, and you couldn't implement barriers on such a
device. So throw it out and buy another?

As far as I can tell, the only thing XFS does differently with devices
that don't support barriers is that it prints a warning message to the
kernel logs. If the underlying device printed the message when it
detected that barriers couldn't be supported, XFS wouldn't need to care
at all.

NeilBrown
Re: [RFD] BIO_RW_BARRIER - what it means for devices, filesystems, and dm/md.
On Mon, May 28, 2007 at 11:30:32AM +1000, Neil Brown wrote:
>
> Thanks everyone for your input. There were some very valuable
> observations in the various emails.
> I will try to pull most of it together and bring out what seem to be
> the important points.
>
> 1/ A BIO_RW_BARRIER request should never fail with -EOPNOTSUP.

Sounds good to me, but how do we test to see if the underlying
device supports barriers? Do we just assume that they do and
only change behaviour if -o nobarrier is specified in the mount
options?

> 2/ Maybe barriers provide stronger semantics than are required.
>
> All write requests are synchronised around a barrier write. This is
> often more than is required and apparently can cause a measurable
> slowdown.
>
> Also the FUA for the actual commit write might not be needed. It is
> important for consistency that the preceding writes are in safe
> storage before the commit write, but it is not so important that the
> commit write is immediately safe on storage. That isn't needed until
> a 'sync' or 'fsync' or similar.

The use of barriers in XFS assumes the commit write to be on stable
storage before it returns. One of the ordering guarantees that we need
is that the transaction (commit write) is on disk before the metadata
block containing the change in the transaction is written to disk, and
the current barrier behaviour gives us that.

> One possible alternative is:
>  - writes can overtake barriers, but barrier cannot overtake writes.

No, that breaks the above usage of a barrier.

>  - flush before the barrier, not after.
>
> This is considerably weaker, and hence cheaper. But I think it is
> enough for all filesystems (providing it is still an option to call
> blkdev_issue_flush on 'fsync').

No, not enough for XFS.

> Another alternative would be to tag each bio as being in a
> particular barrier-group. Then bios in different groups could
> overtake each other in either direction, but a BARRIER request must
> be totally ordered w.r.t. other requests in the barrier group.
> This would require an extra bio field, and would give the filesystem
> more appearance of control. I'm not yet sure how much it would
> really help...

And that assumes the filesystem is tracking exact dependencies between
I/Os. Such a mechanism would probably require filesystems to be
redesigned to use this, but I can see how it would be useful for doing
things like ensuring ordering between just an inode and its data
writes.

What would the overhead of having to support several hundred thousand
different barrier groups be (i.e. one per dirty inode in a system)?

> I think the implementation priorities here are:

Depending on the answer to my first question:

0/ implement a specific test for filesystems to run at mount time to
   determine if barriers are supported or not.

> 1/ implement a zero-length BIO_RW_BARRIER option.
> 2/ Use it (or otherwise) to make all dm and md modules handle
>    barriers (and loop?).
> 3/ Devise and implement appropriate fall-backs within the block layer
>    so that -EOPNOTSUP is never returned.
> 4/ Remove unneeded cruft from filesystems (and elsewhere).

Sounds like a good start. ;)

Cheers,

Dave.
--
Dave Chinner
Principal Engineer
SGI Australian Software Group
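For reference, a minimal sketch of the mount-time probe suggested in 0/
above. This is not an existing interface - the idea is simply to rewrite
one already-valid block (e.g. the superblock) with the barrier flag set and
see whether it comes back -EOPNOTSUPP. submit_barrier_write() is assumed to
be a helper along the lines of the sketch posted earlier in the thread; all
names are illustrative:

#include <linux/blkdev.h>
#include <linux/fs.h>
#include <linux/kernel.h>

extern int submit_barrier_write(struct block_device *bdev,
                                struct page *page, sector_t sector);

/*
 * Returns 1 if barrier writes appear to work on 'bdev', 0 if the device
 * (or a stacking driver above it) rejects them.
 */
static int probe_barrier_support(struct block_device *bdev,
                                 struct page *sb_page, sector_t sb_sector)
{
        char b[BDEVNAME_SIZE];
        int error = submit_barrier_write(bdev, sb_page, sb_sector);

        if (error == -EOPNOTSUPP) {
                printk(KERN_WARNING
                       "%s: barrier writes not supported, disabling them\n",
                       bdevname(bdev, b));
                return 0;
        }
        return error ? 0 : 1;
}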
Re: [RFD] BIO_RW_BARRIER - what it means for devices, filesystems, and dm/md.
On Friday May 25, [EMAIL PROTECTED] wrote:
> 2007/5/25, Neil Brown <[EMAIL PROTECTED]>:
> > - Are there other bits that we could handle better?
> >   BIO_RW_FAILFAST? BIO_RW_SYNC? What exactly do they mean?
>
> BIO_RW_FAILFAST: means low-level driver shouldn't do much (or no)
> error recovery. Mainly used by multipath targets to avoid long SCSI
> recovery. This should just be propagated when passing requests on.

Is it "much" or "no"? Would it be reasonable to use this for reads
from a non-degraded raid1? What about writes?

What I would really like is some clarification on what sort of errors
get retried, how often, and how much timeout there is...

And does the 'error' code returned in ->bi_end_io allow us to
differentiate media errors from other errors yet?

> BIO_RW_SYNC: means this is a bio of a synchronous request. I don't
> know whether there are more uses to it, but this at least causes
> queues to be flushed immediately instead of waiting for more requests
> for a short time. Should also just be passed on. Otherwise
> performance gets poor, since something above will rather wait for the
> current request/bio to complete instead of sending more.

Yes, this one is pretty straightforward... I mentioned it more as a
reminder to myself that I really should support it in raid5 :-(

NeilBrown
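Concretely, "just propagate it" for a stacking driver that builds its own
bios looks something like the sketch below (illustrative only). Note that
bio_clone() already copies bi_rw wholesale, so this mainly matters when a
fresh bio is built with bio_alloc():

#include <linux/bio.h>

/* Copy the hint/ordering bits from the incoming bio to the one sent on. */
static void propagate_bio_hints(struct bio *clone, struct bio *orig)
{
        unsigned long hints = (1UL << BIO_RW_FAILFAST) |
                              (1UL << BIO_RW_SYNC) |
                              (1UL << BIO_RW_BARRIER);

        clone->bi_rw &= ~hints;
        clone->bi_rw |= orig->bi_rw & hints;
}

Whether BIO_RW_BARRIER belongs in that set is, of course, the subject of
the rest of this thread.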
Re: [RFD] BIO_RW_BARRIER - what it means for devices, filesystems, and dm/md.
Thanks everyone for your input. There were some very valuable
observations in the various emails. I will try to pull most of it
together and bring out what seem to be the important points.

1/ A BIO_RW_BARRIER request should never fail with -EOPNOTSUP.

This is certainly a very attractive position - it makes the interface
cleaner and makes life easier for filesystems and other clients of the
block interface.

Currently filesystems handle -EOPNOTSUP by
 a/ resubmitting the request without the BARRIER (after waiting for
    earlier requests to complete) and
 b/ possibly printing an error message to the kernel logs.

The block layer can do both of these just as easily, and it does make
sense to do it there.

md/dm modules could keep count of requests as has been suggested
(though that would be a fairly big change for raid0, as it currently
doesn't know when a request completes - bi_end_io goes directly to the
filesystem).

However I think the idea of a zero-length BIO_RW_BARRIER would be a
good option. raid0 could send one of these down each device, and when
they all return, the barrier request can be sent to its target
device(s).

I think this is a worthy goal that we should work towards.

2/ Maybe barriers provide stronger semantics than are required.

All write requests are synchronised around a barrier write. This is
often more than is required and apparently can cause a measurable
slowdown.

Also the FUA for the actual commit write might not be needed. It is
important for consistency that the preceding writes are in safe
storage before the commit write, but it is not so important that the
commit write is immediately safe on storage. That isn't needed until a
'sync' or 'fsync' or similar.

One possible alternative is:
 - writes can overtake barriers, but barrier cannot overtake writes.
 - flush before the barrier, not after.

This is considerably weaker, and hence cheaper. But I think it is
enough for all filesystems (providing it is still an option to call
blkdev_issue_flush on 'fsync').

Another alternative would be to tag each bio as being in a particular
barrier-group. Then bios in different groups could overtake each other
in either direction, but a BARRIER request must be totally ordered
w.r.t. other requests in the barrier group. This would require an
extra bio field, and would give the filesystem more appearance of
control. I'm not yet sure how much it would really help...

It would allow us to set FUA on all bios with a non-zero
barrier-group. That would mean we don't have to flush the entire
cache, just those blocks that are critical... but I'm still not sure
it's a good idea.

Of course, these weaker rules would only apply inside the elevator.
Once the request goes to the device we need to work with what the
device provides, which probably means total-ordering around the
barrier.

I think this requires more discussion before a way forward is clear.

3/ Do we need explicit control of the 'ordered' mode?

Consider a SCSI device that has an NV RAM cache. mode_sense reports
that write-back is enabled, so _FUA or _FLUSH will be used. But as it
is *NV* RAM, QUEUE_ORDERED_DRAIN is really the best mode. But it seems
there is no way to query this information.

Using _FLUSH causes the NVRAM to be flushed to media, which is a
terrible performance problem. Setting SYNC_NV doesn't work on the
particular device in question. We currently tell customers to mount
with -o nobarrier, but that really feels like the wrong solution. We
should be telling the SCSI device "don't flush".

An advantage of 'nobarrier' is it can go in /etc/fstab. Where would you
record that a SCSI drive should be set to QUEUE_ORDERED_DRAIN?

I think the implementation priorities here are:

1/ implement a zero-length BIO_RW_BARRIER option.
2/ Use it (or otherwise) to make all dm and md modules handle barriers
   (and loop?).
3/ Devise and implement appropriate fall-backs within the block layer
   so that -EOPNOTSUP is never returned.
4/ Remove unneeded cruft from filesystems (and elsewhere).

Comments?

Thanks,
NeilBrown
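To make priority 1/ and the raid0 suggestion concrete, here is a sketch of
what the zero-length barrier broadcast could look like in a stacking
driver. This is the *proposed* behaviour, not something the block layer
guarantees today - it assumes empty barrier bios are accepted, which is
exactly what item 1 is meant to ensure - and all names are illustrative:

#include <linux/bio.h>
#include <linux/blkdev.h>
#include <linux/completion.h>
#include <linux/fs.h>
#include <asm/atomic.h>

struct barrier_wait {
        atomic_t                pending;
        struct completion       done;
};

static int empty_barrier_end_io(struct bio *bio, unsigned int bytes_done,
                                int error)
{
        struct barrier_wait *bw = bio->bi_private;

        if (bio->bi_size)
                return 1;
        if (atomic_dec_and_test(&bw->pending))
                complete(&bw->done);
        bio_put(bio);
        return 0;
}

/* Flush 'nr' member devices by sending each a zero-length barrier. */
static void drain_members_with_empty_barriers(struct block_device **bdevs,
                                              int nr)
{
        struct barrier_wait bw;
        int i;

        atomic_set(&bw.pending, nr);
        init_completion(&bw.done);

        for (i = 0; i < nr; i++) {
                struct bio *bio = bio_alloc(GFP_NOIO, 0);

                bio->bi_bdev    = bdevs[i];
                bio->bi_end_io  = empty_barrier_end_io;
                bio->bi_private = &bw;
                submit_bio(WRITE_BARRIER, bio); /* no data: pure barrier */
        }
        wait_for_completion(&bw.done);
}

Once every member has completed its empty barrier, the original barrier
request can be forwarded to its target device(s) knowing that nothing
older is still in flight below.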
Re: XFS: circular locking re iprune_mutex vs ip->i_iolock->mr_lock
On Sat, May 26, 2007 at 02:29:48AM +0400, Alexey Dobriyan wrote:
> =======================================================
> [ INFO: possible circular locking dependency detected ]
> 2.6.22-rc2 #1
> -------------------------------------------------------
> mplayer/16241 is trying to acquire lock:
>  (iprune_mutex){--..}, at: [] shrink_icache_memory+0x2e/0x16b
>
> but task is already holding lock:
>  (&(&ip->i_iolock)->mr_lock){}, at: [] xfs_ilock+0x44/0x86
>
> which lock already depends on the new lock.

Not A Bug, AFAICT. The locking order on memory reclaim is normally:

	iprune_mutex -> xfs_inode->i_iolock

But in this case, because the memory reclaim was triggered from
blockable_page_cache_readahead(), we've got:

	xfs_inode->i_iolock -> iprune_mutex -> some other xfs_inode->i_iolock

triggering a warning. This can never produce circular deadlocks as the
inodes being pruned have zero references, and the inode we already hold
the lock on has >= 1 reference, so the pruning code won't ever see it.

So, false positive.

What lockdep annotation are we supposed to use to fix this sort of
thing?

Cheers,

Dave.
--
Dave Chinner
Principal Engineer
SGI Australian Software Group
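One candidate (a sketch only; not necessarily the annotation the XFS or
VFS folks will settle on) is the lockdep subclass API: take the iolock of
the *other* inodes - the zero-reference ones being pruned - with a distinct
subclass, so lockdep stops treating the two acquisitions as the same class.
This assumes mr_lock is backed by an rw_semaphore; the names below are
illustrative:

#include <linux/lockdep.h>
#include <linux/rwsem.h>

enum xfs_iolock_subclass {
        XFS_IOLOCK_NORMAL  = 0,         /* ordinary callers */
        XFS_IOLOCK_RECLAIM = 1,         /* inodes locked during reclaim */
};

static inline void xfs_iolock_nested(struct rw_semaphore *mr_lock,
                                     enum xfs_iolock_subclass subclass)
{
        /* the _nested() variants compile away when lockdep is disabled */
        down_write_nested(mr_lock, subclass);
}

The reclaim path would then call xfs_iolock_nested(..., XFS_IOLOCK_RECLAIM)
while everything else keeps using subclass 0, which encodes exactly the
"a pruned inode can never be the one we already hold" argument made above.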
Re: [PATCH] AFS: Implement file locking
On Sun, May 27, 2007 at 09:51:10AM +0100, David Howells wrote:
> J. Bruce Fields <[EMAIL PROTECTED]> wrote:
> > So if I request a write lock while holding a read lock, my request
> > will be denied?
>
> At the moment, yes. Don't the POSIX and flock lock-handling routines
> in the kernel normally do that anyway?

No, they'd upgrade in that case.

> > This is a little strange, though--if there's somebody waiting for a
> > write lock on an inode (because somebody else already holds a read
> > lock on it), that shouldn't block requests for read locks.
>
> That depends on whether you want fairness or not.

Neither POSIX nor flock locks delay a lock because of pending
conflicting locks. SUS, as I read it, wouldn't allow that.

> Allowing read locks to jump the queue like this can lead to
> starvation for your writers.

If you want fairness, the best that you can do is to ensure that when
more than one pending lock can be applied, the one that has been
waiting longest will be chosen. But you can't make all such lock
requests wait for a lock that hasn't even been applied yet.

You can contrive examples of applications that would be correct given
the standard fcntl behavior, but that would deadlock on a system that
didn't allow read locks to jump the queue in the above situation. I
have no idea whether such applications actually exist in practice, but
I'd be uneasy about changing the standard behavior without inventing
some new kind of lock.

--b.
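For the record, the standard fcntl() behaviour being described can be seen
from userspace with a few lines of C (file name arbitrary): the same
process takes a read lock and then requests a write lock on the same
range, and POSIX replaces the read lock atomically rather than denying the
request, assuming no other process holds a conflicting lock.

#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

static int setlk(int fd, short type)
{
        struct flock fl;

        memset(&fl, 0, sizeof(fl));
        fl.l_type   = type;             /* F_RDLCK or F_WRLCK */
        fl.l_whence = SEEK_SET;
        fl.l_start  = 0;
        fl.l_len    = 0;                /* whole file */
        return fcntl(fd, F_SETLK, &fl);
}

int main(void)
{
        int fd = open("/tmp/locktest", O_RDWR | O_CREAT, 0600);

        if (fd < 0) {
                perror("open");
                return 1;
        }
        if (setlk(fd, F_RDLCK))
                perror("read lock");
        else
                puts("read lock acquired");

        /* Same process, same range: the read lock is replaced, not denied. */
        if (setlk(fd, F_WRLCK))
                perror("upgrade to write lock");
        else
                puts("upgraded to write lock");

        close(fd);
        return 0;
}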
Re: [AppArmor 01/41] Pass struct vfsmount to the inode_create LSM hook
--- Cliffe <[EMAIL PROTECTED]> wrote:

> >> On the other hand, if you actually want to protect the _data_,
> >> then tagging the _name_ is flawed; tag the *DATA* instead.
>
> Would it make sense to label the data (resource) with a list of paths
> (names) that can be used to access it?

Program Access Lists (PALs) were* a feature of UNICOS. PALs could
contain not only the list of programs that could use them, but what
attributes the processes required as well. Further, you could restrict
or raise privilege based on the uid, gid, MAC label, and privilege
state of the process during exec based on the PAL.

> Therefore the data would be protected against being accessed via
> alternative arbitrary names. This may be a simple label to maintain
> and (possibly to) enforce, allowing path based confinement to protect
> a resource. This may allow for the benefits of pathname based
> confinement while avoiding some of its problems.

Yep, but you still have the label based system issues, the classic
case being the text editor that uses "creat new", "unlink old",
"rename new to old". When the labeling scheme is more sophisticated
than "object gets label of subject", label management becomes a major
issue.

> Obviously this would not protect against a pathname pointing to
> arbitrary data

Protecting special data is easy. Protecting arbitrary data is the
problem.

> Just a thought.

Not a bad one, and it would be an easy and fun LSM to create. If I
were teaching a Linux kernel programming course I would consider it
for a class project.

* I have used the past tense here in spite of the many instances of
  UNICOS still in operation. I don't believe that there is any current
  development on UNICOS.

Casey Schaufler
[EMAIL PROTECTED]
Re: [AppArmor 01/41] Pass struct vfsmount to the inode_create LSM hook
On May 27, 2007, at 03:25:27, Toshiharu Harada wrote:
> 2007/5/27, Kyle Moffett <[EMAIL PROTECTED]>:
>> On May 26, 2007, at 19:08:56, Toshiharu Harada wrote:
>>> 2007/5/27, James Morris <[EMAIL PROTECTED]>:
>>>> On Sat, 26 May 2007, Kyle Moffett wrote:
>>>>> AppArmor). On the other hand, if you actually want to protect the
>>>>> _data_, then tagging the _name_ is flawed; tag the *DATA* instead.
>>>>
>>>> Bingo.
>>>>
>>>> (This is how traditional Unix DAC has always functioned, and is
>>>> what SELinux does: object labeling).
>>>
>>> Object labeling (or labeled security) looks like a simple and
>>> straightforward way, but it's not.
>>>
>>> (1) Object labeling has an assumption that labels are always
>>> properly defined and maintained. This can not be easily achieved.
>>
>> That's a circular argument, and a fairly trivial one at that.
>
> Sorry Kyle, I don't think it's a trivial one. The opposite.

How is that argument not trivially circular? "Foo has an assumption
that foo-property is always properly defined and maintained." That
could be said about *anything*:

  * Unix permissions have an assumption that mode bits are always
    properly defined and maintained.
  * Apache .htaccess security has an assumption that .htaccess files
    are always properly defined and maintained.
  * Functional email communication has an assumption that the email
    servers are always properly defined and maintained.

>> If you can't properly manage your labels, then how do you expect any
>> security at all?
>
> Please read my message again. I didn't say, "This can never be
> achieved". I said, "This can not be easily achieved".

So you said "(data labels) can not be easily achieved". My question
for you is: How do you manage secure UNIX systems without standard
UNIX permission bits? Also: If you have problems with data labels,
then what makes pathname based labels "easier"? If there is something
that could be done to improve SELinux and make it more readily
configurable, then it should probably be done.

>> If you can't achieve the first with reasonable security, then you
>> probably can't achieve the second either. Also, if you can't manage
>> correct object labeling then I'm very interested in how you are
>> maintaining secure Linux systems without standard DAC.
>
> I'm very interested in how you can know that you have the correct
> object labeling (this is my point). Could you tell?

I know that I have the correct object labeling because:

  1) I rewrote/modified the default policy to be extremely strict on
     the system where I wanted the extra security and hassle.
  2) I ensured that the type transitions were in place for almost
     everything that needed to be done to administer the system.
  3) I wrote a file-contexts file and relabeled *once*.
  4) I loaded the customized policy plus policy for restorecon and
     relabeled for the last time.
  5) I reloaded the customized policy without restorecon privileges
     and without the ability to reload the policy again.
  6) I never reboot the system without enforcing mode.
  7) If there are unexpected errors or files have incorrect labels, I
     have to get the security auditor to log in on the affected system
     and relabel the problematic files manually (a rare occurrence
     which requires excessive amounts of paperwork).

>>> (2) Also, assigning a label is something like inventing and
>>> assigning a *new* name (label name) to objects, which can cause
>>> flaws.
>>
>> I don't understand how assigning new attributes to objects "can
>> cause flaws", nor what flaws those might be; could you elaborate
>> further? In particular, I don't see how this is really all that more
>> complicated than defining additional access control in apache
>> .htaccess files. The principle is the same: by stacking multiple
>> independent security-verification mechanisms (Classical UNIX DAC and
>> Apache permissions) you can increase security, albeit at an
>> increased management cost. You might also note that ".htaccess"
>> files are yet another form of successful "label-based" security; the
>> security context for a directory depends on the .htaccess "label"
>> file found within. The *exact* same principles apply to SELinux: you
>> add additional attributes backed by a simple and powerful
>> state-machine. The cross-checks are lower-level than those from
>> .htaccess files, but the principles are the same.
>
> I don't deny DAC at all. If we deny DAC, we can't live with Linux;
> it's the base. MAC can be used to cover the shortages of DAC and
> Linux's simple user model, that's it. From a security point of view,
> simplicity is always the virtue and the way to go. An inode-combined
> label is guaranteed to be single at any point in time. This is the
> most noticeable advantage of label-based security.

I would argue that pathname-based security breaks the "simplicity is
the best virtue (of a security system)" paradigm, because it
attributes multiple potentially-conflicting labels to the same piece
of data. It also cannot protect the secrecy of specific *data* as well
as SELinux.
Re: [AppArmor 01/41] Pass struct vfsmount to the inode_create LSM hook
CC trimmed to remove a few poor overloaded inboxes from this tangent.

On May 27, 2007, at 04:34:10, Cliffe wrote:
> Kyle wrote:
>> On the other hand, if you actually want to protect the _data_, then
>> tagging the _name_ is flawed; tag the *DATA* instead.
>
> Would it make sense to label the data (resource) with a list of paths
> (names) that can be used to access it? Therefore the data would be
> protected against being accessed via alternative arbitrary names.
> This may be a simple label to maintain and (possibly to) enforce,
> allowing path based confinement to protect a resource. This may allow
> for the benefits of pathname based confinement while avoiding some of
> its problems.

The primary problem with that is that "mv somefile otherfile" must
change the labels, which means that every process that issues a
rename() syscall needs to have special handling of labels.

The other problem is that many of the features and capabilities of
SELinux get left by the wayside. On an SELinux system 90% of the
programs don't need to be modified to understand labels, since the
policy can define automatic label transitions. SELinux also allows you
to have conditional label privileges based on boolean variables,
something that cannot be done if the privileges themselves are stored
in the filesystem. Finally, such an approach does not allow you to
differentiate between programs.

Cheers,
Kyle Moffett
Re: [AppArmor 01/41] Pass struct vfsmount to the inode_create LSM hook
>> On the other hand, if you actually want to protect the _data_, then
>> tagging the _name_ is flawed; tag the *DATA* instead.

Would it make sense to label the data (resource) with a list of paths
(names) that can be used to access it?

Therefore the data would be protected against being accessed via
alternative arbitrary names.

This may be a simple label to maintain and (possibly to) enforce,
allowing path based confinement to protect a resource. This may allow
for the benefits of pathname based confinement while avoiding some of
its problems.

Obviously this would not protect against a pathname pointing to
arbitrary data…

Just a thought.

Z. Cliffe Schreuders.
Re: [PATCH] AFS: Implement file locking
J. Bruce Fields <[EMAIL PROTECTED]> wrote:

> > > Do you allow upgrades and downgrades? (Just curious.)
> >
> > AFS does not, as far as I know.
>
> So if I request a write lock while holding a read lock, my request
> will be denied?

At the moment, yes. Don't the POSIX and flock lock-handling routines
in the kernel normally do that anyway?

> This is a little strange, though--if there's somebody waiting for a
> write lock on an inode (because somebody else already holds a read
> lock on it), that shouldn't block requests for read locks.

That depends on whether you want fairness or not.

Allowing read locks to jump the queue like this can lead to starvation
for your writers.

David
Re: [AppArmor 01/41] Pass struct vfsmount to the inode_create LSM hook
2007/5/27, Kyle Moffett <[EMAIL PROTECTED]>:
> On May 26, 2007, at 19:08:56, Toshiharu Harada wrote:
> > 2007/5/27, James Morris <[EMAIL PROTECTED]>:
> > > On Sat, 26 May 2007, Kyle Moffett wrote:
> > > > AppArmor). On the other hand, if you actually want to protect
> > > > the _data_, then tagging the _name_ is flawed; tag the *DATA*
> > > > instead.
> > >
> > > Bingo.
> > >
> > > (This is how traditional Unix DAC has always functioned, and is
> > > what SELinux does: object labeling).
> >
> > Object labeling (or labeled security) looks like a simple and
> > straightforward way, but it's not.
> >
> > (1) Object labeling has an assumption that labels are always
> > properly defined and maintained. This can not be easily achieved.
>
> That's a circular argument, and a fairly trivial one at that.

Sorry Kyle, I don't think it's a trivial one. The opposite.

> If you can't properly manage your labels, then how do you expect any
> security at all?

Please read my message again. I didn't say, "This can never be
achieved". I said, "This can not be easily achieved".

> If you can't manage your "labels", then pathname-based security won't
> work either. This is analogous to saying "Pathname-based security has
> an assumption that path-permissions are always properly defined and
> maintained", which is equally obvious.

Yes! You got the point. Both label-based and pathname-based approaches
have their advantages and difficulties. In that sense they are equal.
That's what I wanted to say. Both approaches can be improved and even
can be used combined to overcome the difficulties. Let's not fight
against each other and think together; then we can make Linux better.

> If you can't achieve the first with reasonable security, then you
> probably can't achieve the second either. Also, if you can't manage
> correct object labeling then I'm very interested in how you are
> maintaining secure Linux systems without standard DAC.

I'm very interested in how you can know that you have the correct
object labeling (this is my point). Could you tell? I assume your best
efforts end up with:

 - have proper fc definitions and guard them (this can be done)
 - execute relabeling as needed (best efforts)
 - hope everything works fine

> > (2) Also, assigning a label is something like inventing and
> > assigning a *new* name (label name) to objects, which can cause
> > flaws.
>
> I don't understand how assigning new attributes to objects "can cause
> flaws", nor what flaws those might be; could you elaborate further?
> In particular, I don't see how this is really all that more
> complicated than defining additional access control in apache
> .htaccess files. The principle is the same: by stacking multiple
> independent security-verification mechanisms (Classical UNIX DAC and
> Apache permissions) you can increase security, albeit at an increased
> management cost. You might also note that ".htaccess" files are yet
> another form of successful "label-based" security; the security
> context for a directory depends on the .htaccess "label" file found
> within. The *exact* same principles apply to SELinux: you add
> additional attributes backed by a simple and powerful state-machine.
> The cross-checks are lower-level than those from .htaccess files, but
> the principles are the same.

I don't deny DAC at all. If we deny DAC, we can't live with Linux;
it's the base. MAC can be used to cover the shortages of DAC and
Linux's simple user model, that's it. From a security point of view,
simplicity is always the virtue and the way to go. An inode-combined
label is guaranteed to be single at any point in time. This is the
most noticeable advantage of label-based security.

But writing policy with labels is a somewhat indirect way (I mean, we
need "ls -Z" or "ps -Z"). An indirect way can cause flaws, so we need
a lot of work - that is what I wanted to tell.

Cheers,
Toshiharu Harada
[EMAIL PROTECTED]