Re: [RFD] BIO_RW_BARRIER - what it means for devices, filesystems, and dm/md.

2007-05-27 Thread Timothy Shimmin

Hi,

--On 28 May 2007 12:45:59 PM +1000 David Chinner <[EMAIL PROTECTED]> wrote:


On Mon, May 28, 2007 at 11:30:32AM +1000, Neil Brown wrote:


Thanks everyone for your input.  There were some very valuable
observations in the various emails.
I will try to pull most of it together and bring out what seem to be
the important points.


1/ A BIO_RW_BARRIER request should never fail with -EOPNOTSUP.


Sounds good to me, but how do we test to see if the underlying
device supports barriers? Do we just assume that they do and
only change behaviour if -o nobarrier is specified in the mount
options?


I would assume so.
Then when the block layer finds that they aren't supported and does
non-barrier ones, it could report a message.
We (XFS) can't take much other course of action, I guess,
and we aren't doing much now other than not requesting them
anymore and printing an error message.
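
Roughly, that fallback looks like this (paraphrased sketch, not the
literal XFS code; the helpers are hypothetical and error handling is
omitted):

    /* Sketch of the filesystem-side fallback when a barrier write
     * is rejected as unsupported.  Helper names are made up. */
    static void handle_barrier_failure(struct super_block *sb, struct bio *bio)
    {
            printk(KERN_WARNING "%s: disabling barriers, not supported "
                   "by the underlying device\n", sb->s_id);
            clear_mount_barrier_flag(sb);           /* hypothetical */

            /* wait for earlier writes, then reissue as a plain write */
            wait_for_outstanding_writes(sb);        /* hypothetical */
            bio->bi_rw &= ~(1 << BIO_RW_BARRIER);
            submit_bio(WRITE, bio);
    }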


2/ Maybe barriers provide stronger semantics than are required.

 All write requests are synchronised around a barrier write.  This is
 often more than is required and apparently can cause a measurable
 slowdown.

 Also the FUA for the actual commit write might not be needed.  It is
 important for consistency that the preceding writes are in safe
 storage before the commit write, but it is not so important that the
 commit write is immediately safe on storage.  That isn't needed until
 a 'sync' or 'fsync' or similar.


The use of barriers in XFS assumes the commit write to be on stable
storage before it returns.  One of the ordering guarantees that we
need is that the transaction (commit write) is on disk before the
metadata block containing the change in the transaction is written
to disk and the current barrier behaviour gives us that.


Yep, and that one is what we want the FUA for -
for the write into the log.

I'm taking it that the FUA write will just guarantee that that
particular write has made it to disk on i/o completion
(and no write cache flush is done).

The other XFS constraint is that we know when the metadata hits the disk
so that we can move the tail of the log.
And that is what we are effectively getting from the pre-write-flush
part of the barrier. It would ensure that any metadata not yet on disk would
be on disk before we overwrite the tail of the log.
If we could determine cases when we don't have to worry about overwriting
the tail of the log, then it would be good if we could
just do FUA writes for constraint 1 above. Is that possible?

--Tim




Re: [RFD] BIO_RW_BARRIER - what it means for devices, filesystems, and dm/md.

2007-05-27 Thread David Chinner
On Mon, May 28, 2007 at 12:57:53PM +1000, Neil Brown wrote:
> On Monday May 28, [EMAIL PROTECTED] wrote:
> > On Mon, May 28, 2007 at 11:30:32AM +1000, Neil Brown wrote:
> > > Thanks everyone for your input.  There were some very valuable
> > > observations in the various emails.
> > > I will try to pull most of it together and bring out what seem to be
> > > the important points.
> > > 
> > > 1/ A BIO_RW_BARRIER request should never fail with -EOPNOTSUP.
> > 
> > Sounds good to me, but how do we test to see if the underlying
> > device supports barriers? Do we just assume that they do and
> > only change behaviour if -o nobarrier is specified in the mount
> > options?
> 
> What exactly do you want to know, and why do you care?

If someone explicitly mounts "-o barrier" and the underlying device
cannot do it, then we want to issue a warning or reject the
mount.

> The idea is that every "struct block_device" supports barriers.  If the
> underlying hardware doesn't support them directly, then they get
> simulated by draining the queue and issuing a flush.

Ok. But you also seem to be implying that there will be devices that
cannot support barriers.

Even if all devices do eventually support barriers, it may take some
time before we reach that goal.  Why not start by making it easy to
determine what the capabilities of each device are?  This can then be
removed once we reach the holy grail.
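
Even a trivial helper along these lines would do; this is purely
hypothetical, just to illustrate the sort of query I mean:

    /* Hypothetical: does this queue do anything at all for barriers,
     * or will they simply be failed? */
    static inline int bdev_supports_barriers(struct block_device *bdev)
    {
            struct request_queue *q = bdev_get_queue(bdev);

            return q && q->ordered != QUEUE_ORDERED_NONE;
    }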

Cheers,

Dave.
-- 
Dave Chinner
Principal Engineer
SGI Australian Software Group


Re: [RFD] BIO_RW_BARRIER - what it means for devices, filesystems, and dm/md.

2007-05-27 Thread Neil Brown
On Monday May 28, [EMAIL PROTECTED] wrote:
> On Mon, May 28, 2007 at 11:30:32AM +1000, Neil Brown wrote:
> > 
> > Thanks everyone for your input.  There were some very valuable
> > observations in the various emails.
> > I will try to pull most of it together and bring out what seem to be
> > the important points.
> > 
> > 
> > 1/ A BIO_RW_BARRIER request should never fail with -EOPNOTSUP.
> 
> Sounds good to me, but how do we test to see if the underlying
> device supports barriers? Do we just assume that they do and
> only change behaviour if -o nobarrier is specified in the mount
> options?
> 

What exactly do you want to know, and why do you care?

The idea is that every "struct block_device" supports barriers.  If the
underlying hardware doesn't support them directly, then they get
simulated by draining the queue and issuing a flush.
Theoretically there could be devices which have a write-back cache
that cannot be flushed, and you couldn't implement barriers on such a
device.  So throw it out and buy another?
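
In pseudo-form the emulation is just (sketch only, not the actual
request-layer code; the wait helpers are hand-waved):

    /* Barrier emulation for a device with no ordering support:
     * drain what is already queued, flush, do the write, flush again
     * so the barrier write itself is stable. */
    static void emulate_barrier(struct block_device *bdev, struct bio *bio)
    {
            wait_for_queue_to_drain(bdev);          /* hypothetical */
            blkdev_issue_flush(bdev, NULL);
            bio->bi_rw &= ~(1 << BIO_RW_BARRIER);
            submit_bio(WRITE, bio);
            wait_for_bio_completion(bio);           /* hypothetical */
            blkdev_issue_flush(bdev, NULL);
    }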

As far as I can tell, the only thing XFS does differently with devices
that don't support barriers is that it prints a warning message to the
kernel logs.  If the underlying device printed the message when it
detected that barriers couldn't be supported, XFS wouldn't need to
care at all.

NeilBrown


Re: [RFD] BIO_RW_BARRIER - what it means for devices, filesystems, and dm/md.

2007-05-27 Thread David Chinner
On Mon, May 28, 2007 at 11:30:32AM +1000, Neil Brown wrote:
> 
> Thanks everyone for your input.  There were some very valuable
> observations in the various emails.
> I will try to pull most of it together and bring out what seem to be
> the important points.
> 
> 
> 1/ A BIO_RW_BARRIER request should never fail with -EOPNOTSUP.

Sounds good to me, but how do we test to see if the underlying
device supports barriers? Do we just assume that they do and
only change behaviour if -o nobarrier is specified in the mount
options?

> 2/ Maybe barriers provide stronger semantics than are required.
> 
>  All write requests are synchronised around a barrier write.  This is
>  often more than is required and apparently can cause a measurable
>  slowdown.
> 
>  Also the FUA for the actual commit write might not be needed.  It is
>  important for consistency that the preceding writes are in safe
>  storage before the commit write, but it is not so important that the
>  commit write is immediately safe on storage.  That isn't needed until
>  a 'sync' or 'fsync' or similar.

The use of barriers in XFS assumes the commit write to be on stable
storage before it returns.  One of the ordering guarantees that we
need is that the transaction (commit write) is on disk before the
metadata block containing the change in the transaction is written
to disk and the current barrier behaviour gives us that.

>  One possible alternative is:
>- writes can overtake barriers, but barrier cannot overtake writes.

No, that breaks the above usage of a barrier.

>- flush before the barrier, not after.
> 
>  This is considerably weaker, and hence cheaper. But I think it is
>  enough for all filesystems (providing it is still an option to call
>  blkdev_issue_flush on 'fsync').

No, not enough for XFS.

>  Another alternative would be to tag each bio as being in a
>  particular barrier-group.  Then bios in different groups could
>  overtake each other in either direction, but a BARRIER request must
>  be totally ordered w.r.t. other requests in the barrier group.
>  This would require an extra bio field, and would give the filesystem
>  more of an appearance of control.  I'm not yet sure how much it would
>  really help...

And that assumes the filesystem is tracking exact dependencies
between I/Os.  Such a mechanism would probably require filesystems
to be redesigned to use this, but I can see how it would be useful
for doing things like ensuring ordering between just an inode and
its data writes.  What would the overhead of having to support
several hundred thousand different barrier groups be (i.e. one per
dirty inode in a system)?

> I think the implementation priorities here are:

Depending on the answer to my first question:

0/ implement a specific test for filesystems to run at mount time
   to determine if barriers are supported or not.
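
A mount-time probe could be as simple as rewriting a block we already
hold (e.g. the superblock) with the barrier flag set and seeing whether
it is rejected.  Very rough sketch, with the block number and the error
handling hand-waved:

    static int probe_barriers(struct super_block *sb)
    {
            /* SUPERBLOCK_BLOCK is a stand-in for a real, known block */
            struct buffer_head *bh = sb_bread(sb, SUPERBLOCK_BLOCK);
            int ok;

            lock_buffer(bh);
            bh->b_end_io = end_buffer_write_sync;
            get_bh(bh);
            submit_bh(WRITE_BARRIER, bh);
            wait_on_buffer(bh);

            /* a refused barrier write leaves the buffer !uptodate */
            ok = buffer_uptodate(bh);
            brelse(bh);
            return ok;                  /* non-zero => barriers usable */
    }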

> 1/ implement a zero-length BIO_RW_BARRIER option.
> 2/ Use it (or otherwise) to make all dm and md modules handle
>barriers (and loop?).
> 3/ Devise and implement appropriate fall-backs with-in the block layer
>so that  -EOPNOTSUP is never returned.
> 4/ Remove unneeded cruft from filesystems (and elsewhere).

Sounds like a good start. ;)

Cheers,

Dave.
-- 
Dave Chinner
Principal Engineer
SGI Australian Software Group


Re: [RFD] BIO_RW_BARRIER - what it means for devices, filesystems, and dm/md.

2007-05-27 Thread Neil Brown
On Friday May 25, [EMAIL PROTECTED] wrote:
> 2007/5/25, Neil Brown <[EMAIL PROTECTED]>:
> >  - Are there other bit that we could handle better?
> > BIO_RW_FAILFAST?  BIO_RW_SYNC?  What exactly do they mean?
> >
> BIO_RW_FAILFAST: means the low-level driver shouldn't do much (or any)
> error recovery. Mainly used by multipath targets to avoid long SCSI
> recovery. This should just be propagated when passing requests on.

Is it "much" or "no"?
Would it be reasonable to use this for reads from a non-degraded
raid1?  What about writes?

What I would really like is some clarification on what sort of errors
get retried, how often, and how long the timeouts are.

And does the 'error' code returned in ->bi_end_io allow us to
differentiate media errors from other errors yet?

> 
> BIO_RW_SYNC: means this is a bio of a synchronous request. I don't
> know whether there are more uses for it, but this at least causes queues
> to be flushed immediately instead of waiting a short time for more
> requests. It should also just be passed on; otherwise performance gets
> poor, since something above will wait for the current
> request/bio to complete instead of sending more.

Yes, this one is pretty straightforward.  I mentioned it more as a
reminder to myself that I really should support it in raid5 :-(
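
For anyone following along, both flags are just bits the submitter sets
on the request; stacking drivers need to carry them across to any
clones.  Illustrative sketch only:

    /* submit a write marked both synchronous and fail-fast;
     * dm/md should preserve these bits when cloning bios */
    static void submit_sync_failfast_write(struct bio *bio)
    {
            submit_bio(WRITE | (1 << BIO_RW_SYNC) | (1 << BIO_RW_FAILFAST), bio);
    }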

NeilBrown


Re: [RFD] BIO_RW_BARRIER - what it means for devices, filesystems, and dm/md.

2007-05-27 Thread Neil Brown

Thanks everyone for your input.  There were some very valuable
observations in the various emails.
I will try to pull most of it together and bring out what seem to be
the important points.


1/ A BIO_RW_BARRIER request should never fail with -EOPNOTSUP.

 This is certainly a very attractive position - it makes the interface
 cleaner and makes life easier for filesystems and other clients of
 the block interface.
 Currently filesystems handle -EOPNOTSUP by
  a/ resubmitting the request without the BARRIER (after waiting for
earlier requests to complete) and
  b/ possibly printing an error message to the kernel logs.

 The block layer can do both of these just as easily and it does make
 sense to do it there.

 md/dm modules could keep count of requests as has been suggested
 (though that would be a fairly big change for raid0 as it currently
 doesn't know when a request completes - bi_end_io goes directly to the
 filesystem).
 However I think the idea of a zero-length BIO_RW_BARRIER would be a
 good option.  raid0 could send one of these down each device, and
 when they all return, the barrier request can be sent to its target
 device(s).

 I think this is a worthy goal that we should work towards.
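
 For raid0 that would look something like this (sketch only, no error
 handling; assumes the zero-length barrier is implemented and that
 member_flush_done() is a completion handler we would have to write):

    static void raid0_flush_members(mddev_t *mddev)
    {
            mdk_rdev_t *rdev;

            list_for_each_entry(rdev, &mddev->disks, same_set) {
                    struct bio *b = bio_alloc(GFP_NOIO, 0);

                    b->bi_bdev = rdev->bdev;
                    b->bi_end_io = member_flush_done;   /* hypothetical */
                    submit_bio(WRITE_BARRIER, b);
            }
            /* wait for them all, then send the original barrier bio on
             * to its target device(s) */
    }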

2/ Maybe barriers provide stronger semantics than are required.

 All write requests are synchronised around a barrier write.  This is
 often more than is required and apparently can cause a measurable
 slowdown.

 Also the FUA for the actual commit write might not be needed.  It is
 important for consistency that the preceding writes are in safe
 storage before the commit write, but it is not so important that the
 commit write is immediately safe on storage.  That isn't needed until
 a 'sync' or 'fsync' or similar.

 One possible alternative is:
   - writes can overtake barriers, but barrier cannot overtake writes.
   - flush before the barrier, not after.

 This is considerably weaker, and hence cheaper. But I think it is
 enough for all filesystems (providing it is still an option to call
 blkdev_issue_flush on 'fsync').

 Another alternative would be to tag each bio as being in a
 particular barrier-group.  Then bios in different groups could
 overtake each other in either direction, but a BARRIER request must
 be totally ordered w.r.t. other requests in the barrier group.
 This would require an extra bio field, and would give the filesystem
 more of an appearance of control.  I'm not yet sure how much it would
 really help...
 It would allow us to set FUA on all bios with a non-zero
 barrier-group.  That would mean we don't have to flush the entire
 cache, just those blocks that are critical but I'm still not sure
 it's a good idea.
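
 To make the idea concrete, the filesystem side might look like this
 (purely hypothetical - neither the bi_barrier_group field nor this
 usage exists today):

    /* one group per dirty inode, so ordering is only enforced against
     * other I/O for the same inode */
    static void submit_grouped_barrier(struct inode *inode, struct bio *bio)
    {
            bio->bi_barrier_group = inode->i_ino;   /* hypothetical field */
            submit_bio(WRITE | (1 << BIO_RW_BARRIER), bio);
    }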

 Of course, these weaker rules would only apply inside the elevator.
 Once the request goes to the device we need to work with what the
 device provides, which probably means total-ordering around the
 barrier. 

 I think this requires more discussion before a way forward is clear.

3/ Do we need explicit control of the 'ordered' mode?

  Consider a SCSI device that has NV RAM cache.  mode_sense reports
  that write-back is enabled, so _FUA or _FLUSH will be used.
  But as it is *NV* ram, QUEUE_ORDERED_DRAIN is really the best mode.
  But it seems there is no way to query this information.
  Using _FLUSH causes the NVRAM to be flushed to media which is a
  terrible performance problem.
  Setting SYNC_NV doesn't work on the particular device in question.
  We currently tell customers to mount with -o nobarrier, but that
  really feels like the wrong solution.  We should be telling the SCSI
  device "don't flush".
  An advantage of 'nobarrier' is it can go in /etc/fstab.  Where
  would you record that a SCSI drive should be set to
  QUEUE_ORDERED_DRAIN?
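
  The in-kernel knob already exists; what is missing is a sane way to
  get it set from userspace.  Sketch of what the low-level driver (or
  whatever acts on its behalf) would call for an NV cache:

    static void mark_cache_nonvolatile(struct request_queue *q)
    {
            /* drain outstanding requests around a barrier, but never
             * flush - appropriate for a battery-backed write cache */
            blk_queue_ordered(q, QUEUE_ORDERED_DRAIN, NULL);
    }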


I think the implementation priorities here are:

1/ implement a zero-length BIO_RW_BARRIER option.
2/ Use it (or otherwise) to make all dm and md modules handle
   barriers (and loop?).
3/ Devise and implement appropriate fall-backs with-in the block layer
   so that  -EOPNOTSUP is never returned.
4/ Remove unneeded cruft from filesystems (and elsewhere).

Comments?

Thanks,
NeilBrown


Re: XFS: curcular locking re iprune_mutex vs ip->i_iolock->mr_lock

2007-05-27 Thread David Chinner
On Sat, May 26, 2007 at 02:29:48AM +0400, Alexey Dobriyan wrote:
> ===
> [ INFO: possible circular locking dependency detected ]
> 2.6.22-rc2 #1
> ---
> mplayer/16241 is trying to acquire lock:
>  (iprune_mutex){--..}, at: [] shrink_icache_memory+0x2e/0x16b
> 
> but task is already holding lock:
>  (&(&ip->i_iolock)->mr_lock){}, at: [] xfs_ilock+0x44/0x86
> 
> which lock already depends on the new lock.

Not A Bug, AFAICT. The locking order on memory reclaim is normally
iprune_mutex - xfs_inode->i_iolock.

But in this case, because the memory reclaim was triggered from
blockable_page_cache_readahead(), we've got:

xfs_inode->i_iolock - iprune_mutex - some other xfs_inode->i_iolock

triggering a warning. This can never produce circular deadlocks
as the inodes being pruned have zero references, and the inode we
already hold the lock on has >=1 reference, so the pruning code
won't ever see it.

So, false positive. What lockdep annotation are we supposed to
use to fix this sort of thing?
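
Presumably the usual answer is a nesting subclass on the reclaim-side
acquisition, i.e. something like this (guessing, untested, and the
exact placement in xfs_ilock()/reclaim is hand-waved):

    /* give the reclaim-time iolock acquisition its own lockdep subclass
     * so it isn't compared against the iolock we already hold */
    down_write_nested(&ip->i_iolock.mr_lock, SINGLE_DEPTH_NESTING);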

Cheers,

Dave.
-- 
Dave Chinner
Principal Engineer
SGI Australian Software Group


Re: [PATCH] AFS: Implement file locking

2007-05-27 Thread J. Bruce Fields
On Sun, May 27, 2007 at 09:51:10AM +0100, David Howells wrote:
> J. Bruce Fields <[EMAIL PROTECTED]> wrote:
> > So if I request a write lock while holding a read lock, my request will
> > be denied?
> 
> At the moment, yes.  Don't the POSIX and flock lock-handling routines in the
> kernel normally do that anyway?

No, they'd upgrade in that case.
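
i.e. with plain fcntl locks this succeeds - the second F_SETLK converts
the existing lock rather than conflicting with it; only another
process's lock can make it fail:

    #include <fcntl.h>
    #include <unistd.h>

    int upgrade_demo(int fd)
    {
            struct flock fl = {
                    .l_type   = F_RDLCK,
                    .l_whence = SEEK_SET,
                    .l_start  = 0,
                    .l_len    = 0,          /* whole file */
            };

            if (fcntl(fd, F_SETLK, &fl) == -1)
                    return -1;

            fl.l_type = F_WRLCK;            /* upgrade in place */
            return fcntl(fd, F_SETLK, &fl);
    }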

> > This is a little strange, though--if there's somebody waiting for a
> > write lock on an inode (because somebody else already holds a read lock
> > on it), that shouldn't block requests for read locks.
> 
> That depends on whether you want fairness or not.

Neither posix nor flock locks delay a lock because of pending
conflicting locks.  SUS, as I read it, wouldn't allow that.

> Allowing read locks to jump the queue like this can lead to starvation
> for your writers.

If you want fairness the best that you can do is to ensure that when
more than one pending lock can be applied, the one that has been waiting
longest will be chosen.  But you can't make all such lock requests wait
for a lock that hasn't even been applied yet.

You can contrive examples of applications that would be correct given
the standard fcntl behavior, but that would deadlock on a system that
didn't allow read locks to jump the queue in the above situation.  I
have no idea whether such applications actually exist in practice, but
I'd be uneasy about changing the standard behavior without inventing
some new kind of lock.

--b.


Re: [AppArmor 01/41] Pass struct vfsmount to the inode_create LSM hook

2007-05-27 Thread Casey Schaufler

--- Cliffe <[EMAIL PROTECTED]> wrote:

>  >> On the other hand, if you actually want to protect the _data_, then 
> tagging the _name_ is flawed; tag the *DATA* instead.
> 
> Would it make sense to label the data (resource) with a list of paths 
> (names) that can be used to access it?

Program Access Lists (PALs) were* a feature of UNICOS. PALs could
contain not only the list of programs that could use them, but also
what attributes the processes required. Further, during exec you could
restrict or raise privilege based on the uid, gid, MAC label, and
privilege state of the process, as specified by the PAL.

> Therefore the data would be protected against being accessed via 
> alternative arbitrary names. This may be a simple label to maintain and 
> (possibly to) enforce, allowing path based confinement to protect a 
> resource. This may allow for the benefits of pathname based confinement 
> while avoiding some of its problems.

Yep, but you still have the label based system issues, the classic
case being the text editor that uses "creat new", "unlink old",
"rename new to old". When the labeling scheme is more sophisticated
than "object gets label of subject", label management becomes a major
issue.

> Obviously this would not protect against a pathname pointing to 
> arbitrary data…

Protecting special data is easy. Protecting arbitrary data is the
problem.

> Just a thought.

Not a bad one, and it would be an easy and fun LSM to create.
If I were teaching a Linux kernel programming course I would
consider it for a class project.

-
* I have used the past tense here in spite of the many
  instances of UNICOS still in operation. I don't believe
  that there is any current development on UNICOS.


Casey Schaufler
[EMAIL PROTECTED]


Re: [AppArmor 01/41] Pass struct vfsmount to the inode_create LSM hook

2007-05-27 Thread Kyle Moffett

On May 27, 2007, at 03:25:27, Toshiharu Harada wrote:

2007/5/27, Kyle Moffett <[EMAIL PROTECTED]>:

On May 26, 2007, at 19:08:56, Toshiharu Harada wrote:

2007/5/27, James Morris <[EMAIL PROTECTED]>:

On Sat, 26 May 2007, Kyle Moffett wrote:
AppArmor).  On the other hand, if you actually want to protect  
the _data_, then tagging the _name_ is flawed; tag the *DATA*  
instead.


Bingo.

(This is how traditional Unix DAC has always functioned, and is  
what SELinux does: object labeling).


Object labeling (or labeled security) looks like a simple and
straightforward way, but it's not.


(1) Object labeling has an assumption that labels are always
properly defined and maintained. This cannot be easily achieved.


That's a circular argument, and a fairly trivial one at that.


Sorry Kyle, I don't think it's a trivial one.  The opposite.


How is that argument not trivially circular?  "Foo has an assumption  
that foo-property is always properly defined and maintained."  That  
could be said about *anything*:
  *  Unix permissions have an assumption that mode bits are always  
properly defined and maintained
  *  Apache .htaccess security has an assumption that .htaccess files
are always properly defined and maintained.
  *  Functional email communication has an assumption that the email  
servers are always properly defined and maintained


If you can't properly manage your labels, then how do you expect  
any security at all?


Please read my message again. I didn't say, "This can never be  
achieved".  I said, "This can not be easily achieved".


So you said "(data labels) can not be easily achieved".  My question  
for you is: How do you manage secure UNIX systems without standard  
UNIX permission bits?  Also:  If you have problems with data labels  
then what makes pathname based labels "easier"?  If there is  
something that could be done to improve SELinux and make it more  
readily configurable then it should probably be done.


If you can't achieve the first with reasonable security, then you  
probably can't achieve the second either.  Also, if you can't  
manage correct object labeling then I'm very interested in how you  
are maintaining secure Linux systems without standard DAC.


I'm very interested in how you can know that you have the correct  
object labeling (this is my point). Could you tell me?


I know that I have the correct object labeling because:
  1) I rewrote/modified the default policy to be extremely strict on  
the system where I wanted the extra security and hassle.
  2) I ensured that the type transitions were in place for almost  
everything that needed to be done to administer the system.

  3) I wrote a file-contexts file and relabeled *once*
  4) I loaded the customized policy plus policy for restorecon and  
relabeled for the last time
  5) I reloaded the customized policy without restorecon privileges  
and without the ability to reload the policy again.

  6) I never reboot the system without enforcing mode.
  7) If there are unexpected errors or files have incorrect labels,  
I have to get the security auditor to log in on the affected system  
and relabel the problematic files manually (rare occurrence which  
requires excessive amounts of paperwork).
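
For the curious, steps 3 and 4 above boil down to a file_contexts entry
plus a one-off relabel, roughly like this (the path and type are
made-up examples):

    # file_contexts entry
    /srv/secure(/.*)?    system_u:object_r:secure_data_t:s0

    # one-off relabel
    restorecon -R -v /srv/secure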


(2) Also, assigning a label is something like inventing and  
assigning a *new* name (label name) to objects which can cause  
flaws.


I don't understand how assigning new attributes to objects "can  
cause flaws", nor what flaws those might be; could you elaborate  
further? In particular, I don't see how this is really all that  
more complicated than defining additional access control in  
apache .htaccess files.  The principle is the same:  by stacking  
multiple independent security-verification mechanisms (Classical  
UNIX DAC and Apache permissions) you can increase security, albeit  
at an increased management cost.  You might also note that  
".htaccess" files are yet another form of successful "label-based"  
security; the security context for a directory depends on  
the .htaccess "label" file found within.  The *exact* same  
principles apply to SELinux: you add additional attributes backed  
by a simple and powerful state-machine.  The cross-checks are  
lower-level than those from .htaccess files, but the principles  
are the same.


I don't deny DAC at all.  If we deny DAC, we can't live with Linux;
it's the base.  MAC can be used to cover the shortcomings of DAC and
Linux's simple user model, and that's it.


From a security point of view, simplicity is always a virtue and
the way to go.  An inode-combined label is guaranteed to be single
at any point in time.  This is the most noticeable advantage of
label-based security.


I would argue that pathname-based security breaks the "simplicity is  
the best virtue (of a security system)" paradigm, because it  
attributes multiple potentially-conflicting labels to the same piece  
of data.  It also cannot protect the secrecy of specific *data* as  
well as SELinux.

Re: [AppArmor 01/41] Pass struct vfsmount to the inode_create LSM hook

2007-05-27 Thread Kyle Moffett

CC trimmed to remove a few poor overloaded inboxes from this tangent.

On May 27, 2007, at 04:34:10, Cliffe wrote:

Kyle wrote:
On the other hand, if you actually want to protect the _data_,  
then tagging the _name_ is flawed; tag the *DATA* instead.


Would it make sense to label the data (resource) with a list of  
paths (names) that can be used to access it?


Therefore the data would be protected against being accessed via  
alternative arbitrary names. This may be a simple label to maintain  
and (possibly to) enforce, allowing path based confinement to  
protect a resource. This may allow for the benefits of pathname  
based confinement while avoiding some of its problems.


The primary problem with that is that "mv somefile otherfile" must
change the labels, which means that every process that issues a
rename() syscall needs to have special handling of labels.  The other
problem is that many of the features and capabilities of SELinux get  
left by the wayside.  On an SELinux system 90% of the programs don't  
need to be modified to understand labels, since the policy can define  
automatic label transitions.  SELinux also allows you to have  
conditional label privileges based on boolean variables, something  
that cannot be done if the privileges themselves are stored in the  
filesystem.  Finally, such an approach does not allow you to  
differentiate between programs.


Cheers,
Kyle Moffett



Re: [AppArmor 01/41] Pass struct vfsmount to the inode_create LSM hook

2007-05-27 Thread Cliffe
>> On the other hand, if you actually want to protect the _data_, then 
tagging the _name_ is flawed; tag the *DATA* instead.


Would it make sense to label the data (resource) with a list of paths 
(names) that can be used to access it?


Therefore the data would be protected against being accessed via 
alternative arbitrary names. This may be a simple label to maintain and 
(possibly to) enforce, allowing path based confinement to protect a 
resource. This may allow for the benefits of pathname based confinement 
while avoiding some of its problems.


Obviously this would not protect against a pathname pointing to 
arbitrary data…



Just a thought.

Z. Cliffe Schreuders.




Re: [PATCH] AFS: Implement file locking

2007-05-27 Thread David Howells
J. Bruce Fields <[EMAIL PROTECTED]> wrote:

> > > Do you allow upgrades and downgrades?  (Just curious.)
> > 
> > AFS does not, as far as I know.
> 
> So if I request a write lock while holding a read lock, my request will
> be denied?

At the moment, yes.  Don't the POSIX and flock lock-handling routines in the
kernel normally do that anyway?

> This is a little strange, though--if there's somebody waiting for a
> write lock on an inode (because somebody else already holds a read lock
> on it), that shouldn't block requests for read locks.

That depends on whether you want fairness or not.  Allowing read locks to jump
the queue like this can lead to starvation for your writers.

David


Re: [AppArmor 01/41] Pass struct vfsmount to the inode_create LSM hook

2007-05-27 Thread Toshiharu Harada

2007/5/27, Kyle Moffett <[EMAIL PROTECTED]>:

On May 26, 2007, at 19:08:56, Toshiharu Harada wrote:
> 2007/5/27, James Morris <[EMAIL PROTECTED]>:
>> On Sat, 26 May 2007, Kyle Moffett wrote:
>>> AppArmor).  On the other hand, if you actually want to protect
>>> the _data_, then tagging the _name_ is flawed; tag the *DATA*
>>> instead.
>>
>> Bingo.
>>
>> (This is how traditional Unix DAC has always functioned, and is
>> what SELinux does: object labeling).
>
> Object labeling (or labeled security) looks like a simple and
> straightforward way, but it's not.
>
> (1) Object labeling has an assumption that labels are always
> properly defined and maintained. This cannot be easily achieved.

That's a circular argument, and a fairly trivial one at that.


Sorry Kyle, I don't think it's a trivial one.  The opposite.


If you can't properly manage your labels, then how do you expect any
security at all?


Please read my message again. I didn't say, "This can never be achieved".
I said, "This can not be easily achieved".


 If you can't manage your "labels", then pathname-
based security won't work either.  This is analogous to saying
"Pathname-based security has an assumption that path-permissions are
always properly defined and maintained", which is equally obvious.


Yes! You got the point.
Both label-based and pathname-based approaches have their advantages and
difficulties.
In that sense they are equal. That's what I wanted to say.
Both approaches can be improved, and they can even be combined to
overcome the difficulties. Let's not fight each other but think together;
then we can make Linux better.


If you can't achieve the first with reasonable security, then you
probably can't achieve the second either.  Also, if you can't manage
correct object labeling then I'm very interested in how you are
maintaining secure Linux systems without standard DAC.


I'm very interested in how you can know that you have the
correct object labeling (this is my point). Could you tell me?
I assume your best efforts end up with:
- having proper fc definitions and guarding them (this can be done)
- executing relabeling as needed (best efforts)
- hoping everything works fine


> (2) Also, assigning a label is something like inventing and
> assigning a *new* name (label name) to objects which can cause flaws.

I don't understand how assigning new attributes to objects "can cause
flaws", nor what flaws those might be; could you elaborate further?
In particular, I don't see how this is really all that more
complicated than defining additional access control in
apache .htaccess files.  The principle is the same:  by stacking
multiple independent security-verification mechanisms (Classical UNIX
DAC and Apache permissions) you can increase security, albeit at an
increased management cost.  You might also note that ".htaccess"
files are yet another form of successful "label-based" security; the
security context for a directory depends on the .htaccess "label"
file found within.  The *exact* same principles apply to SELinux: you
add additional attributes backed by a simple and powerful state-
machine.  The cross-checks are lower-level than those from .htaccess
files, but the principles are the same.


I don't deny DAC at all.  If we deny DAC, we can't live with Linux;
it's the base.  MAC can be used to cover the shortcomings of DAC and
Linux's simple user model, and that's it.


From a security point of view, simplicity is always a virtue and the way to go.

An inode-combined label is guaranteed to be single at any point in time.
This is the most noticeable advantage of label-based security.
But writing policy with labels is a somewhat indirect way (I mean, we need
"ls -Z" or "ps -Z").  An indirect way can cause flaws, so we need a lot of
work; that is what I wanted to tell.

Cheers,
Toshiharu Harada
[EMAIL PROTECTED]