Re: [PATCH] fs: Correct SuS compliance for open of large file without options

2007-09-27 Thread Kyle Moffett

On Sep 27, 2007, at 17:34:45, Greg KH wrote:

On Thu, Sep 27, 2007 at 02:37:42PM -0400, Theodore Tso wrote:
The fact that sysfs is all laid out in a directory, but for which
some directories/symlinks are OK to use, and some are NOT OK to  
use --- is why I call the sysfs interface an open pit.


And because of the original design mistakes, we have only been able  
to change things for the better in a slow manner.  We have had  
userspace programs fixed up for _years_ before we are able to make  
the corresponding changes in the kernel, so as to not break the  
distros that are slow to upgrade packages and kernels (like Debian.)


Hey!  No poking fingers at Debian here; it's been *MUCH* improved  
lately.  I far more frequently have problems with boxes still running  
some ancient release of RHEL-4 or something than I do with those  
running Debian stable (virtually always the latest Debian stable).


Cheers,
Kyle Moffett

-
To unsubscribe from this list: send the line unsubscribe linux-fsdevel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 10/25] Unionfs: add un/likely conditionals on copyup ops

2007-09-26 Thread Kyle Moffett

On Sep 26, 2007, at 09:40:20, Erez Zadok wrote:

In message [EMAIL PROTECTED], Kok, Auke writes:
I've been told several times that adding these is almost always  
bogus - either it messes up the CPU branch prediction or the  
compiler/CPU just does a lot better at finding the right way  
without these hints.


Adding them as a blanket seems rather strange. Have you got any  
numbers that this really improves performance?


Auke, that's a good question, but I found it hard to find any info  
about it.  There's no discussion on it in Documentation/, and very  
little I could find elsewhere.  I did see one url explaining what  
un/likely does precisely, but no guidelines.  My understanding is  
that it can improve performance, as long as it's used carefully  
(otherwise it may hurt performance).


Hmm, even still I agree with Auke, you probably use it too much.


Recently we've done a full audit of the entire code, and added
un/likely where we felt that the chance of succeeding is 95% or better
(e.g., error conditions that should rarely happen, and such).


Actually due to the performance penalty on some systems I think you  
only want to use it if the chance of succeeding is 99% or better, as  
the benefit if predicted is a cycle or two and the harm if  
mispredicted can be more than 50 cycles, depending on the CPU.  You  
should also remember that in filesystems many failures are
triggered by things like the ld.so library searches, where it  
literally calls access() 20 different times on various possible paths  
for library files, failing the first 19.  It does this once for each  
necessary library.
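
To put rough numbers on that argument, here is a small expected-cost sketch.  It assumes a deliberately simple model that is not from any kernel documentation: a correctly hinted branch saves 2 cycles and a wrongly hinted one costs 50, taken from the "cycle or two" and "more than 50 cycles" figures above.  The function names are made up for illustration.

```python
def expected_gain(p_correct, gain_cycles=2.0, penalty_cycles=50.0):
    """Average cycles saved per execution of a hinted branch (negative = slower)."""
    return p_correct * gain_cycles - (1.0 - p_correct) * penalty_cycles


def break_even(gain_cycles=2.0, penalty_cycles=50.0):
    """Hit rate at which the hint neither helps nor hurts under this model."""
    return penalty_cycles / (gain_cycles + penalty_cycles)


if __name__ == "__main__":
    for p in (0.95, 0.99):
        print(f"{p:.0%} correct: {expected_gain(p):+.2f} cycles per branch")
    print(f"break-even at {break_even():.1%}")
```

Under these assumed numbers the break-even hit rate is about 96.2%, so a branch that is right 95% of the time loses slightly while one that is right 99% of the time gains about 1.5 cycles.  A success path hinted with likely() that fails 19 times out of 20, as in the ld.so search, comes out at roughly -47 cycles per call.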


Typically you only want to add unlikely() or likely() for about 2  
reasons:
  (A)  It's a hot path and the unlikely case is just going to burn a  
bunch of CPU anyways
  (B)  It really is extremely unlikely that it fails (Think physical  
hardware failure)


Anything else is just bogus.

Cheers,
Kyle Moffett



Re: [00/41] Large Blocksize Support V7 (adds memmap support)

2007-09-24 Thread Kyle Moffett

On Sep 23, 2007, at 02:22:12, Goswin von Brederlow wrote:

[EMAIL PROTECTED] (Mel Gorman) writes:

On (16/09/07 23:58), Goswin von Brederlow didst pronounce:
But when you already have say 10% of the ram in mixed groups then  
it is a sign the external fragmentation happens and some time  
should be spent on moving movable objects.


I'll play around with it on the side and see what sort of results  
I get.  I won't be pushing anything any time soon in relation to  
this though.  For now, I don't intend to fiddle more with grouping  
pages by mobility for something that may or may not be of benefit  
to a feature that hasn't been widely tested with what exists today.


I watched the videos you posted. A nice and quite clear improvement  
with and without your logic. Kudos.


When you play around with it may I suggest a change to the display  
of the memory information. I think it would be valuable to use a  
Hilbert Curve to arrange the pages into pixels. Like this:


# #  0  3
# #
###  1  2

### ###  0 1 E F
  # #
### ###  3 2 D C
#     #
# ### #  4 7 8 B
# # # #
### ###  5 6 9 A


Here's an excellent example of a 0-255 numbered Hilbert curve used
to enumerate the various top-level allocations of IPv4 space:

http://xkcd.com/195/
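
For anyone who wants to try the layout, the mapping from a page index to a pixel can be sketched as below.  This is the widely published iterative distance-to-coordinate conversion, not code from the patch set; d2xy and hilbert_grid are names chosen here for illustration.

```python
def d2xy(order, d):
    """Map distance d along a Hilbert curve to (x, y) on a 2**order square grid."""
    x = y = 0
    t = d
    s = 1
    while s < (1 << order):
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        if ry == 0:
            if rx == 1:
                # Flip the sub-square before rotating it.
                x, y = s - 1 - x, s - 1 - y
            x, y = y, x
        x += s * rx
        y += s * ry
        t //= 4
        s *= 2
    return x, y


def hilbert_grid(order):
    """Return rows of curve indices, with row 0 at the top of the picture."""
    n = 1 << order
    grid = [[0] * n for _ in range(n)]
    for d in range(n * n):
        x, y = d2xy(order, d)
        grid[y][x] = d
    return grid


if __name__ == "__main__":
    for row in hilbert_grid(2):
        print(" ".join(format(d, "X") for d in row))
```

With order 1 and order 2 this reproduces the two numbered layouts quoted above, so page N of a zone would simply be drawn at d2xy(order, N).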

Cheers,
Kyle Moffett



Re: Distributed storage. Move away from char device ioctls.

2007-09-16 Thread Kyle Moffett
-accessed inode  
objects and creates non-fragmented copies before deleting the old ones.


There's a lot of other technical details which would need resolution  
in an actual implementation, but this is enough of a summary to give  
you the gist of the concept.  Most likely there will be some major  
flaw which makes it impossible to produce reliably, but the concept  
contains the things I would be interested in for a real networked  
filesystem.


Cheers,
Kyle Moffett


Re: Adding a security parameter to VFS functions

2007-08-16 Thread Kyle Moffett

On Aug 16, 2007, at 18:57:24, Linus Torvalds wrote:

On Wed, 15 Aug 2007, David Howells wrote:
Would you object greatly to functions like vfs_mkdir() gaining a  
security parameter?  What I'm thinking of is this:
int vfs_mkdir(struct inode *dir, struct dentry *dentry, int mode,  
struct security *security)


I personally consider this an affront to everything that is decent.

Why the *hell* would mkdir() be so magical as to need something  
like that?


Not speaking directly for David, but I believe the reason is for  
background kernel code which needs to do filesystem access during a  
thread's execution with *completely* different security context from  
that of the thread.  Examples should be reasonably obvious; kNFSd is  
one, but it also includes anything where the kernel would poke  
directly into the filesystem, such as network filesystem cachefiles.


Make it something sane, like a struct nameidata instead, and make
it at least try to look like the path creation that is done by
open().  Or create a struct file * or something.


I can imagine having mkdir() being passed similar data as open()
(ie lookup()), but I cannot _possibly_ imagine it ever being
valid to pass in something totally made-up to just mkdir(), and
nothing else. There's something fundamentally wrong there.


I would offer the suggestion of using the described struct security  
in-place in the task struct, in place of using all those fields  
individually.  That would be, in effect the default security  
context for any given task, if NULL is passed to the appropriate  
vfs function.  For CacheFiles and kNFSd, they could each allocate  
their own during initialization or new-connection and pass that to  
any mkdir(), etc that they do on behalf of a given client.




What makes mkdir() so magical?

Also, what about all the other ops? Why is mkdir() special, but not  
mknod()? Why is mkdir() special, but not rmdir()? Really,  
none of this seems to make any sense unless you describe what is so  
magical about mkdir().


I think mkdir() was just an example he was using, probably because it  
was the first VFS call he needed to set a security context on.   Next  
would come anything which CacheFiles or NFSd call on the underlying  
filesystem.


Cheers,
Kyle Moffett



Re: [AppArmor 00/44] AppArmor security module overview

2007-06-27 Thread Kyle Moffett

On Jun 26, 2007, at 22:24:03, John Johansen wrote:

other issues that have been raised are:
- the use of d_path to generate the pathname used for mediation when a
  file is opened.
  - Generating the pathname using a reverse walk is considered ugly


A little more than ugly.  In this basic concurrent rename() and  
path-lookup load:


mkdir -p /a/b/0
mkdir -p /a/b/2
mkdir -p /c
touch /a/b/0/1

cd /a/b
# Each loop runs in the background so the renames and lookups overlap.
while true; do mv 0/1 2/3; mv 2/3 0/1; done &
cd /
while true; do mv a/b c/d; mv c/d a/b; done &
while true; do cat a/b/0/1; done &
while true; do cat a/b/2/3; done &
while true; do cat c/d/0/1; done &
while true; do cat c/d/2/3; done

I seem to recall you could actually end up racing and building a path  
to the file in those directories as a/d/0/3 or some other path at  
which it never even remotely existed.  I'd love to be wrong, but I  
can't help but see this problem in any reverse-pathname-generation  
proposal which gets the locking right.
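
The interleaving I have in mind can be shown with a toy model.  This is only a sketch: it takes no locks at all, the Dentry class and the hand-written schedule are inventions for illustration, and it says nothing about what the real d_path() does or does not protect against.

```python
class Dentry:
    """Toy directory entry: a name plus a pointer to its parent."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent


def rename(dentry, new_parent, new_name):
    dentry.parent = new_parent
    dentry.name = new_name


def unlocked_reverse_walk(dentry, interleave):
    """Build a path leaf-to-root, letting other work run between every read."""
    parts = []
    while dentry.parent is not None:
        parts.append(dentry.name)   # read this component's name
        interleave()
        dentry = dentry.parent      # then follow the parent pointer
        interleave()
    return "/" + "/".join(reversed(parts))


def simulate_race():
    root = Dentry("")
    a = Dentry("a", root)
    c = Dentry("c", root)
    b = Dentry("b", a)
    d0 = Dentry("0", b)
    d2 = Dentry("2", b)
    f = Dentry("3", d2)   # the file currently sits at /a/b/2/3

    # Renames that happen to land between the walker's individual reads.
    schedule = iter([
        lambda: rename(f, d0, "1"),   # mv 2/3 0/1, after the walker read "3"
        lambda: rename(b, c, "d"),    # mv a/b c/d, after it stepped up to "0"
        None,                         # nothing happens after it read "0"
        None,                         # nothing happens after it stepped up to b
        lambda: rename(b, a, "b"),    # mv c/d a/b, after it read "d"
    ])

    def interleave():
        step = next(schedule, None)
        if step is not None:
            step()

    return unlocked_reverse_walk(f, interleave)


if __name__ == "__main__":
    print(simulate_race())
```

The walker returns /a/d/0/3, which is none of /a/b/0/1, /a/b/2/3, /c/d/0/1 or /c/d/2/3, even though every individual read saw a consistent dentry.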


Cheers,
Kyle Moffett



Re: [RFC] fsblock

2007-06-27 Thread Kyle Moffett

On Jun 26, 2007, at 07:14:14, Nick Piggin wrote:

On Tue, Jun 26, 2007 at 07:23:09PM +1000, David Chinner wrote:
Can we call it a block mapping layer or something like that? e.g.  
struct blkmap?


I'm not fixed on fsblock, but blkmap doesn't grab me either. It is  
a map from the pagecache to the block layer, but blkmap sounds like  
it is a map from the block to somewhere.


fsblkmap ;)


vmblock? pgblock?

Cheers,
Kyle Moffett



Re: Versioning file system

2007-06-19 Thread Kyle Moffett

On Jun 19, 2007, at 03:58:57, Bron Gondwana wrote:

On Mon, Jun 18, 2007 at 11:10:42PM -0400, Kyle Moffett wrote:

On Jun 18, 2007, at 13:56:05, Bryan Henderson wrote:
The question remains is where to implement versioning: directly  
in individual filesystems or in the vfs code so all filesystems  
can use it?


Or not in the kernel at all.  I've been doing versioning of the  
types I described for years with user space code and I don't  
remember feeling that I compromised in order not to involve the  
kernel.


What I think would be particularly interesting in this domain is  
something similar in concept to GIT, except in a file-system:


[...snip...]

It can work, but there's one big pain at the file level: no mmap.


IMHO it's actually not that bad.  The gitfs would divide larger  
files up into manageable chunks (say 4MB) which could be quickly  
SHA-1ed.  When a file is mmapped and partially modified, the SHA-1  
would be marked as locally invalid, but since mmap() loses most  
consistency guarantees that's OK.  A time or writeout based commit  
scheme might still freeze, SHA-1, and write-out the page at regular  
intervals without the program's knowledge, but since you only have to  
SHA-1 the relatively-small 4MB chunk (which is about to hit disk  
anyways), it's not a significant time penalty.  Even if under memory  
pressure and swapping data out to disk you don't have to update the  
SHA-1 and create a new commit as long as you keep a reference to the  
object stored in the volume header somewhere and maintain the SHA-1  
out-of-date bit.
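
A rough userspace sketch of the per-chunk bookkeeping, under several assumptions of mine: the file lives in memory, writes only overwrite existing bytes, and the file-level ID is a SHA-1 over the concatenated chunk digests (the message does not say how chunk hashes would be combined).  ChunkedFile and its fields are hypothetical names.

```python
import hashlib

CHUNK_SIZE = 4 * 1024 * 1024  # the 4MB example size from the message


class ChunkedFile:
    """Toy in-memory file that keeps one SHA-1 per fixed-size chunk."""

    def __init__(self, data=b"", chunk_size=CHUNK_SIZE):
        self.chunk_size = chunk_size
        self.data = bytearray(data)
        count = max(1, -(-len(self.data) // chunk_size))  # ceiling division
        self.digests = [None] * count
        self.stale = set(range(count))   # the "SHA-1 out-of-date" bits
        self.rehashed = 0                # how many chunk hashes were computed
        self.commit()

    def write(self, offset, payload):
        """Overwrite bytes in place and flag every touched chunk as stale."""
        end = offset + len(payload)
        if offset < 0 or end > len(self.data):
            raise ValueError("this sketch only overwrites existing bytes")
        if not payload:
            return
        self.data[offset:end] = payload
        first = offset // self.chunk_size
        last = (end - 1) // self.chunk_size
        self.stale.update(range(first, last + 1))

    def commit(self):
        """Re-hash only the stale chunks, then hash the list of chunk digests."""
        for index in sorted(self.stale):
            start = index * self.chunk_size
            chunk = self.data[start:start + self.chunk_size]
            self.digests[index] = hashlib.sha1(chunk).digest()
            self.rehashed += 1
        self.stale.clear()
        return hashlib.sha1(b"".join(self.digests)).hexdigest()
```

A one-byte write into a large file marks a single chunk stale, so the next commit() hashes one chunk plus the short digest list rather than the whole file.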


A program which carefully uses msync() would be fine, of course (with  
proper configuration) as that would create a new commit as appropriate.


Since mmap() is poorly defined on network filesystems in the absence  
of msync(), I don't see that such behaviour would be a problem.  And  
it certainly would be fine on local filesystems as there you can just  
stuff the SHA-1 out-of-date bit and a reference to the parent  
commit and path in the object itself.  Then you just need to keep a  
useful reference to that object in a table somewhere in the volume  
and you're set.


If you don't want to support mmap it can work reasonably happily,  
though you may want to keep your sha1 (or other digest) state as  
well as the final digest so you can cheaply calculate the digest  
for a small append without walking the entire file.  You may also  
want to keep state checkpoints every so often along a big file so  
that truncates don't cost too much to recalculate.


That may be worth it even if the file is divided into 4MB chunks (or  
other configurable value), but it would need benchmarking.
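
The saved-state idea can be sketched with Python's hashlib, whose hash objects support copy() and keep accepting update() after a digest has been read.  RunningDigest and the tiny checkpoint interval are illustrative assumptions; a real implementation would checkpoint far less often and would not keep the data in memory.

```python
import hashlib


class RunningDigest:
    """Keep SHA-1 state so appends and truncates avoid re-reading the whole file."""

    def __init__(self, checkpoint_every=4):
        self.checkpoint_every = checkpoint_every
        self.data = bytearray()
        self.state = hashlib.sha1()
        self.checkpoints = {0: hashlib.sha1()}   # offset -> saved hash state

    def append(self, payload):
        """Feed only the new bytes, saving a state copy at each checkpoint offset."""
        payload = bytes(payload)
        while payload:
            room = self.checkpoint_every - len(self.data) % self.checkpoint_every
            piece, payload = payload[:room], payload[room:]
            self.state.update(piece)
            self.data += piece
            if len(self.data) % self.checkpoint_every == 0:
                self.checkpoints[len(self.data)] = self.state.copy()
        return self.state.hexdigest()

    def truncate(self, size):
        """Resume from the last checkpoint at or before size, re-hashing only the tail."""
        if size < 0 or size > len(self.data):
            raise ValueError("size must be within the current file")
        base = (size // self.checkpoint_every) * self.checkpoint_every
        del self.data[size:]
        self.checkpoints = {off: st for off, st in self.checkpoints.items()
                            if off <= size}
        self.state = self.checkpoints[base].copy()
        self.state.update(self.data[base:size])
        return self.state.hexdigest()
```

An append costs one update() over the new bytes, and a truncate re-hashes at most checkpoint_every bytes.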


Luckily in a userspace VFS that's only accessed via FTP and DAV we  
can support a limited set of operations (basically create, append,  
read, delete)  You don't get that luxury for a general purpose  
filesystem, and that's the problem.  There will always be  
particular usage patterns (especially something that mmaps or seeks  
and touches all over the place like a loopback mounted filesystem  
or a database file) that just don't work for file-level sha1s.


I'd think that loopback-mounted filesystems wouldn't be that difficult:
  1)  Set the SHA-1 block size appropriately to divide the big file
into a bunch of little manageable files.  Could conceivably be
multi-layered like directories, depending on the size of the file.
  2)  Mark the file as exempt from normal commits (IE: without
special syscalls or fsync/msync() on the file itself, it is never
updated in the tree objects).
  3)  Set up the loopback device to call the gitfs commit code when  
it receives barriers or flushes from the parent filesystem.


And database files aren't a big issue.  I have yet to see a networked
filesystem on which you could stick a MySQL database from one node
and expect to get useful/recent read results from other nodes.  If
you really wanted something like that for such a gitfs, you could  
just add code to MySQL to create a gitfs commit every N transactions  
and not otherwise.  The best part is: that would make online MySQL  
backups from another node trivial!  Just pick any arbitrary  
appropriate commit object and mount that object, then cp -a  
mysql_db_dir mysql_backup_dir.  That's not to say it wouldn't have a  
performance penalty, but for some people the performance penalty  
might be worth it.


Oh, and for those programs which want multi-master replication, this  
makes it ten times easier:

  1)  Put each master-server on a different gitfs branch
  2)  Write your program as gitfs aware.  Make it create gitfs  
commits at appropriate times (so the data is accessible from other  
nodes).
  3)  Come up with a useful non-interactive database-file merge  
algorithm.  Useful examples of different kinds of merge engines may  
be found in the git project.  This should take $BASE_VERSION,  
$NEWVERSION1, $NEWVERSION2, and produce a $MERGEDVERSION.  A good  
algorithm

Re: Versioning file system

2007-06-18 Thread Kyle Moffett
 erased object you  
would use a History archived object with a little bit of string  
data to indicate which volume it's stored on (and where on the  
volume).  When you stick that volume into the system you could easily  
tell the kernel to use it as an alternate for the given storage group.


Q. What enforces data integrity?
A. Ensure that a new tree object and its associated sub objects are  
on disk before you delete the old one.  Doesn't need any actual full  
syncs at all, just barriers.  If you replace the tree object before  
write-out is complete then just skip writing the old one and write  
the new one in its place.


Q. What constitutes a commit?
A. Anything the administrator wants to define it as.  Useful
algorithms include: Once per x Mbyte of page dirtying, Once per 5
min, Only when sync() or fsync() are called, Only when
gitfs-commit is called.  You could even combine them:  Every x Mbyte of
page dirtying or every 5 minutes, whichever comes first (or last,
depending on admin requirements).  There would also be appropriate
syscalls to trigger appropriate git-like behavior.
Network-accessible gitfs would want to have mechanisms to trigger commits
based on activity on other systems (needs more thought).
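
A minimal sketch of such a combined trigger follows.  CommitPolicy, its field names, and the 64 Mbyte default are placeholders of mine; nothing here is an existing gitfs interface.

```python
from dataclasses import dataclass


@dataclass
class CommitPolicy:
    max_dirty_bytes: int = 64 * 1024 * 1024   # stand-in for "x Mbyte"
    max_age_seconds: float = 5 * 60           # "every 5 minutes"
    require_both: bool = False                # False: either limit; True: both limits

    def should_commit(self, dirty_bytes, seconds_since_commit, explicit_sync=False):
        """Decide whether the filesystem should create a commit now."""
        if explicit_sync:
            # sync(), fsync() or an explicit commit request always wins.
            return True
        size_hit = dirty_bytes >= self.max_dirty_bytes
        age_hit = seconds_since_commit >= self.max_age_seconds
        if self.require_both:
            return size_hit and age_hit
        return size_hit or age_hit
```

Setting require_both=True gives the more conservative variant, which commits only once both limits have been reached.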


Q. How do you access old versions?
A. Mount another instance of the filesystem with an SHA-1 ID, a
tag-name, or a branch-name in a special mount option.  Should be user
accessible with some restrictions (needs more thought).


Q. How do you deal with conflicts on networked filesystems?
A. Once again, however the administrator wants to deal with them.   
Options:

   1)  Forcibly create a new branch for the conflicted tree.
   2)  Attempt to merge changes using the standard git-merge semantics
   3)  Merge independent changes to different files and pick one for  
changes to the same file
   4)  Your Algorithm Here(TM).  GIT makes it easy to extend  
conflict-resolution.


Q. How do you deal with little scattered changes in big (or sparse)  
files?
A. Two questions, two answers:  For sparse files, git would need  
extending to understand (and hash) the nature of the sparse-ness.   
For big files, you should be able to introduce a compound-file  
datatype and configure git to deal with specific X-Mbyte chunks of it  
independently.  This might not be a bad idea for native git as well.   
Would need system-specific configuration.


Q. How do you prevent massive data consumption by spurious tiny changes?
A. You have a few options:
   1)  Configure your commit algorithm as above to not commit so often
   2)  Configure a stepped commit-discard algorithm as described
above in the "How do you delete things" question
   3)  Archive unused data to secondary storage more often

Q. What about all the unanswered questions?
A. These are all the ones I could think of off the top of my head but  
there are at least a hundred more.  I'm pretty sure these are some of  
the most significant ones.


Q. That's a great idea and I'll implement it right away!
A. Yay!  (but that's not a question :-D)  Good luck and happy hacking.

Q. That's a stupid idea and would never ever work!
A. Thanks for your useful input! (but that's not a question either)   
I'm sure anybody who takes up a project like this will consider such  
opinions.


Q. *flamage*
A. I'm glad you have such strong opinions, feel free to continue
to spam my /dev/null device (and that's also not a question).


All opinions and comments welcomed.

Cheers,
Kyle Moffett



Re: Versioning file system

2007-06-18 Thread Kyle Moffett

On Jun 18, 2007, at 17:24:23, Brad Boyer wrote:

On Tue, Jun 19, 2007 at 12:26:57AM +0200, Jörn Engel wrote:
Pointless here means that _I_ don't see the point.  Maybe there  
are valid uses for extended attributes.  If there are, no one has
explained them to me yet.


The users of extended attributes that I've dealt with are ACL  
support and SELinux. These both use extended attributes under the  
covers. It's just not immediately obvious if you aren't looking.


Yeah, extended attributes are typically used for exactly that:  
attributes like labels, permissions, encoding, cached file-type,  
DOS/Windows/Mac metadata, etc.  Sometimes people suggest sticking  
icons in there, but that's probably a bad idea.  At most stick an
icon label attribute which refers to a file
/usr/share/icons/by_attr/$ICON_LABEL.png.  If you're trying to put more than 256
bytes of data in an extended attribute then you're probably doing
something wrong.  They're very good for cached attributes (like
file-type) where you don't care if the data is lost by tar, and they're
reasonable for security-related attributes where you don't want  
attribute-unaware programs trying to save and restore them (like  
SELinux labels).
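
As a sketch of the icon-label approach, using Python's Linux-only os.setxattr() and os.getxattr():  the 256-byte cap is the rule of thumb above rather than any kernel limit, and the user.icon_label attribute name and both helper functions are made up for illustration.

```python
import os

MAX_XATTR_BYTES = 256  # rule of thumb from the message, not a kernel limit


def set_small_xattr(path, name, value, limit=MAX_XATTR_BYTES):
    """Store a short attribute, refusing payloads that belong in a real file."""
    if len(value) > limit:
        raise ValueError(
            f"{name}: {len(value)} bytes exceeds the {limit}-byte guideline")
    os.setxattr(path, name, value)


def icon_path(path, attr="user.icon_label",
              icon_dir="/usr/share/icons/by_attr"):
    """Resolve the icon file named by a label attribute instead of storing the image."""
    label = os.getxattr(path, attr).decode("utf-8")
    return os.path.join(icon_dir, label + ".png")
```

The attribute then holds only a few bytes of label, and losing it in a tar round-trip costs nothing but the custom icon.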


Cheers,
Kyle Moffett



Re: [AppArmor 39/45] AppArmor: Profile loading and manipulation, pathname matching

2007-06-09 Thread Kyle Moffett

On Jun 09, 2007, at 01:18:40, [EMAIL PROTECTED] wrote:
SELinux is like a default allow IPS system, you have to describe  
EVERYTHING to the system so that it knows what to allow and what to  
stop.


WRONG.  You clearly don't understand SELinux at all.  Try booting in  
enforcing mode with an empty policy file (well, not quite empty,  
there are a few mandatory labels you have to create before it's a  
valid policy file).  /sbin/init will load the initial policy, attempt  
to re-exec() itself... and promptly grind to a halt.  End-of-story.


Typical targeted policies leave all user logins as unrestricted,
adding security for daemons but not getting in the way of users who  
would otherwise turn SELinux off.  On the other hand, a targeted  
policy has a trusted type for user logins which is explicitly  
allowed access to everything.


That said, if you actually want your system to *work* with any  
default-deny policy then you have to describe EVERYTHING anyways.   
How exactly do you expect AppArmor to work if you don't allow users  
to run /bin/passwd, for example?


Cheers,
Kyle Moffett



Re: [AppArmor 39/45] AppArmor: Profile loading and manipulation, pathname matching

2007-06-09 Thread Kyle Moffett

On Jun 09, 2007, at 12:46:40, [EMAIL PROTECTED] wrote:

On Sat, 9 Jun 2007, Kyle Moffett wrote:
Typical targeted policies leave all user logins as
unrestricted, adding security for daemons but not getting in the  
way of users who would otherwise turn SELinux off.  On the other  
hand, a targeted policy has a trusted type for user logins which  
is explicitly allowed access to everything.


Ok, it sounds as if I did misunderstand SELinux. I thought that by  
labeling the individual files you couldn't do the 'only restrict  
apache' type of thing.


That said, if you actually want your system to *work* with any  
default-deny policy then you have to describe EVERYTHING anyways.   
How exactly do you expect AppArmor to work if you don't allow  
users to run /bin/passwd, for example?


for AA you don't try to define permissions for every executable,  
and ones that you don't define policy are unrestricted.


so as I understand this with SELinux you will have lots of labels  
around your system (more as you lock down the system more) you need  
to define policy so that your unrestricted users must have access  
to every label, and every time you create a new label you need to  
go back to all your policies to see if the new label needs to be  
allowed from that policy


Actually, it's easier than that.  There are type attributes which may  
be assigned to an arbitrary set of types, and each type field in an  
access rule may use either a type or an attribute.  So you don't  
actually need to modify existing rules when adding new types, you  
just add the appropriate existing attributes to your new type.  For  
example, you could set up a logfile attribute which allows  
logrotate to archive old versions and allows audit-admin users to  
modify/delete them, then whenever you need to add a new logfile you  
just declare the my_foo_log_t type to have the logfile attribute.
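
A toy model may make the lookup clearer.  This is not SELinux policy syntax and it ignores object classes entirely; ToyPolicy and its methods are invented here, and the only point is that a rule written against an attribute automatically covers every type carrying it.

```python
class ToyPolicy:
    """Allow rules whose source and target may be a type or an attribute."""

    def __init__(self):
        self.attributes = {}   # type name -> set of attribute names
        self.rules = set()     # (source, target, permission)

    def declare_type(self, type_name, *attrs):
        self.attributes.setdefault(type_name, set()).update(attrs)

    def allow(self, source, target, permission):
        self.rules.add((source, target, permission))

    def _names(self, type_name):
        # A type matches rules written for itself or for any of its attributes.
        return {type_name} | self.attributes.get(type_name, set())

    def permits(self, source_type, target_type, permission):
        return any((s, t, permission) in self.rules
                   for s in self._names(source_type)
                   for t in self._names(target_type))


if __name__ == "__main__":
    policy = ToyPolicy()
    policy.declare_type("logrotate_t")
    policy.allow("logrotate_t", "logfile", "rename")   # written once
    policy.declare_type("my_foo_log_t", "logfile")     # added later
    print(policy.permits("logrotate_t", "my_foo_log_t", "rename"))
```

Declaring my_foo_log_t with the logfile attribute is the only change needed; the existing logrotate rule is never edited.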


On the other hand, I seem to recall that typical targeted policies  
don't grant most of the additional access via access rules, they  
instead add a special case to the fundamental constraints in the  
policy (IE: If the subject type has the trusted attribute then skip  
some of the other type-based checks).


Cheers,
Kyle Moffett



Re: [AppArmor 39/45] AppArmor: Profile loading and manipulation, pathname matching

2007-06-09 Thread Kyle Moffett

On Jun 09, 2007, at 13:32:05, [EMAIL PROTECTED] wrote:

On Sat, 9 Jun 2007, Kyle Moffett wrote:

On Jun 09, 2007, at 12:46:40, [EMAIL PROTECTED] wrote:
so as I understand this with SELinux you will have lots of labels  
around your system (more as you lock down the system more) you  
need to define policy so that your unrestricted users must have  
access to every label, and every time you create a new label you  
need to go back to all your policies to see if the new label  
needs to be allowed from that policy


Actually, it's easier than that.  There are type attributes which  
may be assigned to an arbitrary set of types, and each type  
field in an access rule may use either a type or an attribute.  So  
you don't actually need to modify existing rules when adding new  
types, you just add the appropriate existing attributes to your  
new type.  For example, you could set up a logfile attribute  
which allows logrotate to archive old versions and allows
audit-admin users to modify/delete them, then whenever you need to add a
new logfile you just declare the my_foo_log_t type to have the  
logfile attribute.


isn't this just the flip side of the same problem?

every time you define a new attribute you need to go through all  
the files and decide if the new attribute needs to be given to that  
file.


No you don't, you can add attributes to a type after-the-fact.  In  
concept this problem is very similar to programming:  You have  
various documented interfaces used by different policy files to  
interact with each other.  As long as your policy files conform to  
the documented interfaces then you *DON'T* have to manually inspect
each file because you can make basic assumptions.  On the other hand,  
when you break that interface contract you will get very unexpected  
results.  For the above example:


My syslog policy file would create a logfile attribute and types
for /var/log/auth/auth.log, /var/log/kern/kern.log, and
/var/log/messages.  It would also create a logdaemon attribute which has
automatic type transitions to create files in different /var/log/*
directories.  Finally, it would allow the syslogd type to create and
append to its specific file types for auth.log, kern.log, and
messages.


My logrotate policy file would depend on the syslog policy and would  
declare the logrotate daemon type as a logdaemon, and additionally  
allow logrotate to read, rename, append, and delete logfile types.   
Since logrotate is a logdaemon, it already has the appropriate type  
transitions for new types.


My samba policy file would depend on the syslog policy and would  
declare the samba daemon type as a logdaemon and the
/var/log/samba/* type as a logfile.  Then it would add a type transition
rule so when logdaemon creates new files in samba_log_dir_t, they  
have the appropriate samba_log_t label.  Finally, samba would allow  
itself to append to samba_log_t files.


Note that now when logrotate runs and rotates files in
/var/log/samba, it will automatically create the new files with type
samba_log_t, even though there are no *direct* associations between  
those types.  If the syslog policy file was poorly written it could  
seriously adversely affect the security of the system, but hopefully  
that's obvious :-D.  Policy development is _hard_, it's a whole  
separate state-machine and pseudo-programming-language that should  
mostly be left to security professionals or very experienced  
developers/sysadmins.
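
The indirect association can be sketched as a lookup table.  Again this is a toy, not policy language: ToyTransitions is a made-up class, object classes are ignored, and the fallback of inheriting the directory's type is the only real SELinux behaviour it borrows.

```python
class ToyTransitions:
    """Pick the label for a new file from the creator's type or attributes."""

    def __init__(self):
        self.attributes = {}    # type name -> set of attribute names
        self.transitions = {}   # (source name, directory type) -> new file type

    def declare_type(self, type_name, *attrs):
        self.attributes.setdefault(type_name, set()).update(attrs)

    def type_transition(self, source, dir_type, new_type):
        self.transitions[(source, dir_type)] = new_type

    def label_for_new_file(self, creator_type, dir_type):
        # Try the creator's own type first, then each attribute it carries.
        names = [creator_type, *sorted(self.attributes.get(creator_type, ()))]
        for name in names:
            if (name, dir_type) in self.transitions:
                return self.transitions[(name, dir_type)]
        return dir_type   # no rule matched: inherit the directory's type


if __name__ == "__main__":
    t = ToyTransitions()
    t.declare_type("logrotate_t", "logdaemon")                        # logrotate policy
    t.type_transition("logdaemon", "samba_log_dir_t", "samba_log_t")  # samba policy
    print(t.label_for_new_file("logrotate_t", "samba_log_dir_t"))
```

Neither policy file names the other's types; the logdaemon attribute from the syslog policy is the only thing they share.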


Cheers,
Kyle Moffett



Re: [AppArmor 01/41] Pass struct vfsmount to the inode_create LSM hook

2007-05-28 Thread Kyle Moffett

On May 28, 2007, at 06:41:11, Toshiharu Harada wrote:

2007/5/27, Kyle Moffett [EMAIL PROTECTED]:
If you can't properly manage your labels, then how do you expect  
any security at all?
Please read my message again. I didn't say, This can never be  
achieved.  I said, This can not be easily achieved.


So you said (data labels) can not be easily achieved.  My  
question for you is: How do you manage secure UNIX systems without  
standard UNIX permission bits?  Also:  If you have problems with  
data labels then what makes pathname based labels easier?  If  
there is something that could be done to improve SELinux and make  
it more readily configurable then it should probably be done.


Permission bits can be checked easily with the ls command, but
assuring the correctness of labels is not that easy. I'll try to
explain.


The correctness of the permission bit for a given file can be  
judged solely by the result of ls command.  The correctness of  
the label, on the other hand, can't be judged without understanding  
of whole policy including domain transitions. (see the attached  
figure) I can imagine that once one gets the complete SELinux
policy, one is then able to modify and maintain it.


That's why there are a number of efforts to make modular SELinux  
policies.  A good SELinux policy provides a few core system types and  
labels which a policy developer needs to understand, as well as some  
good macros to simplify the human-editable policy files.  For  
instance, in my customized policy a daemon run by an initscript which  
reads a single config file in /etc needs this policy (Note that I use  
_d as a suffix for process domains instead of the usual _t):


initrc_daemon(foo_exec_t, foo_d)
daemon_config(foo_d, foo_conf_t)

Add maybe 2 lines for network port access, another 2 for database  
files in /var, plus maybe an iptables rule or two in your firewall  
file.


I don't say making a complete SELinux policy is impossible, and  
actually you said you did it.  But to be frank, I don't think you  
are the average level user at all. ;-)


Average users are not supposed to be writing security policy.  To be  
honest, even average-level system administrators should not be  
writing security policy.  It's OK for such sysadmins to tweak  
existing policy to give access to additional web-docs or such, but  
only expert sysadmin/developers or security professionals should be  
writing security policy.  It's just too damn easy to get completely  
wrong.


I'm very interested in how you can know that you have the correct  
object labeling (this is my point). Could you tell?


I know that I have the correct object labeling because:


Do you mind if I add this?

0) I understood the default policy and perfectly understand
every behavior of my system.


this is where the difficulties exist.


You don't have to understand the entire default policy; that's the  
point of modular policy.  You only have to understand how to _use_  
the interfaces of the system policy (which are documented) and how  
the particular daemon policy is supposed to work.  The people  
developing the core system policy need to understand the inner  
workings of said policy, but they don't need to understand how the  
rest of the system works.  The core functionality behind this  
separation is macro interfaces and attributes.  By grouping types  
with attributes it is possible for arbitrary daemon types to  
categorize themselves under access rules defined by the base policy,  
and with interfaces the daemons don't really even need to know what  
those attributes are called.
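
To make the attribute idea concrete, here is a toy model in Python.  It is only a sketch of the concept, not how the SELinux policy compiler or kernel actually represents anything; the Policy class, the "daemon" and "config_file" attribute names and the rule tuples are all made up for illustration.

```python
# Toy model of types grouped by attributes.  Every name here is hypothetical.
from dataclasses import dataclass, field


@dataclass
class Policy:
    # type name -> set of attribute names attached to that type
    type_attrs: dict = field(default_factory=dict)
    # (source, target, permission) rules; source and target may each be
    # either a type name or an attribute name
    allow: set = field(default_factory=set)

    def names_for(self, type_name):
        """A type matches rules written against itself or any of its attributes."""
        return {type_name} | self.type_attrs.get(type_name, set())

    def allowed(self, source, target, perm):
        return any(
            (s, t, perm) in self.allow
            for s in self.names_for(source)
            for t in self.names_for(target)
        )


policy = Policy()
# "Base policy": anything tagged daemon may read anything tagged config_file.
policy.allow.add(("daemon", "config_file", "read"))
# "Daemon module": declare two types and opt in to the base policy's attributes.
policy.type_attrs["foo_d"] = {"daemon"}
policy.type_attrs["foo_conf_t"] = {"config_file"}

print(policy.allowed("foo_d", "foo_conf_t", "read"))
print(policy.allowed("foo_d", "foo_conf_t", "write"))
```

The daemon module never spells out a rule of its own here; it only attaches attributes, which is roughly the job the interface macros do on its behalf.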


I don't deny DAC at all.  If we deny DAC, we can't live with  
Linux; it's the base.  MAC can be used to cover the shortcomings of  
DAC and Linux's simple user model, that's it.


From a security point of view, simplicity is always a virtue and  
the way to go.  An inode-combined label is guaranteed to be a single  
one at any point in time.  This is the most noticeable advantage of  
label-based security.


I would argue that pathname-based security breaks the "simplicity  
is the best virtue (of a security system)" paradigm, because it  
attributes multiple potentially-conflicting labels to the same piece


I have a question for you.  With the current implementation of SELinux,  
only one label can be assigned.  But there are cases  
where one object can be used in different contexts, so I think it  
might help if SELinux would allow objects to have multiple labels.  
(I'm not talking about conflicts here.)  What do you think?


This is the whole advantage of SELinux type attributes: you can  
define a type "var_foo_t" which has a specific list of attributes;  
rules which accept type specifiers can also accept attribute  
specifiers.  If what you want is a label which may be  
accessed in two different ways, then you declare attributes for each  
access method and declare a type which has the attributes filetype,  
access1, and access2 (assuming access1 and access2

Re: [AppArmor 01/41] Pass struct vfsmount to the inode_create LSM hook

2007-05-28 Thread Kyle Moffett

On May 28, 2007, at 16:38:38, Pavel Machek wrote:

Kyle Moffett wrote:
I am of the opinion that adding a "name" parameter to the  
file/directory create actions would be useful.  For example, with such  
support you could actually specify a type-transition rule  
conditional on a specific name or substring:


named_type_transition sshd_t tmp_t:sock_file prefix "ssh-" ssh_sock_t;


Useful options for matching would be prefix, suffix, substr  
(start,len).  regex would be nice but is sorta computationally  
intensive and would be likely to cause more problems than it solves.


Could someone implement this? AFAICT that prevents SELinux from  
being a superset of AppArmor... Doing this should be significantly  
simpler than the whole of AA, and hopefully it will end up less ugly, too.


Really it would need to extend all action-match items with new  
named_ equivalents, and most callbacks would need to be extended to  
pass in an object name, if available.  On the other hand, with such  
support implemented then the AppArmor policy compilation tools could  
be transformed into a simple SELinux policy generator.  I estimate  
that the number of new lines of kernel code for such a modified  
SELinux would be 100x less than the kernel code in AppArmor.


Cheers,
Kyle Moffett



Re: [AppArmor 01/41] Pass struct vfsmount to the inode_create LSM hook

2007-05-27 Thread Kyle Moffett

CC trimmed to remove a few poor overloaded inboxes from this tangent.

On May 27, 2007, at 04:34:10, Cliffe wrote:

Kyle wrote:
On the other hand, if you actually want to protect the _data_,  
then tagging the _name_ is flawed; tag the *DATA* instead.


Would it make sense to label the data (resource) with a list of  
paths (names) that can be used to access it?


Therefore the data would be protected against being accessed via  
alternative arbitrary names. This may be a simple label to maintain  
and (possibly to) enforce, allowing path based confinement to  
protect a resource. This may allow for the benefits of pathname  
based confinement while avoiding some of its problems.


The primary problem with that is that "mv somefile otherfile" must  
change the labels, which means that every process that issues a  
rename() syscall needs to have special handling of labels.  The other  
problem is that many of the features and capabilities of SELinux get  
left by the wayside.  On an SELinux system 90% of the programs don't  
need to be modified to understand labels, since the policy can define  
automatic label transitions.  SELinux also allows you to have  
conditional label privileges based on boolean variables, something  
that cannot be done if the privileges themselves are stored in the  
filesystem.  Finally, such an approach does not allow you to  
differentiate between programs.


Cheers,
Kyle Moffett



Re: [AppArmor 01/41] Pass struct vfsmount to the inode_create LSM hook

2007-05-27 Thread Kyle Moffett

On May 27, 2007, at 03:25:27, Toshiharu Harada wrote:

2007/5/27, Kyle Moffett [EMAIL PROTECTED]:

On May 26, 2007, at 19:08:56, Toshiharu Harada wrote:

2007/5/27, James Morris [EMAIL PROTECTED]:

On Sat, 26 May 2007, Kyle Moffett wrote:
AppArmor).  On the other hand, if you actually want to protect  
the _data_, then tagging the _name_ is flawed; tag the *DATA*  
instead.


Bingo.

(This is how traditional Unix DAC has always functioned, and is  
what SELinux does: object labeling).


Object labeling (or labeled security) looks like a simple and  
straightforward way, but it's not.


(1) Object labeling has an assumption that labels are always  
properly defined and maintained. This cannot be easily achieved.


That's a circular argument, and a fairly trivial one at that.


Sorry Kyle, I don't think it's a trivial one.  The opposite.


How is that argument not trivially circular?  "Foo has an assumption  
that foo-property is always properly defined and maintained."  That  
could be said about *anything*:
  *  Unix permissions have an assumption that mode bits are always  
properly defined and maintained.
  *  Apache .htaccess security has an assumption that .htaccess files  
are always properly defined and maintained.
  *  Functional email communication has an assumption that the email  
servers are always properly defined and maintained.


If you can't properly manage your labels, then how do you expect  
any security at all?


Please read my message again. I didn't say, "This can never be  
achieved."  I said, "This can not be easily achieved."


So you said (data labels) "can not be easily achieved".  My question  
for you is: How do you manage secure UNIX systems without standard  
UNIX permission bits?  Also:  If you have problems with data labels  
then what makes pathname-based labels easier?  If there is  
something that could be done to improve SELinux and make it more  
readily configurable then it should probably be done.


If you can't achieve the first with reasonable security, then you  
probably can't achieve the second either.  Also, if you can't  
manage correct object labeling then I'm very interested in how you  
are maintaining secure Linux systems without standard DAC.


I'm very interested in how you can know that you have the correct  
object labeling (this is my point). Could you tell?


I know that I have the correct object labeling because:
  1) I rewrote/modified the default policy to be extremely strict on  
the system where I wanted the extra security and hassle.
  2) I ensured that the type transitions were in place for almost  
everything that needed to be done to administer the system.
  3) I wrote a file-contexts file and relabeled *once*.
  4) I loaded the customized policy plus policy for restorecon and  
relabeled for the last time.
  5) I reloaded the customized policy without restorecon privileges  
and without the ability to reload the policy again.
  6) I never reboot the system without enforcing mode.
  7) If there are unexpected errors or files have incorrect labels,  
I have to get the security auditor to log in on the affected system  
and relabel the problematic files manually (rare occurrence which  
requires excessive amounts of paperwork).


(2) Also, assigning a label is something like inventing and  
assigning a *new* name (label name) to objects which can cause  
flaws.


I don't understand how assigning new attributes to objects can  
cause flaws, nor what flaws those might be; could you elaborate  
further? In particular, I don't see how this is really all that  
more complicated than defining additional access control in  
apache .htaccess files.  The principle is the same:  by stacking  
multiple independent security-verification mechanisms (Classical  
UNIX DAC and Apache permissions) you can increase security, albeit  
at an increased management cost.  You might also note that  
.htaccess files are yet another form of successful label-based  
security; the security context for a directory depends on  
the .htaccess label file found within.  The *exact* same  
principles apply to SELinux: you add additional attributes backed  
by a simple and powerful state-machine.  The cross-checks are  
lower-level than those from .htaccess files, but the principles  
are the same.


I don't deny DAC at all.  If we deny DAC, we can't live with Linux;  
it's the base.  MAC can be used to cover the shortcomings of DAC and  
Linux's simple user model, that's it.


From a security point of view, simplicity is always a virtue and  
the way to go.  An inode-combined label is guaranteed to be a single  
one at any point in time.  This is the most noticeable advantage of  
label-based security.


I would argue that pathname-based security breaks the "simplicity is  
the best virtue (of a security system)" paradigm, because it  
attributes multiple potentially-conflicting labels to the same piece  
of data.  It also cannot protect the secrecy of specific *data* as  
well as SELinux can.  For example

Re: Pass struct vfsmount to the inode_create LSM hook

2007-05-26 Thread Kyle Moffett

On May 26, 2007, at 10:44:46, Tetsuo Handa wrote:

Andreas Gruenbacher wrote:

Tetsuo Handa wrote:
Therefore, TOMOYO Linux checks the combination of filename and  
argv[0] passed to execve().


So you are indeed trying to control the value of argv[0]? Well,  
good luck with that, but it's totally insane. You are guaranteed  
to break some applications.


TOMOYO Linux restricts argv[0] using the allow_argv0 syntax.   
"allow_argv0 /bin/bash -bash" to allow passing "/bin/bash" to  
filename and "-bash" to argv[0].  "allow_argv0 /bin/gzip gunzip" to  
allow passing "/bin/gzip" to filename and "gunzip" to argv[0].   
"allow_argv0 /sbin/busybox cat" to allow passing "/sbin/busybox" to  
filename and "cat" to argv[0].  No need to use the allow_argv0 syntax  
if the basename of filename and basename of argv[0] are the same  
(i.e. "allow_argv0 /bin/bash bash" is not required).  TOMOYO Linux  
doesn't unconditionally forbid passing different values for  
filename and argv[0].  TOMOYO Linux allows passing different values  
for filename and argv[0] only if it is allowed by the allow_argv0 syntax.
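
The rule described above boils down to roughly the following check.  This is a Python sketch for illustration only: the function name and the set-of-pairs representation are assumptions, and the real TOMOYO Linux policy format and kernel code are not shown here.

```python
import os.path


def argv0_permitted(filename, argv0, allow_argv0):
    """Return True if execve(filename, [argv0, ...]) would pass the check.

    allow_argv0 is a set of (filename, argv0) pairs standing in for
    "allow_argv0 <filename> <argv0>" policy lines (hypothetical layout).
    """
    # Same basename: no explicit rule is needed.
    if os.path.basename(filename) == os.path.basename(argv0):
        return True
    # Different names: only allowed when a matching rule exists.
    return (filename, argv0) in allow_argv0


rules = {
    ("/bin/bash", "-bash"),
    ("/bin/gzip", "gunzip"),
    ("/sbin/busybox", "cat"),
}

print(argv0_permitted("/bin/bash", "bash", rules))
print(argv0_permitted("/bin/gzip", "gunzip", rules))
print(argv0_permitted("/bin/gzip", "zcat", rules))
```

Note that every permitted (filename, argv[0]) pair other than the same-basename case has to be listed in the policy ahead of time.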


Could you please explain to me why this approach breaks applications?


One of my servers runs 3 different instances of the "kadmind"  
Kerberos daemon, one for each realm which I need to be able to  
modify/change-passwords/etc.  In order to differentiate and stop/restart the  
appropriate daemon, I have a simple starter script which runs each  
kadmind process with a unique name derived from the realm (EG:  
"kadmind(EXAMPLE.COM)", "kadmind(OTHER.EXAMPLE.COM)").  Since this is  
a Kerberos server I use a very strict SELinux-based policy, yet my  
management tools need to be able to easily add and remove realms in a  
secure fashion.  It sounds like TOMOYO Linux would not be able to  
handle this situation at all;  I would either have to completely turn  
off that security feature and lose most of the functionality of  
TOMOYO Linux, or hard-code the list of realms into the policy file  
and have to completely reload policy every time I need to add/remove  
realms (big gaping security hole).


Cheers,
Kyle Moffett





Re: [AppArmor 01/41] Pass struct vfsmount to the inode_create LSM hook

2007-05-26 Thread Kyle Moffett

On May 26, 2007, at 19:08:56, Toshiharu Harada wrote:

2007/5/27, James Morris [EMAIL PROTECTED]:

On Sat, 26 May 2007, Kyle Moffett wrote:
AppArmor).  On the other hand, if you actually want to protect  
the _data_, then tagging the _name_ is flawed; tag the *DATA*  
instead.


Bingo.

(This is how traditional Unix DAC has always functioned, and is  
what SELinux does: object labeling).


Object labeling (or labeled security) looks like a simple and  
straightforward way, but it's not.


(1) Object labeling has an assumption that labels are always  
properly defined and maintained. This cannot be easily achieved.


That's a circular argument, and a fairly trivial one at that.  If you  
can't properly manage your labels, then how do you expect any  
security at all?  If you can't manage your labels, then  
pathname-based security won't work either.  This is analogous to saying  
"Pathname-based security has an assumption that path-permissions are  
always properly defined and maintained", which is equally obvious.   
If you can't achieve the first with reasonable security, then you  
probably can't achieve the second either.  Also, if you can't manage  
correct object labeling then I'm very interested in how you are  
maintaining secure Linux systems without standard DAC.


(2) Also, assigning a label is something like inventing and  
assigning a *new* name (label name) to objects which can cause flaws.


I don't understand how assigning new attributes to objects can cause  
flaws, nor what flaws those might be; could you elaborate further?   
In particular, I don't see how this is really all that more  
complicated than defining additional access control in  
apache .htaccess files.  The principle is the same:  by stacking  
multiple independent security-verification mechanisms (Classical UNIX  
DAC and Apache permissions) you can increase security, albeit at an  
increased management cost.  You might also note that .htaccess  
files are yet another form of successful label-based security; the  
security context for a directory depends on the .htaccess label  
file found within.  The *exact* same principles apply to SELinux: you  
add additional attributes backed by a simple and powerful state- 
machine.  The cross-checks are lower-level than those from .htaccess  
files, but the principles are the same.


Cheers,
Kyle Moffett





Re: [AppArmor 01/41] Pass struct vfsmount to the inode_create LSM hook

2007-05-26 Thread Kyle Moffett

On May 26, 2007, at 22:37:02, [EMAIL PROTECTED] wrote:

On Sat, 26 May 2007 22:10:34 EDT, Kyle Moffett said:

On May 26, 2007, at 19:08:56, Toshiharu Harada wrote:



(1) Object labeling has an assumption that labels are always
properly defined and maintained. This cannot be easily achieved.


That's a circular argument, and a fairly trivial one at that.  If you
can't properly manage your labels, then how do you expect any
security at all?


Unfortunately, it's not at all as simple as all that. Toshiharu is  
quite correct that it isn't always easy to actually implement.   
Consider how many ad-croc usages of 'restorecon' are needed to get  
a Fedora SELinux box through rc.sysinit:


While I don't think restorecon is really necessary to properly boot  
SELinux-enabled (I've got a Debian box with some heavily customized  
policy which does so just fine), I am of the opinion that adding a  
"name" parameter to the file/directory create actions would be  
useful.  For example, with such support you could actually specify a  
type-transition rule conditional on a specific name or substring:


named_type_transition sshd_t tmp_t:sock_file prefix "ssh-" ssh_sock_t;

Useful options for matching would be prefix, suffix, substr 
(start,len).  regex would be nice but is sorta computationally  
intensive and would be likely to cause more problems than it solves.
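
To show what I mean by the cheap match modes, here is a rough Python sketch of how a lookup for such a rule might behave.  Everything in it is hypothetical: named_type_transition is only a proposal, the rule tuple layout is invented, and I am assuming substr(start,len) means "compare the pattern against len characters of the name starting at offset start".

```python
def name_matches(name, mode, pattern, start=0, length=0):
    """Cheap, bounded-cost name tests: no regex engine required."""
    if mode == "prefix":
        return name.startswith(pattern)
    if mode == "suffix":
        return name.endswith(pattern)
    if mode == "substr":
        # Assumed semantics: the slice name[start:start+length] equals pattern.
        return name[start:start + length] == pattern
    raise ValueError("unknown match mode: %r" % (mode,))


def pick_new_type(source, parent, obj_class, name, rules):
    """Return the type for a newly created object.

    rules holds (source, parent, class, mode, pattern, new_type) tuples.
    With no matching rule the object inherits the parent directory's type.
    """
    for src, par, cls, mode, pattern, new_type in rules:
        if (src, par, cls) == (source, parent, obj_class) and name_matches(
            name, mode, pattern
        ):
            return new_type
    return parent


rules = [("sshd_t", "tmp_t", "sock_file", "prefix", "ssh-", "ssh_sock_t")]

print(pick_new_type("sshd_t", "tmp_t", "sock_file", "ssh-AbCdEf1234", rules))
print(pick_new_type("sshd_t", "tmp_t", "sock_file", "other.sock", rules))
```

Each test is a single bounded string comparison, which is the reason for preferring these modes over regex matching on the create path.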



/sbin/restorecon  -R /dev 2>/dev/null
[ -n "$SELINUX_STATE" ] && restorecon /dev/mapper /dev/mapper/control >/dev/null 2>&1


These can go away if you modify your policy and pass "-o  
fscontext=system_u:object_r:dev_t" to the mount command for the /dev  
tmpfs, changing both the filesystem and the default directory labels  
from the default "system_u:object_r:tmpfs_t".  This will work as long  
as your policy files contain appropriate transitions from the dev_t  
type.



REBOOTFLAG=`restorecon -v /sbin/init`
restorecon /etc/mtab /etc/ld.so.cache /etc/blkid/blkid.tab /etc/resolv.conf >/dev/null 2>&1
[ -n "$SELINUX_STATE" ] && restorecon /tmp
[ -n "$SELINUX_STATE" ] && restorecon /tmp/.ICE-unix >/dev/null 2>&1


I believe these are to handle rebooting from non-SELinux mode.  There  
are two solutions to this kind of problem:

(1) Failing the boot if the labels are wrong
(2) Fixing the labels (and rebooting if necessary)
It would appear that FC does the latter, although for certain high- 
security systems (such as firewalls handling classified/confidential  
data), the first option is the only acceptable one.



[ -n "$SELINUX_STATE" ] && restorecon /dev/pts >/dev/null 2>&1


I think this can be handled with some combination of appropriate  
SELinux policy and mount options.  At least, I don't need this in the  
boot scripts on my heavily customized Debian SELinux box.



[ -n "$SELINUX_STATE" -a -e "$path" ] && restorecon -R "$path"


I don't know what the point of this generic line is; but I certainly  
don't have anything of the sort on my test system, and it appears to  
work just fine.


And that's just getting the system up to single-user.  Things like  
sendmail and sshd require more restorecon handholding in their  
rc.init files.


Or just look at the creeping horror that is 'restorecond' (in  
particular, consider that the default restorecond.conf contains the  
strings '~/public_html' and  
'~/.mozilla/plugins/libflashplayer.so').  Yee. Frikkin. Hah. ;)


Part of the reason that Fedora has a large quantity of that  
restorecon and restorecond crap is that there is a certain amount of  
broken binary software needing executable stack/heap (such as  
flashplayer), programs without comprehensive or complete policies, or  
programs which by definition need extra support for SELinux.


For example, to really complete the SELinux model you need editors  
and tools which understand security labels the same way they  
understand UNIX permissions.  With a bit of vim scripting I can  
probably make it run external commands to read file labels before it  
reads the file itself and modify /proc/self/attr/fscreate before  
writing out the file, similar to the way it would keep track of the  
standard DAC permissions on files it modifies.


There's also the fact that corporate products have fixed release  
schedules so remaining bugs in each release tend to get papered over  
instead of fixed properly (such as the restorecon in FC init- 
scripts).  I haven't seen many problems with the SELinux model which  
aren't associated with working around buggy software or missing  
features.


Cheers,
Kyle Moffett


Re: [PATCH] AFS: Implement file locking

2007-05-25 Thread Kyle Moffett

On May 25, 2007, at 22:23:42, J. Bruce Fields wrote:

On Thu, May 24, 2007 at 05:55:54PM +0100, David Howells wrote:

+   /* only whole-file locks are supported */
+   if (fl->fl_start != 0 || fl->fl_end != OFFSET_MAX)
+   return -EINVAL;


Do you allow upgrades and downgrades?  (Just curious.)


I was actually under the impression that OpenAFS had support for byte- 
range locking (as well as lock upgrade/downgrade); though IIRC there  
was some secondary protocol.  That's probably why the support is so  
basic at the moment; David's getting the basics in first and the more  
complicated stuff can come later.


Cheers,
Kyle Moffett



Re: [AppArmor 01/41] Pass struct vfsmount to the inode_create LSM hook

2007-05-25 Thread Kyle Moffett

On May 24, 2007, at 14:58:41, Casey Schaufler wrote:
On Fedora zcat, gzip and gunzip are all links to the same file.  I  
can imagine (although it is a bit of a stretch) allowing a set of  
users access to gunzip but not gzip (or the other way around).


That is a COMPLETE straw-man argument.  I can override your check  
with this absolutely trivial perl code:


exec { "/usr/bin/gunzip" } "gzip", "-9", "some/file/to.gz";

Pathname-based checks are pretty fundamentally insecure.  If you want  
to protect a name, then you should tag the name with security  
attributes (IE: AppArmor).  On the other hand, if you actually want  
to protect the _data_, then tagging the _name_ is flawed; tag the  
*DATA* instead.
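
For those who don't read perl, the same trick can be sketched in Python.  This is an illustration only: the helper name is made up, and it assumes a POSIX system with /bin/sh, using the shell merely because it reports its argv[0] back as $0.

```python
import subprocess


def run_with_fake_argv0(real_path, fake_argv0, *args):
    """Execute the file at real_path while handing it fake_argv0 as argv[0]."""
    result = subprocess.run(
        [fake_argv0, *args],   # args[0] becomes argv[0] of the new process
        executable=real_path,  # the file that is actually executed
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


# With "-c" and no extra arguments, the shell's $0 is whatever argv[0] it
# was given, so this shows the name the process believes it was run under.
print(run_with_fake_argv0("/bin/sh", "definitely-not-sh", "-c", "echo $0"))
```

The file that runs and the name it is told it was invoked under are chosen independently by the caller, so a check keyed on the invoked name alone decides nothing about which binary executes.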


Cheers,
Kyle Moffett



Re: [PATCH] LogFS take three

2007-05-17 Thread Kyle Moffett

On May 17, 2007, at 13:45:33, Evgeniy Polyakov wrote:
On Thu, May 17, 2007 at 07:26:07PM +0200, Jan Engelhardt  
([EMAIL PROTECTED]) wrote:
My plan was to move this code to lib/ sooner or later.  If you  
consider it useful in its current state, I can do it immediately.   
And if someone else merged a superior btree library I'd happily  
remove mine and use the new one instead.


Opinions?


Why would we need another btree, when there is lib/rbtree.c?  Or  
does yours do something fundamentally different?


It is not red-black tree, it is b+ tree.


It might be better to use the prefix "bptree" to help prevent  
confusion.  A quick google search on "bp-tree" reveals only the perl  
B+-tree module "Tree::BPTree", a U-Maryland Java CS project on  
B+-trees, and a news article about a "BP tree-top protest".


Cheers,
Kyle Moffett



Re: [RFC/PATCH] revokeat/frevoke system calls V5

2007-02-26 Thread Kyle Moffett

On Feb 26, 2007, at 13:46:21, H. Peter Anvin wrote:

Alan wrote:
I'm not sure.  Turning, for example, the statat(dir_fd, name ==  
NULL) error case into fstat(dir_fd) sounds like a way for apps,  
admittedly buggy ones, to be surprised.  Maybe libc would be  
expected to catch the error before performing the shared system  
call?
At that point would it not be cheaper to have two system calls?  
The table cost isn't very large.


It's not just the table, though, you need two entry points, but  
even that isn't really all that big either, I guess.


Well, I suppose there are multiple possibilities for consolidation:
  frevokeat(fd, "/foo/bar/baz") = normal frevokeat
  frevokeat(-1, "/foo/bar/baz") = revoke("/foo/bar/baz");
  frevokeat(fd, NULL)           = frevoke(fd);

Neither of those would ordinarily be considered to do anything useful  
and for new syscalls I can't see the possibility of breaking existing  
programs.  On the other hand, it's not like we have any problems with  
the syscall tables getting too large.
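
As a sketch of what that single entry point would have to decide (Python, illustration only; the function name, the returned tuples and the choice to reject the (-1, NULL) combination are my assumptions, not anything from the posted patch):

```python
def frevokeat_dispatch(fd, path):
    """Pick which operation a consolidated frevokeat(fd, path) would perform."""
    if path is None:
        if fd == -1:
            # Neither a descriptor nor a path: nothing to revoke.
            raise ValueError("EINVAL: need a valid fd or a path")
        return ("frevoke", fd)        # frevokeat(fd, NULL)
    if fd == -1:
        return ("revoke", path)       # frevokeat(-1, path)
    return ("frevokeat", fd, path)    # the normal fd-relative case


print(frevokeat_dispatch(3, "/foo/bar/baz"))
print(frevokeat_dispatch(-1, "/foo/bar/baz"))
print(frevokeat_dispatch(3, None))
```

The cost is a couple of extra branches in one syscall versus extra entry points, which is why I don't think it matters much either way.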


Cheers,
Kyle Moffett
