Re: [RFC] fsblock

2007-06-27 Thread David Chinner
On Wed, Jun 27, 2007 at 07:32:45AM +0200, Nick Piggin wrote:
 I think using fsblock to drive the IO and keep the pagecache flags
 uptodate and using a btree in the filesystem to manage extents of block
 allocations wouldn't be a bad idea though. Do any filesystems actually
 do this?

Yes. XFS. But we still need to hold state in buffer heads (BH_delay,
BH_unwritten) that is needed to determine what type of
allocation/extent conversion is necessary during writeback. i.e.
what we originally mapped the page as during the -prepare_write
call.

Cheers,

Dave.
-- 
Dave Chinner
Principal Engineer
SGI Australian Software Group
-
To unsubscribe from this list: send the line unsubscribe linux-fsdevel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [AppArmor 00/44] AppArmor security module overview

2007-06-27 Thread John Johansen
On Tue, Jun 26, 2007 at 07:47:00PM -0700, Andrew Morton wrote:
 On Tue, 26 Jun 2007 19:24:03 -0700 John Johansen [EMAIL PROTECTED] wrote:
 
   
   so...  where do we stand with this?  Fundamental, irreconcilable
   differences over the use of pathname-based security?
   
  There certainly seems to be some differences of opinion over the use
  of pathname-based-security.
 
 I was refreshed to have not been cc'ed on a lkml thread for once.  I guess
 it couldn't last.
 
sorry about that

 Do you agree with the irreconcilable part?  I think I do.
 
I will concede that this may be the case for some.  However I am still
hopeful (perhaps naive) that this isn't the case in general.

 I suspect that we're at the stage of having to decide between
 
 a) set aside the technical issues and grudgingly merge this stuff as a
service to Suse and to their users (both of which entities are very
important to us) and leave it all as an object lesson in
how-not-to-develop-kernel-features.
 
Minimisation of the impact on the rest of the kernel is of course
very important here.
Agreed, and I hope any changes that are made are for the benefit
of the kernel in general and will find uses in other parts.

 
 versus
 
 b) leave it out and require that Suse wear the permanent cost and
quality impact of maintaining it out-of-tree.  It will still be an
object lesson in how-not-to-develop-kernel-features.
 
 Sigh.  Please don't put us in this position again.  Get stuff upstream
 before shipping it to customers, OK?  It ain't rocket science.
 
Indeed, I can only appologize for the past, and offer reassurances
that we intend to do our best to do, it right going forward.

   Are there any other sticking points?
   
   
  The conditional passing of the vfsmnt mount in the vfs, as done in this
  patch series, has received a NAK.  This problem results from NFS passing
  a NULL nameidata into the vfs.  We have a second patch series that we
  have posted for discussion that addresses this by splitting the nameidata
  struct.
  Message-Id: [EMAIL PROTECTED]
  Subject: [RFD 0/4] AppArmor - Don't pass NULL nameidata to
  vfs_create/lookup/permission IOPs
  
  other issues that have been raised are:
  - AppArmor does not currently mediate IPC and network communications.
Mediation of these is a wip
  - the use of d_path to generate the pathname used for mediation when a
file is opened.
- Generating the pathname using a reverse walk is considered ugly
- A buffer is alloced to store the generated path name.
- The  buffer size has a configurable upper limit which will cause
  opens to fail if the pathname length exceeds this limit.  This
  is a fail closed behavior.
- there have been some concerns expressed about the performance
  of this approach
We are evaluating our options on how best to address this issue.
 
 OK, useful summary, thanks.  I'd encourage you to proceed apace.

thankyou


pgpqgXwQShDon.pgp
Description: PGP signature


Re: [AppArmor 00/44] AppArmor security module overview

2007-06-27 Thread Kyle Moffett

On Jun 26, 2007, at 22:24:03, John Johansen wrote:

other issues that have been raised are:
- the use of d_path to generate the pathname used for mediation when a
  file is opened.
  - Generating the pathname using a reverse walk is considered ugly


A little more than ugly.  In this basic concurrent rename() and  
path-lookup load:


mkdir -p /a/b/0
mkdir -p /a/b/2
mkdir -p /c
touch /a/b/0/1

cd /a/b
while true; mv 0/1 2/3; mv 2/3 0/1; done 
cd /
while true; do mv a/b c/d; mv c/d a/b; done 
while true; do cat a/b/0/1  done
while true; do cat a/b/2/3  done
while true; do cat c/d/0/1  done
while true; do cat c/d/2/3  done

I seem to recall you could actually end up racing and building a path  
to the file in those directories as a/d/0/3 or some other path at  
which it never even remotely existed.  I'd love to be wrong, but I  
can't help but see this problem in any reverse-pathname-generation  
proposal which gets the locking right.


Cheers,
Kyle Moffett

-
To unsubscribe from this list: send the line unsubscribe linux-fsdevel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC] fsblock

2007-06-27 Thread Chris Mason
On Wed, Jun 27, 2007 at 07:32:45AM +0200, Nick Piggin wrote:
 On Tue, Jun 26, 2007 at 08:34:49AM -0400, Chris Mason wrote:
  On Tue, Jun 26, 2007 at 07:23:09PM +1000, David Chinner wrote:
   On Tue, Jun 26, 2007 at 01:55:11PM +1000, Nick Piggin wrote:
  
  [ ... fsblocks vs extent range mapping ]
  
   iomaps can double as range locks simply because iomaps are
   expressions of ranges within the file.  Seeing as you can only
   access a given range exclusively to modify it, inserting an empty
   mapping into the tree as a range lock gives an effective method of
   allowing safe parallel reads, writes and allocation into the file.
   
   The fsblocks and the vm page cache interface cannot be used to
   facilitate this because a radix tree is the wrong type of tree to
   store this information in. A sparse, range based tree (e.g. btree)
   is the right way to do this and it matches very well with
   a range based API.
  
  I'm really not against the extent based page cache idea, but I kind of
  assumed it would be too big a change for this kind of generic setup.  At
  any rate, if we'd like to do it, it may be best to ditch the idea of
  attach mapping information to a page, and switch to lookup mapping
  information and range locking for a page.
 
 Well the get_block equivalent API is extent based one now, and I'll
 look at what is required in making map_fsblock a more generic call
 that could be used for an extent-based scheme.
 
 An extent based thing IMO really isn't appropriate as the main generic
 layer here though. If it is really useful and popular, then it could
 be turned into generic code and sit along side fsblock or underneath
 fsblock...

Lets look at a typical example of how IO actually gets done today,
starting with sys_write():

sys_write(file, buffer, 1MB)
for each page:
prepare_write()
allocate contiguous chunks of disk
attach buffers
copy_from_user()
commit_write()
dirty buffers

pdflush:
writepages()
find pages with contiguous chunks of disk
build and submit large bios

So, we replace prepare_write and commit_write with an extent based api,
but we keep the dirty each buffer part.  writepages has to turn that
back into extents (bio sized), and the result is completely full of dark
dark corner cases.

I do think fsblocks is a nice cleanup on its own, but Dave has a good
point that it makes sense to look for ways generalize things even more.

-chris
-
To unsubscribe from this list: send the line unsubscribe linux-fsdevel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC] fsblock

2007-06-27 Thread Kyle Moffett

On Jun 26, 2007, at 07:14:14, Nick Piggin wrote:

On Tue, Jun 26, 2007 at 07:23:09PM +1000, David Chinner wrote:
Can we call it a block mapping layer or something like that? e.g.  
struct blkmap?


I'm not fixed on fsblock, but blkmap doesn't grab me either. It is  
a map from the pagecache to the block layer, but blkmap sounds like  
it is a map from the block to somewhere.


fsblkmap ;)


vmblock? pgblock?

Cheers,
Kyle Moffett

-
To unsubscribe from this list: send the line unsubscribe linux-fsdevel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 4/7][TAKE5] support new modes in fallocate

2007-06-27 Thread David Chinner
On Tue, Jun 26, 2007 at 11:49:15PM -0400, Andreas Dilger wrote:
 On Jun 27, 2007  09:14 +1000, David Chinner wrote:
  Someone on the XFs list had an interesting request - preallocated
  swap files. You can't use unwritten extents for this because
  of sys_swapon()s use of bmap() (XFS returns holes for reading
  unwritten extents), so we need a method of preallocating that does
  not zero or mark the extent unread. i.e. FA_MKSWAP.
 
 Is there a reason why unwritten extents return 0 to bmap()?

It's a fallout of xfs_get_blocks not mapping unwritten extents
on read because we want do_mpage_readpage() to treat them
as a hole. i.e. zero fill them instead of doing I/O. This is
the way XFS was shoehorned into the generic read path :/

 This
 would seem to be the only impediment from using fallocated files
 for swap files.  Maybe if FIEMAP was used by mkswap to get an
 UNWRITTEN flag back instead of HOLE it wouldn't be a problem.

Probably. If we taught do_mpage_readpage() about unwritten mappings,
then would could map them on read if and then sys_swapon can remain
blissfully unaware of unwritten extents.

I think this is pretty much all I need to do to acheive that is
(untested):

---

Teach do_mpage_readpage() about unwritten extents so we can
always map them in get_blocks rather than they are are holes on
read. Allows setup_swap_extents() to use preallocated files on XFS
filesystems for swap files without ever needing to convert them.

Signed-Off-By: Dave Chinner [EMAIL PROTECTED]

---
 fs/mpage.c  |5 +++--
 fs/xfs/linux-2.6/xfs_aops.c |   13 +++--
 2 files changed, 6 insertions(+), 12 deletions(-)

Index: 2.6.x-xfs-new/fs/mpage.c
===
--- 2.6.x-xfs-new.orig/fs/mpage.c   2007-05-29 16:17:59.0 +1000
+++ 2.6.x-xfs-new/fs/mpage.c2007-06-27 22:39:35.568852270 +1000
@@ -207,7 +207,8 @@ do_mpage_readpage(struct bio *bio, struc
 * Map blocks using the result from the previous get_blocks call first.
 */
nblocks = map_bh-b_size  blkbits;
-   if (buffer_mapped(map_bh)  block_in_file  *first_logical_block 
+   if (buffer_mapped(map_bh)  !buffer_unwritten(map_bh) 
+   block_in_file  *first_logical_block 
block_in_file  (*first_logical_block + nblocks)) {
unsigned map_offset = block_in_file - *first_logical_block;
unsigned last = nblocks - map_offset;
@@ -242,7 +243,7 @@ do_mpage_readpage(struct bio *bio, struc
*first_logical_block = block_in_file;
}
 
-   if (!buffer_mapped(map_bh)) {
+   if (!buffer_mapped(map_bh) || buffer_unwritten(map_bh)) {
fully_mapped = 0;
if (first_hole == blocks_per_page)
first_hole = page_block;
Index: 2.6.x-xfs-new/fs/xfs/linux-2.6/xfs_aops.c
===
--- 2.6.x-xfs-new.orig/fs/xfs/linux-2.6/xfs_aops.c  2007-06-05 
22:14:39.0 +1000
+++ 2.6.x-xfs-new/fs/xfs/linux-2.6/xfs_aops.c   2007-06-27 22:39:29.545636749 
+1000
@@ -1340,16 +1340,9 @@ __xfs_get_blocks(
return 0;
 
if (iomap.iomap_bn != IOMAP_DADDR_NULL) {
-   /*
-* For unwritten extents do not report a disk address on
-* the read case (treat as if we're reading into a hole).
-*/
-   if (create || !(iomap.iomap_flags  IOMAP_UNWRITTEN)) {
-   xfs_map_buffer(bh_result, iomap, offset,
-  inode-i_blkbits);
-   }
-   if (create  (iomap.iomap_flags  IOMAP_UNWRITTEN)) {
-   if (direct)
+   xfs_map_buffer(bh_result, iomap, offset, inode-i_blkbits);
+   if (iomap.iomap_flags  IOMAP_UNWRITTEN) {
+   if (create  direct)
bh_result-b_private = inode;
set_buffer_unwritten(bh_result);
}


Cheers,

Dave.
-- 
Dave Chinner
Principal Engineer
SGI Australian Software Group
-
To unsubscribe from this list: send the line unsubscribe linux-fsdevel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [AppArmor 00/44] AppArmor security module overview

2007-06-27 Thread Andreas Gruenbacher
On Wednesday 27 June 2007 12:58, Kyle Moffett wrote:
 I seem to recall you could actually end up racing and building a path
 to the file in those directories as a/d/0/3 or some other path at
 which it never even remotely existed. I'd love to be wrong,

Cheer up, you recall wrong.

 but I can't help but see this problem in any reverse-pathname-generation
 proposal which gets the locking right.

Have a look at how __d_path() is implemented (with the fixes): It takes the 
dcache_lock, and the vfsmount_lock where necessary, and this ensures that the 
pathname can't change under it, neither because of a rename nor unlink nor 
remount. The pathname computed is *exactly* the name the file has at that 
specific point time.

A few more details about how pathnames work are explained in the tech doc at:
http://forge.novell.com/modules/xfcontent/downloads.php/apparmor/LKML_Submission-May_07

Andreas
-
To unsubscribe from this list: send the line unsubscribe linux-fsdevel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [AppArmor 00/44] AppArmor security module overview

2007-06-27 Thread Adrian Bunk
On Tue, Jun 26, 2007 at 07:47:00PM -0700, Andrew Morton wrote:
 On Tue, 26 Jun 2007 19:24:03 -0700 John Johansen [EMAIL PROTECTED] wrote:
 
   
   so...  where do we stand with this?  Fundamental, irreconcilable
   differences over the use of pathname-based security?
   
  There certainly seems to be some differences of opinion over the use
  of pathname-based-security.
 
 I was refreshed to have not been cc'ed on a lkml thread for once.  I guess
 it couldn't last.
 
 Do you agree with the irreconcilable part?  I think I do.
 
 I suspect that we're at the stage of having to decide between
 
 a) set aside the technical issues and grudgingly merge this stuff as a
service to Suse and to their users (both of which entities are very
important to us) and leave it all as an object lesson in
how-not-to-develop-kernel-features.
 
Minimisation of the impact on the rest of the kernel is of course
very important here.
 
 versus
 
 b) leave it out and require that Suse wear the permanent cost and
quality impact of maintaining it out-of-tree.  It will still be an
object lesson in how-not-to-develop-kernel-features.
...

versus

c) if [1] AppArmor is considered to be something that wouldn't 
   be merged if it wasn't already widely deployed by Suse: leave it out, 
   work on an ideal solution [2], and let Suse wear the one-time cost
   of migrating their users to the ideal solution

One important point is that if AppArmor gets merged there will be much 
more distribution support for it, and many people on !Suse will start 
using it.

I'm not claiming to understand the technical details, but from both 
slightly reading over the previous discussions and the What are the 
advantages of AppArmor over SELinux? section in the AppArmor FAQ [3] my 
impression is that a main advantage of AppArmor are more user friendly 
userspace tools. Therefore, if [1] AppArmor is considered technically 
inferior to SELinux, it might still become more popular than SELinux 
simply because it's easier to use - and although it's technically 
inferior.

cu
Adrian

[1] note the if
[2] could be, but not necessarily, SELinux
[3] http://developer.novell.com/wiki/index.php/Apparmor_FAQ

-- 

   Is there not promise of rain? Ling Tan asked suddenly out
of the darkness. There had been need of rain for many days.
   Only a promise, Lao Er said.
   Pearl S. Buck - Dragon Seed

-
To unsubscribe from this list: send the line unsubscribe linux-fsdevel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC] fsblock

2007-06-27 Thread Anton Altaparmakov

On 27 Jun 2007, at 12:50, Chris Mason wrote:

On Wed, Jun 27, 2007 at 07:32:45AM +0200, Nick Piggin wrote:

On Tue, Jun 26, 2007 at 08:34:49AM -0400, Chris Mason wrote:

On Tue, Jun 26, 2007 at 07:23:09PM +1000, David Chinner wrote:

On Tue, Jun 26, 2007 at 01:55:11PM +1000, Nick Piggin wrote:


[ ... fsblocks vs extent range mapping ]


iomaps can double as range locks simply because iomaps are
expressions of ranges within the file.  Seeing as you can only
access a given range exclusively to modify it, inserting an empty
mapping into the tree as a range lock gives an effective method of
allowing safe parallel reads, writes and allocation into the file.

The fsblocks and the vm page cache interface cannot be used to
facilitate this because a radix tree is the wrong type of tree to
store this information in. A sparse, range based tree (e.g. btree)
is the right way to do this and it matches very well with
a range based API.


I'm really not against the extent based page cache idea, but I  
kind of
assumed it would be too big a change for this kind of generic  
setup.  At

any rate, if we'd like to do it, it may be best to ditch the idea of
attach mapping information to a page, and switch to lookup  
mapping

information and range locking for a page.


Well the get_block equivalent API is extent based one now, and I'll
look at what is required in making map_fsblock a more generic call
that could be used for an extent-based scheme.

An extent based thing IMO really isn't appropriate as the main  
generic

layer here though. If it is really useful and popular, then it could
be turned into generic code and sit along side fsblock or underneath
fsblock...


Lets look at a typical example of how IO actually gets done today,
starting with sys_write():


Yes, this is very inefficient which is one of the reasons I don't use  
the generic file write helpers in NTFS.  The other reasons are that  
supporting larger logical block sizes than PAGE_CACHE_SIZE becomes a  
pain if it is not done this way when the write targets a hole as that  
requires all pages in the hole to be locked simultaneously which  
would mean dropping the page lock to acquire the others that are of  
lower page index and to then re-take the page lock which is horrible  
- much better to lock all at once from the outset and the other  
reason is that in NTFS there is such a thing as the initialized size  
of an attribute which basically states anything past this byte  
offset must be returned as 0 on read, i.e. it does not have to be  
read from disk at all, and on write beyond the initialized_size you  
have to zero on disk everything between the old initialized size and  
the start of the write before you begin writing and certainly before  
you update the initalized_size otherwise a concurrent read would see  
random old data from the disk.


For NTFS this effectively becomes:


sys_write(file, buffer, 1MB)


allocate space for the entire 1MB write

if write offset past the initialized_size zero out on disk starting  
at initialized_size up to the start offset for the write and update  
the initialized size to be equal to the start offset of the write


do {
	if (current position is in a hole and the NTFS logical block size is  
 PAGE_CACHE_SIZE) {

work on (NTFS logical block size / PAGE_CACHE_SIZE) pages in 
one go;
do_pages = vol-cluster_size / PAGE_CACHE_SIZE;
} else {
work on only one page;
do_pages = 1;
}
	fault in for read (do_pages*PAGE_CACHE_SIZE) bytes worth of source  
pages

grab do_pages worth of pages
prepare_write - attach buffers to grabbed pages
copy data from source to grabbedprepared pages
commit_write the copied pages by dirtying their buffers
} while (data left to write);

The allocation in advance is a huge win both in terms of avoiding  
fragmentation (NTFS still uses a very simple/stupid allocator so you  
get a lot of fragmentation if two processes write to different files  
simultaneously and do so in small chunks) and in terms of performance.


I have wondered whether I should perhaps turn on the multi page  
stuff on for all writes rather than just for ones that go into a hole  
and the logical size is greater than the PAGE_CACHE_SIZE as that  
might improve performance even further but I haven't had the time/ 
inclination to experiment...


And I have also wondered whether to go direct to bio/wholes pages at  
once instead of bothering with dirtying each buffer but the buffers  
(which are always 512 bytes on NTFS) allow me to easily support  
dirtying smaller parts of the page which is desired at least on  
volumes with a logical block size  PAGE_CACHE_SIZE as different bits  
of the page could then reside on completely different locations on  
disk so writing out unneeded bits of the page could result in a lot  
of wasted disk head seek times.


Best regards,

Anton


for each page:

Re: [RFD 0/4] AppArmor - Don't pass NULL nameidata to vfs_create/lookup/permission IOPs

2007-06-27 Thread Andreas Gruenbacher
On Wednesday 27 June 2007 01:46, Trond Myklebust wrote:
 On Tue, 2007-06-26 at 16:15 -0700, [EMAIL PROTECTED] wrote:
  To remove conditionally passing of vfsmounts to the LSM, a nameidata
  struct can be instantiated in the nfsd and mqueue filesystems.  This
  however results in useless information being passed down, as not
  all fields in the nameidata struct will be meaingful.  The nameidata
  struct is split creating struct nameidata2 that contains only the
  fields
  that will carry meaningful information.
 
 I don't object to the concept per se, but could you please give it a
 more descriptive name please? struct vfs_intent would be a lot more
 accurate than nameidata2.

Yes, the name is pretty arbitrary. vfs_intent is better, even though the 
struct doesn't only include the intent data.

Thanks,
Andreas
-
To unsubscribe from this list: send the line unsubscribe linux-fsdevel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [AppArmor 00/44] AppArmor security module overview

2007-06-27 Thread Crispin Cowan
Adrian Bunk wrote:
 On Tue, Jun 26, 2007 at 07:47:00PM -0700, Andrew Morton wrote:
   
 Do you agree with the irreconcilable part?  I think I do. 
I am hoping for a reconciliation where the people who don't like
AppArmor live with it by not using it. AppArmor is not intended to
replace SELinux, it is intended to address a different set of goals.

 I suspect that we're at the stage of having to decide between

 a) set aside the technical issues and grudgingly merge this stuff as a
service to Suse and to their users (both of which entities are very
important to us) and leave it all as an object lesson in
how-not-to-develop-kernel-features.

Minimisation of the impact on the rest of the kernel is of course
very important here.

 versus

 b) leave it out and require that Suse wear the permanent cost and
quality impact of maintaining it out-of-tree.  It will still be an
object lesson in how-not-to-develop-kernel-features.
 ...
 versus

 c) if [1] AppArmor is considered to be something that wouldn't 
be merged if it wasn't already widely deployed by Suse: leave it out, 
work on an ideal solution [2], and let Suse wear the one-time cost
of migrating their users to the ideal solution
   
We argue that the proposed patch is a viable solution for providing
AppArmor functionality. We would be happy for specific suggestions on
how to make it better.

 I'm not claiming to understand the technical details, but from both 
 slightly reading over the previous discussions and the What are the 
 advantages of AppArmor over SELinux? section in the AppArmor FAQ [3] my 
 impression is that a main advantage of AppArmor are more user friendly 
 userspace tools. Therefore, if [1] AppArmor is considered technically 
 inferior to SELinux, it might still become more popular than SELinux 
 simply because it's easier to use - and although it's technically 
 inferior.  
AppArmor's advantages come from the model, not the tools. AppArmor is
not inferior to SELinux, it is different than SELinux. Neither can
replace the other without horrid kludges.

Crispin

-- 
Crispin Cowan, Ph.D.   http://crispincowan.com/~crispin/
Director of Software Engineering   http://novell.com
AppArmor Chat: irc.oftc.net/#apparmor

-
To unsubscribe from this list: send the line unsubscribe linux-fsdevel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [AppArmor 00/44] AppArmor security module overview

2007-06-27 Thread Sean
On Wed, 27 Jun 2007 14:06:04 -0700
Crispin Cowan [EMAIL PROTECTED] wrote:

 I am hoping for a reconciliation where the people who don't like
 AppArmor live with it by not using it. AppArmor is not intended to
 replace SELinux, it is intended to address a different set of goals.

You keep saying that.   But for that to be true you'd have to believe
_everyone_ using Novell distributions has needs that align exactly
with AppArmor.  Otherwise, how to explain that you don't offer and
support both SELinux and AppArmor to your users?

It seems as far as Novell is concerned, AppArmor _is_ meant to replace
SELinux.  Not that there is really anything wrong with that, but it's
a little disingenuous to then argue that they're meant to coexist.

Sean
-
To unsubscribe from this list: send the line unsubscribe linux-fsdevel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [AppArmor 00/44] AppArmor security module overview

2007-06-27 Thread Crispin Cowan
Sean wrote:
 On Wed, 27 Jun 2007 14:06:04 -0700
 Crispin Cowan [EMAIL PROTECTED] wrote:
   
 I am hoping for a reconciliation where the people who don't like
 AppArmor live with it by not using it. AppArmor is not intended to
 replace SELinux, it is intended to address a different set of goals.
 
 You keep saying that.   But for that to be true you'd have to believe
 _everyone_ using Novell distributions has needs that align exactly
 with AppArmor.  Otherwise, how to explain that you don't offer and
 support both SELinux and AppArmor to your users?
   
They are meant to co-exist in the Linux kernel source tree.

It is a fact that there exist use cases where AppArmor is incapable of
meeting the need and SELinux is just the right thing. It is Novell's
business judgment that there are not enough of those situations in our
customer base to be worth the additional expense at this time.

But we do not want to prevent other people from using SELinux if it
suits them. Linux is about choice, and that is especially vital in
security. As Linus himself observed when LSM was started, there are a
lot of security models, they have various strengths and weaknesses, and
often are not compatible with each other. That is why it is important
that LSM persist, that SELinux not be the only in-tree user of LSM, and
why we think AppArmor should be included upstream, so that non-SUSE
users can also use AppArmor if it suits them.

Crispin

-- 
Crispin Cowan, Ph.D.   http://crispincowan.com/~crispin/
Director of Software Engineering   http://novell.com
AppArmor Chat: irc.oftc.net/#apparmor

-
To unsubscribe from this list: send the line unsubscribe linux-fsdevel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [AppArmor 00/44] AppArmor security module overview

2007-06-27 Thread David Miller
From: Crispin Cowan [EMAIL PROTECTED]
Date: Wed, 27 Jun 2007 15:46:57 -0700

 But we do not want to prevent other people from using SELinux if it
 suits them. Linux is about choice, and that is especially vital in
 security. As Linus himself observed when LSM was started, there are a
 lot of security models, they have various strengths and weaknesses, and
 often are not compatible with each other. That is why it is important
 that LSM persist, that SELinux not be the only in-tree user of LSM, and
 why we think AppArmor should be included upstream, so that non-SUSE
 users can also use AppArmor if it suits them.

Anyone can apply the apparmour patch to their tree, they get the
choice that way.  Nobody is currently prevented from using apparmour
if they want to, any such suggestion is pure rubbish.

It is even more incredulious to imply that just by having apparmour
in the upstream kernel all the userland bits will magically appear
on every user's distribution.

Give me a break.

What you get by the code going into the upstream kernel tree is that
it a) adds some pseudo legitimacy to AppArmour (which I don't
personally think is warranted) and b) gets the work of keeping
apparmour working with upstream largely off of your back and in the
hands of the upstream community.

Neither of those are reasons why something should go into the tree.

Frankly I think AppArmour is a joke, and all of this integration with
LSM business is just a face saving effort, nothing more.  And saving
face is not, and has never been, a reason for something to be put into
the upstream tree.

-
To unsubscribe from this list: send the line unsubscribe linux-fsdevel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 4/7][TAKE5] support new modes in fallocate

2007-06-27 Thread Nathan Scott
On Wed, 2007-06-27 at 23:36 +1000, David Chinner wrote:
  Allows setup_swap_extents() to use preallocated files on XFS
 filesystems for swap files without ever needing to convert them.

Using unwritten extents (as opposed to the MKSWAP flag mentioned
earlier) has the unfortunate down side of requiring transactions,
possibly additional IO, and memory allocation during swap.  (but,
this patch should probably go in regardless, as teaching generic
code about unwritten extents is not a bad idea).

cheers.

--
Nathan

-
To unsubscribe from this list: send the line unsubscribe linux-fsdevel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [AppArmor 00/44] AppArmor security module overview

2007-06-27 Thread Casey Schaufler

--- David Miller [EMAIL PROTECTED] wrote:

 From: Crispin Cowan [EMAIL PROTECTED]
 Date: Wed, 27 Jun 2007 15:46:57 -0700
 
  But we do not want to prevent other people from using SELinux if it
  suits them. Linux is about choice, and that is especially vital in
  security. As Linus himself observed when LSM was started, there are a
  lot of security models, they have various strengths and weaknesses, and
  often are not compatible with each other. That is why it is important
  that LSM persist, that SELinux not be the only in-tree user of LSM, and
  why we think AppArmor should be included upstream, so that non-SUSE
  users can also use AppArmor if it suits them.
 
 Anyone can apply the apparmour patch to their tree, they get the
 choice that way.  Nobody is currently prevented from using apparmour
 if they want to, any such suggestion is pure rubbish.

The exact same argument was made prior to SELinux going upstream.
Look, if you can't be right, try at least to be original.

 It is even more incredulious to imply that just by having apparmour
 in the upstream kernel all the userland bits will magically appear
 on every user's distribution.

Just like all the SELinux userland magically appeared in everyone's
distribution? Nope, didn't happen.

 Give me a break.

No. You are out of line and spewing ignorance.

 What you get by the code going into the upstream kernel tree is that
 it a) adds some pseudo legitimacy to AppArmour (which I don't
 personally think is warranted) and b) gets the work of keeping
 apparmour working with upstream largely off of your back and in the
 hands of the upstream community.

Duh. Those are pretty much the reasons anyone goes through the
trouble of getting anything upstream.

 Neither of those are reasons why something should go into the tree.

They reflect the corporate reality of the open source community.
If you're going to go down the open source isn't for money
rathole please take it elsewhere. I've heard the arguments so many
times I can sing them to the tune of Lady Madonna.

 Frankly I think AppArmour is a joke,

SELinux, AppArmor, and Hilary Clinton walk into a bar ...

 and all of this integration with
 LSM business is just a face saving effort, nothing more.  And saving
 face is not, and has never been, a reason for something to be put into
 the upstream tree.

Believe what you will. Crispin has been working with LSM from the
inception those many years ago. He's been working on getting this
module in for over a year. If you don't like his module go write
your own and put him out of business.


Casey Schaufler
[EMAIL PROTECTED]
-
To unsubscribe from this list: send the line unsubscribe linux-fsdevel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [AppArmor 00/44] AppArmor security module overview

2007-06-27 Thread David Miller
From: Casey Schaufler [EMAIL PROTECTED]
Date: Wed, 27 Jun 2007 17:27:17 -0700 (PDT)

 --- David Miller [EMAIL PROTECTED] wrote:
 
  Neither of those are reasons why something should go into the tree.
 
 They reflect the corporate reality of the open source community.
 If you're going to go down the open source isn't for money
 rathole please take it elsewhere. I've heard the arguments so many
 times I can sing them to the tune of Lady Madonna.

That's a wonderful technical reason for apparmour to go into the tree.
Thanks for sharing and making it even clearer what is really behind
this submission.
-
To unsubscribe from this list: send the line unsubscribe linux-fsdevel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 4/7][TAKE5] support new modes in fallocate

2007-06-27 Thread David Chinner
On Thu, Jun 28, 2007 at 09:28:36AM +1000, Nathan Scott wrote:
 On Wed, 2007-06-27 at 23:36 +1000, David Chinner wrote:
   Allows setup_swap_extents() to use preallocated files on XFS
  filesystems for swap files without ever needing to convert them.
 
 Using unwritten extents (as opposed to the MKSWAP flag mentioned
 earlier) has the unfortunate down side of requiring transactions,
 possibly additional IO, and memory allocation during swap.  (but,
 this patch should probably go in regardless, as teaching generic
 code about unwritten extents is not a bad idea).

I don't think it does - swapfile I/O looks like it goes direct to
bio without passing through the filesystem.  When the swapfile is
mapped, it scans and records the extent map of the entire swapfile
in a separate structure and AFAICT the swap code uses that built map
without touching the filesystem at all.

If that is true then the written/unwritten state of the extents is
irrelevant; all we need is allocated disk space for the file and
swapping should work. And it's not like anyone should be reading
the contents of that swapfile through the filesystem, either. ;)

Cheers,

Dave.
-- 
Dave Chinner
Principal Engineer
SGI Australian Software Group
-
To unsubscribe from this list: send the line unsubscribe linux-fsdevel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 4/7][TAKE5] support new modes in fallocate

2007-06-27 Thread Nathan Scott
On Thu, 2007-06-28 at 10:39 +1000, David Chinner wrote:
 
 
 I don't think it does - swapfile I/O looks like it goes direct to
 bio without passing through the filesystem.  When the swapfile is
 mapped, it scans and records the extent map of the entire swapfile
 in a separate structure and AFAICT the swap code uses that built map
 without touching the filesystem at all.
 
 If that is true then the written/unwritten state of the extents is
 irrelevant; all we need is allocated disk space for the file and
 swapping should work. And it's not like anyone should be reading
 the contents of that swapfile through the filesystem, either. ;)

Ah, yes, good point - thats true.  Unwritten extents are ideal for
this then, as attempts to read swap via the regular interfaces will
return zeros instead of random swapped out memory contents.

cheers.

-
To unsubscribe from this list: send the line unsubscribe linux-fsdevel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC:PATCH] How best to handle implicit clearing of setuid/setgid bits on NFS?

2007-06-27 Thread Jeff Layton
On Wed, 27 Jun 2007 18:15:55 -0400
Trond Myklebust [EMAIL PROTECTED] wrote:

 On Tue, 2007-05-29 at 12:47 -0400, Jeff Layton wrote:
  I've been looking at issue of clearing setuid/setgid bits when a file
  is written to on NFS. Here's the problem in a nutshell:
  
  We have 2 users. test1 and test2. Both are members of the group
  testgrp:
  
  [EMAIL PROTECTED] ls -l f1
  -rwxrwsr-x 1 test1 testgrp 2 2007-05-29 12:23 f1
  [EMAIL PROTECTED] echo foo  f1
  -bash: f1: Permission denied
  
  ...and f1 is unchanged. The problem is that the VFS calls remove_suid
  to wipe the setgid bit. This ends up causing a SETATTR call, which
  fails on NFS because we're attempting to remove these bits as user
  test2.
  
  Until recently, the situation here was worse. The VFS would truncate
  the file first and then try to clear the setgid bit. The truncate would
  succeed, but the perm change would fail. You'd end up with a zero-length
  file. This was fixed my making the size change and bit-clearing go via
  the same setattr call, so the whole operation just errors out now.
  
  My question is -- Is there anything we can do to make this work as it
  does on a local filesystem? Ideally there would be some way to tell the
  server clear the setuid/gid bits, without actually modifying the
  contents of the file. Is there a NFS call we can use that would do this?
  
  The only thing I can think of is to read the first byte of the file and
  then overwrite it with the same data, but that seems racy and may have
  other problems (and what do you do with a zero-length, setuid file?).
  
  Any suggestions appreciated...
 
 The answer should be simple: leave it to the server. All servers I know
 (except possibly Hummingbird) will clear the setuid/setgid bit whenever
 the file is modified. Anything else would be _extremely_ racy anyway:
 Consider the scenario where you are preparing to write to the file, when
 suddenly a user on another client makes the file setuid after you have
 open()ed the file, but before you actually issue the write().
 
 IOW: calling remove_suid() on the client is completely redundant, and
 should be suppressed.
 
 Trond
 

Ok. This is a bit more complex now since we remove suid bits on
truncate, but don't set ATTR_FORCE.

Here's a patch that should do this. I know there's a general
aversion to adding new flags to vfs structures, but I couldn't think of
a way to cleanly do this without adding one.

Note that I've not tested this patch at all so this is just a RFC.

CC'ing Al here since he's expressed interest in this problem as well.

Thoughts?

Signed-off-by: Jeff Layton [EMAIL PROTECTED]

diff --git a/fs/nfs/super.c b/fs/nfs/super.c
index ca20d3c..afdd82e 100644
--- a/fs/nfs/super.c
+++ b/fs/nfs/super.c
@@ -71,7 +71,7 @@ static struct file_system_type nfs_fs_type = {
.name   = nfs,
.get_sb = nfs_get_sb,
.kill_sb= nfs_kill_super,
-   .fs_flags   = 
FS_RENAME_DOES_D_MOVE|FS_REVAL_DOT|FS_BINARY_MOUNTDATA,
+   .fs_flags   = 
FS_RENAME_DOES_D_MOVE|FS_REVAL_DOT|FS_BINARY_MOUNTDATA|FS_DONT_REMOVE_SUID,
 };
 
 struct file_system_type nfs_xdev_fs_type = {
@@ -79,7 +79,7 @@ struct file_system_type nfs_xdev_fs_type = {
.name   = nfs,
.get_sb = nfs_xdev_get_sb,
.kill_sb= nfs_kill_super,
-   .fs_flags   = 
FS_RENAME_DOES_D_MOVE|FS_REVAL_DOT|FS_BINARY_MOUNTDATA,
+   .fs_flags   = 
FS_RENAME_DOES_D_MOVE|FS_REVAL_DOT|FS_BINARY_MOUNTDATA|FS_DONT_REMOVE_SUID,
 };
 
 static const struct super_operations nfs_sops = {
@@ -107,7 +107,7 @@ static struct file_system_type nfs4_fs_type = {
.name   = nfs4,
.get_sb = nfs4_get_sb,
.kill_sb= nfs4_kill_super,
-   .fs_flags   = 
FS_RENAME_DOES_D_MOVE|FS_REVAL_DOT|FS_BINARY_MOUNTDATA,
+   .fs_flags   = 
FS_RENAME_DOES_D_MOVE|FS_REVAL_DOT|FS_BINARY_MOUNTDATA|FS_DONT_REMOVE_SUID,
 };
 
 struct file_system_type nfs4_xdev_fs_type = {
@@ -115,7 +115,7 @@ struct file_system_type nfs4_xdev_fs_type = {
.name   = nfs4,
.get_sb = nfs4_xdev_get_sb,
.kill_sb= nfs4_kill_super,
-   .fs_flags   = 
FS_RENAME_DOES_D_MOVE|FS_REVAL_DOT|FS_BINARY_MOUNTDATA,
+   .fs_flags   = 
FS_RENAME_DOES_D_MOVE|FS_REVAL_DOT|FS_BINARY_MOUNTDATA|FS_DONT_REMOVE_SUID,
 };
 
 struct file_system_type nfs4_referral_fs_type = {
@@ -123,7 +123,7 @@ struct file_system_type nfs4_referral_fs_type = {
.name   = nfs4,
.get_sb = nfs4_referral_get_sb,
.kill_sb= nfs4_kill_super,
-   .fs_flags   = 
FS_RENAME_DOES_D_MOVE|FS_REVAL_DOT|FS_BINARY_MOUNTDATA,
+   .fs_flags   = 
FS_RENAME_DOES_D_MOVE|FS_REVAL_DOT|FS_BINARY_MOUNTDATA|FS_DONT_REMOVE_SUID,
 };
 
 static const struct super_operations nfs4_sops = {
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 6a41f4c..e6e18dd 100644
--- a/include/linux/fs.h
+++ 

Re: [RFC] fsblock

2007-06-27 Thread Nick Piggin
On Thu, Jun 28, 2007 at 08:35:48AM +1000, David Chinner wrote:
 On Wed, Jun 27, 2007 at 07:50:56AM -0400, Chris Mason wrote:
  Lets look at a typical example of how IO actually gets done today,
  starting with sys_write():
  
  sys_write(file, buffer, 1MB)
  for each page:
  prepare_write()
  allocate contiguous chunks of disk
  attach buffers
  copy_from_user()
  commit_write()
  dirty buffers
  
  pdflush:
  writepages()
  find pages with contiguous chunks of disk
  build and submit large bios
  
  So, we replace prepare_write and commit_write with an extent based api,
  but we keep the dirty each buffer part.  writepages has to turn that
  back into extents (bio sized), and the result is completely full of dark
  dark corner cases.

That's true but I don't think an extent data structure means we can
become too far divorced from the pagecache or the native block size
-- what will end up happening is that often we'll need stuff to map
between all those as well, even if it is only at IO-time.

But the point is taken, and I do believe that at least for APIs, extent
based seems like the best way to go. And that should allow fsblock to
be replaced or augmented in future without _too_ much pain.

 
 Yup - I've been on the painful end of those dark corner cases several
 times in the last few months.
 
 It's also worth pointing out that mpage_readpages() already works on
 an extent basis - it overloads bufferheads to provide a map_bh that
 can point to a range of blocks in the same state. The code then iterates
 the map_bh range a page at a time building bios (i.e. not even using
 buffer heads) from that map..

One issue I have with the current nobh and mpage stuff is that it
requires multiple calls into get_block (first to prepare write, then
to writepage), it doesn't allow filesystems to attach resources
required for writeout at prepare_write time, and it doesn't play nicely
with buffers in general. (not to mention that nobh error handling is
buggy).

I haven't done any mpage-like code for fsblocks yet, but I think they
wouldn't be too much trouble, and wouldn't have any of the above
problems...
-
To unsubscribe from this list: send the line unsubscribe linux-fsdevel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


DIO panic on 2.6.21.5

2007-06-27 Thread Badari Pulavarty

Hi Zach,

One of our perf. team ran into this while doing some runs.
I didn't see anything obvious - it looks like we converted
async IO to synchronous one. I didn't spend much time digging
around.

Is this a known issue ? Any ideas ?

Thanks,
Badari

[ cut here ]
kernel BUG at fs/direct-io.c:1113!
invalid opcode:  [1] SMP
CPU 11
Modules linked in: oprofile raw capability commoncap qla2xxx ipv6 button 
battery ac loop dm_mod tg3 ext3 jbd edd fan thermal processor scsi_transport_fc 
sg aic94xx libsas firmware_class scsi_transport_sas serverworks sd_mod scsePid: 
9709, comm: db2sysc Not tainted 2.6.21.5-smp #1
RIP: 0010:[802a7a1a]  [802a7a1a] 
__blockdev_direct_IO+0x96c/0xa4a

RSP: 0018:810c3d3efc08  EFLAGS: 00010202
RAX: 0246 RBX: 810041768b24 RCX: 80406918
RDX:  RSI: 0246 RDI: 810041768b24
RBP: 810041768800 R08: 0086 R09: 81003f0a2370
R10: 810c3f3ee9d8 R11: 802e9716 R12: 0001
R13: fdef R14: 810c5beb42b0 R15: 
FS:  2b3d14ffe940() GS:810c444c95c0() knlGS:
CS:  0010 DS:  ES:  CR0: 80050033
CR2: 2b3e84f57000 CR3: 4511c000 CR4: 06e0
Process db2sysc (pid: 9709, threadinfo 810c3d3ee000, task 810041fee760)
Stack:  810c42f0fbc0 4c1f4000 811836306d48 0011803e7f95
0009  0008 
810041fee760 00090001 0001 810024ac9948
Call Trace:
[802a67c7] blkdev_direct_IO+0x45/0x4a
[802a66ec] blkdev_get_blocks+0x0/0x96
[80260108] generic_file_direct_IO+0xbd/0xfb
[802601a6] generic_file_direct_write+0x60/0xf5
[80260b70] __generic_file_aio_write_nolock+0x2e7/0x410
[8029a0ce] aio_read_evt+0x9a/0x108
[80260d8e] generic_file_aio_write_nolock+0x34/0x80
[80260d5a] generic_file_aio_write_nolock+0x0/0x80
[80299d23] aio_rw_vect_retry+0x72/0x14a
[8029a63f] aio_run_iocb+0x9a/0x134
[8029b10a] io_submit_one+0x311/0x35e
[8029b678] sys_io_submit+0x9b/0x101
[80229986] default_wake_function+0x0/0xe
[80209b8e] system_call+0x7e/0x83


Code: 0f 0b eb fe 4d 85 e4 75 1d 48 8b 74 24 08 44 89 ea 48 89 ef
RIP  [802a7a1a] __blockdev_direct_IO+0x96c/0xa4a
RSP 810c3d3efc08



-
To unsubscribe from this list: send the line unsubscribe linux-fsdevel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [AppArmor 00/44] AppArmor security module overview

2007-06-27 Thread Andreas Dilger
Any chance you can remove linux-fsdevel from the CC list?  I don't think this
has anything to do with filesystems.

Cheers, Andreas
--
Andreas Dilger
Principal Software Engineer
Cluster File Systems, Inc.

-
To unsubscribe from this list: send the line unsubscribe linux-fsdevel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html