vm/fs meetup in september?

2007-06-23 Thread Nick Piggin
I'd just like to take the chance also to ask about a VM/FS meetup some
time around kernel summit (maybe take a big of time during UKUUG or so).

I was thinking about trying to arrange a proper mini summit thing, but
it's a bit difficult and we could talk this year about doing it for
subsequent years. If there is a bit of interest, we could probably find
a small room somewhere this year on pretty short notice or do it as a
BOF or something.

I don't want to do it in the VM summit, because that kind of alienates
the filesystem guys. What I want to talk about is anything and everything
that the VM can do better to help the fs and vice versa.  I'd like to
stay away from memory management where not too applicable to the fs.

A few things I'd like to talk about are:

- the address space operations APIs, and their page based nature. I think
  it would be nice to generally move toward offset,length based ones as
  much as possible because it should give more efficiency and flexibility
  in the filesystem.

- write_begin API if it is still an issue by that date. Hope not :)

- truncate races

- fsblock if it hasn't been shot down by then

- how to make complex API changes without having to fix most things
  yourself.


Anyway, if you will be in the area and are interested, let me know (off
list) and we can work out time and place.

Thanks,
Nick
-
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC] fsblock

2007-06-23 Thread William Lee Irwin III
On Sun, Jun 24, 2007 at 03:45:28AM +0200, Nick Piggin wrote:
> fsblock is a rewrite of the "buffer layer" (ding dong the witch is
> dead), which I have been working on, on and off and is now at the stage
> where some of the basics are working-ish. This email is going to be
> long...

Long overdue. Thank you.


-- wli
-
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC] fsblock

2007-06-23 Thread Nick Piggin
On Sat, Jun 23, 2007 at 11:07:54PM -0400, Jeff Garzik wrote:
> Nick Piggin wrote:
> >- No deadlocks (hopefully). The buffer layer is technically deadlocky by
> >  design, because it can require memory allocations at page writeout-time.
> >  It also has one path that cannot tolerate memory allocation failures.
> >  No such problems for fsblock, which keeps fsblock metadata around for as
> >  long as a page is dirty (this still has problems vs get_user_pages, but
> >  that's going to require an audit of all get_user_pages sites. Phew).
> >
> >- In line with the above item, filesystem block allocation is performed
> >  before a page is dirtied. In the buffer layer, mmap writes can dirty a
> >  page with no backing blocks which is a problem if the filesystem is
> >  ENOSPC (patches exist for buffer.c for this).
> 
> This raises an eyebrow...  The handling of ENOSPC prior to mmap write is 
> more an ABI behavior, so I don't see how this can be fixed with internal 
> changes, yet without changing behavior currently exported to userland 
> (and thus affecting code based on such assumptions).

I believe people are happy to have it SIGBUS (which is how the VM
is already set up with page_mkwrite, and what fsblock does).
 
 
> >- An inode's metadata must be tracked per-inode in order for fsync to
> >  work correctly. buffer contains helpers to do this for basic
> >  filesystems, but any block can be only the metadata for a single inode.
> >  This is not really correct for things like inode descriptor blocks.
> >  fsblock can track multiple inodes per block. (This is non trivial,
> >  and it may be overkill so it could be reverted to a simpler scheme
> >  like buffer).
> 
> hrm; no specific comment but this seems like an idea/area that needs to 
> be fleshed out more, by converting some of the more advanced filesystems.

Yep. It's conceptually fairly simple though, and it might be easier
than having filesystems implement their own complex syncing that finds
and syncs everything themselves.


> >- Large block support. I can mount and run an 8K block size minix3 fs on
> >  my 4K page system and it didn't require anything special in the fs. We
> >  can go up to about 32MB blocks now, and gigabyte+ blocks would only
> >  require  one more bit in the fsblock flags. fsblock_superpage blocks
> >  are > PAGE_CACHE_SIZE, midpage ==, and subpage <.
> 
> definitely useful, especially if I rewrite my ibu filesystem for 2.6.x, 
> like I've been planning.

Yeah, it wasn't the primary motivation for the rewrite, but it would
be negligent to not even consider large blocks in such a rewrite, I
think.


> >So. Comments? Is this something we want? If yes, then how would we
> >transition from buffer.c to fsblock.c?
> 
> Your work is definitely interesting, but I think it will be even more 
> interesting once ext2 (w/ dir in pagecache) and ext3 (journalling) are 
> converted.

Well minix has dir in pagecache ;) But you're completely right: ext2
will be the next step and then ext3 and things like XFS and NTFS
will be the real test. I think I could eventually get ext2 done (one
of the biggest headaches is simply just converting ->b_data accesses),
however unlikely a journalling one.

 
> My gut feeling is that there are several problem areas you haven't hit 
> yet, with the new code.

I would agree with your gut :)

 
> Also, once things are converted, the question of transitioning from 
> buffer.c will undoubtedly answer itself.  That's the way several of us 
> handle transitions:  finish all the work, then look with fresh eyes and 
> conceive a path from the current code to your enhanced code.

Yeah that would be nice. It's very difficult because of so much
filesystem code. I'd say it would be feasible to step buffer.c into
fsblock.c, however if we were to track all (or even the common)
filesystems along with that it would introduce a huge number of
kind-of-redundant changes that I don't think all fs maintainers would
have time to write (and as I said, I can't do it myself). Anyway,
let's cross that bridge if and when we come to it.

For now, the big thing that needs to be done is convert a "big" fs
and see if the results tell us that it's workable.

Thanks for the comments Jeff.


-
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC] fsblock

2007-06-23 Thread Jeff Garzik

Nick Piggin wrote:

- No deadlocks (hopefully). The buffer layer is technically deadlocky by
  design, because it can require memory allocations at page writeout-time.
  It also has one path that cannot tolerate memory allocation failures.
  No such problems for fsblock, which keeps fsblock metadata around for as
  long as a page is dirty (this still has problems vs get_user_pages, but
  that's going to require an audit of all get_user_pages sites. Phew).

- In line with the above item, filesystem block allocation is performed
  before a page is dirtied. In the buffer layer, mmap writes can dirty a
  page with no backing blocks which is a problem if the filesystem is
  ENOSPC (patches exist for buffer.c for this).


This raises an eyebrow...  The handling of ENOSPC prior to mmap write is 
more an ABI behavior, so I don't see how this can be fixed with internal 
changes, yet without changing behavior currently exported to userland 
(and thus affecting code based on such assumptions).




- An inode's metadata must be tracked per-inode in order for fsync to
  work correctly. buffer contains helpers to do this for basic
  filesystems, but any block can be only the metadata for a single inode.
  This is not really correct for things like inode descriptor blocks.
  fsblock can track multiple inodes per block. (This is non trivial,
  and it may be overkill so it could be reverted to a simpler scheme
  like buffer).


hrm; no specific comment but this seems like an idea/area that needs to 
be fleshed out more, by converting some of the more advanced filesystems.




- Large block support. I can mount and run an 8K block size minix3 fs on
  my 4K page system and it didn't require anything special in the fs. We
  can go up to about 32MB blocks now, and gigabyte+ blocks would only
  require  one more bit in the fsblock flags. fsblock_superpage blocks
  are > PAGE_CACHE_SIZE, midpage ==, and subpage <.


definitely useful, especially if I rewrite my ibu filesystem for 2.6.x, 
like I've been planning.




So. Comments? Is this something we want? If yes, then how would we
transition from buffer.c to fsblock.c?


Your work is definitely interesting, but I think it will be even more 
interesting once ext2 (w/ dir in pagecache) and ext3 (journalling) are 
converted.


My gut feeling is that there are several problem areas you haven't hit 
yet, with the new code.


Also, once things are converted, the question of transitioning from 
buffer.c will undoubtedly answer itself.  That's the way several of us 
handle transitions:  finish all the work, then look with fresh eyes and 
conceive a path from the current code to your enhanced code.


Jeff


-
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC] fsblock

2007-06-23 Thread Nick Piggin
Just clarify a few things. Don't you hate rereading a long work you
wrote? (oh, you're supposed to do that *before* you press send?).

On Sun, Jun 24, 2007 at 03:45:28AM +0200, Nick Piggin wrote:
> 
> I'm announcing "fsblock" now because it is quite intrusive and so I'd
> like to get some thoughts about significantly changing this core part
> of the kernel.
> 
> fsblock is a rewrite of the "buffer layer" (ding dong the witch is
> dead), which I have been working on, on and off and is now at the stage
> where some of the basics are working-ish. This email is going to be
> long...
> 
> Firstly, what is the buffer layer?  The buffer layer isn't really a
> buffer layer as in the buffer cache of unix: the block device cache
> is unified with the pagecache (in terms of the pagecache, a blkdev
> file is just like any other, but with a 1:1 mapping between offset
> and block).

I mean, in Linux, the block device cache is unified. UNIX I believe
did all its caching in a buffer cache, below the filesystem.

 
> - Large block support. I can mount and run an 8K block size minix3 fs on
>   my 4K page system and it didn't require anything special in the fs. We

Oh, and I don't have a Linux mkfs that makes minixv3 filesystems.
I had an image kindly made for me because I don't use minix. If
you want to test large block support, I won't email it to you though:
you can just convert ext2 or ext3 to fsblock ;)
-
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[patch 3/3] minix: convert to fsblock

2007-06-23 Thread Nick Piggin

Convert minix from buffer head to fsblock.

---
 fs/minix/bitmap.c   |  148 +--
 fs/minix/file.c |6 -
 fs/minix/inode.c|  172 ++--
 fs/minix/itree_common.c |  227 
 fs/minix/itree_v1.c |7 -
 fs/minix/itree_v2.c |7 -
 fs/minix/minix.h|   17 ++-
 7 files changed, 382 insertions(+), 202 deletions(-)

Index: linux-2.6/fs/minix/minix.h
===
--- linux-2.6.orig/fs/minix/minix.h
+++ linux-2.6/fs/minix/minix.h
@@ -1,4 +1,5 @@
 #include 
+#include 
 #include 
 #include 
 
@@ -37,16 +38,18 @@ struct minix_sb_info {
int s_dirsize;
int s_namelen;
int s_link_max;
-   struct buffer_head ** s_imap;
-   struct buffer_head ** s_zmap;
-   struct buffer_head * s_sbh;
+   struct fsblock_meta ** s_imap;
+   struct fsblock_meta ** s_zmap;
+   struct fsblock_meta * s_smblock;
struct minix_super_block * s_ms;
unsigned short s_mount_state;
unsigned short s_version;
 };
 
-extern struct minix_inode * minix_V1_raw_inode(struct super_block *, ino_t, 
struct buffer_head **);
-extern struct minix2_inode * minix_V2_raw_inode(struct super_block *, ino_t, 
struct buffer_head **);
+extern struct minix_inode * minix_V1_raw_inode(struct super_block *, ino_t, 
struct fsblock_meta **);
+extern void minix_put_raw_inode(struct super_block *sb, ino_t ino, struct 
fsblock_meta *mblock, struct minix_inode *p);
+extern struct minix2_inode * minix_V2_raw_inode(struct super_block *, ino_t, 
struct fsblock_meta **);
+extern void minix2_put_raw_inode(struct super_block *sb, ino_t ino, struct 
fsblock_meta *mblock, struct minix2_inode *p);
 extern struct inode * minix_new_inode(const struct inode * dir, int * error);
 extern void minix_free_inode(struct inode * inode);
 extern unsigned long minix_count_free_inodes(struct minix_sb_info *sbi);
@@ -60,8 +63,8 @@ extern void V2_minix_truncate(struct ino
 extern void minix_truncate(struct inode *);
 extern int minix_sync_inode(struct inode *);
 extern void minix_set_inode(struct inode *, dev_t);
-extern int V1_minix_get_block(struct inode *, long, struct buffer_head *, int);
-extern int V2_minix_get_block(struct inode *, long, struct buffer_head *, int);
+extern int V1_minix_insert_mapping(struct address_space *, loff_t, size_t, 
int);
+extern int V2_minix_insert_mapping(struct address_space *, loff_t, size_t, 
int);
 extern unsigned V1_minix_blocks(loff_t, struct super_block *);
 extern unsigned V2_minix_blocks(loff_t, struct super_block *);
 
Index: linux-2.6/fs/minix/itree_common.c
===
--- linux-2.6.orig/fs/minix/itree_common.c
+++ linux-2.6/fs/minix/itree_common.c
@@ -1,31 +1,29 @@
 /* Generic part */
 
 typedef struct {
-   block_t *p;
+   block_t *mem;
+   int offset;
block_t key;
-   struct buffer_head *bh;
+   struct fsblock_meta *mblock;
 } Indirect;
 
 static DEFINE_RWLOCK(pointers_lock);
 
-static inline void add_chain(Indirect *p, struct buffer_head *bh, block_t *v)
+static inline void add_chain(Indirect *p, struct fsblock_meta *mblock, block_t 
*mem, int offset)
 {
-   p->key = *(p->p = v);
-   p->bh = bh;
+   p->mem = mem;
+   p->offset = offset;
+   p->key = mem[offset];
+   p->mblock = mblock;
 }
 
 static inline int verify_chain(Indirect *from, Indirect *to)
 {
-   while (from <= to && from->key == *from->p)
+   while (from <= to && from->key == from->mem[from->offset])
from++;
return (from > to);
 }
 
-static inline block_t *block_end(struct buffer_head *bh)
-{
-   return (block_t *)((char*)bh->b_data + bh->b_size);
-}
-
 static inline Indirect *get_branch(struct inode *inode,
int depth,
int *offsets,
@@ -34,35 +32,43 @@ static inline Indirect *get_branch(struc
 {
struct super_block *sb = inode->i_sb;
Indirect *p = chain;
-   struct buffer_head *bh;
+   struct fsblock_meta *mblock;
 
*err = 0;
/* i_data is not going away, no lock needed */
-   add_chain (chain, NULL, i_data(inode) + *offsets);
+   add_chain (chain, NULL, i_data(inode), *offsets);
if (!p->key)
-   goto no_block;
+   goto out;
while (--depth) {
-   bh = sb_bread(sb, block_to_cpu(p->key));
-   if (!bh)
-   goto failure;
+   void *data;
+
+   mblock = sb_mbread(sb, block_to_cpu(p->key));
+   if (!mblock) {
+   *err = -EIO;
+   goto out;
+   }
read_lock(&pointers_lock);
-   if (!verify_chain(chain, p))
-   goto changed;
-   add_chain(++p, bh,

[patch 2/3] block_dev: convert to fsblock

2007-06-23 Thread Nick Piggin

Convert block_dev mostly to fsblocks.

---
 fs/block_dev.c  |  204 +++-
 fs/buffer.c |  113 ++--
 fs/super.c  |2 
 include/linux/buffer_head.h |9 -
 include/linux/fs.h  |   29 ++
 5 files changed, 225 insertions(+), 132 deletions(-)

Index: linux-2.6/fs/block_dev.c
===
--- linux-2.6.orig/fs/block_dev.c
+++ linux-2.6/fs/block_dev.c
@@ -16,7 +16,9 @@
 #include 
 #include 
 #include 
+#include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -61,14 +63,14 @@ static void kill_bdev(struct block_devic
 {
if (bdev->bd_inode->i_mapping->nrpages == 0)
return;
-   invalidate_bh_lrus();
+   invalidate_bh_lrus(); /* XXX: this can go when buffers goes */
truncate_inode_pages(bdev->bd_inode->i_mapping, 0);
 }  
 
 int set_blocksize(struct block_device *bdev, int size)
 {
/* Size must be a power of two, and between 512 and PAGE_SIZE */
-   if (size > PAGE_SIZE || size < 512 || !is_power_of_2(size))
+   if (size < 512 || !is_power_of_2(size))
return -EINVAL;
 
/* Size cannot be smaller than the size supported by the device */
@@ -92,7 +94,7 @@ int sb_set_blocksize(struct super_block 
if (set_blocksize(sb->s_bdev, size))
return 0;
/* If we get here, we know size is power of two
-* and it's value is between 512 and PAGE_SIZE */
+* and it's value is >= 512 */
sb->s_blocksize = size;
sb->s_blocksize_bits = blksize_bits(size);
return sb->s_blocksize;
@@ -112,19 +114,12 @@ EXPORT_SYMBOL(sb_min_blocksize);
 
 static int
 blkdev_get_block(struct inode *inode, sector_t iblock,
-   struct buffer_head *bh, int create)
+struct buffer_head *bh, int create)
 {
if (iblock >= max_block(I_BDEV(inode))) {
if (create)
return -EIO;
-
-   /*
-* for reads, we're just trying to fill a partial page.
-* return a hole, they will have to call get_block again
-* before they can fill it, and they will get -EIO at that
-* time
-*/
-   return 0;
+return 0;
}
bh->b_bdev = I_BDEV(inode);
bh->b_blocknr = iblock;
@@ -132,6 +127,66 @@ blkdev_get_block(struct inode *inode, se
return 0;
 }
 
+static int blkdev_insert_mapping(struct address_space *mapping, loff_t off,
+   size_t len, int create)
+{
+   sector_t blocknr;
+   struct inode *inode = mapping->host;
+   pgoff_t next, end;
+   struct pagevec pvec;
+   int ret = 0;
+
+   pagevec_init(&pvec, 0);
+   next = off >> PAGE_CACHE_SHIFT;
+   end = (off + len) >> PAGE_CACHE_SHIFT;
+   blocknr = off >> inode->i_blkbits;
+   while (next <= end && pagevec_lookup(&pvec, mapping, next,
+   min(end - next, (pgoff_t)PAGEVEC_SIZE))) {
+   unsigned int i;
+
+   for (i = 0; i < pagevec_count(&pvec); i++) {
+   struct fsblock *block;
+   struct page *page = pvec.pages[i];
+
+   BUG_ON(page->index != next);
+   BUG_ON(blocknr != pgoff_sector(next, inode->i_blkbits));
+   BUG_ON(!PageLocked(page));
+
+   if (blocknr >= max_block(I_BDEV(inode))) {
+   if (create)
+   ret = -ENOMEM;
+
+   /*
+* for reads, we're just trying to fill a
+* partial page.  return a hole, they will
+* have to call in again before they can fill
+* it, and they will get -EIO at that time
+*/
+   continue; /* xxx: could be smarter, stop now */
+   }
+
+   block = page_blocks(page);
+   if (fsblock_subpage(block)) {
+   struct fsblock *b;
+   for_each_block(block, b) {
+   if (!test_bit(BL_mapped, &b->flags))
+   map_fsblock(b, blocknr);
+   blocknr++;
+   }
+   } else {
+   if (!test_bit(BL_mapped, &block->flags))
+   map_fsblock(block, blocknr);
+   blocknr++;
+   }
+   next++;
+   }
+   pagevec_release(&pvec);
+   }
+
+   re

[RFC] fsblock

2007-06-23 Thread Nick Piggin

I'm announcing "fsblock" now because it is quite intrusive and so I'd
like to get some thoughts about significantly changing this core part
of the kernel.

fsblock is a rewrite of the "buffer layer" (ding dong the witch is
dead), which I have been working on, on and off and is now at the stage
where some of the basics are working-ish. This email is going to be
long...

Firstly, what is the buffer layer?  The buffer layer isn't really a
buffer layer as in the buffer cache of unix: the block device cache
is unified with the pagecache (in terms of the pagecache, a blkdev
file is just like any other, but with a 1:1 mapping between offset
and block).

There are filesystem APIs to access the block device, but these go
through the block device pagecache as well. These don't exactly
define the buffer layer either.

The buffer layer is a layer between the pagecache and the block
device for block based filesystems. It keeps a translation between
logical offset and physical block number, as well as meta
information such as locks, dirtyness, and IO status of each block.
This information is tracked via the buffer_head structure.

Why rewrite the buffer layer?  Lots of people have had a desire to
completely rip out the buffer layer, but we can't do that[*] because
it does actually serve a useful purpose. Why the bad rap? Because
the code is old and crufty, and buffer_head is an awful name. It must 
be among the oldest code in the core fs/vm, and the main reason is
because of the inertia of so many and such complex filesystems.

[*] About the furthest we could go is use the struct page for the
information otherwise stored in the buffer_head, but this would be
tricky and suboptimal for filesystems with non page sized blocks and
would probably bloat the struct page as well.

So why rewrite rather than incremental improvements? Incremental
improvements are logically the correct way to do this, and we probably
could go from buffer.c to fsblock.c in steps. But I didn't do this
because: a) the blinding pace at which things move in this area would
make me an old man before it would be complete; b) I didn't actually
know exactly what it was going to look like before starting on it; c)
I wanted stable root filesystems and such when testing it; and d) I
found it reasonably easy to have both layers coexist (it uses an extra
page flag, but even that wouldn't be needed if the old buffer layer
was better decoupled from the page cache).

I started this as an exercise to see how the buffer layer could be
improved, and I think it is working out OK so far. The name is fsblock
because it basically ties the fs layer to the block layer. I think
Andrew has wanted to rename buffer_head to block before, but block is
too clashy, and it isn't a great deal more descriptive than buffer_head.
I believe fsblock is.

I'll go through a list of things where I have hopefully improved on the
buffer layer, off the top of my head. The big caveat here is that minix
is the only real filesystem I have converted so far, and complex
journalled filesystems might pose some problems that water down its
goodness (I don't know).

- data structure size. struct fsblock is 20 bytes on 32-bit, and 40 on
  64-bit (could easily be 32 if we can have int bitops). Compare this
  to around 50 and 100ish for struct buffer_head. With a 4K page and 1K
  blocks, IO requires 10% RAM overhead in buffer heads alone. With
  fsblocks you're down to around 3%.

- Structure packing. A page gets a number of buffer heads that are
  allocated in a linked list. fsblocks are allocated contiguously, so
  cacheline footprint is smaller in the above situation.

- Data / metadata separation. I have a struct fsblock and a struct
  fsblock_meta, so we could put more stuff into the usually less used
  fsblock_meta without bloating it up too much. After a few tricks, these
  are no longer any different in my code, and dirty up the typing quite
  a lot (and I'm aware it still has some warnings, thanks). So if not
  useful this could be taken out.

- Locking. fsblocks completely use the pagecache for locking and lookups.
  The page lock is used, but there is no extra per-inode lock that buffer
  has. Would go very nicely with lockless pagecache. RCU is used for one
  non-blocking fsblock lookup (find_get_block), but I'd really rather hope
  filesystems can tolerate that blocking, and get rid of RCU completely.
  (actually this is not quite true because mapping->private_lock is still
  used for mark_buffer_dirty_inode equivalent, but that's a relatively
  rare operation).

- Coupling with pagecache metadata. Pagecache pages contain some metadata
  that is logically redundant because it is tracked in buffers as well
  (eg. a page is dirty if one or more buffers are dirty, or uptodate if
  all buffers are uptodate). This is great because means we can avoid that
  layer in some situations, but they can get out of sync. eg. if a
  filesystem writes a buffer out by hand, its pagecache page will stay
  dirty

Re: [AppArmor 39/45] AppArmor: Profile loading and manipulation, pathname matching

2007-06-23 Thread Toshiharu Harada
This thread is amazing.  With so many smart people's precious time,

What are the results?
What are the issues anyway?
Is anyone happy? (I'm not and I assume Chris is not)

Yes, "waste of time" is taking place here, but
it's not for "pathname-based MAC" but for "wrongly posted messages",
I believe.  I'm a relatively new to this ml, let me ask.

Is this ml a place of judge or battle? (not to help or support?)

Nothing is perfect, so we can work to make things to better, right?
I have suggestions:

Let's clarify issues first.
- problems (or limitations) of pathname-based MAC
- advantages of pathname-based MAC
- how can pathname-based MAC supplement label based
(Stephen, James and Kyle, please help)

Let's start the arguments again if we get the issues.
Threads should be definitely separated per issue and
a assigning a chair may help.

Above issues are independent of SELinux. We should not *compare*
SELinux and AA, that can cause a problem. Every software has
shortages that's why we need to work and we can make progress.
For some issues we may need to compare them, in that case
moderators would help.

BTW I have posted a RFC of TOMOYO Linux that is another
pathname-based MAC.
http://lkml.org/lkml/2007/6/13/58
AA and TOMOYO Linux have BoF sessions at OLS2007,
so it would be a great opportunity to *talk* over the issues.

What I want to say is "let's make progress and help each other
to make Linux better".

Thank you,
Toshiharu Harada

Chris Wright wrote:
> * Chris Mason ([EMAIL PROTECTED]) wrote:
>> I'm sure people there will have a different versions of events.  The
>> one part that was discussed was if pathname based security was
>> useful, and a number of the people in the room (outside of 
>> novell) said it was.  Now, it could be that nobody wanted to argue
>> anymore, since most opinions had come out on one list or another by
>> then.  
> 
> Indeed.  The trouble is that's too high level compared with the actual
> implementation details.  AA is stalled because it has failed to get
> VFS support for it's model.  I don't see a nice way out unless it
> changes it's notion of policy language (globbing is the tough one)
> or gets traction to pass dentry/vfsmount all the way down.  Paths are
> completely relevant for security, esp. when considering the parent dir
> and the leaf (as in forward lookup case).  Retroactively creating the
> full path is at the minimum ugly, and in the worst case can be insecure
> (yes AA has taken many measures to mitigate that insecurity).
> 
>> But as someone who doesn't use either SElinux or AA, I really hope
>> we can get past the part of the debate where:
>>
>> while(1)
>> AA) we think we're making users happy with pathname security
>> SELINUX) pathname security sucks
> 
> Yes.  Please.  Both parties are miserably failing the sanity test.
> Doing the same thing over and over and expecting different results.
> 
> AA folks: deal with the VFS issues that your patchset have in a palatable
> way (which does not include passing NULL when it's inconvenient to
> do otherwise).  You've already missed an opportunity with Christoph's
> suggestions for changes in NFS.  I know you've considered many alternative
> approaches and consistently hit dead ends.  But please note, if you
> have coded yourself into a corner because of your policy language,
> that's your issue to solve, not ours.
> 
> SELinux folks: do something useful rather than quibbling over the TCSEC
> definition of MAC and AA's poor taste in marketing literature.  Here's
> some suggestions:
> 
> 1) Make SELinux usable (it's *still* the number one complaint).  While
> this is a bit of a cheap shot, it really is one of the core reasons AA
> advocates exist.
> 2) Work on a variant of Kyle's suggestion to squash the relevancy of AA.
> 3) Write an effective exploit against AA that demonstrates the fundamental
> weakness of the model (better make sure it's not also an issue for
> targetted policy).

-- 
Toshiharu Harada
[EMAIL PROTECTED]

-
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [AppArmor 39/45] AppArmor: Profile loading and manipulation, pathname matching

2007-06-23 Thread Toshiharu Harada

This thread is amazing.  With so many smart people's precious time,

What are the results?
What are the issues anyway?
Is anyone happy? (I'm not and I assume Chris is not)

Yes, "waste of time" is taking place here, but
it's not for "pathname-based MAC" but for "wrongly posted messages",
I believe.  I'm a relatively new to this ml, let me ask.

Is this ml a place of judge or battle? (not to help or support?)

Nothing is perfect, so we can work to make things to better, right?
I have suggestions:

Let's clarify issues first.
- problems (or limitations) of pathname-based MAC
- advantages of pathname-based MAC
- how can pathname-based MAC supplement label based
(Stephen, James and Kyle, please help)

Let's start the arguments again if we get the issues.
Threads should be definitely separated per issue and
a assigning a chair may help.


Well, I crated a Wiki page. If it helps, please
feel free to use it.  I mean I would like
people to add your issues here.  It's wiki, so
you are welcome to modify everything.

http://tomoyo.sourceforge.jp/wiki-e/?MAC-ISSUES

If ml is better, I have no objections.
I just wanted to help discussion.


Above issues are independent of SELinux. We should not *compare*
SELinux and AA, that can cause a problem. Every software has
shortages that's why we need to work and we can make progress.
For some issues we may need to compare them, in that case
moderators would help.

BTW I have posted a RFC of TOMOYO Linux that is another
pathname-based MAC.
http://lkml.org/lkml/2007/6/13/58
AA and TOMOYO Linux have BoF sessions at OLS2007,
so it would be a great opportunity to *talk* over the issues.

What I want to say is "let's make progress and help each other
to make Linux better".


Cheers,
Toshiharu Harada

-
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [-mm PATCH] ocfs2: ->fallocate() support

2007-06-23 Thread Mark Fasheh
On Sat, Jun 23, 2007 at 09:54:15AM -0700, Andrew Morton wrote:
> > On Thu, 21 Jun 2007 12:01:43 -0700 Mark Fasheh <[EMAIL PROTECTED]> wrote:
> > Plug ocfs2 into the ->fallocate() callback. We only support FA_ALLOCATE for
> > now - FA_DEALLOCATE will come later.
> 
> I get a healthy reject when applying this against your own tree.  Perhaps
> because my copy of it is a few days old, dunno.

Eek, yeah I based it on a pretty recent version of my stuff.


> I think I'll duck it, allow you to merge this in the normal fashion once
> fallocate hits mainline, OK?

Sounds good to me. Could I trouble you to add me to the CC for one of the
patches that add ->fallocate() so I know when it gets sent upstream?

Thanks,
--Mark

--
Mark Fasheh
Senior Software Developer, Oracle
[EMAIL PROTECTED]
-
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [-mm PATCH] ocfs2: ->fallocate() support

2007-06-23 Thread Andrew Morton
> On Thu, 21 Jun 2007 12:01:43 -0700 Mark Fasheh <[EMAIL PROTECTED]> wrote:
> Plug ocfs2 into the ->fallocate() callback. We only support FA_ALLOCATE for
> now - FA_DEALLOCATE will come later.

I get a healthy reject when applying this against your own tree.  Perhaps
because my copy of it is a few days old, dunno.

I think I'll duck it, allow you to merge this in the normal fashion once
fallocate hits mainline, OK?

-
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 00/26] Mount writer count and read-only bind mounts

2007-06-23 Thread Andrew Morton
> On Fri, 22 Jun 2007 13:03:03 -0700 Dave Hansen <[EMAIL PROTECTED]> wrote:
> Why do we need r/o bind mounts?
> 
> This feature allows a read-only view into a read-write filesystem.
> In the process of doing that, it also provides infrastructure for
> keeping track of the number of writers to any given mount.
> 
> This has a number of uses.  It allows chroots to have parts of
> filesystems writable.  It will be useful for containers in the future
> because users may have root inside a container, but should not
> be allowed to write to somefilesystems.  This also replaces 
> patches that vserver has had out of the tree for several years.
> 
> It allows security enhancement by making sure that parts of
> your filesystem read-only (such as when you don't trust your
> FTP server), when you don't want to have entire new filesystems
> mounted, or when you want atime selectively updated.
> I've been using the following script to test that the feature is
> working as desired.  It takes a directory and makes a regular
> bind and a r/o bind mount of it.  It then performs some normal
> filesystem operations on the three directories, including ones
> that are expected to fail, like creating a file on the r/o
> mount.

Doesn't selinux do some of this?

My overall reaction: owch.  There's a ton of tricksy code here and great
potential for us to accidentally break it in the future by forgetting a
mnt_may_write() as the kernel evolves.  And then there's the added
complexity and the added runtime overhead.

Balance that against some pretty obscure-looking benefits and I'm
struggling to see how a merge is justifiable?

-
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 25/26] r/o bind mounts: scalable writer count

2007-06-23 Thread Andrew Morton
> On Fri, 22 Jun 2007 13:03:36 -0700 Dave Hansen <[EMAIL PROTECTED]> wrote:
> +DEFINE_PER_CPU(struct mnt_writer, mnt_writers);

this can have static scope.
-
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 05/26] r/o bind mounts: stub functions

2007-06-23 Thread Andrew Morton
> On Fri, 22 Jun 2007 13:03:09 -0700 Dave Hansen <[EMAIL PROTECTED]> wrote:
> This patch adds two function mnt_want_write() and
> mnt_drop_write()

ITYM "global, exported-to-modules yet 100% undocumented" functions.

tsk.
-
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 04/26] filesystem helpers for custom 'struct file's

2007-06-23 Thread Andrew Morton
> On Fri, 22 Jun 2007 13:03:08 -0700 Dave Hansen <[EMAIL PROTECTED]> wrote:
> Christoph H. says this stands on its own and can go in before the
> rest of the r/o bind mount set.  
> 
> ---
> 
> Some filesystems forego the vfs and may_open() and create their
> own 'struct file's.
> 
> This patch creates a couple of helper functions which can be
> used by these filesystems, and will provide a unified place
> which the r/o bind mount code may patch.
> 
> Also, rename two existing, static-scope init_file() to less
> generic names.

the sysfs changes are no longer applicable here due to pending changes in
Greg's tree.

The net/socket.c change is a mystery.  My version of sock_attach_fd()
doesn't look like yours.

So I snarfed the first three patches, passed on this one and shall await a
quality review-and-ack from someone who is appropriately familiar with this
code, thanks.
-
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 25/26] r/o bind mounts: scalable writer count

2007-06-23 Thread Miklos Szeredi
> I'd suggest something along these lines in final mntput:
> 
>   lock_and_coalesce_cpu_mnt_writer_counts();
>   mnt_unlock_cpus();
>   BUG_ON(atomic_read(&mnt->__mnt_writers));

Or rather a WARN_ON(), since it's not a fatal condition.

Miklos
-
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 25/26] r/o bind mounts: scalable writer count

2007-06-23 Thread Miklos Szeredi
> This uses a statically-allocated percpu variable.  All
> operations are local to a cpu as long that cpu operates on
> the same mount, and there are no writer count imbalances.
> Writer count imbalances happen when a write is taken on one
> cpu, and released on another, like when an open/close pair
> is performed on two different cpus because the task moved.
> 

I'd suggest something along these lines in final mntput:

lock_and_coalesce_cpu_mnt_writer_counts();
mnt_unlock_cpus();
BUG_ON(atomic_read(&mnt->__mnt_writers));

since there's basically no other we'll notice if there _is_ an
imbalance.

Miklos
-
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] make iunique use a do/while loop rather than its obscure goto loop

2007-06-23 Thread Christoph Hellwig
On Wed, Apr 11, 2007 at 05:58:56PM -0400, Jeffrey Layton wrote:
> A while back, Christoph mentioned that he thought that iunique ought to be
> cleaned up to use a more conventional loop construct. This patch does that,
> turning the strange goto loop into a do/while.
> 
> Signed-off-by: Jeff Layton <[EMAIL PROTECTED]>
> 
> diff --git a/fs/inode.c b/fs/inode.c
> index 23fc1fd..90e7587 100644
> --- a/fs/inode.c
> +++ b/fs/inode.c
> @@ -689,21 +689,18 @@ ino_t iunique(struct super_block *sb, ino_t 
> max_reserved)
>   struct inode *inode;
>   struct hlist_head * head;
>   ino_t res;
> +
>   spin_lock(&inode_lock);
> -retry:
> - if (counter > max_reserved) {
> - head = inode_hashtable + hash(sb,counter);
> + do {
> + if (counter <= max_reserved)
> + counter = max_reserved + 1;
>   res = counter++;
> + head = inode_hashtable + hash(sb, res);
>   inode = find_inode_fast(sb, head, res);
> - if (!inode) {
> - spin_unlock(&inode_lock);
> - return res;
> - }
> - } else {
> - counter = max_reserved + 1;
> - }
> - goto retry;
> - 
> + } while (inode != NULL);
> + spin_unlock(&inode_lock);
> +
> + return res;
>  }
>  
>  EXPORT_SYMBOL(iunique);
---end quoted text---
-
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 26/26] honor r/w changes at do_remount() time

2007-06-23 Thread Christoph Hellwig
On Fri, Jun 22, 2007 at 01:03:37PM -0700, Dave Hansen wrote:
> 
> 
> Originally from: Herbert Poetzl <[EMAIL PROTECTED]>
> 
> This is the core of the read-only bind mount patch set.
> 
> Note that this does _not_ add a "ro" option directly to
> the bind mount operation.  If you require such a mount,
> you must first do the bind, then follow it up with a
> 'mount -o remount,ro' operation.
> 
> Signed-off-by: Dave Hansen <[EMAIL PROTECTED]>

Ok.

-
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 24/26] do_rmdir(): elevate write count

2007-06-23 Thread Christoph Hellwig
Ok.
-
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 23/26] elevate mnt writers for vfs_unlink() callers

2007-06-23 Thread Christoph Hellwig
Ok.
-
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 22/26] sys_mknodat(): elevate write count for vfs_mknod/create()

2007-06-23 Thread Christoph Hellwig
On Fri, Jun 22, 2007 at 01:03:32PM -0700, Dave Hansen wrote:
> 
> 
> This takes care of all of the direct callers of vfs_mknod().
> Since a few of these cases also handle normal file creation
> as well, this also covers some calls to vfs_create().

Ok.

> diff -puN 
> fs/namei.c~18-24-sys-mknodat-elevate-write-count-for-vfs-mknod-create 
> fs/namei.c
> --- lxc/fs/namei.c~18-24-sys-mknodat-elevate-write-count-for-vfs-mknod-create 
> 2007-06-21 23:23:25.0 -0700
> +++ lxc-dave/fs/namei.c   2007-06-21 23:23:25.0 -0700
> @@ -1897,14 +1897,26 @@ asmlinkage long sys_mknodat(int dfd, con
>   if (!IS_ERR(dentry)) {
>   switch (mode & S_IFMT) {
>   case 0: case S_IFREG:
> + error = mnt_want_write(nd.mnt);
> + if (error)
> + break;
>   error = vfs_create(nd.dentry->d_inode,dentry,mode,&nd);
> + mnt_drop_write(nd.mnt);
>   break;
>   case S_IFCHR: case S_IFBLK:
> + error = mnt_want_write(nd.mnt);
> + if (error)
> + break;
>   error = vfs_mknod(nd.dentry->d_inode,dentry,mode,
>   new_decode_dev(dev));
> + mnt_drop_write(nd.mnt);
>   break;
>   case S_IFIFO: case S_IFSOCK:
> + error = mnt_want_write(nd.mnt);
> + if (error)
> + break;
>   error = vfs_mknod(nd.dentry->d_inode,dentry,mode,0);
> + mnt_drop_write(nd.mnt);
>   break;
>   case S_IFDIR:
>   error = -EPERM;

Should we just take the calls outside the switch statement?

-
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 21/26] elevate write count for do_sys_utime() and touch_atime()

2007-06-23 Thread Christoph Hellwig
Ok.

-
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 20/26] elevate write count for do_utimes()

2007-06-23 Thread Christoph Hellwig
Ok.

-
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 19/26] elevate writer count for do_sys_truncate()

2007-06-23 Thread Christoph Hellwig
Ok.

-
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 18/26] nfs: check mnt instead of superblock directly

2007-06-23 Thread Christoph Hellwig
Ok.

-
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 17/26] elevate write count over calls to vfs_rename()

2007-06-23 Thread Christoph Hellwig
On Fri, Jun 22, 2007 at 01:03:25PM -0700, Dave Hansen wrote:
> 
> This also creates a little helper in the NFS code to
> make an if() a little bit less ugly.

That should probably be a separate patch.  Or better one to just rip out
the MSNFS ifdefs completely, they've always been true for about the last
ten years.

-
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 15/26] mount_is_safe(): add comment

2007-06-23 Thread Christoph Hellwig
On Fri, Jun 22, 2007 at 01:03:22PM -0700, Dave Hansen wrote:
> 
> 
> This area of code is currently #ifdef'd out, so add a comment
> for the time when it is actually used.

Ok.  Does this clash with the user mount patches?  Even if it does
I think we want this patch first in the series and fix the user mounts
up ontop of it.

-
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 16/26] unix_find_other() elevate write count for touch_atime()

2007-06-23 Thread Christoph Hellwig
Ok.
-
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 14/26] elevate write count for file_update_time()

2007-06-23 Thread Christoph Hellwig
On Fri, Jun 22, 2007 at 01:03:21PM -0700, Dave Hansen wrote:
> 
> 
> Signed-off-by: Dave Hansen <[EMAIL PROTECTED]>

We really want a guaranteed non-NULL file here, but I don't want to put
this on your plate also.  Please add a comment about bloody NFS exports
for now.

-
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 10/26] elevate mnt writers for callers of vfs_mkdir()

2007-06-23 Thread Christoph Hellwig
On Fri, Jun 22, 2007 at 01:03:16PM -0700, Dave Hansen wrote:
> 
> Pretty self-explanatory.  Fits in with the rest of the series.

Ok for this and similar patch 11-13.

-
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 09/26] make access() use mnt check

2007-06-23 Thread Christoph Hellwig
On Fri, Jun 22, 2007 at 01:03:14PM -0700, Dave Hansen wrote:
> 
> It is OK to let access() go without using a mnt_want/drop_write()
> pair because it doesn't actually do writes to the filesystem,
> and it is inherently racy anyway.  This is a rare case when it is
> OK to use __mnt_is_readonly() directly.

You probably want to add a big comment explaining why it's fine here.

That reminds me of something else I had in mind to debug that the
writer counts are okay:

 we should probably add a check in permission that we have an elevated
 writercount on the vfsmount/sb.  Of course we'll need some way to
 overrid it for access(), which means passing down a flag to it or
 something.

-
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 08/26] elevate writer count for chown and friends

2007-06-23 Thread Christoph Hellwig
On Fri, Jun 22, 2007 at 01:03:13PM -0700, Dave Hansen wrote:
> 
> 
> chown/chmod,etc... don't call permission in the same way
> that the normal "open for write" calls do.  They still
> write to the filesystem, so bump the write count during
> these operations.

Looks good. 

-
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 07/26] r/o bind mounts: elevate write count for some ioctls

2007-06-23 Thread Christoph Hellwig
On Fri, Jun 22, 2007 at 01:03:12PM -0700, Dave Hansen wrote:
> 
> Some ioctl()s can cause writes to the filesystem.  Take
> these, and make them use mnt_want/drop_write() instead.
> 
> We need to pass the filp one layer deeper in XFS, but
> somebody _just_ pulled it out in February because nobody
> was using it, so I don't feel guilty for adding it back.

Could have been me :)

Would be nicer to just pass down the vfsmount instead of the file
in XFS, but it shouldn't really matter in the end.

-
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 06/26] elevate write count open()'d files

2007-06-23 Thread Christoph Hellwig
On Fri, Jun 22, 2007 at 01:03:10PM -0700, Dave Hansen wrote:
> 
> This is the first really tricky patch in the series.  It
> elevates the writer count on a mount each time a
> non-special file is opened for write.
> 
> This is not completely apparent in the patch because the
> two if() conditions in may_open() above the
> mnt_want_write() call are, combined, equivalent to
> special_file().
> 
> There is also an elevated count around the vfs_create()
> call in open_namei().  The count needs to be kept elevated
> all the way into the may_open() call.  Otherwise, when the
> write is dropped, a ro->rw transisition could occur.  This
> would lead to having rw access on the newly created file,
> while the vfsmount is ro.  That is bad.

Looks good.

> 
> Some filesystems forego the use of normal vfs calls to create
> struct files.  Make sure that these users elevate the mnt writer
> count because they will get __fput(), and we need to make
> sure they're balanced.

With this you mean ipc/mqueue.c?  You should probably mention
that single bastard child of a wannabe-filesystem explicitly :)

-
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 05/26] r/o bind mounts: stub functions

2007-06-23 Thread Christoph Hellwig
On Fri, Jun 22, 2007 at 01:03:09PM -0700, Dave Hansen wrote:
> 
> This patch adds two function mnt_want_write() and
> mnt_drop_write().  These are used like a lock pair around
> and fs operations that might cause a write to the filesystem.
> 
> Before these can become useful, we must first cover each
> place in the VFS where writes are performed with a
> want/drop pair.  When that is complete, we can actually
> introduce code that will safely check the counts before
> allowing r/w<->r/o transitions to occur.

Ok,

> 
> Note that we put the linux/fs.h #include as far down in
> mount.h as possible.  This is to keep the mount.h->fs.h->
> sched.h->mount.h include dependency from biting us.

except for this.  Please just move __mnt_is_readonly out of line
or make it a macro, but we surely don't want to include fs.h
in mount.h
-
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 04/26] filesystem helpers for custom 'struct file's

2007-06-23 Thread Christoph Hellwig
On Fri, Jun 22, 2007 at 01:03:08PM -0700, Dave Hansen wrote:
> 
> Christoph H. says this stands on its own and can go in before the
> rest of the r/o bind mount set.  
> 
> ---
> 
> Some filesystems forego the vfs and may_open() and create their
> own 'struct file's.
> 
> This patch creates a couple of helper functions which can be
> used by these filesystems, and will provide a unified place
> which the r/o bind mount code may patch.
> 
> Also, rename two existing, static-scope init_file() to less
> generic names.

Looks good.  Again this should go in ASAP after 2.6.23 independent of
the more complicated bits because it's a nice cleanup.  In the end
get_empty_filp should go away as an exported symbol.

Note that we've grown more instances of the crap you're fixing here,
e.g. fs/anon_inode.c
-
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 03/26] ext4: remove extra IS_RDONLY() check

2007-06-23 Thread Christoph Hellwig
On Fri, Jun 22, 2007 at 01:03:06PM -0700, Dave Hansen wrote:
> 
> ext4_change_inode_journal_flag() is only called from one
> location: ext4_ioctl(EXT3_IOC_SETFLAGS).  That ioctl
> case already has a IS_RDONLY() call in it so this one
> is superfluous.

Ditto.

-
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 02/26] ext3: remove extra IS_RDONLY() check

2007-06-23 Thread Christoph Hellwig
On Fri, Jun 22, 2007 at 01:03:05PM -0700, Dave Hansen wrote:
> 
> ext3_change_inode_journal_flag() is only called from one
> location: ext3_ioctl(EXT3_IOC_SETFLAGS).  That ioctl
> case already has a IS_RDONLY() call in it so this one
> is superfluous.

Ok, and stands nicely on it's own.

-
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 01/26] document nlink function

2007-06-23 Thread Christoph Hellwig
On Fri, Jun 22, 2007 at 01:03:04PM -0700, Dave Hansen wrote:
> 
> These should have been documented from the beginning.  Fix it.

Ok.  This is a trivial doc fix and should go into the 2.6.22 queue IMHO.

-
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html