Re: [15/17] SLUB: Support virtual fallback via SLAB_VFALLBACK

2007-10-02 Thread Peter Zijlstra
On Mon, 2007-10-01 at 14:30 -0700, Andrew Morton wrote:
 On Mon, 1 Oct 2007 13:55:29 -0700 (PDT)
 Christoph Lameter [EMAIL PROTECTED] wrote:
 
  On Sat, 29 Sep 2007, Andrew Morton wrote:
  
atomic allocations. And with SLUB using higher order pages, atomic !0
order allocations will be very very common.
   
   Oh OK.
   
   I thought we'd already fixed slub so that it didn't do that.  Maybe that
   fix is in -mm but I don't think so.
   
   Trying to do atomic order-1 allocations on behalf of arbitrary slab caches
   just won't fly - this is a significant degradation in kernel reliability,
   as you've very easily demonstrated.
  
  Ummm... SLAB also does order 1 allocations. We have always done them.
  
  See mm/slab.c
  
  /*
   * Do not go above this order unless 0 objects fit into the slab.
   */
  #define BREAK_GFP_ORDER_HI  1
  #define BREAK_GFP_ORDER_LO  0
  static int slab_break_gfp_order = BREAK_GFP_ORDER_LO;
 
 Do slab and slub use the same underlying page size for each slab?
 
 Single data point: the CONFIG_SLAB boxes which I have access to here are
 using order-0 for radix_tree_node, so they won't be failing in the way in
 which Peter's machine is.
 
 I've never ever before seen reports of page allocation failures in the
 radix-tree node allocation code, and that's the bottom line.  This is just
 a drop-dead must-fix show-stopping bug.  We cannot rely upon atomic order-1
 allocations succeeding so we cannot use them for radix-tree nodes.  Nor for
 lots of other things which we have no chance of identifying.
 
 Peter, is this bug -mm only, or is 2.6.23 similarly failing?

I'm mainly using -mm (so you have at least one tester :-), I think the
-mm specific SLUB patch that ups slub_min_order makes the problem -mm
specific, would have to test .23.
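
For readers following the order-0 versus order-1 point, here is a minimal userspace sketch of how a minimum-order setting changes the allocation size. It assumes 4 KiB pages and a made-up 288-byte object; the helper names are hypothetical and this is not the SLUB order calculation itself.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_PAGE_SIZE 4096UL	/* assumption: 4 KiB pages */

/* Smallest order >= min_order whose slab holds at least one object. */
static unsigned int sketch_slab_order(size_t obj_size, unsigned int min_order)
{
	unsigned int order = min_order;

	assert(obj_size > 0);
	while ((SKETCH_PAGE_SIZE << order) < obj_size)
		order++;
	return order;
}

/* Number of whole objects that fit in one slab of the given order. */
static size_t sketch_objs_per_slab(size_t obj_size, unsigned int order)
{
	assert(obj_size > 0);
	return (SKETCH_PAGE_SIZE << order) / obj_size;
}
```

With a minimum order of 0 the hypothetical object fits in a single page; raising the minimum to 1 forces every slab to be two physically contiguous pages, which is the kind of request an atomic allocation cannot always satisfy.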




Re: [PATCH 11/30] IGET: Stop EXT2 from using iget() and read_inode()

2007-10-02 Thread Jan Kara
 diff --git a/fs/ext2/inode.c b/fs/ext2/inode.c
 index 0079b2c..d8fb795 100644
 --- a/fs/ext2/inode.c
 +++ b/fs/ext2/inode.c
...
 +
 + raw_inode = ext2_get_inode(inode->i_sb, ino, &bh);
   if (IS_ERR(raw_inode))
   goto bad_inode;
  Hmm, why don't you use the return value from raw_inode? It can be
either -EIO or -EINVAL if 'ino' was invalid...
  Otherwise the patch looks fine.

Honza
-- 
Jan Kara [EMAIL PROTECTED]
SuSE CR Labs
-
To unsubscribe from this list: send the line unsubscribe linux-fsdevel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 12/30] IGET: Stop EXT3 from using iget() and read_inode()

2007-10-02 Thread Jan Kara
 diff --git a/fs/ext3/ialloc.c b/fs/ext3/ialloc.c
 index e45dbd6..c2f0a0d 100644
 --- a/fs/ext3/ialloc.c
 +++ b/fs/ext3/ialloc.c
 @@ -646,7 +646,7 @@ struct inode *ext3_orphan_get(struct super_block *sb, 
 unsigned long ino)
   unsigned long block_group;
   int bit;
   struct buffer_head *bitmap_bh = NULL;
 - struct inode *inode = NULL;
 + struct inode *inode = ERR_PTR(-EIO);
  
   /* Error cases - e2fsck has already cleaned up for us */
   if (ino > max_ino) {
 @@ -668,9 +668,14 @@ struct inode *ext3_orphan_get(struct super_block *sb, 
 unsigned long ino)
* is a valid orphan (no e2fsck run on fs).  Orphans also include
* inodes that were being truncated, so we can't check i_nlink==0.
*/
 - if (!ext3_test_bit(bit, bitmap_bh->b_data) ||
 - !(inode = iget(sb, ino)) || is_bad_inode(inode) ||
 - NEXT_ORPHAN(inode) > max_ino) {
 + if (ext3_test_bit(bit, bitmap_bh->b_data))
 + goto out;
 +
 + inode = ext3_iget(sb, ino);
 + if (IS_ERR(inode))
 + goto out;
 +
 + if (NEXT_ORPHAN(inode) > max_ino) {
   ext3_warning(sb, __FUNCTION__,
"bad orphan inode %lu!  e2fsck was run?", ino);
   printk(KERN_NOTICE "ext3_test_bit(bit=%d, block=%llu) = %d\n",
  But if you 'goto out' in some branches, we lose the ext3_warning()
which we probably don't want.

 diff --git a/fs/ext3/namei.c b/fs/ext3/namei.c
 index c1fa190..78bfab5 100644
 --- a/fs/ext3/namei.c
 +++ b/fs/ext3/namei.c
 @@ -1046,17 +1046,11 @@ static struct dentry *ext3_lookup(struct inode * dir, 
 struct dentry *dentry, str
   if (!ext3_valid_inum(dir->i_sb, ino)) {
   ext3_error(dir->i_sb, "ext3_lookup",
  "bad inode number: %lu", ino);
 - inode = NULL;
 - } else
 - inode = iget(dir->i_sb, ino);
 -
 - if (!inode)
   return ERR_PTR(-EACCES);
  Wouldn't here -EIO be more appropriate?

 @@ -1085,18 +1079,13 @@ struct dentry *ext3_get_parent(struct dentry *child)
   if (!ext3_valid_inum(child->d_inode->i_sb, ino)) {
   ext3_error(child->d_inode->i_sb, "ext3_get_parent",
  "bad inode number: %lu", ino);
 - inode = NULL;
 - } else
 - inode = iget(child->d_inode->i_sb, ino);
 -
 - if (!inode)
   return ERR_PTR(-EACCES);
  And similarly here...

 @@ -1415,6 +1413,7 @@ static int ext3_fill_super (struct super_block *sb, 
 void *data, int silent)
   int db_count;
   int i;
   int needs_recovery;
 + int ret = -EINVAL;
   __le32 features;
  
   sbi = kzalloc(sizeof(*sbi), GFP_KERNEL);
 @@ -1770,19 +1769,25 @@ static int ext3_fill_super (struct super_block *sb, 
 void *data, int silent)
* so we can safely mount the rest of the filesystem now.
*/
  
 - root = iget(sb, EXT3_ROOT_INO);
 - sb->s_root = d_alloc_root(root);
 - if (!sb->s_root) {
 + root = ext3_iget(sb, EXT3_ROOT_INO);
 + if (IS_ERR(root)) {
   printk(KERN_ERR "EXT3-fs: get root inode failed\n");
 - iput(root);
 + if (PTR_ERR(root) == -ENOMEM)
 + ret = -ENOMEM;
  Why don't we use PTR_ERR() always? Is there some reason not to return
-EIO?

Honza
-- 
Jan Kara [EMAIL PROTECTED]
SuSE CR Labs


Re: [PATCH 12/30] IGET: Stop EXT3 from using iget() and read_inode()

2007-10-02 Thread Jan Kara
  And one more thing...

 diff --git a/fs/ext3/ialloc.c b/fs/ext3/ialloc.c
 index e45dbd6..c2f0a0d 100644
 --- a/fs/ext3/ialloc.c
 +++ b/fs/ext3/ialloc.c
 @@ -646,7 +646,7 @@ struct inode *ext3_orphan_get(struct super_block *sb, 
 unsigned long ino)
   unsigned long block_group;
   int bit;
   struct buffer_head *bitmap_bh = NULL;
 - struct inode *inode = NULL;
 + struct inode *inode = ERR_PTR(-EIO);
  
   /* Error cases - e2fsck has already cleaned up for us */
   if (ino > max_ino) {
 @@ -668,9 +668,14 @@ struct inode *ext3_orphan_get(struct super_block *sb, 
 unsigned long ino)
* is a valid orphan (no e2fsck run on fs).  Orphans also include
* inodes that were being truncated, so we can't check i_nlink==0.
*/
 - if (!ext3_test_bit(bit, bitmap_bh->b_data) ||
 - !(inode = iget(sb, ino)) || is_bad_inode(inode) ||
 - NEXT_ORPHAN(inode) > max_ino) {
 + if (ext3_test_bit(bit, bitmap_bh->b_data))
 + goto out;
  You inverted the test here, didn't you?


Honza
-- 
Jan Kara [EMAIL PROTECTED]
SuSE CR Labs
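
The inversion Jan points out can be checked mechanically. The sketch below models the three checks as booleans (bit set in the bitmap, inode lookup succeeded, next-orphan value in range); all names are hypothetical. Splitting `!a || !b || !c` into early exits has to keep the `!` on the first test, and the form as posted drops it.

```c
#include <assert.h>
#include <stdbool.h>

/* Original combined condition: true means "treat as a bad orphan". */
static bool sketch_orig_bad(bool bit_set, bool iget_ok, bool next_ok)
{
	return !bit_set || !iget_ok || !next_ok;
}

/* Split into early exits with the original polarity preserved. */
static bool sketch_split_bad(bool bit_set, bool iget_ok, bool next_ok)
{
	if (!bit_set)		/* a clear bit is the error case */
		return true;
	if (!iget_ok)
		return true;
	return !next_ok;
}

/* The split as posted in the patch: the bit test lost its negation. */
static bool sketch_posted_bad(bool bit_set, bool iget_ok, bool next_ok)
{
	if (bit_set)
		return true;
	if (!iget_ok)
		return true;
	return !next_ok;
}

/* Exhaustively confirm that the careful split matches the original. */
static void sketch_check_forms(void)
{
	for (int i = 0; i < 8; i++) {
		bool bit = i & 1, iget = i & 2, next = i & 4;

		assert(sketch_orig_bad(bit, iget, next) ==
		       sketch_split_bad(bit, iget, next));
	}
}
```

For a perfectly valid orphan (all three inputs true) the original condition says "fine" while the posted form bails out, which is the regression being reported.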


Re: [PATCH 13/30] IGET: Stop EXT4 from using iget() and read_inode()

2007-10-02 Thread Jan Kara
 diff --git a/fs/ext4/ialloc.c b/fs/ext4/ialloc.c
 index 427f830..8bb63f2 100644
 --- a/fs/ext4/ialloc.c
 +++ b/fs/ext4/ialloc.c
 @@ -682,9 +682,14 @@ struct inode *ext4_orphan_get(struct super_block *sb, 
 unsigned long ino)
* is a valid orphan (no e2fsck run on fs).  Orphans also include
* inodes that were being truncated, so we can't check i_nlink==0.
*/
 - if (!ext4_test_bit(bit, bitmap_bh->b_data) ||
 - !(inode = iget(sb, ino)) || is_bad_inode(inode) ||
 - NEXT_ORPHAN(inode) > max_ino) {
 + if (ext4_test_bit(bit, bitmap_bh->b_data))
 + goto out;
 +
 + inode = ext4_iget(sb, ino);
 + if (IS_ERR(inode))
 + goto out;
 +
 + if (NEXT_ORPHAN(inode) > max_ino) {
   ext4_warning(sb, __FUNCTION__,
"bad orphan inode %lu!  e2fsck was run?", ino);
   printk(KERN_NOTICE "ext4_test_bit(bit=%d, block=%llu) = %d\n",
  Same comments as for ext3 - I think you reversed ext4_test_bit() test
and also the warning won't be issued in some cases which is wrong.

 diff --git a/fs/ext4/namei.c b/fs/ext4/namei.c
 index 5fdb862..af98d07 100644
 --- a/fs/ext4/namei.c
 +++ b/fs/ext4/namei.c
 @@ -1044,17 +1044,11 @@ static struct dentry *ext4_lookup(struct inode * dir, 
 struct dentry *dentry, str
   if (!ext4_valid_inum(dir->i_sb, ino)) {
   ext4_error(dir->i_sb, "ext4_lookup",
  "bad inode number: %lu", ino);
 - inode = NULL;
 - } else
 - inode = iget(dir->i_sb, ino);
 -
 - if (!inode)
   return ERR_PTR(-EACCES);
  -EIO more appropriate here?


Honza
-- 
Jan Kara [EMAIL PROTECTED]
SuSE CR Labs


Re: [PATCH 07/30] IGET: Stop BEFS from using iget() and read_inode()

2007-10-02 Thread David Howells
Zach Brown [EMAIL PROTECTED] wrote:

 /* haha, continuing the fine tradition of terrible names in this  api..  */
 static inline void *PTR_PTR(void *err_ptr) {
   BUG_ON(!IS_ERR(err_ptr) || !err_ptr);
   return err_ptr;
 }

How about ERR_CAST() instead?  Or maybe CAST_ERR()?

struct dentry *blah(...)
{
	struct inode *inode;

	inode = thing(...);
	if (IS_ERR(inode))
		return ERR_CAST(inode);
}

Where ERR_CAST is defined as:

static inline void *ERR_CAST(const void *error)
{
	return (void *) x;
}

David


Re: [PATCH 11/30] IGET: Stop EXT2 from using iget() and read_inode()

2007-10-02 Thread David Howells
Jan Kara [EMAIL PROTECTED] wrote:

   Hmm, why don't you use the return value from raw_inode? It can be
 either -EIO or -EINVAL if 'ino' was invalid...

Good point.  Altered.

David


Re: [PATCH 07/30] IGET: Stop BEFS from using iget() and read_inode()

2007-10-02 Thread Dave Kleikamp
On Tue, 2007-10-02 at 13:32 +0100, David Howells wrote:
 Zach Brown [EMAIL PROTECTED] wrote:
 
  /* haha, continuing the fine tradition of terrible names in this  api..  */
  static inline void *PTR_PTR(void *err_ptr) {
  BUG_ON(!IS_ERR(err_ptr) || !err_ptr);
  return err_ptr;
  }
 
 How about ERR_CAST() instead?  Or maybe CAST_ERR()?

It's a better name than PTR_PTR(). :-)
 
   struct dentry *blah(...)
   {
   struct inode *inode;
 
   inode = thing(...);
   if (IS_ERR(inode))
   return ERR_CAST(inode);
   }
 
 Where ERR_CAST is defined as:
 
   static inline void *ERR_CAST(const void *error)
   {
   return (void *) x;
   }

Of course, the cast is unnecessary, and I'm sure you meant to return
error:

static inline void *ERR_CAST(const void *error)
{
	return error;
}

Shaggy
-- 
David Kleikamp
IBM Linux Technology Center



Re: [PATCH 12/30] IGET: Stop EXT3 from using iget() and read_inode()

2007-10-02 Thread David Howells
Jan Kara [EMAIL PROTECTED] wrote:

   But if you 'goto out' in some branches, we lose the ext3_warning()
 which we probably don't want.

Ugh.  Okay, I need to rework the changes to that function.

  return ERR_PTR(-EACCES);
   Wouldn't here -EIO be more appropriate?

I would have thought so, but -EACCES was what it returned before I touched it.
OTOH, it's calling ext3_error(), so EIO ought to be the right thing to do.
I'll alter it and see if anyone complains.

   Why don't we use PTR_ERR() always? Is there some reason not to return
 -EIO?

I do wonder why it used to return EINVAL rather than EIO.  It's understandable
if the magic number doesn't match, but if it appears to be an otherwise
corrupt filesystem, then yes, I guess it should return EIO.  I'll change it.

David


Re: [PATCH 07/30] IGET: Stop BEFS from using iget() and read_inode()

2007-10-02 Thread David Howells
Dave Kleikamp [EMAIL PROTECTED] wrote:

 Of course, the cast is unnecessary,

The cast is necessary as the argument is a const pointer and the return type
is not.

 and I'm sure you meant to return error:

Oops.  Yes, I changed my mind and renamed the argument to be 'error', but
forgot to change the reference to it.

David
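
Putting the two corrections from this exchange together (return `error`, and keep the cast because the argument is const), a userspace sketch of the helper might look like this. The sketch_* names imitate the kernel helpers and are assumptions; the assert mirrors the BUG_ON in Zach's PTR_PTR() and is not part of David's proposal.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define SKETCH_MAX_ERRNO 4095

struct sketch_inode { int unused; };
struct sketch_dentry { int unused; };

static inline void *sketch_err_ptr(intptr_t error) { return (void *)error; }
static inline intptr_t sketch_ptr_err(const void *ptr) { return (intptr_t)ptr; }
static inline int sketch_is_err(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-SKETCH_MAX_ERRNO;
}

/* const pointer in, non-const pointer out: hence the explicit cast. */
static inline void *sketch_err_cast(const void *error)
{
	assert(sketch_is_err(error));
	return (void *)error;
}

/* Hypothetical callee that either fails or returns a real inode. */
static struct sketch_inode *sketch_thing(int fail)
{
	static struct sketch_inode inode;

	if (fail)
		return sketch_err_ptr(-ENOMEM);
	return &inode;
}

/* An inode-typed error is handed back as a dentry-typed error. */
static struct sketch_dentry *sketch_blah(int fail)
{
	static struct sketch_dentry dentry;
	struct sketch_inode *inode = sketch_thing(fail);

	if (sketch_is_err(inode))
		return sketch_err_cast(inode);
	return &dentry;
}
```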


Re: [PATCH 13/30] IGET: Stop EXT4 from using iget() and read_inode()

2007-10-02 Thread David Howells
Jan Kara [EMAIL PROTECTED] wrote:

   Same comments as for ext3 - I think you reversed ext4_test_bit() test
 and also the warning won't be issued in some cases which is wrong.

Same replies as for ext3 too.  Blech.

You should find altered patches landing in your inbox if you'd care to inspect
them.

Thanks,

David


batching support for transactions

2007-10-02 Thread Ric Wheeler


After several years of helping tune file systems for normal (ATA/S-ATA)
drives, we have been doing some performance work on ext3 & reiserfs on
disk arrays.


One thing that jumps out is that the way we currently batch synchronous 
work loads into transactions does really horrible things to performance 
for storage devices which have really low latency.


For example, on a mid-range Clariion box, we can use a single thread to
write around 750 (10240 byte) files/sec to a single directory in ext3.
That gives us an average time around 1.3ms per file.


With 2 threads writing to the same directory, we instantly drop down to 
234 files/sec.


The culprit seems to be the assumptions in journal_stop() which throw in 
a call to schedule_timeout_uninterruptible(1):


	/*
	 * Implement synchronous transaction batching.  If the handle
	 * was synchronous, don't force a commit immediately.  Let's
	 * yield and let another thread piggyback onto this transaction.
	 * Keep doing that while new threads continue to arrive.
	 * It doesn't cost much - we're about to run a commit and sleep
	 * on IO anyway.  Speeds up many-threaded, many-dir operations
	 * by 30x or more...
	 *
	 * But don't do this if this process was the most recent one to
	 * perform a synchronous write.  We do this to detect the case where a
	 * single process is doing a stream of sync writes.  No point in waiting
	 * for joiners in that case.
	 */
	pid = current->pid;
	if (handle->h_sync && journal->j_last_sync_writer != pid) {
		journal->j_last_sync_writer = pid;
		do {
			old_handle_count = transaction->t_handle_count;
			schedule_timeout_uninterruptible(1);
		} while (old_handle_count != transaction->t_handle_count);
	}
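
For scale, a rough sketch of the arithmetic behind the figures quoted above. schedule_timeout_uninterruptible(1) asks for a sleep of one timer tick, and the tick length depends on the kernel's HZ setting, which the message does not state; the HZ values tried below and the helper names are assumptions for illustration only.

```c
#include <assert.h>

/* Average microseconds per file at a given aggregate rate. */
static unsigned long sketch_us_per_file(unsigned long files_per_sec)
{
	assert(files_per_sec > 0);
	return 1000000UL / files_per_sec;
}

/* Length of n timer ticks in microseconds for an assumed HZ. */
static unsigned long sketch_jiffies_to_us(unsigned long n, unsigned long hz)
{
	assert(hz > 0);
	return n * (1000000UL / hz);
}
```

At 750 files/sec the average is about 1333 us per file and at 234 files/sec about 4273 us, while one tick would be 10000 us, 4000 us or 1000 us for HZ of 100, 250 or 1000. On an array that completes a write in roughly a millisecond, a single tick of sleep can therefore be comparable to or larger than the I/O it is meant to amortize.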


reiserfs and ext4 have similar if not exactly the same logic.

What seems to be needed here is either a static per file system/storage 
device tunable to allow us to change this timeout (maybe with 0 
defaulting back to the old reiserfs trick of simply doing a yield()?) or 
a more dynamic, per device way to keep track of the average time it 
takes to commit a transaction to disk. Based on that rate, we could 
dynamically adjust our logic to account for lower latency devices.


A couple of last thoughts. One, if for some reason you don't have a low 
latency storage array handy and want to test this for yourselves, you 
can test the worst case by using a ram disk.


The test we used was fs_mark with 10240-byte files, writing to one
shared directory while varying the number of threads from 1 up to 40. In
the ext3 case, it takes 8 concurrent threads to catch up to the single
thread writing case.


We are continuing to play with the code and try out some ideas, but I 
wanted to bounce this off the broader list to see if this makes sense...


ric




Re: [PATCH 07/30] IGET: Stop BEFS from using iget() and read_inode()

2007-10-02 Thread Dave Kleikamp
On Tue, 2007-10-02 at 14:24 +0100, David Howells wrote:
 Dave Kleikamp [EMAIL PROTECTED] wrote:
 
  Of course, the cast is unnecessary,
 
 The cast is necessary as the argument is a const pointer and the return type
 is not.

Ah yes.  I stand corrected.

-- 
David Kleikamp
IBM Linux Technology Center



[patch 1/2] getattr - fill the size of pipes

2007-10-02 Thread Jan Engelhardt

[PATCH]: Fill the size of pipes

Instead of reporting 0 in size when stating() a pipe, we give the number of
queued bytes. This might avoid using ioctl(FIONREAD) to get this information.

References and derived from: http://lkml.org/lkml/2007/4/2/138
Cc: Eric Dumazet [EMAIL PROTECTED]
Signed-off-by: Jan Engelhardt [EMAIL PROTECTED]

---
 fs/pipe.c |   49 ++++++++++++++++++++++++++++++++++++-------------
 1 file changed, 36 insertions(+), 13 deletions(-)

Index: linux-2.6.23/fs/pipe.c
===
--- linux-2.6.23.orig/fs/pipe.c
+++ linux-2.6.23/fs/pipe.c
@@ -577,27 +577,35 @@ bad_pipe_w(struct file *filp, const char
return -EBADF;
 }
 
+static unsigned int pipe_size(struct inode *inode)
+{
+   struct pipe_inode_info *pipe;
+   unsigned int count, buf;
+   int nrbufs;
+
+   mutex_lock(&inode->i_mutex);
+   pipe   = inode->i_pipe;
+   count  = 0;
+   buf    = pipe->curbuf;
+   nrbufs = pipe->nrbufs;
+   while (--nrbufs >= 0) {
+           count += pipe->bufs[buf].len;
+           buf = (buf + 1) & (PIPE_BUFFERS - 1);
+   }
+   mutex_unlock(&inode->i_mutex);
+   return count;
+}
+
 static int
 pipe_ioctl(struct inode *pino, struct file *filp,
   unsigned int cmd, unsigned long arg)
 {
struct inode *inode = filp->f_path.dentry->d_inode;
-   struct pipe_inode_info *pipe;
-   int count, buf, nrbufs;
+   unsigned int count;
 
switch (cmd) {
case FIONREAD:
-   mutex_lock(&inode->i_mutex);
-   pipe = inode->i_pipe;
-   count = 0;
-   buf = pipe->curbuf;
-   nrbufs = pipe->nrbufs;
-   while (--nrbufs >= 0) {
-           count += pipe->bufs[buf].len;
-           buf = (buf+1) & (PIPE_BUFFERS-1);
-   }
-   mutex_unlock(&inode->i_mutex);
-
+   count = pipe_size(inode);
return put_user(count, (int __user *)arg);
default:
return -EINVAL;
@@ -915,6 +923,20 @@ static struct dentry_operations pipefs_d
.d_dname= pipefs_dname,
 };
 
+int pipe_getattr(struct vfsmount *mnt, struct dentry *dentry,
+ struct kstat *stat)
+{
+   struct inode *inode = dentry->d_inode;
+
+   generic_fillattr(inode, stat);
+   stat->size = pipe_size(inode);
+   return 0;
+}
+
+const struct inode_operations pipe_inode_operations = {
+   .getattr = pipe_getattr,
+};
+
 static struct inode * get_pipe_inode(void)
 {
struct inode *inode = new_inode(pipe_mnt->mnt_sb);
@@ -928,6 +950,7 @@ static struct inode * get_pipe_inode(voi
goto fail_iput;
inode->i_pipe = pipe;
 
+   inode->i_op = &pipe_inode_operations;
pipe->readers = pipe->writers = 1;
inode->i_fop = &rdwr_pipe_fops;
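
For comparison, the existing way to get this number from userspace is the FIONREAD ioctl that the description mentions. The sketch below queues five bytes in an anonymous pipe and asks how many are unread; the function names are hypothetical. The patch proposes that stat's st_size report the same value.

```c
#include <assert.h>
#include <sys/ioctl.h>
#include <unistd.h>

/* Number of unread bytes queued on fd via FIONREAD, or -1 on failure. */
static int sketch_pipe_queued(int fd)
{
	int count = 0;

	assert(fd >= 0);
	if (ioctl(fd, FIONREAD, &count) < 0)
		return -1;
	return count;
}

/* Queue five bytes in a fresh pipe and report how many are pending. */
static int sketch_demo(void)
{
	int fds[2];
	int queued;

	if (pipe(fds) < 0)
		return -1;
	if (write(fds[1], "hello", 5) != 5) {
		close(fds[0]);
		close(fds[1]);
		return -1;
	}
	queued = sketch_pipe_queued(fds[0]);
	close(fds[0]);
	close(fds[1]);
	return queued;
}
```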
 



[patch 2/2] getattr - fill the size of FIFOs

2007-10-02 Thread Jan Engelhardt

[PATCH]: Fill the size of FIFOs

Instead of reporting 0 in size when stating() a pipe, we give the number of
queued bytes. This might avoid using ioctl(FIONREAD) to get this information.

References: http://lkml.org/lkml/2007/4/2/138
Cc: Eric Dumazet [EMAIL PROTECTED]
Signed-off-by: Jan Engelhardt [EMAIL PROTECTED]

---
 fs/pipe.c                 |   20 +++-----------------
 fs/stat.c                 |   14 ++++++++++----
 include/linux/pipe_fs_i.h |    2 ++
 3 files changed, 15 insertions(+), 21 deletions(-)

Index: linux-2.6.23/fs/pipe.c
===
--- linux-2.6.23.orig/fs/pipe.c
+++ linux-2.6.23/fs/pipe.c
@@ -577,14 +577,15 @@ bad_pipe_w(struct file *filp, const char
return -EBADF;
 }
 
-static unsigned int pipe_size(struct inode *inode)
+unsigned int pipe_size(struct inode *inode)
 {
struct pipe_inode_info *pipe;
unsigned int count, buf;
int nrbufs;
 
+   if ((pipe = inode->i_pipe) == NULL)
+           return 0;
mutex_lock(&inode->i_mutex);
-   pipe   = inode->i_pipe;
count  = 0;
buf    = pipe->curbuf;
nrbufs = pipe->nrbufs;
@@ -923,20 +924,6 @@ static struct dentry_operations pipefs_d
.d_dname= pipefs_dname,
 };
 
-int pipe_getattr(struct vfsmount *mnt, struct dentry *dentry,
- struct kstat *stat)
-{
-   struct inode *inode = dentry->d_inode;
-
-   generic_fillattr(inode, stat);
-   stat->size = pipe_size(inode);
-   return 0;
-}
-
-const struct inode_operations pipe_inode_operations = {
-   .getattr = pipe_getattr,
-};
-
 static struct inode * get_pipe_inode(void)
 {
struct inode *inode = new_inode(pipe_mnt->mnt_sb);
@@ -950,7 +937,6 @@ static struct inode * get_pipe_inode(voi
goto fail_iput;
inode->i_pipe = pipe;
 
-   inode->i_op = &pipe_inode_operations;
pipe->readers = pipe->writers = 1;
inode->i_fop = &rdwr_pipe_fops;
 
Index: linux-2.6.23/fs/stat.c
===
--- linux-2.6.23.orig/fs/stat.c
+++ linux-2.6.23/fs/stat.c
@@ -11,6 +11,7 @@
 #include <linux/highuid.h>
 #include <linux/fs.h>
 #include <linux/namei.h>
+#include <linux/pipe_fs_i.h>
 #include <linux/security.h>
 #include <linux/syscalls.h>
 #include <linux/pagemap.h>
@@ -46,11 +47,16 @@ int vfs_getattr(struct vfsmount *mnt, st
if (retval)
return retval;
 
-   if (inode->i_op->getattr)
-           return inode->i_op->getattr(mnt, dentry, stat);
+   if (inode->i_op->getattr) {
+           retval = inode->i_op->getattr(mnt, dentry, stat);
+   } else {
+           generic_fillattr(inode, stat);
+           retval = 0;
+   }
 
-   generic_fillattr(inode, stat);
-   return 0;
+   if (retval == 0 && S_ISFIFO(inode->i_mode))
+           stat->size = pipe_size(inode);
+   return retval;
 }
 
 EXPORT_SYMBOL(vfs_getattr);
Index: linux-2.6.23/include/linux/pipe_fs_i.h
===
--- linux-2.6.23.orig/include/linux/pipe_fs_i.h
+++ linux-2.6.23/include/linux/pipe_fs_i.h
@@ -134,6 +134,8 @@ struct pipe_buf_operations {
memory allocation, whereas PIPE_BUF makes atomicity guarantees.  */
 #define PIPE_SIZE  PAGE_SIZE
 
+extern unsigned int pipe_size(struct inode *);
+
 /* Drop the inode semaphore and wait for a pipe event, atomically */
 void pipe_wait(struct pipe_inode_info *pipe);
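
The byte count that pipe_size() computes is a walk over a power-of-two ring of buffers, plus the NULL check this patch adds for inodes without pipe info. A userspace sketch under those assumptions follows; the struct layout and names are simplified stand-ins, the ring size of 16 is assumed, and locking is left out.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_PIPE_BUFFERS 16	/* must be a power of two for the mask */

struct sketch_buf { unsigned int len; };

struct sketch_pipe {
	unsigned int curbuf;	/* index of the first occupied slot */
	unsigned int nrbufs;	/* number of occupied slots */
	struct sketch_buf bufs[SKETCH_PIPE_BUFFERS];
};

/* Sum the lengths of all occupied slots, wrapping around the ring. */
static unsigned int sketch_pipe_size(const struct sketch_pipe *pipe)
{
	unsigned int count = 0, buf;
	int nrbufs;

	if (pipe == NULL)	/* no pipe info attached: nothing queued */
		return 0;
	buf = pipe->curbuf;
	nrbufs = (int)pipe->nrbufs;
	assert(nrbufs <= SKETCH_PIPE_BUFFERS);
	while (--nrbufs >= 0) {
		count += pipe->bufs[buf].len;
		buf = (buf + 1) & (SKETCH_PIPE_BUFFERS - 1);
	}
	return count;
}
```

The mask works as a cheap modulo only because the ring size is a power of two: index 15 plus one wraps to slot 0.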
 


[PATCH] isofs: add +w bit for non-RR discs

2007-10-02 Thread Jan Engelhardt

Add the %S_IWUSR bit for files on ISO-9660 filesystems without RockRidge
extensions. This allows one to modify the files right after copying,
without having to do an extra recursive chmod if `cp -p` or
`rsync -p` is used.

References: http://lkml.org/lkml/2007/4/1/164
Signed-off-by: Jan Engelhardt [EMAIL PROTECTED]

---
 fs/isofs/inode.c |   13 ++++++-------
 1 file changed, 6 insertions(+), 7 deletions(-)

Index: linux-2.6.23/fs/isofs/inode.c
===
--- linux-2.6.23.orig/fs/isofs/inode.c
+++ linux-2.6.23/fs/isofs/inode.c
@@ -360,12 +360,11 @@ static int parse_options(char *options, 
popt->check = 'u';  /* unset */
popt->nocompress = 0;
popt->blocksize = 1024;
-   popt->mode = S_IRUGO | S_IXUGO; /*
-* r-x for all.  The disc could
-* be shared with DOS machines so
-* virtually anything could be
-* a valid executable.
-*/
+   /*
+* +x bit for all. The disc could be shared with DOS machine, so
+* virtually anything could be a valid executable.
+*/
+   popt->mode = S_IRUGO | S_IWUSR | S_IXUGO;
popt->gid = 0;
popt->uid = 0;
popt->iocharset = NULL;
@@ -1235,7 +1234,7 @@ static void isofs_read_inode(struct inod
ei->i_file_format = isofs_file_normal;
 
if (de->flags[-high_sierra] & 2) {
-   inode->i_mode = S_IRUGO | S_IXUGO | S_IFDIR;
+   inode->i_mode = S_IRUGO | S_IWUSR | S_IXUGO | S_IFDIR;
inode->i_nlink = 1; /*
 * Set to 1.  We know there are 2, but
 * the find utility tries to optimize


Re: [15/17] SLUB: Support virtual fallback via SLAB_VFALLBACK

2007-10-02 Thread Nick Piggin
On Tuesday 02 October 2007 07:01, Christoph Lameter wrote:
 On Sat, 29 Sep 2007, Peter Zijlstra wrote:
  On Fri, 2007-09-28 at 11:20 -0700, Christoph Lameter wrote:
   Really? That means we can no longer even allocate stacks for forking.
 
  I think I'm running with 4k stacks...

 4k stacks will never fly on an SGI x86_64 NUMA configuration given the
 additional data that may be kept on the stack. We are currently
 considering to go from 8k to 16k (or even 32k) to make things work. So
 having the ability to put the stacks in vmalloc space may be something to
 look at.

i386 and x86-64 have used 8K stacks for years and they have never
really been much of a problem before.

They only started failing when contiguous memory started getting used up
by other things, _even with_ those anti-frag patches in there.

Bottom line is that you do not use higher order allocations when you do
not need them.