Re: [PATCH] JBD slab cleanups

2007-09-19 Thread Mingming Cao
On Wed, 2007-09-19 at 13:48 -0600, Andreas Dilger wrote:
> On Sep 19, 2007  12:15 -0700, Mingming Cao wrote:
> > @@ -96,8 +96,7 @@ static int start_this_handle(journal_t *
> >  
> >  alloc_transaction:
> > if (!journal->j_running_transaction) {
> > -   new_transaction = kmalloc(sizeof(*new_transaction),
> > -   GFP_NOFS|__GFP_NOFAIL);
> > +   new_transaction = kmalloc(sizeof(*new_transaction), GFP_NOFS);
> 
> This should probably be a __GFP_NOFAIL if we are trying to start a new
> handle in truncate, as there is no way to propagate an error to the caller.
> 

Thanks, updated version.

Here is the patch to clean up __GFP_NOFAIL flag in jbd/jbd2, most cases
they are not needed.

Signed-off-by: Mingming Cao <[EMAIL PROTECTED]>
---
 fs/jbd/journal.c  |2 +-
 fs/jbd2/journal.c |2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

Index: linux-2.6.23-rc6/fs/jbd/journal.c
===
--- linux-2.6.23-rc6.orig/fs/jbd/journal.c  2007-09-19 11:47:58.0 
-0700
+++ linux-2.6.23-rc6/fs/jbd/journal.c   2007-09-19 14:23:45.0 -0700
@@ -653,7 +653,7 @@ static journal_t * journal_init_common (
journal_t *journal;
int err;
 
-   journal = kmalloc(sizeof(*journal), GFP_KERNEL|__GFP_NOFAIL);
+   journal = kmalloc(sizeof(*journal), GFP_KERNEL);
if (!journal)
goto fail;
memset(journal, 0, sizeof(*journal));
Index: linux-2.6.23-rc6/fs/jbd2/journal.c
===
--- linux-2.6.23-rc6.orig/fs/jbd2/journal.c 2007-09-19 11:48:14.0 
-0700
+++ linux-2.6.23-rc6/fs/jbd2/journal.c  2007-09-19 14:23:45.0 -0700
@@ -654,7 +654,7 @@ static journal_t * journal_init_common (
journal_t *journal;
int err;
 
-   journal = kmalloc(sizeof(*journal), GFP_KERNEL|__GFP_NOFAIL);
+   journal = kmalloc(sizeof(*journal), GFP_KERNEL);
if (!journal)
goto fail;
memset(journal, 0, sizeof(*journal));


-
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] JBD slab cleanups

2007-09-19 Thread Mingming Cao
On Wed, 2007-09-19 at 19:28 +, Dave Kleikamp wrote:
> On Wed, 2007-09-19 at 14:26 -0500, Dave Kleikamp wrote:
> > On Wed, 2007-09-19 at 12:15 -0700, Mingming Cao wrote:
> > 
> > > Here is the patch to clean up __GFP_NOFAIL flag in jbd/jbd2. In all
> > > cases except one handles memory allocation failure so I get rid of those
> > > GFP_NOFAIL flags.
> > > 
> > > Also, shouldn't we use GFP_KERNEL instead of GFP_NOFS flag for kmalloc
> > > in jbd/jbd2? I will send a separate patch to cleanup that.
> > 
> > No.  GFP_NOFS avoids deadlock.  It prevents the allocation from making
> > recursive calls back into the file system that could end up blocking on
> > jbd code.
> 
> Oh, I see your patch now.  You mean use GFP_NOFS instead of
> GFP_KERNEL.  :-)  OK then.
> 

oops, I did mean what you say here.:-)

> > Shaggy

-
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] JBD slab cleanups

2007-09-19 Thread Andreas Dilger
On Sep 19, 2007  12:15 -0700, Mingming Cao wrote:
> @@ -96,8 +96,7 @@ static int start_this_handle(journal_t *
>  
>  alloc_transaction:
>   if (!journal->j_running_transaction) {
> - new_transaction = kmalloc(sizeof(*new_transaction),
> - GFP_NOFS|__GFP_NOFAIL);
> + new_transaction = kmalloc(sizeof(*new_transaction), GFP_NOFS);

This should probably be a __GFP_NOFAIL if we are trying to start a new
handle in truncate, as there is no way to propagate an error to the caller.

Cheers, Andreas
--
Andreas Dilger
Principal Software Engineer
Cluster File Systems, Inc.

-
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] JBD slab cleanups

2007-09-19 Thread Dave Kleikamp
On Wed, 2007-09-19 at 14:26 -0500, Dave Kleikamp wrote:
> On Wed, 2007-09-19 at 12:15 -0700, Mingming Cao wrote:
> 
> > Here is the patch to clean up __GFP_NOFAIL flag in jbd/jbd2. In all
> > cases except one handles memory allocation failure so I get rid of those
> > GFP_NOFAIL flags.
> > 
> > Also, shouldn't we use GFP_KERNEL instead of GFP_NOFS flag for kmalloc
> > in jbd/jbd2? I will send a separate patch to cleanup that.
> 
> No.  GFP_NOFS avoids deadlock.  It prevents the allocation from making
> recursive calls back into the file system that could end up blocking on
> jbd code.

Oh, I see your patch now.  You mean use GFP_NOFS instead of
GFP_KERNEL.  :-)  OK then.

> Shaggy
-- 
David Kleikamp
IBM Linux Technology Center

-
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] JBD slab cleanups

2007-09-19 Thread Dave Kleikamp
On Wed, 2007-09-19 at 12:15 -0700, Mingming Cao wrote:

> Here is the patch to clean up __GFP_NOFAIL flag in jbd/jbd2. In all
> cases except one handles memory allocation failure so I get rid of those
> GFP_NOFAIL flags.
> 
> Also, shouldn't we use GFP_KERNEL instead of GFP_NOFS flag for kmalloc
> in jbd/jbd2? I will send a separate patch to cleanup that.

No.  GFP_NOFS avoids deadlock.  It prevents the allocation from making
recursive calls back into the file system that could end up blocking on
jbd code.

Shaggy
-- 
David Kleikamp
IBM Linux Technology Center

-
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] JBD slab cleanups

2007-09-19 Thread Mingming Cao
On Tue, 2007-09-18 at 19:19 -0700, Andrew Morton wrote:
> On Tue, 18 Sep 2007 18:00:01 -0700 Mingming Cao <[EMAIL PROTECTED]> wrote:
> 
> > JBD: Replace slab allocations with page cache allocations
> > 
> > JBD allocate memory for committed_data and frozen_data from slab. However
> > JBD should not pass slab pages down to the block layer. Use page allocator 
> > pages instead. This will also prepare JBD for the large blocksize patchset.
> > 
> > 
> > Also this patch cleans up jbd_kmalloc and replace it with kmalloc directly
> 
> __GFP_NOFAIL should only be used when we have no way of recovering
> from failure.  The allocation in journal_init_common() (at least)
> _can_ recover and hence really shouldn't be using __GFP_NOFAIL.
> 
> (Actually, nothing in the kernel should be using __GFP_NOFAIL.  It is 
> there as a marker which says "we really shouldn't be doing this but
> we don't know how to fix it").
> 
> So sometime it'd be good if you could review all the __GFP_NOFAILs in
> there and see if we can remove some, thanks.

Here is the patch to clean up __GFP_NOFAIL flag in jbd/jbd2. In all
cases except one handles memory allocation failure so I get rid of those
GFP_NOFAIL flags.

Also, shouldn't we use GFP_KERNEL instead of GFP_NOFS flag for kmalloc
in jbd/jbd2? I will send a separate patch to cleanup that.

Signed-off-by: Mingming Cao <[EMAIL PROTECTED]>
---
 fs/jbd/journal.c  |2 +-
 fs/jbd/transaction.c  |3 +--
 fs/jbd2/journal.c |2 +-
 fs/jbd2/transaction.c |3 +--
 4 files changed, 4 insertions(+), 6 deletions(-)

Index: linux-2.6.23-rc6/fs/jbd/journal.c
===
--- linux-2.6.23-rc6.orig/fs/jbd/journal.c  2007-09-19 11:47:58.0 
-0700
+++ linux-2.6.23-rc6/fs/jbd/journal.c   2007-09-19 11:48:40.0 -0700
@@ -653,7 +653,7 @@ static journal_t * journal_init_common (
journal_t *journal;
int err;
 
-   journal = kmalloc(sizeof(*journal), GFP_KERNEL|__GFP_NOFAIL);
+   journal = kmalloc(sizeof(*journal), GFP_KERNEL);
if (!journal)
goto fail;
memset(journal, 0, sizeof(*journal));
Index: linux-2.6.23-rc6/fs/jbd/transaction.c
===
--- linux-2.6.23-rc6.orig/fs/jbd/transaction.c  2007-09-19 11:48:05.0 
-0700
+++ linux-2.6.23-rc6/fs/jbd/transaction.c   2007-09-19 11:49:10.0 
-0700
@@ -96,8 +96,7 @@ static int start_this_handle(journal_t *
 
 alloc_transaction:
if (!journal->j_running_transaction) {
-   new_transaction = kmalloc(sizeof(*new_transaction),
-   GFP_NOFS|__GFP_NOFAIL);
+   new_transaction = kmalloc(sizeof(*new_transaction), GFP_NOFS);
if (!new_transaction) {
ret = -ENOMEM;
goto out;
Index: linux-2.6.23-rc6/fs/jbd2/journal.c
===
--- linux-2.6.23-rc6.orig/fs/jbd2/journal.c 2007-09-19 11:48:14.0 
-0700
+++ linux-2.6.23-rc6/fs/jbd2/journal.c  2007-09-19 11:49:46.0 -0700
@@ -654,7 +654,7 @@ static journal_t * journal_init_common (
journal_t *journal;
int err;
 
-   journal = kmalloc(sizeof(*journal), GFP_KERNEL|__GFP_NOFAIL);
+   journal = kmalloc(sizeof(*journal), GFP_KERNEL);
if (!journal)
goto fail;
memset(journal, 0, sizeof(*journal));
Index: linux-2.6.23-rc6/fs/jbd2/transaction.c
===
--- linux-2.6.23-rc6.orig/fs/jbd2/transaction.c 2007-09-19 11:48:08.0 
-0700
+++ linux-2.6.23-rc6/fs/jbd2/transaction.c  2007-09-19 11:50:12.0 
-0700
@@ -96,8 +96,7 @@ static int start_this_handle(journal_t *
 
 alloc_transaction:
if (!journal->j_running_transaction) {
-   new_transaction = kmalloc(sizeof(*new_transaction),
-   GFP_NOFS|__GFP_NOFAIL);
+   new_transaction = kmalloc(sizeof(*new_transaction), GFP_NOFS);
if (!new_transaction) {
ret = -ENOMEM;
goto out;




-
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] JBD slab cleanups

2007-09-18 Thread Andrew Morton
On Tue, 18 Sep 2007 18:00:01 -0700 Mingming Cao <[EMAIL PROTECTED]> wrote:

> JBD: Replace slab allocations with page cache allocations
> 
> JBD allocate memory for committed_data and frozen_data from slab. However
> JBD should not pass slab pages down to the block layer. Use page allocator 
> pages instead. This will also prepare JBD for the large blocksize patchset.
> 
> 
> Also this patch cleans up jbd_kmalloc and replace it with kmalloc directly

__GFP_NOFAIL should only be used when we have no way of recovering
from failure.  The allocation in journal_init_common() (at least)
_can_ recover and hence really shouldn't be using __GFP_NOFAIL.

(Actually, nothing in the kernel should be using __GFP_NOFAIL.  It is 
there as a marker which says "we really shouldn't be doing this but
we don't know how to fix it").

So sometime it'd be good if you could review all the __GFP_NOFAILs in
there and see if we can remove some, thanks.
-
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] JBD slab cleanups

2007-09-18 Thread Mingming Cao
On Tue, 2007-09-18 at 13:04 -0500, Dave Kleikamp wrote:
> On Tue, 2007-09-18 at 09:35 -0700, Mingming Cao wrote:
> > On Tue, 2007-09-18 at 10:04 +0100, Christoph Hellwig wrote:
> > > On Mon, Sep 17, 2007 at 03:57:31PM -0700, Mingming Cao wrote:
> > > > Here is the incremental small cleanup patch. 
> > > > 
> > > > Remove kamlloc usages in jbd/jbd2 and consistently use 
> > > > jbd_kmalloc/jbd2_malloc.
> > > 
> > > Shouldn't we kill jbd_kmalloc instead?
> > > 
> > 
> > It seems useful to me to keep jbd_kmalloc/jbd_free. They are central
> > places to handle memory (de)allocation( > in the future if we need to change memory allocation in jbd(e.g. not
> > using kmalloc or using different flag), we don't need to touch every
> > place in the jbd code calling jbd_kmalloc.
> 
> I disagree.  Why would jbd need to globally change the way it allocates
> memory?  It currently uses kmalloc (and jbd_kmalloc) for allocating a
> variety of structures.  Having to change one particular instance won't
> necessarily mean we want to change all of them.  Adding unnecessary
> wrappers only obfuscates the code making it harder to understand.  You
> wouldn't want every subsystem to have it's own *_kmalloc() that took
> different arguments.  Besides, there aren't that many calls to kmalloc
> and kfree in the jbd code, so there wouldn't be much pain in changing
> GFP flags or whatever, if it ever needed to be done.
> 
> Shaggy

Okay, Points taken, Here is the updated patch to get rid of slab
management and jbd_kmalloc from jbd totally. This patch is intend to
replace the patch in mm tree, Andrew, could you pick up this one
instead?

Thanks,

Mingming


jbd/jbd2: JBD memory allocation cleanups

From: Christoph Lameter <[EMAIL PROTECTED]>

JBD: Replace slab allocations with page cache allocations

JBD allocate memory for committed_data and frozen_data from slab. However
JBD should not pass slab pages down to the block layer. Use page allocator 
pages instead. This will also prepare JBD for the large blocksize patchset.


Also this patch cleans up jbd_kmalloc and replace it with kmalloc directly

Signed-off-by: Christoph Lameter <[EMAIL PROTECTED]>
Signed-off-by: Mingming Cao <[EMAIL PROTECTED]>

---
 fs/jbd/commit.c   |6 +--
 fs/jbd/journal.c  |   99 ++
 fs/jbd/transaction.c  |   12 +++---
 fs/jbd2/commit.c  |6 +--
 fs/jbd2/journal.c |   99 ++
 fs/jbd2/transaction.c |   18 -
 include/linux/jbd.h   |   18 +
 include/linux/jbd2.h  |   21 +-
 8 files changed, 52 insertions(+), 227 deletions(-)

Index: linux-2.6.23-rc6/fs/jbd/journal.c
===
--- linux-2.6.23-rc6.orig/fs/jbd/journal.c  2007-09-18 17:19:01.0 
-0700
+++ linux-2.6.23-rc6/fs/jbd/journal.c   2007-09-18 17:51:21.0 -0700
@@ -83,7 +83,6 @@ EXPORT_SYMBOL(journal_force_commit);
 
 static int journal_convert_superblock_v1(journal_t *, journal_superblock_t *);
 static void __journal_abort_soft (journal_t *journal, int errno);
-static int journal_create_jbd_slab(size_t slab_size);
 
 /*
  * Helper function used to manage commit timeouts
@@ -334,10 +333,10 @@ repeat:
char *tmp;
 
jbd_unlock_bh_state(bh_in);
-   tmp = jbd_slab_alloc(bh_in->b_size, GFP_NOFS);
+   tmp = jbd_alloc(bh_in->b_size, GFP_NOFS);
jbd_lock_bh_state(bh_in);
if (jh_in->b_frozen_data) {
-   jbd_slab_free(tmp, bh_in->b_size);
+   jbd_free(tmp, bh_in->b_size);
goto repeat;
}
 
@@ -654,7 +653,7 @@ static journal_t * journal_init_common (
journal_t *journal;
int err;
 
-   journal = jbd_kmalloc(sizeof(*journal), GFP_KERNEL);
+   journal = kmalloc(sizeof(*journal), GFP_KERNEL|__GFP_NOFAIL);
if (!journal)
goto fail;
memset(journal, 0, sizeof(*journal));
@@ -1095,13 +1094,6 @@ int journal_load(journal_t *journal)
}
}
 
-   /*
-* Create a slab for this blocksize
-*/
-   err = journal_create_jbd_slab(be32_to_cpu(sb->s_blocksize));
-   if (err)
-   return err;
-
/* Let the recovery code check whether it needs to recover any
 * data from the journal. */
if (journal_recover(journal))
@@ -1615,86 +1607,6 @@ int journal_blocks_per_page(struct inode
 }
 
 /*
- * Simple support for retrying memory allocations.  Introduced to help to
- * debug different VM deadlock avoidance strategies.
- */
-void * __jbd_kmalloc (const char *where, size_t size, gfp_t flags, int retry)
-{
-   return kmalloc(size, flags | (retry ? __GFP_NOFAIL : 0));
-}
-
-/*
- * jbd slab management: create 1k, 2k, 4k, 8k slabs as needed
- * and allocate frozen and commit buffers from these slabs.
- *
- * Reason for doing this is to avo

Re: [PATCH] JBD slab cleanups

2007-09-18 Thread Dave Kleikamp
On Tue, 2007-09-18 at 09:35 -0700, Mingming Cao wrote:
> On Tue, 2007-09-18 at 10:04 +0100, Christoph Hellwig wrote:
> > On Mon, Sep 17, 2007 at 03:57:31PM -0700, Mingming Cao wrote:
> > > Here is the incremental small cleanup patch. 
> > > 
> > > Remove kamlloc usages in jbd/jbd2 and consistently use 
> > > jbd_kmalloc/jbd2_malloc.
> > 
> > Shouldn't we kill jbd_kmalloc instead?
> > 
> 
> It seems useful to me to keep jbd_kmalloc/jbd_free. They are central
> places to handle memory (de)allocation( in the future if we need to change memory allocation in jbd(e.g. not
> using kmalloc or using different flag), we don't need to touch every
> place in the jbd code calling jbd_kmalloc.

I disagree.  Why would jbd need to globally change the way it allocates
memory?  It currently uses kmalloc (and jbd_kmalloc) for allocating a
variety of structures.  Having to change one particular instance won't
necessarily mean we want to change all of them.  Adding unnecessary
wrappers only obfuscates the code making it harder to understand.  You
wouldn't want every subsystem to have it's own *_kmalloc() that took
different arguments.  Besides, there aren't that many calls to kmalloc
and kfree in the jbd code, so there wouldn't be much pain in changing
GFP flags or whatever, if it ever needed to be done.

Shaggy
-- 
David Kleikamp
IBM Linux Technology Center

-
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] JBD slab cleanups

2007-09-18 Thread Mingming Cao
On Tue, 2007-09-18 at 10:04 +0100, Christoph Hellwig wrote:
> On Mon, Sep 17, 2007 at 03:57:31PM -0700, Mingming Cao wrote:
> > Here is the incremental small cleanup patch. 
> > 
> > Remove kamlloc usages in jbd/jbd2 and consistently use 
> > jbd_kmalloc/jbd2_malloc.
> 
> Shouldn't we kill jbd_kmalloc instead?
> 

It seems useful to me to keep jbd_kmalloc/jbd_free. They are central
places to handle memory (de)allocation(http://vger.kernel.org/majordomo-info.html


Re: [PATCH] JBD slab cleanups

2007-09-18 Thread Christoph Hellwig
On Mon, Sep 17, 2007 at 03:57:31PM -0700, Mingming Cao wrote:
> Here is the incremental small cleanup patch. 
> 
> Remove kamlloc usages in jbd/jbd2 and consistently use 
> jbd_kmalloc/jbd2_malloc.

Shouldn't we kill jbd_kmalloc instead?

-
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] JBD slab cleanups

2007-09-17 Thread Mingming Cao
On Mon, 2007-09-17 at 15:01 -0700, Badari Pulavarty wrote:
> On Mon, 2007-09-17 at 12:29 -0700, Mingming Cao wrote:
> > On Fri, 2007-09-14 at 11:53 -0700, Mingming Cao wrote:
> > > jbd/jbd2: Replace slab allocations with page cache allocations
> > > 
> > > From: Christoph Lameter <[EMAIL PROTECTED]>
> > > 
> > > JBD should not pass slab pages down to the block layer.
> > > Use page allocator pages instead. This will also prepare
> > > JBD for the large blocksize patchset.
> > > 
> > 
> > Currently memory allocation for committed_data(and frozen_buffer) for
> > bufferhead is done through jbd slab management, as Christoph Hellwig
> > pointed out that this is broken as jbd should not pass slab pages down
> > to IO layer. and suggested to use get_free_pages() directly.
> > 
> > The problem with this patch, as Andreas Dilger pointed today in ext4
> > interlock call, for 1k,2k block size ext2/3/4, get_free_pages() waste
> > 1/3-1/2 page space. 
> > 
> > What was the originally intention to set up slabs for committed_data(and
> > frozen_buffer) in JBD? Why not using kmalloc?
> > 
> > Mingming
> 
> Looks good. Small suggestion is to get rid of all kmalloc() usages and
> consistently use jbd_kmalloc() or jbd2_kmalloc().
> 
> Thanks,
> Badari
> 

Here is the incremental small cleanup patch. 

Remove kamlloc usages in jbd/jbd2 and consistently use jbd_kmalloc/jbd2_malloc.


Signed-off-by: Mingming Cao <[EMAIL PROTECTED]>
---
 fs/jbd/journal.c  |8 +---
 fs/jbd/revoke.c   |   12 ++--
 fs/jbd2/journal.c |8 +---
 fs/jbd2/revoke.c  |   12 ++--
 4 files changed, 22 insertions(+), 18 deletions(-)

Index: linux-2.6.23-rc6/fs/jbd/journal.c
===
--- linux-2.6.23-rc6.orig/fs/jbd/journal.c  2007-09-17 14:32:16.0 
-0700
+++ linux-2.6.23-rc6/fs/jbd/journal.c   2007-09-17 14:33:59.0 -0700
@@ -723,7 +723,8 @@ journal_t * journal_init_dev(struct bloc
journal->j_blocksize = blocksize;
n = journal->j_blocksize / sizeof(journal_block_tag_t);
journal->j_wbufsize = n;
-   journal->j_wbuf = kmalloc(n * sizeof(struct buffer_head*), GFP_KERNEL);
+   journal->j_wbuf = jbd_kmalloc(n * sizeof(struct buffer_head*),
+   GFP_KERNEL);
if (!journal->j_wbuf) {
printk(KERN_ERR "%s: Cant allocate bhs for commit thread\n",
__FUNCTION__);
@@ -777,7 +778,8 @@ journal_t * journal_init_inode (struct i
/* journal descriptor can store up to n blocks -bzzz */
n = journal->j_blocksize / sizeof(journal_block_tag_t);
journal->j_wbufsize = n;
-   journal->j_wbuf = kmalloc(n * sizeof(struct buffer_head*), GFP_KERNEL);
+   journal->j_wbuf = jbd_kmalloc(n * sizeof(struct buffer_head*),
+   GFP_KERNEL);
if (!journal->j_wbuf) {
printk(KERN_ERR "%s: Cant allocate bhs for commit thread\n",
__FUNCTION__);
@@ -1157,7 +1159,7 @@ void journal_destroy(journal_t *journal)
iput(journal->j_inode);
if (journal->j_revoke)
journal_destroy_revoke(journal);
-   kfree(journal->j_wbuf);
+   jbd_kfree(journal->j_wbuf);
jbd_kfree(journal);
 }
 
Index: linux-2.6.23-rc6/fs/jbd/revoke.c
===
--- linux-2.6.23-rc6.orig/fs/jbd/revoke.c   2007-09-17 14:32:22.0 
-0700
+++ linux-2.6.23-rc6/fs/jbd/revoke.c2007-09-17 14:35:13.0 -0700
@@ -219,7 +219,7 @@ int journal_init_revoke(journal_t *journ
journal->j_revoke->hash_shift = shift;
 
journal->j_revoke->hash_table =
-   kmalloc(hash_size * sizeof(struct list_head), GFP_KERNEL);
+   jbd_kmalloc(hash_size * sizeof(struct list_head), GFP_KERNEL);
if (!journal->j_revoke->hash_table) {
kmem_cache_free(revoke_table_cache, journal->j_revoke_table[0]);
journal->j_revoke = NULL;
@@ -231,7 +231,7 @@ int journal_init_revoke(journal_t *journ
 
journal->j_revoke_table[1] = kmem_cache_alloc(revoke_table_cache, 
GFP_KERNEL);
if (!journal->j_revoke_table[1]) {
-   kfree(journal->j_revoke_table[0]->hash_table);
+   jbd_kfree(journal->j_revoke_table[0]->hash_table);
kmem_cache_free(revoke_table_cache, journal->j_revoke_table[0]);
return -ENOMEM;
}
@@ -246,9 +246,9 @@ int journal_init_revoke(journal_t *journ
journal->j_revoke->hash_shift = shift;
 
journal->j_revoke->hash_table =
-   kmalloc(hash_size * sizeof(struct list_head), GFP_KERNEL);
+   jbd_kmalloc(hash_size * sizeof(struct list_head), GFP_KERNEL);
if (!journal->j_revoke->hash_table) {
-   kfree(journal->j_revoke_table[0]->hash_table);
+   jbd_kfree(journal->j_revoke_table[0]->hash_table);
   

Re: [PATCH] JBD slab cleanups

2007-09-17 Thread Badari Pulavarty
On Mon, 2007-09-17 at 12:29 -0700, Mingming Cao wrote:
> On Fri, 2007-09-14 at 11:53 -0700, Mingming Cao wrote:
> > jbd/jbd2: Replace slab allocations with page cache allocations
> > 
> > From: Christoph Lameter <[EMAIL PROTECTED]>
> > 
> > JBD should not pass slab pages down to the block layer.
> > Use page allocator pages instead. This will also prepare
> > JBD for the large blocksize patchset.
> > 
> 
> Currently memory allocation for committed_data(and frozen_buffer) for
> bufferhead is done through jbd slab management, as Christoph Hellwig
> pointed out that this is broken as jbd should not pass slab pages down
> to IO layer. and suggested to use get_free_pages() directly.
> 
> The problem with this patch, as Andreas Dilger pointed today in ext4
> interlock call, for 1k,2k block size ext2/3/4, get_free_pages() waste
> 1/3-1/2 page space. 
> 
> What was the originally intention to set up slabs for committed_data(and
> frozen_buffer) in JBD? Why not using kmalloc?
> 
> Mingming

Looks good. Small suggestion is to get rid of all kmalloc() usages and
consistently use jbd_kmalloc() or jbd2_kmalloc().

Thanks,
Badari

-
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] JBD slab cleanups

2007-09-17 Thread Christoph Hellwig
On Mon, Sep 17, 2007 at 12:29:51PM -0700, Mingming Cao wrote:
> The problem with this patch, as Andreas Dilger pointed today in ext4
> interlock call, for 1k,2k block size ext2/3/4, get_free_pages() waste
> 1/3-1/2 page space. 
> 
> What was the originally intention to set up slabs for committed_data(and
> frozen_buffer) in JBD? Why not using kmalloc?

kmalloc is using slabs :)

The intent was to avoid the wasted memory, but as we've repeated a gazillion
times wasted memory on a rather rare codepath doesn't really matter when
you just crash random storage drivers otherwise.
-
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] JBD slab cleanups

2007-09-17 Thread Mingming Cao
On Fri, 2007-09-14 at 11:53 -0700, Mingming Cao wrote:
> jbd/jbd2: Replace slab allocations with page cache allocations
> 
> From: Christoph Lameter <[EMAIL PROTECTED]>
> 
> JBD should not pass slab pages down to the block layer.
> Use page allocator pages instead. This will also prepare
> JBD for the large blocksize patchset.
> 

Currently memory allocation for committed_data(and frozen_buffer) for
bufferhead is done through jbd slab management, as Christoph Hellwig
pointed out that this is broken as jbd should not pass slab pages down
to IO layer. and suggested to use get_free_pages() directly.

The problem with this patch, as Andreas Dilger pointed today in ext4
interlock call, for 1k,2k block size ext2/3/4, get_free_pages() waste
1/3-1/2 page space. 

What was the originally intention to set up slabs for committed_data(and
frozen_buffer) in JBD? Why not using kmalloc?

Mingming

> Tested on 2.6.23-rc6 with fsx runs fine.
> 
> Signed-off-by: Christoph Lameter <[EMAIL PROTECTED]>
> Signed-off-by: Mingming Cao <[EMAIL PROTECTED]>
> ---
>  fs/jbd/checkpoint.c   |2 
>  fs/jbd/commit.c   |6 +-
>  fs/jbd/journal.c  |  107 
> -
>  fs/jbd/transaction.c  |   10 ++--
>  fs/jbd2/checkpoint.c  |2 
>  fs/jbd2/commit.c  |6 +-
>  fs/jbd2/journal.c |  109 
> --
>  fs/jbd2/transaction.c |   18 
>  include/linux/jbd.h   |   23 +-
>  include/linux/jbd2.h  |   28 ++--
>  10 files changed, 83 insertions(+), 228 deletions(-)
> 
> Index: linux-2.6.23-rc5/fs/jbd/journal.c
> ===
> --- linux-2.6.23-rc5.orig/fs/jbd/journal.c2007-09-13 13:37:57.0 
> -0700
> +++ linux-2.6.23-rc5/fs/jbd/journal.c 2007-09-13 13:45:39.0 -0700
> @@ -83,7 +83,6 @@ EXPORT_SYMBOL(journal_force_commit);
> 
>  static int journal_convert_superblock_v1(journal_t *, journal_superblock_t 
> *);
>  static void __journal_abort_soft (journal_t *journal, int errno);
> -static int journal_create_jbd_slab(size_t slab_size);
> 
>  /*
>   * Helper function used to manage commit timeouts
> @@ -334,10 +333,10 @@ repeat:
>   char *tmp;
> 
>   jbd_unlock_bh_state(bh_in);
> - tmp = jbd_slab_alloc(bh_in->b_size, GFP_NOFS);
> + tmp = jbd_alloc(bh_in->b_size, GFP_NOFS);
>   jbd_lock_bh_state(bh_in);
>   if (jh_in->b_frozen_data) {
> - jbd_slab_free(tmp, bh_in->b_size);
> + jbd_free(tmp, bh_in->b_size);
>   goto repeat;
>   }
> 
> @@ -679,7 +678,7 @@ static journal_t * journal_init_common (
>   /* Set up a default-sized revoke table for the new mount. */
>   err = journal_init_revoke(journal, JOURNAL_REVOKE_DEFAULT_HASH);
>   if (err) {
> - kfree(journal);
> + jbd_kfree(journal);
>   goto fail;
>   }
>   return journal;
> @@ -728,7 +727,7 @@ journal_t * journal_init_dev(struct bloc
>   if (!journal->j_wbuf) {
>   printk(KERN_ERR "%s: Cant allocate bhs for commit thread\n",
>   __FUNCTION__);
> - kfree(journal);
> + jbd_kfree(journal);
>   journal = NULL;
>   goto out;
>   }
> @@ -782,7 +781,7 @@ journal_t * journal_init_inode (struct i
>   if (!journal->j_wbuf) {
>   printk(KERN_ERR "%s: Cant allocate bhs for commit thread\n",
>   __FUNCTION__);
> - kfree(journal);
> + jbd_kfree(journal);
>   return NULL;
>   }
> 
> @@ -791,7 +790,7 @@ journal_t * journal_init_inode (struct i
>   if (err) {
>   printk(KERN_ERR "%s: Cannnot locate journal superblock\n",
>  __FUNCTION__);
> - kfree(journal);
> + jbd_kfree(journal);
>   return NULL;
>   }
> 
> @@ -1095,13 +1094,6 @@ int journal_load(journal_t *journal)
>   }
>   }
> 
> - /*
> -  * Create a slab for this blocksize
> -  */
> - err = journal_create_jbd_slab(be32_to_cpu(sb->s_blocksize));
> - if (err)
> - return err;
> -
>   /* Let the recovery code check whether it needs to recover any
>* data from the journal. */
>   if (journal_recover(journal))
> @@ -1166,7 +1158,7 @@ void journal_destroy(journal_t *journal)
>   if (journal->j_revoke)
>   journal_destroy_revoke(journal);
>   kfree(journal->j_wbuf);
> - kfree(journal);
> + jbd_kfree(journal);
>  }
> 
> 
> @@ -1615,86 +1607,6 @@ int journal_blocks_per_page(struct inode
>  }
> 
>  /*
> - * Simple support for retrying memory allocations.  Introduced to help to
> - * debug different VM deadlock avoidance strategies.
> - */
> -void * __jbd_kmalloc (const char *where, size_t size, gfp_t flags, int retry)
> -{
> - return kmalloc(si

Re: [PATCH] JBD slab cleanups

2007-09-14 Thread Christoph Lameter
Thanks Mingming.


-
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] JBD slab cleanups

2007-09-14 Thread Mingming Cao
jbd/jbd2: Replace slab allocations with page cache allocations

From: Christoph Lameter <[EMAIL PROTECTED]>

JBD should not pass slab pages down to the block layer.
Use page allocator pages instead. This will also prepare
JBD for the large blocksize patchset.

Tested on 2.6.23-rc6 with fsx runs fine.

Signed-off-by: Christoph Lameter <[EMAIL PROTECTED]>
Signed-off-by: Mingming Cao <[EMAIL PROTECTED]>
---
 fs/jbd/checkpoint.c   |2 
 fs/jbd/commit.c   |6 +-
 fs/jbd/journal.c  |  107 -
 fs/jbd/transaction.c  |   10 ++--
 fs/jbd2/checkpoint.c  |2 
 fs/jbd2/commit.c  |6 +-
 fs/jbd2/journal.c |  109 --
 fs/jbd2/transaction.c |   18 
 include/linux/jbd.h   |   23 +-
 include/linux/jbd2.h  |   28 ++--
 10 files changed, 83 insertions(+), 228 deletions(-)

Index: linux-2.6.23-rc5/fs/jbd/journal.c
===
--- linux-2.6.23-rc5.orig/fs/jbd/journal.c  2007-09-13 13:37:57.0 
-0700
+++ linux-2.6.23-rc5/fs/jbd/journal.c   2007-09-13 13:45:39.0 -0700
@@ -83,7 +83,6 @@ EXPORT_SYMBOL(journal_force_commit);
 
 static int journal_convert_superblock_v1(journal_t *, journal_superblock_t *);
 static void __journal_abort_soft (journal_t *journal, int errno);
-static int journal_create_jbd_slab(size_t slab_size);
 
 /*
  * Helper function used to manage commit timeouts
@@ -334,10 +333,10 @@ repeat:
char *tmp;
 
jbd_unlock_bh_state(bh_in);
-   tmp = jbd_slab_alloc(bh_in->b_size, GFP_NOFS);
+   tmp = jbd_alloc(bh_in->b_size, GFP_NOFS);
jbd_lock_bh_state(bh_in);
if (jh_in->b_frozen_data) {
-   jbd_slab_free(tmp, bh_in->b_size);
+   jbd_free(tmp, bh_in->b_size);
goto repeat;
}
 
@@ -679,7 +678,7 @@ static journal_t * journal_init_common (
/* Set up a default-sized revoke table for the new mount. */
err = journal_init_revoke(journal, JOURNAL_REVOKE_DEFAULT_HASH);
if (err) {
-   kfree(journal);
+   jbd_kfree(journal);
goto fail;
}
return journal;
@@ -728,7 +727,7 @@ journal_t * journal_init_dev(struct bloc
if (!journal->j_wbuf) {
printk(KERN_ERR "%s: Cant allocate bhs for commit thread\n",
__FUNCTION__);
-   kfree(journal);
+   jbd_kfree(journal);
journal = NULL;
goto out;
}
@@ -782,7 +781,7 @@ journal_t * journal_init_inode (struct i
if (!journal->j_wbuf) {
printk(KERN_ERR "%s: Cant allocate bhs for commit thread\n",
__FUNCTION__);
-   kfree(journal);
+   jbd_kfree(journal);
return NULL;
}
 
@@ -791,7 +790,7 @@ journal_t * journal_init_inode (struct i
if (err) {
printk(KERN_ERR "%s: Cannnot locate journal superblock\n",
   __FUNCTION__);
-   kfree(journal);
+   jbd_kfree(journal);
return NULL;
}
 
@@ -1095,13 +1094,6 @@ int journal_load(journal_t *journal)
}
}
 
-   /*
-* Create a slab for this blocksize
-*/
-   err = journal_create_jbd_slab(be32_to_cpu(sb->s_blocksize));
-   if (err)
-   return err;
-
/* Let the recovery code check whether it needs to recover any
 * data from the journal. */
if (journal_recover(journal))
@@ -1166,7 +1158,7 @@ void journal_destroy(journal_t *journal)
if (journal->j_revoke)
journal_destroy_revoke(journal);
kfree(journal->j_wbuf);
-   kfree(journal);
+   jbd_kfree(journal);
 }
 
 
@@ -1615,86 +1607,6 @@ int journal_blocks_per_page(struct inode
 }
 
 /*
- * Simple support for retrying memory allocations.  Introduced to help to
- * debug different VM deadlock avoidance strategies.
- */
-void * __jbd_kmalloc (const char *where, size_t size, gfp_t flags, int retry)
-{
-   return kmalloc(size, flags | (retry ? __GFP_NOFAIL : 0));
-}
-
-/*
- * jbd slab management: create 1k, 2k, 4k, 8k slabs as needed
- * and allocate frozen and commit buffers from these slabs.
- *
- * Reason for doing this is to avoid, SLAB_DEBUG - since it could
- * cause bh to cross page boundary.
- */
-
-#define JBD_MAX_SLABS 5
-#define JBD_SLAB_INDEX(size)  (size >> 11)
-
-static struct kmem_cache *jbd_slab[JBD_MAX_SLABS];
-static const char *jbd_slab_names[JBD_MAX_SLABS] = {
-   "jbd_1k", "jbd_2k", "jbd_4k", NULL, "jbd_8k"
-};
-
-static void journal_destroy_jbd_slabs(void)
-{
-   int i;
-
-   for (i = 0; i < JBD_MAX_SLABS; i++) {
-   if (jbd_slab[i])
-   kmem_cache_destroy(jbd_slab[i]);
-