Re: [2/3] 2.6.23-rc6: known regressions v2

2007-09-18 Thread Jan Kara
 FS
 
 Subject : hanging ext3 dbench tests
 References  : http://lkml.org/lkml/2007/9/11/176
 Last known good : ?
 Submitter   : Andy Whitcroft [EMAIL PROTECTED]
 Caused-By   : ?
 Handled-By  : ?
 Status  : under test -- unreproducible at present
  Yep... Hard to do anything until Andy is able to reproduce it at least
once more to gather needed info.

 Subject : umount triggers a warning in jfs and takes almost a minute
 References  : http://lkml.org/lkml/2007/9/4/73
 Last known good : ?
 Submitter   : Oliver Neukum [EMAIL PROTECTED]
 Caused-By   : ?
 Handled-By  : ?
 Status  : unknown
  I thought Shaggy asked Oliver about some details (and he did not
answer so far) so I'd assume Shaggy is handling this.


Honza
-- 
Jan Kara [EMAIL PROTECTED]
SuSE CR Labs
-
To unsubscribe from this list: send the line unsubscribe linux-ext4 in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] JBD slab cleanups

2007-09-18 Thread Mingming Cao
On Tue, 2007-09-18 at 10:04 +0100, Christoph Hellwig wrote:
 On Mon, Sep 17, 2007 at 03:57:31PM -0700, Mingming Cao wrote:
  Here is the incremental small cleanup patch. 
  
  Remove kamlloc usages in jbd/jbd2 and consistently use 
  jbd_kmalloc/jbd2_malloc.
 
 Shouldn't we kill jbd_kmalloc instead?
 

It seems useful to me to keep jbd_kmalloc/jbd_free. They are central
places to handle memory (de)allocation(page size) via kmalloc/kfree, so
in the future if we need to change memory allocation in jbd(e.g. not
using kmalloc or using different flag), we don't need to touch every
place in the jbd code calling jbd_kmalloc.

Regards,
Mingming

-
To unsubscribe from this list: send the line unsubscribe linux-ext4 in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Ext4 devel interlock meeting minutes (September 17, 2007)

2007-09-18 Thread Avantika Mathur

Ext4 Developer Interlock Call: September 17, 2007 meeting minutes
Attendees: Mingming Cao, Andreas Dilger, Eric Sandeen, Dave Kleikamp, 
Ted Ts'o,

Avantika Mathur

Minutes can be accessed at: 
http://ext4.wiki.kernel.org/index.php/Ext4_Developer%27s_Conference_Call


PATCH QUEUE:
Reviewed patches in the patch queue, identifying those that are ready to 
be merged with 2.6.24-rc1


- Journal_Checksum: Not yet ready for mainline, Avantika and Girish are 
looking into kernel oops on fsstress.


- Uninitialized Block Groups:  Ready to be pushed upstream, Avantika 
will post this patch to lkml


- Large Block Support: Mingming tested these patches on various 
architectures.  There should be new testcases added to e2fsprogs to test 
large block sizes.  Andreas also described a possible error case with 
empty directories.  Otherwise the feature is ready to be pushed to lkml


- JBD Cleanups:  Ready to be pushed upstream

- Flex_BG Patches: should be pushed upstream asap.  The required flag in 
e2fsprogs is already reserved for this feature.


- Delalloc: Patch should be sent to lkml for detailed review, since not 
only ext4 code is being touched. Delalloc will be pushed with the other 
patches that are  being pushed upstream now.
   - Shaggy will look into whether the delalloc framework can be used 
for JFS
   - We will also try porting the patch to ext3, with a search/replace, 
and try testing on ext3.


- Mballoc: Not ready for 2.6.23. We will wait for the next release, as 
it hasn't been in the patch queue for very  long.  The main issue is to 
break the patch into smaller pieces making it easier to understand.  
This is tricky to break such a large patch into smaller patches.


- Mingming will re-order the patch queue, putting the patches that are 
ready to be pushed upstream at the top.


E2FSPROGS:
- Ted sent out his initial extents support patches to the mailing list; 
he is continuing on ToDo items which includes a lot of error checking, 
and waiting for feedback on the patches.   In his patches, any knowledge 
of the on-disk format of the filesystem is isolated to a small part of 
the library, making it easier to change format in the future.


- 64-bit support in e2fsprogs is background priority, while adding 
support for other ext4 features (e.g. unintialized block groups) is 
higher priority right now.  Some of Valerie's pre-work/cleanup patches 
are ready to be merged now.


- Avantika and Aneesh are working on automated testing of e2fsprogs.
-
To unsubscribe from this list: send the line unsubscribe linux-ext4 in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] ext34: ensure do_split leaves enough free space in both blocks

2007-09-18 Thread Stephen C. Tweedie
Hi,

On Mon, 2007-09-17 at 11:06 -0500, Eric Sandeen wrote:
 The do_split() function for htree dir blocks is intended to split a
 leaf block to make room for a new entry.  It sorts the entries in the
 original block by hash value, then moves the last half of the entries to 
 the new block - without accounting for how much space this actually moves.  

Nasty.

 Also add a few comments to the functions involved.

A big improvement.  :)

 + /* Split the existing block in the middle, size-wise */
 + size = 0;
 + move = 0;
 + for (i = count-1; i = 0; i--) {
 + /* is more than half of this entry in 2nd half of the block? */
 + if (size + map[i].size/2  blocksize/2)
 + break;
 + size += map[i].size;
 + move++;
 + }
 + /* map index at which we will split */
 + split = count - move;

This is the guts of it, and I spent a while looking just to convince
myself it was getting things right in the corner case and was splitting
at _exactly_ the right point.  If we have large dirents and 1k
blocksize, even getting the split point off-by-one will still leave us
vulnerable to the bug.

But it all looks fine.  My only other comment would be that move is
redundant, you could lose it completely and just do

split = i+1;

but I think the logic's easier to understand the way it is. 

Nice catch.

--Stephen


-
To unsubscribe from this list: send the line unsubscribe linux-ext4 in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] JBD slab cleanups

2007-09-18 Thread Dave Kleikamp
On Tue, 2007-09-18 at 09:35 -0700, Mingming Cao wrote:
 On Tue, 2007-09-18 at 10:04 +0100, Christoph Hellwig wrote:
  On Mon, Sep 17, 2007 at 03:57:31PM -0700, Mingming Cao wrote:
   Here is the incremental small cleanup patch. 
   
   Remove kamlloc usages in jbd/jbd2 and consistently use 
   jbd_kmalloc/jbd2_malloc.
  
  Shouldn't we kill jbd_kmalloc instead?
  
 
 It seems useful to me to keep jbd_kmalloc/jbd_free. They are central
 places to handle memory (de)allocation(page size) via kmalloc/kfree, so
 in the future if we need to change memory allocation in jbd(e.g. not
 using kmalloc or using different flag), we don't need to touch every
 place in the jbd code calling jbd_kmalloc.

I disagree.  Why would jbd need to globally change the way it allocates
memory?  It currently uses kmalloc (and jbd_kmalloc) for allocating a
variety of structures.  Having to change one particular instance won't
necessarily mean we want to change all of them.  Adding unnecessary
wrappers only obfuscates the code making it harder to understand.  You
wouldn't want every subsystem to have it's own *_kmalloc() that took
different arguments.  Besides, there aren't that many calls to kmalloc
and kfree in the jbd code, so there wouldn't be much pain in changing
GFP flags or whatever, if it ever needed to be done.

Shaggy
-- 
David Kleikamp
IBM Linux Technology Center

-
To unsubscribe from this list: send the line unsubscribe linux-ext4 in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] JBD slab cleanups

2007-09-18 Thread Mingming Cao
On Tue, 2007-09-18 at 13:04 -0500, Dave Kleikamp wrote:
 On Tue, 2007-09-18 at 09:35 -0700, Mingming Cao wrote:
  On Tue, 2007-09-18 at 10:04 +0100, Christoph Hellwig wrote:
   On Mon, Sep 17, 2007 at 03:57:31PM -0700, Mingming Cao wrote:
Here is the incremental small cleanup patch. 

Remove kamlloc usages in jbd/jbd2 and consistently use 
jbd_kmalloc/jbd2_malloc.
   
   Shouldn't we kill jbd_kmalloc instead?
   
  
  It seems useful to me to keep jbd_kmalloc/jbd_free. They are central
  places to handle memory (de)allocation(page size) via kmalloc/kfree, so
  in the future if we need to change memory allocation in jbd(e.g. not
  using kmalloc or using different flag), we don't need to touch every
  place in the jbd code calling jbd_kmalloc.
 
 I disagree.  Why would jbd need to globally change the way it allocates
 memory?  It currently uses kmalloc (and jbd_kmalloc) for allocating a
 variety of structures.  Having to change one particular instance won't
 necessarily mean we want to change all of them.  Adding unnecessary
 wrappers only obfuscates the code making it harder to understand.  You
 wouldn't want every subsystem to have it's own *_kmalloc() that took
 different arguments.  Besides, there aren't that many calls to kmalloc
 and kfree in the jbd code, so there wouldn't be much pain in changing
 GFP flags or whatever, if it ever needed to be done.
 
 Shaggy

Okay, Points taken, Here is the updated patch to get rid of slab
management and jbd_kmalloc from jbd totally. This patch is intend to
replace the patch in mm tree, Andrew, could you pick up this one
instead?

Thanks,

Mingming


jbd/jbd2: JBD memory allocation cleanups

From: Christoph Lameter [EMAIL PROTECTED]

JBD: Replace slab allocations with page cache allocations

JBD allocate memory for committed_data and frozen_data from slab. However
JBD should not pass slab pages down to the block layer. Use page allocator 
pages instead. This will also prepare JBD for the large blocksize patchset.


Also this patch cleans up jbd_kmalloc and replace it with kmalloc directly

Signed-off-by: Christoph Lameter [EMAIL PROTECTED]
Signed-off-by: Mingming Cao [EMAIL PROTECTED]

---
 fs/jbd/commit.c   |6 +--
 fs/jbd/journal.c  |   99 ++
 fs/jbd/transaction.c  |   12 +++---
 fs/jbd2/commit.c  |6 +--
 fs/jbd2/journal.c |   99 ++
 fs/jbd2/transaction.c |   18 -
 include/linux/jbd.h   |   18 +
 include/linux/jbd2.h  |   21 +-
 8 files changed, 52 insertions(+), 227 deletions(-)

Index: linux-2.6.23-rc6/fs/jbd/journal.c
===
--- linux-2.6.23-rc6.orig/fs/jbd/journal.c  2007-09-18 17:19:01.0 
-0700
+++ linux-2.6.23-rc6/fs/jbd/journal.c   2007-09-18 17:51:21.0 -0700
@@ -83,7 +83,6 @@ EXPORT_SYMBOL(journal_force_commit);
 
 static int journal_convert_superblock_v1(journal_t *, journal_superblock_t *);
 static void __journal_abort_soft (journal_t *journal, int errno);
-static int journal_create_jbd_slab(size_t slab_size);
 
 /*
  * Helper function used to manage commit timeouts
@@ -334,10 +333,10 @@ repeat:
char *tmp;
 
jbd_unlock_bh_state(bh_in);
-   tmp = jbd_slab_alloc(bh_in-b_size, GFP_NOFS);
+   tmp = jbd_alloc(bh_in-b_size, GFP_NOFS);
jbd_lock_bh_state(bh_in);
if (jh_in-b_frozen_data) {
-   jbd_slab_free(tmp, bh_in-b_size);
+   jbd_free(tmp, bh_in-b_size);
goto repeat;
}
 
@@ -654,7 +653,7 @@ static journal_t * journal_init_common (
journal_t *journal;
int err;
 
-   journal = jbd_kmalloc(sizeof(*journal), GFP_KERNEL);
+   journal = kmalloc(sizeof(*journal), GFP_KERNEL|__GFP_NOFAIL);
if (!journal)
goto fail;
memset(journal, 0, sizeof(*journal));
@@ -1095,13 +1094,6 @@ int journal_load(journal_t *journal)
}
}
 
-   /*
-* Create a slab for this blocksize
-*/
-   err = journal_create_jbd_slab(be32_to_cpu(sb-s_blocksize));
-   if (err)
-   return err;
-
/* Let the recovery code check whether it needs to recover any
 * data from the journal. */
if (journal_recover(journal))
@@ -1615,86 +1607,6 @@ int journal_blocks_per_page(struct inode
 }
 
 /*
- * Simple support for retrying memory allocations.  Introduced to help to
- * debug different VM deadlock avoidance strategies.
- */
-void * __jbd_kmalloc (const char *where, size_t size, gfp_t flags, int retry)
-{
-   return kmalloc(size, flags | (retry ? __GFP_NOFAIL : 0));
-}
-
-/*
- * jbd slab management: create 1k, 2k, 4k, 8k slabs as needed
- * and allocate frozen and commit buffers from these slabs.
- *
- * Reason for doing this is to avoid, SLAB_DEBUG - since it could

Re: [PATCH] JBD slab cleanups

2007-09-18 Thread Andrew Morton
On Tue, 18 Sep 2007 18:00:01 -0700 Mingming Cao [EMAIL PROTECTED] wrote:

 JBD: Replace slab allocations with page cache allocations
 
 JBD allocate memory for committed_data and frozen_data from slab. However
 JBD should not pass slab pages down to the block layer. Use page allocator 
 pages instead. This will also prepare JBD for the large blocksize patchset.
 
 
 Also this patch cleans up jbd_kmalloc and replace it with kmalloc directly

__GFP_NOFAIL should only be used when we have no way of recovering
from failure.  The allocation in journal_init_common() (at least)
_can_ recover and hence really shouldn't be using __GFP_NOFAIL.

(Actually, nothing in the kernel should be using __GFP_NOFAIL.  It is 
there as a marker which says we really shouldn't be doing this but
we don't know how to fix it).

So sometime it'd be good if you could review all the __GFP_NOFAILs in
there and see if we can remove some, thanks.
-
To unsubscribe from this list: send the line unsubscribe linux-ext4 in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] Ext4: Uninitialized Block Groups

2007-09-18 Thread Andrew Morton
On Tue, 18 Sep 2007 17:25:31 -0700 Avantika Mathur [EMAIL PROTECTED] wrote:

 +
 +__u16 crc16(__u16 crc, __u8 const *buffer, size_t len)

And is we really really have to do this, then the ext4-private crc16() 
should have static scope.
-
To unsubscribe from this list: send the line unsubscribe linux-ext4 in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[patch 16/22] ext34: ensure do_split leaves enough free space in both blocks

2007-09-18 Thread akpm
From: Eric Sandeen [EMAIL PROTECTED]

The do_split() function for htree dir blocks is intended to split a leaf
block to make room for a new entry.  It sorts the entries in the original
block by hash value, then moves the last half of the entries to the new
block - without accounting for how much space this actually moves.  (IOW,
it moves half of the entry *count* not half of the entry *space*).  If by
chance we have both large  small entries, and we move only the smallest
entries, and we have a large new entry to insert, we may not have created
enough space for it.

The patch below stores each record size when calculating the dx_map, and
then walks the hash-sorted dx_map, calculating how many entries must be
moved to more evenly split the existing entries between the old block and
the new block, guaranteeing enough space for the new entry.

The dx_map offs member is reduced to u16 so that the overall map size
does not change - it is temporarily stored at the end of the new block, and
if it grows too large it may be overwritten.  By making offs and size both
u16, we won't grow the map size.

Also add a few comments to the functions involved.

This fixes the testcase reported by [EMAIL PROTECTED] on the
linux-ext4 list, ext3 dir_index causes an error

Thanks to Andreas Dilger for discussing the problem  solution with me.

Signed-off-by: Eric Sandeen [EMAIL PROTECTED]
Signed-off-by: Andreas Dilger [EMAIL PROTECTED]
Tested-by: Junjiro Okajima [EMAIL PROTECTED]
Cc: Theodore Ts'o [EMAIL PROTECTED]
Cc: linux-ext4@vger.kernel.org
Cc: [EMAIL PROTECTED]
Signed-off-by: Andrew Morton [EMAIL PROTECTED]
---

 fs/ext3/namei.c |   39 +++
 fs/ext4/namei.c |   39 +++
 2 files changed, 70 insertions(+), 8 deletions(-)

diff -puN 
fs/ext3/namei.c~ext34-ensure-do_split-leaves-enough-free-space-in-both-blocks 
fs/ext3/namei.c
--- 
a/fs/ext3/namei.c~ext34-ensure-do_split-leaves-enough-free-space-in-both-blocks
+++ a/fs/ext3/namei.c
@@ -140,7 +140,8 @@ struct dx_frame
 struct dx_map_entry
 {
u32 hash;
-   u32 offs;
+   u16 offs;
+   u16 size;
 };
 
 #ifdef CONFIG_EXT3_INDEX
@@ -697,6 +698,10 @@ errout:
  * Directory block splitting, compacting
  */
 
+/*
+ * Create map of hash values, offsets, and sizes, stored at end of block.
+ * Returns number of entries mapped.
+ */
 static int dx_make_map (struct ext3_dir_entry_2 *de, int size,
struct dx_hash_info *hinfo, struct dx_map_entry 
*map_tail)
 {
@@ -710,7 +715,8 @@ static int dx_make_map (struct ext3_dir_
ext3fs_dirhash(de-name, de-name_len, h);
map_tail--;
map_tail-hash = h.hash;
-   map_tail-offs = (u32) ((char *) de - base);
+   map_tail-offs = (u16) ((char *) de - base);
+   map_tail-size = le16_to_cpu(de-rec_len);
count++;
cond_resched();
}
@@ -720,6 +726,7 @@ static int dx_make_map (struct ext3_dir_
return count;
 }
 
+/* Sort map by hash value */
 static void dx_sort_map (struct dx_map_entry *map, unsigned count)
 {
 struct dx_map_entry *p, *q, *top = map + count - 1;
@@ -1117,6 +1124,10 @@ static inline void ext3_set_de_type(stru
 }
 
 #ifdef CONFIG_EXT3_INDEX
+/*
+ * Move count entries from end of map between two memory locations.
+ * Returns pointer to last entry moved.
+ */
 static struct ext3_dir_entry_2 *
 dx_move_dirents(char *from, char *to, struct dx_map_entry *map, int count)
 {
@@ -1135,6 +1146,10 @@ dx_move_dirents(char *from, char *to, st
return (struct ext3_dir_entry_2 *) (to - rec_len);
 }
 
+/*
+ * Compact each dir entry in the range to the minimal rec_len.
+ * Returns pointer to last entry in range.
+ */
 static struct ext3_dir_entry_2* dx_pack_dirents(char *base, int size)
 {
struct ext3_dir_entry_2 *next, *to, *prev, *de = (struct 
ext3_dir_entry_2 *) base;
@@ -1157,6 +1172,11 @@ static struct ext3_dir_entry_2* dx_pack_
return prev;
 }
 
+/*
+ * Split a full leaf block to make room for a new dir entry.
+ * Allocate a new block, and move entries so that they are approx. equally 
full.
+ * Returns pointer to de in block into which the new entry will be inserted.
+ */
 static struct ext3_dir_entry_2 *do_split(handle_t *handle, struct inode *dir,
struct buffer_head **bh,struct dx_frame *frame,
struct dx_hash_info *hinfo, int *error)
@@ -1168,7 +1188,7 @@ static struct ext3_dir_entry_2 *do_split
u32 hash2;
struct dx_map_entry *map;
char *data1 = (*bh)-b_data, *data2;
-   unsigned split;
+   unsigned split, move, size, i;
struct ext3_dir_entry_2 *de = NULL, *de2;
int err = 0;
 
@@ -1196,8 +1216,19 @@ static struct ext3_dir_entry_2 *do_split
count = dx_make_map ((struct ext3_dir_entry_2 *) data1,