Confused by journaling code in ext3_new_blocks()

2008-01-23 Thread Abhishek Rai
Hi,

I have the following question related to the journaling code in
ext3_new_blocks() function of fs/ext3/balloc.c, any help in this
regard will be greatly appreciated. The code snippet below is taken
from 2.6.24-rc8-mm1 but this code has changed little for a long time
so it's likely that any other recent kernel will also have very
similar code.

ext3_new_blocks()
- ext3_try_to_allocate_with_rsv()
- ext3_journal_dirty_metadata(handle, bitmap_bh)
...
allocated:
if ( /*sanity checking of newly allocated block numbers*/) {
ext3_error(...);
goto out;
}

out:
...
brelse(bitmap_bh);
return 0;


In the above code snippet, ext3_try_to_allocate_with_rsv() is marking
bitmap_bh as dirty metadata _before_ the sanity checks. I'm not
worried about the sanity checks as such, but it seems to me that if
the checks were to fail, we'd be committing the bitmap_bh updates to
disk even though it violates metadata sanity, or am I missing
something here ?

Thanks,
Abhishek
-
To unsubscribe from this list: send the line unsubscribe linux-ext4 in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: regression: 100% io-wait with 2.6.24-rcX

2008-01-23 Thread Martin Knoblauch
- Original Message 
 From: Alasdair G Kergon [EMAIL PROTECTED]
 To: Martin Knoblauch [EMAIL PROTECTED]
 Cc: Linus Torvalds [EMAIL PROTECTED]; Mel Gorman [EMAIL PROTECTED]; 
 Fengguang Wu [EMAIL PROTECTED]; Mike Snitzer [EMAIL PROTECTED]; Peter 
 Zijlstra [EMAIL PROTECTED]; [EMAIL PROTECTED]; Ingo Molnar [EMAIL 
 PROTECTED]; [EMAIL PROTECTED]; linux-ext4@vger.kernel.org 
 linux-ext4@vger.kernel.org; [EMAIL PROTECTED]; Jens Axboe [EMAIL 
 PROTECTED]; Milan Broz [EMAIL PROTECTED]; Neil Brown [EMAIL PROTECTED]
 Sent: Wednesday, January 23, 2008 12:40:52 AM
 Subject: Re: regression: 100% io-wait with 2.6.24-rcX
 
 On Tue, Jan 22, 2008 at 07:25:15AM -0800, Martin Knoblauch wrote:
[EMAIL PROTECTED] ~]# dmsetup table
  VolGroup00-LogVol02: 0 350945280 linear 104:2 67109248
  VolGroup00-LogVol01: 0 8388608 linear 104:2 418054528
  VolGroup00-LogVol00: 0 67108864 linear 104:2 384
 
 The IO should pass straight through simple linear targets like
 that without needing to get broken up, so I wouldn't expect those patches to
 make any difference in this particular case.
 

Alasdair,

 LVM/DM are off the hook :-) I converted one box to direct using partitions and 
the performance is the same disappointment as with LVM/DM. Thanks anyway for 
looking at my problem.

 I will move the discussion now to a new thread, targetting CCISS directly. 

Cheers
Martin


-
To unsubscribe from this list: send the line unsubscribe linux-ext4 in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


ext4-online-defrag-free-space-fragmentation.patch uses do_fsync()

2008-01-23 Thread Theodore Ts'o

I was trying to build ext4 as a module, and ran into problems because
the online defrag patch is calling do_fsync() which is *not* an exported
symbol, and so can not be called from a module.

Looking at what the routine is doing, there's no reason to call
do_fsync(), and in fact depending on the journaling mode in use, it may
not force a journal commit, which seems to be the goal of the code.

Hence, I plan to merge the following fix into the the
defrag-free-space-fragmentation patch, unless there are any objections
from Takashi-San or Akira-San.

Regards,

- Ted

diff --git a/fs/ext4/defrag.c b/fs/ext4/defrag.c
index 4ef3dc0..19d2cfd 100644
--- a/fs/ext4/defrag.c
+++ b/fs/ext4/defrag.c
@@ -632,8 +632,9 @@ static int ext4_ext_defrag_victim(struct file *target_filp,
}
 
/* Sync journal blocks before reservation */
-   if (do_fsync(target_filp, 0)) {
-   printk(KERN_ERR defrag: failed do_fsync\n);
+   ret = ext4_force_commit(sb);
+   if (ret) {
+   printk(KERN_ERR defrag: failed do_fsync (%d)\n, ret);
goto ERR;
}
}
-
To unsubscribe from this list: send the line unsubscribe linux-ext4 in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: ext4-online-defrag-free-space-fragmentation.patch uses do_fsync()

2008-01-23 Thread Dave Kleikamp

 diff --git a/fs/ext4/defrag.c b/fs/ext4/defrag.c
 index 4ef3dc0..19d2cfd 100644
 --- a/fs/ext4/defrag.c
 +++ b/fs/ext4/defrag.c
 @@ -632,8 +632,9 @@ static int ext4_ext_defrag_victim(struct file 
 *target_filp,
   }
 
   /* Sync journal blocks before reservation */
 - if (do_fsync(target_filp, 0)) {
 - printk(KERN_ERR defrag: failed do_fsync\n);
 + ret = ext4_force_commit(sb);
 + if (ret) {
 + printk(KERN_ERR defrag: failed do_fsync (%d)\n, ret);

I'd think you'd want to change the printk text as well.  defrag: failed
ext4_force_commit (%d)\n maybe?

   goto ERR;
   }
   }

-- 
David Kleikamp
IBM Linux Technology Center

-
To unsubscribe from this list: send the line unsubscribe linux-ext4 in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: ext4-online-defrag-free-space-fragmentation.patch uses do_fsync()

2008-01-23 Thread Theodore Tso
On Wed, Jan 23, 2008 at 08:10:20AM -0600, Dave Kleikamp wrote:
 I'd think you'd want to change the printk text as well.  defrag: failed
 ext4_force_commit (%d)\n maybe?

mmm, good catch, thanks.

- Ted
-
To unsubscribe from this list: send the line unsubscribe linux-ext4 in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH, RFC] Add new development flag to the ext4 filesystem

2008-01-23 Thread Theodore Tso
On Tue, Jan 22, 2008 at 09:55:37PM -0600, Eric Sandeen wrote:
 
 Overall, seems ok.  One other question though, when ext4 is a
 fully-fledged production filesystem, and the flag requirement is gone,
 what stops ext3 filesystems from being silently mounted as ext4, just as
 happened with ext4dev today?  

Nothing will prevent that, but in the long term we don't want ext4 to
be automatically adding new features flags anyway.  The only reason
why we were doing that was to encourage more people to test out ext4,
and for the convenience of the ABAT auto-testing.

So I'm assuming that before we remove this test, we will also be
fixing some of the automatic enablement of extents, etc., because that
sort of thing will be moved into e2fsprogs as part of tune2fs -O
ext4 or mke2fs -O ext4 or mkfs.ext4. 

If we do that, then the only downside of having ext3 filesystems run
under ext4 is the test matrix concern.  Since I'm still hoping that
some point in the future, fs/ext4 could subsume fs/ext3 so we don't
have to worry about bug fixes going into fs/ext2 AND fs/ext3 AND
fs/ext4, I have my own reasons for wanting that.  But I do understand
the concerns that maybe in the short term some distro's don't want to
do that.  So in that case I could see adding a you must have extents
test into ext4, if I distro has specific support concerns. But for
people who are running mainline kernel, I think it's actually a *good*
thing if fs/ext4 can mount and read and write to an ext3 filesystem
--- as long as it doesn't automatically turn on features behind the
user's back.

 + \ttestfs\n));
 
 help text doesn't match reality here, missing a _

Oops, thanks for catching that.

- Ted
-
To unsubscribe from this list: send the line unsubscribe linux-ext4 in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH, RFC] Add new development flag to the ext4 filesystem

2008-01-23 Thread Theodore Tso
On Wed, Jan 23, 2008 at 11:04:35AM -0600, Eric Sandeen wrote:
 
 I just think that ext4.ko running ext3 filesystems needs to be under
 explicit control, and not something that happens, occasionally,
 accidentally, without the user/administrator requesting it.  Least
 surprise, and all that...
 

Well, most of the time that will happen, given that /etc/fstab
explicitly states which filesystem to use; and if the user doesn't
specify a filesystme type, mount will use blkid/vol_id, and if that
says ext3, then ext3 will be used.  The only time where it might not
happen without an explicit administrator request is on the root
filesystem automount and if you are using an initrd to mount the
filesystem (as all of the enterprise distros now do anyway), that
could be easily controlled form userspace as well.

- Ted
-
To unsubscribe from this list: send the line unsubscribe linux-ext4 in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] Fix commit block write in JBD

2008-01-23 Thread Jan Kara
  Hi,

  the patch below fixes preparation of commit block in
journal_write_commit_record(). Obviously the bug doesn't really matter
since nobody reported it so far but let's cleanup the code... Andrew, could
you please queue it up? Thanks.

Honza
-- 
Jan Kara [EMAIL PROTECTED]
SUSE Labs, CR
---

Commit block is expected to have several copies of the header. Fix the
bug Andrew has spotted ages ago.

Signed-off-by: Jan Kara [EMAIL PROTECTED]

diff --git a/fs/jbd/commit.c b/fs/jbd/commit.c
index 610264b..a69b240 100644
--- a/fs/jbd/commit.c
+++ b/fs/jbd/commit.c
@@ -116,9 +116,8 @@ static int journal_write_commit_record(journal_t *journal,
 
bh = jh2bh(descriptor);
 
-   /* AKPM: buglet - add `i' to tmp! */
for (i = 0; i  bh-b_size; i += 512) {
-   journal_header_t *tmp = (journal_header_t*)bh-b_data;
+   journal_header_t *tmp = (journal_header_t*)(bh-b_data+i);
tmp-h_magic = cpu_to_be32(JFS_MAGIC_NUMBER);
tmp-h_blocktype = cpu_to_be32(JFS_COMMIT_BLOCK);
tmp-h_sequence = cpu_to_be32(commit_transaction-t_tid);
-
To unsubscribe from this list: send the line unsubscribe linux-ext4 in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] Fix commit block write in JBD2

2008-01-23 Thread Jan Kara
On Wed 23-01-08 20:09:43, Jan Kara wrote:
   Hi,
 
   the patch below fixes preparation of commit block in
 journal_write_commit_record(). Obviously the bug doesn't really matter
 since nobody reported it so far but let's cleanup the code... Andrew, could
 you please queue it up? Thanks.
  And the same fix for JBD2.

Honza
-- 
Jan Kara [EMAIL PROTECTED]
SUSE Labs, CR
---

Commit block is expected to have several copies of the header. Fix the
bug Andrew has spotted ages ago.

Signed-off-by: Jan Kara [EMAIL PROTECTED]

diff --git a/fs/jbd2/commit.c b/fs/jbd2/commit.c
index 6986f33..ed61283 100644
--- a/fs/jbd2/commit.c
+++ b/fs/jbd2/commit.c
@@ -116,9 +116,8 @@ static int journal_write_commit_record(journal_t *journal,
 
bh = jh2bh(descriptor);
 
-   /* AKPM: buglet - add `i' to tmp! */
for (i = 0; i  bh-b_size; i += 512) {
-   journal_header_t *tmp = (journal_header_t*)bh-b_data;
+   journal_header_t *tmp = (journal_header_t*)(bh-b_data+i);
tmp-h_magic = cpu_to_be32(JBD2_MAGIC_NUMBER);
tmp-h_blocktype = cpu_to_be32(JBD2_COMMIT_BLOCK);
tmp-h_sequence = cpu_to_be32(commit_transaction-t_tid);
-
To unsubscribe from this list: send the line unsubscribe linux-ext4 in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Confused by journaling code in ext3_new_blocks()

2008-01-23 Thread Jan Kara
  Hello,

 I have the following question related to the journaling code in
 ext3_new_blocks() function of fs/ext3/balloc.c, any help in this
 regard will be greatly appreciated. The code snippet below is taken
 from 2.6.24-rc8-mm1 but this code has changed little for a long time
 so it's likely that any other recent kernel will also have very
 similar code.
 
 ext3_new_blocks()
 - ext3_try_to_allocate_with_rsv()
 - ext3_journal_dirty_metadata(handle, bitmap_bh)
 ...
 allocated:
 if ( /*sanity checking of newly allocated block numbers*/) {
 ext3_error(...);
 goto out;
 }
 
 out:
 ...
 brelse(bitmap_bh);
 return 0;
 
 
 In the above code snippet, ext3_try_to_allocate_with_rsv() is marking
 bitmap_bh as dirty metadata _before_ the sanity checks. I'm not
 worried about the sanity checks as such, but it seems to me that if
 the checks were to fail, we'd be committing the bitmap_bh updates to
 disk even though it violates metadata sanity, or am I missing
 something here ?
  Actually, it depends. Because if sanity checks fail, we call
ext3_error() which remounts fs R/O or panics in most cases. Only if
errors=continue is set, we really get to committing the wrong bitmap
data. But I don't think this really is a problem - fsck has to be run
anyway and it rebuilds the bitmaps from scratch.

Honza
-- 
Jan Kara [EMAIL PROTECTED]
SuSE CR Labs
-
To unsubscribe from this list: send the line unsubscribe linux-ext4 in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] Fix commit block write in JBD2

2008-01-23 Thread Girish Shilamkar
Hi,
The loop was removed in journal checksum patch. There had been a
discussion (11 Jul 2007: [EXT4 set 8][PATCH 1/1]Add journal checksums)
about this part of code as checksum info was added to commit block. 

Regards,
Girish
On Wed, 2008-01-23 at 20:10 +0100, Jan Kara wrote:
 On Wed 23-01-08 20:09:43, Jan Kara wrote:
Hi,
  
the patch below fixes preparation of commit block in
  journal_write_commit_record(). Obviously the bug doesn't really matter
  since nobody reported it so far but let's cleanup the code... Andrew, could
  you please queue it up? Thanks.
   And the same fix for JBD2.
 
   Honza

-
To unsubscribe from this list: send the line unsubscribe linux-ext4 in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH, RFC] Add new development flag to the ext4 filesystem

2008-01-23 Thread Andreas Dilger
On Jan 23, 2008  11:53 -0500, Theodore Tso wrote:
 Since I'm still hoping that
 some point in the future, fs/ext4 could subsume fs/ext3 so we don't
 have to worry about bug fixes going into fs/ext2 AND fs/ext3 AND
 fs/ext4, I have my own reasons for wanting that.

If any newbie kernel hacker wants a filesystem project, allowing ext4
to mount ext2 filesystems w/o a journal would be very useful.  I
suspect that a simple flag check in the ext4_journal_* wrappers of the
jbd2 functions would be enough in many cases.

One of the reasons to keep ext2 around is that ext3 cannot mount the
filesystem without a journal, and removing that limitation for ext4
would bring us one step closer to removing a ton of duplicate code.
Another reason for ext2 vs. ext3 was overhead from journaling, and
that could also be removed by allowing ext4 to mount ext2 filesystems
w/o a journal.

Maybe a good proposal for a Google Summer-of-Code project.

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.

-
To unsubscribe from this list: send the line unsubscribe linux-ext4 in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] Fix commit block write in JBD2

2008-01-23 Thread Jan Kara
  Hi,

On Thu 24-01-08 02:48:41, Girish Shilamkar wrote:
   The loop was removed in journal checksum patch. There had been a
 discussion (11 Jul 2007: [EXT4 set 8][PATCH 1/1]Add journal checksums)
 about this part of code as checksum info was added to commit block. 
  Ah, OK, thanks for explanation. So for JBD2 we don't need the patch
anymore. But for JBD it's still needed.

Honza

 On Wed, 2008-01-23 at 20:10 +0100, Jan Kara wrote:
  On Wed 23-01-08 20:09:43, Jan Kara wrote:
 Hi,
   
 the patch below fixes preparation of commit block in
   journal_write_commit_record(). Obviously the bug doesn't really matter
   since nobody reported it so far but let's cleanup the code... Andrew, 
   could
   you please queue it up? Thanks.
And the same fix for JBD2.
  
  Honza
 
-- 
Jan Kara [EMAIL PROTECTED]
SUSE Labs, CR
-
To unsubscribe from this list: send the line unsubscribe linux-ext4 in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 33/49] ext4: Add the journal checksum feature

2008-01-23 Thread Andrew Morton
 On Mon, 21 Jan 2008 22:02:12 -0500 Theodore Ts'o [EMAIL PROTECTED] wrote:
 From: Girish Shilamkar [EMAIL PROTECTED]
 
 The journal checksum feature adds two new flags i.e
 JBD2_FEATURE_INCOMPAT_ASYNC_COMMIT and JBD2_FEATURE_COMPAT_CHECKSUM.
 
 JBD2_FEATURE_CHECKSUM flag indicates that the commit block contains the
 checksum for the blocks described by the descriptor blocks.
 Due to checksums, writing of the commit record no longer needs to be
 synchronous. Now commit record can be sent to disk without waiting for
 descriptor blocks to be written to disk. This behavior is controlled
 using JBD2_FEATURE_ASYNC_COMMIT flag. Older kernels/e2fsck should not be
 able to recover the journal with _ASYNC_COMMIT hence it is made
 incompat.
 The commit header has been extended to hold the checksum along with the
 type of the checksum.
 
 For recovery in pass scan checksums are verified to ensure the sanity
 and completeness(in case of _ASYNC_COMMIT) of every transaction.
 
  ...

 +static inline __u32 jbd2_checksum_data(__u32 crc32_sum, struct buffer_head 
 *bh)

unneeded inlining.

 +{
 + struct page *page = bh-b_page;
 + char *addr;
 + __u32 checksum;
 +
 + addr = kmap_atomic(page, KM_USER0);
 + checksum = crc32_be(crc32_sum,
 + (void *)(addr + offset_in_page(bh-b_data)), bh-b_size);
 + kunmap_atomic(addr, KM_USER0);
 +
 + return checksum;
 +}

Can this buffer actually be in highmem?

  static inline void write_tag_block(int tag_bytes, journal_block_tag_t *tag,
  unsigned long long block)

More unnecessary inlining.

 +/*
 + * jbd2_journal_clear_features () - Clear a given journal feature in the
 + *   superblock
 + * @journal: Journal to act on.
 + * @compat: bitmask of compatible features
 + * @ro: bitmask of features that force read-only mount
 + * @incompat: bitmask of incompatible features
 + *
 + * Clear a given journal feature as present on the
 + * superblock.  Returns true if the requested features could be reset.
 + */
 +int jbd2_journal_clear_features(journal_t *journal, unsigned long compat,
 + unsigned long ro, unsigned long incompat)
 +{
 + journal_superblock_t *sb;
 +
 + jbd_debug(1, Clear features 0x%lx/0x%lx/0x%lx\n,
 +   compat, ro, incompat);
 +
 + sb = journal-j_superblock;
 +
 + sb-s_feature_compat= ~cpu_to_be32(compat);
 + sb-s_feature_ro_compat = ~cpu_to_be32(ro);
 + sb-s_feature_incompat  = ~cpu_to_be32(incompat);
 +
 + return 1;
 +}
 +EXPORT_SYMBOL(jbd2_journal_clear_features);

Kernel usually returns 0 on success.  So we can return a useful errno on
failure.

 +/*
 + * calc_chksums calculates the checksums for the blocks described in the
 + * descriptor block.
 + */
 +static int calc_chksums(journal_t *journal, struct buffer_head *bh,
 + unsigned long *next_log_block, __u32 *crc32_sum)
 +{
 + int i, num_blks, err;
 + unsigned io_block;
 + struct buffer_head *obh;
 +
 + num_blks = count_tags(journal, bh);
 + /* Calculate checksum of the descriptor block. */
 + *crc32_sum = crc32_be(*crc32_sum, (void *)bh-b_data, bh-b_size);
 +
 + for (i = 0; i  num_blks; i++) {
 + io_block = (*next_log_block)++;

 unsigned - unsigned long.

Are all the types appropriate in here?

 + wrap(journal, *next_log_block);
 + err = jread(obh, journal, io_block);
 + if (err) {
 + printk(KERN_ERR JBD: IO error %d recovering block 
 + %u in log\n, err, io_block);
 + return 1;
 + } else {
 + *crc32_sum = crc32_be(*crc32_sum, (void *)obh-b_data,
 +  obh-b_size);
 + }
 + }
 + return 0;
 +}
 +
  static int do_one_pass(journal_t *journal,
   struct recovery_info *info, enum passtype pass)
  {
 @@ -328,6 +360,7 @@ static int do_one_pass(journal_t *journal,
   unsigned intsequence;
   int blocktype;
   int tag_bytes = journal_tag_bytes(journal);
 + __u32   crc32_sum = ~0; /* Transactional Checksums */
  
   /* Precompute the maximum metadata descriptors in a descriptor block */
   int MAX_BLOCKS_PER_DESC;
 @@ -419,9 +452,23 @@ static int do_one_pass(journal_t *journal,
   switch(blocktype) {
   case JBD2_DESCRIPTOR_BLOCK:
   /* If it is a valid descriptor block, replay it
 -  * in pass REPLAY; otherwise, just skip over the
 -  * blocks it describes. */
 +  * in pass REPLAY; if journal_checksums enabled, then
 +  * calculate checksums in PASS_SCAN, otherwise,
 +  * just skip over the blocks it describes. */
   if (pass != 

Re: [PATCH 23/49] Add buffer head related helper functions

2008-01-23 Thread Andrew Morton
 On Mon, 21 Jan 2008 22:02:02 -0500 Theodore Ts'o [EMAIL PROTECTED] wrote:
 +}
 +EXPORT_SYMBOL(bh_uptodate_or_lock);
 +/**

Missing newline.

 + * bh_submit_read: Submit a locked buffer for reading
 + * @bh: struct buffer_head
 + *
 + * Returns a negative error
 + */
 +int bh_submit_read(struct buffer_head *bh)
 +{
 + if (!buffer_locked(bh))
 + lock_buffer(bh);
 +
 + if (buffer_uptodate(bh))
 + return 0;

Here it can lock the buffer then return zero

 + get_bh(bh);
 + bh-b_end_io = end_buffer_read_sync;
 + submit_bh(READ, bh);
 + wait_on_buffer(bh);
 + if (buffer_uptodate(bh))
 + return 0;

Here it will unlock the buffer and return zero.

This function is unusable when passed an unlocked buffer.

The return value should (always) be documented.

 + return -EIO;
 +}
 +EXPORT_SYMBOL(bh_submit_read);
  void __init buffer_init(void)

Missing newline.
-
To unsubscribe from this list: send the line unsubscribe linux-ext4 in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 24/49] ext4: add block bitmap validation

2008-01-23 Thread Andrew Morton
 On Mon, 21 Jan 2008 22:02:03 -0500 Theodore Ts'o [EMAIL PROTECTED] wrote:
 + if (bh_submit_read(bh)  0) {
 + brelse(bh);
 + ext4_error(sb, __FUNCTION__,
   Cannot read block bitmap - 
 - block_group = %lu, block_bitmap = %llu,
 - block_group, bitmap_blk);
 + block_group = %d, block_bitmap = %llu,
 + (int)block_group, (unsigned long long)bitmap_blk);
 + return NULL;
 + }
 + if (!ext4_valid_block_bitmap(sb, desc, block_group, bh)) {
 + brelse(bh);
 + return NULL;
 + }

brelse() should only be used when the bh might be NULL - put_bh()
can be used here.

Please review all ext4/jbd2 code for this trivial speedup.
-
To unsubscribe from this list: send the line unsubscribe linux-ext4 in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 41/49] ext4: Add multi block allocator for ext4

2008-01-23 Thread Andrew Morton
 On Mon, 21 Jan 2008 22:02:20 -0500 Theodore Ts'o [EMAIL PROTECTED] wrote:
 From: Alex Tomas [EMAIL PROTECTED]
 
 Signed-off-by: Alex Tomas [EMAIL PROTECTED]
 Signed-off-by: Andreas Dilger [EMAIL PROTECTED]
 Signed-off-by: Aneesh Kumar K.V [EMAIL PROTECTED]
 Signed-off-by: Eric Sandeen [EMAIL PROTECTED]
 Signed-off-by: Theodore Ts'o [EMAIL PROTECTED]

 ...

 +#if BITS_PER_LONG == 64
 +#define mb_correct_addr_and_bit(bit, addr)   \
 +{\
 + bit += ((unsigned long) addr  7UL)  3;   \
 + addr = (void *) ((unsigned long) addr  ~7UL);  \
 +}
 +#elif BITS_PER_LONG == 32
 +#define mb_correct_addr_and_bit(bit, addr)   \
 +{\
 + bit += ((unsigned long) addr  3UL)  3;   \
 + addr = (void *) ((unsigned long) addr  ~3UL);  \
 +}
 +#else
 +#error how many bits you are?!
 +#endif

Why do these exist?

 +static inline int mb_test_bit(int bit, void *addr)
 +{
 + mb_correct_addr_and_bit(bit, addr);
 + return ext4_test_bit(bit, addr);
 +}

ext2_test_bit() already handles bitnum  wordsize.

If mb_correct_addr_and_bit() is actually needed then some suitable comment
would help.

 +static inline void mb_set_bit(int bit, void *addr)
 +{
 + mb_correct_addr_and_bit(bit, addr);
 + ext4_set_bit(bit, addr);
 +}
 +
 +static inline void mb_set_bit_atomic(spinlock_t *lock, int bit, void *addr)
 +{
 + mb_correct_addr_and_bit(bit, addr);
 + ext4_set_bit_atomic(lock, bit, addr);
 +}
 +
 +static inline void mb_clear_bit(int bit, void *addr)
 +{
 + mb_correct_addr_and_bit(bit, addr);
 + ext4_clear_bit(bit, addr);
 +}
 +
 +static inline void mb_clear_bit_atomic(spinlock_t *lock, int bit, void *addr)
 +{
 + mb_correct_addr_and_bit(bit, addr);
 + ext4_clear_bit_atomic(lock, bit, addr);
 +}
 +
 +static inline void *mb_find_buddy(struct ext4_buddy *e4b, int order, int 
 *max)

uninlining this will save about eighty squigabytes of text.

Please review all of ext4/jbd2 with a view to removig unnecessary and wrong
inlings.

 +{
 + char *bb;
 +
 + /* FIXME!! is this needed */
 + BUG_ON(EXT4_MB_BITMAP(e4b) == EXT4_MB_BUDDY(e4b));
 + BUG_ON(max == NULL);
 +
 + if (order  e4b-bd_blkbits + 1) {
 + *max = 0;
 + return NULL;
 + }
 +
 + /* at order 0 we see each particular block */
 + *max = 1  (e4b-bd_blkbits + 3);
 + if (order == 0)
 + return EXT4_MB_BITMAP(e4b);
 +
 + bb = EXT4_MB_BUDDY(e4b) + EXT4_SB(e4b-bd_sb)-s_mb_offsets[order];
 + *max = EXT4_SB(e4b-bd_sb)-s_mb_maxs[order];
 +
 + return bb;
 +}
 +

 ...

 +#else
 +#define mb_free_blocks_double(a, b, c, d)
 +#define mb_mark_used_double(a, b, c)
 +#define mb_cmp_bitmaps(a, b)
 +#endif

Please use the do{}while(0) thing.  Or, better, proper C functions which
have typechecking (unless this will cause undefined-var compile errors,
which happens sometimes)

 +/* find most significant bit */
 +static int fmsb(unsigned short word)
 +{
 + int order;
 +
 + if (word  255) {
 + order = 7;
 + word = 8;
 + } else {
 + order = -1;
 + }
 +
 + do {
 + order++;
 + word = 1;
 + } while (word != 0);
 +
 + return order;
 +}

Did we just reinvent fls()?

 +/* FIXME!! need more doc */
 +static void ext4_mb_mark_free_simple(struct super_block *sb,
 + void *buddy, unsigned first, int len,
 + struct ext4_group_info *grp)
 +{
 + struct ext4_sb_info *sbi = EXT4_SB(sb);
 + unsigned short min;
 + unsigned short max;
 + unsigned short chunk;
 + unsigned short border;
 +
 + BUG_ON(len = EXT4_BLOCKS_PER_GROUP(sb));
 +
 + border = 2  sb-s_blocksize_bits;

Won't this explode with = 32k blocksize?

 + while (len  0) {
 + /* find how many blocks can be covered since this position */
 + max = ffs(first | border) - 1;
 +
 + /* find how many blocks of power 2 we need to mark */
 + min = fmsb(len);
 +
 + if (max  min)
 + min = max;
 + chunk = 1  min;
 +
 + /* mark multiblock chunks only */
 + grp-bb_counters[min]++;
 + if (min  0)
 + mb_clear_bit(first  min,
 +  buddy + sbi-s_mb_offsets[min]);
 +
 + len -= chunk;
 + first += chunk;
 + }
 +}
 +
 
 ...

 +static int ext4_mb_init_cache(struct page *page, char *incore)
 +{
 + int blocksize;
 + int blocks_per_page;
 + int groups_per_page;
 + int err = 0;
 + int i;
 + ext4_group_t first_group;
 + int first_block;
 + struct super_block *sb;
 + struct buffer_head *bhs;
 + struct buffer_head **bh;
 + struct inode *inode;
 + char *data;
 + char *bitmap;
 +
 + mb_debug(init page %lu\n, page-index);

Re: [PATCH 30/49] ext4: Convert truncate_mutex to read write semaphore.

2008-01-23 Thread Andrew Morton
 On Mon, 21 Jan 2008 22:02:09 -0500 Theodore Ts'o [EMAIL PROTECTED] wrote:
 +int ext4_get_blocks_wrap(handle_t *handle, struct inode *inode, sector_t 
 block,
 + unsigned long max_blocks, struct buffer_head *bh,
 + int create, int extend_disksize)
 +{
 + int retval;
 + if (create) {
 + down_write((EXT4_I(inode)-i_data_sem));
 + } else {
 + down_read((EXT4_I(inode)-i_data_sem));
 + }
 + if (EXT4_I(inode)-i_flags  EXT4_EXTENTS_FL) {
 + retval =  ext4_ext_get_blocks(handle, inode, block, max_blocks,
 + bh, create, extend_disksize);
 + } else {
 + retval = ext4_get_blocks_handle(handle, inode, block,
 + max_blocks, bh, create, extend_disksize);
 + }
 + if (create) {
 + up_write((EXT4_I(inode)-i_data_sem));
 + } else {
 + up_read((EXT4_I(inode)-i_data_sem));
 + }

This function has many unneeded braces.  checkpatch used to detect this
but it seems to have broken.

 + return retval;
 +}
  static int ext4_get_block(struct inode *inode, sector_t iblock,
   struct buffer_head *bh_result, int create)

Mising newline.
-
To unsubscribe from this list: send the line unsubscribe linux-ext4 in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 36/49] ext4: Add EXT4_IOC_MIGRATE ioctl

2008-01-23 Thread Andrew Morton
 On Mon, 21 Jan 2008 22:02:15 -0500 Theodore Ts'o [EMAIL PROTECTED] wrote:
 The below patch add ioctl for migrating ext3 indirect block mapped inode
 to ext4 extent mapped inode.

This patch adds lots of weird and inexplicable single- and double-newlines
in inappropriate places.  However it frequently forgets to add newlines
between end-of-locals and start-of-code, which is usual practice.


+struct list_blocks_struct {
+   ext4_lblk_t first_block, last_block;
+   ext4_fsblk_t first_pblock, last_pblock;
+};

This structure would benefit from some code comments.
-
To unsubscribe from this list: send the line unsubscribe linux-ext4 in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 41/49] ext4: Add multi block allocator for ext4

2008-01-23 Thread Andreas Dilger
On Jan 23, 2008  14:07 -0800, Andrew Morton wrote:
  +#define mb_correct_addr_and_bit(bit, addr) \
  +{  \
  +   bit += ((unsigned long) addr  3UL)  3;   \
  +   addr = (void *) ((unsigned long) addr  ~3UL);  \
  +}
 
 Why do these exist?

They seem to be a holdover from when mballoc stored the buddy bitmaps
on disk.  That no longer happens (to avoid bitmap vs. buddy consistency
problems), so I suspect they can be removed.

I can't comment on many of the other issues because Alex wrote most
of the code.

 Gosh what a lot of code.  Is it faster?

Yes, and also importantly it uses a lot less CPU to do a given amount
of allocation, which is critical in our environments where there is
very high disk bandwidth on a single node and CPU becomes the limiting
factor of the IO speed.  This of course also helps any write-intensive
environment where the CPU is doing something useful.

Some older test results include:
https://ols2006.108.redhat.com/2007/Reprints/mathur-Reprint.pdf (Section 7)

In the linux-ext4 thread compilebench numbers for ext4:
http://www.mail-archive.com/linux-ext4@vger.kernel.org/msg03834.html

http://oss.oracle.com/~mason/compilebench/ext4/ext-create-compare.png
http://oss.oracle.com/~mason/compilebench/ext4/ext-compile-compare.png
http://oss.oracle.com/~mason/compilebench/ext4/ext-read-compare.png
http://oss.oracle.com/~mason/compilebench/ext4/ext-rm-compare.png

note the ext-read-compare.png graph shows lower read performance, but
a couple of bugs in mballoc were since fixed to have ext4 allocate more
contiguous extents.

In the old linux-ext4 thread [RFC] delayed allocation testing on node zefir
http://www.mail-archive.com/linux-ext4@vger.kernel.org/msg00587.html

: dd2048rw 
: REAL   UTIME  STIME  READWRITTEN DETAILS
EXT3: 58.46  23 1491   25722097292 17 extents
EXT4: 44.56  19 1018   12  2097244 19 extents
REISERFS: 56.80  26 1370   29522097336 457 extents
JFS : 45.77  22 9840   2097216 1 extents
XFS : 50.97  20 1394   0   2100825 7 extents

: kernuntar
: REAL   UTIME  STIME  READWRITTEN DETAILS
EXT3: 56.99  5037   65168  252016  
EXT4: 55.03  5034   55336  249884  
REISERFS: 52.55  4996   85464  238068  
JFS : 70.15  5057   630496 288116  
XFS : 72.84  5052   953132 316798  

: kernstat 
: REAL   UTIME  STIME  READWRITTEN DETAILS
EXT3: 2.83   8  15 58920   
EXT4: 0.51   9  10 58920   
REISERFS: 0.81   7  49 26960   
JFS : 6.19   11 49 12552   0   
XFS : 2.09   9  61 65040   

: kerncat  
: REAL   UTIME  STIME  READWRITTEN DETAILS
EXT3: 9.48   25 213241624  0   
EXT4: 6.29   27 197238560  0   
REISERFS: 14.69  33 230234744  0   
JFS : 23.51  23 231244596  0   
XFS : 18.24  36 254238548  0   

: kernrm   
: REAL   UTIME  STIME  READWRITTEN DETAILS
EXT3: 4.82   4  10896284672
EXT4: 1.61   5  11065364632
REISERFS: 3.15   8  2762768236 
JFS : 33.90  7  16814400   33048   
XFS : 20.03  8  296663286160   


Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.

-
To unsubscribe from this list: send the line unsubscribe linux-ext4 in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


HELLO

2008-01-23 Thread fred ogoja



Hello,


I want to solicit your attention in helping and processing and
securing on my behalf certain funds which i cannot
process because of my position in Government.

The purpose of my contacting you is because I need a foreigner to help me
handle this transaction


When you reply this message,i will send you the full details and more
information about myself and the
funds.

My personal email is : [EMAIL PROTECTED]

Thank you.

Mr Fred Ogoja

-
To unsubscribe from this list: send the line unsubscribe linux-ext4 in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: ext4-online-defrag-free-space-fragmentation.patch uses do_fsync()

2008-01-23 Thread Takashi Sato

Hi,


I was trying to build ext4 as a module, and ran into problems because
the online defrag patch is calling do_fsync() which is *not* an exported
symbol, and so can not be called from a module.

Looking at what the routine is doing, there's no reason to call
do_fsync(), and in fact depending on the journaling mode in use, it may
not force a journal commit, which seems to be the goal of the code.

Hence, I plan to merge the following fix into the the
defrag-free-space-fragmentation patch, unless there are any objections
from Takashi-San or Akira-San.


Thank you for your information and the proposition of the fix.
Your fix is correct, please merge it.


Regards,

- Ted

diff --git a/fs/ext4/defrag.c b/fs/ext4/defrag.c
index 4ef3dc0..19d2cfd 100644
--- a/fs/ext4/defrag.c
+++ b/fs/ext4/defrag.c
@@ -632,8 +632,9 @@ static int ext4_ext_defrag_victim(struct file *target_filp,
 }

 /* Sync journal blocks before reservation */
- if (do_fsync(target_filp, 0)) {
- printk(KERN_ERR defrag: failed do_fsync\n);
+ ret = ext4_force_commit(sb);
+ if (ret) {
+ printk(KERN_ERR defrag: failed do_fsync (%d)\n, ret);
 goto ERR;
 }
 }
-
To unsubscribe from this list: send the line unsubscribe linux-ext4 in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Cheers, Takashi
-
To unsubscribe from this list: send the line unsubscribe linux-ext4 in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 23/49] Add buffer head related helper functions

2008-01-23 Thread Aneesh Kumar K.V
On Wed, Jan 23, 2008 at 02:06:48PM -0800, Andrew Morton wrote:
  On Mon, 21 Jan 2008 22:02:02 -0500 Theodore Ts'o [EMAIL PROTECTED] 
  wrote:
  +}
  +EXPORT_SYMBOL(bh_uptodate_or_lock);
  +/**
 
 Missing newline.
 
  + * bh_submit_read: Submit a locked buffer for reading
  + * @bh: struct buffer_head
  + *
  + * Returns a negative error
  + */
  +int bh_submit_read(struct buffer_head *bh)
  +{
  +   if (!buffer_locked(bh))
  +   lock_buffer(bh);
  +
  +   if (buffer_uptodate(bh))
  +   return 0;
 
 Here it can lock the buffer then return zero
 
  +   get_bh(bh);
  +   bh-b_end_io = end_buffer_read_sync;
  +   submit_bh(READ, bh);
  +   wait_on_buffer(bh);
  +   if (buffer_uptodate(bh))
  +   return 0;
 
 Here it will unlock the buffer and return zero.
 
 This function is unusable when passed an unlocked buffer.
 

Updated patch below.

commit 70d4ca32604e0935a8b9a49c5ac8b9c64c810693
Author: Aneesh Kumar K.V [EMAIL PROTECTED]
Date:   Thu Jan 24 10:50:24 2008 +0530

Add buffer head related helper functions

Add buffer head related helper function bh_uptodate_or_lock and
bh_submit_read which can be used by file system

Signed-off-by: Aneesh Kumar K.V [EMAIL PROTECTED]

diff --git a/fs/buffer.c b/fs/buffer.c
index 7249e01..82aa2db 100644
--- a/fs/buffer.c
+++ b/fs/buffer.c
@@ -3213,6 +3213,53 @@ static int buffer_cpu_notify(struct notifier_block *self,
return NOTIFY_OK;
 }
 
+/**
+ * bh_uptodate_or_lock: Test whether the buffer is uptodate
+ * @bh: struct buffer_head
+ *
+ * Return true if the buffer is up-to-date and false,
+ * with the buffer locked, if not.
+ */
+int bh_uptodate_or_lock(struct buffer_head *bh)
+{
+   if (!buffer_uptodate(bh)) {
+   lock_buffer(bh);
+   if (!buffer_uptodate(bh))
+   return 0;
+   unlock_buffer(bh);
+   }
+   return 1;
+}
+EXPORT_SYMBOL(bh_uptodate_or_lock);
+
+/**
+ * bh_submit_read: Submit a locked buffer for reading
+ * @bh: struct buffer_head
+ *
+ * Returns zero on success and -EIO on error.If the input
+ * buffer is not locked returns -EINVAL
+ *
+ */
+int bh_submit_read(struct buffer_head *bh)
+{
+   if (!buffer_locked(bh))
+   return -EINVAL;
+
+   if (buffer_uptodate(bh)) {
+   unlock_buffer(bh);
+   return 0;
+   }
+
+   get_bh(bh);
+   bh-b_end_io = end_buffer_read_sync;
+   submit_bh(READ, bh);
+   wait_on_buffer(bh);
+   if (buffer_uptodate(bh))
+   return 0;
+   return -EIO;
+}
+EXPORT_SYMBOL(bh_submit_read);
+
 void __init buffer_init(void)
 {
int nrpages;
diff --git a/include/linux/buffer_head.h b/include/linux/buffer_head.h
index da0d83f..e98801f 100644
--- a/include/linux/buffer_head.h
+++ b/include/linux/buffer_head.h
@@ -192,6 +192,8 @@ int sync_dirty_buffer(struct buffer_head *bh);
 int submit_bh(int, struct buffer_head *);
 void write_boundary_block(struct block_device *bdev,
sector_t bblock, unsigned blocksize);
+int bh_uptodate_or_lock(struct buffer_head *bh);
+int bh_submit_read(struct buffer_head *bh);
 
 extern int buffer_heads_over_limit;
 
-
To unsubscribe from this list: send the line unsubscribe linux-ext4 in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [CALL FOR TESTING] Make Ext3 fsck way faster [2.6.24-rc6 -mm patch]

2008-01-23 Thread Andrew Morton
 On Wed, 23 Jan 2008 04:12:16 -0500 Abhishek Rai [EMAIL PROTECTED] wrote:
  I'm wondering about the interaction between this code and the
  buffer_boundary() logic.  I guess we should disable the buffer_boundary()
  handling when this code is in effect.  Have you reviewed and tested that
  aspect?
 
 Thanks for pointing this out, I had totally missed this issue in my change. 
 I've now made the call to set_buffer_boundary() in ext3_get_blocks_handle() 
 subject to metacluster option being set.
 

Did it make any performance difference?  iirc the buffer_boundary stuff was
worth around 10% on a single linear read of a large, well-laid-out file.
-
To unsubscribe from this list: send the line unsubscribe linux-ext4 in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 41/49] ext4: Add multi block allocator for ext4

2008-01-23 Thread Aneesh Kumar K.V
On Wed, Jan 23, 2008 at 02:07:27PM -0800, Andrew Morton wrote:
  On Mon, 21 Jan 2008 22:02:20 -0500 Theodore Ts'o [EMAIL PROTECTED] 
  wrote:
  From: Alex Tomas [EMAIL PROTECTED]
  
  Signed-off-by: Alex Tomas [EMAIL PROTECTED]
  Signed-off-by: Andreas Dilger [EMAIL PROTECTED]
  Signed-off-by: Aneesh Kumar K.V [EMAIL PROTECTED]
  Signed-off-by: Eric Sandeen [EMAIL PROTECTED]
  Signed-off-by: Theodore Ts'o [EMAIL PROTECTED]
 
  ...
 
  +#if BITS_PER_LONG == 64
  +#define mb_correct_addr_and_bit(bit, addr) \
  +{  \
  +   bit += ((unsigned long) addr  7UL)  3;   \
  +   addr = (void *) ((unsigned long) addr  ~7UL);  \
  +}
  +#elif BITS_PER_LONG == 32
  +#define mb_correct_addr_and_bit(bit, addr) \
  +{  \
  +   bit += ((unsigned long) addr  3UL)  3;   \
  +   addr = (void *) ((unsigned long) addr  ~3UL);  \
  +}
  +#else
  +#error how many bits you are?!
  +#endif
 
 Why do these exist?

Initial version on mballoc supported on x86 32 this was there to give
compile warning on 64 bit platform. I guess we can remove that now.
Or may be we can keep it as such because it is harmless.


 
  +static inline int mb_test_bit(int bit, void *addr)
  +{
  +   mb_correct_addr_and_bit(bit, addr);
  +   return ext4_test_bit(bit, addr);
  +}
 
 ext2_test_bit() already handles bitnum  wordsize.
 
 If mb_correct_addr_and_bit() is actually needed then some suitable comment
 would help.

ext4_test_bit on powerpc needs the addr to be 8 byte aligned. Othewise
it fails

 
  +static inline void mb_set_bit(int bit, void *addr)
  +{
  +   mb_correct_addr_and_bit(bit, addr);
  +   ext4_set_bit(bit, addr);
  +}
  +
  +static inline void mb_set_bit_atomic(spinlock_t *lock, int bit, void *addr)
  +{
  +   mb_correct_addr_and_bit(bit, addr);
  +   ext4_set_bit_atomic(lock, bit, addr);
  +}
  +
  +static inline void mb_clear_bit(int bit, void *addr)
  +{
  +   mb_correct_addr_and_bit(bit, addr);
  +   ext4_clear_bit(bit, addr);
  +}
  +
  +static inline void mb_clear_bit_atomic(spinlock_t *lock, int bit, void 
  *addr)
  +{
  +   mb_correct_addr_and_bit(bit, addr);
  +   ext4_clear_bit_atomic(lock, bit, addr);
  +}
  +
  +static inline void *mb_find_buddy(struct ext4_buddy *e4b, int order, int 
  *max)
 
 uninlining this will save about eighty squigabytes of text.

Fixed


 
 Please review all of ext4/jbd2 with a view to removig unnecessary and wrong
 inlings.
 
  +{
  +   char *bb;
  +
  +   /* FIXME!! is this needed */
  +   BUG_ON(EXT4_MB_BITMAP(e4b) == EXT4_MB_BUDDY(e4b));
  +   BUG_ON(max == NULL);
  +
  +   if (order  e4b-bd_blkbits + 1) {
  +   *max = 0;
  +   return NULL;
  +   }
  +
  +   /* at order 0 we see each particular block */
  +   *max = 1  (e4b-bd_blkbits + 3);
  +   if (order == 0)
  +   return EXT4_MB_BITMAP(e4b);
  +
  +   bb = EXT4_MB_BUDDY(e4b) + EXT4_SB(e4b-bd_sb)-s_mb_offsets[order];
  +   *max = EXT4_SB(e4b-bd_sb)-s_mb_maxs[order];
  +
  +   return bb;
  +}
  +
 
  ...
 
  +#else
  +#define mb_free_blocks_double(a, b, c, d)
  +#define mb_mark_used_double(a, b, c)
  +#define mb_cmp_bitmaps(a, b)
  +#endif
 
 Please use the do{}while(0) thing.  Or, better, proper C functions which
 have typechecking (unless this will cause undefined-var compile errors,
 which happens sometimes)

makde static inline void.

 
  +/* find most significant bit */
  +static int fmsb(unsigned short word)
  +{
  +   int order;
  +
  +   if (word  255) {
  +   order = 7;
  +   word = 8;
  +   } else {
  +   order = -1;
  +   }
  +
  +   do {
  +   order++;
  +   word = 1;
  +   } while (word != 0);
  +
  +   return order;
  +}
 
 Did we just reinvent fls()?

replaced by fls.

 
  +/* FIXME!! need more doc */
  +static void ext4_mb_mark_free_simple(struct super_block *sb,
  +   void *buddy, unsigned first, int len,
  +   struct ext4_group_info *grp)
  +{
  +   struct ext4_sb_info *sbi = EXT4_SB(sb);
  +   unsigned short min;
  +   unsigned short max;
  +   unsigned short chunk;
  +   unsigned short border;
  +
  +   BUG_ON(len = EXT4_BLOCKS_PER_GROUP(sb));
  +
  +   border = 2  sb-s_blocksize_bits;
 
 Won't this explode with = 32k blocksize?
 
  +   while (len  0) {
  +   /* find how many blocks can be covered since this position */
  +   max = ffs(first | border) - 1;
  +
  +   /* find how many blocks of power 2 we need to mark */
  +   min = fmsb(len);
  +
  +   if (max  min)
  +   min = max;
  +   chunk = 1  min;
  +
  +   /* mark multiblock chunks only */
  +   grp-bb_counters[min]++;
  +   if (min  0)
  +   mb_clear_bit(first  min,
  +buddy + sbi-s_mb_offsets[min]);
  +
  +   len -= chunk;
  +   first += chunk;
  +   }
  +}
  +
  
  ...
 
  +static