Confused by journaling code in ext3_new_blocks()
Hi, I have the following question related to the journaling code in ext3_new_blocks() function of fs/ext3/balloc.c, any help in this regard will be greatly appreciated. The code snippet below is taken from 2.6.24-rc8-mm1 but this code has changed little for a long time so it's likely that any other recent kernel will also have very similar code. ext3_new_blocks() - ext3_try_to_allocate_with_rsv() - ext3_journal_dirty_metadata(handle, bitmap_bh) ... allocated: if ( /*sanity checking of newly allocated block numbers*/) { ext3_error(...); goto out; } out: ... brelse(bitmap_bh); return 0; In the above code snippet, ext3_try_to_allocate_with_rsv() is marking bitmap_bh as dirty metadata _before_ the sanity checks. I'm not worried about the sanity checks as such, but it seems to me that if the checks were to fail, we'd be committing the bitmap_bh updates to disk even though it violates metadata sanity, or am I missing something here ? Thanks, Abhishek - To unsubscribe from this list: send the line unsubscribe linux-ext4 in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: regression: 100% io-wait with 2.6.24-rcX
- Original Message From: Alasdair G Kergon [EMAIL PROTECTED] To: Martin Knoblauch [EMAIL PROTECTED] Cc: Linus Torvalds [EMAIL PROTECTED]; Mel Gorman [EMAIL PROTECTED]; Fengguang Wu [EMAIL PROTECTED]; Mike Snitzer [EMAIL PROTECTED]; Peter Zijlstra [EMAIL PROTECTED]; [EMAIL PROTECTED]; Ingo Molnar [EMAIL PROTECTED]; [EMAIL PROTECTED]; linux-ext4@vger.kernel.org linux-ext4@vger.kernel.org; [EMAIL PROTECTED]; Jens Axboe [EMAIL PROTECTED]; Milan Broz [EMAIL PROTECTED]; Neil Brown [EMAIL PROTECTED] Sent: Wednesday, January 23, 2008 12:40:52 AM Subject: Re: regression: 100% io-wait with 2.6.24-rcX On Tue, Jan 22, 2008 at 07:25:15AM -0800, Martin Knoblauch wrote: [EMAIL PROTECTED] ~]# dmsetup table VolGroup00-LogVol02: 0 350945280 linear 104:2 67109248 VolGroup00-LogVol01: 0 8388608 linear 104:2 418054528 VolGroup00-LogVol00: 0 67108864 linear 104:2 384 The IO should pass straight through simple linear targets like that without needing to get broken up, so I wouldn't expect those patches to make any difference in this particular case. Alasdair, LVM/DM are off the hook :-) I converted one box to direct using partitions and the performance is the same disappointment as with LVM/DM. Thanks anyway for looking at my problem. I will move the discussion now to a new thread, targetting CCISS directly. Cheers Martin - To unsubscribe from this list: send the line unsubscribe linux-ext4 in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
ext4-online-defrag-free-space-fragmentation.patch uses do_fsync()
I was trying to build ext4 as a module, and ran into problems because the online defrag patch is calling do_fsync() which is *not* an exported symbol, and so can not be called from a module. Looking at what the routine is doing, there's no reason to call do_fsync(), and in fact depending on the journaling mode in use, it may not force a journal commit, which seems to be the goal of the code. Hence, I plan to merge the following fix into the the defrag-free-space-fragmentation patch, unless there are any objections from Takashi-San or Akira-San. Regards, - Ted diff --git a/fs/ext4/defrag.c b/fs/ext4/defrag.c index 4ef3dc0..19d2cfd 100644 --- a/fs/ext4/defrag.c +++ b/fs/ext4/defrag.c @@ -632,8 +632,9 @@ static int ext4_ext_defrag_victim(struct file *target_filp, } /* Sync journal blocks before reservation */ - if (do_fsync(target_filp, 0)) { - printk(KERN_ERR defrag: failed do_fsync\n); + ret = ext4_force_commit(sb); + if (ret) { + printk(KERN_ERR defrag: failed do_fsync (%d)\n, ret); goto ERR; } } - To unsubscribe from this list: send the line unsubscribe linux-ext4 in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: ext4-online-defrag-free-space-fragmentation.patch uses do_fsync()
diff --git a/fs/ext4/defrag.c b/fs/ext4/defrag.c index 4ef3dc0..19d2cfd 100644 --- a/fs/ext4/defrag.c +++ b/fs/ext4/defrag.c @@ -632,8 +632,9 @@ static int ext4_ext_defrag_victim(struct file *target_filp, } /* Sync journal blocks before reservation */ - if (do_fsync(target_filp, 0)) { - printk(KERN_ERR defrag: failed do_fsync\n); + ret = ext4_force_commit(sb); + if (ret) { + printk(KERN_ERR defrag: failed do_fsync (%d)\n, ret); I'd think you'd want to change the printk text as well. defrag: failed ext4_force_commit (%d)\n maybe? goto ERR; } } -- David Kleikamp IBM Linux Technology Center - To unsubscribe from this list: send the line unsubscribe linux-ext4 in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: ext4-online-defrag-free-space-fragmentation.patch uses do_fsync()
On Wed, Jan 23, 2008 at 08:10:20AM -0600, Dave Kleikamp wrote: I'd think you'd want to change the printk text as well. defrag: failed ext4_force_commit (%d)\n maybe? mmm, good catch, thanks. - Ted - To unsubscribe from this list: send the line unsubscribe linux-ext4 in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH, RFC] Add new development flag to the ext4 filesystem
On Tue, Jan 22, 2008 at 09:55:37PM -0600, Eric Sandeen wrote: Overall, seems ok. One other question though, when ext4 is a fully-fledged production filesystem, and the flag requirement is gone, what stops ext3 filesystems from being silently mounted as ext4, just as happened with ext4dev today? Nothing will prevent that, but in the long term we don't want ext4 to be automatically adding new features flags anyway. The only reason why we were doing that was to encourage more people to test out ext4, and for the convenience of the ABAT auto-testing. So I'm assuming that before we remove this test, we will also be fixing some of the automatic enablement of extents, etc., because that sort of thing will be moved into e2fsprogs as part of tune2fs -O ext4 or mke2fs -O ext4 or mkfs.ext4. If we do that, then the only downside of having ext3 filesystems run under ext4 is the test matrix concern. Since I'm still hoping that some point in the future, fs/ext4 could subsume fs/ext3 so we don't have to worry about bug fixes going into fs/ext2 AND fs/ext3 AND fs/ext4, I have my own reasons for wanting that. But I do understand the concerns that maybe in the short term some distro's don't want to do that. So in that case I could see adding a you must have extents test into ext4, if I distro has specific support concerns. But for people who are running mainline kernel, I think it's actually a *good* thing if fs/ext4 can mount and read and write to an ext3 filesystem --- as long as it doesn't automatically turn on features behind the user's back. + \ttestfs\n)); help text doesn't match reality here, missing a _ Oops, thanks for catching that. - Ted - To unsubscribe from this list: send the line unsubscribe linux-ext4 in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH, RFC] Add new development flag to the ext4 filesystem
On Wed, Jan 23, 2008 at 11:04:35AM -0600, Eric Sandeen wrote: I just think that ext4.ko running ext3 filesystems needs to be under explicit control, and not something that happens, occasionally, accidentally, without the user/administrator requesting it. Least surprise, and all that... Well, most of the time that will happen, given that /etc/fstab explicitly states which filesystem to use; and if the user doesn't specify a filesystme type, mount will use blkid/vol_id, and if that says ext3, then ext3 will be used. The only time where it might not happen without an explicit administrator request is on the root filesystem automount and if you are using an initrd to mount the filesystem (as all of the enterprise distros now do anyway), that could be easily controlled form userspace as well. - Ted - To unsubscribe from this list: send the line unsubscribe linux-ext4 in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] Fix commit block write in JBD
Hi, the patch below fixes preparation of commit block in journal_write_commit_record(). Obviously the bug doesn't really matter since nobody reported it so far but let's cleanup the code... Andrew, could you please queue it up? Thanks. Honza -- Jan Kara [EMAIL PROTECTED] SUSE Labs, CR --- Commit block is expected to have several copies of the header. Fix the bug Andrew has spotted ages ago. Signed-off-by: Jan Kara [EMAIL PROTECTED] diff --git a/fs/jbd/commit.c b/fs/jbd/commit.c index 610264b..a69b240 100644 --- a/fs/jbd/commit.c +++ b/fs/jbd/commit.c @@ -116,9 +116,8 @@ static int journal_write_commit_record(journal_t *journal, bh = jh2bh(descriptor); - /* AKPM: buglet - add `i' to tmp! */ for (i = 0; i bh-b_size; i += 512) { - journal_header_t *tmp = (journal_header_t*)bh-b_data; + journal_header_t *tmp = (journal_header_t*)(bh-b_data+i); tmp-h_magic = cpu_to_be32(JFS_MAGIC_NUMBER); tmp-h_blocktype = cpu_to_be32(JFS_COMMIT_BLOCK); tmp-h_sequence = cpu_to_be32(commit_transaction-t_tid); - To unsubscribe from this list: send the line unsubscribe linux-ext4 in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] Fix commit block write in JBD2
On Wed 23-01-08 20:09:43, Jan Kara wrote: Hi, the patch below fixes preparation of commit block in journal_write_commit_record(). Obviously the bug doesn't really matter since nobody reported it so far but let's cleanup the code... Andrew, could you please queue it up? Thanks. And the same fix for JBD2. Honza -- Jan Kara [EMAIL PROTECTED] SUSE Labs, CR --- Commit block is expected to have several copies of the header. Fix the bug Andrew has spotted ages ago. Signed-off-by: Jan Kara [EMAIL PROTECTED] diff --git a/fs/jbd2/commit.c b/fs/jbd2/commit.c index 6986f33..ed61283 100644 --- a/fs/jbd2/commit.c +++ b/fs/jbd2/commit.c @@ -116,9 +116,8 @@ static int journal_write_commit_record(journal_t *journal, bh = jh2bh(descriptor); - /* AKPM: buglet - add `i' to tmp! */ for (i = 0; i bh-b_size; i += 512) { - journal_header_t *tmp = (journal_header_t*)bh-b_data; + journal_header_t *tmp = (journal_header_t*)(bh-b_data+i); tmp-h_magic = cpu_to_be32(JBD2_MAGIC_NUMBER); tmp-h_blocktype = cpu_to_be32(JBD2_COMMIT_BLOCK); tmp-h_sequence = cpu_to_be32(commit_transaction-t_tid); - To unsubscribe from this list: send the line unsubscribe linux-ext4 in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Confused by journaling code in ext3_new_blocks()
Hello, I have the following question related to the journaling code in ext3_new_blocks() function of fs/ext3/balloc.c, any help in this regard will be greatly appreciated. The code snippet below is taken from 2.6.24-rc8-mm1 but this code has changed little for a long time so it's likely that any other recent kernel will also have very similar code. ext3_new_blocks() - ext3_try_to_allocate_with_rsv() - ext3_journal_dirty_metadata(handle, bitmap_bh) ... allocated: if ( /*sanity checking of newly allocated block numbers*/) { ext3_error(...); goto out; } out: ... brelse(bitmap_bh); return 0; In the above code snippet, ext3_try_to_allocate_with_rsv() is marking bitmap_bh as dirty metadata _before_ the sanity checks. I'm not worried about the sanity checks as such, but it seems to me that if the checks were to fail, we'd be committing the bitmap_bh updates to disk even though it violates metadata sanity, or am I missing something here ? Actually, it depends. Because if sanity checks fail, we call ext3_error() which remounts fs R/O or panics in most cases. Only if errors=continue is set, we really get to committing the wrong bitmap data. But I don't think this really is a problem - fsck has to be run anyway and it rebuilds the bitmaps from scratch. Honza -- Jan Kara [EMAIL PROTECTED] SuSE CR Labs - To unsubscribe from this list: send the line unsubscribe linux-ext4 in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] Fix commit block write in JBD2
Hi, The loop was removed in journal checksum patch. There had been a discussion (11 Jul 2007: [EXT4 set 8][PATCH 1/1]Add journal checksums) about this part of code as checksum info was added to commit block. Regards, Girish On Wed, 2008-01-23 at 20:10 +0100, Jan Kara wrote: On Wed 23-01-08 20:09:43, Jan Kara wrote: Hi, the patch below fixes preparation of commit block in journal_write_commit_record(). Obviously the bug doesn't really matter since nobody reported it so far but let's cleanup the code... Andrew, could you please queue it up? Thanks. And the same fix for JBD2. Honza - To unsubscribe from this list: send the line unsubscribe linux-ext4 in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH, RFC] Add new development flag to the ext4 filesystem
On Jan 23, 2008 11:53 -0500, Theodore Tso wrote: Since I'm still hoping that some point in the future, fs/ext4 could subsume fs/ext3 so we don't have to worry about bug fixes going into fs/ext2 AND fs/ext3 AND fs/ext4, I have my own reasons for wanting that. If any newbie kernel hacker wants a filesystem project, allowing ext4 to mount ext2 filesystems w/o a journal would be very useful. I suspect that a simple flag check in the ext4_journal_* wrappers of the jbd2 functions would be enough in many cases. One of the reasons to keep ext2 around is that ext3 cannot mount the filesystem without a journal, and removing that limitation for ext4 would bring us one step closer to removing a ton of duplicate code. Another reason for ext2 vs. ext3 was overhead from journaling, and that could also be removed by allowing ext4 to mount ext2 filesystems w/o a journal. Maybe a good proposal for a Google Summer-of-Code project. Cheers, Andreas -- Andreas Dilger Sr. Staff Engineer, Lustre Group Sun Microsystems of Canada, Inc. - To unsubscribe from this list: send the line unsubscribe linux-ext4 in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] Fix commit block write in JBD2
Hi, On Thu 24-01-08 02:48:41, Girish Shilamkar wrote: The loop was removed in journal checksum patch. There had been a discussion (11 Jul 2007: [EXT4 set 8][PATCH 1/1]Add journal checksums) about this part of code as checksum info was added to commit block. Ah, OK, thanks for explanation. So for JBD2 we don't need the patch anymore. But for JBD it's still needed. Honza On Wed, 2008-01-23 at 20:10 +0100, Jan Kara wrote: On Wed 23-01-08 20:09:43, Jan Kara wrote: Hi, the patch below fixes preparation of commit block in journal_write_commit_record(). Obviously the bug doesn't really matter since nobody reported it so far but let's cleanup the code... Andrew, could you please queue it up? Thanks. And the same fix for JBD2. Honza -- Jan Kara [EMAIL PROTECTED] SUSE Labs, CR - To unsubscribe from this list: send the line unsubscribe linux-ext4 in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 33/49] ext4: Add the journal checksum feature
On Mon, 21 Jan 2008 22:02:12 -0500 Theodore Ts'o [EMAIL PROTECTED] wrote: From: Girish Shilamkar [EMAIL PROTECTED] The journal checksum feature adds two new flags i.e JBD2_FEATURE_INCOMPAT_ASYNC_COMMIT and JBD2_FEATURE_COMPAT_CHECKSUM. JBD2_FEATURE_CHECKSUM flag indicates that the commit block contains the checksum for the blocks described by the descriptor blocks. Due to checksums, writing of the commit record no longer needs to be synchronous. Now commit record can be sent to disk without waiting for descriptor blocks to be written to disk. This behavior is controlled using JBD2_FEATURE_ASYNC_COMMIT flag. Older kernels/e2fsck should not be able to recover the journal with _ASYNC_COMMIT hence it is made incompat. The commit header has been extended to hold the checksum along with the type of the checksum. For recovery in pass scan checksums are verified to ensure the sanity and completeness(in case of _ASYNC_COMMIT) of every transaction. ... +static inline __u32 jbd2_checksum_data(__u32 crc32_sum, struct buffer_head *bh) unneeded inlining. +{ + struct page *page = bh-b_page; + char *addr; + __u32 checksum; + + addr = kmap_atomic(page, KM_USER0); + checksum = crc32_be(crc32_sum, + (void *)(addr + offset_in_page(bh-b_data)), bh-b_size); + kunmap_atomic(addr, KM_USER0); + + return checksum; +} Can this buffer actually be in highmem? static inline void write_tag_block(int tag_bytes, journal_block_tag_t *tag, unsigned long long block) More unnecessary inlining. +/* + * jbd2_journal_clear_features () - Clear a given journal feature in the + * superblock + * @journal: Journal to act on. + * @compat: bitmask of compatible features + * @ro: bitmask of features that force read-only mount + * @incompat: bitmask of incompatible features + * + * Clear a given journal feature as present on the + * superblock. Returns true if the requested features could be reset. + */ +int jbd2_journal_clear_features(journal_t *journal, unsigned long compat, + unsigned long ro, unsigned long incompat) +{ + journal_superblock_t *sb; + + jbd_debug(1, Clear features 0x%lx/0x%lx/0x%lx\n, + compat, ro, incompat); + + sb = journal-j_superblock; + + sb-s_feature_compat= ~cpu_to_be32(compat); + sb-s_feature_ro_compat = ~cpu_to_be32(ro); + sb-s_feature_incompat = ~cpu_to_be32(incompat); + + return 1; +} +EXPORT_SYMBOL(jbd2_journal_clear_features); Kernel usually returns 0 on success. So we can return a useful errno on failure. +/* + * calc_chksums calculates the checksums for the blocks described in the + * descriptor block. + */ +static int calc_chksums(journal_t *journal, struct buffer_head *bh, + unsigned long *next_log_block, __u32 *crc32_sum) +{ + int i, num_blks, err; + unsigned io_block; + struct buffer_head *obh; + + num_blks = count_tags(journal, bh); + /* Calculate checksum of the descriptor block. */ + *crc32_sum = crc32_be(*crc32_sum, (void *)bh-b_data, bh-b_size); + + for (i = 0; i num_blks; i++) { + io_block = (*next_log_block)++; unsigned - unsigned long. Are all the types appropriate in here? + wrap(journal, *next_log_block); + err = jread(obh, journal, io_block); + if (err) { + printk(KERN_ERR JBD: IO error %d recovering block + %u in log\n, err, io_block); + return 1; + } else { + *crc32_sum = crc32_be(*crc32_sum, (void *)obh-b_data, + obh-b_size); + } + } + return 0; +} + static int do_one_pass(journal_t *journal, struct recovery_info *info, enum passtype pass) { @@ -328,6 +360,7 @@ static int do_one_pass(journal_t *journal, unsigned intsequence; int blocktype; int tag_bytes = journal_tag_bytes(journal); + __u32 crc32_sum = ~0; /* Transactional Checksums */ /* Precompute the maximum metadata descriptors in a descriptor block */ int MAX_BLOCKS_PER_DESC; @@ -419,9 +452,23 @@ static int do_one_pass(journal_t *journal, switch(blocktype) { case JBD2_DESCRIPTOR_BLOCK: /* If it is a valid descriptor block, replay it - * in pass REPLAY; otherwise, just skip over the - * blocks it describes. */ + * in pass REPLAY; if journal_checksums enabled, then + * calculate checksums in PASS_SCAN, otherwise, + * just skip over the blocks it describes. */ if (pass !=
Re: [PATCH 23/49] Add buffer head related helper functions
On Mon, 21 Jan 2008 22:02:02 -0500 Theodore Ts'o [EMAIL PROTECTED] wrote: +} +EXPORT_SYMBOL(bh_uptodate_or_lock); +/** Missing newline. + * bh_submit_read: Submit a locked buffer for reading + * @bh: struct buffer_head + * + * Returns a negative error + */ +int bh_submit_read(struct buffer_head *bh) +{ + if (!buffer_locked(bh)) + lock_buffer(bh); + + if (buffer_uptodate(bh)) + return 0; Here it can lock the buffer then return zero + get_bh(bh); + bh-b_end_io = end_buffer_read_sync; + submit_bh(READ, bh); + wait_on_buffer(bh); + if (buffer_uptodate(bh)) + return 0; Here it will unlock the buffer and return zero. This function is unusable when passed an unlocked buffer. The return value should (always) be documented. + return -EIO; +} +EXPORT_SYMBOL(bh_submit_read); void __init buffer_init(void) Missing newline. - To unsubscribe from this list: send the line unsubscribe linux-ext4 in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 24/49] ext4: add block bitmap validation
On Mon, 21 Jan 2008 22:02:03 -0500 Theodore Ts'o [EMAIL PROTECTED] wrote: + if (bh_submit_read(bh) 0) { + brelse(bh); + ext4_error(sb, __FUNCTION__, Cannot read block bitmap - - block_group = %lu, block_bitmap = %llu, - block_group, bitmap_blk); + block_group = %d, block_bitmap = %llu, + (int)block_group, (unsigned long long)bitmap_blk); + return NULL; + } + if (!ext4_valid_block_bitmap(sb, desc, block_group, bh)) { + brelse(bh); + return NULL; + } brelse() should only be used when the bh might be NULL - put_bh() can be used here. Please review all ext4/jbd2 code for this trivial speedup. - To unsubscribe from this list: send the line unsubscribe linux-ext4 in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 41/49] ext4: Add multi block allocator for ext4
On Mon, 21 Jan 2008 22:02:20 -0500 Theodore Ts'o [EMAIL PROTECTED] wrote: From: Alex Tomas [EMAIL PROTECTED] Signed-off-by: Alex Tomas [EMAIL PROTECTED] Signed-off-by: Andreas Dilger [EMAIL PROTECTED] Signed-off-by: Aneesh Kumar K.V [EMAIL PROTECTED] Signed-off-by: Eric Sandeen [EMAIL PROTECTED] Signed-off-by: Theodore Ts'o [EMAIL PROTECTED] ... +#if BITS_PER_LONG == 64 +#define mb_correct_addr_and_bit(bit, addr) \ +{\ + bit += ((unsigned long) addr 7UL) 3; \ + addr = (void *) ((unsigned long) addr ~7UL); \ +} +#elif BITS_PER_LONG == 32 +#define mb_correct_addr_and_bit(bit, addr) \ +{\ + bit += ((unsigned long) addr 3UL) 3; \ + addr = (void *) ((unsigned long) addr ~3UL); \ +} +#else +#error how many bits you are?! +#endif Why do these exist? +static inline int mb_test_bit(int bit, void *addr) +{ + mb_correct_addr_and_bit(bit, addr); + return ext4_test_bit(bit, addr); +} ext2_test_bit() already handles bitnum wordsize. If mb_correct_addr_and_bit() is actually needed then some suitable comment would help. +static inline void mb_set_bit(int bit, void *addr) +{ + mb_correct_addr_and_bit(bit, addr); + ext4_set_bit(bit, addr); +} + +static inline void mb_set_bit_atomic(spinlock_t *lock, int bit, void *addr) +{ + mb_correct_addr_and_bit(bit, addr); + ext4_set_bit_atomic(lock, bit, addr); +} + +static inline void mb_clear_bit(int bit, void *addr) +{ + mb_correct_addr_and_bit(bit, addr); + ext4_clear_bit(bit, addr); +} + +static inline void mb_clear_bit_atomic(spinlock_t *lock, int bit, void *addr) +{ + mb_correct_addr_and_bit(bit, addr); + ext4_clear_bit_atomic(lock, bit, addr); +} + +static inline void *mb_find_buddy(struct ext4_buddy *e4b, int order, int *max) uninlining this will save about eighty squigabytes of text. Please review all of ext4/jbd2 with a view to removig unnecessary and wrong inlings. +{ + char *bb; + + /* FIXME!! is this needed */ + BUG_ON(EXT4_MB_BITMAP(e4b) == EXT4_MB_BUDDY(e4b)); + BUG_ON(max == NULL); + + if (order e4b-bd_blkbits + 1) { + *max = 0; + return NULL; + } + + /* at order 0 we see each particular block */ + *max = 1 (e4b-bd_blkbits + 3); + if (order == 0) + return EXT4_MB_BITMAP(e4b); + + bb = EXT4_MB_BUDDY(e4b) + EXT4_SB(e4b-bd_sb)-s_mb_offsets[order]; + *max = EXT4_SB(e4b-bd_sb)-s_mb_maxs[order]; + + return bb; +} + ... +#else +#define mb_free_blocks_double(a, b, c, d) +#define mb_mark_used_double(a, b, c) +#define mb_cmp_bitmaps(a, b) +#endif Please use the do{}while(0) thing. Or, better, proper C functions which have typechecking (unless this will cause undefined-var compile errors, which happens sometimes) +/* find most significant bit */ +static int fmsb(unsigned short word) +{ + int order; + + if (word 255) { + order = 7; + word = 8; + } else { + order = -1; + } + + do { + order++; + word = 1; + } while (word != 0); + + return order; +} Did we just reinvent fls()? +/* FIXME!! need more doc */ +static void ext4_mb_mark_free_simple(struct super_block *sb, + void *buddy, unsigned first, int len, + struct ext4_group_info *grp) +{ + struct ext4_sb_info *sbi = EXT4_SB(sb); + unsigned short min; + unsigned short max; + unsigned short chunk; + unsigned short border; + + BUG_ON(len = EXT4_BLOCKS_PER_GROUP(sb)); + + border = 2 sb-s_blocksize_bits; Won't this explode with = 32k blocksize? + while (len 0) { + /* find how many blocks can be covered since this position */ + max = ffs(first | border) - 1; + + /* find how many blocks of power 2 we need to mark */ + min = fmsb(len); + + if (max min) + min = max; + chunk = 1 min; + + /* mark multiblock chunks only */ + grp-bb_counters[min]++; + if (min 0) + mb_clear_bit(first min, + buddy + sbi-s_mb_offsets[min]); + + len -= chunk; + first += chunk; + } +} + ... +static int ext4_mb_init_cache(struct page *page, char *incore) +{ + int blocksize; + int blocks_per_page; + int groups_per_page; + int err = 0; + int i; + ext4_group_t first_group; + int first_block; + struct super_block *sb; + struct buffer_head *bhs; + struct buffer_head **bh; + struct inode *inode; + char *data; + char *bitmap; + + mb_debug(init page %lu\n, page-index);
Re: [PATCH 30/49] ext4: Convert truncate_mutex to read write semaphore.
On Mon, 21 Jan 2008 22:02:09 -0500 Theodore Ts'o [EMAIL PROTECTED] wrote: +int ext4_get_blocks_wrap(handle_t *handle, struct inode *inode, sector_t block, + unsigned long max_blocks, struct buffer_head *bh, + int create, int extend_disksize) +{ + int retval; + if (create) { + down_write((EXT4_I(inode)-i_data_sem)); + } else { + down_read((EXT4_I(inode)-i_data_sem)); + } + if (EXT4_I(inode)-i_flags EXT4_EXTENTS_FL) { + retval = ext4_ext_get_blocks(handle, inode, block, max_blocks, + bh, create, extend_disksize); + } else { + retval = ext4_get_blocks_handle(handle, inode, block, + max_blocks, bh, create, extend_disksize); + } + if (create) { + up_write((EXT4_I(inode)-i_data_sem)); + } else { + up_read((EXT4_I(inode)-i_data_sem)); + } This function has many unneeded braces. checkpatch used to detect this but it seems to have broken. + return retval; +} static int ext4_get_block(struct inode *inode, sector_t iblock, struct buffer_head *bh_result, int create) Mising newline. - To unsubscribe from this list: send the line unsubscribe linux-ext4 in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 36/49] ext4: Add EXT4_IOC_MIGRATE ioctl
On Mon, 21 Jan 2008 22:02:15 -0500 Theodore Ts'o [EMAIL PROTECTED] wrote: The below patch add ioctl for migrating ext3 indirect block mapped inode to ext4 extent mapped inode. This patch adds lots of weird and inexplicable single- and double-newlines in inappropriate places. However it frequently forgets to add newlines between end-of-locals and start-of-code, which is usual practice. +struct list_blocks_struct { + ext4_lblk_t first_block, last_block; + ext4_fsblk_t first_pblock, last_pblock; +}; This structure would benefit from some code comments. - To unsubscribe from this list: send the line unsubscribe linux-ext4 in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 41/49] ext4: Add multi block allocator for ext4
On Jan 23, 2008 14:07 -0800, Andrew Morton wrote: +#define mb_correct_addr_and_bit(bit, addr) \ +{ \ + bit += ((unsigned long) addr 3UL) 3; \ + addr = (void *) ((unsigned long) addr ~3UL); \ +} Why do these exist? They seem to be a holdover from when mballoc stored the buddy bitmaps on disk. That no longer happens (to avoid bitmap vs. buddy consistency problems), so I suspect they can be removed. I can't comment on many of the other issues because Alex wrote most of the code. Gosh what a lot of code. Is it faster? Yes, and also importantly it uses a lot less CPU to do a given amount of allocation, which is critical in our environments where there is very high disk bandwidth on a single node and CPU becomes the limiting factor of the IO speed. This of course also helps any write-intensive environment where the CPU is doing something useful. Some older test results include: https://ols2006.108.redhat.com/2007/Reprints/mathur-Reprint.pdf (Section 7) In the linux-ext4 thread compilebench numbers for ext4: http://www.mail-archive.com/linux-ext4@vger.kernel.org/msg03834.html http://oss.oracle.com/~mason/compilebench/ext4/ext-create-compare.png http://oss.oracle.com/~mason/compilebench/ext4/ext-compile-compare.png http://oss.oracle.com/~mason/compilebench/ext4/ext-read-compare.png http://oss.oracle.com/~mason/compilebench/ext4/ext-rm-compare.png note the ext-read-compare.png graph shows lower read performance, but a couple of bugs in mballoc were since fixed to have ext4 allocate more contiguous extents. In the old linux-ext4 thread [RFC] delayed allocation testing on node zefir http://www.mail-archive.com/linux-ext4@vger.kernel.org/msg00587.html : dd2048rw : REAL UTIME STIME READWRITTEN DETAILS EXT3: 58.46 23 1491 25722097292 17 extents EXT4: 44.56 19 1018 12 2097244 19 extents REISERFS: 56.80 26 1370 29522097336 457 extents JFS : 45.77 22 9840 2097216 1 extents XFS : 50.97 20 1394 0 2100825 7 extents : kernuntar : REAL UTIME STIME READWRITTEN DETAILS EXT3: 56.99 5037 65168 252016 EXT4: 55.03 5034 55336 249884 REISERFS: 52.55 4996 85464 238068 JFS : 70.15 5057 630496 288116 XFS : 72.84 5052 953132 316798 : kernstat : REAL UTIME STIME READWRITTEN DETAILS EXT3: 2.83 8 15 58920 EXT4: 0.51 9 10 58920 REISERFS: 0.81 7 49 26960 JFS : 6.19 11 49 12552 0 XFS : 2.09 9 61 65040 : kerncat : REAL UTIME STIME READWRITTEN DETAILS EXT3: 9.48 25 213241624 0 EXT4: 6.29 27 197238560 0 REISERFS: 14.69 33 230234744 0 JFS : 23.51 23 231244596 0 XFS : 18.24 36 254238548 0 : kernrm : REAL UTIME STIME READWRITTEN DETAILS EXT3: 4.82 4 10896284672 EXT4: 1.61 5 11065364632 REISERFS: 3.15 8 2762768236 JFS : 33.90 7 16814400 33048 XFS : 20.03 8 296663286160 Cheers, Andreas -- Andreas Dilger Sr. Staff Engineer, Lustre Group Sun Microsystems of Canada, Inc. - To unsubscribe from this list: send the line unsubscribe linux-ext4 in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
HELLO
Hello, I want to solicit your attention in helping and processing and securing on my behalf certain funds which i cannot process because of my position in Government. The purpose of my contacting you is because I need a foreigner to help me handle this transaction When you reply this message,i will send you the full details and more information about myself and the funds. My personal email is : [EMAIL PROTECTED] Thank you. Mr Fred Ogoja - To unsubscribe from this list: send the line unsubscribe linux-ext4 in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: ext4-online-defrag-free-space-fragmentation.patch uses do_fsync()
Hi, I was trying to build ext4 as a module, and ran into problems because the online defrag patch is calling do_fsync() which is *not* an exported symbol, and so can not be called from a module. Looking at what the routine is doing, there's no reason to call do_fsync(), and in fact depending on the journaling mode in use, it may not force a journal commit, which seems to be the goal of the code. Hence, I plan to merge the following fix into the the defrag-free-space-fragmentation patch, unless there are any objections from Takashi-San or Akira-San. Thank you for your information and the proposition of the fix. Your fix is correct, please merge it. Regards, - Ted diff --git a/fs/ext4/defrag.c b/fs/ext4/defrag.c index 4ef3dc0..19d2cfd 100644 --- a/fs/ext4/defrag.c +++ b/fs/ext4/defrag.c @@ -632,8 +632,9 @@ static int ext4_ext_defrag_victim(struct file *target_filp, } /* Sync journal blocks before reservation */ - if (do_fsync(target_filp, 0)) { - printk(KERN_ERR defrag: failed do_fsync\n); + ret = ext4_force_commit(sb); + if (ret) { + printk(KERN_ERR defrag: failed do_fsync (%d)\n, ret); goto ERR; } } - To unsubscribe from this list: send the line unsubscribe linux-ext4 in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Cheers, Takashi - To unsubscribe from this list: send the line unsubscribe linux-ext4 in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 23/49] Add buffer head related helper functions
On Wed, Jan 23, 2008 at 02:06:48PM -0800, Andrew Morton wrote: On Mon, 21 Jan 2008 22:02:02 -0500 Theodore Ts'o [EMAIL PROTECTED] wrote: +} +EXPORT_SYMBOL(bh_uptodate_or_lock); +/** Missing newline. + * bh_submit_read: Submit a locked buffer for reading + * @bh: struct buffer_head + * + * Returns a negative error + */ +int bh_submit_read(struct buffer_head *bh) +{ + if (!buffer_locked(bh)) + lock_buffer(bh); + + if (buffer_uptodate(bh)) + return 0; Here it can lock the buffer then return zero + get_bh(bh); + bh-b_end_io = end_buffer_read_sync; + submit_bh(READ, bh); + wait_on_buffer(bh); + if (buffer_uptodate(bh)) + return 0; Here it will unlock the buffer and return zero. This function is unusable when passed an unlocked buffer. Updated patch below. commit 70d4ca32604e0935a8b9a49c5ac8b9c64c810693 Author: Aneesh Kumar K.V [EMAIL PROTECTED] Date: Thu Jan 24 10:50:24 2008 +0530 Add buffer head related helper functions Add buffer head related helper function bh_uptodate_or_lock and bh_submit_read which can be used by file system Signed-off-by: Aneesh Kumar K.V [EMAIL PROTECTED] diff --git a/fs/buffer.c b/fs/buffer.c index 7249e01..82aa2db 100644 --- a/fs/buffer.c +++ b/fs/buffer.c @@ -3213,6 +3213,53 @@ static int buffer_cpu_notify(struct notifier_block *self, return NOTIFY_OK; } +/** + * bh_uptodate_or_lock: Test whether the buffer is uptodate + * @bh: struct buffer_head + * + * Return true if the buffer is up-to-date and false, + * with the buffer locked, if not. + */ +int bh_uptodate_or_lock(struct buffer_head *bh) +{ + if (!buffer_uptodate(bh)) { + lock_buffer(bh); + if (!buffer_uptodate(bh)) + return 0; + unlock_buffer(bh); + } + return 1; +} +EXPORT_SYMBOL(bh_uptodate_or_lock); + +/** + * bh_submit_read: Submit a locked buffer for reading + * @bh: struct buffer_head + * + * Returns zero on success and -EIO on error.If the input + * buffer is not locked returns -EINVAL + * + */ +int bh_submit_read(struct buffer_head *bh) +{ + if (!buffer_locked(bh)) + return -EINVAL; + + if (buffer_uptodate(bh)) { + unlock_buffer(bh); + return 0; + } + + get_bh(bh); + bh-b_end_io = end_buffer_read_sync; + submit_bh(READ, bh); + wait_on_buffer(bh); + if (buffer_uptodate(bh)) + return 0; + return -EIO; +} +EXPORT_SYMBOL(bh_submit_read); + void __init buffer_init(void) { int nrpages; diff --git a/include/linux/buffer_head.h b/include/linux/buffer_head.h index da0d83f..e98801f 100644 --- a/include/linux/buffer_head.h +++ b/include/linux/buffer_head.h @@ -192,6 +192,8 @@ int sync_dirty_buffer(struct buffer_head *bh); int submit_bh(int, struct buffer_head *); void write_boundary_block(struct block_device *bdev, sector_t bblock, unsigned blocksize); +int bh_uptodate_or_lock(struct buffer_head *bh); +int bh_submit_read(struct buffer_head *bh); extern int buffer_heads_over_limit; - To unsubscribe from this list: send the line unsubscribe linux-ext4 in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [CALL FOR TESTING] Make Ext3 fsck way faster [2.6.24-rc6 -mm patch]
On Wed, 23 Jan 2008 04:12:16 -0500 Abhishek Rai [EMAIL PROTECTED] wrote: I'm wondering about the interaction between this code and the buffer_boundary() logic. I guess we should disable the buffer_boundary() handling when this code is in effect. Have you reviewed and tested that aspect? Thanks for pointing this out, I had totally missed this issue in my change. I've now made the call to set_buffer_boundary() in ext3_get_blocks_handle() subject to metacluster option being set. Did it make any performance difference? iirc the buffer_boundary stuff was worth around 10% on a single linear read of a large, well-laid-out file. - To unsubscribe from this list: send the line unsubscribe linux-ext4 in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 41/49] ext4: Add multi block allocator for ext4
On Wed, Jan 23, 2008 at 02:07:27PM -0800, Andrew Morton wrote: On Mon, 21 Jan 2008 22:02:20 -0500 Theodore Ts'o [EMAIL PROTECTED] wrote: From: Alex Tomas [EMAIL PROTECTED] Signed-off-by: Alex Tomas [EMAIL PROTECTED] Signed-off-by: Andreas Dilger [EMAIL PROTECTED] Signed-off-by: Aneesh Kumar K.V [EMAIL PROTECTED] Signed-off-by: Eric Sandeen [EMAIL PROTECTED] Signed-off-by: Theodore Ts'o [EMAIL PROTECTED] ... +#if BITS_PER_LONG == 64 +#define mb_correct_addr_and_bit(bit, addr) \ +{ \ + bit += ((unsigned long) addr 7UL) 3; \ + addr = (void *) ((unsigned long) addr ~7UL); \ +} +#elif BITS_PER_LONG == 32 +#define mb_correct_addr_and_bit(bit, addr) \ +{ \ + bit += ((unsigned long) addr 3UL) 3; \ + addr = (void *) ((unsigned long) addr ~3UL); \ +} +#else +#error how many bits you are?! +#endif Why do these exist? Initial version on mballoc supported on x86 32 this was there to give compile warning on 64 bit platform. I guess we can remove that now. Or may be we can keep it as such because it is harmless. +static inline int mb_test_bit(int bit, void *addr) +{ + mb_correct_addr_and_bit(bit, addr); + return ext4_test_bit(bit, addr); +} ext2_test_bit() already handles bitnum wordsize. If mb_correct_addr_and_bit() is actually needed then some suitable comment would help. ext4_test_bit on powerpc needs the addr to be 8 byte aligned. Othewise it fails +static inline void mb_set_bit(int bit, void *addr) +{ + mb_correct_addr_and_bit(bit, addr); + ext4_set_bit(bit, addr); +} + +static inline void mb_set_bit_atomic(spinlock_t *lock, int bit, void *addr) +{ + mb_correct_addr_and_bit(bit, addr); + ext4_set_bit_atomic(lock, bit, addr); +} + +static inline void mb_clear_bit(int bit, void *addr) +{ + mb_correct_addr_and_bit(bit, addr); + ext4_clear_bit(bit, addr); +} + +static inline void mb_clear_bit_atomic(spinlock_t *lock, int bit, void *addr) +{ + mb_correct_addr_and_bit(bit, addr); + ext4_clear_bit_atomic(lock, bit, addr); +} + +static inline void *mb_find_buddy(struct ext4_buddy *e4b, int order, int *max) uninlining this will save about eighty squigabytes of text. Fixed Please review all of ext4/jbd2 with a view to removig unnecessary and wrong inlings. +{ + char *bb; + + /* FIXME!! is this needed */ + BUG_ON(EXT4_MB_BITMAP(e4b) == EXT4_MB_BUDDY(e4b)); + BUG_ON(max == NULL); + + if (order e4b-bd_blkbits + 1) { + *max = 0; + return NULL; + } + + /* at order 0 we see each particular block */ + *max = 1 (e4b-bd_blkbits + 3); + if (order == 0) + return EXT4_MB_BITMAP(e4b); + + bb = EXT4_MB_BUDDY(e4b) + EXT4_SB(e4b-bd_sb)-s_mb_offsets[order]; + *max = EXT4_SB(e4b-bd_sb)-s_mb_maxs[order]; + + return bb; +} + ... +#else +#define mb_free_blocks_double(a, b, c, d) +#define mb_mark_used_double(a, b, c) +#define mb_cmp_bitmaps(a, b) +#endif Please use the do{}while(0) thing. Or, better, proper C functions which have typechecking (unless this will cause undefined-var compile errors, which happens sometimes) makde static inline void. +/* find most significant bit */ +static int fmsb(unsigned short word) +{ + int order; + + if (word 255) { + order = 7; + word = 8; + } else { + order = -1; + } + + do { + order++; + word = 1; + } while (word != 0); + + return order; +} Did we just reinvent fls()? replaced by fls. +/* FIXME!! need more doc */ +static void ext4_mb_mark_free_simple(struct super_block *sb, + void *buddy, unsigned first, int len, + struct ext4_group_info *grp) +{ + struct ext4_sb_info *sbi = EXT4_SB(sb); + unsigned short min; + unsigned short max; + unsigned short chunk; + unsigned short border; + + BUG_ON(len = EXT4_BLOCKS_PER_GROUP(sb)); + + border = 2 sb-s_blocksize_bits; Won't this explode with = 32k blocksize? + while (len 0) { + /* find how many blocks can be covered since this position */ + max = ffs(first | border) - 1; + + /* find how many blocks of power 2 we need to mark */ + min = fmsb(len); + + if (max min) + min = max; + chunk = 1 min; + + /* mark multiblock chunks only */ + grp-bb_counters[min]++; + if (min 0) + mb_clear_bit(first min, +buddy + sbi-s_mb_offsets[min]); + + len -= chunk; + first += chunk; + } +} + ... +static