+ ext-ea-block-reference-count-racing-fix-fix.patch added to -mm tree
The patch titled ext-ea-block-reference-count-racing-fix-fix has been added to the -mm tree. Its filename is ext-ea-block-reference-count-racing-fix-fix.patch *** Remember to use Documentation/SubmitChecklist when testing your code *** See http://www.zip.com.au/~akpm/linux/patches/stuff/added-to-mm.txt to find out what to do about this -- Subject: ext-ea-block-reference-count-racing-fix-fix From: Andrew Morton <[EMAIL PROTECTED]> Cc: Cc: Andreas Gruenbacher <[EMAIL PROTECTED]> Cc: Mingming Cao <[EMAIL PROTECTED]> Signed-off-by: Andrew Morton <[EMAIL PROTECTED]> --- fs/ext4/xattr.c |2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff -puN fs/ext3/xattr.c~ext-ea-block-reference-count-racing-fix-fix fs/ext3/xattr.c diff -puN fs/ext4/xattr.c~ext-ea-block-reference-count-racing-fix-fix fs/ext4/xattr.c --- a/fs/ext4/xattr.c~ext-ea-block-reference-count-racing-fix-fix +++ a/fs/ext4/xattr.c @@ -721,7 +721,7 @@ ext4_xattr_block_set(handle_t *handle, s int offset = (char *)s->here - bs->bh->b_data; unlock_buffer(bs->bh); - journal_release_buffer(handle, bs->bh); + jbd2_journal_release_buffer(handle, bs->bh); if (ce) { mb_cache_entry_release(ce); ce = NULL; _ Patches currently in -mm which might be from [EMAIL PROTECTED] are make-aout-executables-work-again-fix.patch sony-laptop-fix-uninitialised-variable.patch git-agpgart.patch git-cpufreq.patch git-drm.patch git-dvb.patch pvrusb-warning-fix.patch git-input.patch setstream-param-for-psmouse-tweak.patch git-libata-all.patch git-md-accel-vs-md.patch dmaengine-uninline-large-functions.patch git-mips-fixup.patch Fabric7-VIOC-driver-fixes.patch revert-drivers-net-tulip-dmfe-support-basic-carrier-detection.patch dmfe-add-support-for-suspend-resume-fix.patch sis900-warning-fixes.patch git-parisc.patch git-parisc-fixup.patch rm9000-serial-driver-tidy.patch git-pciseg.patch git-s390.patch revert-md-avoid-possible-bug_on-in-md-bitmap-handling-for-git-block.patch git-block-fixup.patch git-unionfs.patch after-before-x86_64-mm-mmconfig-share.patch throttle_vm_writeout-dont-loop-on-gfp_nofs-and-gfp_noio-allocations.patch ext-ea-block-reference-count-racing-fix-fix.patch smaps-add-clear_refs-file-to-clear-reference-fix.patch kvm-add-internal-filesystem-for-generating-inodes-tweak.patch fix-rmmod-read-write-races-in-proc-entries-fix.patch reduce-size-of-task_struct-on-64-bit-machines.patch mm-shrink-parent-dentries-when-shrinking-slab.patch call-cpu_chain-with-cpu_down_failed-if-cpu_down_prepare-failed-vs-reduce-size-of-task_struct-on-64-bit-machines.patch utrace-vs-reduce-size-of-task_struct-on-64-bit-machines.patch local_t-mips-extension-shrink-duplicated-mips-32-64-bits-functions-from-localh-fix.patch linux-kernel-markers-kconfig-menus-fix-4.patch - To unsubscribe from this list: send the line "unsubscribe linux-ext4" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
+ ext-ea-block-reference-count-racing-fix.patch added to -mm tree
The patch titled ext[34]" EA block reference count racing fix has been added to the -mm tree. Its filename is ext-ea-block-reference-count-racing-fix.patch *** Remember to use Documentation/SubmitChecklist when testing your code *** See http://www.zip.com.au/~akpm/linux/patches/stuff/added-to-mm.txt to find out what to do about this -- Subject: ext[34]" EA block reference count racing fix From: Mingming Cao <[EMAIL PROTECTED]> There are race issues around ext[34] xattr block release code. ext[34]_xattr_release_block() checks the reference count of xattr block (h_refcount) and frees that xattr block if it is the last one reference it. Unlike ext2, the check of this counter is unprotected by any lock. ext[34]_xattr_release_block() will free the mb_cache entry before freeing that xattr block. There is a small window between the check for the re h_refcount ==1 and the call to mb_cache_entry_free(). During this small window another inode might find this xattr block from the mbcache and reuse it, racing a refcount updates. The xattr block will later be freed by the first inode without notice other inode is still use it. Later if that block is reallocated as a datablock for other file, then more serious problem might happen. We need put a lock around places checking the refount as well to avoid racing issue. Another place need this kind of protection is in ext3_xattr_block_set(), where it will modify the xattr block content in- the-fly if the refcount is 1 (means it's the only inode reference it). This will also fix another issue: the xattr block may not get freed at all if no lock is to protect the refcount check at the release time. It is possible that the last two inodes could release the shared xattr block at the same time. But both of them think they are not the last one so only decreased the h_refcount without freeing xattr block at all. We need to call lock_buffer() after ext3_journal_get_write_access() to avoid deadlock (because the later will call lock_buffer()/unlock_buffer () as well). Signed-off-by: Mingming Cao <[EMAIL PROTECTED]> Cc: Andreas Gruenbacher <[EMAIL PROTECTED]> Cc: Signed-off-by: Andrew Morton <[EMAIL PROTECTED]> --- fs/ext3/xattr.c | 42 ++ fs/ext4/xattr.c | 41 + 2 files changed, 51 insertions(+), 32 deletions(-) diff -puN fs/ext3/xattr.c~ext-ea-block-reference-count-racing-fix fs/ext3/xattr.c --- a/fs/ext3/xattr.c~ext-ea-block-reference-count-racing-fix +++ a/fs/ext3/xattr.c @@ -475,8 +475,15 @@ ext3_xattr_release_block(handle_t *handl struct buffer_head *bh) { struct mb_cache_entry *ce = NULL; + int error = 0; ce = mb_cache_entry_get(ext3_xattr_cache, bh->b_bdev, bh->b_blocknr); + error = ext3_journal_get_write_access(handle, bh); + if (error) +goto out; + + lock_buffer(bh); + if (BHDR(bh)->h_refcount == cpu_to_le32(1)) { ea_bdebug(bh, "refcount now=0; freeing"); if (ce) @@ -485,21 +492,20 @@ ext3_xattr_release_block(handle_t *handl get_bh(bh); ext3_forget(handle, 1, inode, bh, bh->b_blocknr); } else { - if (ext3_journal_get_write_access(handle, bh) == 0) { - lock_buffer(bh); - BHDR(bh)->h_refcount = cpu_to_le32( + BHDR(bh)->h_refcount = cpu_to_le32( le32_to_cpu(BHDR(bh)->h_refcount) - 1); - ext3_journal_dirty_metadata(handle, bh); - if (IS_SYNC(inode)) - handle->h_sync = 1; - DQUOT_FREE_BLOCK(inode, 1); - unlock_buffer(bh); - ea_bdebug(bh, "refcount now=%d; releasing", - le32_to_cpu(BHDR(bh)->h_refcount)); - } + error = ext3_journal_dirty_metadata(handle, bh); + handle->h_sync = 1; + DQUOT_FREE_BLOCK(inode, 1); + ea_bdebug(bh, "refcount now=%d; releasing", + le32_to_cpu(BHDR(bh)->h_refcount)); if (ce) mb_cache_entry_release(ce); } + unlock_buffer(bh); +out: + ext3_std_error(inode->i_sb, error); + return; } struct ext3_xattr_info { @@ -675,7 +681,7 @@ ext3_xattr_block_set(handle_t *handle, s struct buffer_head *new_bh = NULL; struct ext3_xattr_search *s = &bs->s; struct mb_cache_entry *ce = NULL; - int error; + int error = 0; #define header(x) ((struct ext3_xattr_header *)(x)) @@ -684,16 +690,17 @@ ext3_xattr_block_set(handle_t *handle, s if (s->base) { ce = mb_cache_entry_get(ext3_xattr_cache, bs->bh->b_bdev, bs->bh->b_
[PATCH] ext[34] EA block reference count racing fix
There are race issues around ext[34] xattr block release code. ext[34]_xattr_release_block() checks the reference count of xattr block (h_refcount) and frees that xattr block if it is the last one reference it. Unlike ext2, the check of this counter is unprotected by any lock. ext[34]_xattr_release_block() will free the mb_cache entry before freeing that xattr block. There is a small window between the check for the re h_refcount ==1 and the call to mb_cache_entry_free(). During this small window another inode might find this xattr block from the mbcache and reuse it, racing a refcount updates. The xattr block will later be freed by the first inode without notice other inode is still use it. Later if that block is reallocated as a datablock for other file, then more serious problem might happen. We need put a lock around places checking the refount as well to avoid racing issue. Another place need this kind of protection is in ext3_xattr_block_set(), where it will modify the xattr block content in- the-fly if the refcount is 1 (means it's the only inode reference it). This will also fix another issue: the xattr block may not get freed at all if no lock is to protect the refcount check at the release time. It is possible that the last two inodes could release the shared xattr block at the same time. But both of them think they are not the last one so only decreased the h_refcount without freeing xattr block at all. We need to call lock_buffer() after ext3_journal_get_write_access() to avoid deadlock (because the later will call lock_buffer()/unlock_buffer () as well). Signed-Off-By: Mingming Cao <[EMAIL PROTECTED]> --- linux-2.6.19-ming/fs/ext3/xattr.c | 42 +++--- linux-2.6.19-ming/fs/ext4/xattr.c | 41 ++--- 2 files changed, 51 insertions(+), 32 deletions(-) diff -puN fs/ext3/xattr.c~ext34_xattr_release_block_race_fix fs/ext3/xattr.c --- linux-2.6.19/fs/ext3/xattr.c~ext34_xattr_release_block_race_fix 2007-02-14 16:40:48.0 -0800 +++ linux-2.6.19-ming/fs/ext3/xattr.c 2007-02-21 11:45:10.0 -0800 @@ -478,8 +478,15 @@ ext3_xattr_release_block(handle_t *handl struct buffer_head *bh) { struct mb_cache_entry *ce = NULL; + int error = 0; ce = mb_cache_entry_get(ext3_xattr_cache, bh->b_bdev, bh->b_blocknr); + error = ext3_journal_get_write_access(handle, bh); + if (error) +goto out; + + lock_buffer(bh); + if (BHDR(bh)->h_refcount == cpu_to_le32(1)) { ea_bdebug(bh, "refcount now=0; freeing"); if (ce) @@ -488,21 +495,20 @@ ext3_xattr_release_block(handle_t *handl get_bh(bh); ext3_forget(handle, 1, inode, bh, bh->b_blocknr); } else { - if (ext3_journal_get_write_access(handle, bh) == 0) { - lock_buffer(bh); - BHDR(bh)->h_refcount = cpu_to_le32( + BHDR(bh)->h_refcount = cpu_to_le32( le32_to_cpu(BHDR(bh)->h_refcount) - 1); - ext3_journal_dirty_metadata(handle, bh); - if (IS_SYNC(inode)) - handle->h_sync = 1; - DQUOT_FREE_BLOCK(inode, 1); - unlock_buffer(bh); - ea_bdebug(bh, "refcount now=%d; releasing", - le32_to_cpu(BHDR(bh)->h_refcount)); - } + error = ext3_journal_dirty_metadata(handle, bh); + handle->h_sync = 1; + DQUOT_FREE_BLOCK(inode, 1); + ea_bdebug(bh, "refcount now=%d; releasing", + le32_to_cpu(BHDR(bh)->h_refcount)); if (ce) mb_cache_entry_release(ce); } + unlock_buffer(bh); +out: + ext3_std_error(inode->i_sb, error); + return; } struct ext3_xattr_info { @@ -678,7 +684,7 @@ ext3_xattr_block_set(handle_t *handle, s struct buffer_head *new_bh = NULL; struct ext3_xattr_search *s = &bs->s; struct mb_cache_entry *ce = NULL; - int error; + int error = 0; #define header(x) ((struct ext3_xattr_header *)(x)) @@ -687,16 +693,17 @@ ext3_xattr_block_set(handle_t *handle, s if (s->base) { ce = mb_cache_entry_get(ext3_xattr_cache, bs->bh->b_bdev, bs->bh->b_blocknr); + error = ext3_journal_get_write_access(handle, bs->bh); + if (error) + goto cleanup; + lock_buffer(bs->bh); + if (header(s->base)->h_refcount == cpu_to_le32(1)) { if (ce) { mb_cache_entry_free(ce); ce = NULL; } ea_bdebug(bs->bh, "modifying in-place"); -
Re: [PATCH] Correction to check_filetype()
Hi Peter, Thanks for pointing out the typo. In the kernel, struct ext3_dir_entry_2 has a 8-bit name_len field followed by 8-bit field for filetype. Whereas in e2fsck, struct ext2_dir_entry has a 16-bit name_len field, the upper 8 bits of which store the filetype. Hence e2fsck masks the upper 8 bits while using name_len. Here is the patch with the change. Signed-off-by: Andreas Dilger <[EMAIL PROTECTED]> Signed-off-by: Kalpak Shah <[EMAIL PROTECTED]> Index: e2fsprogs-1.39/e2fsck/pass2.c === --- e2fsprogs-1.39.orig/e2fsck/pass2.c +++ e2fsprogs-1.39/e2fsck/pass2.c @@ -495,7 +495,9 @@ static _INLINE_ int check_filetype(e2fsc return 1; } - if (ext2fs_test_inode_bitmap(ctx->inode_dir_map, dirent->inode)) { + if (ext2fs_test_inode_bitmap(ctx->inode_dir_map, dirent->inode) || + ((dirent->name_len & 0xFF) <= 2 && dirent->name[0] == '.' && +(dirent->name[1] == '.' || dirent->name[1] == '\0'))) { should_be = EXT2_FT_DIR; } else if (ext2fs_test_inode_bitmap(ctx->inode_reg_map, dirent->inode)) { Thanks, Kalpak Shah. <[EMAIL PROTECTED]> On Wed, 2007-02-21 at 09:49 -0500, Peter Staubach wrote: > Kalpak Shah wrote: > > Hi, > > > > If the mode of a directory gets corrupted, check_filetype() makes wrong > > decisions for all its sub-directories. For example, using debugfs we can > > corrupt the mode of a directory to 0140755 (i.e. a socket). e2fsck will set > > the filetype of all its subdirectories as 6 (filetype for socket). All the > > subdirectories would be moved to lost+found, and in second run of e2fsck > > their filetype would be set back to 2. > > > > By the time we come to check_filetype(), we have already verified the "." > > and ".." entries, so we special case these dirents in check_filetype(). > > > > Please consider for review. > > > > Signed-off-by: Andreas Dilger <[EMAIL PROTECTED]> > > Signed-off-by: Kalpak Shah <[EMAIL PROTECTED]> > > > > > > Index: e2fsprogs-1.39/e2fsck/pass2.c > > === > > --- e2fsprogs-1.39.orig/e2fsck/pass2.c > > +++ e2fsprogs-1.39/e2fsck/pass2.c > > @@ -495,7 +495,9 @@ static _INLINE_ int check_filetype(e2fsc > > return 1; > > } > > > > - if (ext2fs_test_inode_bitmap(ctx->inode_dir_map, dirent->inode)) { > > + if (ext2fs_test_inode_bitmap(ctx->inode_dir_map, dirent->inode) || > > + ((dirent->name_len && 0xFF) <= 2 && dirent->name[0] == '.' && > > > > What is the "&& 0xFF" part for? At the very least, it should probably > be "& 0xFF". It doesn't seem like this mask should be needed at all > though, I think. > > Thanx... > >ps > > > +(dirent->name[1] == '.' || dirent->name[1] == '\0'))) { > > should_be = EXT2_FT_DIR; > > } else if (ext2fs_test_inode_bitmap(ctx->inode_reg_map, > > dirent->inode)) { > > > > > > - > > To unsubscribe from this list: send the line "unsubscribe linux-ext4" in > > the body of a message to [EMAIL PROTECTED] > > More majordomo info at http://vger.kernel.org/majordomo-info.html > > - To unsubscribe from this list: send the line "unsubscribe linux-ext4" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] Correction to check_filetype()
Kalpak Shah wrote: Hi, If the mode of a directory gets corrupted, check_filetype() makes wrong decisions for all its sub-directories. For example, using debugfs we can corrupt the mode of a directory to 0140755 (i.e. a socket). e2fsck will set the filetype of all its subdirectories as 6 (filetype for socket). All the subdirectories would be moved to lost+found, and in second run of e2fsck their filetype would be set back to 2. By the time we come to check_filetype(), we have already verified the "." and ".." entries, so we special case these dirents in check_filetype(). Please consider for review. Signed-off-by: Andreas Dilger <[EMAIL PROTECTED]> Signed-off-by: Kalpak Shah <[EMAIL PROTECTED]> Index: e2fsprogs-1.39/e2fsck/pass2.c === --- e2fsprogs-1.39.orig/e2fsck/pass2.c +++ e2fsprogs-1.39/e2fsck/pass2.c @@ -495,7 +495,9 @@ static _INLINE_ int check_filetype(e2fsc return 1; } - if (ext2fs_test_inode_bitmap(ctx->inode_dir_map, dirent->inode)) { + if (ext2fs_test_inode_bitmap(ctx->inode_dir_map, dirent->inode) || + ((dirent->name_len && 0xFF) <= 2 && dirent->name[0] == '.' && What is the "&& 0xFF" part for? At the very least, it should probably be "& 0xFF". It doesn't seem like this mask should be needed at all though, I think. Thanx... ps +(dirent->name[1] == '.' || dirent->name[1] == '\0'))) { should_be = EXT2_FT_DIR; } else if (ext2fs_test_inode_bitmap(ctx->inode_reg_map, dirent->inode)) { - To unsubscribe from this list: send the line "unsubscribe linux-ext4" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html - To unsubscribe from this list: send the line "unsubscribe linux-ext4" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
ext3 not writing inode flags
Hello, I've noticed that if e. g. S_IMMUTABLE flag is set on an inode, ext3 does not propagate it to its inode flags on write. This also leads to the situation that lsattr reports no flag set but the file is in fact immutable. Would a fix be accepted? Actually, the quota code is the only place where S_IMMUTABLE flag is set to an inode so this is not a big issue but still... Honza -- Jan Kara <[EMAIL PROTECTED]> SuSE CR Labs - To unsubscribe from this list: send the line "unsubscribe linux-ext4" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] Correction to check_filetype()
Hi, The other issue this brings up is maybe pass1 should be checking whether it is the inode mode that is corrupted (by trying to verify block[0] has "." and ".."in it) instead of truncating off those blocks. This would actually be a redundant check as pass2 would also make the same check. We can have some more checks like if i_blocks!=0 then the file may not be special file, etc. Does e2fsck need to have more such checks to avoid making decisions just based on the mode? Thanks, Kalpak. On Wed, 2007-02-21 at 14:45 +0530, Kalpak Shah wrote: > Hi, > > If the mode of a directory gets corrupted, check_filetype() makes wrong > decisions for all its sub-directories. For example, using debugfs we can > corrupt the mode of a directory to 0140755 (i.e. a socket). e2fsck will set > the filetype of all its subdirectories as 6 (filetype for socket). All the > subdirectories would be moved to lost+found, and in second run of e2fsck > their filetype would be set back to 2. > > By the time we come to check_filetype(), we have already verified the "." and > ".." entries, so we special case these dirents in check_filetype(). > > Please consider for review. > > Signed-off-by: Andreas Dilger <[EMAIL PROTECTED]> > Signed-off-by: Kalpak Shah <[EMAIL PROTECTED]> > > > Index: e2fsprogs-1.39/e2fsck/pass2.c > === > --- e2fsprogs-1.39.orig/e2fsck/pass2.c > +++ e2fsprogs-1.39/e2fsck/pass2.c > @@ -495,7 +495,9 @@ static _INLINE_ int check_filetype(e2fsc > return 1; > } > > - if (ext2fs_test_inode_bitmap(ctx->inode_dir_map, dirent->inode)) { > + if (ext2fs_test_inode_bitmap(ctx->inode_dir_map, dirent->inode) || > + ((dirent->name_len && 0xFF) <= 2 && dirent->name[0] == '.' && > +(dirent->name[1] == '.' || dirent->name[1] == '\0'))) { > should_be = EXT2_FT_DIR; > } else if (ext2fs_test_inode_bitmap(ctx->inode_reg_map, > dirent->inode)) { > > > - > To unsubscribe from this list: send the line "unsubscribe linux-ext4" in > the body of a message to [EMAIL PROTECTED] > More majordomo info at http://vger.kernel.org/majordomo-info.html - To unsubscribe from this list: send the line "unsubscribe linux-ext4" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] Correction to check_filetype()
Hi, If the mode of a directory gets corrupted, check_filetype() makes wrong decisions for all its sub-directories. For example, using debugfs we can corrupt the mode of a directory to 0140755 (i.e. a socket). e2fsck will set the filetype of all its subdirectories as 6 (filetype for socket). All the subdirectories would be moved to lost+found, and in second run of e2fsck their filetype would be set back to 2. By the time we come to check_filetype(), we have already verified the "." and ".." entries, so we special case these dirents in check_filetype(). Please consider for review. Signed-off-by: Andreas Dilger <[EMAIL PROTECTED]> Signed-off-by: Kalpak Shah <[EMAIL PROTECTED]> Index: e2fsprogs-1.39/e2fsck/pass2.c === --- e2fsprogs-1.39.orig/e2fsck/pass2.c +++ e2fsprogs-1.39/e2fsck/pass2.c @@ -495,7 +495,9 @@ static _INLINE_ int check_filetype(e2fsc return 1; } - if (ext2fs_test_inode_bitmap(ctx->inode_dir_map, dirent->inode)) { + if (ext2fs_test_inode_bitmap(ctx->inode_dir_map, dirent->inode) || + ((dirent->name_len && 0xFF) <= 2 && dirent->name[0] == '.' && +(dirent->name[1] == '.' || dirent->name[1] == '\0'))) { should_be = EXT2_FT_DIR; } else if (ext2fs_test_inode_bitmap(ctx->inode_reg_map, dirent->inode)) { - To unsubscribe from this list: send the line "unsubscribe linux-ext4" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html