Re: e2fsprogs coverity patch
On Feb 09, 2007 18:11 -0800, Brian D. Behlendorf wrote: > This check is unnecessary since fs_type is guaranteed to be set earlier in the > function. I'm not positive this is the right fix. If we are creating a journal device, we shouldn't be using the normal defaults based on the size of the journal device because this will often be much smaller than the actual filesystem. I'd argue that we set fs_type="journal" earlier when fs_type is being first set, if INCOMPAT_JOURNAL_DEV is set. > Index: e2fsprogs+chaos/misc/mke2fs.c > === > --- e2fsprogs+chaos.orig/misc/mke2fs.c > +++ e2fsprogs+chaos/misc/mke2fs.c > @@ -1310,8 +1310,6 @@ static void PRS(int argc, char *argv[]) > > if (fs_param.s_feature_incompat & > EXT3_FEATURE_INCOMPAT_JOURNAL_DEV) { > - if (!fs_type) > - fs_type = "journal"; > reserved_ratio = 0; > fs_param.s_feature_incompat = EXT3_FEATURE_INCOMPAT_JOURNAL_DEV; > fs_param.s_feature_compat = 0; Cheers, Andreas -- Andreas Dilger Principal Software Engineer Cluster File Systems, Inc. - To unsubscribe from this list: send the line "unsubscribe linux-ext4" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: e2fsprogs coverity patch
Hi Ted, Sorry about failing to read the SUBMITTING-PATCHES file, I hadn't noticed it. Yes I'm willing to certify to the Developer's Certification of Origin 1.1 statement, please add the following line to the patches: Signed-off-by: Brian Behlendorf <[EMAIL PROTECTED]> I'll make sure to add the Signed-off-by line in the future. Thanks, Brian > Hi Brian, please see the SUBMITTING-PATCHES file in the e2fsprogs > repository. > > May I assume that you are willing to certify to the Developer's > Certification of Origin 1.1 statement so I can add a: > > Signed-off-by: Brian Behlendorf <[EMAIL PROTECTED]> > > to each of these patches before I check them into the e2fsprogs source > control repository? > > In the future, please add the Signed-off-by: line when submitting > patches assuming that you are willing to certify to the DCO 1.1. > > Thanks, regards, - To unsubscribe from this list: send the line "unsubscribe linux-ext4" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 1/1][RFC] EXT34 retry loop issue V(2)
Patch depends on : "[PATCH 0/1][RFC] prepare_write positive return value V(2)" This patch solve ext3/4 retry loop issue. Issue description: What we can do if block_prepare_write fail inside ext3_prepare_write ? a) Stop transaction and do retry if possible, but what happend if reboot comes after journal_force_commit, but before we exhaust all retry attempts and generic_file_buffered_write call trancate? We may get allocated blocks outside i_size. Witch is bad. b) Commit newly allocated blocks. This approach was used after Andrey's patch. But if reboot comes, or error happend, user will be surprised that i_size increased but blocks are zero filed. This is internal write operation state becames visiable to user. Witch is also bad. c) Just return as match bytes as we can deal with rigth now back to caller, and let's caller handles it and than call commit. In this scenario we never stop journal in the midle of some internal fs operation. If reboot comes before commit trunsaction was't closed so it will be droped while journal replay. Only (c) tend to garantie attomic data operation. Also fix some issues introduced by retries-in-ext3_prepare_write-violate-ordering-requirements: i_size may increase after error happend. possible data corruption caused commiting non uptodate bh. Signed-off-by: Dmitriy Monakhov <[EMAIL PROTECTED]> - diff --git a/fs/ext3/inode.c b/fs/ext3/inode.c index dba6dd2..4c5e9f7 100644 --- a/fs/ext3/inode.c +++ b/fs/ext3/inode.c @@ -1154,6 +1154,18 @@ static int do_journal_get_write_access(handle_t *handle, * transaction must include the content of the newly allocated blocks. * This content is expected to be set to zeroes by block_prepare_write(). * 2006/10/14 SAW + * What we can do if some blocks was allocated? + * a) Stop transaction and do retry if possible, but what happend if + *reboot comes after journal_force_commit, but before we exhaust + *all retry attempts and caller call trancate? + *We may get allocated blocks outside i_size. Witch is bad. + * b) Commit newly allocated blocks. And if reboot comes, user will be + *surprised that i_size increased but blocks are zero filed. Witch is + *also bad. + * c) Just return as match bytes as we can deal with rigth now back to + *caller, and if reboot comes trunsaction was't closed so it will + *be droped while journal replay. + * Only (c) tend to garantie attomic data operation. */ static int ext3_prepare_failure(struct file *file, struct page *page, unsigned from, unsigned to) @@ -1186,7 +1198,7 @@ skip: block_start = to; break; } - if (!buffer_mapped(bh)) + if (!buffer_mapped(bh) || !buffer_uptodate(bh)) /* prepare_write failed on this bh */ break; if (ext3_should_journal_data(mapping->host)) { @@ -1204,8 +1216,8 @@ skip: if (block_start <= from) goto skip; - /* commit allocated and zeroed buffers */ - return mapping->a_ops->commit_write(file, page, from, block_start); + /* return number of bytes till last mapped && uptodate block */ + return block_start - from; } static int ext3_prepare_write(struct file *file, struct page *page, @@ -1239,7 +1251,8 @@ retry: failure: ret2 = ext3_prepare_failure(file, page, from, to); - if (ret2 < 0) + if (ret2) + /* ready to partial write, or fatal error */ return ret2; if (ret == -ENOSPC && ext3_should_retry_alloc(inode->i_sb, &retries)) goto retry; diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c index 806eee1..da39f80 100644 --- a/fs/ext4/inode.c +++ b/fs/ext4/inode.c @@ -1153,6 +1153,18 @@ static int do_journal_get_write_access(handle_t *handle, * transaction must include the content of the newly allocated blocks. * This content is expected to be set to zeroes by block_prepare_write(). * 2006/10/14 SAW + * What we can do if some blocks was allocated? + * a) Stop transaction and do retry if possible, but what happend if + *reboot comes after journal_force_commit, but before we exhaust + *all retry attempts and caller call trancate? + *We may get allocated blocks outside i_size. Witch is bad. + * b) Commit newly allocated blocks. And if reboot comes, user will be + *surprised that i_size increased but blocks are zero filed. Witch is + *also bad. + * c) Just return as match bytes as we can deal with rigth now back to + *caller, and if reboot comes trunsaction was't closed so it will + *be droped while journal replay. + * Only (c) tend to garantie attomic data operation. */ static int ext4_prepare_failure(struct file *file, struct page *page, unsigned from, unsigned to) @@ -1185,7 +1197,7 @@ skip: block_start = to;
[PATCH 0/1][RFC] prepare_write positive return value V(2)
Changes from ver(1) - __page_symlink(): In order to be on a safe side add explicit zeroing content before fail (just in case). -do_lo_send_aops(): If prepare_write can't handle total size of bytes requested from loop_dev it is safer to fail. - pipe_to_file(): Limit the size of the copy to size wich prepare_write ready to handle. instead of failing. - ecryptfs: was't changed in this version, because where is no even AOP_TRUNCATED_PAGE handling code where. Almost all read/write operation handles data with chunks(segments or pages) and result has integral behaviour for folowing scenario: for_each_chunk() { res = op(); if(IS_ERROR(res)) return progress ? progress : res; progress += res; } prepare_write may has integral behaviour in case of blksize < pgsize, for example we successfully allocated/read some blocks, but not all of them, and than some error happend. Currently we eliminate this progress by doing vmtrunate() after prepare_has failed. It is good to have ability to signal about this progress. Interprete positive prepare_write() ret code as bytes num that fs ready to handle at this moment. BTH: This approach was used in OpenVZ 2.6.9 kernel in order to make FS with delayed allocation more correct, and its works well. I think not everybody will happy about this, but let's discuss all advantages and disadvantages of this change. fixes not dirrectly connected with proposed prepare_write semantic changes: __page_symlink : If find_or_create_page has failed on second retry attempt function will exit with wrong error code. __generic_cont_expand: Add correct AOP_TRUNCATED_PAGE handling. Signed-off-by: Dmitriy Monakhov <[EMAIL PROTECTED]> - diff --git a/drivers/block/loop.c b/drivers/block/loop.c index 6b5b642..dc332dc 100644 --- a/drivers/block/loop.c +++ b/drivers/block/loop.c @@ -224,6 +224,7 @@ static int do_lo_send_aops(struct loop_device *lo, struct bio_vec *bvec, sector_t IV; unsigned size; int transfer_result; + int partial = 0; IV = ((sector_t)index << (PAGE_CACHE_SHIFT - 9))+(offset >> 9); size = PAGE_CACHE_SIZE - offset; @@ -239,11 +240,20 @@ static int do_lo_send_aops(struct loop_device *lo, struct bio_vec *bvec, page_cache_release(page); continue; } - goto unlock; + if (ret > 0 && ret < PAGE_CACHE_SIZE) { + /* +* ->prepare_write() can't handle total size of bytes +* requested. In fact this is not likely to happen, +* because we request one block only. +*/ + partial = 1; + size = ret; + } else + goto unlock; } transfer_result = lo_do_transfer(lo, WRITE, page, offset, bvec->bv_page, bv_offs, size, IV); - if (unlikely(transfer_result)) { + if (unlikely(transfer_result || partial)) { char *kaddr; /* @@ -266,7 +276,7 @@ static int do_lo_send_aops(struct loop_device *lo, struct bio_vec *bvec, } goto unlock; } - if (unlikely(transfer_result)) + if (unlikely(transfer_result || partial)) goto unlock; bv_offs += size; len -= size; diff --git a/fs/buffer.c b/fs/buffer.c index 1ad674f..ed06560 100644 --- a/fs/buffer.c +++ b/fs/buffer.c @@ -2027,13 +2027,20 @@ static int __generic_cont_expand(struct inode *inode, loff_t size, if (size > inode->i_sb->s_maxbytes) goto out; +retry: err = -ENOMEM; page = grab_cache_page(mapping, index); if (!page) goto out; err = mapping->a_ops->prepare_write(NULL, page, offset, offset); + if (err == AOP_TRUNCATED_PAGE) { + page_cache_release(page); + goto retry; + } if (err) { /* +* ->prepare_write() called with from==to and must not +* return positive size of bytes in this case. * ->prepare_write() may have instantiated a few blocks * outside i_size. Trim these off again. */ @@ -2044,6 +2051,10 @@ static int __generic_cont_expand(struct inode *inode, loff_t size, } err = mapping->a_ops->commit_write(NULL, page, offset, offset); + if (err == AOP_TRUNCATED_PAGE) { + page_cache_release(page); + goto retry; + }