Re: e2fsprogs coverity patch

2007-02-12 Thread Andreas Dilger
On Feb 09, 2007  18:11 -0800, Brian D. Behlendorf wrote:
> This check is unnecessary since fs_type is guaranteed to be set earlier in the
> function.

I'm not positive this is the right fix.  If we are creating a journal device,
we shouldn't be using the normal defaults based on the size of the journal
device because this will often be much smaller than the actual filesystem.

I'd argue that we set fs_type="journal" earlier when fs_type is being first
set, if INCOMPAT_JOURNAL_DEV is set.

> Index: e2fsprogs+chaos/misc/mke2fs.c
> ===
> --- e2fsprogs+chaos.orig/misc/mke2fs.c
> +++ e2fsprogs+chaos/misc/mke2fs.c
> @@ -1310,8 +1310,6 @@ static void PRS(int argc, char *argv[])
>  
>   if (fs_param.s_feature_incompat & 
>   EXT3_FEATURE_INCOMPAT_JOURNAL_DEV) {
> - if (!fs_type)
> - fs_type = "journal";
>   reserved_ratio = 0;
>   fs_param.s_feature_incompat = EXT3_FEATURE_INCOMPAT_JOURNAL_DEV;
>   fs_param.s_feature_compat = 0;

Cheers, Andreas
--
Andreas Dilger
Principal Software Engineer
Cluster File Systems, Inc.

-
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: e2fsprogs coverity patch

2007-02-12 Thread Brian Behlendorf

Hi Ted,

  Sorry about failing to read the SUBMITTING-PATCHES file, I hadn't noticed
it.  Yes I'm willing to certify to the Developer's Certification of Origin 
1.1 statement, please add the following line to the patches:

Signed-off-by: Brian Behlendorf <[EMAIL PROTECTED]>

  I'll make sure to add the Signed-off-by line in the future.

Thanks,
Brian

> Hi Brian, please see the SUBMITTING-PATCHES file in the e2fsprogs
> repository.
>
> May I assume that you are willing to certify to the Developer's
> Certification of Origin 1.1 statement so I can add a:
>
> Signed-off-by: Brian Behlendorf <[EMAIL PROTECTED]>
>
> to each of these patches before I check them into the e2fsprogs source
> control repository?
>
> In the future, please add the Signed-off-by: line when submitting
> patches assuming that you are willing to certify to the DCO 1.1.
>
> Thanks, regards,
-
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 1/1][RFC] EXT34 retry loop issue V(2)

2007-02-12 Thread Dmitriy Monakhov

Patch depends on : "[PATCH 0/1][RFC]  prepare_write positive return value V(2)"

This patch solve ext3/4 retry loop issue.
Issue description:
What we can do if block_prepare_write fail inside ext3_prepare_write ?
a) Stop transaction and do retry if possible, but what happend if 
   reboot comes after journal_force_commit, but before we exhaust
   all retry attempts and generic_file_buffered_write call trancate? 
   We may get allocated blocks outside i_size. Witch is bad.

b) Commit newly allocated blocks. This approach was used after Andrey's patch.
   But if reboot comes, or error happend, user will be surprised that i_size
   increased but blocks are zero filed. This is internal write operation state
   becames visiable to user. Witch is also bad. 
   
c) Just return as match bytes as we can deal with rigth now back to 
   caller, and let's caller handles it and than call commit. In this scenario
   we never stop journal in the midle of some internal fs operation.
   If reboot comes before commit trunsaction was't closed so it will 
   be droped while journal replay.

Only (c) tend to garantie attomic data operation.

Also fix some issues introduced by 
retries-in-ext3_prepare_write-violate-ordering-requirements:
  i_size may increase after error happend.
  possible data corruption caused commiting non uptodate bh.

Signed-off-by: Dmitriy Monakhov <[EMAIL PROTECTED]>
-
diff --git a/fs/ext3/inode.c b/fs/ext3/inode.c
index dba6dd2..4c5e9f7 100644
--- a/fs/ext3/inode.c
+++ b/fs/ext3/inode.c
@@ -1154,6 +1154,18 @@ static int do_journal_get_write_access(handle_t *handle,
  * transaction must include the content of the newly allocated blocks.
  * This content is expected to be set to zeroes by block_prepare_write().
  * 2006/10/14  SAW
+ * What we can do if some blocks was allocated?
+ * a) Stop transaction and do retry if possible, but what happend if
+ *reboot comes after journal_force_commit, but before we exhaust
+ *all retry attempts and caller call trancate?
+ *We may get allocated blocks outside i_size. Witch is bad.
+ * b) Commit newly allocated blocks. And if reboot comes, user will be
+ *surprised that i_size increased but blocks are zero filed. Witch is
+ *also bad.
+ * c) Just return as match bytes as we can deal with rigth now back to
+ *caller, and if reboot comes trunsaction was't closed so it will
+ *be droped while journal replay.
+ * Only (c) tend to garantie attomic data operation.
  */
 static int ext3_prepare_failure(struct file *file, struct page *page,
unsigned from, unsigned to)
@@ -1186,7 +1198,7 @@ skip:
block_start = to;
break;
}
-   if (!buffer_mapped(bh))
+   if (!buffer_mapped(bh) || !buffer_uptodate(bh))
/* prepare_write failed on this bh */
break;
if (ext3_should_journal_data(mapping->host)) {
@@ -1204,8 +1216,8 @@ skip:
if (block_start <= from)
goto skip;
 
-   /* commit allocated and zeroed buffers */
-   return mapping->a_ops->commit_write(file, page, from, block_start);
+   /* return number of bytes till last mapped && uptodate block */
+   return block_start - from;
 }
 
 static int ext3_prepare_write(struct file *file, struct page *page,
@@ -1239,7 +1251,8 @@ retry:
 
 failure:
ret2 = ext3_prepare_failure(file, page, from, to);
-   if (ret2 < 0)
+   if (ret2)
+   /* ready to partial write, or fatal error */
return ret2;
if (ret == -ENOSPC && ext3_should_retry_alloc(inode->i_sb, &retries))
goto retry;
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 806eee1..da39f80 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -1153,6 +1153,18 @@ static int do_journal_get_write_access(handle_t *handle,
  * transaction must include the content of the newly allocated blocks.
  * This content is expected to be set to zeroes by block_prepare_write().
  * 2006/10/14  SAW
+ * What we can do if some blocks was allocated?
+ * a) Stop transaction and do retry if possible, but what happend if
+ *reboot comes after journal_force_commit, but before we exhaust
+ *all retry attempts and caller call trancate?
+ *We may get allocated blocks outside i_size. Witch is bad.
+ * b) Commit newly allocated blocks. And if reboot comes, user will be
+ *surprised that i_size increased but blocks are zero filed. Witch is
+ *also bad.
+ * c) Just return as match bytes as we can deal with rigth now back to
+ *caller, and if reboot comes trunsaction was't closed so it will
+ *be droped while journal replay.
+ * Only (c) tend to garantie attomic data operation.
  */
 static int ext4_prepare_failure(struct file *file, struct page *page,
unsigned from, unsigned to)
@@ -1185,7 +1197,7 @@ skip:
block_start = to;
 

[PATCH 0/1][RFC] prepare_write positive return value V(2)

2007-02-12 Thread Dmitriy Monakhov
Changes from ver(1)
 - __page_symlink(): In order to be on a safe side add explicit zeroing content
 before fail (just in case).
 -do_lo_send_aops(): If prepare_write can't handle total size of bytes requested
 from loop_dev it is safer to fail.
 -  pipe_to_file():  Limit the size of the copy to size wich prepare_write ready
 to handle. instead of failing.
 - ecryptfs: was't changed in this version, because where is no even 
 AOP_TRUNCATED_PAGE handling code where.

Almost all read/write operation handles data with chunks(segments or pages)
and result has integral behaviour for folowing scenario: 
for_each_chunk() {
 res = op();
 if(IS_ERROR(res))
   return progress ? progress : res;
 progress += res;
}
prepare_write may has integral behaviour in case of blksize < pgsize,
for example we successfully allocated/read some blocks, but not all of them,
and than some error happend. Currently we eliminate this progress by doing
vmtrunate() after prepare_has failed.
It is good to have ability to signal about this progress. Interprete positive
prepare_write() ret code as bytes num that fs ready to handle at this moment.
 
BTH: This approach was used in OpenVZ 2.6.9 kernel in order to make FS with 
delayed allocation more correct, and its works well.
I think not everybody will happy about this,  but let's discuss all advantages
and disadvantages of this change.

fixes not dirrectly connected with proposed prepare_write semantic changes:
 __page_symlink : If find_or_create_page has failed on second retry attempt
  function will exit with wrong error code.
 __generic_cont_expand: Add correct AOP_TRUNCATED_PAGE handling.

Signed-off-by: Dmitriy Monakhov <[EMAIL PROTECTED]>
-
diff --git a/drivers/block/loop.c b/drivers/block/loop.c
index 6b5b642..dc332dc 100644
--- a/drivers/block/loop.c
+++ b/drivers/block/loop.c
@@ -224,6 +224,7 @@ static int do_lo_send_aops(struct loop_device *lo, struct 
bio_vec *bvec,
sector_t IV;
unsigned size;
int transfer_result;
+   int partial = 0;
 
IV = ((sector_t)index << (PAGE_CACHE_SHIFT - 9))+(offset >> 9);
size = PAGE_CACHE_SIZE - offset;
@@ -239,11 +240,20 @@ static int do_lo_send_aops(struct loop_device *lo, struct 
bio_vec *bvec,
page_cache_release(page);
continue;
}
-   goto unlock;
+   if (ret > 0 && ret < PAGE_CACHE_SIZE) {
+   /* 
+* ->prepare_write() can't handle total size of bytes
+* requested. In fact this is not likely to happen, 
+* because we request one block only.
+*/
+   partial = 1;
+   size = ret;
+   } else
+   goto unlock;
}
transfer_result = lo_do_transfer(lo, WRITE, page, offset,
bvec->bv_page, bv_offs, size, IV);
-   if (unlikely(transfer_result)) {
+   if (unlikely(transfer_result || partial)) {
char *kaddr;
 
/*
@@ -266,7 +276,7 @@ static int do_lo_send_aops(struct loop_device *lo, struct 
bio_vec *bvec,
}
goto unlock;
}
-   if (unlikely(transfer_result))
+   if (unlikely(transfer_result || partial))
goto unlock;
bv_offs += size;
len -= size;
diff --git a/fs/buffer.c b/fs/buffer.c
index 1ad674f..ed06560 100644
--- a/fs/buffer.c
+++ b/fs/buffer.c
@@ -2027,13 +2027,20 @@ static int __generic_cont_expand(struct inode *inode, 
loff_t size,
if (size > inode->i_sb->s_maxbytes)
goto out;
 
+retry:
err = -ENOMEM;
page = grab_cache_page(mapping, index);
if (!page)
goto out;
err = mapping->a_ops->prepare_write(NULL, page, offset, offset);
+   if (err == AOP_TRUNCATED_PAGE) {
+   page_cache_release(page);
+   goto retry;
+   }
if (err) {
/*
+* ->prepare_write() called with from==to and must not 
+* return positive size of bytes in this case.
 * ->prepare_write() may have instantiated a few blocks
 * outside i_size.  Trim these off again.
 */
@@ -2044,6 +2051,10 @@ static int __generic_cont_expand(struct inode *inode, 
loff_t size,
}
 
err = mapping->a_ops->commit_write(NULL, page, offset, offset);
+   if (err == AOP_TRUNCATED_PAGE) {
+   page_cache_release(page);
+   goto retry;
+   }