Re: + fs-introduce-write_begin-write_end-and-perform_write-aops.patch added to -mm tree

2007-06-14 Thread Jan Kara
 On Wed, 2007-06-13 at 13:43 +0200, Nick Piggin wrote:
 ..
   
   5) ext3_write_end:
 Before  write_begin/write_end patch set we have folowing locking
 order:
 stop_journal(handle);
 unlock_page(page);
 But now order is oposite:
 unlock_page(page);
 stop_journal(handle);
 Can we got any race condition now? I'm not sure is it actual problem,
 may be somebody cant describe this.
  
  Can we just change it to the original order? That would seem to be
  safest unless one of the ext3 devs explicitly acks it.
  Sorry, I've missed beginning of this thread. But what problems can
exactly cause this ordering change? ext3_journal_stop has no need to be
protected by the page lock - it can be even better that it's not
protected as it can trigger commit and all that would happen
unnecessarily under page lock...

Honza

-- 
Jan Kara [EMAIL PROTECTED]
SuSE CR Labs
-
To unsubscribe from this list: send the line unsubscribe linux-ext4 in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: + fs-introduce-write_begin-write_end-and-perform_write-aops.patch added to -mm tree

2007-06-14 Thread Nick Piggin
On Thu, Jun 14, 2007 at 11:52:49AM +0200, Jan Kara wrote:
  On Wed, 2007-06-13 at 13:43 +0200, Nick Piggin wrote:
  ..

5) ext3_write_end:
Before  write_begin/write_end patch set we have folowing locking
order:
stop_journal(handle);
unlock_page(page);
But now order is oposite:
unlock_page(page);
stop_journal(handle);
Can we got any race condition now? I'm not sure is it actual 
problem,
may be somebody cant describe this.
   
   Can we just change it to the original order? That would seem to be
   safest unless one of the ext3 devs explicitly acks it.
   Sorry, I've missed beginning of this thread. But what problems can
 exactly cause this ordering change? ext3_journal_stop has no need to be
 protected by the page lock - it can be even better that it's not
 protected as it can trigger commit and all that would happen
 unnecessarily under page lock...

Sure, if you think it is safe. I would rather it be done in a
different patch though.

-
To unsubscribe from this list: send the line unsubscribe linux-ext4 in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: + fs-introduce-write_begin-write_end-and-perform_write-aops.patch added to -mm tree

2007-06-13 Thread Dmitriy Monakhov
On 14:19 Втр 29 Май , [EMAIL PROTECTED] wrote:
 
 The patch titled
  fs: introduce write_begin, write_end, and perform_write aops
 has been added to the -mm tree.  Its filename is
  fs-introduce-write_begin-write_end-and-perform_write-aops.patch
 
 *** Remember to use Documentation/SubmitChecklist when testing your code ***
 
 See http://www.zip.com.au/~akpm/linux/patches/stuff/added-to-mm.txt to find
 out what to do about this
 
 --
 Subject: fs: introduce write_begin, write_end, and perform_write aops
 From: Nick Piggin [EMAIL PROTECTED]
 
 These are intended to replace prepare_write and commit_write with more
 flexible alternatives that are also able to avoid the buffered write
 deadlock problems efficiently (which prepare_write is unable to do).
 
 [EMAIL PROTECTED]: API design contributions, code review and fixes]
 Signed-off-by: Nick Piggin [EMAIL PROTECTED]
 Signed-off-by: Mark Fasheh [EMAIL PROTECTED]
 Signed-off-by: Andrew Morton [EMAIL PROTECTED]
I've finaly find time to review Nick's write_begin/write_end aop patch set.
And i have some fixes and questions. My be i've missed somthing and it was 
already disscussed, but i cant find in LKML.

1) loop dev:
loop.c code itself is not perfect. In fact before Nick's patch
partial write was't possible. Assumption what write chunks are
always page aligned is realy weird ( see index++ line).
Fixed by new aop loop fix patch

2)block_write_begin:
After we enter to block_write_begin with *pagep == NULL and
some page was grabed we remember this page in *pagep
And if __block_prepare_write() we have to clear *pagep , as 
it was before. Because this may confuse caller.
for example caller may have folowing code:
ret = block_write_begin(..., pagep,...)
if (ret  *pagep != NULL) {
unlock_page(*pagep);
page_cache_release(*pagep);
}
Fixed my new aop block_write_begin fix patch

3) __page_symlink:
Nick's patch add folowing code:
+ err = pagecache_write_begin(NULL, mapping, 0,PAGE_CACHE_SIZE,
+ AOP_FLAG_UNINTERRUPTIBLE, page,fsdata);
symlink now consume whole page. I have only one question WHY???.
I don't see any advantages, but where are huge list of
dissdvantages:
a) fs with blksize == 1k and pagesize == 16k after this patch
   waste more than 10x times disk space for nothing.
b) What happends if we want use fs with blksize == 4k on i386
   after it was used by ia64 ??? (before this patch it was
   possible).

I dont prepare patch for this because i dont understand issue
witch Nick aimed to fix.

4) iov_iter_fault_in_readable:
Function prerform check for signgle region, with out respect to
segment nature of iovec, For example writev no longer works :) :
writev(3, [{\1, 1}, {\2..., 4096}], 2) = -1 EFAULT (Bad address)
this hidden bug, and it was invisiable because _fault_in_readable
return value was ignored before. Lets iov_iter_fault_in_readable
perform checks for all segments.
Fixed by :iov_iter_fault_in_readable fix

5) ext3_write_end:
Before  write_begin/write_end patch set we have folowing locking
order:
stop_journal(handle);
unlock_page(page);
But now order is oposite:
unlock_page(page);
stop_journal(handle);
Can we got any race condition now? I'm not sure is it actual problem,
may be somebody cant describe this.




-
To unsubscribe from this list: send the line unsubscribe linux-ext4 in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: + fs-introduce-write_begin-write_end-and-perform_write-aops.patch added to -mm tree

2007-06-13 Thread Dmitriy Monakhov
On 13:43 Срд 13 Июн , Nick Piggin wrote:
 On Wed, Jun 13, 2007 at 05:40:05PM +0400, Dmitriy Monakhov wrote:
  On 14:19 ?? 29 ?? , [EMAIL PROTECTED] wrote:
   
   The patch titled
fs: introduce write_begin, write_end, and perform_write aops
   has been added to the -mm tree.  Its filename is
fs-introduce-write_begin-write_end-and-perform_write-aops.patch
   
   *** Remember to use Documentation/SubmitChecklist when testing your code 
   ***
   
   See http://www.zip.com.au/~akpm/linux/patches/stuff/added-to-mm.txt to 
   find
   out what to do about this
   
   --
   Subject: fs: introduce write_begin, write_end, and perform_write aops
   From: Nick Piggin [EMAIL PROTECTED]
   
   These are intended to replace prepare_writ eand commit_write with more
   flexible alternatives that are also able to avoid the buffered write
   deadlock problems efficiently (which prepare_write is unable to do).
   
   [EMAIL PROTECTED]: API design contributions, code review and fixes]
   Signed-off-by: Nick Piggin [EMAIL PROTECTED]
   Signed-off-by: Mark Fasheh [EMAIL PROTECTED]
   Signed-off-by: Andrew Morton [EMAIL PROTECTED]
  I've finaly find time to review Nick's write_begin/write_end aop patch 
  set.
  And i have some fixes and questions. My be i've missed somthing and it was 
  already disscussed, but i cant find in LKML.
 
 Thanks, that's really helpful.
 
  
  1) loop dev:
  loop.c code itself is not perfect. In fact before Nick's patch
  partial write was't possible. Assumption what write chunks are
  always page aligned is realy weird ( see index++ line).
  Fixed by new aop loop fix patch
 
 I think you're right, fix looks good.
 
  
  2)block_write_begin:
  After we enter to block_write_begin with *pagep == NULL and
  some page was grabed we remember this page in *pagep
  And if __block_prepare_write() we have to clear *pagep , as 
  it was before. Because this may confuse caller.
  for example caller may have folowing code:
  ret = block_write_begin(..., pagep,...)
  if (ret  *pagep != NULL) {
  unlock_page(*pagep);
  page_cache_release(*pagep);
  }
  Fixed my new aop block_write_begin fix patch
 
 I don't think the caller can rely on that if it returns failure.
 But that is more defensive I guess. Maybe setting it to 1 or
 so would catch abusers.
 
  
  3) __page_symlink:
  Nick's patch add folowing code:
  + err = pagecache_write_begin(NULL, mapping, 0,PAGE_CACHE_SIZE,
  + AOP_FLAG_UNINTERRUPTIBLE, page,fsdata);
  symlink now consume whole page. I have only one question WHY???.
  I don't see any advantages, but where are huge list of
  dissdvantages:
  a) fs with blksize == 1k and pagesize == 16k after this patch
 waste more than 10x times disk space for nothing.
  b) What happends if we want use fs with blksize == 4k on i386
 after it was used by ia64 ??? (before this patch it was
 possible).
One more visiable effect caused by wrong symlink size:
fsstress cause folowing error:
LOG BEGIN
EXT3-fs unexpected failure: !buffer_revoked(bh); 
inconsistent data on disk
ext3_forget: aborting transaction: IO failure in __ext3_journal_revoke
ext3_abort called.

EXT3-fs error (device dm-4): ext3_forget: error -5 when attempting
revoke
Remounting filesystem read-only 
Aborting journal on device dm-4.
journal commit I/O error
journal commit I/O error
journal commit I/O error
journal commit I/O error
journal commit I/O error
journal commit I/O error
EXT3-fs error (device dm-4) in ext3_free_blocks_sb: Journal has aborted

journal commit I/O error
journal commit I/O error
EXT3-fs error (device dm-4) in ext3_reserve_inode_write: Journal has
aborted
EXT3-fs error (device dm-4) in ext3_truncate: IO failure
journal commit I/O error
journal commit I/O error
journal commit I/O error
journal commit I/O error
EXT3-fs error (device dm-4) in ext3_reserve_inode_write: Journal has
aborted
EXT3-fs error (device dm-4) in ext3_orphan_del: Journal has aborted
EXT3-fs error (device dm-4) in ext3_reserve_inode_write: Journal has
aborted
EXT3-fs error (device dm-4) in ext3_delete_inode: IO failure
LOG END

After symlink size was fixed to len-1 problem dissappeared. 
  
  I dont prepare patch for this because i dont understand issue
  witch Nick aimed to fix.
 
 I don't know why myself :P I think it would be just fine to use
 len-1 as it did previously, so it must have been a typo?
 
 
  4) iov_iter_fault_in_readable:
  Function prerform check for signgle region, with out respect to
  segment nature of iovec, For example writev no longer works :) :
  writev(3, [{\1, 1}, {\2..., 4096}], 2) = -1 EFAULT (Bad address)
  this hidden bug, and it was invisiable because _fault_in_readable
  return value was ignored before. Lets 

Re: + fs-introduce-write_begin-write_end-and-perform_write-aops.patch added to -mm tree

2007-06-13 Thread Nick Piggin
On Wed, Jun 13, 2007 at 04:07:01PM -0700, Badari Pulavarty wrote:
 On Wed, 2007-06-13 at 13:43 +0200, Nick Piggin wrote:
 ..
   
   5) ext3_write_end:
 Before  write_begin/write_end patch set we have folowing locking
 order:
 stop_journal(handle);
 unlock_page(page);
 But now order is oposite:
 unlock_page(page);
 stop_journal(handle);
 Can we got any race condition now? I'm not sure is it actual problem,
 may be somebody cant describe this.
  
  Can we just change it to the original order? That would seem to be
  safest unless one of the ext3 devs explicitly acks it.
 
 It would be nice to go back to original order, but its not that
 simple with current structure of the code. With Nick's patches
 unlock_page() happens in generic_write_end(). journal_stop() 
 needs to happen after generic_write_end(). :(

Well we could use block_write_end?

 
 Mingming, can you take a look at the current  proposed order ?
 I ran into bunch of races when I tried to change the order for
 -writepages() support earlier :(

OK, it sounds like we probably want to revert to the original
order at least for this patchset. If the new order is proven
safe then that could be introduced later to simplify things...

Thanks,
Nick

-
To unsubscribe from this list: send the line unsubscribe linux-ext4 in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html