I'm sorry I haven't responded sooner.

Andrew Morton wrote:
> Sorry, I broke it.  I should have read the comment:
> 
>                 /*
>                  * BUGBUG - Should we call filemap_fdatawrite here instead
>                  * of fsync_inode_data?
>                  * If we do, we have a deadlock condition since we may end
>                  * up recursively calling jfs_get_block with the IWRITELOCK
>                  * held.  We may be able to do away with IWRITELOCK while
>                  * committing transactions and use i_sem instead.
>                  */
>                 if ((!S_ISDIR(ip->i_mode))
>                     && (tblk->flag & COMMIT_DELETE) == 0) {
>                         filemap_fdatawait(ip->i_mapping);
>                         filemap_fdatawrite(ip->i_mapping);
>                         filemap_fdatawait(ip->i_mapping);
>                 }
> 
> That's exactly what's happening.  One of those pages must
> be dirty and not mapped to disk.  __block_write_full_page
> is calling into jfs_get_block().
> 

> 
> i_sem was already taken way up in do_truncate(), so that
> approach may not be promising.

I think that's okay.  We now take the IWRITE_LOCK near the level where 
we're called from the VFS, so we should be able to grab i_sem only in 
the paths where it isn't already taken.  I think we will know when to 
take it and when not to.  If we take that approach, I'll have to make 
sure we protect the things that aren't protected by i_sem (in 
particular, anything that jfs_get_block may change).
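To illustrate the idea, here's a userspace sketch (not JFS code -- the 
fake_inode struct, jfs_commit_locked(), and the caller_holds_i_sem flag 
are all hypothetical, with a pthread mutex standing in for i_sem): the 
commit path takes i_sem only when the caller doesn't already hold it, 
e.g. because do_truncate() took it further up the stack.

```c
#include <pthread.h>

/* Hypothetical userspace model: a pthread mutex stands in for i_sem. */
struct fake_inode {
        pthread_mutex_t i_sem;
        int commits;    /* work done under the lock, for illustration */
};

/*
 * Commit-path helper: take i_sem only if the caller has not already
 * taken it (as do_truncate() would have).  Anything jfs_get_block may
 * change must only be touched inside this critical section.
 */
void jfs_commit_locked(struct fake_inode *ip, int caller_holds_i_sem)
{
        if (!caller_holds_i_sem)
                pthread_mutex_lock(&ip->i_sem);

        ip->commits++;  /* stands in for flushing data + logging metadata */

        if (!caller_holds_i_sem)
                pthread_mutex_unlock(&ip->i_sem);
}
```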

> 
> One thing which is a little odd.  This page is unmapped,
> and dirty.  __block_write_full_page() is trying to give
> the page a disk mapping so it can be written out.  But
> because the page doesn't have any buffers, the
> fsync_inode_data_buffers() approach in 2.5.7 would not
> have written this page back anyway.  So... does it need
> to be written at all?

The integrity of the file system does not require that these pages be 
written.  The rationale is to make sure that when a file grows, the data 
is written before the metadata is committed to the journal.  That way, 
if the system crashes and the journal is replayed, a new or larger file 
does not contain data from another user, possibly sensitive data.

Obviously, we're not currently being very selective about which data 
needs to be written when we commit the inode changes.  We're erring on 
the safe side and writing all dirty pages (as long as they are mapped!). 
This is clearly sub-optimal.
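As a userspace model of that ordering rule (all names here are made up 
for illustration; this is not kernel code): the journal commit is only 
safe once every dirty page that has a disk mapping has been written 
back -- and a dirty page with no mapping, like the one that triggered 
this deadlock, is exactly what a write-back pass over mapped pages 
cannot handle.

```c
/* One page of a growing file, reduced to the three bits that matter. */
struct page_state {
        int dirty;
        int mapped;     /* has a disk block assigned */
        int written;
};

/* Mimics a filemap_fdatawrite()-style pass: write back dirty, mapped
 * pages; dirty pages without a disk mapping are left untouched. */
int flush_mapped_dirty(struct page_state *pages, int n)
{
        int written = 0;
        for (int i = 0; i < n; i++) {
                if (pages[i].dirty && pages[i].mapped) {
                        pages[i].written = 1;
                        pages[i].dirty = 0;
                        written++;
                }
        }
        return written;
}

/* The commit may proceed once no mapped page is still dirty. */
int commit_safe(const struct page_state *pages, int n)
{
        for (int i = 0; i < n; i++)
                if (pages[i].dirty && pages[i].mapped)
                        return 0;
        return 1;
}
```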

> 
> Anyway.  Could someone please suggest a fix?

For right now, we could probably remove this snippet of code altogether 
until we engineer a more efficient way of ordering the data and metadata 
writes.
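One possible direction for the more selective version (purely a sketch 
under assumed semantics -- the helper name and the "last committed size" 
rule are my invention, not a worked-out design): only data beyond the 
last committed file size could expose stale blocks after a journal 
replay, so only pages containing such data would need the ordered write.

```c
/*
 * Hypothetical predicate: does the page at index pgidx (page size psize)
 * contain any bytes past committed_size?  If so, it holds new data that
 * must reach disk before the metadata commit; if not, it can be skipped.
 */
int page_needs_ordered_write(long pgidx, long psize, long committed_size)
{
        long page_end = (pgidx + 1) * psize;

        return page_end > committed_size;
}
```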

> 
> Thanks.

Thank you.
Shaggy

-- 
David Kleikamp
IBM Linux Technology Center

_______________________________________________
Jfs-discussion mailing list
[EMAIL PROTECTED]
http://www-124.ibm.com/developerworks/oss/mailman/listinfo/jfs-discussion
