[PATCH v3] Btrfs: fix abnormal long waiting in fsync

2014-07-17 Thread Liu Bo
xfstests generic/127 detected this problem.

With commit 7fc34a62ca4434a79c68e23e70ed26111b7a4cf8, fsync now only flushes
data within the passed range.  This is the cause of the above problem:
btrfs's fsync has a stage called 'sync log' which waits for all the
ordered extents it has recorded to finish.
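
For reference, the 'sync log' wait loop in btrfs_wait_logged_extents() looks
roughly like the following simplified sketch (based on the context lines of
the diff below, not the exact kernel source):

	while (!list_empty(&log->logged_list[index])) {
		ordered = list_first_entry(&log->logged_list[index],
					   struct btrfs_ordered_extent,
					   log_list);
		list_del_init(&ordered->log_list);
		spin_unlock_irq(&log->log_extents_lock[index]);
		/* blocks until something else finishes the IO for this extent */
		wait_event(ordered->wait, test_bit(BTRFS_ORDERED_IO_DONE,
						   &ordered->flags));
		btrfs_put_ordered_extent(ordered);
		spin_lock_irq(&log->log_extents_lock[index]);
	}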

In xfstests/generic/127, with mixed operations such as truncate, fallocate,
punch hole, and mapwrite, we get some pre-allocated extents, and mapwrite
mmaps and then msyncs.  I found that msync can wait for quite a long time
(about 20s in my case).  Thanks to ftrace, it turns out that the earlier
fallocate calls 'btrfs_wait_ordered_range()' to flush dirty pages, but since
the range of dirty pages may be larger than the range
'btrfs_wait_ordered_range()' covers, some ordered extents are created whose
corresponding pages never get flushed.  They are left in memory until the
fsync reaches the 'sync log' stage, where fsync can only wait for the system
writeback thread to flush those pages and finish the ordered extents, so the
latency is inevitable.
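
In other words, the fallocate path does something like the sketch below (a
simplified illustration, not the exact btrfs_wait_ordered_range() code);
everything dirty outside [start, start + len - 1] is left untouched:

	/* start writeback only for the passed range ... */
	filemap_fdatawrite_range(inode->i_mapping, start, start + len - 1);
	/* ... then wait for the ordered extents covering that range;
	 * dirty pages outside it keep their ordered extents pending,
	 * waiting for the flusher thread to find them */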

This adds a flush, similar to the one in btrfs_start_ordered_extent(), to
btrfs_wait_logged_extents() to fix that.
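
For comparison, btrfs_start_ordered_extent() avoids this kind of stall with
roughly the following pattern (a simplified sketch, where 'entry' is the
ordered extent and 'inode' its inode), which is what the new hunk mirrors:

	/* kick off IO for the ordered extent's pages ourselves so the
	 * wait below doesn't depend on the flusher thread finding them */
	if (!test_bit(BTRFS_ORDERED_DIRECT, &entry->flags))
		filemap_fdatawrite_range(inode->i_mapping, entry->file_offset,
					 entry->file_offset + entry->len - 1);
	wait_event(entry->wait, test_bit(BTRFS_ORDERED_IO_DONE,
					 &entry->flags));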

Reviewed-by: Miao Xie <mi...@cn.fujitsu.com>
Signed-off-by: Liu Bo <bo.li@oracle.com>
---
v3:
   Add a check for IO_DONE flag to avoid unnecessary flush.
v2: 
   Move flush part into btrfs_wait_logged_extents() to get the flush range
   more precise.

 fs/btrfs/ordered-data.c | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/fs/btrfs/ordered-data.c b/fs/btrfs/ordered-data.c
index e12441c..7187b14 100644
--- a/fs/btrfs/ordered-data.c
+++ b/fs/btrfs/ordered-data.c
@@ -484,8 +484,19 @@ void btrfs_wait_logged_extents(struct btrfs_root *log, u64 transid)
 					   log_list);
 		list_del_init(&ordered->log_list);
 		spin_unlock_irq(&log->log_extents_lock[index]);
+
+		if (!test_bit(BTRFS_ORDERED_IO_DONE, &ordered->flags) &&
+		    !test_bit(BTRFS_ORDERED_DIRECT, &ordered->flags)) {
+			struct inode *inode = ordered->inode;
+			u64 start = ordered->file_offset;
+			u64 end = ordered->file_offset + ordered->len - 1;
+
+			WARN_ON(!inode);
+			filemap_fdatawrite_range(inode->i_mapping, start, end);
+		}
 		wait_event(ordered->wait, test_bit(BTRFS_ORDERED_IO_DONE,
 						   &ordered->flags));
+
 		btrfs_put_ordered_extent(ordered);
 		spin_lock_irq(&log->log_extents_lock[index]);
 	}
-- 
1.8.1.4

Re: [PATCH v3] Btrfs: fix abnormal long waiting in fsync

2014-07-17 Thread Chris Mason
On 07/17/2014 04:08 AM, Liu Bo wrote:
 xfstests generic/127 detected this problem.
 [...]

I was able to trigger the stalls with plain fsx as well.  Thanks!

-chris