Previously, dax_writeback_one() cleared the PAGECACHE_TAG_TOWRITE flag
before the tagged radix tree entry had actually been flushed to media.
This is incorrect because it allows the following race:

Thread 1                                Thread 2
--------                                --------
dax_writeback_mapping_range()
tag entry with PAGECACHE_TAG_TOWRITE
                                        dax_writeback_mapping_range()
                                        tag entry with PAGECACHE_TAG_TOWRITE
                                        dax_writeback_one()
                                        radix_tree_tag_clear(TOWRITE)
TOWRITE flag is no longer set,
  find_get_entries_tag() finds no
  entries, return
                                        flush entry to media

In this case thread 1 returns before the data for the dirty entry is
actually durable on media.

Fix this by only clearing the PAGECACHE_TAG_TOWRITE flag after all flushing
is complete.

Signed-off-by: Ross Zwisler <ross.zwis...@linux.intel.com>
Reported-by: Jan Kara <j...@suse.cz>
---
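Not part of the commit: a minimal userspace sketch of the ordering
argument in the changelog, in case it helps review.  Nothing here is
kernel code.  struct model_entry and all three functions are hypothetical
names that only model the TOWRITE tag and the flush as two booleans, with
the concurrent thread's scan placed by hand between the two steps.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for one tagged radix tree entry. */
struct model_entry {
	bool towrite;	/* models PAGECACHE_TAG_TOWRITE */
	bool durable;	/* models "data has reached media" */
};

/*
 * Models what a concurrent dax_writeback_mapping_range() caller would
 * conclude if it scanned for TOWRITE entries at this instant: it returns
 * as soon as the tag is clear.  Returning while the data is not yet
 * durable is the race described in the changelog.
 */
static bool scan_returns_too_early(const struct model_entry *e)
{
	return !e->towrite && !e->durable;
}

/* Old ordering: clear the tag, then flush. */
static bool writeback_old_order(struct model_entry *e)
{
	bool raced;

	assert(e != NULL);
	e->towrite = false;			/* tag cleared first */
	raced = scan_returns_too_early(e);	/* other thread scans here */
	e->durable = true;			/* flush happens afterwards */
	return raced;
}

/* New ordering: flush, then clear the tag. */
static bool writeback_new_order(struct model_entry *e)
{
	bool raced;

	assert(e != NULL);
	e->durable = true;			/* flush first */
	raced = scan_returns_too_early(e);	/* other thread scans here */
	e->towrite = false;			/* tag cleared last */
	return raced;
}
```

In this model the old ordering leaves a window in which the scan sees a
clear tag on an entry that is not yet durable.  With the new ordering a
scan at any point before the clear still finds the tag set, so the window
is gone.
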
 fs/dax.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/fs/dax.c b/fs/dax.c
index cee9e1b..d589113 100644
--- a/fs/dax.c
+++ b/fs/dax.c
@@ -407,8 +407,6 @@ static int dax_writeback_one(struct block_device *bdev,
        if (!radix_tree_tag_get(page_tree, index, PAGECACHE_TAG_TOWRITE))
                goto unlock;
 
-       radix_tree_tag_clear(page_tree, index, PAGECACHE_TAG_TOWRITE);
-
        if (WARN_ON_ONCE(type != RADIX_DAX_PTE && type != RADIX_DAX_PMD)) {
                ret = -EIO;
                goto unlock;
@@ -432,6 +430,10 @@ static int dax_writeback_one(struct block_device *bdev,
        }
 
        wb_cache_pmem(dax.addr, dax.size);
+
+       spin_lock_irq(&mapping->tree_lock);
+       radix_tree_tag_clear(page_tree, index, PAGECACHE_TAG_TOWRITE);
+       spin_unlock_irq(&mapping->tree_lock);
  unmap:
        dax_unmap_atomic(bdev, &dax);
        return ret;
-- 
2.5.0
