On 11/05/2014 02:56 PM, Filipe Manana wrote:
We have a race while deleting unused block groups that causes extents written
by past generations/transactions to be rewritten by the current transaction
before that transaction is committed. The steps that lead to this issue:

1) At transaction N one or more block groups became unused and we added them
    to the list fs_info->unused_bgs;

2) While still at transaction N we write btree extents to block group X and the
    transaction is committed;

3) The cleaner kthread is awaken and calls btrfs_delete_unused_bgs() to go 
through
    the list fs_info->unused_bgs and remove unused block groups;

4) Transaction N + 1 starts;

5) At transaction N + 1, block group X becomes unused and is added to the list
    fs_info->unused_bgs - this implies delayed refs were run, so we had the
    following function calls: btrfs_run_delayed_refs() -> __btrfs_free_extent()
    -> update_block_group(). The update_block_group() function grabs the lock
    fs_info->unused_bgs_lock, adds block group X to fs_info->unused_bgs and
    releases that lock;

6) The cleaner kthread, while at btrfs_delete_unused_bgs(), sees block group X
    added by transaction N + 1 because it's doing a loop that finishes only when
    the list fs_info->unused_bgs is empty and locks and unlocks the spinlock
    fs_info->unused_bgs_lock on each iteration. So it deletes the block group
    and its respective chunk is released. Even if it didn't do the lock/unlock
    per iteration, it could still see block group X in the list, because the
    cleaner kthread might call btrfs_delete_unused_bgs() multiple times (for
    example if there are several snapshots to delete);

7) A new block group X' is created for data, and it's associated to the same 
chunk
    that block group X was associated to;


Actually this can't happen, we search the commit root for a free dev extent, so if block group X` get's mapped to a dev extent that was deleted in the same transaction as it was free'd in then that is a different problem.

8) Extents from block group X' are allocated for file data and for example an 
fsync
    makes the file data be effectively written to disk;


Also if a new block group is allocated fsync() will trigger a full transaction commit. So thinking about this more I'm not entirely sure there is actually a problem here. Did you observe this issue? Are you sure it's because of this change and not just exacerbated by it? Thanks,

Josef
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Reply via email to