-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 6/9/15 4:48 PM, Jeff Mahoney wrote:
> On 6/8/15 10:12 AM, Filipe David Manana wrote:
>> On Wed, Jun 3, 2015 at 3:47 PM,  <je...@suse.com> wrote:
>>> From: Jeff Mahoney <je...@suse.com>
>>> 
>>> When we clear the dirty bits in btrfs_delete_unused_bgs for 
>>> extents in the empty block group, it results in 
>>> btrfs_finish_extent_commit being unable to discard the freed 
>>> extents.
>>> 
>>> The block group removal patch added an alternate path to
>>> forget extents other than btrfs_finish_extent_commit.  As a
>>> result, any extents that would be freed when the block group is
>>> removed aren't discarded.  In my test run, with a large copy of
>>> mixed sized files followed by removal, it left nearly 2/3 of
>>> extents undiscarded.
>>> 
>>> To clean up the block groups, we add the removed block group
>>> onto a list that will be discarded after transaction commit.
>>> 
>>> v1->v2: - Fix ordering to ensure that we don't discard extents 
>>> freed in an uncommitted transaction.
>>> 
>>> v2->v3: - Factor out get/put block_group->trimming to ensure
>>> that cleanup always happens at the last reference drop. -
>>> Cleanup the free space cache on the last reference drop. - Use
>>> list_move instead of list_add in case of multiple adds.  We
>>> still issue a warning, but we shouldn't fall over.
>>> 
>>> Signed-off-by: Jeff Mahoney <je...@suse.com>
> 
>> Hi Jeff,
> 
>> The changes look good from eyeballing, and I gave it a try 
>> together with the previous 2 patches in the corresponding series.
>>  However, this patch in the series fails tests btrfs/065 to 
>> btrfs/072.
> 
>> Running the following in a loop until it returns failure ($? !=
>> 0) triggers an apparent corruption issue most of the time:
> 
> Ok, I didn't have my test environment set up to do the multidevice 
> tests by default, and definitely didn't have it set up to do 065, 
> since that requires 5+ devices of identical size. Now that I do
> have that set up, I see this:
> 
> 8,37   2     1442    22.727796582 15400  D   D 419434496 + 8192
> [umoun t]
> 
> 419434496 is the start of /dev/sdc5 and we're blowing away 4 MB.
> That happens to coincide with the system (single) block group.  In
> my testing, I found (IIRC) that system (single) and metadata
> (single) were always in the unused list but were getting skipped by
> the cleaner thread.  It looks like the cleanup block groups on
> close_ctree is misbehaving.  Whatever condition that was causing
> those to be skipped must be unset for 065.  It wasn't a problem
> during normal runtime, but now that it's called in close_ctree,
> it's introducing a problem.  I'll start digging.
> 
> Thanks for the review!

<...>-5472  [014] ....   390.430881: btrfs_chunk_free: root =
3(CHUNK_TREE), offset = 0, size = 4194304, num_stripes = 1,
sub_stripes = 0, type = SYSTEM

That'd do it. If I do a btrfs-debug-tree on the file system without
discard, it works fine because the superblock doesn't get discarded.
That 0-4MB chunk _is_ released and unused.

- -Jeff

> -Jeff
> 
>> root 17:16:48 /home/fdmanana/git/hub/xfstests (up_master_2)> 
>> MKFS_OPTIONS="-O skinny-metadata" MOUNT_OPTIONS="-o discard" 
>> ./check btrfs/065 FSTYP         -- btrfs PLATFORM      -- 
>> Linux/x86_64 debian3 4.1.0-rc5-btrfs-next-10+ MKFS_OPTIONS  --
>> -O skinny-metadata /dev/sdc MOUNT_OPTIONS -- -o discard /dev/sdc 
>> /home/fdmanana/btrfs-tests/scratch_1
> 
>> btrfs/065 130s ... [failed, exit status 1] - output mismatch (see
>>  /home/fdmanana/git/hub/xfstests/results//btrfs/065.out.bad) --- 
>> tests/btrfs/065.out 2014-11-17 20:59:51.282203001 +0000 +++ 
>> /home/fdmanana/git/hub/xfstests/results//btrfs/065.out.bad 
>> 2015-06-07 17:37:40.841293317 +0100 @@ -1,2 +1,3 @@ QA output 
>> created by 065 Silence is golden +_check_btrfs_filesystem: 
>> filesystem on /dev/sdc is inconsistent (see 
>> /home/fdmanana/git/hub/xfstests/results//btrfs/065.full) ...
>> (Run 'diff -u tests/btrfs/065.out 
>> /home/fdmanana/git/hub/xfstests/results//btrfs/065.out.bad'  to 
>> see the entire diff) Ran: btrfs/065 Failures: btrfs/065 Failed 1
>> of 1 tests
> 
>> root 17:37:41 /home/fdmanana/git/hub/xfstests (up_master_2)> 
>> btrfsck /dev/sdc No valid Btrfs found on /dev/sdc Couldn't open 
>> file system root 17:37:43 /home/fdmanana/git/hub/xfstests 
>> (up_master_2)>
> 
> 
>> With only patches 1 and 2 in the series (or without them) this 
>> never happens (well, at least not in ~50 iterations when running 
>> btrfs/065). And I've never seen this before testing this patch.
> 
>> It also makes generic/251 "never" finish apparently - it didn't 
>> finish here after 9h running, and without any of these patches
>> in the series it used to finish in less than half an hour.
> 
>> Maybe the apparent corruption is due to some interaction with 
>> device replace or scrub, I haven't had the time to 
>> analyze/determine what it could be.
> 
>> thanks
> 
> 
> 
> 
>>> --- fs/btrfs/ctree.h            |  3 ++ fs/btrfs/extent-tree.c 
>>> | 69 ++++++++++++++++++++++++++++++++++++++++++--- 
>>> fs/btrfs/free-space-cache.c | 57 
>>> +++++++++++++++++++++---------------- fs/btrfs/super.c |  2 +-
>>> fs/btrfs/transaction.c      |  2 ++ fs/btrfs/transaction.h
>>> |  2 ++ 6 files changed, 106 insertions(+), 29 deletions(-)
>>> 
>>> diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h index 
>>> 6f364e1..780acf1 100644 --- a/fs/btrfs/ctree.h +++ 
>>> b/fs/btrfs/ctree.h @@ -3438,6 +3438,8 @@ int 
>>> btrfs_remove_block_group(struct btrfs_trans_handle *trans,
>>> struct btrfs_root *root, u64 group_start, struct extent_map
>>> *em); void btrfs_delete_unused_bgs(struct btrfs_fs_info
>>> *fs_info); +void btrfs_get_block_group_trimming(struct
>>> btrfs_block_group_cache *cache); +void
>>> btrfs_put_block_group_trimming(struct btrfs_block_group_cache
>>> *cache); void btrfs_create_pending_block_groups(struct
>>> btrfs_trans_handle *trans, struct btrfs_root *root); u64 
>>> btrfs_get_alloc_profile(struct btrfs_root *root, int data); @@ 
>>> -4068,6 +4070,7 @@ __printf(5, 6) void
>>> __btrfs_std_error(struct btrfs_fs_info *fs_info, const char
>>> *function, unsigned int line, int errno, const char *fmt,
>>> ...);
>>> 
>>> +const char *btrfs_decode_error(int errno);
>>> 
>>> void __btrfs_abort_transaction(struct btrfs_trans_handle
>>> *trans, struct btrfs_root *root, const char *function, diff
>>> --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c index 
>>> fb4dbfc..e53dbe9 100644 --- a/fs/btrfs/extent-tree.c +++ 
>>> b/fs/btrfs/extent-tree.c @@ -6034,20 +6034,19 @@ int 
>>> btrfs_finish_extent_commit(struct btrfs_trans_handle *trans, 
>>> struct btrfs_root *root) { struct btrfs_fs_info *fs_info = 
>>> root->fs_info; +       struct btrfs_block_group_cache 
>>> *block_group, *tmp; +       struct list_head *deleted_bgs;
>>> struct extent_io_tree *unpin; u64 start; u64 end; int ret;
>>> 
>>> -       if (trans->aborted) -               return 0; - if 
>>> (fs_info->pinned_extents == &fs_info->freed_extents[0]) unpin
>>> = &fs_info->freed_extents[1]; else unpin = 
>>> &fs_info->freed_extents[0];
>>> 
>>> -       while (1) { +       while (!trans->aborted) { 
>>> mutex_lock(&fs_info->unused_bg_unpin_mutex); ret = 
>>> find_first_extent_bit(unpin, 0, &start, &end, EXTENT_DIRTY, 
>>> NULL); @@ -6066,6 +6065,34 @@ int 
>>> btrfs_finish_extent_commit(struct btrfs_trans_handle *trans, 
>>> cond_resched(); }
>>> 
>>> +       /* +        * Transaction is finished.  We don't need
>>> the lock anymore.  We +        * do need to clean up the block
>>> groups in case of a transaction +        * abort. +        */
>>> + deleted_bgs = &trans->transaction->deleted_bgs; + 
>>> list_for_each_entry_safe(block_group, tmp, deleted_bgs,
>>> bg_list) { +               u64 trimmed = 0; + +
>>> ret = -EROFS; +               if (!trans->aborted) + ret =
>>> btrfs_discard_extent(root, + block_group->key.objectid, + 
>>> block_group->key.offset, + &trimmed); + + 
>>> list_del_init(&block_group->bg_list); + 
>>> btrfs_put_block_group_trimming(block_group); + 
>>> btrfs_put_block_group(block_group); + +               if (ret)
>>> { +                       const char *errstr = 
>>> btrfs_decode_error(ret); + btrfs_warn(fs_info, +
>>> "Discard failed while removing blockgroup: errno=%d %s\n", + 
>>> ret, errstr); +               } +       } + return 0; }
>>> 
>>> @@ -9845,6 +9872,11 @@ int btrfs_remove_block_group(struct 
>>> btrfs_trans_handle *trans, * currently running transaction
>>> might finish and a new one start, * allowing for new block
>>> groups to be created that can reuse the same * physical device
>>> locations unless we take this special care. +        * +
>>> * There may also be an implicit trim operation if the file
>>> system +        * is mounted with -odiscard. The same
>>> protections must remain + * in place until the extents have
>>> been discarded completely when +        * the transaction
>>> commit has completed. */ remove_em = 
>>> (atomic_read(&block_group->trimming) == 0); /* @@ -9917,8 
>>> +9949,10 @@ void btrfs_delete_unused_bgs(struct btrfs_fs_info 
>>> *fs_info) return;
>>> 
>>> spin_lock(&fs_info->unused_bgs_lock); + while 
>>> (!list_empty(&fs_info->unused_bgs)) { u64 start, end; + int
>>> trimming;
>>> 
>>> block_group = list_first_entry(&fs_info->unused_bgs, struct 
>>> btrfs_block_group_cache, @@ -10016,12 +10050,39 @@ void 
>>> btrfs_delete_unused_bgs(struct btrfs_fs_info *fs_info) 
>>> spin_unlock(&block_group->lock);
>>> spin_unlock(&space_info->lock);
>>> 
>>> +               /* DISCARD can flip during remount */ + 
>>> trimming = btrfs_test_opt(root, DISCARD); + +               /* 
>>> Implicit trim during transaction commit. */ +               if 
>>> (trimming) + btrfs_get_block_group_trimming(block_group); + /*
>>> * Btrfs_remove_chunk will abort the transaction if things go * 
>>> horribly wrong. */ ret = btrfs_remove_chunk(trans, root, 
>>> block_group->key.objectid); + +               if (ret) { + if
>>> (trimming) + btrfs_put_block_group_trimming(block_group); + 
>>> goto end_trans; +               } + +               /* + * If
>>> we're not mounted with -odiscard, we can just forget + * about
>>> this block group. Otherwise we'll need to wait + * until
>>> transaction commit to do the actual discard. + */ +
>>> if (trimming) { + WARN_ON(!list_empty(&block_group->bg_list));
>>> + spin_lock(&trans->transaction->deleted_bgs_lock); + 
>>> list_move(&block_group->bg_list, + 
>>> &trans->transaction->deleted_bgs); + 
>>> spin_unlock(&trans->transaction->deleted_bgs_lock); + 
>>> btrfs_get_block_group(block_group); +               }
>>> end_trans: btrfs_end_transaction(trans, root); next: diff
>>> --git a/fs/btrfs/free-space-cache.c
>>> b/fs/btrfs/free-space-cache.c index 9dbe5b5..c79253e 100644 ---
>>> a/fs/btrfs/free-space-cache.c +++ b/fs/btrfs/free-space-cache.c
>>> @@ -3274,35 +3274,23 @@ next: return ret; }
>>> 
>>> -int btrfs_trim_block_group(struct btrfs_block_group_cache 
>>> *block_group, -                          u64 *trimmed, u64
>>> start, u64 end, u64 minlen) +void
>>> btrfs_get_block_group_trimming(struct btrfs_block_group_cache
>>> *cache) { -       int ret; + atomic_inc(&cache->trimming); +}
>>> 
>>> -       *trimmed = 0; +void
>>> btrfs_put_block_group_trimming(struct btrfs_block_group_cache
>>> *block_group) +{ +       struct extent_map_tree *em_tree; +
>>> struct extent_map *em; + bool cleanup;
>>> 
>>> spin_lock(&block_group->lock); -       if
>>> (block_group->removed) { -
>>> spin_unlock(&block_group->lock); - return 0; -       } -
>>> atomic_inc(&block_group->trimming); + cleanup =
>>> (atomic_dec_and_test(&block_group->trimming) && + 
>>> block_group->removed); spin_unlock(&block_group->lock);
>>> 
>>> -       ret = trim_no_bitmap(block_group, trimmed, start, end, 
>>> minlen); -       if (ret) -               goto out; - -
>>> ret = trim_bitmaps(block_group, trimmed, start, end, minlen);
>>> -out: - spin_lock(&block_group->lock); -       if 
>>> (atomic_dec_and_test(&block_group->trimming) && - 
>>> block_group->removed) { -               struct extent_map_tree 
>>> *em_tree; -               struct extent_map *em; - - 
>>> spin_unlock(&block_group->lock); - +       if (cleanup) { 
>>> lock_chunks(block_group->fs_info->chunk_root); em_tree = 
>>> &block_group->fs_info->mapping_tree.map_tree; 
>>> write_lock(&em_tree->lock); @@ -3326,10 +3314,31 @@ out: *
>>> this block group have left 1 entry each one. Free them. */ 
>>> __btrfs_remove_free_space_cache(block_group->free_space_ctl);
>>> - } else { +       } +} + +int btrfs_trim_block_group(struct 
>>> btrfs_block_group_cache *block_group, + u64 *trimmed, u64
>>> start, u64 end, u64 minlen) +{ +       int ret; + +
>>> *trimmed = 0; + + spin_lock(&block_group->lock); +       if
>>> (block_group->removed) { spin_unlock(&block_group->lock); +
>>> return 0; } + btrfs_get_block_group_trimming(block_group); + 
>>> spin_unlock(&block_group->lock); + +       ret = 
>>> trim_no_bitmap(block_group, trimmed, start, end, minlen); + if
>>> (ret) +               goto out;
>>> 
>>> +       ret = trim_bitmaps(block_group, trimmed, start, end, 
>>> minlen); +out: + btrfs_put_block_group_trimming(block_group);
>>> return ret; }
>>> 
>>> diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c index 
>>> 2ccd8d4..a80da03 100644 --- a/fs/btrfs/super.c +++ 
>>> b/fs/btrfs/super.c @@ -69,7 +69,7 @@ static struct 
>>> file_system_type btrfs_fs_type;
>>> 
>>> static int btrfs_remount(struct super_block *sb, int *flags,
>>> char *data);
>>> 
>>> -static const char *btrfs_decode_error(int errno) +const char 
>>> *btrfs_decode_error(int errno) { char *errstr = "unknown";
>>> 
>>> diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c 
>>> index 5628e25..2005262 100644 --- a/fs/btrfs/transaction.c +++ 
>>> b/fs/btrfs/transaction.c @@ -256,6 +256,8 @@ loop: 
>>> mutex_init(&cur_trans->cache_write_mutex); 
>>> cur_trans->num_dirty_bgs = 0; 
>>> spin_lock_init(&cur_trans->dirty_bgs_lock); + 
>>> INIT_LIST_HEAD(&cur_trans->deleted_bgs); + 
>>> spin_lock_init(&cur_trans->deleted_bgs_lock); 
>>> list_add_tail(&cur_trans->list, &fs_info->trans_list); 
>>> extent_io_tree_init(&cur_trans->dirty_pages, 
>>> fs_info->btree_inode->i_mapping); diff --git 
>>> a/fs/btrfs/transaction.h b/fs/btrfs/transaction.h index 
>>> 0b24755..14325f2 100644 --- a/fs/btrfs/transaction.h +++ 
>>> b/fs/btrfs/transaction.h @@ -74,6 +74,8 @@ struct 
>>> btrfs_transaction { */ struct mutex cache_write_mutex;
>>> spinlock_t dirty_bgs_lock; +       struct list_head
>>> deleted_bgs; + spinlock_t deleted_bgs_lock; struct
>>> btrfs_delayed_ref_root delayed_refs; int aborted; int
>>> dirty_bg_run; -- 1.8.4.5
>>> 
>>> -- To unsubscribe from this list: send the line "unsubscribe 
>>> linux-btrfs" in the body of a message to 
>>> majord...@vger.kernel.org More majordomo info at 
>>> http://vger.kernel.org/majordomo-info.html
> 
> 
> 
> 
> 
> -- To unsubscribe from this list: send the line "unsubscribe
> linux-btrfs" in the body of a message to majord...@vger.kernel.org 
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 

- -- 
Jeff Mahoney
SUSE Labs
-----BEGIN PGP SIGNATURE-----
Version: GnuPG/MacGPG2 v2.0.22 (Darwin)

iQIcBAEBAgAGBQJVd39VAAoJEB57S2MheeWyROkP/3yZKTetC+Ael7NKpQ8TLk6I
usu3EWAz3mkg1Xmd/Isb9HLrCcxFwl4RPE2DORyXpZOF8Vx/2TfwZ9GWeziQsSUl
ArTemnfRS5eZ1Vwh2MiY0ONJrR7OegJwtJ2z2p2w5XjvLj7v/0iBBvBrdWo4Zasl
l4n6othn771D3744sm69WpmUeU+KsdQf8UgSlfpe2vDMounFrPQE91b1HTCrnM/S
w0ccLNju2HWnEqP3GuukFyi3UtylEz8NfhNzWbSyFKlc5Z7LvT5JTU1AcT5U8vSG
G5aTDEZnAuUVx74o7+lL4/kE855vyeHc3Lag6Uc86ZMOjrR8AW2daeLFGhFqxGby
mDPx4wnOBx6kHuZyAFVZk3SXOUdK6jGWIwLnkJQXCNFry8v0pFpt+e+tPrwMYsnp
tBhH9sQeHp9F0U+RPsG18glgpN/xCv9/CEtoOteVZfSlRlv7ktpbYuEITd4y9p0O
x62NTi7oJEqGugsVPV3Y8gsrCW5oQIAA/YaoQFEP5hOqacsS/OV5bbdCvn7qD4Qw
mtF0gNNzvXLy9fIGw/P2NY2fwHHUSVZ21cSBKNxMn2KD9eMrXSnw+WdWeLTfn4v6
Lv7tdOSSd1JCSNUbfLMBXGRKm5E4nmn9idi69R0zyJ61P7fvpqxRyqtxqIpsHups
WKo867u3URagNmYuh6ld
=PZYC
-----END PGP SIGNATURE-----
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Reply via email to