On 21.02.19 г. 16:32 ч., Qu Wenruo wrote:
>
>
> On 2019/2/21 下午10:25, Nikolay Borisov wrote:
>>
>>
>> On 21.02.19 г. 10:22 ч., Qu Wenruo wrote:
>>> There are a lot of error reports complaining about transid error in the
>>> mail list.
>>>
>>> Under most case, the on-disk transid is lower than expected transid.
>>> This may indicate that some tree blocks are not written back to disk
>>> before writing super blocks.
>>>
>>> This patch will add a safe net for developers, by calling
>>> btrfs_write_and_wait_transaction() before setting transaction unblocked
>>> and double check btree_inode and dirty_pages io_tree, to ensure no tree
>>> blocks are still dirty or under writeback.
>>>
>>> Signed-off-by: Qu Wenruo <w...@suse.com>
>>> ---
>>> The reason for RFC is, I'm not sure why we currently call
>>> btrfs_write_and_wait_transaction() after setting transaction UNBLOCKED.
>>>
>>> It looks like an optimization, but I don't see much performance
>>> difference during regression test.
>>>
>>> I hope to move the call before we unblock transaction so we can do such
>>> sanity check for all builds and hope to catch some clue of transid
>>> error.
>>
>> Even current code ensures that all allocated blocks in the current
>> transaction (which is what all those EXTENT_DIRTY extents in the
>> dirty_pages tree ) are written before the new superblocks are.
>>
>> Slight offtopic: In fact instead of playing games with the flags and
>> having an extent_io_tree called dirty_pages o_O it can be replaced with
>> a simple linked list that holds all newly allocated buffers so writing
>> all such buffers will result in simply iterating the list.
>>
>> In any case this patch is buggy, see below on why
>>
>>> ---
>>> fs/btrfs/transaction.c | 38 ++++++++++++++++++++++++++++++++++++++
>>> 1 file changed, 38 insertions(+)
>>>
>>> diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c
>>> index 4ec2b660d014..30b7ed0bf873 100644
>>> --- a/fs/btrfs/transaction.c
>>> +++ b/fs/btrfs/transaction.c
>>> @@ -2213,6 +2213,44 @@ int btrfs_commit_transaction(struct
>>> btrfs_trans_handle *trans)
>>>
>>> btrfs_trans_release_chunk_metadata(trans);
>>>
>>> + /* Last safenet or developer to catch any unwritten tree blocks */
>>> + if (IS_ENABLED(CONFIG_BTRFS_DEBUG)) {
>>> + u64 found_start = 0;
>>> + u64 found_end = 0;
>>> +
>>> + ret = btrfs_write_and_wait_transaction(trans);
>>> + if (ret) {
>>> + btrfs_handle_fs_error(fs_info, ret,
>>> + "Error while writing out
>>> transaction");
>>> + mutex_unlock(&fs_info->tree_log_mutex);
>>> + goto scrub_continue;
>>> + }
>>> +
>>> + /* No dirty extent should exist in btree inode */
>>> + ret = test_range_bit(&trans->transaction->dirty_pages, 0,
>>> + (u64)-1, EXTENT_DIRTY | EXTENT_WRITEBACK,
>>
>> Why do you check EXTENT_WRITEBACK, AFAICS that flag is not currently
>> used in the code and should perhahps be deleted? I don't see anything
>> setting it, it's only being checked for (as part of EXTENT_IOBITS).
>>
>> Additionally this check is pointless because
>> btrfs_write_and_wait_transaction calls clear_btree_io_tree which purges
>> the tree.
>
> But we still have BTRFS_I(fs_info->btree_inode)->io_tree, and that's the
> main part of the check.
Okay, but the ->dirty_pages code is pointless. Also checking for
EXTENT_WRITEBACK on the io_tree is also pointless.
>
> I don't really think the dirty_pages is really an important thing
> compared to btree_inode.
> If there is some way to find any dirty pages from an address_space, it
> would be even better.
>
> Thanks,
> Qu
>
>>
>>> + 0, NULL);
>>> + if (ret > 0) {
>>> + WARN(1,
>>> + "dirty_pages not fully written back, start=%llu len=%llu\n",
>>> + found_start, found_end + 1 - found_start);
>>> + ret = -EUCLEAN;
>>> + mutex_unlock(&fs_info->tree_log_mutex);
>>> + goto scrub_continue;
>>> + }
>>> + ret = test_range_bit(&BTRFS_I(fs_info->btree_inode)->io_tree, 0,
>>> + (u64)-1, EXTENT_DIRTY | EXTENT_WRITEBACK,
>>> + 0, NULL);
>>> + if (ret > 0) {
>>> + WARN(1,
>>> + "btree io_tree not fully written back, start=%llu len=%llu\n",
>>> + found_start, found_end + 1 - found_start);
>>> + ret = -EUCLEAN;
>>> + mutex_unlock(&fs_info->tree_log_mutex);
>>> + goto scrub_continue;
>>> + }
>>> + }
>>> +
>>> spin_lock(&fs_info->trans_lock);
>>> cur_trans->state = TRANS_STATE_UNBLOCKED;
>>> fs_info->running_transaction = NULL;
>>>
>