Re: [PATCH v5 2/8] btrfs: only let one thread pre-flush delayed refs in commit

David Sterba Wed, 27 Jan 2021 02:15:32 -0800

On Tue, Jan 12, 2021 at 11:17:45AM +0200, Nikolay Borisov wrote:
> 
> 
> On 11.01.21 г. 23:50 ч., David Sterba wrote:
> > On Mon, Jan 11, 2021 at 10:33:42AM +0200, Nikolay Borisov wrote:
> >> On 8.01.21 г. 18:01 ч., David Sterba wrote:
> >>> On Fri, Dec 18, 2020 at 02:24:20PM -0500, Josef Bacik wrote:
> >>>> @@ -2043,23 +2043,22 @@ int btrfs_commit_transaction(struct 
> >>>> btrfs_trans_handle *trans)
> >>>>          btrfs_trans_release_metadata(trans);
> >>>>          trans->block_rsv = NULL;
> >>>>  
> >>>> -        /* make a pass through all the delayed refs we have so far
> >>>> -         * any runnings procs may add more while we are here
> >>>> -         */
> >>>> -        ret = btrfs_run_delayed_refs(trans, 0);
> >>>> -        if (ret) {
> >>>> -                btrfs_end_transaction(trans);
> >>>> -                return ret;
> >>>> -        }
> >>>> -
> >>>> -        cur_trans = trans->transaction;
> >>>> -
> >>>>          /*
> >>>> -         * set the flushing flag so procs in this transaction have to
> >>>> -         * start sending their work down.
> >>>> +         * We only want one transaction commit doing the flushing so we 
> >>>> do not
> >>>> +         * waste a bunch of time on lock contention on the extent root 
> >>>> node.
> >>>>           */
> >>>> -        cur_trans->delayed_refs.flushing = 1;
> >>>> -        smp_wmb();
> >>>
> >>> This barrier obviously separates the flushing = 1 and the rest of the
> >>> code, now implemented as test_and_set_bit, which implies full barrier.
> >>>
> >>> However, hunk in btrfs_should_end_transaction removes the barrier and
> >>> I'm not sure whether this is correct:
> >>>
> >>> - smp_mb();
> >>>   if (cur_trans->state >= TRANS_STATE_COMMIT_START ||
> >>> -     cur_trans->delayed_refs.flushing)
> >>> +     test_bit(BTRFS_DELAYED_REFS_FLUSHING,
> >>> +              &cur_trans->delayed_refs.flags))
> >>>           return true;
> >>>
> >>> This is never called under locks so we don't have complete
> >>> synchronization of neither the transaction state nor the flushing bit.
> >>> btrfs_should_end_transaction is merely a hint and not called in critical
> >>> places so we could probably afford to keep it without a barrier, or keep
> >>> it with comment(s).
> >>
> >> I think the point is moot in this case, because the test_bit either sees
> >> the flag or it doesn't. It's not possible for the flag to be set AND
> >> should_end_transaction return false that would be gross violation of
> >> program correctness.
> > 
> > So that's for the flushing part, but what about cur_trans->state?
> 
> Looking at the code, the barrier was there to order the publishing of
> the delayed_ref.flushing (now replaced by the bit flag) against
> surrounding code.
> 
> So independently of this patch, let's reason about trans state. In
> should_end_transaction it's read without holding any locks. (U)
> 
> It's modified in btrfs_cleanup_transaction without holding the
> fs_info->trans_lock (U), but the STATE_ERROR flag is going to be set.
> 
> set in cleanup_transaction under fs_info->trans_lock (L)
> set in btrfs_commit_trans to COMMIT_START under fs_info->trans_lock.(L)
> set in btrfs_commit_trans to COMMIT_DOING under fs_info->trans_lock.(L)
> set in btrfs_commit_trans to COMMIT_UNBLOCK under fs_info->trans_lock.(L)
> 
> set in btrfs_commit_trans to COMMIT_COMPLETED without locks but at this
> point the transaction is finished and fs_info->running_trans is NULL (U
> but irrelevant).
> 
> So by the looks of it we can have a concurrent READ race with a Write,
> due to reads not taking a lock. In this case what we want to ensure is
> we either see new or old state. I consulted with Will Deacon and he said
> that in such a case we'd want to annotate the accesses to ->state with
> (READ|WRITE)_ONCE so as to avoid a theoretical tear, in this case I
> don't think this could happen but I imagine at some point kcsan would
> flag such an access as racy (which it is).


Thanks for the analysis, I've copied it to the changelog as there's
probably no shorter way to explain it.

Re: [PATCH v5 2/8] btrfs: only let one thread pre-flush delayed refs in commit

Reply via email to