On Tue, Jan 12, 2021 at 11:17:45AM +0200, Nikolay Borisov wrote:
>
>
> On 11.01.21 г. 23:50 ч., David Sterba wrote:
> > On Mon, Jan 11, 2021 at 10:33:42AM +0200, Nikolay Borisov wrote:
> >> On 8.01.21 г. 18:01 ч., David Sterba wrote:
> >>> On Fri, Dec 18, 2020 at 02:24:20PM -0500, Josef Bacik wrote:
> >>>> @@ -2043,23 +2043,22 @@ int btrfs_commit_transaction(struct
> >>>> btrfs_trans_handle *trans)
> >>>> btrfs_trans_release_metadata(trans);
> >>>> trans->block_rsv = NULL;
> >>>>
> >>>> - /* make a pass through all the delayed refs we have so far
> >>>> - * any runnings procs may add more while we are here
> >>>> - */
> >>>> - ret = btrfs_run_delayed_refs(trans, 0);
> >>>> - if (ret) {
> >>>> - btrfs_end_transaction(trans);
> >>>> - return ret;
> >>>> - }
> >>>> -
> >>>> - cur_trans = trans->transaction;
> >>>> -
> >>>> /*
> >>>> - * set the flushing flag so procs in this transaction have to
> >>>> - * start sending their work down.
> >>>> + * We only want one transaction commit doing the flushing so we
> >>>> do not
> >>>> + * waste a bunch of time on lock contention on the extent root
> >>>> node.
> >>>> */
> >>>> - cur_trans->delayed_refs.flushing = 1;
> >>>> - smp_wmb();
> >>>
> >>> This barrier obviously separates the flushing = 1 and the rest of the
> >>> code, now implemented as test_and_set_bit, which implies full barrier.
> >>>
> >>> However, hunk in btrfs_should_end_transaction removes the barrier and
> >>> I'm not sure whether this is correct:
> >>>
> >>> - smp_mb();
> >>> if (cur_trans->state >= TRANS_STATE_COMMIT_START ||
> >>> - cur_trans->delayed_refs.flushing)
> >>> + test_bit(BTRFS_DELAYED_REFS_FLUSHING,
> >>> + &cur_trans->delayed_refs.flags))
> >>> return true;
> >>>
> >>> This is never called under locks so we don't have complete
> >>> synchronization of neither the transaction state nor the flushing bit.
> >>> btrfs_should_end_transaction is merely a hint and not called in critical
> >>> places so we could probably afford to keep it without a barrier, or keep
> >>> it with comment(s).
> >>
> >> I think the point is moot in this case, because the test_bit either sees
> >> the flag or it doesn't. It's not possible for the flag to be set AND
> >> should_end_transaction return false that would be gross violation of
> >> program correctness.
> >
> > So that's for the flushing part, but what about cur_trans->state?
>
> Looking at the code, the barrier was there to order the publishing of
> the delayed_ref.flushing (now replaced by the bit flag) against
> surrounding code.
>
> So independently of this patch, let's reason about trans state. In
> should_end_transaction it's read without holding any locks. (U)
>
> It's modified in btrfs_cleanup_transaction without holding the
> fs_info->trans_lock (U), but the STATE_ERROR flag is going to be set.
>
> set in cleanup_transaction under fs_info->trans_lock (L)
> set in btrfs_commit_trans to COMMIT_START under fs_info->trans_lock.(L)
> set in btrfs_commit_trans to COMMIT_DOING under fs_info->trans_lock.(L)
> set in btrfs_commit_trans to COMMIT_UNBLOCK under fs_info->trans_lock.(L)
>
> set in btrfs_commit_trans to COMMIT_COMPLETED without locks but at this
> point the transaction is finished and fs_info->running_trans is NULL (U
> but irrelevant).
>
> So by the looks of it we can have a concurrent READ race with a Write,
> due to reads not taking a lock. In this case what we want to ensure is
> we either see new or old state. I consulted with Will Deacon and he said
> that in such a case we'd want to annotate the accesses to ->state with
> (READ|WRITE)_ONCE so as to avoid a theoretical tear, in this case I
> don't think this could happen but I imagine at some point kcsan would
> flag such an access as racy (which it is).
Thanks for the analysis, I've copied it to the changelog as there's
probably no shorter way to explain it.