On Thu, Jan 31, 2019 at 02:34:54PM +0800, Anand Jain wrote:
>
>
> On 1/30/19 10:07 PM, David Sterba wrote:
> > On Wed, Jan 30, 2019 at 02:45:00PM +0800, Anand Jain wrote:
> >> v3->v4: Fix list corruption as reported by btrfs/073 by David.
> >> [1]
> >> https://patchwork.kernel.org/patch/10705741/
> >> Which I was able to reproduce with an instrumented kernel but not with
> >> btrfs/073.
> >> In v3 patch, it releases the fs_info::scrub_lock to destroy the work
> >> queue, which raced with new scrub requests, overwriting the scrub
> >> workers pointers. So in v4, it kills the function scrub_workers_put()
> >> and performs the destroy_workqueue in two stages, with worker pointers
> >> copied locally.
> >
> >> @@ -3932,9 +3925,16 @@ int btrfs_scrub_dev(struct btrfs_fs_info *fs_info, u64 devid, u64 start,
> >>
> >>  	mutex_lock(&fs_info->scrub_lock);
> >>  	dev->scrub_ctx = NULL;
> >> -	scrub_workers_put(fs_info);
> >> +	if (--fs_info->scrub_workers_refcnt == 0) {
> >> +		scrub_workers = fs_info->scrub_workers;
> >> +		scrub_wr_comp = fs_info->scrub_wr_completion_workers;
> >> +		scrub_parity = fs_info->scrub_parity_workers;
> >> +	}
> >>  	mutex_unlock(&fs_info->scrub_lock);
> >>
> >> +	btrfs_destroy_workqueue(scrub_workers);
> >> +	btrfs_destroy_workqueue(scrub_wr_comp);
> >> +	btrfs_destroy_workqueue(scrub_parity);
> >
> > https://lore.kernel.org/linux-btrfs/1543554924-17397-2-git-send-email-anand.j...@oracle.com/
> >
> > Comparing to the previous version, it's almost the same I think. If
> > scrub_workers_get races between the unlock and destroy_workers, anything
> > that uses fs_info->scrub_workers will soon use freed memory.
> >
> > The difference is that the worker pointers are read from fs_info under a
> > lock but are still used outside. I haven't tested this version but from
> > the analysis of the previous crash, I don't see how v4 is supposed to be
> > better.
>
>
>  Consider v3 code as below:
>
>  When process-A is at [1] (below), start another btrfs scrub start; let's
>  call it process-B. When process-A is at [1] it has unlocked the
>  fs_info::scrub_lock, so process-B can overwrite fs_info::scrub_workers,
>  fs_info::scrub_wr_completion_workers and fs_info::scrub_parity_workers,
>  which process-A at [1] has not yet destroyed.
>
>  Process-A
>  ---------
>
>  btrfs scrub start /mnt
>
>  ::
>  mutex_lock(&fs_info->scrub_lock);
>  ::
>  if (dev->scrub_ctx ||
>      (!is_dev_replace &&
>       btrfs_dev_replace_is_ongoing(&fs_info->dev_replace))) {
>  	up_read(&fs_info->dev_replace.rwsem);
>  	mutex_unlock(&fs_info->scrub_lock);
>  	mutex_unlock(&fs_info->fs_devices->device_list_mutex);
>  	ret = -EINPROGRESS;
>  	goto out_free_ctx;
>  }
>  ::
>  ret = scrub_workers_get(fs_info, is_dev_replace);	<-- [2]
>  ::
>  dev->scrub_ctx = sctx;
>  mutex_unlock(&fs_info->scrub_lock);
>
>  ::
>  ret = scrub_enumerate_chunks(sctx, dev, start, end);
>  ::
>  atomic_dec(&fs_info->scrubs_running);
>  ::
>
>  mutex_lock(&fs_info->scrub_lock);
>  dev->scrub_ctx = NULL;
>  scrub_workers_put(fs_info);
>  mutex_unlock(&fs_info->scrub_lock);
>
>
>  static noinline_for_stack void scrub_workers_put(struct btrfs_fs_info *fs_info)
>  {
>  	lockdep_assert_held(&fs_info->scrub_lock);
>  	if (--fs_info->scrub_workers_refcnt == 0) {
>  		mutex_unlock(&fs_info->scrub_lock);
>
>  		<wait for process-B>				[1]
>
>  		btrfs_destroy_workqueue(fs_info->scrub_workers);
>  		btrfs_destroy_workqueue(fs_info->scrub_wr_completion_workers);
>  		btrfs_destroy_workqueue(fs_info->scrub_parity_workers);
>  		mutex_lock(&fs_info->scrub_lock);
>  	}
>  	WARN_ON(fs_info->scrub_workers_refcnt < 0);
>  }
>
>
>  Process-B
>  ---------
>  Start when process-A is at [1] (above)
>
>  btrfs scrub start /mnt
>
>  ::
>  at [2] (above) the fs_info::scrub_workers,
>  fs_info::scrub_wr_completion_workers and fs_info::scrub_parity_workers
>  of process-A are overwritten.
>
>
>  So in v4:
>  --------
>
>  Similar to dev::scrub_ctx, the fs_info::scrub_workers,
>  fs_info::scrub_wr_completion_workers and fs_info::scrub_parity_workers
>  pointers are copied locally before fs_info::scrub_lock is released, so
>  the list pointers aren't corrupted.
>
>  Hope this clarifies.
Yes, thanks. I'd like to add some assertions, to make the worker
pointers always NULL if the refcount is 0. Ie. initially they're 0 due
to kzalloc of fs_info, so the additional part would go to the block.

--- a/fs/btrfs/scrub.c
+++ b/fs/btrfs/scrub.c
@@ -3744,17 +3744,20 @@ static noinline_for_stack int scrub_workers_get(struct btrfs_fs_info *fs_info,
 
 	lockdep_assert_held(&fs_info->scrub_lock);
 	if (refcount_read(&fs_info->scrub_workers_refcnt) == 0) {
+		ASSERT(fs_info->scrub_workers == NULL);
 		fs_info->scrub_workers = btrfs_alloc_workqueue(fs_info, "scrub",
 				flags, is_dev_replace ? 1 : max_active, 4);
 		if (!fs_info->scrub_workers)
 			goto fail_scrub_workers;
 
+		ASSERT(fs_info->scrub_wr_completion_workers == NULL);
 		fs_info->scrub_wr_completion_workers =
 			btrfs_alloc_workqueue(fs_info, "scrubwrc", flags,
 					      max_active, 2);
 		if (!fs_info->scrub_wr_completion_workers)
 			goto fail_scrub_wr_completion_workers;
 
+		ASSERT(fs_info->scrub_parity_workers == NULL);
 		fs_info->scrub_parity_workers =
 			btrfs_alloc_workqueue(fs_info, "scrubparity", flags,
 					      max_active, 2);
@@ -3934,6 +3937,10 @@ int btrfs_scrub_dev(struct btrfs_fs_info *fs_info, u64 devid, u64 start,
 		scrub_workers = fs_info->scrub_workers;
 		scrub_wr_comp = fs_info->scrub_wr_completion_workers;
 		scrub_parity = fs_info->scrub_parity_workers;
+
+		fs_info->scrub_workers = NULL;
+		fs_info->scrub_wr_completion_workers = NULL;
+		fs_info->scrub_parity_workers = NULL;
 	}
 	mutex_unlock(&fs_info->scrub_lock);