On 1/30/19 10:07 PM, David Sterba wrote:
> On Wed, Jan 30, 2019 at 02:45:00PM +0800, Anand Jain wrote:
>> v3->v4: Fix the list corruption reported by David via btrfs/073.
>> [1]
>> https://patchwork.kernel.org/patch/10705741/
>> Which I was able to reproduce with an instrumented kernel, but not
>> with btrfs/073.
>> The v3 patch released fs_info::scrub_lock to destroy the workqueues,
>> which raced with new scrub requests that overwrote the scrub worker
>> pointers. So v4 kills the function scrub_workers_put() and performs
>> the destroy_workqueue in two stages, with the worker pointers copied
>> locally.
>> @@ -3932,9 +3925,16 @@ int btrfs_scrub_dev(struct btrfs_fs_info *fs_info, u64 devid, u64 start,
>>  	mutex_lock(&fs_info->scrub_lock);
>>  	dev->scrub_ctx = NULL;
>> -	scrub_workers_put(fs_info);
>> +	if (--fs_info->scrub_workers_refcnt == 0) {
>> +		scrub_workers = fs_info->scrub_workers;
>> +		scrub_wr_comp = fs_info->scrub_wr_completion_workers;
>> +		scrub_parity = fs_info->scrub_parity_workers;
>> +	}
>>  	mutex_unlock(&fs_info->scrub_lock);
>> +	btrfs_destroy_workqueue(scrub_workers);
>> +	btrfs_destroy_workqueue(scrub_wr_comp);
>> +	btrfs_destroy_workqueue(scrub_parity);
> https://lore.kernel.org/linux-btrfs/1543554924-17397-2-git-send-email-anand.j...@oracle.com/
>
> Compared to the previous version, it's almost the same, I think. If
> scrub_workers_get races between the unlock and the destroy_workqueue
> calls, anything that uses fs_info->scrub_workers will soon use freed
> memory.
>
> The difference is that the worker pointers are read from fs_info under
> a lock but are still used outside. I haven't tested this version, but
> from the analysis of the previous crash I don't see how v4 is supposed
> to be better.
Consider the v3 code below.

While process-A waits at [1], start another 'btrfs scrub start'; call it
process-B. At [1], process-A has dropped fs_info::scrub_lock, so
process-B can overwrite fs_info::scrub_workers,
fs_info::scrub_wr_completion_workers and fs_info::scrub_parity_workers,
which process-A has not yet destroyed.
Process-A
---------

btrfs scrub start /mnt
::
	mutex_lock(&fs_info->scrub_lock);
	::
	if (dev->scrub_ctx ||
	    (!is_dev_replace &&
	     btrfs_dev_replace_is_ongoing(&fs_info->dev_replace))) {
		up_read(&fs_info->dev_replace.rwsem);
		mutex_unlock(&fs_info->scrub_lock);
		mutex_unlock(&fs_info->fs_devices->device_list_mutex);
		ret = -EINPROGRESS;
		goto out_free_ctx;
	}
	::
	ret = scrub_workers_get(fs_info, is_dev_replace);	<-- [2]
	::
	dev->scrub_ctx = sctx;
	mutex_unlock(&fs_info->scrub_lock);
	::
	ret = scrub_enumerate_chunks(sctx, dev, start, end);
	::
	atomic_dec(&fs_info->scrubs_running);
	::
	mutex_lock(&fs_info->scrub_lock);
	dev->scrub_ctx = NULL;
	scrub_workers_put(fs_info);
	mutex_unlock(&fs_info->scrub_lock);
static noinline_for_stack void scrub_workers_put(struct btrfs_fs_info *fs_info)
{
	lockdep_assert_held(&fs_info->scrub_lock);
	if (--fs_info->scrub_workers_refcnt == 0) {
		mutex_unlock(&fs_info->scrub_lock);

		<wait for process-B>			<-- [1]

		btrfs_destroy_workqueue(fs_info->scrub_workers);
		btrfs_destroy_workqueue(fs_info->scrub_wr_completion_workers);
		btrfs_destroy_workqueue(fs_info->scrub_parity_workers);
		mutex_lock(&fs_info->scrub_lock);
	}
	WARN_ON(fs_info->scrub_workers_refcnt < 0);
}
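
For reference, the allocation side that runs at [2] looks roughly as
below; this is condensed from fs/btrfs/scrub.c of this era, with the
error paths dropped, so treat it as a sketch rather than the exact
source:

static noinline_for_stack int scrub_workers_get(struct btrfs_fs_info *fs_info,
						int is_dev_replace)
{
	unsigned int flags = WQ_FREEZABLE | WQ_UNBOUND;
	int max_active = fs_info->thread_pool_size;

	if (fs_info->scrub_workers_refcnt == 0) {
		/* refcnt == 0: unconditionally install fresh workqueues */
		fs_info->scrub_workers = btrfs_alloc_workqueue(fs_info,
				"scrub", flags,
				is_dev_replace ? 1 : max_active, 4);
		fs_info->scrub_wr_completion_workers =
			btrfs_alloc_workqueue(fs_info, "scrubwrc", flags,
					      max_active, 2);
		fs_info->scrub_parity_workers =
			btrfs_alloc_workqueue(fs_info, "scrubparity", flags,
					      max_active, 2);
		/* allocation-failure cleanup omitted in this sketch */
	}
	++fs_info->scrub_workers_refcnt;

	return 0;
}

Note that the refcnt == 0 branch assigns to the fs_info pointers
unconditionally; it has no way of knowing a previous instance is still
about to destroy them.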
Process-B
---------

Starts while process-A is at [1] (above):

btrfs scrub start /mnt
::
	at [2] (above), the fs_info::scrub_workers,
	fs_info::scrub_wr_completion_workers and
	fs_info::scrub_parity_workers of process-A are overwritten.
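
Putting the two sides together, the v3 interleaving is (my summary of
the above, not a trace):

Process-A                               Process-B
---------                               ---------
scrub_workers_put()
  --fs_info->scrub_workers_refcnt == 0
  mutex_unlock(&fs_info->scrub_lock)
                                        mutex_lock(&fs_info->scrub_lock)
                                        scrub_workers_get()
                                          refcnt == 0, installs fresh
                                          workqueues into fs_info
                                        mutex_unlock(&fs_info->scrub_lock)
btrfs_destroy_workqueue() x 3, reading
the fs_info pointers process-B just
installed

So process-A frees the workqueues process-B is actively using (hence the
list corruption), while the queues process-A itself created leak.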
So in v4
--------

Similar to dev::scrub_ctx, the fs_info::scrub_workers,
fs_info::scrub_wr_completion_workers and fs_info::scrub_parity_workers
pointers are copied to local variables before fs_info::scrub_lock is
released, so process-A only ever destroys the workqueues it detached and
the list pointers aren't corrupted.
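
For completeness, the v4 teardown then reads as below. The local
declarations are not visible in the hunk quoted above, so this part is
my reading of the patch: the three locals start out NULL, and since
btrfs_destroy_workqueue() returns early on a NULL argument, nothing is
destroyed unless this caller dropped the last reference.

	struct btrfs_workqueue *scrub_workers = NULL;
	struct btrfs_workqueue *scrub_wr_comp = NULL;
	struct btrfs_workqueue *scrub_parity = NULL;
	::
	mutex_lock(&fs_info->scrub_lock);
	dev->scrub_ctx = NULL;
	/* stage 1: under scrub_lock, detach the pointers if we are last */
	if (--fs_info->scrub_workers_refcnt == 0) {
		scrub_workers = fs_info->scrub_workers;
		scrub_wr_comp = fs_info->scrub_wr_completion_workers;
		scrub_parity = fs_info->scrub_parity_workers;
	}
	mutex_unlock(&fs_info->scrub_lock);

	/* stage 2: outside the lock, destroy only the detached copies */
	btrfs_destroy_workqueue(scrub_workers);
	btrfs_destroy_workqueue(scrub_wr_comp);
	btrfs_destroy_workqueue(scrub_parity);

A racing scrub_workers_get() that finds scrub_workers_refcnt == 0
allocates and installs fresh workqueues rather than reusing the detached
ones, so the two scrub instances never destroy each other's queues.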
Hope this clarifies.
Thanks, Anand