On Wed, Aug 07, 2019 at 10:17:26AM +0300, Nikolay Borisov wrote:
> 
> 
> On 6.08.19 г. 20:34 ч., Omar Sandoval wrote:
> > From: Omar Sandoval <osan...@fb.com>
> > 
> > We hit the following very strange deadlock on a system with Btrfs on a
> > loop device backed by another Btrfs filesystem:
> > 
> > 1. The top (loop device) filesystem queues an async_cow work item from
> >    cow_file_range_async(). We'll call this work X.
> > 2. Worker thread A starts work X (normal_work_helper()).
> > 3. Worker thread A executes the ordered work for the top filesystem
> >    (run_ordered_work()).
> > 4. Worker thread A finishes the ordered work for work X and frees X
> >    (work->ordered_free()).
> > 5. Worker thread A executes another ordered work and gets blocked on I/O
> >    to the bottom filesystem (still in run_ordered_work()).
> > 6. Meanwhile, the bottom filesystem allocates and queues an async_cow
> >    work item which happens to be the recently-freed X.
> > 7. The workqueue code sees that X is already being executed by worker
> >    thread A, so it schedules X to be executed _after_ worker thread A
> >    finishes (see the find_worker_executing_work() call in
> >    process_one_work()).

Isn't the bigger problem that a single run_ordered_work() could
potentially run the ordered work for more than one normal work? E.g.
what if btrfs' code were reworked such that run_ordered_work() executed
ordered_func for just one work item (the one which called the function
in the first place)? Wouldn't that also resolve the issue? Correct me
if I'm wrong, but it seems silly to have one work item outlive
ordered_free, which is what currently happens, right?
We can't always run the ordered work for the normal work because then it
wouldn't be ordered :) If work item N completes before item N-1, then we
can't run the ordered work for N yet. Then, when N-1 completes, we need
to do the ordered work for N-1 _and_ N, which is how we get into this
situation.