On Thu, Oct 03, 2019 at 08:05:42AM -0600, Jens Axboe wrote:
> On 10/3/19 8:01 AM, Chris Mason wrote:
> >
> >
> > On 3 Oct 2019, at 4:41, Gao Xiang wrote:
> >
> >> Hi,
> >>
> >> On Thu, Oct 03, 2019 at 04:40:22PM +1000, Dave Chinner wrote:
> >>> [cc linux-fsdevel, linux-block, tejun ]
> >>>
> >>> On Wed, Oct 02, 2019 at 06:52:47PM -0700, Darrick J. Wong wrote:
> >>>> Hi everyone,
> >>>>
> >>>> Does anyone /else/ see this crash in generic/299 on a V4 filesystem (tho
> >>>> afaict V5 configs crash too) and a 5.4-rc1 kernel? It seems to pop up
> >>>> on generic/299 though only 80% of the time.
> >>>>
> >>
> >> Just from a quick glance, I guess there could be a race here (complete
> >> guess):
> >>
> >>
> >> static void finish_writeback_work(struct bdi_writeback *wb,
> >>                                   struct wb_writeback_work *work)
> >> {
> >>         struct wb_completion *done = work->done;
> >>
> >>         if (work->auto_free)
> >>                 kfree(work);
> >>         if (done && atomic_dec_and_test(&done->cnt))
> >>
> >>                     ^^^ here
> >>
> >>                 wake_up_all(done->waitq);
> >> }
> >>
> >> since `done` in the new wake_up_all(done->waitq); is completely on-stack:
> >>
> >>         if (done && atomic_dec_and_test(&done->cnt))
> >> -               wake_up_all(&wb->bdi->wb_waitq);
> >> +               wake_up_all(done->waitq);
> >>  }
> >>
> >> which could cause a use-after-free if the on-stack wb_completion is
> >> already gone... (the previous wb->bdi was solid since it is not on-stack)
> >>
> >> see the generic on-stack completion, which takes the wait_queue
> >> spin_lock between test and wake_up...
> >>
> >> If I am wrong, ignore me, hmm...
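
To make the "lock between test and wake_up" idea above concrete, here is a
minimal userspace sketch, not kernel code and not the queued fix. It assumes
pthreads as a stand-in for the wait queue, and every demo_* name is
hypothetical. The point it tries to show: when the decrement and the wakeup
both happen under the lock the waiter also takes, the waiter cannot see
cnt == 0 and leave its stack frame while a worker is still about to
dereference the completion.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical userspace stand-in for an on-stack completion. */
struct demo_completion {
	pthread_mutex_t lock;	/* plays the role of the wait queue spinlock */
	pthread_cond_t wait;	/* plays the role of the wait queue itself */
	int cnt;		/* outstanding work items */
};

/*
 * Worker side: decrement and wake under the lock. The waiter needs the
 * same lock to observe cnt == 0, so it cannot return before the wakeup
 * has been issued.
 */
static void demo_complete(struct demo_completion *c)
{
	pthread_mutex_lock(&c->lock);
	if (--c->cnt == 0)
		pthread_cond_broadcast(&c->wait);
	pthread_mutex_unlock(&c->lock);
}

/* Waiter side: the test of cnt happens under the same lock. */
static void demo_wait(struct demo_completion *c)
{
	pthread_mutex_lock(&c->lock);
	while (c->cnt > 0)
		pthread_cond_wait(&c->wait, &c->lock);
	assert(c->cnt == 0);
	pthread_mutex_unlock(&c->lock);
}

static void *demo_worker(void *arg)
{
	demo_complete(arg);
	return NULL;
}

/* Returns the leftover count (0 on success), or -1 on a bad argument. */
int demo_run(int nr_workers)
{
	struct demo_completion done;	/* lives on this stack frame */
	pthread_t tids[16];
	int created[16];
	int i, left;

	if (nr_workers < 1 || nr_workers > 16)
		return -1;

	pthread_mutex_init(&done.lock, NULL);
	pthread_cond_init(&done.wait, NULL);
	done.cnt = nr_workers;

	for (i = 0; i < nr_workers; i++) {
		created[i] = pthread_create(&tids[i], NULL, demo_worker, &done) == 0;
		if (!created[i])
			demo_complete(&done);	/* do the work inline instead */
	}

	demo_wait(&done);
	left = done.cnt;

	/* Joining here only reclaims thread resources. */
	for (i = 0; i < nr_workers; i++)
		if (created[i])
			pthread_join(tids[i], NULL);

	pthread_cond_destroy(&done.wait);
	pthread_mutex_destroy(&done.lock);
	return left;
}
```

Build with -pthread and call demo_run(4) from main(). The racy shape quoted
above would correspond to dropping the lock in demo_complete() and reading
the completion again after the decrement.
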
> >
> > It's a good guess ;) Jens should have this queued up already:
> >
> > https://lkml.org/lkml/2019/9/23/972
>
> Yes indeed, it'll go out today or tomorrow for -rc2.

The patch fixes the problems I've been seeing, so:

Tested-by: Darrick J. Wong <[email protected]>

Thank you for taking care of this. :)

--D

> --
> Jens Axboe
>