On Tue, Sep 24, 2019 at 3:28 PM Josef Bacik <jo...@toxicpanda.com> wrote:
>
> On Tue, Sep 24, 2019 at 03:23:06PM +0100, Filipe Manana wrote:
> > On Tue, Sep 24, 2019 at 3:19 PM Josef Bacik <jo...@toxicpanda.com> wrote:
> > >
> > > On Tue, Sep 24, 2019 at 03:16:56PM +0100, Filipe Manana wrote:
> > > > > On Tue, Sep 24, 2019 at 2:21 PM Josef Bacik <jo...@toxicpanda.com> wrote:
> > > > >
> > > > > On Tue, Sep 24, 2019 at 07:07:41AM -0400, James Harvey wrote:
> > > > > > > On Tue, Sep 24, 2019 at 5:58 AM Filipe Manana <fdman...@gmail.com> wrote:
> > > > > > >
> > > > > > > > On Sun, Sep 15, 2019 at 2:55 PM Filipe Manana <fdman...@gmail.com> wrote:
> > > > > > > >
> > > > > > > > > On Sun, Sep 15, 2019 at 1:46 PM James Harvey <jamespharve...@gmail.com> wrote:
> > > > > > > > > ...
> > > > > > > > > > You'll see they're different looking backtraces than without the
> > > > > > > > > > patch, so I don't actually know if it's related to the original
> > > > > > > > > > regression that several others reported or not.
> > > > > > > >
> > > > > > > > It's a different problem.
> > > > > > >
> > > > > > > > So the good news is that on upcoming 5.4 the problem can't happen, due
> > > > > > > > to a large patch series from Josef regarding space reservation
> > > > > > > > handling which, as a side effect, solves that problem and doesn't
> > > > > > > > introduce new ones with concurrent fsyncs.
> > > > > > >
> > > > > > > > However, that's a large patch set which depends on a lot of previous
> > > > > > > > cleanups, some of which landed in the 5.3 merge window.
> > > > > > > > Backporting all those patches is against the backport policies for
> > > > > > > > stable releases [1], since many of the dependencies are cleanup patches
> > > > > > > > and many are large (well over the 100-line limit).
> > > > > > >
> > > > > > > > On the other hand, it's not possible to send a fix for stable releases that
> > > > > > > > doesn't land on Linus' tree first, as there's nothing to fix in the
> > > > > > > > current merge window (5.4) since that deadlock can't happen there.
> > > > > > >
> > > > > > > So it seems like a dead end to me.
> > > > > > >
> > > > > > > > Fortunately, as you told me privately, you only hit this once and it's
> > > > > > > > not a frequent issue for you (unlike the 5.2 regression, which
> > > > > > > > caused you the hang very often). You can work around it by mounting the
> > > > > > > > fs with "-o notreelog", which makes fsyncs more expensive,
> > > > > > > > so you'll likely see some performance degradation for your
> > > > > > > > applications (higher latency, less throughput).
> > > > > > >
> > > > > > > [1] 
> > > > > > > https://www.kernel.org/doc/html/v4.15/process/stable-kernel-rules.html
> > > > > >
> > > > > >
> > > > > > > All understood, thanks for letting me know.  Not a problem.  I have
> > > > > > > still only run into this crash once, about 9 days ago.  I haven't had
> > > > > > > another btrfs problem since then, unlike the hourly hangs on 5.2 with
> > > > > > > heavy I/O.
> > > > >
> > > > > > We are seeing this crash internally on our testing tier, we're still running it
> > > > > > down but it's pretty elusive.  I'll CC you when we find it and fix it.  Thanks,
> > > >
> > > > Which crash?
> > > > There are 2 different deadlocks being mentioned in this thread.
> > > >
> > >
> > > > The BUG_ON(!PageLocked(page)) crash, we're hunting that guy right now.  Thanks,
> >
> > I'm confused.
> > Where is that BUG_ON() mentioned in this thread? Only 2 deadlocks are
> > mentioned, neither of them involves page locks nor a BUG_ON().
> >
>
> Fuuuck sorry, that was a different thread, IDK how I got them confused.  Sorry
> about that,

Hehe, no worries. Thanks!

>
> Josef



-- 
Filipe David Manana,

“Whether you think you can, or you think you can't — you're right.”
