On Sun, Oct 13, 2019 at 9:46 PM Chris Murphy <li...@colorremedies.com> wrote:
>
> On Sat, Oct 12, 2019 at 5:29 PM James Harvey <jamespharve...@gmail.com> wrote:
> >
> > Was using a temporary BTRFS volume to compile mongodb, which is quite
> > intensive and takes quite a bit of time.  The volume has been
> > deadlocked for about 12 hours.
> >
> > Being a temporary volume, I just used mount without options, so it
> > used the defaults:  rw,relatime,ssd,space_cache,subvolid=5,subvol=/
> >
> > Apologies if upgrading to 5.3.5+ will fix this.  I didn't see
> > discussions of a deadlock looking like this.
>
> I think it's a bug in any case, in particular because it's all default
> mount options, but it'd be interesting if any of the following make a
> difference:
>
> - space_cache=v2
> - noatime

Interesting.

This isn't 100% reproducible.  Before my original post, after my
initial deadlock, I tried again and immediately hit another deadlock.
But, yesterday, in response to your email, I tried again still without
"space_cache=v2,noatime" to re-confirm the deadlock.  I had to
re-compile mongodb about 6 times to hit another deadlock.  I was
almost at the point of thinking I wouldn't see it again.

After re-confirming it, I re-created the BTRFS volume to use
"space_cache=v2,noatime" mount options.  It deadlocked during the
first mongodb compilation.  The output of "echo w >
/proc/sysrq-trigger" is a little bit different.  No trace including
"btrfs_sync_log" or
"btrfs_async_reclaim_metadata_space".  Only traces including the
"btrfs_btrfs_async_reclaim_metadata_space".  Viewable here:
http://ix.io/1YGe
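
A rough sketch for comparing dumps like the one above: it counts how many
lines of a saved blocked-task dump mention each function name of interest.
This is an illustration, not something that was run for this report. The
script name, the idea of saving the dump to a text file first, and the
helper name count_symbols are all assumptions. Matching is plain substring
matching, so a longer name that contains a shorter one is counted too.

```python
#!/usr/bin/env python3
"""Count lines in a saved sysrq-w dump that mention given kernel symbols."""
import sys

# Function names mentioned in this thread; edit the tuple for other symbols.
SYMBOLS = ("btrfs_sync_log", "btrfs_async_reclaim_metadata_space")


def count_symbols(dump_text, symbols=SYMBOLS):
    """Return {symbol: number of lines in dump_text containing it}.

    Uses simple substring matching on each line, nothing smarter.
    """
    counts = {name: 0 for name in symbols}
    for line in dump_text.splitlines():
        for name in symbols:
            if name in line:
                counts[name] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical usage: python3 count_symbols.py saved_dump.txt
    if len(sys.argv) == 2:
        with open(sys.argv[1], errors="replace") as fh:
            for name, hits in count_symbols(fh.read()).items():
                print(f"{hits:4d}  {name}")
    else:
        print("usage: count_symbols.py <saved-dump-file>", file=sys.stderr)
```

A count of zero for a name only says that no line of that particular dump
mentions it; it says nothing about why.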

Who knows, maybe as a particular volume has more use, it becomes less
likely to deadlock.  IF it is space cache related, maybe as the tree
gets filled out, it becomes less likely?  Or, maybe I'm looking too
much into variance, and just the way the dice rolled was that it
happened on the first retry on the new volume.  My initial deadlock
was right after volume creation, as well.

I'll also mention this is on a 32-core machine with a Samsung 970 EVO
NVMe, running a multithreaded compilation, so perhaps it takes a
pretty high load to run into this.

Also, as I'm testing some issues with the mongodb compilation process
(upstream always forces debug symbols...), as a workaround to be able
to test its issues, I've used a temporary ext4 volume for it, which I
haven't had a single issue with.
