Re: 4.11.1: cannot btrfs check --repair a filesystem, causes heavy memory stalls

2017-05-24 Thread Austin S. Hemmelgarn

On 2017-05-23 14:32, Kai Krakow wrote:

On Tue, 23 May 2017 07:21:33 -0400,
"Austin S. Hemmelgarn" wrote:


On 2017-05-22 22:07, Chris Murphy wrote:

On Mon, May 22, 2017 at 5:57 PM, Marc MERLIN 
wrote:

On Mon, May 22, 2017 at 05:26:25PM -0600, Chris Murphy wrote:

  [...]
  [...]
  [...]


Oh, swap will work, you're sure?
I already have an SSD, if that's good enough, I can give it a
shot.


Yeah although I have no idea how much swap is needed for it to
succeed. I'm not sure what the relationship between fs metadata chunk
size and the btrfs check RAM requirement is; but if it wants all of the
metadata in RAM, then whatever btrfs fi us shows you for metadata
may be a guide (?) for how much memory it's going to want.

I think the in-memory storage is a bit more space efficient than the
on-disk storage, but I'm not certain, and I'm pretty sure it takes up
more space when it's actually repairing things.  If I'm doing the
math correctly, you _may_ need up to 50% _more_ than the total
metadata size for the FS in virtual memory space.
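As a back-of-the-envelope check of that rule of thumb, here is a hypothetical sketch (the 1.5x factor is the estimate from this thread, not a documented btrfs figure):

```python
GIB = 1024**3

def check_mem_estimate(metadata_bytes, overhead=0.5):
    """Rough upper bound on the virtual memory (RAM + swap) that
    btrfs check may want: total metadata size plus ~50% overhead,
    per the estimate in this thread."""
    return int(metadata_bytes * (1 + overhead))

# e.g. a filesystem whose metadata totals 20 GiB
print(check_mem_estimate(20 * GIB) / GIB)  # 30.0
```

So a box with 24 GB of RAM would, under this guess, want several GiB of swap on top for a 20 GiB-metadata filesystem.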


Another possibility is zswap, which still requires a backing device,
but it might be able to limit how much swap to disk is needed if the
data to swap out is highly compressible. *shrug*
  

zswap won't help in that respect, but it might make swapping stuff
back in faster.  It just keeps a compressed copy in memory in
parallel to writing the full copy out to disk, then uses that
compressed copy to swap in instead of going to disk if the copy is
still in memory (but it will discard the compressed copies if memory
gets really low).  In essence, it reduces the impact of swapping when
memory pressure is moderate (the situation for most desktops for
example), but becomes almost useless when you have very high memory
pressure (which is what describes this usage).


Is this really how zswap works?
OK, looking at the documentation, you're correct, and my assumption 
based on the description of the front-end (frontswap) and how the other 
back-end (the Xen transcendent memory driver) appears to behave was 
wrong. However, given how zswap does behave, I can't see how it would 
ever be useful with the default kernel settings, since without manual 
configuration, the kernel won't try to swap until memory pressure is 
pretty high, at which point zswap won't likely have much impact.


I always thought it acts as a compressed write-back cache in front of
the swap devices. Pages first go to zswap compressed, and later
write-back kicks in and migrates those compressed pages to real swap,
but still compressed. This is done by zswap putting two (or up to three
in modern kernels) compressed pages into one page. It has the downside
of uncompressing all "buddy pages" when only one is needed back in. But
it stays compressed. This also tells me zswap will either achieve
around 1:2 or 1:3 effective compression ratio or none. So it cannot be
compared to how streaming compression works.
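The buddy-packing point above can be modeled with a small, simplified sketch (this assumes each compressed page either fits a fixed-size slot or gets a whole backing page to itself, which glosses over zbud's actual first-fit behavior):

```python
PAGE = 4096  # bytes

def effective_ratio(compressed_sizes, slots_per_page=2):
    """Simplified model of zswap's pool allocator: zbud packs up to
    2 compressed pages per backing page, z3fold up to 3.  A page whose
    compressed form doesn't fit a slot occupies a whole page, so the
    effective ratio comes out near 2:1, 3:1, or none at all."""
    slot = PAGE // slots_per_page
    packable = sum(1 for s in compressed_sizes if s <= slot)
    whole = len(compressed_sizes) - packable
    backing_pages = -(-packable // slots_per_page) + whole  # ceil + rest
    return len(compressed_sizes) / backing_pages

# Well-compressible pages under zbud: 2:1.  Incompressible data: 1:1.
print(effective_ratio([1200] * 8))                    # 2.0
print(effective_ratio([1200] * 9, slots_per_page=3))  # 3.0
print(effective_ratio([4000] * 4))                    # 1.0
```

This is why zswap's savings are step-shaped rather than the smooth ratio you'd see from streaming compression.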

OTOH, if the page is reloaded from cache before write-back kicks in, it
will never be written to swap but just uncompressed and discarded from
the cache.

Under high memory pressure it doesn't really work that well due to high
CPU overhead if pages constantly swap out, compress, write, read,
uncompress, swap in... This usually results in very low CPU usage for
processes but high IO and disk wait and high kernel CPU usage. But it
defers memory pressure conditions to a little later in exchange for
a little more IO usage and more CPU usage. If you have a lot of
inactive memory around, it can make a difference. But it is
counterproductive if almost all your memory is active and pressure is high.

So, in this scenario, it probably still doesn't help.


--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: 4.11.1: cannot btrfs check --repair a filesystem, causes heavy memory stalls

2017-05-23 Thread Kai Krakow
On Tue, 23 May 2017 07:21:33 -0400,
"Austin S. Hemmelgarn" wrote:

> On 2017-05-22 22:07, Chris Murphy wrote:
> > On Mon, May 22, 2017 at 5:57 PM, Marc MERLIN 
> > wrote:  
> >> On Mon, May 22, 2017 at 05:26:25PM -0600, Chris Murphy wrote:  
>  [...]  
>  [...]  
>  [...]  
> >>
> >> Oh, swap will work, you're sure?
> >> I already have an SSD, if that's good enough, I can give it a
> >> shot.  
> >
> > Yeah although I have no idea how much swap is needed for it to
> > succeed. I'm not sure what the relationship between fs metadata chunk
> > size and the btrfs check RAM requirement is; but if it wants all of the
> > metadata in RAM, then whatever btrfs fi us shows you for metadata
> > may be a guide (?) for how much memory it's going to want.  
> I think the in-memory storage is a bit more space efficient than the 
> on-disk storage, but I'm not certain, and I'm pretty sure it takes up 
> more space when it's actually repairing things.  If I'm doing the
> math correctly, you _may_ need up to 50% _more_ than the total
> metadata size for the FS in virtual memory space.
> >
> > Another possibility is zswap, which still requires a backing device,
> > but it might be able to limit how much swap to disk is needed if the
> > data to swap out is highly compressible. *shrug*
> >  
> zswap won't help in that respect, but it might make swapping stuff
> back in faster.  It just keeps a compressed copy in memory in
> parallel to writing the full copy out to disk, then uses that
> compressed copy to swap in instead of going to disk if the copy is
> still in memory (but it will discard the compressed copies if memory
> gets really low).  In essence, it reduces the impact of swapping when
> memory pressure is moderate (the situation for most desktops for
> example), but becomes almost useless when you have very high memory
> pressure (which is what describes this usage).

Is this really how zswap works?

I always thought it acts as a compressed write-back cache in front of
the swap devices. Pages first go to zswap compressed, and later
write-back kicks in and migrates those compressed pages to real swap,
but still compressed. This is done by zswap putting two (or up to three
in modern kernels) compressed pages into one page. It has the downside
of uncompressing all "buddy pages" when only one is needed back in. But
it stays compressed. This also tells me zswap will either achieve
around 1:2 or 1:3 effective compression ratio or none. So it cannot be
compared to how streaming compression works.

OTOH, if the page is reloaded from cache before write-back kicks in, it
will never be written to swap but just uncompressed and discarded from
the cache.

Under high memory pressure it doesn't really work that well due to high
CPU overhead if pages constantly swap out, compress, write, read,
uncompress, swap in... This usually results in very low CPU usage for
processes but high IO and disk wait and high kernel CPU usage. But it
defers memory pressure conditions to a little later in exchange for
a little more IO usage and more CPU usage. If you have a lot of
inactive memory around, it can make a difference. But it is
counterproductive if almost all your memory is active and pressure is high.

So, in this scenario, it probably still doesn't help.


-- 
Regards,
Kai

Replies to list-only preferred.



Re: 4.11.1: cannot btrfs check --repair a filesystem, causes heavy memory stalls

2017-05-23 Thread Marc MERLIN
On Mon, May 22, 2017 at 09:19:34AM +, Duncan wrote:
> btrfs check is userspace, not kernelspace.  The btrfs-transacti threads 

That was my understanding, yes, but since I got it to starve my system,
including the in-kernel OOM issues I pasted in my last message and just
referenced in https://bugzilla.kernel.org/show_bug.cgi?id=195863 I think
it's not as black and white as running a userland process that
takes too much RAM and gets killed if it does.

> are indeed kernelspace, but the problem would appear to be either IO or 
> memory starvation triggered by the userspace check hogging all available 
> resources, not leaving enough for normal system, including kernel, 
> processes.

Looks like it, but also memory.

> * Keeping the number of snapshots as low as possible is strongly 
> recommended by pretty much everyone here, definitely under 300 per 
> subvolume and if possible, to double-digits per subvolume.

I agree that fewer snapshots is better, but between recovery snapshots
and btrfs snapshots across a number of subvolumes, things add up :)

gargamel:/mnt/btrfs_pool1# btrfs subvolume list . | wc -l
93
gargamel:/mnt/btrfs_pool2# btrfs subvolume list . | wc -l
103
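For anyone tallying snapshots the same way, a small sketch that counts subvolumes from `btrfs subvolume list` output (the sample lines below are hypothetical; real output lines start with `ID <n> gen <n> top level <n> path <p>`):

```python
def subvolume_count(listing):
    """Count subvolumes in `btrfs subvolume list` output; the Python
    equivalent of piping it through `wc -l`, but skipping blank lines."""
    return sum(1 for line in listing.splitlines() if line.startswith("ID "))

# Hypothetical sample output
sample = (
    "ID 257 gen 4100 top level 5 path backup/daily.1\n"
    "ID 258 gen 4101 top level 5 path backup/daily.2\n"
    "ID 259 gen 4102 top level 5 path debian32\n"
)
print(subvolume_count(sample))  # 3
```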

> * I personally recommend disabling qgroups, unless you're actively 
> working with the devs on improving them.  In addition to the scaling 
> issues, quotas simply aren't reliable enough on btrfs yet to rely on them 
> if the use-case requires them (in which case using a mature filesystem 
> where they're proven to work is recommended), and if it doesn't, there's 
> simply too many remaining issues for the qgroups option to be worth it.
 
I had considered using them at some point to track the size of each
subvolume, but good to know they're still not quite ready yet.

> * I personally recommend keeping overall filesystem size to something one 
> can reasonably manage.  Most people's use-cases aren't going to allow for 
> an fsck taking days and tens of GiB, but /will/ allow for multi-TB 
> filesystems to be split out into multiple independent filesystems of 
> perhaps a TB or two each, tops, if that's the alternative to multiple-day 
> fscks taking tens of GiB.  (Some use-cases are of course exceptions.)

fsck ran in 6H with bcache, but the lowmem one could take a lot longer.
Running over nbd to another host with more RAM could indeed take days
given the loss of bcache and adding the latency/bandwidth of a
network.

> * The low-memory-mode btrfs check is being developed, tho unfortunately 
> it doesn't yet do repairs.  (Another reason is that it's an alternate 
> implementation that provides a very useful second opinion and the ability 
> to cross-check one implementation against the other in hard problem 
> cases.)

True.

> >> Sadly, I tried a scrub on the same device, and it stalled after 6TB.
> >> The scrub process went zombie and the scrub never succeeded, nor could
> >> it be stopped.
> 
> Quite apart from the "... after 6TB" bit setting off my own "it's too big 
> to reasonably manage" alarm, the filesystem obviously is bugged, and 
> scrub as well, since it shouldn't just go zombie regardless of the 
> problem -- it should fail much more gracefully.
 
:)
In this case it's mostly big files, so it's fine metadata-wise but takes
a while to scrub (<24H though).

The problem I had is that I copied all of dshelf2 onto dshelf1 while I
blew away ds2 and rebuilt it. That extra metadata (many smaller files)
tipped the metadata size of ds1 over the edge.
Once I blew away that backup, things became ok again.

> Meanwhile, FWIW, unlike check, scrub /is/ kernelspace.

Correct, just like balance.

> As explained, check is userspace, but as you found, it can still 
> interfere with kernelspace, including unrelated btrfs-transaction 
> threads.  When the system's out of memory, it's out of memory.
 
Userspace should not take the entire system down without the OOM killer
even firing.
Also, in the logs I just sent, it showed that none of my swap space had
been used. Why would that be?

> Tho there is ongoing work into better predicting memory allocation needs 
> for btrfs kernel threads and reserving memory space accordingly, so this 
> sort of thing doesn't happen any more.

That would be good.

> Agreed.  Lowmem mode looks like about your only option, beyond simply 
> blowing it away, at this point.  Too bad it doesn't do repair yet, but 

It's not an option since it won't fix the small corruption issue I had.
Thankfully, deleting enough metadata allowed it to run within my RAM, and
check --repair has fixed it now.

> with a bit of luck it should at least give you and the devs some idea 
> what's wrong, information that can in turn be used to fix both scrub and 
> normal check mode, as well as low-mem repair mode, once it's available.

In this case, not useful information for the devs. It's a bad SAS card
that corrupted my data, not a bug in the kernel code.

> Of course your "days" comment is triggering my "it's too big to maintain" 
> reflex again, but obviously 

Re: 4.11.1: cannot btrfs check --repair a filesystem, causes heavy memory stalls

2017-05-23 Thread Marc MERLIN
On Tue, May 23, 2017 at 07:21:33AM -0400, Austin S. Hemmelgarn wrote:
> > Yeah although I have no idea how much swap is needed for it to
> > succeed. I'm not sure what the relationship between fs metadata chunk
> > size and the btrfs check RAM requirement is; but if it wants all of the
> > metadata in RAM, then whatever btrfs fi us shows you for metadata may
> > be a guide (?) for how much memory it's going to want.
>
> I think the in-memory storage is a bit more space efficient than the on-disk
> storage, but I'm not certain, and I'm pretty sure it takes up more space
> when it's actually repairing things.  If I'm doing the math correctly, you
> _may_ need up to 50% _more_ than the total metadata size for the FS in
> virtual memory space.

So I was able to rescue/fix my system by removing a bunch of temporary
data on it, which in turn freed up enough metadata space for btrfs
check to work again.
The problems it found were minor, so they were fixed quickly.

I seem to have been the last person who edited 
https://btrfs.wiki.kernel.org/index.php/Btrfsck
and it's therefore way out of date :)

I propose the following:
1) One dev needs to confirm that, as long as you have enough swap, btrfs
check should succeed. Give some guideline of metadata size to swap size.
Then again, I think swap doesn't help; see below.

2) I still think there is an issue with either the OOM killer or btrfs
check actually chewing up kernel RAM. I've never seen any Linux system
die in the spectacular ways mine died with that btrfs check if it were
only taking userspace RAM.
I've filed a bug, because it looks bad:
https://bugzilla.kernel.org/show_bug.cgi?id=195863

Can someone who reads these dumps better than me take a look? Is it userspace RAM that is missing?
You said that swap would help, but in the dump below, I see:
Free swap  = 15366388kB
so my swap was unused and the system crashed due to OOM anyway.

btrfs-transacti: page allocation stalls for 23508ms, order:0, 
mode:0x1400840(GFP_NOFS|__GFP_NOFAIL), nodemask=(null)
btrfs-transacti cpuset=/ mems_allowed=0
Mem-Info:
active_anon:5274313 inactive_anon:378373 isolated_anon:3590
 active_file:3711 inactive_file:3809 isolated_file:0
 unevictable:1467 dirty:5068 writeback:49189 unstable:0
 slab_reclaimable:8721 slab_unreclaimable:67310
 mapped:556943 shmem:801313 pagetables:15777 bounce:0
 free:89741 free_pcp:6 free_cma:0
Node 0 active_anon:21097252kB inactive_anon:1513492kB active_file:14844kB 
inactive_file:15236kB unevictable:5868kB isolated(anon):14360kB 
isolated(file):0kB mapped:2227772kB dirty:20272kB writeback:196756kB 
shmem:3205252kB shmem_thp: 0kB shmem_pmdmapped: 0kB anon_thp: 0kB 
writeback_tmp:0kB unstable:0kB pages_scanned:215184 all_unreclaimable? no
Node 0 DMA free:15880kB min:168kB low:208kB high:248kB active_anon:0kB 
inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB 
writepending:0kB present:15972kB managed:15888kB mlocked:0kB 
slab_reclaimable:0kB slab_unreclaimable:8kB kernel_stack:0kB pagetables:0kB 
bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
lowmem_reserve[]: 0 3201 23768 23768 23768
Node 0 DMA32 free:116720kB min:35424kB low:44280kB high:53136kB 
active_anon:3161376kB inactive_anon:8kB active_file:320kB inactive_file:332kB 
unevictable:0kB writepending:612kB present:3362068kB managed:3296500kB 
mlocked:0kB slab_reclaimable:460kB slab_unreclaimable:668kB kernel_stack:16kB 
pagetables:7292kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
lowmem_reserve[]: 0 0 20567 20567 20567
Node 0 Normal free:226664kB min:226544kB low:283180kB high:339816kB 
active_anon:17935552kB inactive_anon:1513564kB active_file:14524kB 
inactive_file:14904kB unevictable:5868kB writepending:216372kB 
present:21485568kB managed:21080208kB mlocked:5868kB slab_reclaimable:34412kB 
slab_unreclaimable:268520kB kernel_stack:12480kB pagetables:55816kB bounce:0kB 
free_pcp:148kB local_pcp:0kB free_cma:0kB
lowmem_reserve[]: 0 0 0 0 0
Node 0 DMA: 0*4kB 1*8kB (U) 0*16kB 0*32kB 2*64kB (U) 1*128kB (U) 1*256kB (U) 
0*512kB 1*1024kB (U) 1*2048kB (M) 3*4096kB (M) = 15880kB
Node 0 DMA32: 768*4kB (UME) 740*8kB (UME) 685*16kB (UME) 446*32kB (UME) 
427*64kB (UME) 233*128kB (UME) 79*256kB (UME) 10*512kB (UME) 0*1024kB 0*2048kB 
0*4096kB = 116720kB
Node 0 Normal: 25803*4kB (UME) 11297*8kB (UME) 947*16kB (UME) 260*32kB (ME) 
72*64kB (UM) 15*128kB (UM) 1*256kB (U) 0*512kB 0*1024kB 0*2048kB 0*4096kB = 
223844kB
Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
858720 total pagecache pages
49221 pages in swap cache
Swap cache stats: add 62319, delete 13131, find 75/76
Free swap  = 15366388kB
Total swap = 15616764kB
6215902 pages RAM
0 pages HighMem/MovableOnly
117753 pages reserved
4096 pages cma reserved
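One quick way to confirm the swap observation from a dump like the one above is to pull the swap counters out mechanically (a sketch that assumes the exact `Free swap` / `Total swap` field layout shown here):

```python
import re

def swap_usage_kb(dump_text):
    """Return (used_kb, total_kb) parsed from the 'Free swap' and
    'Total swap' lines of a kernel show_mem dump."""
    free = int(re.search(r"Free swap\s*=\s*(\d+)kB", dump_text).group(1))
    total = int(re.search(r"Total swap\s*=\s*(\d+)kB", dump_text).group(1))
    return total - free, total

dump = "Free swap  = 15366388kB\nTotal swap = 15616764kB\n"
used, total = swap_usage_kb(dump)
print(used, total)  # 250376 15616764
```

Roughly 250 MB used out of ~15 GB of swap: the swap was essentially untouched when the allocation stalls hit, which is exactly the point above.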


I'm also happy to modify the wiki to
1) mention that there is a lowmem mode, which in turn isn't really useful
for much yet since it won't repair even a trivial thing (I've seen patches
go around, but they're not upstream yet)

2) warn that for now check --repair of a big filesystem will crash 

Re: 4.11.1: cannot btrfs check --repair a filesystem, causes heavy memory stalls

2017-05-23 Thread Austin S. Hemmelgarn

On 2017-05-22 22:07, Chris Murphy wrote:

On Mon, May 22, 2017 at 5:57 PM, Marc MERLIN  wrote:

On Mon, May 22, 2017 at 05:26:25PM -0600, Chris Murphy wrote:

On Mon, May 22, 2017 at 10:31 AM, Marc MERLIN  wrote:



I already have 24GB of RAM in that machine, adding more for the real
fsck repair to run, is going to be difficult and nbd would take days I
guess (then again I don't have a machine with 32 or 48 or 64GB of RAM
anyway).


If you can acquire an SSD, you can give the system a bunch of swap,
and at least then hopefully the check repair can complete. Yes it'll
be slower than with real RAM but it's not nearly as bad as you might
think it'd be, based on HDD based swap.


Oh, swap will work, you're sure?
I already have an SSD, if that's good enough, I can give it a shot.


Yeah although I have no idea how much swap is needed for it to
succeed. I'm not sure what the relationship between fs metadata chunk
size and the btrfs check RAM requirement is; but if it wants all of the
metadata in RAM, then whatever btrfs fi us shows you for metadata may
be a guide (?) for how much memory it's going to want.
I think the in-memory storage is a bit more space efficient than the 
on-disk storage, but I'm not certain, and I'm pretty sure it takes up 
more space when it's actually repairing things.  If I'm doing the math 
correctly, you _may_ need up to 50% _more_ than the total metadata size 
for the FS in virtual memory space.


Another possibility is zswap, which still requires a backing device,
but it might be able to limit how much swap to disk is needed if the
data to swap out is highly compressible. *shrug*

zswap won't help in that respect, but it might make swapping stuff back 
in faster.  It just keeps a compressed copy in memory in parallel to 
writing the full copy out to disk, then uses that compressed copy to 
swap in instead of going to disk if the copy is still in memory (but it 
will discard the compressed copies if memory gets really low).  In 
essence, it reduces the impact of swapping when memory pressure is 
moderate (the situation for most desktops for example), but becomes 
almost useless when you have very high memory pressure (which is what 
describes this usage).



Re: 4.11.1: cannot btrfs check --repair a filesystem, causes heavy memory stalls

2017-05-22 Thread Chris Murphy
On Mon, May 22, 2017 at 5:57 PM, Marc MERLIN  wrote:
> On Mon, May 22, 2017 at 05:26:25PM -0600, Chris Murphy wrote:
>> On Mon, May 22, 2017 at 10:31 AM, Marc MERLIN  wrote:
>>
>> >
>> > I already have 24GB of RAM in that machine, adding more for the real
>> > fsck repair to run, is going to be difficult and nbd would take days I
>> > guess (then again I don't have a machine with 32 or 48 or 64GB of RAM
>> > anyway).
>>
>> If you can acquire an SSD, you can give the system a bunch of swap,
>> and at least then hopefully the check repair can complete. Yes it'll
>> be slower than with real RAM but it's not nearly as bad as you might
>> think it'd be, based on HDD based swap.
>
> Oh, swap will work, you're sure?
> I already have an SSD, if that's good enough, I can give it a shot.

Yeah although I have no idea how much swap is needed for it to
succeed. I'm not sure what the relationship between fs metadata chunk
size and the btrfs check RAM requirement is; but if it wants all of the
metadata in RAM, then whatever btrfs fi us shows you for metadata may
be a guide (?) for how much memory it's going to want.

Another possibility is zswap, which still requires a backing device,
but it might be able to limit how much swap to disk is needed if the
data to swap out is highly compressible. *shrug*

-- 
Chris Murphy


Re: 4.11.1: cannot btrfs check --repair a filesystem, causes heavy memory stalls

2017-05-22 Thread Marc MERLIN
On Mon, May 22, 2017 at 05:26:25PM -0600, Chris Murphy wrote:
> On Mon, May 22, 2017 at 10:31 AM, Marc MERLIN  wrote:
> 
> >
> > I already have 24GB of RAM in that machine, adding more for the real
> > fsck repair to run, is going to be difficult and nbd would take days I
> > guess (then again I don't have a machine with 32 or 48 or 64GB of RAM
> > anyway).
> 
> If you can acquire an SSD, you can give the system a bunch of swap,
> and at least then hopefully the check repair can complete. Yes it'll
> be slower than with real RAM but it's not nearly as bad as you might
> think it'd be, based on HDD based swap.

Oh, swap will work, you're sure? 
I already have an SSD, if that's good enough, I can give it a shot.

Thanks,
Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems 
   what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/  


Re: 4.11.1: cannot btrfs check --repair a filesystem, causes heavy memory stalls

2017-05-22 Thread Chris Murphy
On Mon, May 22, 2017 at 10:31 AM, Marc MERLIN  wrote:

>
> I already have 24GB of RAM in that machine, adding more for the real
> fsck repair to run, is going to be difficult and nbd would take days I
> guess (then again I don't have a machine with 32 or 48 or 64GB of RAM
> anyway).

If you can acquire an SSD, you can give the system a bunch of swap,
and at least then hopefully the check repair can complete. Yes it'll
be slower than with real RAM but it's not nearly as bad as you might
think it'd be, based on HDD based swap.



-- 
Chris Murphy


Re: 4.11.1: cannot btrfs check --repair a filesystem, causes heavy memory stalls

2017-05-22 Thread Marc MERLIN
On Sun, May 21, 2017 at 06:35:53PM -0700, Marc MERLIN wrote:
> On Sun, May 21, 2017 at 04:45:57PM -0700, Marc MERLIN wrote:
> > On Sun, May 21, 2017 at 02:47:33PM -0700, Marc MERLIN wrote:
> > > gargamel:~# btrfs check --repair /dev/mapper/dshelf1
> > > enabling repair mode
> > > Checking filesystem on /dev/mapper/dshelf1
> > > UUID: 36f5079e-ca6c-4855-8639-ccb82695c18d
> > > checking extents
> > > 
> > > This causes a bunch of these:
> > > btrfs-transacti: page allocation stalls for 23508ms, order:0, 
> > > mode:0x1400840(GFP_NOFS|__GFP_NOFAIL), nodemask=(null)
> > > btrfs-transacti cpuset=/ mems_allowed=0
> > > 
> > > What's the recommended way out of this and which code is at fault? I 
> > > can't tell if btrfs is doing memory allocations wrong, or if it's just 
> > > being undermined by the block layer dying underneath.
> > 
> > I went back to 4.8.10, and similar problem.
> > It looks like btrfs check exercises the kernel and causes everything to 
> > grind to a halt :(
> > 
> > Sadly, I tried a scrub on the same device, and it stalled after 6TB. The 
> > scrub process went zombie
> > and the scrub never succeeded, nor could it be stopped.
> 
> So, putting aside the btrfs scrub stall issue, I didn't quite realize
> that btrfs check memory issues actually caused the kernel to eat all the
> memory until everything crashed/deadlocked/stalled.
> Is that actually working as intended?
> Why doesn't it fail and stop instead of taking my entire server down?
> Clearly there must be a rule against a kernel subsystem taking all the
> memory from everything until everything crashes/deadlocks, right?
> 
> So for now, I'm doing a lowmem check, but it's not going to be very
> helpful since it cannot repair anything if it finds a problem.
> 
> At least my machine isn't crashing anymore, I suppose that's still an
> improvement.
> gargamel:~# btrfs check --mode=lowmem /dev/mapper/dshelf1
> We'll see how many days it takes.

Well, at least it's finding errors, but of course it can't fix them
since lowmem doesn't have repair yet (yes, I know it's WIP)

I already have 24GB of RAM in that machine, adding more for the real
fsck repair to run, is going to be difficult and nbd would take days I
guess (then again I don't have a machine with 32 or 48 or 64GB of RAM
anyway).

I'm guessing my next step is to delete a lot of data from that array
until its metadata use gets back below something that fits in RAM :-/
But hopefully check --repair can be fixed not to crash your machine if
it needs more RAM than is available.

 
Checking filesystem on /dev/mapper/dshelf1
UUID: 36f5079e-ca6c-4855-8639-ccb82695c18d
checking free space cache [.]
ERROR: root 53282 EXTENT_DATA[8244 4096] interrupt
ERROR: root 53282 EXTENT_DATA[50585 4096] interrupt
ERROR: root 53282 EXTENT_DATA[51096 4096] interrupt
ERROR: root 53282 EXTENT_DATA[182617 4096] interrupt
ERROR: root 53282 EXTENT_DATA[212972 4096] interrupt
ERROR: root 53282 EXTENT_DATA[260115 4096] interrupt
ERROR: root 53282 EXTENT_DATA[278370 4096] interrupt
ERROR: root 53282 EXTENT_DATA[323505 4096] interrupt
ERROR: root 53282 EXTENT_DATA[396923 4096] interrupt
ERROR: root 53282 EXTENT_DATA[419599 4096] interrupt
ERROR: root 53282 EXTENT_DATA[490602 4096] interrupt
ERROR: root 53282 EXTENT_DATA[41 4096] interrupt
ERROR: root 53282 EXTENT_DATA[601942 4096] interrupt
ERROR: root 53282 EXTENT_DATA[682215 4096] interrupt
ERROR: root 53282 EXTENT_DATA[721729 4096] interrupt
ERROR: root 53282 EXTENT_DATA[916271 4096] interrupt
ERROR: root 53282 EXTENT_DATA[961074 4096] interrupt
ERROR: root 53282 EXTENT_DATA[1118062 4096] interrupt
ERROR: root 53282 EXTENT_DATA[1127879 4096] interrupt
ERROR: root 53282 EXTENT_DATA[1142984 4096] interrupt
ERROR: root 53282 EXTENT_DATA[1379975 4096] interrupt
ERROR: root 53282 EXTENT_DATA[1398275 4096] interrupt
ERROR: root 53282 EXTENT_DATA[1446265 4096] interrupt
ERROR: root 53282 EXTENT_DATA[1459061 4096] interrupt
ERROR: root 53282 EXTENT_DATA[1477900 4096] interrupt
ERROR: root 53282 EXTENT_DATA[1477900 4096] interrupt
ERROR: root 53282 EXTENT_DATA[1484265 4096] interrupt
ERROR: root 53282 EXTENT_DATA[1509227 4096] interrupt
ERROR: root 53282 EXTENT_DATA[1671096 4096] interrupt
ERROR: root 53282 EXTENT_DATA[1692559 4096] interrupt
ERROR: root 53282 EXTENT_DATA[1742832 4096] interrupt
ERROR: root 53282 EXTENT_DATA[1808649 4096] interrupt
ERROR: root 53292 EXTENT_DATA[57240 4096] interrupt
ERROR: root 53446 EXTENT_DATA[3554 4096] interrupt
ERROR: root 53446 EXTENT_DATA[64241 4096] interrupt
(...)

Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems 
   what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/ | PGP 1024R/763BE901

Re: 4.11.1: cannot btrfs check --repair a filesystem, causes heavy memory stalls

2017-05-22 Thread Duncan
Marc MERLIN posted on Sun, 21 May 2017 18:35:53 -0700 as excerpted:

> On Sun, May 21, 2017 at 04:45:57PM -0700, Marc MERLIN wrote:
>> On Sun, May 21, 2017 at 02:47:33PM -0700, Marc MERLIN wrote:
>> > gargamel:~# btrfs check --repair /dev/mapper/dshelf1 enabling repair
>> > mode Checking filesystem on /dev/mapper/dshelf1 UUID:
>> > 36f5079e-ca6c-4855-8639-ccb82695c18d checking extents
>> > 
>> > This causes a bunch of these:
>> > btrfs-transacti: page allocation stalls for 23508ms, order:0,
>> > mode:0x1400840(GFP_NOFS|__GFP_NOFAIL), nodemask=(null)
>> > btrfs-transacti cpuset=/ mems_allowed=0
>> > 
>> > What's the recommended way out of this and which code is at fault? I
>> > can't tell if btrfs is doing memory allocations wrong, or if it's
>> > just being undermined by the block layer dying underneath.
>> 
>> I went back to 4.8.10, and similar problem.
>> It looks like btrfs check exercises the kernel and causes everything to
>> grind to a halt :(

btrfs check is userspace, not kernelspace.  The btrfs-transacti threads 
are indeed kernelspace, but the problem would appear to be either IO or 
memory starvation triggered by the userspace check hogging all available 
resources, not leaving enough for normal system, including kernel, 
processes.

Check is /known/ to be memory intensive, with multi-TB filesystems often 
requiring tens of GiB of memory, and qgroups and snapshots are both known 
to dramatically intensify the scaling issues.  (btrfs balance, by 
contrast, has the same scaling issues, but is kernelspace.)

That's one reason why (not all of these may apply to your case) ...

* Keeping the number of snapshots as low as possible is strongly 
recommended by pretty much everyone here, definitely under 300 per 
subvolume and if possible, to double-digits per subvolume.

* I personally recommend disabling qgroups, unless you're actively 
working with the devs on improving them.  In addition to the scaling 
issues, quotas simply aren't reliable enough on btrfs yet to rely on them 
if the use-case requires them (in which case using a mature filesystem 
where they're proven to work is recommended), and if it doesn't, there's 
simply too many remaining issues for the qgroups option to be worth it.

* I personally recommend keeping overall filesystem size to something one 
can reasonably manage.  Most people's use-cases aren't going to allow for 
an fsck taking days and tens of GiB, but /will/ allow for multi-TB 
filesystems to be split out into multiple independent filesystems of 
perhaps a TB or two each, tops, if that's the alternative to multiple-day 
fscks taking tens of GiB.  (Some use-cases are of course exceptions.)

* The low-memory-mode btrfs check is being developed, tho unfortunately 
it doesn't yet do repairs.  (Another reason is that it's an alternate 
implementation that provides a very useful second opinion and the ability 
to cross-check one implementation against the other in hard problem 
cases.)
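To see where a filesystem stands against the snapshot numbers suggested above, counting them is a one-liner.  A hedged sketch (the listing below is fabricated so the snippet is self-contained; on a real system you'd pipe `btrfs subvolume list -s <mountpoint>` straight into `wc -l`, and the snapshot names here are invented):

```shell
# Count snapshots, to compare against the "under 300, ideally
# double digits" guidance above.  Sample `btrfs subvolume list -s`
# output, one line per snapshot (made up for illustration):
sample_listing="ID 257 gen 100 top level 5 path snap-2017-05-19
ID 258 gen 101 top level 5 path snap-2017-05-20
ID 259 gen 102 top level 5 path snap-2017-05-21"

# Each output line is one snapshot, so wc -l gives the count.
count=$(printf '%s\n' "$sample_listing" | wc -l)
echo "snapshots: $count"
```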

(The two "I personally recommend" points above aren't recommendations 
shared by everyone on the list, but obviously I've found them very useful 
here. =:^)
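On the qgroups point specifically, turning them off is a single btrfs-progs command (the mount point here is just an example path):

```shell
# Disable qgroups/quotas on a mounted btrfs filesystem.
btrfs quota disable /mnt

# Afterwards, this should no longer list any qgroups:
btrfs qgroup show /mnt
```

Disabling is non-destructive to the data itself; it only drops the quota accounting trees, and `btrfs quota enable` can rebuild them later (at rescan cost) if they're ever wanted again.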

>> Sadly, I tried a scrub on the same device, and it stalled after 6TB.
>> The scrub process went zombie and the scrub never succeeded, nor could
>> it be stopped.

Quite apart from the "... after 6TB" bit setting off my own "it's too big 
to reasonably manage" alarm, the filesystem obviously is bugged, and 
scrub as well, since it shouldn't just go zombie regardless of the 
problem -- it should fail much more gracefully.

Meanwhile, FWIW, unlike check, scrub /is/ kernelspace.

> So, putting aside the btrfs scrub stall issue, I didn't quite realize
> that btrfs check memory issues actually caused the kernel to eat all the
> memory until everything crashed/deadlocked/stalled.
> Is that actually working as intended?
> Why doesn't it fail and stop instead of taking my entire server down?
> Clearly there must be a rule against a kernel subsystem taking all the
> memory from everything until everything crashes/deadlocks, right?

As explained, check is userspace, but as you found, it can still 
interfere with kernelspace, including unrelated btrfs-transaction 
threads.  When the system's out of memory, it's out of memory.

Tho there is ongoing work on better predicting memory allocation needs 
for btrfs kernel threads and reserving memory space accordingly, so this 
sort of thing doesn't happen any more.

Of course it could also be some sort of (not necessarily directly btrfs) 
lockdep issue, and there's ongoing kernel-wide and btrfs work there as 
well.

> So for now, I'm doing a lowmem check, but it's not going to be very
> helpful since it cannot repair anything if it finds a problem.
> 
> At least my machine isn't crashing anymore, I suppose that's still an
> improvement.
> gargamel:~# btrfs check --mode=lowmem /dev/mapper/dshelf1
> We'll see how many days it takes.

Agreed.  Lowmem mode looks like about your only option, beyond simply 

Re: 4.11.1: cannot btrfs check --repair a filesystem, causes heavy memory stalls

2017-05-21 Thread Marc MERLIN
On Sun, May 21, 2017 at 04:45:57PM -0700, Marc MERLIN wrote:
> On Sun, May 21, 2017 at 02:47:33PM -0700, Marc MERLIN wrote:
> > gargamel:~# btrfs check --repair /dev/mapper/dshelf1
> > enabling repair mode
> > Checking filesystem on /dev/mapper/dshelf1
> > UUID: 36f5079e-ca6c-4855-8639-ccb82695c18d
> > checking extents
> > 
> > This causes a bunch of these:
> > btrfs-transacti: page allocation stalls for 23508ms, order:0, 
> > mode:0x1400840(GFP_NOFS|__GFP_NOFAIL), nodemask=(null)
> > btrfs-transacti cpuset=/ mems_allowed=0
> > 
> > What's the recommended way out of this and which code is at fault? I can't 
> > tell if btrfs is doing memory allocations wrong, or if it's just being 
> > undermined by the block layer dying underneath.
> 
> I went back to 4.8.10, and similar problem.
> It looks like btrfs check exercises the kernel and causes everything to come 
> down to a halt :(
> 
> Sadly, I tried a scrub on the same device, and it stalled after 6TB. The 
> scrub process went zombie
> and the scrub never succeeded, nor could it be stopped.

So, putting aside the btrfs scrub stall issue, I didn't quite realize
that btrfs check memory issues actually caused the kernel to eat all the
memory until everything crashed/deadlocked/stalled.
Is that actually working as intended?
Why doesn't it fail and stop instead of taking my entire server down?
Clearly there must be a rule against a kernel subsystem taking all the
memory from everything until everything crashes/deadlocks, right?

So for now, I'm doing a lowmem check, but it's not going to be very
helpful since it cannot repair anything if it finds a problem.

At least my machine isn't crashing anymore, I suppose that's still an
improvement.
gargamel:~# btrfs check --mode=lowmem /dev/mapper/dshelf1
We'll see how many days it takes.

Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems 
   what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/ | PGP 1024R/763BE901
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: 4.11.1: cannot btrfs check --repair a filesystem, causes heavy memory stalls

2017-05-21 Thread Marc MERLIN
On Sun, May 21, 2017 at 02:47:33PM -0700, Marc MERLIN wrote:
> gargamel:~# btrfs check --repair /dev/mapper/dshelf1
> enabling repair mode
> Checking filesystem on /dev/mapper/dshelf1
> UUID: 36f5079e-ca6c-4855-8639-ccb82695c18d
> checking extents
> 
> This causes a bunch of these:
> btrfs-transacti: page allocation stalls for 23508ms, order:0, 
> mode:0x1400840(GFP_NOFS|__GFP_NOFAIL), nodemask=(null)
> btrfs-transacti cpuset=/ mems_allowed=0
> 
> What's the recommended way out of this and which code is at fault? I can't 
> tell if btrfs is doing memory allocations wrong, or if it's just being 
> undermined by the block layer dying underneath.

I went back to 4.8.10, and similar problem.
It looks like btrfs check exercises the kernel and causes everything to come 
down to a halt :(

Sadly, I tried a scrub on the same device, and it stalled after 6TB. The scrub 
process went zombie
and the scrub never succeeded, nor could it be stopped.

What do I try next? My filesystem seems ok when I use it except for that
BUG() crash I just reported a few days ago. I'm willing to believe there is
some problem with it somewhere,
but if I can't scrub or check it, it's kind of hard to look into it further.

[ 1090.912073] INFO: task kworker/dying:63 blocked for more than 120 seconds.
[ 1090.933850]   Tainted: G U  
4.8.10-amd64-preempt-sysrq-20161121vb3tj1 #12
[ 1090.959465] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this 
message.
[ 1090.983973] kworker/dying   D 9a23424e3d68 063  2 0x
[ 1091.006171]  9a23424e3d68 00ff9a213ab32140 8dc0d4c0 
9a23424dc100
[ 1091.029349]  9a23424e3d50 9a23424e4000 9a234098d064 
9a23424dc100
[ 1091.052490]  9a234098d068  9a23424e3d80 
8d6cf1a6
[ 1091.075679] Call Trace:
[ 1091.083882]  [] schedule+0x8b/0xa3
[ 1091.099532]  [] schedule_preempt_disabled+0x18/0x24
[ 1091.119518]  [] __mutex_lock_slowpath+0xce/0x16d
[ 1091.138705]  [] mutex_lock+0x17/0x27
[ 1091.154772]  [] ? mutex_lock+0x17/0x27
[ 1091.171382]  [] acct_process+0x4e/0xe0
[ 1091.187974]  [] ? rescuer_thread+0x24f/0x2d1
[ 1091.206170]  [] do_exit+0x3ba/0x97b
[ 1091.222001]  [] ? kfree+0x7a/0x99
[ 1091.237307]  [] ? worker_thread+0x2ab/0x2ba
[ 1091.255219]  [] ? rescuer_thread+0x2d1/0x2d1
[ 1091.273390]  [] kthread+0xbc/0xbc
[ 1091.288672]  [] ret_from_fork+0x1f/0x40
[ 1091.305524]  [] ? init_completion+0x24/0x24
[ 1091.323404] INFO: task kworker/u16:4:158 blocked for more than 120 seconds.
[ 1091.344956]   Tainted: G U  
4.8.10-amd64-preempt-sysrq-20161121vb3tj1 #12
[ 1091.370145] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this 
message.
[ 1091.394299] kworker/u16:4   D 9a233e607b58 0   158  2 0x
[ 1091.416260] Workqueue: btrfs-endio-write btrfs_endio_write_helper
[ 1091.435259]  9a233e607b58 00ff8d0899ae 9a21f0c76180 
9a233e6021c0
[ 1091.458328]  9a233e607b40 9a233e608000 7fff 
9a233e6021c0
[ 1091.481385]  8d6d1244 9a2317491e68 9a233e607b70 
8d6cf1a6
[ 1091.504472] Call Trace:
[ 1091.512751]  [] ? usleep_range+0x65/0x65
[ 1091.530093]  [] schedule+0x8b/0xa3
[ 1091.545833]  [] schedule_timeout+0x43/0x126
[ 1091.563782]  [] ? wake_up_process+0x15/0x17
[ 1091.581707]  [] do_wait_for_common+0x123/0x15f
[ 1091.600403]  [] ? do_wait_for_common+0x123/0x15f
[ 1091.619625]  [] ? wake_up_q+0x47/0x47
[ 1091.635983]  [] wait_for_common+0x3b/0x55
[ 1091.653380]  [] wait_for_completion+0x1d/0x1f
[ 1091.671811]  [] btrfs_async_run_delayed_refs+0xd3/0xed
[ 1091.692598]  [] __btrfs_end_transaction+0x2a7/0x2dd
[ 1091.712585]  [] btrfs_end_transaction+0x10/0x12
[ 1091.731529]  [] btrfs_finish_ordered_io+0x3f7/0x4db
[ 1091.751495]  [] finish_ordered_fn+0x15/0x17
[ 1091.769372]  [] btrfs_scrubparity_helper+0x10e/0x258
[ 1091.789590]  [] btrfs_endio_write_helper+0xe/0x10
[ 1091.809014]  [] process_one_work+0x186/0x29d
[ 1091.827123]  [] worker_thread+0x1ea/0x2ba
[ 1091.844438]  [] ? rescuer_thread+0x2d1/0x2d1
[ 1091.862521]  [] kthread+0xb4/0xbc
[ 1091.877718]  [] ret_from_fork+0x1f/0x40
[ 1091.894476]  [] ? init_completion+0x24/0x24
[ 1091.912276] INFO: task kworker/u16:5:159 blocked for more than 120 seconds.
[ 1091.933740]   Tainted: G U  
4.8.10-amd64-preempt-sysrq-20161121vb3tj1 #12
[ 1091.958847] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this 
message.
[ 1091.982906] kworker/u16:5   D 9a233e60f9c0 0   159  2 0x
[ 1092.004713] Workqueue: btrfs-extent-refs btrfs_extent_refs_helper
[ 1092.023611]  9a233e60f9c0 0246 9a2342fa41c0 
9a233e608200
[ 1092.046575]  9a233e60f9a8 9a233e61 9a213898bd88 
9a233052f800
[ 1092.069536]  0001 0001 9a233e60f9d8 
8d6cf1a6
[ 1092.092523] Call Trace:
[ 1092.100496]  [] schedule+0x8b/0xa3
[ 1092.115995]  [] btrfs_tree_lock+0xd6/0x1fb
[ 1092.133574]  [] 

4.11.1: cannot btrfs check --repair a filesystem, causes heavy memory stalls

2017-05-21 Thread Marc MERLIN
gargamel:~# btrfs check --repair /dev/mapper/dshelf1
enabling repair mode
Checking filesystem on /dev/mapper/dshelf1
UUID: 36f5079e-ca6c-4855-8639-ccb82695c18d
checking extents

This causes a bunch of these:
btrfs-transacti: page allocation stalls for 23508ms, order:0, 
mode:0x1400840(GFP_NOFS|__GFP_NOFAIL), nodemask=(null)
btrfs-transacti cpuset=/ mems_allowed=0

What's the recommended way out of this and which code is at fault? I can't tell 
if btrfs is doing memory allocations wrong, or if it's just being undermined by 
the block layer dying underneath.

And sadly, I'm also getting workqueue stalls like these:
[ 3996.047531] BUG: workqueue lockup - pool cpus=0 node=0 flags=0x0 nice=0 
stuck for 45s!
[ 3996.073512] BUG: workqueue lockup - pool cpus=3 node=0 flags=0x0 nice=0 
stuck for 52s!
[ 3996.099466] Showing busy workqueues and worker pools:
[ 3996.116824] workqueue events: flags=0x0
[ 3996.130409]   pwq 6: cpus=3 node=0 flags=0x0 nice=0 active=3/256
[ 3996.150268] in-flight: 9661:do_sync_work
[ 3996.165186] pending: wait_rcu_exp_gp, cache_reap
[ 3996.182139]   pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=8/256
[ 3996.202099] in-flight: 9725:do_emergency_remount, 9738:do_poweroff
[ 3996.223543] pending: drm_fb_helper_dirty_work, cache_reap, do_sync_work, 
vmstat_shepherd, update_writeback_rate [bcache], do_poweroff
[ 3996.263991] workqueue writeback: flags=0x4e
[ 3996.278116]   pwq 16: cpus=0-7 flags=0x4 nice=0 active=2/256
[ 3996.296586] in-flight: 149:wb_workfn wb_workfn
[ 3996.312387] workqueue btrfs-endio-write: flags=0xe
[ 3996.328090]   pwq 16: cpus=0-7 flags=0x4 nice=0 active=2/8
[ 3996.345794] in-flight: 20326:btrfs_endio_write_helper, 
2927:btrfs_endio_write_helper
[ 3996.371981] workqueue kcryptd: flags=0x2a
[ 3996.386019]   pwq 16: cpus=0-7 flags=0x4 nice=0 active=8/8
[ 3996.404325] in-flight: 9950:kcryptd_crypt [dm_crypt], 8859:kcryptd_crypt 
[dm_crypt], 31087:kcryptd_crypt [dm_crypt], 2929:kcryptd_crypt [dm_crypt], 
20328:kcryptd_crypt [dm_crypt], 5951:kcryptd_crypt [dm_crypt], 
31084:kcryptd_crypt [dm_crypt], 7553:kcryptd_crypt [dm_crypt]
[ 3996.484333] delayed: kcryptd_crypt [dm_crypt], kcryptd_crypt [dm_crypt], 
kcryptd_crypt [dm_crypt], kcryptd_crypt [dm_crypt], kcryptd_crypt [dm_crypt], 
kcryptd_crypt [dm_crypt], kcryptd_crypt [dm_crypt], kcryptd_crypt [dm_crypt], 
kcryptd_crypt [dm_crypt], kcryptd_crypt [dm_crypt], kcryptd_crypt [dm_crypt], 
kcryptd_crypt [dm_crypt], kcryptd_crypt [dm_crypt], kcryptd_crypt [dm_crypt], 
kcryptd_crypt [dm_crypt], kcryptd_crypt [dm_crypt], kcryptd_crypt [dm_crypt], 
kcryptd_crypt [dm_crypt], kcryptd_crypt [dm_crypt], kcryptd_crypt [dm_crypt], 
kcryptd_crypt [dm_crypt], kcryptd_crypt [dm_crypt], kcryptd_crypt [dm_crypt], 
kcryptd_crypt [dm_crypt], kcryptd_crypt [dm_crypt], kcryptd_crypt [dm_crypt], 
kcryptd_crypt [dm_crypt], kcryptd_crypt [dm_crypt], kcryptd_crypt [dm_crypt], 
kcryptd_crypt [dm_crypt], kcryptd_crypt [dm_crypt]
[ 3996.719697] , kcryptd_crypt [dm_crypt], kcryptd_crypt [dm_crypt], 
kcryptd_crypt [dm_crypt], kcryptd_crypt [dm_crypt], kcryptd_crypt [dm_crypt], 
kcryptd_crypt [dm_crypt], kcryptd_crypt [dm_crypt], kcryptd_crypt [dm_crypt], 
kcryptd_crypt [dm_crypt], kcryptd_crypt [dm_crypt], kcryptd_crypt [dm_crypt], 
kcryptd_crypt [dm_crypt], kcryptd_crypt [dm_crypt], kcryptd_crypt [dm_crypt], 
kcryptd_crypt [dm_crypt], kcryptd_crypt [dm_crypt], kcryptd_crypt [dm_crypt], 
kcryptd_crypt [dm_crypt], kcryptd_crypt [dm_crypt], kcryptd_crypt [dm_crypt], 
kcryptd_crypt [dm_crypt], kcryptd_crypt [dm_crypt], kcryptd_crypt [dm_crypt], 
kcryptd_crypt [dm_crypt], kcryptd_crypt [dm_crypt], kcryptd_crypt [dm_crypt], 
kcryptd_crypt [dm_crypt], kcryptd_crypt [dm_crypt], kcryptd_crypt [dm_crypt], 
kcryptd_crypt [dm_crypt], kcryptd_crypt [dm_crypt]

Problems started here:
[ 3624.349624] php5: page allocation stalls for 15028ms, order:0, 
mode:0x1400840(GFP_NOFS|__GFP_NOFAIL), nodemask=(null)
[ 3624.382270] php5 cpuset=/ mems_allowed=0
[ 3624.395474] CPU: 1 PID: 9949 Comm: php5 Tainted: G U  
4.11.1-amd64-preempt-sysrq-20170406 #4
[ 3624.424907] Hardware name: System manufacturer System Product Name/P8H67-M 
PRO, BIOS 3904 04/27/2013
[ 3624.453233] Call Trace:
[ 3624.461292]  dump_stack+0x61/0x7d
[ 3624.472098]  warn_alloc+0xfc/0x18c
[ 3624.483150]  __alloc_pages_slowpath+0x3bc/0xb31
[ 3624.497528]  ? finish_wait+0x5a/0x63
[ 3624.509018]  __alloc_pages_nodemask+0x12c/0x1e0
[ 3624.523343]  alloc_pages_current+0x9b/0xbd
[ 3624.536346]  __page_cache_alloc+0x8e/0xa4
[ 3624.549067]  pagecache_get_page+0xc9/0x16b
[ 3624.562067]  alloc_extent_buffer+0xdf/0x305
[ 3624.575342]  read_tree_block+0x19/0x4e
[ 3624.587295]  read_block_for_search.isra.21+0x211/0x264
[ 3624.603420]  btrfs_search_slot+0x52b/0x72e
[ 3624.616387]  btrfs_lookup_csum+0x52/0xf7
[ 3624.628835]  __btrfs_lookup_bio_sums+0x23b/0x448
[ 3624.643396]  btrfs_lookup_bio_sums+0x16/0x18
[ 3624.656886]  btrfs_submit_bio_hook+0xcb/0x14a
[ 3624.670639]