Hi,
Georgi Georgiev wrote on 2015/07/29 14:46 +0900:
Using BTRFS on a very large filesystem, and as we put more and more data
on it, the time it takes to mount has grown to, at present, about 30 minutes.
Is there something wrong with the filesystem? Is there a way to bring
this time down?
...
Here is a snippet from dmesg, showing how long it takes to mount (the
EXT4-fs line is the filesystem mounted next in the boot sequence):
$ dmesg | grep -A1 btrfs
[ 12.215764] TECH PREVIEW: btrfs may not be fully supported.
[ 12.215766] Please review provided documentation for limitations.
--
[ 12.220266] btrfs: use zlib compression
[ 12.220815] btrfs: disk space caching is enabled
[ 22.427258] btrfs: bdev /dev/mapper/datavg-backuplv errs: wr 0, rd 0, flush 0, corrupt 0, gen 0
[ 2022.397318] EXT4-fs (dm-2): mounted filesystem with ordered data mode. Opts:
Quite common, especially when the filesystem grows large.
But it would be much better to use ftrace to show which btrfs operation
takes the most time.
We have some guesses about this, from reading the space cache to reading
chunk info, but we don't know which takes the most time.
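A tracefs configuration sketch for the ftrace idea above (must be run as
root; the mount point /mnt/backup is a placeholder, and the paths assume
debugfs is mounted at /sys/kernel/debug):

```shell
#!/bin/sh
# Use the function_graph tracer to see which btrfs functions dominate
# mount time. Per-call durations (in us) appear in the graph output.
cd /sys/kernel/debug/tracing
echo function_graph > current_tracer
echo 'btrfs*' > set_ftrace_filter        # limit tracing to btrfs functions
echo 1 > tracing_on
mount /dev/mapper/datavg-backuplv /mnt/backup
echo 0 > tracing_on
head -n 40 trace                         # inspect the slowest call chains
echo nop > current_tracer                # restore the default tracer
```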
The btrfs filesystem is quite large:
$ sudo btrfs filesystem usage /dev/mapper/datavg-backuplv
Overall:
Device size: 82.58TiB
Device allocated: 82.58TiB
Device unallocated: 0.00B
Device missing: 0.00B
Used: 62.01TiB
Free (estimated): 17.76TiB (min: 17.76TiB)
Data ratio: 1.00
Metadata ratio: 2.00
Global reserve: 0.00B (used: 0.00B)
Data,single: Size:79.28TiB, Used:61.52TiB
/dev/mapper/datavg-backuplv 79.28TiB
Metadata,single: Size:8.00MiB, Used:0.00B
/dev/mapper/datavg-backuplv 8.00MiB
Metadata,DUP: Size:1.65TiB, Used:252.68GiB
/dev/mapper/datavg-backuplv 3.30TiB
System,single: Size:4.00MiB, Used:0.00B
/dev/mapper/datavg-backuplv 4.00MiB
System,DUP: Size:40.00MiB, Used:8.66MiB
/dev/mapper/datavg-backuplv 80.00MiB
Unallocated:
/dev/mapper/datavg-backuplv 0.00B
Wow, nearly 100T, that's really huge.
Other info about the filesystem: it has a rather large number of files,
subvolumes, and read-only snapshots, which started from about zero in
March and grew to the current state of 3000 snapshots and an unknown
number of files (filesystem usage is quite stable at the moment).
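For reference, the snapshot count can be re-checked with the subvolume
list command (the mount point /mnt/backup is a placeholder):

```shell
# -s restricts the listing to snapshots; wc -l counts them.
sudo btrfs subvolume list -s /mnt/backup | wc -l
```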
I also noticed that while the machine is rebooted on a weekly basis, the
time it takes to come up after a reboot has been growing. This is likely
correlated with how long it takes to mount the filesystem, and maybe
with how much data there is on it.
Reboot time used to be normally about 3 minutes, then it jumped to 8
minutes on March 21 and the following weeks it went like this:
8 minutes, 11 minutes, 15 minutes...
19, 19, 19, 19, 23, 21, 22
32, 33, 36, 42, 46, 37, 30
This is on CentOS 6.6, and while I understand that this version of btrfs
is definitely oldish, even mounting the filesystem on a much more recent
kernel (3.14.43) shows no improvement. Switching the regular OS kernel
from the CentOS one (2.6.32-504.12.2.el6.x86_64) to something more
recent is also feasible.
I wanted to check the system for problems, so I tried an offline "btrfs
check" using the latest btrfs-progs (version 4.1.2, freshly compiled from
source), but "btrfs check" ran out of memory after about 30 minutes.
The only output I get is this (timestamps added by me):
2015-07-28 18:14:45 $ sudo btrfs check /dev/datavg/backuplv
2015-07-28 18:33:05 checking extents
And at 19:04:55 btrfs was killed by OOM: (abbreviated log below,
full excerpt as an attachment).
Not surprised at all.
For extent/chunk tree checking, it reads all the chunk and extent items,
stores the needed info in memory, and then does a cross-reference check.
The btrfsck process really takes a lot of memory, maybe 1/10 or more of
the metadata space.
In your case, your metadata is about 250GB, so maybe 25GB of memory is
used to hold the needed info.
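That rule of thumb can be checked with quick arithmetic (the 1/10 ratio
is only an estimate, "or more" included, so the real usage can be much
higher, as the OOM log below shows):

```shell
# Back-of-envelope: btrfs check memory ~= metadata used / 10 (estimate).
# Metadata,DUP Used from the usage output above is 252.68 GiB.
awk 'BEGIN { meta_gib = 252.68; printf "~%.1f GiB expected\n", meta_gib / 10 }'
```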
That's already known, but we don't have a good idea (or a developer) to
reduce the memory usage yet.
Maybe we can change the behavior to do chunk-by-chunk extent
cross-checking to reduce memory usage, but not now...
Thanks,
Qu
2015-07-28T19:04:55.224855+09:00 localhost kernel: [11689.692680] htop
invoked oom-killer: gfp_mask=0x201da, order=0, oom_adj=0, oom_score_adj=0
...
2015-07-28T19:04:55.225855+09:00 localhost kernel: [11689.801354] 631 total
pagecache pages
2015-07-28T19:04:55.225857+09:00 localhost kernel: [11689.801829] 0 pages in
swap cache
2015-07-28T19:04:55.225859+09:00 localhost kernel: [11689.802305] Swap cache
stats: add 0, delete 0, find 0/0
2015-07-28T19:04:55.225861+09:00 localhost kernel: [11689.802781] Free swap
= 0kB
2015-07-28T19:04:55.225863+09:00 localhost kernel: [11689.803341] Total swap
= 0kB
2015-07-28T19:04:55.225864+09:00 localhost kernel: [11689.946223] 16777215
pages RAM
2015-07-28T19:04:55.225867+09:00 localhost kernel: [11689.946724] 295175
pages reserved
2015-07-28T19:04:55.225869+09:00 localhost kernel: [11689.947223] 5173 pages
shared
2015-07-28T19:04:55.225871+09:00 localhost kernel: [11689.947721] 16369184
pages non-shared
2015-07-28T19:04:55.225874+09:00 localhost kernel: [11689.948222] [ pid ]
uid tgid total_vm rss cpu oom_adj oom_score_adj name
...
2015-07-28T19:04:55.225970+09:00 localhost kernel: [11689.994240] [16291]
0 16291 47166 177 18 0 0 sudo
2015-07-28T19:04:55.225972+09:00 localhost kernel: [11689.995232] [16292]
1000 16292 981 20 3 0 0 tai64n
2015-07-28T19:04:55.225974+09:00 localhost kernel: [11689.996241] [16293]
0 16293 47166 177 22 0 0 sudo
2015-07-28T19:04:55.225978+09:00 localhost kernel: [11689.997230] [16294]
1000 16294 1018 21 1 0 0 tai64nlocal
2015-07-28T19:04:55.225993+09:00 localhost kernel: [11689.998227] [16295]
0 16295 16122385 16118611 7 0 0 btrfs
2015-07-28T19:04:55.225995+09:00 localhost kernel: [11689.999210] [16296]
0 16296 25228 25 5 0 0 tee
2015-07-28T19:04:55.225997+09:00 localhost kernel: [11690.000201] [16297]
1000 16297 27133 162 1 0 0 bash
...
2015-07-28T19:04:55.226030+09:00 localhost kernel: [11690.008288] Out of
memory: Kill process 16295 (btrfs) score 949 or sacrifice child
2015-07-28T19:04:55.226031+09:00 localhost kernel: [11690.009300] Killed
process 16295, UID 0, (btrfs) total-vm:64489540kB, anon-rss:64474408kB,
file-rss:36kB
Thanks in advance for any advice,
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html