On 2018/12/4 9:07 PM, Nikolay Borisov wrote:
>
>
> On 3.12.18 at 20:20, Wilson, Ellis wrote:
>> Hi all,
>>
>> Many months ago I promised to graph how long it took to mount a BTRFS
>> filesystem as it grows. I finally had (made) time for this, and the
>> attached is the result of my testing. The image is a fairly
>> self-explanatory graph, and the raw data is also attached in
>> comma-delimited format for the more curious. The columns are:
>> Filesystem Size (GB), Mount Time 1 (s), Mount Time 2 (s), Mount Time 3 (s).
>>
>> Experimental setup:
>> - System:
>> Linux pgh-sa-1-2 4.20.0-rc4-1.g1ac69b7-default #1 SMP PREEMPT Mon Nov 26
>> 06:22:42 UTC 2018 (1ac69b7) x86_64 x86_64 x86_64 GNU/Linux
>> - 6-drive RAID0 (mdraid, 8MB chunks) array of 12TB enterprise drives.
>> - 3 unmount/mount cycles performed in between adding another 250GB of data
>> - 250GB of data added each time in the form of 25x10GB files in their
>> own directory. Files generated in parallel each epoch (25 at the same
>> time, with a 1MB record size).
>> - 240 repetitions of this performed (to collect timings in increments of
>> 250GB between a 0GB and 60TB filesystem)
>> - Normal "time" command used to measure time to mount. The "real" value
>> from the output of time is the one reported.
>> - Mount:
>> /dev/md0 on /btrfs type btrfs
>> (rw,relatime,space_cache=v2,subvolid=5,subvol=/)
>>
>> At 60TB, we take 30s to mount the filesystem, which is actually not as
>> bad as I originally thought it would be (perhaps as a result of using
>> RAID0 via mdraid rather than native RAID0 in BTRFS). However, I am open
>> to comment if folks more intimately familiar with BTRFS think this is
>> due to the very large files I've used. I can redo the test with much
>> more realistic data if people have legitimate reason to think it will
>> drastically change the result.
>>
>> With 14TB drives available today, it doesn't take more than a handful of
>> drives to result in a filesystem that takes around a minute to mount.
>> As a result of this, I suspect this will become an increasing problem
>> for serious users of BTRFS as time goes on. I'm not complaining as I'm
>> not a contributor so I have no room to do so -- just shedding some light
>> on a problem that may deserve attention as filesystem sizes continue to
>> grow.
>
> Would it be possible to provide perf traces of the longer-running mount
> time? Everyone seems to be fixated on reading block groups (which is
> likely to be the culprit), but before pointing fingers I'd like concrete
> evidence pointing at the offender.
IIRC I submitted such an analysis years ago.
The picture may have changed since then due to chunk <-> bg <-> dev_extents cross-checking.
So yes, it would be a good idea to show such percentages.
Thanks,
Qu
>
>>
>> Best,
>>
>> ellis
>>
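
For anyone who wants to reduce the attached comma-delimited data to a single
growth figure, here is a minimal Python sketch. It assumes only the column
layout described in the original post (Filesystem Size (GB), Mount Time 1 (s),
Mount Time 2 (s), Mount Time 3 (s)). The function names and the sample rows are
hypothetical placeholders, not the measured data, and the straight-line fit is
just one way to summarise the trend.

```python
import csv
import io
from statistics import mean


def summarize(csv_text):
    """Return a list of (size_gb, mean_mount_seconds), one entry per data row.

    Assumed columns: size (GB), then three mount timings in seconds.
    Header lines, blank lines, and short rows are skipped.
    """
    rows = []
    for record in csv.reader(io.StringIO(csv_text)):
        if len(record) < 4:
            continue
        try:
            size_gb = float(record[0])
            timings = [float(value) for value in record[1:4]]
        except ValueError:
            continue  # non-numeric row, e.g. a header
        rows.append((size_gb, mean(timings)))
    return rows


def slope_seconds_per_tb(rows):
    """Least-squares slope of mean mount time versus size, in s/TB (1 TB = 1000 GB)."""
    xs = [size_gb / 1000.0 for size_gb, _ in rows]
    ys = [seconds for _, seconds in rows]
    mean_x, mean_y = mean(xs), mean(ys)
    denominator = sum((x - mean_x) ** 2 for x in xs)
    if denominator == 0:
        raise ValueError("need at least two distinct filesystem sizes")
    numerator = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return numerator / denominator


if __name__ == "__main__":
    # Placeholder rows for illustration only; substitute the attached raw data.
    sample = "size_gb,t1,t2,t3\n0,0.1,0.1,0.1\n1000,0.6,0.6,0.6\n2000,1.1,1.1,1.1\n"
    print(slope_seconds_per_tb(summarize(sample)))
```

Averaging the three timings per size before fitting keeps one point per epoch.
If the per-mount spread matters, the three columns could be fitted separately
instead.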