On 2018-02-13 19:25, John Ettedgui wrote:
> On Tue, Feb 13, 2018 at 3:04 AM, Qu Wenruo <quwenruo.bt...@gmx.com> wrote:
>>
>>
>> On 2018-02-13 18:21, John Ettedgui wrote:
>>> Hello there,
>>>
>>> have you found anything good since then?
>>
>> Unfortunately, not really much to speed it up.

> Oh :/

>>
>> This reminds me of the old (and crazy) idea of skipping the block group
>> build for RO mounts.
>> But that's not really helpful here.
>>
>>> With a default system, the behavior is pretty much still the same,
>>> though I have not recreated the partitions since.
>>>
>>> Defrag helps, but I think balance helps even more.
>>> clear_cache may help too, but I'm not really sure as I've not tried it
>>> on its own.
>>> I was actually able to get a 4TB partition on a 5400rpm HDD to mount
>>> in around 500ms, quite a bit faster than even some GB partitions I have
>>> on SSDs! Alas, I wrote some files to it and it's taking over a second
>>> again, so no more magic there.
>>
>> The problem is not about how much space the filesystem takes, but how
>> many extents there are in it.
>>
>> For a new fs filled with normal data, I'm pretty sure the data extents
>> will be as large as their maximum size (256M), causing very little or
>> even no pressure on the block group search.
>>
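
As a rough illustration of this point: the number of block group items (which
mount has to read) and of data extents can be counted straight from the extent
tree. This is only a sketch; it assumes a btrfs-progs version that has the
"inspect-internal dump-tree" subcommand, and /dev/sdb1 plus the file path below
are placeholders.

  # Count block group items and data extent items in the extent tree.
  # Best run while the filesystem is not mounted, otherwise the output
  # may be inconsistent.
  btrfs inspect-internal dump-tree -t extent /dev/sdb1 | grep -c BLOCK_GROUP_ITEM
  btrfs inspect-internal dump-tree -t extent /dev/sdb1 | grep -c EXTENT_ITEM

  # Quick per-file fragmentation check (FIEMAP-based); the path is an example:
  filefrag /mnt/data/some-large-file

The first number is the one that tracks mount time, since every block group
item has to be located in the extent tree during mount; the second gives a
feel for how fragmented the data extents are overall.
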
> What do you mean by "new fs",

I mean the 4TB partition on that 5400rpm HDD.

> was there any change that would improve
> the behavior if I were to recreate the FS?

If you back up your fs, recreate a new, empty btrfs on your original SSD,
and then copy all the data back, I believe it would mount much faster.

> Last time we talked I believe max extent was 128M for non-compressed
> files, so maybe there's been some good change.

My fault, 128M is correct.

>>>
>>> The workarounds do work, so it's still not a major issue, but they're
>>> slow and sometimes I have to work around the "no space left on device",
>>> which then takes even more time.
>>
>> And since I went to SUSE, some mail/info was lost during the procedure.

> I still have all the mails, if you want them. No dump left though.

>>
>> Despite that, I have several more assumptions about this problem:
>>
>> 1) Metadata usage bumped by inline files

> What are inline files? Should I view this as inline in C, in that the
> small files are stored in the tree directly?

Exactly.

>> If there are a lot of small files (<2K as default),

> Of the slow-to-mount partitions:
> 2 partitions have less than a dozen files smaller than 2K.
> 1 has about 5 thousand and the last one 15 thousand.
> Are the latter considered a lot?

If using the default 16K nodesize, 8 small files take one leaf, so 15K
small files means about 2K tree extents.
Not that much in my opinion; that can't even fill half of a metadata chunk.

>> and your metadata
>> usage is quite high (generally speaking, the meta:data ratio should be
>> way below 1:8), that may be the cause.
>>

> The ratio is about 1:900 on average so that should be OK I guess.

Yep, that should be fine. So metadata is not to blame.
Then it's purely fragmented data extents.

>> If so, try mounting the fs with the "max_inline=0" mount option and then
>> try to rewrite such small files.
>>

> Should I try that?

No need, it won't make much difference.

>> 2) SSD write amplification along with dynamic remapping
>> To be honest, I'm not really buying this idea, since mount doesn't
>> do anything write-related.
>> But running fstrim won't harm anyway.
>>

> Oh, I am not complaining about slow SSD mounting. I was just amazed
> that a partition on a slow HDD mounted faster.
> Without any specific work, my SSD partitions tend to mount in around 1 sec
> or so.
> Of course I'd be happy to worry about them once all the partitions on
> HDDs mount in a handful of ms :)
>

>> 3) Rewrite the existing files (extreme defrag)
>> In fact, defrag doesn't work well if there are subvolumes/snapshots

> I have no subvolume or snapshot so that's not a problem.

>> /reflink involved.
>> The most stupid and mindless way is to write a small script that finds
>> all regular files, reads them out and rewrites them back.
>>

> That's fairly straightforward to do, though it should be quite slow so
> I'd hope not to have to do that too often.

Then it could be tried on just the most frequently updated files.
And since you don't use snapshots, locating such files and setting
"chattr +C" on them would make them nodatacow, reducing future fragmentation.

>> This should act much better than traditional defrag, although it's
>> time-consuming and makes snapshots completely meaningless.
>> (and since you're already hitting ENOSPC, I don't think the idea is
>> really working for you)
>>
>> And since you're already hitting ENOSPC, either it's caused by
>> unbalanced meta/data usage, or it's really going to hit the limit. I would
>> recommend enlarging the fs or deleting some files to see if it helps.
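
For reference, the "read them out and rewrite them back" script mentioned
above can be sketched roughly as follows. The mount point and the temporary
file suffix are placeholders, nothing should be writing to the files while it
runs, and it is worth trying on expendable data first.

  # Rewrite every regular file so its data is written out into fresh extents.
  # /mnt/data and the .rewrite.tmp suffix are placeholders.
  find /mnt/data -xdev -type f -print0 |
  while IFS= read -r -d '' f; do
      tmp="$f.rewrite.tmp"
      # --reflink=never forces a real copy rather than sharing the old extents
      cp --reflink=never --preserve=all -- "$f" "$tmp" && mv -- "$tmp" "$f"
  done

For the "chattr +C" part: the attribute only takes effect on new, empty files,
so it either has to be set on a fresh empty copy before the data is copied in,
or on the containing directory so that newly created files inherit it.
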
>>

> Yup, I usually either slowly ramp up the {d,m}usage to pass it, or
> when that does not work I free some space, then balance will finish.
> Or did you mean to free some space to see about mount speed?

Kind of; just do such freeing in advance, and try to make sure btrfs always
has some unallocated space in reserve.

And finally, use the latest kernel if possible. IIRC old kernels don't have
the empty block group auto-removal, which means the user needs to run balance
manually to free some space.

Thanks,
Qu

>> Thanks,
>> Qu
>>
>
> Thank you for the quick reply!
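
For reference, the incremental balance ramp mentioned above usually looks
something like the following; the usage thresholds and the mount point are
only illustrative.

  # Reclaim the emptiest block groups first, then raise the usage filter
  # until enough unallocated space is returned.
  btrfs balance start -dusage=10 /mnt/data
  btrfs balance start -dusage=30 -musage=30 /mnt/data

  # Check how much unallocated space is left afterwards.
  btrfs filesystem usage /mnt/data
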