At 07/15/2016 12:39 PM, John Ettedgui wrote:
On Thu, Jul 14, 2016 at 8:56 PM Qu Wenruo <quwen...@cn.fujitsu.com> wrote:

    Sorry for the late reply.

Oh it's all good, it's only been a few days.

    [Slow mount]
    In fact we also reproduced the same problem, and found the cause.

Awesome!

    It's related to the size of the extent tree.

    If the extent tree is large enough, mount needs to do quite a lot of
    IO to read out all the block group items.
    And such reads are small random reads (the default leaf size is just
    16K), and considering the per-GB cost, spinning rust is the normal
    choice for such a large fs, which makes small random reads even slower.


    The good news is, we have a patch to slightly speed up the mount, by
    avoiding reading out unrelated tree blocks.

    In our test environment, it takes 15% less time to mount a fs filled
    with 16K files (2T used space).

    https://patchwork.kernel.org/patch/9021421/


Great, I will try this and report on it.

    And since only the extent tree size is related to the problem, any
    method that reduces the extent tree size will help, including defrag
    and nodatacow.

Would increasing the leaf size help as well?
It may help.
But I didn't test it, and since the leaf size can only be set at mkfs time, it's not an easy thing to try.
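
If you do want to experiment, that means a fresh mkfs. A minimal sketch
(the device and the 32K value are just examples, and this of course
destroys any existing data on the device):

------
# mkfs.btrfs -n 32768 /dev/sdX
------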

nodatacow seems unsafe
Nodatacow is not that unsafe, as btrfs will still do data CoW when it's needed, e.g. when rewriting data that is shared with another subvolume/snapshot.

That would be one of the most obvious methods if you do a lot of rewrites.
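
For example (just a sketch, device and paths are placeholders), either
mount the whole fs with nodatacow, or set the NOCOW attribute on a
directory so that newly created files inherit it (chattr +C only takes
effect on new or empty files):

------
# mount -o nodatacow /dev/sdX /mnt
# chattr +C /mnt/some_dir
------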

as for defrag, all my partitions are already on
autodefrag, so I assume that should be good. Or is a manual one once in
a while a good idea as well?
AFAIK autodefrag will only help if you're doing appending writes.

A manual one will help more, but since btrfs has problems defragging extents shared by different subvolumes, I doubt the effect if you have a lot of subvolumes/snapshots.
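
If you still want to try a manual run, it's just (mount point is a
placeholder; note that defragging may unshare extents and so increase
space usage when snapshots exist):

------
# btrfs filesystem defragment -r /mnt
------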


Another method is to disable compression.
With compression, the file extent size upper limit is 128K, while for
the non-compressed case it's 128M.

So the same 1G file would take about 8K extents with compression, but
only 8 extents without it.
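
To spell out the arithmetic:

------
1G / 128K = 8192 (~8K) extents with compression
1G / 128M =    8 extents without compression
------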


Is there a way to display the tree size? That would help to know what
worked and what didn't.

You can dump the whole extent tree to get the accurate size:

# btrfs-debug-tree -t 2 <your dev> > some_file

It may be quite long, so output redirection is highly recommended.
You can do it online (mounted), but if the fs is very large, it's recommended to do it offline (unmounted), or at least make sure there isn't much write activity while it's mounted.

Check the first few lines and you can already get the overall size:

------
btrfs-progs v4.6.1
extent tree key (EXTENT_TREE ROOT_ITEM 0)
node 30441472 level *1* items 41 free 452 generation 7 owner 2
------

If the level is high (7 is the highest possible value), it's almost certain that's the problem.

For the accurate size, use the following script to get the number of extent tree blocks:

------
$ egrep -e "^node" -e "^leaf" some_file | wc -l
------

Then multiply it by the nodesize and you get the accurate size of the extent tree.
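
For example, assuming the default 16K nodesize (just a sketch; replace
16384 with your fs's actual nodesize, as reported by btrfs-show-super):

------
$ echo $(( $(egrep -c -e "^node" -e "^leaf" some_file) * 16384 ))
------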

Thanks,
Qu


    [Btrfsck OOM]
    Lu Fengqi is developing a low-memory-usage mode for btrfsck.
    It's not merged into mainline btrfs-progs and not fully complete,
    but it shows quite positive results for large filesystems.

    It may need some time to get stable, but IMHO it's going in the
    right direction.

Well that is great news as well, thank you for sharing it!

    Thanks,
    Qu


Thank you!
John


