At 07/15/2016 12:39 PM, John Ettedgui wrote:
On Thu, Jul 14, 2016 at 8:56 PM Qu Wenruo <quwen...@cn.fujitsu.com> wrote:
Sorry for the late reply.
Oh it's all good, it's only been a few days.
[Slow mount]
In fact we also reproduced the same problem, and found the cause.
Awesome!
It's related to the size of extent tree.
If the extent tree is large enough, mount needs to do quite a lot of IO
to read out all the block group items.
Such reads are small random reads (the default leaf size is just 16K),
and considering the per-GB cost, spinning rust is the usual choice for
such a large fs, which makes small random reads even slower.
The good news is, we have a patch that slightly speeds up the mount by
avoiding reading unrelated tree blocks.
In our test environment, it takes 15% less time to mount a fs filled
with 16K files (2T used space).
https://patchwork.kernel.org/patch/9021421/
Great, I will try this and report on it.
And given the fact that only the extent tree size is related to the
problem, any method that reduces the extent tree size will help,
including defrag and nodatacow.
Would increasing the leaf size help as well?
It may help.
But we didn't test it, and since the leafsize can only be set at mkfs
time, it's not an easy thing to try.
nodatacow seems unsafe
Nodatacow is not that unsafe, as btrfs will still do data COW when it's
needed, e.g. when rewriting data shared with another subvolume/snapshot.
It would be one of the most obvious methods if you do a lot of rewrites.
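For example, nodatacow can be enabled filesystem-wide with a mount
option (the device and mountpoint below are just placeholders, not from
this thread):

```
# example /etc/fstab entry; /dev/sdb1 and /data are hypothetical
/dev/sdb1  /data  btrfs  nodatacow,noatime  0  0
```

Per-directory no-COW via "chattr +C" on an empty directory is another
option if you only want it for specific data.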
as for defrag, all my partitions are already on
autodefrag, so I assume that should be good. Or is a manual defrag once
in a while a good idea as well?
AFAIK autodefrag will only help if you're doing appending writes.
A manual defrag will help more, but since btrfs has problems defragging
extents shared between subvolumes, I doubt its effect if you have a lot
of subvolumes/snapshots.
Another method is to disable compression.
With compression, the file extent size upper limit is 128K, while in
the non-compressed case it's 128M.
So the same 1G file would need 8K (8192) extents with compression, but
only 8 extents without it.
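As a quick sanity check, the arithmetic (nothing fs-specific here, just
dividing 1 GiB by each extent size limit):

```shell
# 1 GiB file split into maximum-size extents:
echo $(( (1024 * 1024 * 1024) / (128 * 1024) ))          # 128K compressed extents -> 8192
echo $(( (1024 * 1024 * 1024) / (128 * 1024 * 1024) ))   # 128M uncompressed extents -> 8
```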
Is there a way to display the tree size? That would help in knowing
what worked and what didn't.
You can dump the whole extent tree to get the accurate size:
# btrfs-debug-tree -t 2 <your dev> > some_file
It may be quite long, so output redirection is highly recommended.
You can do it online (mounted), but if the fs is very large, it's
recommended to do it offline (unmounted), or at least to make sure
there isn't much write activity while it's mounted.
Check the first few lines and you can already get the overall size:
------
btrfs-progs v4.6.1
extent tree key (EXTENT_TREE ROOT_ITEM 0)
node 30441472 level *1* items 41 free 452 generation 7 owner 2
------
If the level is high (7 is the highest possible value), it's almost
certain that's the problem.
For the accurate size, use the following script to get the number of
extent tree blocks:
------
$ egrep -e "^node" -e "^leaf" some_file | wc -l
------
Then multiply it by the nodesize to get the accurate size of the
extent tree.
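Putting the two steps together (the dump contents below are made up for
illustration; in practice some_file is the btrfs-debug-tree output, and
16384 assumes the default 16K nodesize):

```shell
# stand-in for the real btrfs-debug-tree dump: one node and two leaves
printf 'node 30441472 level 1 items 41\nleaf 30457856 items 120\nleaf 30474240 items 98\n' > /tmp/extent_dump

# count tree blocks, then multiply by the nodesize (16K default)
blocks=$(grep -c -E '^(node|leaf)' /tmp/extent_dump)
echo $(( blocks * 16384 ))   # 3 blocks * 16K = 49152 bytes
```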
Thanks,
Qu
[Btrfsck OOM]
Lu Fengqi is developing btrfsck low memory usage mode.
It's not merged into mainline btrfs-progs and is not fully complete,
but it shows quite positive results for large filesystems.
It may need some time to become stable, but IMHO it's going in the
right direction.
Well that is great news as well, thank you for sharing it!
Thanks,
Qu
Thank you!
John
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html