On 2018-02-20 09:59, Ellis H. Wilson III wrote:
On 02/16/2018 07:59 PM, Qu Wenruo wrote:
On 2018-02-16 22:12, Ellis H. Wilson III wrote:
$ sudo btrfs-debug-tree -t chunk /dev/sdb | grep CHUNK_ITEM | wc -l
3454

OK, this explains everything.

There are too many chunks.
This means that at mount time you need to search for a block group item 3454 times.

Even though each search only needs to iterate over 3 tree blocks, multiplied
by 3454 it is still a lot of work.
Although some tree blocks like the root node and level 1 nodes can be
cached, we still need to read about 3500 tree blocks.

If the fs was created with a 16K nodesize, this means you need to do about
54M of random reads in 16K blocks.

No wonder it takes some time.
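The arithmetic above can be sanity-checked quickly. This is only a back-of-envelope sketch, assuming roughly one unique 16K tree-block read per block group once the upper-level nodes are cached:

```shell
# Rough estimate of random read volume at mount time.
# Assumptions: 3454 block group items, 16K nodesize, root/level-1
# nodes cached, so ~one unique 16K leaf read per block group.
chunks=3454
nodesize_kib=16
echo "$(( chunks * nodesize_kib / 1024 )) MiB of random 16K reads"
```

With integer division this lands at 53 MiB, in line with the ~54M figure above.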

Normally I would expect each data and metadata chunk to be around 1G.

If there is nothing special, it means your filesystem is already larger
than 3T.
If your used space is much smaller than 3.5T (less than 30%), then your
chunk usage is pretty low, and in that case a balance to reduce the number
of chunks (block groups) would reduce the mount time.
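A filtered balance along these lines would compact mostly-empty chunks into fewer, fuller ones. The mount point `/mnt/data` and the 50% usage cutoff are placeholders, not values taken from this thread; pick a cutoff appropriate to how empty your chunks actually are:

```shell
# Rewrite only data/metadata chunks that are less than 50% full,
# packing their contents into fewer chunks. /mnt/data is a placeholder.
sudo btrfs balance start -dusage=50 -musage=50 /mnt/data

# Re-count the chunk items afterwards to see the effect:
sudo btrfs-debug-tree -t chunk /dev/sdb | grep -c CHUNK_ITEM
```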

The nodesize is 16K, and the filesystem data is 3.32TiB as reported by btrfs fi df.  So, from what I am hearing, this mount time is normal for a filesystem this size.  Ignoring a more complex and proper fix like the ones we've been discussing, would bumping the nodesize reduce the number of chunks, thereby reducing the mount time?
Probably not. Chunk size is based only on the total size of the filesystem, within reasonable bounds, so you would still need at least as many chunks to store the same amount of data. (Increase the node size too much, though, and you will end up with more chunks, because more empty space will be wasted.)

I don't see why balance would come into play here -- my understanding was that was for aged filesystems.  The only operations I've done on here was:
1. Format filesystem clean
2. Create a subvolume
3. rsync our home directories into that new subvolume
4. Create another subvolume
5. rsync our home directories into that new subvolume

Accordingly, zero (or at least, extremely little) data should have been overwritten, so I would expect things to be fairly well allocated already.  Please correct me if this is naive thinking.
Your logic is generally correct for data, but not necessarily for metadata. Assuming you did not use rsync's `--inplace` option, it wrote each file under a temporary name and then renamed it into place, so a lot of metadata was likely being rewritten.

As far as balance being for aged filesystems, that's not exactly true. There are four big reasons you might run a balance:

1. As part of reshaping a volume. You generally want to run a balance whenever the number of disks in a volume permanently increases (it happens automatically when it permanently decreases, as the device deletion operation is a special type of balance under the hood). It's also used for converting chunk profiles.
2. To free up empty space inside chunks when the filesystem is full at the chunk level.
3. To redistribute data across multiple disks more evenly after deleting a lot of data.
4. To reduce the likelihood of 2 or 3 being an issue.

Reasons 2 and 3 are generally more likely to be needed on old volumes. Reason 1 is independent of the age of a volume. Reason 4 is the reason for the regular filtered balances that I and some other people recommend be run as part of preventative maintenance, and is also generally independent of the age of a volume.

Qu's suggestion is actually independent of all the above reasons, but does kind of fit in with the fourth as another case of preventative maintenance.

I was using btrfs sub del -C for the deletions, so I believe (if that
command truly waits for the subvolume to be utterly gone) it captures
the entirety of the snapshot.

No, snapshot deletion is entirely deferred to the background.

-C only ensures that even if a power loss happens right after the command
returns, you won't see the snapshot anywhere; the actual deletion still
happens in the background.
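If the goal is simply to know when that background deletion has finished (rather than to schedule or throttle it), `btrfs subvolume sync` blocks until queued subvolume deletions are cleaned up. The mount point and snapshot name below are hypothetical:

```shell
# Queue the deletion; -C only guarantees the removal survives a crash.
# The actual cleanup still runs in the background (btrfs-cleaner).
sudo btrfs subvolume delete -C /mnt/data/snap-old

# Block until all queued subvolume deletions on this filesystem
# have actually been cleaned up.
sudo btrfs subvolume sync /mnt/data
```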

Ah, I had no idea.  Thank you!  Is there any way to "encourage" btrfs-cleaner to run at specific times, which I presume is the snapshot deletion process you are referring to?  If it can be told to run at a given time, can I throttle how fast it works, such that I avoid some of the high foreground interruption I've seen in the past?
I don't think there's any way to do this right now (though it would be nice if there was). In theory, you could adjust the priority of the kernel thread itself, but messing around with kthread priorities is seriously dangerous even if you know exactly what you're doing.