On Tue, 14 May 2013 18:55:23 +0800, Liu Bo wrote: > On Tue, May 14, 2013 at 11:36:52AM +0200, Stefan Behrens wrote: >> Mapping UUIDs to subvolume IDs is an operation with a high effort >> today. Today, the algorithm even has quadratic effort (based on the >> number of existing subvolumes), which means, that it takes minutes >> to send/receive a single subvolume if 10,000 subvolumes exist. But >> even linear effort would be too much since it is a waste. And these >> data structures to allow mapping UUIDs to subvolume IDs are created >> every time a btrfs send/receive instance is started. >> >> So the issue to address is that Btrfs send / receive does not work >> as it is today when a high number of subvolumes exist. >> >> It is much more efficient to maintain a searchable persistent data >> structure in the filesystem, one that is updated whenever a >> subvolume/snapshot is created and deleted, and when the received >> subvolume UUID is set by the btrfs-receive tool. >> >> Therefore kernel code is added that is able to maintain data >> structures in the filesystem that allow to quickly search for a >> given UUID and to retrieve the subvol ID. >> >> Now follows the lengthy justification, why a new tree was added >> instead of using the existing root tree: >> >> The first approach was to not create another tree that holds UUID >> items. Instead, the items should just go into the top root tree. >> Unfortunately this confused the algorithm to assign the objectid >> of subvolumes and snapshots. The reason is that >> btrfs_find_free_objectid() calls btrfs_find_highest_objectid() for >> the first created subvol or snapshot after mounting a filesystem, >> and this function simply searches for the largest used objectid in >> the root tree keys to pick the next objectid to assign. Of course, >> the UUID keys have always been the ones with the highest offset >> value, and the next assigned subvol ID was wastefully huge. >> >> To use any other existing tree did not look proper. To apply a >> workaround such as setting the objectid to zero in the UUID item >> key and to implement collision handling would either add >> limitations (in case of a btrfs_extend_item() approach to handle >> the collisions) or a lot of complexity and source code (in case a >> key would be looked up that is free of collisions). Adding new code >> that introduces limitations is not good, and adding code that is >> complex and lengthy for no good reason is also not good. That's the >> justification why a completely new tree was introduced. > > I'd appreciate if some performance number appear here since it's a speedup.
That's a good idea. The numbers are below in the table and there's also a link to a chart. I stopped the measurement with the old version after 10000 subvolumes because it already took almost 13 minutes to send a single, empty subvolume. All the time is spent building a database, and this is done each time the btrfs send or receive tool is started to send or receive a single subvolume. The table shows the time it takes to send a single, empty subvolume depending on the number of subvolume that exist in the filesystem. # of subvols | without | with in filesystem | UUID tree | UUID tree --------------+------------+---------- 2 | 0m00.004s | 0m00.003s 1000 | 0m07.010s | 0m00.004s 2000 | 0m28.210s | 0m00.004s 3000 | 1m04.872s | 0m00.004s 4000 | 1m56.059s | 0m00.004s 5000 | 3m00.489s | 0m00.004s 6000 | 4m27.376s | 0m00.004s 7000 | 6m08.938s | 0m00.004s 8000 | 7m54.020s | 0m00.004s 9000 | 10m05.108s | 0m00.004s 10000 | 12m47.406s | 0m00.004s Or as a chart: http://btrfs.giantdisaster.de/Btrfs-send-recv-perf.pdf The Hardware: Intel(R) Xeon(R) CPU X3450 @ 2.67GHz 8 GB RAM 6 high performance SSDs The script: #!/bin/bash set -e MOUNT=/mnt2 umount $MOUNT || true mkfs.btrfs -f -m raid0 -d raid0 -n 32768 /dev/sdc /dev/sdj /dev/sds /dev/sdt /dev/sdu /dev/sdv mount /dev/sdc $MOUNT btrfs subv create $MOUNT/0 btrfs subv snapshot -r $MOUNT/0 $MOUNT/0ro > /dev/null echo '2 subvols' time btrfs send $MOUNT/0ro > /dev/null umount $MOUNT mkfs.btrfs -f -m raid0 -d raid0 -n 32768 /dev/sdc /dev/sdj /dev/sds /dev/sdt /dev/sdu /dev/sdv mount /dev/sdc $MOUNT for i in `seq 1000`; do btrfs subv create $MOUNT/$i for j in `seq 500`; do btrfs subv create $MOUNT/$i/$j > /dev/null btrfs subv snapshot -r $MOUNT/$i/$j $MOUNT/$i/${j}ro > /dev/null done echo $i 'k subvols' time btrfs send $MOUNT/$i/1ro > /dev/null done -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html