Reviewed by: Matthew Ahrens <mahr...@delphix.com>
Reviewed by: Pavel Zakharov <pavel.zakha...@delphix.com>
Reviewed by: Brad Lewis <brad.le...@delphix.com>
Reviewed by: George Wilson <george.wil...@delphix.com>
Reviewed by: Paul Dagnelie <p...@delphix.com>
Reviewed by: Prashanth Sreenivasa <p...@delphix.com>
Overview

In analyzing the time it takes for a Delphix Engine to come up following a planned or unplanned reboot, we've determined that the SMF service (filesystem/local) responsible for mounting all local filesystems (except for /) accounts for a significant percentage of the boot time. The longer it takes for the Delphix Engine to come up, the longer the Delphix Engine is unavailable during these outages.

For example, on a Delphix Engine with roughly 3000 filesystems, we have the following breakdown of "filesystem/local" start times for a sample of 74 reboots:

    # NumSamples = 74; Min = 0.00; Max = 782.00
    # Mean = 186.972973; Variance = 17853.891161; SD = 133.618454; Median 156.000000
    # each * represents a count of 1
        0.0000 -   78.2000 [ 10]: **********
       78.2000 -  156.4000 [ 27]: ***************************
      156.4000 -  234.6000 [ 17]: *****************
      234.6000 -  312.8000 [  8]: ********
      312.8000 -  391.0000 [  8]: ********
      391.0000 -  469.2000 [  1]: *
      469.2000 -  547.4000 [  1]: *
      547.4000 -  625.6000 [  1]: *
      625.6000 -  703.8000 [  0]:
      703.8000 -  782.0000 [  1]: *

On average, it takes over 3 minutes to mount local filesystems on that system. A sampling of 56 reboots on another system, which has 9000+ filesystems, is below:

    # NumSamples = 56; Min = 0.00; Max = 1377.00
    # Mean = 175.250000; Variance = 54092.223214; SD = 232.577349; Median 118.000000
    # each * represents a count of 1
        0.0000 -  137.7000 [ 37]: *************************************
      137.7000 -  275.4000 [ 11]: ***********
      275.4000 -  413.1000 [  4]: ****
      413.1000 -  550.8000 [  1]: *
      550.8000 -  688.5000 [  1]: *
      688.5000 -  826.2000 [  0]:
      826.2000 -  963.9000 [  0]:
      963.9000 - 1101.6000 [  1]: *
     1101.6000 - 1239.3000 [  0]:
     1239.3000 - 1377.0000 [  1]: *

Mounting of filesystems in "filesystem/local" is done using `zfs mount -a`, which mounts each filesystem serially. The bottleneck for each mount is the I/O done to load metadata for each filesystem.
As such, mounting filesystems using a parallel algorithm should be a big win, and bring down the runtime of "filesystem/local"'s start method.

Performance Testing: System Configuration

To verify that these changes affected performance as expected, we used a VM with:

- 8 vCPUs
- a zpool with 10 10k-SAS disks
- a filesystem hierarchy like so:

    1 pool
    2 groups
    100 containers per group
    2 timeflows per container
    5 leaf datasets per timeflow

    test-pool-+-group-0-+-container-0-+---timeflow-0---+-ds-0
              |         |             |                +-ds-1
              |         |             |                +-ds-2
              |         |             |                +-ds-3
              |         |             |                +-ds-4
              |         |             |
              |         |             +---timeflow-1---+-ds-0
              |         |                              +-ds-1
              |         |                              +-ds-2
              |         |                              +-ds-3
              |         |                              +-ds-4
              |         |
              |         +-container-1-+---timeflow-0---+-ds-0
              |         |             |                +-ds-1
              |         |             |                +-ds-2
              |         |             |                +-ds-3
              |         |             |                +-ds-4
              |         |             |
              |         |             +---timeflow-1---+-ds-0
              |         |                              +-ds-1
              |         |                              +-ds-2
              |         |                              +-ds-3
              |         |                              +-ds-4
              |         |
              |         + ...
              |
              +-group-1 ...

This makes for a total of 2603 filesystems:

    pool + groups + containers + timeflows + leaves
       1 +      2 +      2*100 + 2*(2*100) + 5*(2*(2*100)) = 2603 filesystems

Additionally, a 1MB file was created in each leaf dataset. Because this filesystem hierarchy is not very deep, it lends itself well to the new parallel mounting algorithm implemented.

Performance Testing: Methodology and Results

The system described above was rebooted 10 times, and the duration of the start method of "filesystem/local" was measured. Specifically, the "zfs mount -va" command that it calls was instrumented to break down the mounting process into three phases:

1. gathering the list of filesystems to mount (aka "load")
2. mounting all filesystems (aka "mount")
3. left-over time spent doing anything else (aka "other")

The results of these measurements are below:

           | other (s) | load (s) | mount (s) |
    -------+-----------+----------+-----------+
    Before |       1.5 |      8.1 |      45.5 |
    -------+-----------+----------+-----------+
    After  |       1.7 |      7.9 |       2.1 |
    -------+-----------+----------+-----------+

In summary, for this configuration, the filesystem/local SMF service goes from taking an average of 55.1 seconds (+/- 1.0s) to an average of 11.7 seconds (+/- 0.8s). The "other" and "load" times remain unchanged, which is unsurprising given that this project hasn't touched any code in those areas. The big win comes in the "mount" phase, which drops from roughly 45 seconds to 2 seconds: a 95% decrease in latency.

Using the same zpool as above, "zpool import" performance was also tested; the mounting done by "zpool import" now uses the same framework as "zfs mount -a". The performance improvement for this case is, unsurprisingly, on par with the "zfs mount -a" improvement documented above.
Upstream bugs: DLPX-46555, DLPX-49847, DLPX-49351, 38457

You can view, comment on, or merge this pull request online at:
https://github.com/openzfs/openzfs/pull/536

-- Commit Summary --

* 8115 parallel zfs mount (v2)

-- File Changes --

M usr/src/cmd/zfs/Makefile (3)
M usr/src/cmd/zfs/zfs_main.c (122)
M usr/src/lib/Makefile (2)
A usr/src/lib/libfakekernel/common/synch.h (25)
M usr/src/lib/libzfs/Makefile.com (7)
M usr/src/lib/libzfs/common/libzfs.h (5)
M usr/src/lib/libzfs/common/libzfs_dataset.c (30)
M usr/src/lib/libzfs/common/libzfs_impl.h (9)
M usr/src/lib/libzfs/common/libzfs_mount.c (408)
M usr/src/lib/libzfs/common/mapfile-vers (4)
D usr/src/lib/libzfs/common/sys/zfs_context.h (37)
M usr/src/pkg/manifests/system-test-zfstest.mf (5)
M usr/src/test/zfs-tests/runfiles/delphix.run (2)
M usr/src/test/zfs-tests/tests/functional/cli_root/zfs_mount/zfs_mount.kshlib (8)
A usr/src/test/zfs-tests/tests/functional/cli_root/zfs_mount/zfs_mount_all_fail.ksh (96)
A usr/src/test/zfs-tests/tests/functional/cli_root/zfs_mount/zfs_mount_all_mountpoints.ksh (162)
M usr/src/uts/common/fs/hsfs/hsfs_vfsops.c (3)
M usr/src/uts/common/fs/pcfs/pc_vfsops.c (5)
M usr/src/uts/common/fs/udfs/udf_vfsops.c (19)
M usr/src/uts/common/fs/ufs/ufs_vfsops.c (8)
M usr/src/uts/common/fs/vfs.c (8)
M usr/src/uts/common/fs/zfs/sys/dsl_pool.h (2)
M usr/src/uts/common/sys/vfs.h (3)

-- Patch Links --

https://github.com/openzfs/openzfs/pull/536.patch
https://github.com/openzfs/openzfs/pull/536.diff