Bug#663237: e2fsprogs: [patch] Make resize2fs shrinking use much less CPU
Package: e2fsprogs Version: 1.42.1-2 Severity: normal Tags: patch Shrinking a filesystem is currently very CPU intensive. On a 100G ext4 filesystem filled to 50% full by making copies of /usr/share/doc to $MNTPOINT/doc{1,2,3,...}, running resize2fs on a fairly slow (~5400 rpm) disk and a fast Core i7 processor: # time resize2fs -p $FILESYSTEM 75G [...] real96m26.852s user56m39.768s sys 0m27.922s Apparently shrinking larger filesystems causes even larger slowdowns (CPU time consumed grows superlinearly), but I don't have the numbers to back this up right now. Turns out the slow and CPU intensive phase is the Scanning inode table phase. The reason for this is that the inode table is scanned bit by bit in ext2fs_new_inode(). It appears that in many cases a huge string of bits needs to be tested to find the first zero bit. It doesn't help that for each bit test a modulo operation is done in ext2fs_new_inode(), after which the bit test is done by calling this chain of functions for each bit: ext2fs_new_inode() - ext2fs_fast_test_inode_bitmap2() [inlined] - ext2fs_test_generic_bmap() - ext2fs_test_generic_bitmap() - ext2fs_test_bit(). I also noted that resize2fs does not call ext2fs_open2() with EXT2_FLAG_64BITS set; is there a reason for that? Adding that flag change causes a minor speedup by eliminating ext2fs_test_generic_bitmap() from the call chain. I don't know if an algorithmic change would be the proper solution for this, but I managed to make the process a lot faster (fast enough to not be a bottleneck in this case, but perhaps it should also be tested on both bigger filesystems and SSDs) by changing the following while trying to affect the performance of other code in libext2fs as little as possible: 0. Original CPU time: 3400 seconds. 1. Move the modulo out of the loop in ext2fs_new_inode(). CPU time: 2585 seconds. 2. Call ext2fs_open2() with EXT2_FLAG_64BITS. CPU time: 2057 seconds. 3a. Implement a function, ext2fs_find_first_zero_generic_bmap(), which finds the first zero bit in a bitmap. Defaults to doing pretty much what ext2fs_new_inode() was doing if the relevant (new) function in the bitmap_ops struct is NULL. Change ext2fs_new_inode() to use it. 3b. Implement optimized ba_find_first_zero() for bitarray bitmaps. Simply scan the bitarray byte by byte to find a value != 0xff. CPU time: 45 seconds. This still appears to translate to 45 seconds real time. 4. Further optimize ba_find_first_zero() to scan in (aligned) 64-bit chunks CPU time: 17 seconds After these optimizations, the wall time taken to resize the same filesystem has decreased from 96m27s to 39m49s: # time resize2fs -p $FILESYSTEM 75G real39m49.287s user0m16.601s sys 0m26.606s dumpe2fs output from resulting filesystems before and after my modifications only differ in last write timestamp. e2fsck -f $FILESYSTEM reports no problems. A tentative (or proof of concept) patch is attached. Some concerns that I have: 1. How should internal errors, like functions returning EINVAL when that ostensibly cannot be due to invalid filesystem, be handled? Currently ext2fs_new_inode() just propagates the error further: + if (err != ENOENT) + return err; /* Internal error? */ 2. My additions in blkmap64_ba.c rely on the availability of stdint.h and intptr_t (to detect when a pointer is 8-byte aligned). Perhaps that should be done in another way if C89 compatibility is a concern? 3. 64-bit scans might be slower on a system with native register size 64 bits. I doubt they will be slower than the code used to be, though. 4. What is bitmap-cluster_bits and how should it be handled? in ba_find_first_zero(); similar code in ext2fs_find_first_zero_generic_bmap(): + /* FIXME what is this? */ + if (bitmap-cluster_bits) + return EINVAL; 5. Is returning ENOENT the preferred way of indicating not found? How is this supposed to integrate in the com_err framework? I'd be happy to update the patch to fix any remaining issues, unless you think there's another, cleaner solution to the slowness. By the way, compiling e2fsprogs with -Wall reveals things like missing headers: ../../lib/ext2fs/ext2fs.h:1712:3: warning: implicit declaration of function ‘open64’ [-Wimplicit-function-declaration] Sami -- System Information: Debian Release: wheezy/sid APT prefers unstable APT policy: (500, 'unstable') Architecture: amd64 (x86_64) Kernel: Linux 3.2.9 (SMP w/8 CPU cores) Locale: LANG=en_US.UTF-8, LC_CTYPE=en_US.UTF-8 (charmap=UTF-8) Shell: /bin/sh linked to /bin/dash Versions of packages e2fsprogs depends on: ii e2fslibs1.42.1-2 ii libblkid1 2.20.1-4 ii libc6 2.13-27 ii libcomerr2 1.42.1-2 ii libss2 1.42.1-2 ii libuuid12.20.1-4 ii util-linux 2.20.1-4 e2fsprogs recommends no packages. Versions of packages e2fsprogs suggests: ii e2fsck-static
Bug#663237: e2fsprogs: [patch] Make resize2fs shrinking use much less CPU
On Fri, Mar 09, 2012 at 08:11:05PM +0200, Sami Liedes wrote: Shrinking a filesystem is currently very CPU intensive. On a 100G ext4 filesystem filled to 50% full by making copies of /usr/share/doc to $MNTPOINT/doc{1,2,3,...}, running resize2fs on a fairly slow (~5400 rpm) disk and a fast Core i7 processor Thanks for looking into this! I also noted that resize2fs does not call ext2fs_open2() with EXT2_FLAG_64BITS set; is there a reason for that? Adding that flag change causes a minor speedup by eliminating ext2fs_test_generic_bitmap() from the call chain. Yeah, resize2fs hasn't had work done on it to support 64-bit file systems, which is why it hasn't been changed to set that flag. That flag only means that it was compiled with the new 64-bit capable header files, though, so it's safe to set that flag. We just hadn't gotten around to it. After these optimizations, the wall time taken to resize the same filesystem has decreased from 96m27s to 39m49s: Cool!! A tentative (or proof of concept) patch is attached. Some concerns that I have: 1. How should internal errors, like functions returning EINVAL when that ostensibly cannot be due to invalid filesystem, be handled? Currently ext2fs_new_inode() just propagates the error further: + if (err != ENOENT) + return err; /* Internal error? */ In general, errors get propagated upwords, although if a function can recover in some way, or wishes to translate the error from ENOENT to some new com_err defined code, so that the user gets a more specific error message (i.e, Configuration file not found instead of just file not found), it can do that. 2. My additions in blkmap64_ba.c rely on the availability of stdint.h and intptr_t (to detect when a pointer is 8-byte aligned). Perhaps that should be done in another way if C89 compatibility is a concern? You could just cast the pointer to an unsigned long before you mask it with 0x07 in the highly unlikely case that the pointer is wider than an unsigned long, it won't matter since you really want to look at the low three bits anyway. 3. 64-bit scans might be slower on a system with native register size 64 bits. I doubt they will be slower than the code used to be, though. Probably; it's easy enough to test on a 32-bit x86 binary. 4. What is bitmap-cluster_bits and how should it be handled? in ba_find_first_zero(); similar code in ext2fs_find_first_zero_generic_bmap(): + /* FIXME what is this? */ + if (bitmap-cluster_bits) + return EINVAL; It's something which is only used for bigalloc file systems, where it indicates the cluster size. With this feature, instead of the block allocation bitmap indicate block numbers being free, it indicates cluster numbers being free, where a cluster is a power of two number of blocks. It's not something you need to worry about in this context, because resize2fs doesn't support bigalloc file systems yet, and it doesn't make much difference in terms of how the libext2fs library functions work. 5. Is returning ENOENT the preferred way of indicating not found? How is this supposed to integrate in the com_err framework? What is it that is not being found? A bit in the bitmap? In that case I'd probably define a new com_err error code. I'd be happy to update the patch to fix any remaining issues, unless you think there's another, cleaner solution to the slowness. It would be really great if you could break things into small logical patches, and send them to the linux-ext4 list for review. Thanks, - Ted -- To UNSUBSCRIBE, email to debian-bugs-dist-requ...@lists.debian.org with a subject of unsubscribe. Trouble? Contact listmas...@lists.debian.org