Bug#663237: e2fsprogs: [patch] Make resize2fs shrinking use much less CPU

2012-03-09 Thread Sami Liedes
Package: e2fsprogs
Version: 1.42.1-2
Severity: normal
Tags: patch

Shrinking a filesystem is currently very CPU intensive. On a 100G ext4
filesystem filled to 50% full by making copies of /usr/share/doc to
$MNTPOINT/doc{1,2,3,...}, running resize2fs on a fairly slow (~5400
rpm) disk and a fast Core i7 processor:

# time resize2fs -p $FILESYSTEM 75G
[...]
real96m26.852s
user56m39.768s
sys 0m27.922s

Apparently shrinking larger filesystems causes even larger slowdowns
(CPU time consumed grows superlinearly), but I don't have the numbers
to back this up right now.

Turns out the slow and CPU intensive phase is the Scanning inode
table phase. The reason for this is that the inode table is scanned
bit by bit in ext2fs_new_inode(). It appears that in many cases a huge
string of bits needs to be tested to find the first zero bit. It
doesn't help that for each bit test a modulo operation is done in
ext2fs_new_inode(), after which the bit test is done by calling this
chain of functions for each bit:

ext2fs_new_inode() - ext2fs_fast_test_inode_bitmap2() [inlined] -
ext2fs_test_generic_bmap() - ext2fs_test_generic_bitmap() -
ext2fs_test_bit().

I also noted that resize2fs does not call ext2fs_open2() with
EXT2_FLAG_64BITS set; is there a reason for that? Adding that flag
change causes a minor speedup by eliminating
ext2fs_test_generic_bitmap() from the call chain.

I don't know if an algorithmic change would be the proper solution
for this, but I managed to make the process a lot faster (fast enough
to not be a bottleneck in this case, but perhaps it should also be
tested on both bigger filesystems and SSDs) by changing the following
while trying to affect the performance of other code in libext2fs as
little as possible:

0. Original CPU time: 3400 seconds.

1. Move the modulo out of the loop in ext2fs_new_inode().
   CPU time: 2585 seconds.

2. Call ext2fs_open2() with EXT2_FLAG_64BITS. CPU time: 2057 seconds.

3a. Implement a function, ext2fs_find_first_zero_generic_bmap(), which
finds the first zero bit in a bitmap. Defaults to doing pretty
much what ext2fs_new_inode() was doing if the relevant (new)
function in the bitmap_ops struct is NULL. Change
ext2fs_new_inode() to use it.

3b. Implement optimized ba_find_first_zero() for bitarray bitmaps.
Simply scan the bitarray byte by byte to find a value != 0xff.
CPU time: 45 seconds. This still appears to translate to 45
seconds real time.

4. Further optimize ba_find_first_zero() to scan in (aligned) 64-bit
   chunks CPU time: 17 seconds

After these optimizations, the wall time taken to resize the same
filesystem has decreased from 96m27s to 39m49s:

# time resize2fs -p $FILESYSTEM 75G
real39m49.287s
user0m16.601s
sys 0m26.606s

dumpe2fs output from resulting filesystems before and after my
modifications only differ in last write timestamp. e2fsck -f
$FILESYSTEM reports no problems.

A tentative (or proof of concept) patch is attached. Some concerns
that I have:

1. How should internal errors, like functions returning EINVAL when
   that ostensibly cannot be due to invalid filesystem, be handled?
   Currently ext2fs_new_inode() just propagates the error further:

+   if (err != ENOENT)
+   return err; /* Internal error? */

2. My additions in blkmap64_ba.c rely on the availability of
   stdint.h and intptr_t (to detect when a pointer is 8-byte
   aligned). Perhaps that should be done in another way if C89
   compatibility is a concern?

3. 64-bit scans might be slower on a system with native register size
   64 bits. I doubt they will be slower than the code used to be,
   though.

4. What is bitmap-cluster_bits and how should it be handled? in
   ba_find_first_zero(); similar code in
   ext2fs_find_first_zero_generic_bmap():

+   /* FIXME what is this? */
+   if (bitmap-cluster_bits)
+   return EINVAL;

5. Is returning ENOENT the preferred way of indicating not found?
   How is this supposed to integrate in the com_err framework?

I'd be happy to update the patch to fix any remaining issues, unless
you think there's another, cleaner solution to the slowness.

By the way, compiling e2fsprogs with -Wall reveals things like missing
headers:

../../lib/ext2fs/ext2fs.h:1712:3: warning: implicit declaration of function 
‘open64’ [-Wimplicit-function-declaration]

Sami


-- System Information:
Debian Release: wheezy/sid
  APT prefers unstable
  APT policy: (500, 'unstable')
Architecture: amd64 (x86_64)

Kernel: Linux 3.2.9 (SMP w/8 CPU cores)
Locale: LANG=en_US.UTF-8, LC_CTYPE=en_US.UTF-8 (charmap=UTF-8)
Shell: /bin/sh linked to /bin/dash

Versions of packages e2fsprogs depends on:
ii  e2fslibs1.42.1-2
ii  libblkid1   2.20.1-4
ii  libc6   2.13-27
ii  libcomerr2  1.42.1-2
ii  libss2  1.42.1-2
ii  libuuid12.20.1-4
ii  util-linux  2.20.1-4

e2fsprogs recommends no packages.

Versions of packages e2fsprogs suggests:
ii  e2fsck-static 

Bug#663237: e2fsprogs: [patch] Make resize2fs shrinking use much less CPU

2012-03-09 Thread Ted Ts'o
On Fri, Mar 09, 2012 at 08:11:05PM +0200, Sami Liedes wrote:
 Shrinking a filesystem is currently very CPU intensive. On a 100G ext4
 filesystem filled to 50% full by making copies of /usr/share/doc to
 $MNTPOINT/doc{1,2,3,...}, running resize2fs on a fairly slow (~5400
 rpm) disk and a fast Core i7 processor

Thanks for looking into this!

 I also noted that resize2fs does not call ext2fs_open2() with
 EXT2_FLAG_64BITS set; is there a reason for that? Adding that flag
 change causes a minor speedup by eliminating
 ext2fs_test_generic_bitmap() from the call chain.

Yeah, resize2fs hasn't had work done on it to support 64-bit file
systems, which is why it hasn't been changed to set that flag.  That
flag only means that it was compiled with the new 64-bit capable
header files, though, so it's safe to set that flag.  We just hadn't
gotten around to it.

 After these optimizations, the wall time taken to resize the same
 filesystem has decreased from 96m27s to 39m49s:

Cool!!

 
 A tentative (or proof of concept) patch is attached. Some concerns
 that I have:
 
 1. How should internal errors, like functions returning EINVAL when
that ostensibly cannot be due to invalid filesystem, be handled?
Currently ext2fs_new_inode() just propagates the error further:
 
 +   if (err != ENOENT)
 +   return err; /* Internal error? */

In general, errors get propagated upwords, although if a function can
recover in some way, or wishes to translate the error from ENOENT to
some new com_err defined code, so that the user gets a more specific
error message (i.e, Configuration file not found instead of just
file not found), it can do that.

 2. My additions in blkmap64_ba.c rely on the availability of
stdint.h and intptr_t (to detect when a pointer is 8-byte
aligned). Perhaps that should be done in another way if C89
compatibility is a concern?

You could just cast the pointer to an unsigned long before you mask it
with 0x07  in the highly unlikely case that the pointer is wider
than an unsigned long, it won't matter since you really want to look
at the low three bits anyway.

 3. 64-bit scans might be slower on a system with native register size
64 bits. I doubt they will be slower than the code used to be,
though.

Probably; it's easy enough to test on a 32-bit x86 binary.


 4. What is bitmap-cluster_bits and how should it be handled? in
ba_find_first_zero(); similar code in
ext2fs_find_first_zero_generic_bmap():
 
 +   /* FIXME what is this? */
 +   if (bitmap-cluster_bits)
 +   return EINVAL;
 

It's something which is only used for bigalloc file systems, where
it indicates the cluster size.  With this feature, instead of the
block allocation bitmap indicate block numbers being free, it
indicates cluster numbers being free, where a cluster is a power of
two number of blocks.  It's not something you need to worry about in
this context, because resize2fs doesn't support bigalloc file systems
yet, and it doesn't make much difference in terms of how the libext2fs
library functions work.

 5. Is returning ENOENT the preferred way of indicating not found?
How is this supposed to integrate in the com_err framework?

What is it that is not being found?  A bit in the bitmap?  In that
case I'd probably define a new com_err error code.

 I'd be happy to update the patch to fix any remaining issues, unless
 you think there's another, cleaner solution to the slowness.

It would be really great if you could break things into small logical
patches, and send them to the linux-ext4 list for review.

Thanks,

- Ted



-- 
To UNSUBSCRIBE, email to debian-bugs-dist-requ...@lists.debian.org
with a subject of unsubscribe. Trouble? Contact listmas...@lists.debian.org