It's expensive to set buffer flags that are already set, because that causes a costly cache line transition.
A common case is setting the "verified" flag during ext4 writes. This patch checks for the flag being set first. With the AIM7/creat-clo benchmark testing on a 48G ramdisk based-on ext4 file system, we see 3.3%(15431->15936) improvement of aim7.jobs-per-min on a 2-sockets broadwell platform. What the benchmark does is: it forks 3000 processes, and each process do the following: a) open a new file b) close the file c) delete the file until loop=100*1000 times. The original patch is contributed by Andi Kleen. Signed-off-by: Andi Kleen <a...@linux.intel.com> Signed-off-by: Kemi Wang <kemi.w...@intel.com> Tested-by: Kemi Wang <kemi.w...@intel.com> Reviewed-by: Jens Axboe <ax...@kernel.dk> --- include/linux/buffer_head.h | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/include/linux/buffer_head.h b/include/linux/buffer_head.h index c8dae55..211d8f5 100644 --- a/include/linux/buffer_head.h +++ b/include/linux/buffer_head.h @@ -80,11 +80,14 @@ struct buffer_head { /* * macro tricks to expand the set_buffer_foo(), clear_buffer_foo() * and buffer_foo() functions. + * To avoid reset buffer flags that are already set, because that causes + * a costly cache line transition, check the flag first. */ #define BUFFER_FNS(bit, name) \ static __always_inline void set_buffer_##name(struct buffer_head *bh) \ { \ - set_bit(BH_##bit, &(bh)->b_state); \ + if (!test_bit(BH_##bit, &(bh)->b_state)) \ + set_bit(BH_##bit, &(bh)->b_state); \ } \ static __always_inline void clear_buffer_##name(struct buffer_head *bh) \ { \ -- 2.7.4