It is expensive to set buffer flags that are already set: the
unconditional atomic set_bit() still causes a costly cache line
transition even though the flag value does not change.

A common case is setting the "verified" flag during ext4 writes.
This patch checks whether the flag is already set before calling set_bit().
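
For illustration (a sketch, not part of this patch), assuming jbd2's
BUFFER_FNS(Verified, verified) definition and the BH_Verified state bit,
the setter effectively expands to:

	static __always_inline void set_buffer_verified(struct buffer_head *bh)
	{
		/* skip the atomic RMW when the flag is already set */
		if (!test_bit(BH_Verified, &(bh)->b_state))
			set_bit(BH_Verified, &(bh)->b_state);
	}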

With the AIM7/creat-clo benchmark running on a 48G ramdisk-backed ext4
file system, we see a 3.3% (15431 -> 15936) improvement in
aim7.jobs-per-min on a 2-socket Broadwell platform.

The benchmark forks 3000 processes, and each process repeats the
following loop 100*1000 times (see the sketch below):
a) open a new file
b) close the file
c) delete the file
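
A minimal sketch of the per-process loop (file name, flags, and error
handling are illustrative; this is not the exact AIM7 creat-clo source):

	#include <fcntl.h>
	#include <stdio.h>
	#include <unistd.h>

	int main(void)
	{
		char name[64];
		long i;

		/* one scratch file per process */
		snprintf(name, sizeof(name), "creat-clo.%d", (int)getpid());
		for (i = 0; i < 100 * 1000; i++) {
			int fd = open(name, O_CREAT | O_WRONLY, 0644);	/* a) open a new file */

			if (fd < 0)
				return 1;
			close(fd);	/* b) close the file */
			unlink(name);	/* c) delete the file */
		}
		return 0;
	}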

The original patch was contributed by Andi Kleen.

Signed-off-by: Andi Kleen <a...@linux.intel.com>
Signed-off-by: Kemi Wang <kemi.w...@intel.com>
Tested-by: Kemi Wang <kemi.w...@intel.com>
Reviewed-by: Jens Axboe <ax...@kernel.dk>
---
 include/linux/buffer_head.h | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/include/linux/buffer_head.h b/include/linux/buffer_head.h
index c8dae55..211d8f5 100644
--- a/include/linux/buffer_head.h
+++ b/include/linux/buffer_head.h
@@ -80,11 +80,14 @@ struct buffer_head {
 /*
  * macro tricks to expand the set_buffer_foo(), clear_buffer_foo()
  * and buffer_foo() functions.
+ * To avoid resetting buffer flags that are already set, which causes a
+ * costly cache line transition, check the flag first.
  */
 #define BUFFER_FNS(bit, name)                                          \
 static __always_inline void set_buffer_##name(struct buffer_head *bh)  \
 {                                                                      \
-       set_bit(BH_##bit, &(bh)->b_state);                              \
+       if (!test_bit(BH_##bit, &(bh)->b_state))                        \
+               set_bit(BH_##bit, &(bh)->b_state);                      \
 }                                                                      \
 static __always_inline void clear_buffer_##name(struct buffer_head *bh)       \
 {                                                                      \
-- 
2.7.4
