Commit-ID:  ec81048cc340bb03334e6ca62661ecc0a684897a
Gitweb:     http://git.kernel.org/tip/ec81048cc340bb03334e6ca62661ecc0a684897a
Author:     Boqun Feng <boqun.f...@gmail.com>
AuthorDate: Wed, 23 Aug 2017 23:25:38 +0800
Committer:  Ingo Molnar <mi...@kernel.org>
CommitDate: Tue, 29 Aug 2017 15:14:38 +0200

sched/completion: Avoid unnecessary stack allocation for 
COMPLETION_INITIALIZER_ONSTACK()

In theory, COMPLETION_INITIALIZER_ONSTACK() should never affect the
stack allocation of the caller. However, on some compilers, a temporary
structure was allocated for the return value of
COMPLETION_INITIALIZER_ONSTACK().

For example in write_journal() with LOCKDEP_COMPLETIONS=y (GCC is 7.1.1):

        io_comp.comp = COMPLETION_INITIALIZER_ONSTACK(io_comp.comp);
            2462:       e8 00 00 00 00          callq  2467 <write_journal+0x47>
            2467:       48 8d 85 80 fd ff ff    lea    -0x280(%rbp),%rax
            246e:       48 c7 c6 00 00 00 00    mov    $0x0,%rsi
            2475:       48 c7 c2 00 00 00 00    mov    $0x0,%rdx
                x->done = 0;
            247c:       c7 85 90 fd ff ff 00    movl   $0x0,-0x270(%rbp)
            2483:       00 00 00
                init_waitqueue_head(&x->wait);
            2486:       48 8d 78 18             lea    0x18(%rax),%rdi
            248a:       e8 00 00 00 00          callq  248f <write_journal+0x6f>
                if (commit_start + commit_sections <= ic->journal_sections) {
            248f:       41 8b 87 a8 00 00 00    mov    0xa8(%r15),%eax
                io_comp.comp = COMPLETION_INITIALIZER_ONSTACK(io_comp.comp);
            2496:       48 8d bd e8 f9 ff ff    lea    -0x618(%rbp),%rdi
            249d:       48 8d b5 90 fd ff ff    lea    -0x270(%rbp),%rsi
            24a4:       b9 17 00 00 00          mov    $0x17,%ecx
            24a9:       f3 48 a5                rep movsq %ds:(%rsi),%es:(%rdi)
                if (commit_start + commit_sections <= ic->journal_sections) {
            24ac:       41 39 c6                cmp    %eax,%r14d
                io_comp.comp = COMPLETION_INITIALIZER_ONSTACK(io_comp.comp);
            24af:       48 8d bd 90 fd ff ff    lea    -0x270(%rbp),%rdi
            24b6:       48 8d b5 e8 f9 ff ff    lea    -0x618(%rbp),%rsi
            24bd:       b9 17 00 00 00          mov    $0x17,%ecx
            24c2:       f3 48 a5                rep movsq %ds:(%rsi),%es:(%rdi)

We can obviously see the temporary structure allocated, and the compiler
also does two meaningless memcpy with "rep movsq".

And according to:

        https://gcc.gnu.org/onlinedocs/gcc/Statement-Exprs.html#Statement-Exprs

The return value of a statement expression is returned by value, so the
temporary variable is created in COMPLETION_INITIALIZER_ONSTACK(), and
that's why the temporary structures are allocted.

To fix this, make the brace block in COMPLETION_INITIALIZER_ONSTACK()
return a pointer and dereference it outside the block rather than return
the whole structure, in this way, we are able to teach the compiler not
to do the unnecessary stack allocation.

This could also reduce the stack size even if !LOCKDEP, for example in
write_journal(), compiled with gcc 7.1.1, the result of command:

         objdump -d drivers/md/dm-integrity.o | ./scripts/checkstack.pl x86

before:

        0x0000246a write_journal [dm-integrity.o]:              696

after:

        0x00002b7a write_journal [dm-integrity.o]:              296

Reported-by: Arnd Bergmann <a...@arndb.de>
Signed-off-by: Boqun Feng <boqun.f...@gmail.com>
Signed-off-by: Peter Zijlstra (Intel) <pet...@infradead.org>
Acked-by: Arnd Bergmann <a...@arndb.de>
Cc: Andrew Morton <a...@linux-foundation.org>
Cc: Byungchul Park <byungchul.p...@lge.com>
Cc: Linus Torvalds <torva...@linux-foundation.org>
Cc: Nicholas Piggin <npig...@gmail.com>
Cc: Peter Zijlstra <pet...@infradead.org>
Cc: Thomas Gleixner <t...@linutronix.de>
Cc: wal...@google.com
Cc: wi...@infradead.org
Link: http://lkml.kernel.org/r/20170823152542.5150-3-boqun.f...@gmail.com
Signed-off-by: Ingo Molnar <mi...@kernel.org>
---
 include/linux/completion.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/linux/completion.h b/include/linux/completion.h
index 791f053..cae5400 100644
--- a/include/linux/completion.h
+++ b/include/linux/completion.h
@@ -74,7 +74,7 @@ static inline void complete_release_commit(struct completion 
*x) {}
 #endif
 
 #define COMPLETION_INITIALIZER_ONSTACK(work) \
-       ({ init_completion(&work); work; })
+       (*({ init_completion(&work); &work; }))
 
 /**
  * DECLARE_COMPLETION - declare and initialize a completion structure

Reply via email to