Re: [PATCH crypto-2.6] lib: make memzero_explicit more robust against dead store elimination

2015-04-30 Thread mancha security
On Thu, Apr 30, 2015 at 01:43:07AM +0200, Daniel Borkmann wrote:
 On 04/29/2015 04:54 PM, mancha security wrote:
 On Wed, Apr 29, 2015 at 04:01:19PM +0200, Daniel Borkmann wrote:
 On 04/29/2015 03:08 PM, mancha security wrote:
 ...
 By the way, has anyone been able to verify that __memory_barrier
 provides DSE protection under various optimizations? Unfortunately,
 I don't have ready access to ICC at the moment or I'd test it
 myself.
 
 Never used icc, but it looks like it's free for open source
 projects; I can give it a try, but in case you're faster than I am,
 feel free to post results here.
 
 Time permitting, I'll try setting this up and post my results.
 
 So I finally got the download link and an eval license for icc, and
 after needing to download gigbytes of bloat for the suite, I could
 finally start to experiment a bit.

Ugh.

 So __GNUC__ and __INTEL_COMPILER is definitely defined by icc, __ECC
 not in my case, so that part is as expected for the kernel header
 includes.
 
 With barrier_data(), I could observe insns for an inlined memset()
 being emitted in the disassembly, same with barrier(), same with
 __memory_barrier(). In fact, even if I only use ...
 
 static inline void memzero_explicit(void *s, size_t count)
 {
   memset(s, 0, count);
 }
 
 int main(void)
 {
   char buff[20];
   memzero_explicit(buff, sizeof(buff));
   return 0;
 }
 
 ... icc will emit memset instrinsic insns (did you notice that as
 well?) when using various optimization levels. Using f.e. -Ofast
 -ffreestanding resp. -fno-builtin-memset will emit a function call,
 presumably, icc is then not allowed to make any assumptions, so given
 the previous result, then would then be expected.

I didn't get around to installing ICC so thanks for sharing the very
interesting results.

 So, crafting a stupid example:
 
 static inline void
 dumb_memset(char *s, unsigned char c, size_t n)
 {
 int i;
 
 for (i = 0; i  n; i++)
 s[i] = c;
 }
 
 static inline void memzero_explicit(void *s, size_t count)
 {
   dumb_memset(s, 0, count);
   barrier-variant
 }
 
 int main(void)
 {
   char buff[20];
   memzero_explicit(buff, sizeof(buff));
   return 0;
 }
 
 With no barrier at all, icc optimizes all that away (using -Ofast),
 with barrier_data() it inlines and emits additional mov* insns!  Just
 using barrier() or __memory_barrier(), we end up with the same case as
 with clang, that is, it gets optimized away. So, barrier_data() seems
 to be better here as well.

For now, seems we're good with barrier_data should things like the LTO
initiative pick up steam, etc.

Cheers.


pgpG363roYfoo.pgp
Description: PGP signature


Re: [PATCH crypto-2.6] lib: make memzero_explicit more robust against dead store elimination

2015-04-29 Thread mancha security
Hi Daniel et al.

Looks good from here.

By the way, has anyone been able to verify that __memory_barrier 
provides DSE protection under various optimizations? Unfortunately, I
don't have ready access to ICC at the moment or I'd test it myself.

--mancha

PS It would be nice if memset_s were universally adopted/implemented so
we could stop worrying about these things.


On Tue, Apr 28, 2015 at 05:22:20PM +0200, Daniel Borkmann wrote:
 In commit 0b053c951829 (lib: memzero_explicit: use barrier instead
 of OPTIMIZER_HIDE_VAR), we made memzero_explicit() more robust in
 case LTO would decide to inline memzero_explicit() and eventually
 find out it could be elimiated as dead store.
 
 While using barrier() works well for the case of gcc, recent efforts
 from LLVMLinux people suggest to use llvm as an alternative to gcc,
 and there, Stephan found in a simple stand-alone user space example
 that llvm could nevertheless optimize and thus elimitate the memset().
 A similar issue has been observed in the referenced llvm bug report,
 which is regarded as not-a-bug.
 
 The fix in this patch now works for both compilers (also tested with
 more aggressive optimization levels). Arguably, in the current kernel
 tree it's more of a theoretical issue, but imho, it's better to be
 pedantic about it.
 
 It's clearly visible though, with the below code: if we would have
 used barrier()-only here, llvm would have omitted clearing, not so
 with barrier_data() variant:
 
   static inline void memzero_explicit(void *s, size_t count)
   {
 memset(s, 0, count);
 barrier_data(s);
   }
 
   int main(void)
   {
 char buff[20];
 memzero_explicit(buff, sizeof(buff));
 return 0;
   }
 
   $ gcc -O2 test.c
   $ gdb a.out
   (gdb) disassemble main
   Dump of assembler code for function main:
0x00400400  +0: lea   -0x28(%rsp),%rax
0x00400405  +5: movq  $0x0,-0x28(%rsp)
0x0040040e +14: movq  $0x0,-0x20(%rsp)
0x00400417 +23: movl  $0x0,-0x18(%rsp)
0x0040041f +31: xor   %eax,%eax
0x00400421 +33: retq
   End of assembler dump.
 
   $ clang -O2 test.c
   $ gdb a.out
   (gdb) disassemble main
   Dump of assembler code for function main:
0x004004f0  +0: xorps  %xmm0,%xmm0
0x004004f3  +3: movaps %xmm0,-0x18(%rsp)
0x004004f8  +8: movl   $0x0,-0x8(%rsp)
0x00400500 +16: lea-0x18(%rsp),%rax
0x00400505 +21: xor%eax,%eax
0x00400507 +23: retq
   End of assembler dump.
 
 As clang (but also icc) defines __GNUC__, it's sufficient to define this
 in compiler-gcc.h only.
 
 Reference: https://llvm.org/bugs/show_bug.cgi?id=15495
 Reported-by: Stephan Mueller smuel...@chronox.de
 Signed-off-by: Daniel Borkmann dan...@iogearbox.net
 Cc: Theodore Ts'o ty...@mit.edu
 Cc: Stephan Mueller smuel...@chronox.de
 Cc: Hannes Frederic Sowa han...@stressinduktion.org
 Cc: mancha security manc...@zoho.com
 Cc: Mark Charlebois charl...@gmail.com
 Cc: Behan Webster beh...@converseincode.com
 ---
  include/linux/compiler-gcc.h | 16 +++-
  include/linux/compiler.h |  4 
  lib/string.c |  2 +-
  3 files changed, 20 insertions(+), 2 deletions(-)
 
 diff --git a/include/linux/compiler-gcc.h b/include/linux/compiler-gcc.h
 index cdf13ca..371e560 100644
 --- a/include/linux/compiler-gcc.h
 +++ b/include/linux/compiler-gcc.h
 @@ -9,10 +9,24 @@
  + __GNUC_MINOR__ * 100 \
  + __GNUC_PATCHLEVEL__)
  
 -
  /* Optimization barrier */
 +
  /* The volatile is due to gcc bugs */
  #define barrier() __asm__ __volatile__(: : :memory)
 +/*
 + * This version is i.e. to prevent dead stores elimination on @ptr
 + * where gcc and llvm may behave differently when otherwise using
 + * normal barrier(): while gcc behavior gets along with a normal
 + * barrier(), llvm needs an explicit input variable to be assumed
 + * clobbered. The issue is as follows: while the inline asm might
 + * access any memory it wants, the compiler could have fit all of
 + * @ptr into memory registers instead, and since @ptr never escaped
 + * from that, it proofed that the inline asm wasn't touching any of
 + * it. This version works well with both compilers, i.e. we're telling
 + * the compiler that the inline asm absolutely may see the contents
 + * of @ptr. See also: https://llvm.org/bugs/show_bug.cgi?id=15495
 + */
 +#define barrier_data(ptr) __asm__ __volatile__(: :r(ptr) :memory)
  
  /*
   * This macro obfuscates arithmetic on a variable address so that gcc
 diff --git a/include/linux/compiler.h b/include/linux/compiler.h
 index 0e41ca0..8677225 100644
 --- a/include/linux/compiler.h
 +++ b/include/linux/compiler.h
 @@ -169,6 +169,10 @@ void ftrace_likely_update(struct ftrace_branch_data *f, 
 int val, int expect);
  # define barrier() __memory_barrier()
  #endif
  
 +#ifndef barrier_data
 +# define barrier_data(ptr) barrier()
 +#endif
 +
  /* Unreachable code */
  #ifndef 

Re: [PATCH crypto-2.6] lib: make memzero_explicit more robust against dead store elimination

2015-04-29 Thread mancha security
On Wed, Apr 29, 2015 at 04:01:19PM +0200, Daniel Borkmann wrote:
 On 04/29/2015 03:08 PM, mancha security wrote:
 ...
 By the way, has anyone been able to verify that __memory_barrier
 provides DSE protection under various optimizations? Unfortunately, I
 don't have ready access to ICC at the moment or I'd test it myself.
 
 Never used icc, but it looks like it's free for open source projects;
 I can give it a try, but in case you're faster than I am, feel free
 to post results here.

Time permitting, I'll try setting this up and post my results.

 
 From what I see based on the code, i.e. after that buggy cleanup
 commit ...
 
 commit 73679e50820123ebdedc67ebcda4562d1d6e4aba
 Author: Pranith Kumar bobby.pr...@gmail.com
 Date:   Tue Apr 15 12:05:22 2014 -0400
 
 compiler-intel.h: Remove duplicate definition
 
 barrier is already defined as __memory_barrier in compiler.h
 Remove this unnecessary redefinition.
 
 Signed-off-by: Pranith Kumar bobby.pr...@gmail.com
 Link: 
 http://lkml.kernel.org/r/cajhhmcanypy0%2bqd-1kbnjplt3xgajdr12j%2byssnpgmzcpbe...@mail.gmail.com
 Signed-off-by: H. Peter Anvin h...@linux.intel.com
 
 ... it looks like it's currently using the _same_ gcc inline asm
 for the barrier on icc instead of what that commit intended to do.
 
 So funny enough, we don't actually use __memory_barrier() at the
 moment. ;)
 
 Nonetheless, having a look might be good.

Nice catch, 73679e50820 is indeed buggy because ICC defines __GNUC__
(unless -no-gcc is used). That should be reverted.

Bug aside, according to [1], ICC does support GNU-style inline asm so
for the purposes of barrier_data(), it would be interesting to see if
it affords better/worse DSE protection compared to __memory_barrier().

--mancha

[1]
https://software.intel.com/sites/products/documentation/doclib/iss/2013/compiler/cpp-lin/GUID-5100C4FC-BC2F-4E36-943A-120CFFFB4285.htm



pgpO7LVWF_g1e.pgp
Description: PGP signature


Re: [PATCH crypto-2.6] lib: make memzero_explicit more robust against dead store elimination

2015-04-28 Thread Stephan Mueller
Am Dienstag, 28. April 2015, 17:22:20 schrieb Daniel Borkmann:

Hi Daniel,

In commit 0b053c951829 (lib: memzero_explicit: use barrier instead
of OPTIMIZER_HIDE_VAR), we made memzero_explicit() more robust in
case LTO would decide to inline memzero_explicit() and eventually
find out it could be elimiated as dead store.

While using barrier() works well for the case of gcc, recent efforts
from LLVMLinux people suggest to use llvm as an alternative to gcc,
and there, Stephan found in a simple stand-alone user space example
that llvm could nevertheless optimize and thus elimitate the memset().
A similar issue has been observed in the referenced llvm bug report,
which is regarded as not-a-bug.

The fix in this patch now works for both compilers (also tested with
more aggressive optimization levels). Arguably, in the current kernel
tree it's more of a theoretical issue, but imho, it's better to be
pedantic about it.

It's clearly visible though, with the below code: if we would have
used barrier()-only here, llvm would have omitted clearing, not so
with barrier_data() variant:

  static inline void memzero_explicit(void *s, size_t count)
  {
memset(s, 0, count);
barrier_data(s);
  }

  int main(void)
  {
char buff[20];
memzero_explicit(buff, sizeof(buff));
return 0;
  }

  $ gcc -O2 test.c
  $ gdb a.out
  (gdb) disassemble main
  Dump of assembler code for function main:
   0x00400400  +0: lea   -0x28(%rsp),%rax
   0x00400405  +5: movq  $0x0,-0x28(%rsp)
   0x0040040e +14: movq  $0x0,-0x20(%rsp)
   0x00400417 +23: movl  $0x0,-0x18(%rsp)
   0x0040041f +31: xor   %eax,%eax
   0x00400421 +33: retq
  End of assembler dump.

  $ clang -O2 test.c
  $ gdb a.out
  (gdb) disassemble main
  Dump of assembler code for function main:
   0x004004f0  +0: xorps  %xmm0,%xmm0
   0x004004f3  +3: movaps %xmm0,-0x18(%rsp)
   0x004004f8  +8: movl   $0x0,-0x8(%rsp)
   0x00400500 +16: lea-0x18(%rsp),%rax
   0x00400505 +21: xor%eax,%eax
   0x00400507 +23: retq
  End of assembler dump.

As clang (but also icc) defines __GNUC__, it's sufficient to define this
in compiler-gcc.h only.

Reference: https://llvm.org/bugs/show_bug.cgi?id=15495
Reported-by: Stephan Mueller smuel...@chronox.de
Signed-off-by: Daniel Borkmann dan...@iogearbox.net
Cc: Theodore Ts'o ty...@mit.edu
Cc: Stephan Mueller smuel...@chronox.de
Cc: Hannes Frederic Sowa han...@stressinduktion.org
Cc: mancha security manc...@zoho.com
Cc: Mark Charlebois charl...@gmail.com
Cc: Behan Webster beh...@converseincode.com

Using a user space test app: tested clang -O3, clang -O2, gcc -O3, gcc -O2.

Tested-by: Stephan Mueller smuel...@chronox.de

Ciao
Stephan
--
To unsubscribe from this list: send the line unsubscribe linux-crypto in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH crypto-2.6] lib: make memzero_explicit more robust against dead store elimination

2015-04-28 Thread Daniel Borkmann
In commit 0b053c951829 (lib: memzero_explicit: use barrier instead
of OPTIMIZER_HIDE_VAR), we made memzero_explicit() more robust in
case LTO would decide to inline memzero_explicit() and eventually
find out it could be elimiated as dead store.

While using barrier() works well for the case of gcc, recent efforts
from LLVMLinux people suggest to use llvm as an alternative to gcc,
and there, Stephan found in a simple stand-alone user space example
that llvm could nevertheless optimize and thus elimitate the memset().
A similar issue has been observed in the referenced llvm bug report,
which is regarded as not-a-bug.

The fix in this patch now works for both compilers (also tested with
more aggressive optimization levels). Arguably, in the current kernel
tree it's more of a theoretical issue, but imho, it's better to be
pedantic about it.

It's clearly visible though, with the below code: if we would have
used barrier()-only here, llvm would have omitted clearing, not so
with barrier_data() variant:

  static inline void memzero_explicit(void *s, size_t count)
  {
memset(s, 0, count);
barrier_data(s);
  }

  int main(void)
  {
char buff[20];
memzero_explicit(buff, sizeof(buff));
return 0;
  }

  $ gcc -O2 test.c
  $ gdb a.out
  (gdb) disassemble main
  Dump of assembler code for function main:
   0x00400400  +0: lea   -0x28(%rsp),%rax
   0x00400405  +5: movq  $0x0,-0x28(%rsp)
   0x0040040e +14: movq  $0x0,-0x20(%rsp)
   0x00400417 +23: movl  $0x0,-0x18(%rsp)
   0x0040041f +31: xor   %eax,%eax
   0x00400421 +33: retq
  End of assembler dump.

  $ clang -O2 test.c
  $ gdb a.out
  (gdb) disassemble main
  Dump of assembler code for function main:
   0x004004f0  +0: xorps  %xmm0,%xmm0
   0x004004f3  +3: movaps %xmm0,-0x18(%rsp)
   0x004004f8  +8: movl   $0x0,-0x8(%rsp)
   0x00400500 +16: lea-0x18(%rsp),%rax
   0x00400505 +21: xor%eax,%eax
   0x00400507 +23: retq
  End of assembler dump.

As clang (but also icc) defines __GNUC__, it's sufficient to define this
in compiler-gcc.h only.

Reference: https://llvm.org/bugs/show_bug.cgi?id=15495
Reported-by: Stephan Mueller smuel...@chronox.de
Signed-off-by: Daniel Borkmann dan...@iogearbox.net
Cc: Theodore Ts'o ty...@mit.edu
Cc: Stephan Mueller smuel...@chronox.de
Cc: Hannes Frederic Sowa han...@stressinduktion.org
Cc: mancha security manc...@zoho.com
Cc: Mark Charlebois charl...@gmail.com
Cc: Behan Webster beh...@converseincode.com
---
 include/linux/compiler-gcc.h | 16 +++-
 include/linux/compiler.h |  4 
 lib/string.c |  2 +-
 3 files changed, 20 insertions(+), 2 deletions(-)

diff --git a/include/linux/compiler-gcc.h b/include/linux/compiler-gcc.h
index cdf13ca..371e560 100644
--- a/include/linux/compiler-gcc.h
+++ b/include/linux/compiler-gcc.h
@@ -9,10 +9,24 @@
   + __GNUC_MINOR__ * 100 \
   + __GNUC_PATCHLEVEL__)
 
-
 /* Optimization barrier */
+
 /* The volatile is due to gcc bugs */
 #define barrier() __asm__ __volatile__(: : :memory)
+/*
+ * This version is i.e. to prevent dead stores elimination on @ptr
+ * where gcc and llvm may behave differently when otherwise using
+ * normal barrier(): while gcc behavior gets along with a normal
+ * barrier(), llvm needs an explicit input variable to be assumed
+ * clobbered. The issue is as follows: while the inline asm might
+ * access any memory it wants, the compiler could have fit all of
+ * @ptr into memory registers instead, and since @ptr never escaped
+ * from that, it proofed that the inline asm wasn't touching any of
+ * it. This version works well with both compilers, i.e. we're telling
+ * the compiler that the inline asm absolutely may see the contents
+ * of @ptr. See also: https://llvm.org/bugs/show_bug.cgi?id=15495
+ */
+#define barrier_data(ptr) __asm__ __volatile__(: :r(ptr) :memory)
 
 /*
  * This macro obfuscates arithmetic on a variable address so that gcc
diff --git a/include/linux/compiler.h b/include/linux/compiler.h
index 0e41ca0..8677225 100644
--- a/include/linux/compiler.h
+++ b/include/linux/compiler.h
@@ -169,6 +169,10 @@ void ftrace_likely_update(struct ftrace_branch_data *f, 
int val, int expect);
 # define barrier() __memory_barrier()
 #endif
 
+#ifndef barrier_data
+# define barrier_data(ptr) barrier()
+#endif
+
 /* Unreachable code */
 #ifndef unreachable
 # define unreachable() do { } while (1)
diff --git a/lib/string.c b/lib/string.c
index a579201..bb3d4b6 100644
--- a/lib/string.c
+++ b/lib/string.c
@@ -607,7 +607,7 @@ EXPORT_SYMBOL(memset);
 void memzero_explicit(void *s, size_t count)
 {
memset(s, 0, count);
-   barrier();
+   barrier_data(s);
 }
 EXPORT_SYMBOL(memzero_explicit);
 
-- 
1.9.3

--
To unsubscribe from this list: send the line unsubscribe linux-crypto in
the body of a message to majord...@vger.kernel.org
More majordomo info at