[Bug c/77992] Provide feature to initialize padding bytes to avoid information leaks

2016-10-17 Thread kjlu at gatech dot edu
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77992

--- Comment #15 from Kangjie Lu  ---
(In reply to Richard Biener from comment #14)
> Re-opening as an enhacement request for sth like -fexplict-padding adding
> artificial fields to structures padding.
> 
> Patches welcome (hint: look into stor-layout.c, start_record_layout /
> finish_record_layout, place_field).

Sounds great! 
Thanks for the hint.
Since we have been working on preventing uninitialized data leaks for quite a
while, we are interested in implementing this GCC feature.
Once we have a beta version, we will submit it to the GCC developers for
review and testing.

[Bug c/77992] Failures to initialize padding bytes -- causing many information leaks

2016-10-16 Thread kjlu at gatech dot edu
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77992

--- Comment #13 from Kangjie Lu  ---
(In reply to jos...@codesourcery.com from comment #10)
> If you care about information in bytes that are not part of a field with 
> other semantic significance, you should use -Werror=padded to get errors 
> on structs with padding and use that information to add explicit dummy 
> fields in the source code where there was padding.  Once there are 
> explicit dummy fields, their values will be preserved by the compiler, so 
> you can either zero the whole struct with memset and rely on the zeroing 
> of dummy fields not being optimized away, or use a struct initializer and 
> rely on it implicitly zeroing those fields.  (Of course this may reduce 
> efficiency as optimizations such as SRA now need to track values of those 
> fields, whereas they do not need to track values of padding.)

This is a candidate solution, but I do not think it scales.
Given that the Linux kernel has tens of thousands of modules, the idea of
manually initializing padding bytes for all data structures will definitely
be rejected by the Linux community.

My opinion is still that, as padding is introduced by compilers and is
"invisible" to developers, initializing padding should be done on the
compiler side.
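
For illustration, the explicit-dummy-field approach quoted from comment #10
could look roughly like the sketch below. The names "S_explicit", "pad" and
"nonzero_pad_bytes_after_init" are hypothetical and are not taken from the
kernel or from GCC; the struct simply mirrors "struct S" from the original
report with its tail padding written out as a named member. Because "pad" is
a real member, an initializer that omits it zero-initializes it (C11 6.7.9/21).

```c
#include <assert.h>   /* static_assert (C11) */
#include <stddef.h>   /* size_t */

/* Hypothetical variant of "struct S" with the tail padding made explicit. */
struct S_explicit {
    long l;
    char c;
    char pad[sizeof(long) - 1];   /* dummy field replacing implicit padding */
};

/* Compile-time guard: the members account for every byte of the struct,
   so no implicit padding is left on this ABI. */
static_assert(sizeof(struct S_explicit) == 2 * sizeof(long),
              "struct S_explicit must not contain implicit padding");

/* Hypothetical helper: count non-zero bytes in the dummy field after a
   designated initializer that does not mention it. */
size_t nonzero_pad_bytes_after_init(long l, char c)
{
    struct S_explicit s = { .l = l, .c = c };   /* pad is implicitly zeroed */
    size_t n = 0;
    for (size_t i = 0; i < sizeof s.pad; i++)
        n += (s.pad[i] != 0);
    return n;
}
```

The static_assert is what ties the sketch to one ABI at a time: a layout whose
padding differs would need its own dummy fields, which is the maintenance cost
discussed in this thread.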

[Bug c/77992] Failures to initialize padding bytes -- causing many information leaks

2016-10-16 Thread kjlu at gatech dot edu
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77992

--- Comment #12 from Kangjie Lu  ---
(In reply to Andreas Schwab from comment #11)
> The problem with that strategy is that padding is architecture dependent,
> and care must be taken not to introduce ABI breakage.

Agreed. Otherwise a developer would have to write corresponding dummy fields
for each platform, which would be a burden for code maintenance.

[Bug c/77992] Failures to initialize padding bytes -- causing many information leaks

2016-10-14 Thread kjlu at gatech dot edu
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77992

--- Comment #9 from Kangjie Lu  ---
(In reply to Andrew Pinski from comment #8)
> A simple google search (secure memset [glibc]) finds a few things:
> http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1381.pdf
> 
> https://sourceware.org/ml/libc-alpha/2014-12/msg00506.html
> 
> https://www.securecoding.cert.org/confluence/display/c/MSC06-C.
> +Beware+of+compiler+optimizations
> 
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=8537

Thanks for sharing these interesting links.
Sure, compiler optimizations may sometimes aggressively eliminate dead code.

As I mentioned in my last reply, this is not a problem in our work because
our instrumentation is inserted after all LLVM optimization passes. 
The inserted memset will not be removed.

Back to my original problem, many Linux kernel developers also hope GCC can 
provide a feature (like a compilation option) that can zero-initialize 
padding bytes. Fixing these information leaks manually will make the code
maintenance extremely difficult.  
Anyway, I just wanted to report this issue :)
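
As a side note on the dead-store concern raised in the links above, one
workaround often seen in C code is to clear memory through a
volatile-qualified pointer. The sketch below is only an illustration and is
not what our LLVM instrumentation does; the name "zero_bytes_volatile" is
hypothetical and is not a libc function. It relies on compilers treating
stores through a volatile lvalue as accesses they must keep, which holds in
practice but is not the same as a dedicated secure-memset interface.

```c
#include <assert.h>
#include <stddef.h>   /* size_t, NULL */

/* Hypothetical helper: zero n bytes starting at p, one volatile store per
   byte, so the compiler does not drop the stores as dead code. */
void zero_bytes_volatile(void *p, size_t n)
{
    assert(p != NULL || n == 0);
    volatile unsigned char *bytes = p;
    for (size_t i = 0; i < n; i++)
        bytes[i] = 0;
}
```

A caller would pass the address and size of the whole object, for example
zero_bytes_volatile(&s, sizeof s), before filling in the fields.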

[Bug c/77992] Failures to initialize padding bytes -- causing many information leaks

2016-10-14 Thread kjlu at gatech dot edu
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77992

--- Comment #7 from Kangjie Lu  ---
(In reply to Andrew Pinski from comment #6)
> >More information can be found in our research paper: 
> >http://www.cc.gatech.edu/~klu38/publications/unisan-ccs16.pdf
> 
> 
> You research paper is wrong and does not consider C is an inherently
> insecure language to be begin with.  There are many other things wrong with
> it.  Like for an example recommending the use of memset when you want to
> hide the stores from the compiler.  There is already a thread on the glibc
> mailing list about this exact thing about adding a secure memset which is
> GCC is not going to optimize away.

Thanks for your feedback.
We agree that C is not a safe language, and that is why we want to secure
programs written in C.
Could you give me more information about the thread? We use LLVM instead
of GCC, and our instrumentation is inserted after the optimization passes.

Thanks!

[Bug driver/77992] Failures to initialize padding bytes -- causing many information leaks

2016-10-14 Thread kjlu at gatech dot edu
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77992

Kangjie Lu  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|RESOLVED                    |UNCONFIRMED
         Resolution|INVALID                     |---

--- Comment #4 from Kangjie Lu  ---
(In reply to Andrew Pinski from comment #3)
> There is no way in C to do that. If you want a secure language you need
> something different.

Could you please explain why there is no way in C to initialize padding?
Besides performance (I understand that the unaligned initialization could be
expensive), are there any other reasons?

Thanks!

[Bug driver/77992] Failures to initialize padding bytes -- causing many information leaks

2016-10-14 Thread kjlu at gatech dot edu
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77992

Kangjie Lu  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|RESOLVED                    |UNCONFIRMED
         Resolution|INVALID                     |---

--- Comment #2 from Kangjie Lu  ---
Then I guess this is an unspecified area in C11.

Anyway, the failure to initialize the padding bytes will cause information
leaks; many leaks have been confirmed.

I would suggest that GCC initialize padding bytes even if this is not
specified in C11.


Thanks,
Kangjie

[Bug driver/77992] New: Failures to initialize padding bytes -- causing many information leaks

2016-10-14 Thread kjlu at gatech dot edu
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77992

            Bug ID: 77992
           Summary: Failures to initialize padding bytes -- causing many
                    information leaks
           Product: gcc
           Version: 5.4.0
            Status: UNCONFIRMED
          Severity: critical
          Priority: P3
         Component: driver
          Assignee: unassigned at gcc dot gnu.org
          Reporter: kjlu at gatech dot edu
  Target Milestone: ---

Created attachment 39817
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=39817&action=edit
testcase

Hello,

I'd like to report an implementation (or even design) problem in GCC.

Chapter §6.7.9/10 in C11:
"If an object that has static or thread storage duration is not initialized
explicitly, then:
...
if it is an aggregate, every member is initialized (recursively) according to
these rules, and any padding is initialized to zero bits;"

According to this specification, padding bytes should be initialized when the
initializer is static.
Take a look at this example (say x86_64):
/
struct S {
    long l;
    char c;
};

int main (void) {
    struct S s = {
        .l = 0,
        .c = 0
    };
    return 0;
}
/
The developer has carefully initialized all fields with constants.
Object "s" is supposed to be fully initialized, i.e., the seven padding bytes
right after "s.c" are supposed to be initialized.
However, these padding bytes are in fact not initialized.
In contrast, LLVM would initialize the padding bytes in such a case.
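
To make the layout concrete, the following sketch computes how many padding
bytes follow the last member of "struct S". The helper name
"tail_padding_bytes" is hypothetical and only serves this illustration; the
result depends on the target ABI.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stddef.h>   /* offsetof, size_t */

struct S {
    long l;
    char c;
};

/* The first member always starts at offset 0, so any padding in this
   struct sits after "c". */
static_assert(offsetof(struct S, l) == 0,
              "no padding before the first member");

/* Hypothetical helper: number of padding bytes after the member "c". */
size_t tail_padding_bytes(void)
{
    return sizeof(struct S) - (offsetof(struct S, c) + sizeof(char));
}
```

On x86_64 Linux, where long is 8 bytes and 8-byte aligned, sizeof(struct S)
is 16 and "c" sits at offset 8, so the helper yields the seven bytes
mentioned above.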

Similarly, when "variables" are used to initialize the fields of "s", padding
bytes are not initialized either, such as:
/
struct S s = {
    .l = variable1,
    .c = variable2
};
/

Such failures to initialize padding bytes will result in many information
leaks. We have found many information leaks in the Linux kernel.
Here is an example:
https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2016-4482
More information can be found in our research paper:
http://www.cc.gatech.edu/~klu38/publications/unisan-ccs16.pdf

The testing program for reproducing the leak is attached.

Testing environment:
"Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/usr/lib/gcc/x86_64-linux-gnu/5/lto-wrapper
Target: x86_64-linux-gnu
Configured with: ../src/configure -v --with-pkgversion='Ubuntu
5.4.0-6ubuntu1~16.04.2' --with-bugurl=file:///usr/share/doc/gcc-5/README.Bugs
--enable-languages=c,ada,c++,java,go,d,fortran,objc,obj-c++ --prefix=/usr
--program-suffix=-5 --enable-shared --enable-linker-build-id
--libexecdir=/usr/lib --without-included-gettext --enable-threads=posix
--libdir=/usr/lib --enable-nls --with-sysroot=/ --enable-clocale=gnu
--enable-libstdcxx-debug --enable-libstdcxx-time=yes
--with-default-libstdcxx-abi=new --enable-gnu-unique-object
--disable-vtable-verify --enable-libmpx --enable-plugin --with-system-zlib
--disable-browser-plugin --enable-java-awt=gtk --enable-gtk-cairo
--with-java-home=/usr/lib/jvm/java-1.5.0-gcj-5-amd64/jre --enable-java-home
--with-jvm-root-dir=/usr/lib/jvm/java-1.5.0-gcj-5-amd64
--with-jvm-jar-dir=/usr/lib/jvm-exports/java-1.5.0-gcj-5-amd64
--with-arch-directory=amd64 --with-ecj-jar=/usr/share/java/eclipse-ecj.jar
--enable-objc-gc --enable-multiarch --disable-werror --with-arch-32=i686
--with-abi=m64 --with-multilib-list=m32,m64,mx32 --enable-multilib
--with-tune=generic --enable-checking=release --build=x86_64-linux-gnu
--host=x86_64-linux-gnu --target=x86_64-linux-gnu
Thread model: posix
gcc version 5.4.0 20160609 (Ubuntu 5.4.0-6ubuntu1~16.04.2)"


My suggestion to reliably address this problem is that padding bytes of an
object, which are implicitly introduced by compilers, should be
zero-initialized upon object allocation.
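
Until a compiler does this automatically, the closest manual approximation is
to clear the whole object before assigning its fields, roughly as in the
sketch below. The helper names "init_s_zeroed" and "nonzero_tail_bytes" are
hypothetical. Note the assumption: C11 6.2.6.1/6 allows padding bytes to take
unspecified values when a member is stored, so this relies on how compilers
behave in practice rather than on a guarantee of the standard.

```c
#include <assert.h>
#include <stddef.h>   /* offsetof, size_t, NULL */
#include <string.h>   /* memset */

struct S {
    long l;
    char c;
};

/* Hypothetical helper: zero every byte of the object, padding included,
   and only then assign the fields. */
void init_s_zeroed(struct S *s, long l, char c)
{
    assert(s != NULL);
    memset(s, 0, sizeof *s);
    s->l = l;
    s->c = c;
}

/* Hypothetical helper: count non-zero bytes located after the member "c". */
size_t nonzero_tail_bytes(const struct S *s)
{
    const unsigned char *bytes = (const unsigned char *)s;
    size_t n = 0;
    for (size_t i = offsetof(struct S, c) + sizeof(char); i < sizeof *s; i++)
        n += (bytes[i] != 0);
    return n;
}
```

This has to be repeated by hand at every allocation site, which is why doing
it in the compiler seems preferable.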

Please let me know if you need more information or any assistance.

Best Regards,
Kangjie Lu