[Bug sanitizer/55309] gcc's address-sanitizer 66% slower than clang's

2014-01-27 Thread trippels at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55309

Markus Trippelsdorf  changed:

   What|Removed |Added

 Status|UNCONFIRMED |RESOLVED
 CC||trippels at gcc dot gnu.org
 Resolution|--- |FIXED

--- Comment #59 from Markus Trippelsdorf  ---
(In reply to Kostya Serebryany from comment #58)
> FTR, here are the new numbers; except for 464.h264ref looks good.

Thanks. Let's close this bug then.


[Bug sanitizer/55309] gcc's address-sanitizer 66% slower than clang's

2014-01-27 Thread kcc at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55309

--- Comment #58 from Kostya Serebryany  ---
FTR, here are the new numbers; except for 464.h264ref looks good.
clang r199888, gcc r207025
flags: -O2 -fsanitize=address
machine: Dell 3500 (Intel(R) Xeon(R) CPU W3690  @ 3.47GHz)

                     clang       gcc   diff
   400.perlbench,  1286.00,    -1.00, -0.00
       401.bzip2,   857.00,   940.00,  1.10
         403.gcc,   621.00,   606.00,  0.98
         429.mcf,   578.00,   574.00,  0.99
       445.gobmk,   860.00,   850.00,  0.99
       456.hmmer,   880.00,  1149.00,  1.31
       458.sjeng,   992.00,   996.00,  1.00
  462.libquantum,   492.00,   483.00,  0.98
     464.h264ref,  1274.00,  3998.00,  3.14
     471.omnetpp,   566.00,   569.00,  1.01
       473.astar,   661.00,   647.00,  0.98
   483.xalancbmk,   478.00,   491.00,  1.03
        433.milc,   620.00,   611.00,  0.99
        444.namd,   601.00,   528.00,  0.88
      447.dealII,   624.00,   670.00,  1.07
      450.soplex,   366.00,   389.00,  1.06
      453.povray,   430.00,   374.00,  0.87
         470.lbm,   355.00,   452.00,  1.27
     482.sphinx3,   926.00,  1108.00,  1.20


[Bug sanitizer/55309] gcc's address-sanitizer 66% slower than clang's

2013-02-28 Thread kcc at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55309

--- Comment #57 from Kostya Serebryany  2013-02-28 11:31:54 UTC ---
I've created a page that describes how I run SPEC with asan.
There is also a patch that works around the known SPEC bugs.

https://code.google.com/p/address-sanitizer/wiki/RunningSpecBenchmarks
https://code.google.com/p/address-sanitizer/source/browse/trunk/spec/spec2006-asan.patch


[Bug sanitizer/55309] gcc's address-sanitizer 66% slower than clang's

2013-02-25 Thread kcc at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55309

--- Comment #56 from Kostya Serebryany  2013-02-26 07:43:19 UTC ---
http://llvm.org/viewvc/llvm-project?rev=176078&view=rev
makes the memcmp interceptor more aggressive, so that clang finds this bug
in perlbmk too.


[Bug sanitizer/55309] gcc's address-sanitizer 66% slower than clang's

2013-02-22 Thread joseph at codesourcery dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55309

--- Comment #55 from joseph at codesourcery dot com  2013-02-22 16:10:49 UTC ---
I believe the arguments to memcmp must point to objects with at least the
given number of bytes.  (For strcmp, they must point to NUL-terminated
strings.  For strncmp, they must point to objects that either have at
least the given number of bytes or have bytes present up to a NUL within
that number of bytes - there's no guarantee that comparison stops early
when characters differ except for not reading after a NUL.  By comparison,
the array passed to memchr may be shorter than the given length if a
matching character is found early - see the wording added in C11 for
memchr for alignment with POSIX.  But memcmp has no such special rule.)
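To make the strncmp/memcmp distinction concrete, here is a minimal C sketch using the "perlio"/"unix" strings from this bug. The function names are hypothetical; the sketch assumes only standard string.h semantics as described above. strncmp may be given a bound of 6 because "unix" has a NUL within those 6 bytes, whereas a valid memcmp call has to clamp the length to the smaller object.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: valid because strncmp never reads past a NUL, and
   "unix" (5 bytes including its NUL) has a NUL within the 6-byte bound. */
int cmp_with_strncmp(void) {
    return strncmp("perlio", "unix", 6);
}

/* Hypothetical helper: memcmp has no "stop early" rule, so the length must
   not exceed the size of either object; clamp it to the smaller array. */
int cmp_with_memcmp(void) {
    static const char a[] = "perlio";   /* sizeof a == 7 */
    static const char b[] = "unix";     /* sizeof b == 5 */
    size_t n = sizeof a < sizeof b ? sizeof a : sizeof b;
    return memcmp(a, b, n);
}
```

Both calls return a negative value, since 'p' sorts before 'u'; only the sign is meaningful, not the magnitude.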


[Bug sanitizer/55309] gcc's address-sanitizer 66% slower than clang's

2013-02-22 Thread jakub at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55309

--- Comment #54 from Jakub Jelinek  2013-02-22 15:13:34 UTC ---
gcc instruments many of the builtins inline, on the assumption that the
builtins are often expanded inline and thus the interceptor might not be called
at all.  Either it isn't, or it is and the gcc instrumentation is done in addition
to the interceptor's instrumentation.


[Bug sanitizer/55309] gcc's address-sanitizer 66% slower than clang's

2013-02-22 Thread kcc at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55309

--- Comment #53 from Kostya Serebryany  2013-02-22 15:06:25 UTC ---
The interceptor we have is conservative:

INTERCEPTOR(int, memcmp, const void *a1, const void *a2, uptr size) {
  if (!asan_inited) return internal_memcmp(a1, a2, size);
  ENSURE_ASAN_INITED();
  unsigned char c1 = 0, c2 = 0;
  const unsigned char *s1 = (const unsigned char*)a1;
  const unsigned char *s2 = (const unsigned char*)a2;
  uptr i;
  for (i = 0; i < size; i++) {
    c1 = s1[i];
    c2 = s2[i];
    if (c1 != c2) break;
  }
  ASAN_READ_RANGE(s1, Min(i + 1, size));
  ASAN_READ_RANGE(s2, Min(i + 1, size));
  return CharCmp(c1, c2);
}

Looks like gcc partially inlines memcmp and
bypasses our conservative interceptor.

We could make the interceptor more strict (ASAN_READ_RANGE(s2, size);).
I am trying to remember why we didn't do this...
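A small model of the difference, written as plain C rather than real interceptor code (conservative_read_len and strict_read_len are hypothetical names). For the perlio/unix reproducer from comment #50, the conservative variant reports only 1 byte read from each buffer because the first bytes already differ, while the stricter ASAN_READ_RANGE(s2, size) variant would report all 6 bytes. The strict variant appears closer to what gcc's inline instrumentation checks.

```c
#include <stddef.h>

/* Model of the conservative interceptor: the number of bytes reported as
   read is Min(i + 1, size), where i is the index of the first mismatch. */
size_t conservative_read_len(const unsigned char *s1, const unsigned char *s2,
                             size_t size) {
    size_t i;
    for (i = 0; i < size; i++)
        if (s1[i] != s2[i])
            break;
    return i + 1 < size ? i + 1 : size;
}

/* Model of the strict variant: the whole requested length is reported. */
size_t strict_read_len(size_t size) {
    return size;
}
```

With identical inputs, both models report the full length, so they only disagree when an early mismatch lets the conservative version stop short.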


[Bug sanitizer/55309] gcc's address-sanitizer 66% slower than clang's

2013-02-22 Thread jakub at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55309

Jakub Jelinek  changed:

   What|Removed |Added

 CC||jsm28 at gcc dot gnu.org

--- Comment #52 from Jakub Jelinek  2013-02-22 15:03:31 UTC ---
CCing Joseph for expert opinion on whether memcmp ("abcdef", "qrst", 6); is
valid C99.


[Bug sanitizer/55309] gcc's address-sanitizer 66% slower than clang's

2013-02-22 Thread jakub at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55309

--- Comment #51 from Jakub Jelinek  2013-02-22 15:01:08 UTC ---
Looks like a real SPEC bug to me.

PerlIO_funcs *
PerlIO_find_layer(pTHX_ const char *name, STRLEN len, int load)
{
    IV i;
    if ((SSize_t) len <= 0)
        len = strlen(name);
    for (i = 0; i < PL_known_layers->cur; i++) {
        PerlIO_funcs *f = PL_known_layers->array[i].funcs;
        if (memEQ(f->name, name, len) && f->name[len] == 0) {
            PerlIO_debug("%.*s => %p\n", (int) len, name, (void*)f);
            return f;
        }
    }

memEQ is memcmp, and my reading of ISO C99 or
http://pubs.opengroup.org/onlinepubs/9699919799/functions/memcmp.html is that
it is a bug to call memcmp ("abcdef", "defg", 6).  A valid memcmp
implementation could preread all bytes from both arrays (of the given length)
and only then compare.  And at least some implementations (e.g. glibc
string/memcmp.c) do that if the two strings don't start at the same
address modulo the word size.
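One way to express the intended check without ever asking memcmp to read past either object is to compare lengths first. This is a hypothetical helper, not the patch actually applied to SPEC. It assumes, as in PerlIO_find_layer, that layer_name is NUL-terminated and that name points to at least len bytes.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns nonzero if the NUL-terminated layer_name is
   exactly the first len bytes of name.  The strlen check short-circuits, so
   memcmp only runs when both objects are known to hold at least len bytes. */
int layer_name_matches(const char *layer_name, const char *name, size_t len) {
    return strlen(layer_name) == len && memcmp(layer_name, name, len) == 0;
}
```

For the failing case, layer_name_matches("unix", "perlio", 6) returns 0 without calling memcmp at all.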


[Bug sanitizer/55309] gcc's address-sanitizer 66% slower than clang's

2013-02-22 Thread kcc at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55309

--- Comment #50 from Kostya Serebryany  2013-02-22 14:54:24 UTC ---
reproducer:

#include <stdio.h>
#include <string.h>
int foo(const char *x, const char *y, int len) {
  return memcmp(x, y, len);
}
int main() {
  printf("%d\n", foo("perlio", "unix", 6));
}

clang does not report a warning here, but gcc does.
This is a gray area for me; not sure if we should treat this as buggy code.

On one hand, memcmp gets size=6, while one of the buffers is smaller.
otoh, the first bytes of the strings are different and memcmp should not read
the rest.

I vaguely remember some similar case where we decided that the code is correct.
Anyone?


[Bug sanitizer/55309] gcc's address-sanitizer 66% slower than clang's

2013-02-22 Thread kcc at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55309

--- Comment #49 from Kostya Serebryany  2013-02-22 14:29:27 UTC ---
with -gdwarf-3:
==11621== ERROR: AddressSanitizer: global-buffer-overflow on address 0x0078e2a5 at pc 0x4e47d7 bp 0x7fff553d4cc0 sp 0x7fff553d4cb8
READ of size 1 at 0x0078e2a5 thread T0
    #0 0x4e47d6 in PerlIO_find_layer perlio.c:751
    #1 0x4e63e6 in PerlIO_default_buffer perlio.c:1015
    #2 0x4e678e in PerlIO_default_layers perlio.c:1113
    #3 0x4e7a41 in PerlIO_resolve_layers perlio.c:1433
    #4 0x4e8145 in PerlIO_openn perlio.c:1519
    #5 0x4f5c08 in PerlIO_fdopen perlio.c:4745
    #6 0x4e68a3 in PerlIO_stdstreams perlio.c:1150
    #7 0x4f5b46 in Perl_PerlIO_stdin perlio.c:4686
    #8 0x4dd7ee in S_open_script perl.c:3348
    #9 0x4d3be6 in S_parse_body perl.c:1718
    #10 0x4d2a4b in perl_parse perl.c:1312
    #11 0x4f6ee8 in main perlmain.c:96
    #12 0x7f686f32576c in __libc_start_main libc-start.c:226
    #13 0x4037d8 in _start ??:0
0x0078e2a5 is located 0 bytes to the right of global variable '*.LC50 (perlio.c)' (0x78e2a0) of size 5
  '*.LC50 (perlio.c)' is ascii string 'unix'
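The report can be read against ASan's shadow encoding. The 'unix' literal occupies 0x78e2a0..0x78e2a4 (5 bytes), so the 1-byte read at 0x78e2a5 is the first byte past it. The sketch below is a simplified model of the slow-path check, not libasan code. It assumes the usual 8-byte granule encoding: 0 means fully addressable, 1..7 means only the first k bytes are addressable, and a negative value marks a redzone. The particular redzone value used in the test is illustrative.

```c
#include <stdbool.h>
#include <stdint.h>

/* Simplified model: is an access of size 1, 2 or 4 bytes at addr (not
   crossing an 8-byte granule) poisoned, given the granule's shadow byte? */
bool access_is_poisoned(uint64_t addr, unsigned size, int8_t shadow) {
    if (shadow == 0)
        return false;                 /* all 8 bytes addressable */
    /* Offset of the last accessed byte within its 8-byte granule. */
    int last = (int)(addr & 7) + (int)size - 1;
    return last >= shadow;            /* negative shadow: always poisoned */
}
```

With a shadow byte of 5 for the granule at 0x78e2a0, a read at 0x78e2a4 passes and a read at 0x78e2a5 is flagged, which matches the report above.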


[Bug sanitizer/55309] gcc's address-sanitizer 66% slower than clang's

2013-02-22 Thread Joost.VandeVondele at mat dot ethz.ch
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55309

--- Comment #48 from Joost VandeVondele  2013-02-22 13:55:16 UTC ---
(In reply to comment #47)
>
> Interestingly, the symbolization/debuginfo seems to be completely broken :(
>
I've tried compiling with -gdwarf-3, with some luck.


[Bug sanitizer/55309] gcc's address-sanitizer 66% slower than clang's

2013-02-22 Thread kcc at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55309

--- Comment #47 from Kostya Serebryany  2013-02-22 13:52:12 UTC ---
(In reply to comment #46)
> (In reply to comment #43)
> > 400.perlbench fails with a global-buffer-overflow which clang does not detect.
> > I did not investigate why. It could be a gcc false positive or clang false
> > negative.
>
> On which file/function the global-buffer-overflow was?  Can you send me the
> asan diagnostics?

Interestingly, the symbolization/debuginfo seems to be completely broken :(

% g++ -g -fsanitize=address ./use-after-free.cc -static-libasan ; ./a.out  2>&1 | grep '#0'
    #0 0x4179c2 (/home/kcc/tmp/a.out+0x4179c2)
    #0 0x40f18a (/home/kcc/tmp/a.out+0x40f18a)
    #0 0x40f26a (/home/kcc/tmp/a.out+0x40f26a)
% addr2line -f -e ./a.out 0x4179c2 0x40f18a 0x40f26a
main
??:0
free
??:0
malloc
??:0
%

==580== ERROR: AddressSanitizer: global-buffer-overflow on address 0x0078e2a5 at pc 0x4e47d7 bp 0x7fffa2fbc7b0 sp 0x7fffa2fbc7a8
READ of size 1 at 0x0078e2a5 thread T0
    #0 0x4e47d6 in PerlIO_find_layer (benchspec/CPU2006/400.perlbench/run/run_base_train_z./perlbench_base.z+0x4e47d6)
    #1 0x4e63e6 in PerlIO_default_buffer (benchspec/CPU2006/400.perlbench/run/run_base_train_z./perlbench_base.z+0x4e63e6)
    #2 0x4e678e in PerlIO_default_layers (benchspec/CPU2006/400.perlbench/run/run_base_train_z./perlbench_base.z+0x4e678e)
    #3 0x4e7a41 in PerlIO_resolve_layers (benchspec/CPU2006/400.perlbench/run/run_base_train_z./perlbench_base.z+0x4e7a41)
    #4 0x4e8145 in PerlIO_openn (benchspec/CPU2006/400.perlbench/run/run_base_train_z./perlbench_base.z+0x4e8145)
    #5 0x4f5d32 in PerlIO_open (benchspec/CPU2006/400.perlbench/run/run_base_train_z./perlbench_base.z+0x4f5d32)
    #6 0x4dd808 in S_open_script (benchspec/CPU2006/400.perlbench/run/run_base_train_z./perlbench_base.z+0x4dd808)
    #7 0x4d3be6 in S_parse_body (benchspec/CPU2006/400.perlbench/run/run_base_train_z./perlbench_base.z+0x4d3be6)
    #8 0x4d2a4b in perl_parse (benchspec/CPU2006/400.perlbench/run/run_base_train_z./perlbench_base.z+0x4d2a4b)
    #9 0x4f6ee8 in main (benchspec/CPU2006/400.perlbench/run/run_base_train_z./perlbench_base.z+0x4f6ee8)
    #10 0x7fd3a245376c (/lib/x86_64-linux-gnu/libc.so.6+0x2176c)
    #11 0x4037d8 (benchspec/CPU2006/400.perlbench/run/run_base_train_z./perlbench_base.z+0x4037d8)
0x0078e2a5 is located 0 bytes to the right of global variable '*.LC50 (perlio.c)' (0x78e2a0) of size 5
  '*.LC50 (perlio.c)' is ascii string 'unix'
SUMMARY: AddressSanitizer: global-buffer-overflow ??:0 PerlIO_find_layer

>
> > 464.h264ref is VERY slow, I did not look why.
>
> And it didn't fail on that:
> for (dd=d[k=0]; k<16; dd=d[++k])
> {
>   satd += (dd < 0 ? -dd : dd);
> }
> or have you fixed that up in your SPEC sources?

Interestingly, no. I haven't touched SPEC sources here.
Maybe gcc does a full unroll, thus eliminating the buggy read (I did not check).


[Bug sanitizer/55309] gcc's address-sanitizer 66% slower than clang's

2013-02-22 Thread jakub at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55309

--- Comment #46 from Jakub Jelinek  2013-02-22 13:09:10 UTC ---
(In reply to comment #43)
> 400.perlbench fails with a global-buffer-overflow which clang does not detect.
> I did not investigate why. It could be a gcc false positive or clang false
> negative.

On which file/function the global-buffer-overflow was?  Can you send me the
asan diagnostics?

> 464.h264ref is VERY slow, I did not look why.

And it didn't fail on that:
for (dd=d[k=0]; k<16; dd=d[++k])
{
  satd += (dd < 0 ? -dd : dd);
}
or have you fixed that up in your SPEC sources?
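The quoted loop reads one element too many. After the 16th iteration, the increment expression dd=d[++k] sets k to 16 and loads d[16] before the k<16 test ends the loop, which is one past the end of a 16-element array. A minimal sketch of an equivalent loop that only reads d[0]..d[15] is shown below. satd16 is a hypothetical name, and int coefficients are an assumption.

```c
/* Sum of absolute values of 16 coefficients.  The element is loaded inside
   the body, after k < 16 has been checked, so d[16] is never read. */
int satd16(const int d[16]) {
    int satd = 0;
    for (int k = 0; k < 16; k++) {
        int dd = d[k];
        satd += (dd < 0 ? -dd : dd);
    }
    return satd;
}
```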


[Bug sanitizer/55309] gcc's address-sanitizer 66% slower than clang's

2013-02-22 Thread kcc at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55309

--- Comment #45 from Kostya Serebryany  2013-02-22 08:36:14 UTC ---
> I'm wondering if the failure goes away compiled with -O0 instead ?
No, the failure is still present with -O0


[Bug sanitizer/55309] gcc's address-sanitizer 66% slower than clang's

2013-02-22 Thread Joost.VandeVondele at mat dot ethz.ch
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55309

Joost VandeVondele  changed:

   What|Removed |Added

 CC||Joost.VandeVondele at mat
   ||dot ethz.ch

--- Comment #44 from Joost VandeVondele  2013-02-22 08:31:11 UTC ---
(In reply to comment #43)
> 400.perlbench fails with a global-buffer-overflow which clang does not detect.

I'm wondering if the failure goes away compiled with -O0 instead ?


[Bug sanitizer/55309] gcc's address-sanitizer 66% slower than clang's

2013-02-21 Thread kcc at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55309

--- Comment #43 from Kostya Serebryany  2013-02-22 07:11:06 UTC ---
gcc r196201:  -O2 -fno-aggressive-loop-optimizations
clang 175735: -O2

x86_64 linux, both are using the new 7fff8000 shadow offset

                     clang       gcc   diff
   400.perlbench,  1136.00,    -1.00, -0.00
       401.bzip2,   838.00,  1154.00,  1.38
         403.gcc,   716.00,   742.00,  1.04
         429.mcf,   582.00,   578.00,  0.99
       445.gobmk,   801.00,  1138.00,  1.42
       456.hmmer,  1277.00,  1515.00,  1.19
       458.sjeng,   869.00,  1258.00,  1.45
  462.libquantum,   532.00,   469.00,  0.88
     464.h264ref,  1303.00,  4395.00,  3.37
     471.omnetpp,   568.00,   585.00,  1.03
       473.astar,   647.00,   748.00,  1.16
   483.xalancbmk,   460.00,   534.00,  1.16
        433.milc,   659.00,   614.00,  0.93
        444.namd,   592.00,   531.00,  0.90
      447.dealII,   614.00,   706.00,  1.15
      450.soplex,   367.00,   406.00,  1.11
      453.povray,   423.00,   410.00,  0.97
         470.lbm,   377.00,   401.00,  1.06
     482.sphinx3,   958.00,  1325.00,  1.38

400.perlbench fails with a global-buffer-overflow which clang does not detect.
I did not investigate why. It could be a gcc false positive or clang false
negative.

464.h264ref is VERY slow, I did not look why.


[Bug sanitizer/55309] gcc's address-sanitizer 66% slower than clang's

2013-02-12 Thread howarth at nitro dot med.uc.edu
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55309

--- Comment #42 from Jack Howarth  2013-02-12 14:41:56 UTC ---
(In reply to comment #41)

FYI, most of the codegen issues with xplor-nih compiled with gfortran can be
suppressed with -fno-tree-vectorize at -O3 (hence my interest in a functional
libasan on darwin).


[Bug sanitizer/55309] gcc's address-sanitizer 66% slower than clang's

2013-02-12 Thread jakub at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55309

--- Comment #41 from Jakub Jelinek  2013-02-12 14:11:28 UTC ---
That is definitely stage1 material, and a lot of work, especially to teach the
vectorizer how to deal with these.  And, we don't want to introduce the asan
instrumentation too late, e.g. vectorization often reads even from memory
outside of what the source code actually accesses, when it e.g. knows it is
sufficiently aligned and won't cause crashes.  That would be false positives
for asan.
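The following is a minimal sketch of why such over-reads are harmless to the hardware but still visible to asan. It assumes a 4096-byte page, and the function name is hypothetical. A 16-byte load from a 16-byte-aligned address can never cross a page boundary, so reading past the end of an object that way cannot fault. However, the extra bytes may be redzone bytes that asan would report if the instrumentation ran after vectorization.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Does a load of width bytes starting at addr stay within one page?
   page_size is a parameter; 4096 is typical on x86_64 but is an assumption. */
bool load_stays_in_page(uintptr_t addr, size_t width, size_t page_size) {
    return addr / page_size == (addr + width - 1) / page_size;
}
```

An aligned 16-byte load at 0x1ff0 stays within its page; an unaligned one at 0x1ff8 would straddle the boundary at 0x2000.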


[Bug sanitizer/55309] gcc's address-sanitizer 66% slower than clang's

2013-02-12 Thread howarth at nitro dot med.uc.edu
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55309

Jack Howarth  changed:

   What|Removed |Added

 CC||howarth at nitro dot
   ||med.uc.edu

--- Comment #40 from Jack Howarth  2013-02-12 14:00:15 UTC ---
(In reply to comment #23)
> #1 afaict, the asan pass happens in the middle of the gcc optimization flow.
> imho it should happen as late as possible so that the instrumentation
> happens on fully optimized code.

I can confirm this is the case from my experiments compiling xplor-nih with
-fsanitize=address. This code is habitually miscompiled by gfortran at the
higher optimization levels. Adding the -fsanitize=address flag to
the build suppresses most of the xplor-nih testsuite failures, indicating that
it has changed the code optimization in gfortran. Is there any chance of moving
the asan pass, or is that definitely stage 1 material?


[Bug sanitizer/55309] gcc's address-sanitizer 66% slower than clang's

2013-02-12 Thread jakub at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55309

--- Comment #39 from Jakub Jelinek  2013-02-12 11:42:33 UTC ---
So, if Darwin keeps the old 1ULL << 44, then the corresponding gcc change (to
be applied together with asan merge) would be something like (untested):
--- gcc/sanitizer.def	2013-01-11 09:02:37.879637130 +0100
+++ gcc/sanitizer.def	2013-02-12 12:39:12.743272092 +0100
@@ -27,7 +27,7 @@ along with GCC; see the file COPYING3.
    for other FEs by asan.c.  */
 
 /* Address Sanitizer */
-DEF_SANITIZER_BUILTIN(BUILT_IN_ASAN_INIT, "__asan_init",
+DEF_SANITIZER_BUILTIN(BUILT_IN_ASAN_INIT, "__asan_init_v1",
                       BT_FN_VOID, ATTR_NOTHROW_LEAF_LIST)
 /* Do not reorder the BUILT_IN_ASAN_REPORT* builtins, e.g. cfgcleanup.c
    relies on this order.  */
--- gcc/config/i386/i386.c	2013-02-12 11:23:35.400193705 +0100
+++ gcc/config/i386/i386.c	2013-02-12 12:38:30.775503155 +0100
@@ -5436,7 +5436,9 @@ ix86_legitimate_combined_insn (rtx insn)
 static unsigned HOST_WIDE_INT
 ix86_asan_shadow_offset (void)
 {
-  return (unsigned HOST_WIDE_INT) 1 << (TARGET_LP64 ? 44 : 29);
+  return TARGET_LP64 ? (TARGET_MACHO ? (HOST_WIDE_INT_1 << 44)
+                                     : HOST_WIDE_INT_C (0x7fff8000))
+                     : (HOST_WIDE_INT_1 << 29);
 }
 
 /* Argument support functions.  */


[Bug sanitizer/55309] gcc's address-sanitizer 66% slower than clang's

2013-02-12 Thread kcc at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55309

Kostya Serebryany  changed:

   What|Removed |Added

 CC||glider at google dot com

--- Comment #38 from Kostya Serebryany  2013-02-12 11:31:20 UTC ---
Unfortunately, this does not work on Mac, so we will have to keep the old
mapping on Mac. gr


[Bug sanitizer/55309] gcc's address-sanitizer 66% slower than clang's

2013-02-12 Thread kcc at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55309

--- Comment #37 from Kostya Serebryany  2013-02-12 11:17:45 UTC ---
http://llvm.org/viewvc/llvm-project?rev=174957&view=rev (and r174958)
change the default offset for x86_64 to 7fff8000
and change __asan_init to __asan_init_v1


[Bug sanitizer/55309] gcc's address-sanitizer 66% slower than clang's

2013-02-12 Thread kcc at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55309

--- Comment #36 from Kostya Serebryany  2013-02-12 08:58:56 UTC ---
> I see, but then you could use the global vars (perhaps weak ones in libasan
> with some default), combined together with arguments to __asan_init (or some
> alternative name of the same function for compatibility).  All that it would do
> beyond normal initialization would be complain if the requested scale/offset
> pair is different from the chosen one.

Maybe we could add calls to e.g.
  __asan_check_abi_mismatch(uptr a1, uptr a2, uptr a3, uptr a4, uptr a5, uptr a6)
after every call to __asan_init
(a1 == offset, a2 == shift, a3 == something_else, etc)

If any of a1..a6 is different between the calls to __asan_check_abi_mismatch,
fire an error.
WDYT?
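The proposed check can be sketched in a few lines. This is a hypothetical illustration, not a libasan interface: abi_params and abi_params_consistent are invented names, and only the offset and shift (a1 and a2 above) are modelled. The first caller records its parameters. Every later caller must pass the same values, and a real implementation would report an error instead of returning false.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-module ABI parameters (a1 and a2 in the proposal). */
typedef struct {
    uint64_t offset;   /* shadow offset */
    uint64_t shift;    /* shadow scale */
} abi_params;

static bool have_params = false;
static abi_params recorded;

/* Returns true if p agrees with the parameters recorded by the first caller. */
bool abi_params_consistent(abi_params p) {
    if (!have_params) {
        recorded = p;
        have_params = true;
        return true;
    }
    return recorded.offset == p.offset && recorded.shift == p.shift;
}
```

For example, a module built for 0x7fff8000 followed by one built for the old 1 << 44 mapping would be caught on the second call.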


[Bug sanitizer/55309] gcc's address-sanitizer 66% slower than clang's

2013-02-12 Thread dvyukov at google dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55309

--- Comment #35 from Dmitry Vyukov  2013-02-12 08:47:21 UTC ---
On Tue, Feb 12, 2013 at 12:39 PM, jakub at gcc dot gnu.org
 wrote:
>
> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55309
>
> --- Comment #34 from Jakub Jelinek  2013-02-12 08:39:33 UTC ---
> (In reply to comment #32)
>> Good news, 0x7fff8000 seems great:
>> There is another suggestion (from dvyukov) to use -Wl,-Ttext-segment=0x4000
>> together with zerobase (pie is not required) which is worth investigating.
>
> Glad to hear that.  The disadvantage of
> -Wl,-Ttext-segment=0x4000 is that it requires special command line option
> for building the executable, i.e. you can't e.g. just build some shared library
> with -fsanitize=address and leave the main executable non-instrumented.
> Plus, I don't see how can
> -Wl,-Ttext-segment=0x4000 be used for x86_64, where you need 16TB of shadow
> memory for >> 3 scale.  For zero shadow offset you'd need to place the
> executable above 16TB, and that implies non-small model.

It is intended for x86_64. The binary is situated at 0x4000 and
its shadow is at 0x1000-0x3fff (MAP_32BIT can live here as
well).
Dynamic libraries and mmap live either at 0x7fxx or at
0x55xx, that is mapped way above the executable. So there are
no overlaps.



[Bug sanitizer/55309] gcc's address-sanitizer 66% slower than clang's

2013-02-12 Thread jakub at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55309

--- Comment #34 from Jakub Jelinek  2013-02-12 08:39:33 UTC ---
(In reply to comment #32)
> Good news, 0x7fff8000 seems great:
> There is another suggestion (from dvyukov) to use -Wl,-Ttext-segment=0x4000
> together with zerobase (pie is not required) which is worth investigating.

Glad to hear that.  The disadvantage of
-Wl,-Ttext-segment=0x4000 is that it requires special command line option
for building the executable, i.e. you can't e.g. just build some shared library
with -fsanitize=address and leave the main executable non-instrumented.
Plus, I don't see how can
-Wl,-Ttext-segment=0x4000 be used for x86_64, where you need 16TB of shadow
memory for >> 3 scale.  For zero shadow offset you'd need to place the
executable above 16TB, and that implies non-small model.
If -Ttext-segment is meant for 32-bit programs, then it could allow zero shadow
offset, but with the disadvantage of special building of executables, and on
i?86 the offset already fits into the immediates, so it is basically the
0x7fff8000 case for x86_64 already.
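The 16TB figure follows from the 8-to-1 scale. This assumes a 47-bit (128 TiB) user address space on x86_64. With a zero offset, the shadow of [0, 2^47) is [0, 2^44), which is the first 16 TiB of the address space. The hypothetical one-line helper below makes the arithmetic explicit.

```c
#include <stdint.h>

/* With shadow = addr >> 3 and a zero offset, the shadow of [0, app_end)
   is [0, app_end >> 3); this returns the end of that shadow range. */
uint64_t zero_offset_shadow_end(uint64_t app_end) {
    return app_end >> 3;
}
```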



(In reply to comment #33)
> > , it might be better to have the scale
> > and offset as arguments of __asan_init?
>
> We did this in the very early version, but it did not work in general.
> Consider you are linking your program with a third-party object
> not built with asan. It may have constructor functions called before main and
> before __asan_init, and those functions call malloc which has to
> call __asan_init, but can not pass arguments.

I see, but then you could use the global vars (perhaps weak ones in libasan
with some default), combined together with arguments to __asan_init (or some
alternative name of the same function for compatibility).  All that it would do
beyond normal initialization would be complain if the requested scale/offset
pair is different from the chosen one.


[Bug sanitizer/55309] gcc's address-sanitizer 66% slower than clang's

2013-02-11 Thread kcc at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55309

--- Comment #33 from Kostya Serebryany  2013-02-12 07:02:40 UTC ---
(In reply to comment #31)
> If the mapping is so flexible, how can you detect mismatches?  Different scale
> or shadow offsets are ABI incompatible...
We don't detect mismatches.
This has never been a problem for our users (who build everything from scratch)
but we do see it as a coming problem as asan is becoming more popular.

(in reply to comment from another bug)
> Perhaps instead of global vars defined outside of libasan (which e.g. requires
> GOT accesses to those vars in libasan)

Accessing these vars was never a perf problem (we run asan with perf regularly)

> , it might be better to have the scale
> and offset as arguments of __asan_init?

We did this in the very early version, but it did not work in general.
Consider you are linking your program with a third-party object
not built with asan. It may have constructor functions called before main and
before __asan_init, and those functions call malloc which has to
call __asan_init, but can not pass arguments.
In some cases we can use .preinit_array to call __asan_init there, but
that is not always available (?).

We were (and still are) thinking about encoding the abi version in the name
of the init function, e.g. __asan_init_v_123.
It will help us detect abi mismatches when two objects are instrumented with
different generations of asan.
This doesn't solve the problem of using different offsets though.

> Then you could easily test at runtime,
> whether all compilation units agree on the same offset/scale, and complain if
> they don't.  Then __asan_mapping_offset and __asan_mapping_scale or how are the
> vars called could be hidden attribute, used with PC relative addressing and
> avoid one extra indirection, and more importantly have better runtime checking
> of mismatches.


[Bug sanitizer/55309] gcc's address-sanitizer 66% slower than clang's

2013-02-11 Thread kcc at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55309

--- Comment #32 from Kostya Serebryany  2013-02-12 06:47:56 UTC ---
Good news, 0x7fff8000 seems great:

t0: orig
t1: short offset (0x7fff8000)
t2: zero offset + pie

                        t0        t1  t1/t0        t2  t2/t0  t2/t1
---
   400.perlbench,  1206.00,  1151.00,  0.95,  1192.00,  0.99,  1.04
       401.bzip2,   884.00,   842.00,  0.95,   821.00,  0.93,  0.98
         403.gcc,   738.00,   722.00,  0.98,   716.00,  0.97,  0.99
         429.mcf,   609.00,   596.00,  0.98,   586.00,  0.96,  0.98
       445.gobmk,   844.00,   804.00,  0.95,   809.00,  0.96,  1.01
       456.hmmer,  1304.00,  1223.00,  0.94,  1235.00,  0.95,  1.01
       458.sjeng,   916.00,   868.00,  0.95,   897.00,  0.98,  1.03
  462.libquantum,   547.00,   535.00,  0.98,   534.00,  0.98,  1.00
     464.h264ref,  1328.00,  1313.00,  0.99,  1265.00,  0.95,  0.96
     471.omnetpp,   628.00,   601.00,  0.96,   596.00,  0.95,  0.99
       473.astar,   665.00,   646.00,  0.97,   657.00,  0.99,  1.02
   483.xalancbmk,   480.00,   449.00,  0.94,   445.00,  0.93,  0.99
        433.milc,   709.00,   655.00,  0.92,   656.00,  0.93,  1.00
        444.namd,   636.00,   594.00,  0.93,   593.00,  0.93,  1.00
      447.dealII,   649.00,   615.00,  0.95,   637.00,  0.98,  1.04
      450.soplex,   390.00,   374.00,  0.96,   370.00,  0.95,  0.99
      453.povray,   452.00,   402.00,  0.89,   421.00,  0.93,  1.05
         470.lbm,   389.00,   378.00,  0.97,   387.00,  0.99,  1.02
     482.sphinx3,   980.00,   930.00,  0.95,   926.00,  0.94,  1.00

So, 0x7fff8000 seems to be a win, even compared to pie+zerobase.
We'll do some more testing and flip the switch in clang.

There is another suggestion (from dvyukov) to use -Wl,-Ttext-segment=0x4000
together with zerobase (pie is not required) which is worth investigating.


[Bug sanitizer/55309] gcc's address-sanitizer 66% slower than clang's

2013-02-11 Thread jakub at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55309

--- Comment #31 from Jakub Jelinek  2013-02-11 15:02:25 UTC ---
If the mapping is so flexible, how can you detect mismatches?  Different scale
or shadow offsets are ABI incompatible...


[Bug sanitizer/55309] gcc's address-sanitizer 66% slower than clang's

2013-02-11 Thread kcc at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55309

--- Comment #30 from Kostya Serebryany  2013-02-11 14:42:43 UTC ---
> Could we on x86_64 think about mem_to_shadow(x) (x >> 3) + 0x7fff8000

Committed http://llvm.org/viewvc/llvm-project?rev=174886&view=rev
which adds an optional flag -mllvm -asan-short-64bit-mapping-offset=1

On bzip2/train it gives us ~ 2/3 of the zero-base-offset benefits:
              orig  0x7fff8000    zero
401.bzip2,   68.80,      64.80,  62.70

Measuring the rest.

Note that with clang this did not require any change in the run-time
(since recently we switched to ASAN_FLEXIBLE_MAPPING_AND_OFFSET=1)
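The "~ 2/3" figure can be checked from the three bzip2/train times quoted above. The short offset saves 68.80 - 64.80 = 4.00, while the zero offset saves 68.80 - 62.70 = 6.10. The ratio is therefore about 0.66. Expressed as a trivial C helper (the name is hypothetical):

```c
/* Fraction of the zero-offset speedup that the short offset recovers. */
double fraction_of_benefit(double orig, double short_off, double zero) {
    return (orig - short_off) / (orig - zero);
}
```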


[Bug sanitizer/55309] gcc's address-sanitizer 66% slower than clang's

2013-02-08 Thread jakub at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55309

--- Comment #29 from Jakub Jelinek  2013-02-08 09:25:22 UTC ---
I think not in the default memory model; it can support only the first 2GB of
code+data.  Otherwise you couldn't call from the start of the executable to a
function at the end of it (if the text segment is bigger than 2GB), or reference
data located at the end of the data segment from a function at the start of the
executable.
So, with the zero offset model, your restriction on programs would essentially be
that non-PIE executables are unsupported (i.e. -mcmodel={small,medium,large} are
all unsupported).  With 0x7fff8000 (or perhaps even 0x7000), non-PIE executables
with -mcmodel=medium are unsupported, and -mcmodel=large is unsupported unless
linked to an address above the shadow memory end.  -mcmodel=small is supported.


[Bug sanitizer/55309] gcc's address-sanitizer 66% slower than clang's

2013-02-08 Thread kcc at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55309

--- Comment #28 from Kostya Serebryany  2013-02-08 09:13:27 UTC ---
> Could we on x86_64 think about mem_to_shadow(x) (x >> 3) + 0x7fff8000 (note,
> not |, but +)?

That sounds compelling, but I'm afraid we may have binaries with 2G of
text+globals. (!!)
Still, worth investigating.

I agree with your arguments about not everyone being willing to use -pie,
but many large projects already do this anyway (e.g. Chrome).


[Bug sanitizer/55309] gcc's address-sanitizer 66% slower than clang's

2013-02-08 Thread jakub at gcc dot gnu.org


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55309



--- Comment #27 from Jakub Jelinek  2013-02-08 
09:02:23 UTC ---

Zero based offset has the big disadvantage of imposing big requirements on the

executable.

Could we on x86_64 think about mem_to_shadow(x) (x >> 3) + 0x7fff8000 (note,

not |, but +)?

Then instead of something like:

        movq    %rdi, %rdx
        movabsq $17592186044416, %rax
        shrq    $3, %rdx
        cmpb    $0, (%rdx,%rax)
        jne     .L5
        movq    (%rdi), %rax
        ret
.L5:
        pushq   %rax
        call    __asan_report_load8

we could emit:

        movq    %rdi, %rdx
        shrq    $3, %rdx
        cmpb    $0, 0x7fff8000(%rdx)
        jne     .L5
        movq    (%rdi), %rax
        ret
.L5:
        pushq   %rax
        call    __asan_report_load8

which is a sequence 7 bytes shorter, without the need for an extra register and
the not-so-cheap movabs insn.  By forcing PIE for everything, you are forcing
the PIC overhead of unnecessary extra indirections in many places (and on
non-x86_64 targets it is usually even more expensive).
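The size difference comes from where the offset can live. As a rough check (my own sketch, not code from GCC or the asan runtime), the C below compares the three offsets mentioned in this thread: 1 << 44 (the 17592186044416 loaded by movabsq above), the proposed 0x7fff8000, and zero. It assumes 47-bit user addresses; shadow_add, shadow_or and fits_disp32 are hypothetical helper names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Offsets discussed in this PR: 1 << 44 is the movabsq constant above,
   0x7fff8000 is the proposed small offset, 0 is the zero-based shadow. */
#define OFFSET_1_44     (1ULL << 44)
#define OFFSET_7FFF8000 0x7fff8000ULL
#define OFFSET_ZERO     0ULL

/* Assumed top of the 47-bit x86_64 user address space. */
#define USER_SPACE_END 0x7fffffffffffULL

static uint64_t shadow_add(uint64_t addr, uint64_t offset) {
  assert(addr <= USER_SPACE_END);
  return (addr >> 3) + offset;
}

static uint64_t shadow_or(uint64_t addr, uint64_t offset) {
  assert(addr <= USER_SPACE_END);
  return (addr >> 3) | offset;
}

/* An x86_64 memory operand can only add a sign-extended 32-bit
   displacement, so a (non-negative) offset folds into
   "cmpb $0, OFFSET(%rdx)" only if it is at most INT32_MAX. */
static bool fits_disp32(uint64_t offset) {
  return offset <= (uint64_t)INT32_MAX;
}
```

With 1 << 44 the shifted address never reaches bit 44, so "|" and "+" coincide, but the constant needs its own register. With 0x7fff8000 the constant fits in the displacement, but "|" would map addresses 0 and 0x40000 to the same shadow byte, which is why the proposal insists on "+".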


[Bug sanitizer/55309] gcc's address-sanitizer 66% slower than clang's

2013-02-07 Thread kcc at gcc dot gnu.org


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55309



--- Comment #26 from Kostya Serebryany  2013-02-08 
06:31:26 UTC ---

FTR: here is the perf data for zero-based offset (clang)

https://code.google.com/p/address-sanitizer/wiki/ZeroBasedShadow#Performance


[Bug sanitizer/55309] gcc's address-sanitizer 66% slower than clang's

2013-02-07 Thread dvyukov at google dot com


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55309



--- Comment #25 from Dmitry Vyukov  2013-02-07 
17:18:05 UTC ---

On Thu, Feb 7, 2013 at 9:00 PM, jakub at gcc dot gnu.org

 wrote:

>

> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55309

>

> --- Comment #24 from Jakub Jelinek  2013-02-07 
> 17:00:17 UTC ---

> (In reply to comment #23)

>> #1 afaict, the asan pass happens in the middle of the gcc optimization flow.

>> imho it should happen as late as possible so that the instrumentation

>> happens on fully optimized code.

>

> Our current plan for 4.9 is add __builtin_asan_mem_test (address, length,

> is_write) or similar builtin, where the current asan pass would just insert

> these builtins.  Then, we'd teach the alias oracle and other code about these

> builtins (that they shouldn't be optimized away, unless dominated by similar

> test on the same address with same or bigger length, without an intervening

> call that could free memory, and that they on the other side don't modify any

> memory), teach the vectorizer how to vectorize these builtins and look at other

> passes where it might prevent some optimizations (I guess vectorization will be

> the most important though).  And, finally have some later pass that will do the

> optimization Dodji just wrote, but on the builtins in the IL, with some

> propagation etc. (and could handle tsan builtins too), and then lower this

> special asan builtin to the shadow memory load + test + __asan_report*.



If a memory access is *post* dominated by a memory access to the same

location, then the first one can be eliminated even if there are

intervening function calls, because it's impossible to make an

unaddressable variable addressable again.

This is not true for tsan, though.


[Bug sanitizer/55309] gcc's address-sanitizer 66% slower than clang's

2013-02-07 Thread jakub at gcc dot gnu.org


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55309



--- Comment #24 from Jakub Jelinek  2013-02-07 
17:00:17 UTC ---

(In reply to comment #23)

> #1 afaict, the asan pass happens in the middle of the gcc optimization flow.

> imho it should happen as late as possible so that the instrumentation 

> happens on fully optimized code. 



Our current plan for 4.9 is to add __builtin_asan_mem_test (address, length,

is_write) or similar builtin, where the current asan pass would just insert

these builtins.  Then, we'd teach the alias oracle and other code about these

builtins (that they shouldn't be optimized away, unless dominated by similar

test on the same address with same or bigger length, without an intervening

call that could free memory, and that they, on the other hand, don't modify any

memory), teach the vectorizer how to vectorize these builtins and look at other

passes where it might prevent some optimizations (I guess vectorization will be

the most important though).  And, finally have some later pass that will do the

optimization Dodji just wrote, but on the builtins in the IL, with some

propagation etc. (and could handle tsan builtins too), and then lower this

special asan builtin to the shadow memory load + test + __asan_report*.



> #2 asan speed is very sensitive to quality of regalloc. It would be interesting

> (and useful anyway) to implement zero-offset-shadow

> (https://code.google.com/p/address-sanitizer/wiki/ZeroBasedShadow)

> and see how much it helps with performance. 

> If more than clang's 5% -- we have issues with regalloc, otherwise see #1


[Bug sanitizer/55309] gcc's address-sanitizer 66% slower than clang's

2013-02-06 Thread kcc at gcc dot gnu.org


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55309



--- Comment #23 from Kostya Serebryany  2013-02-07 
05:01:53 UTC ---

with the patch from comment 22 (all benchmarks, ref data): 

                      orig   patched  patched/orig
   400.perlbench,    -1.00,  1244.00,     -1244.00
       401.bzip2,  1189.00,  1137.00,         0.96
         403.gcc,   754.00,   750.00,         0.99
         429.mcf,   611.00,   610.00,         1.00
       445.gobmk,  1211.00,  1167.00,         0.96
       456.hmmer,  1834.00,  1501.00,         0.82
       458.sjeng,  1353.00,  1288.00,         0.95
  462.libquantum,   478.00,   480.00,         1.00
     464.h264ref,  1880.00,  1836.00,         0.98
     471.omnetpp,   621.00,   621.00,         1.00
       473.astar,   766.00,   763.00,         1.00
   483.xalancbmk,   515.00,   517.00,         1.00
        433.milc,   631.00,   625.00,         0.99
        444.namd,   538.00,   538.00,         1.00
      447.dealII,   716.00,   719.00,         1.00
      450.soplex,   421.00,   415.00,         0.99
      453.povray,   433.00,   429.00,         0.99
         470.lbm,   415.00,   411.00,         0.99
     482.sphinx3,  1377.00,  1343.00,         0.98



The average speedup is similar to what we saw with equivalent optimization in

clang. Strangely, 400.perlbench fails with a warning when built with trunk but

passes with this patch. I have not investigated this further yet.



If we are looking for greater speedup we need to perform more comprehensive 

research. I have two wild guesses (not supported by any data). 



#1 afaict, the asan pass happens in the middle of the gcc optimization flow.

imho it should happen as late as possible so that the instrumentation 

happens on fully optimized code. 

#2 asan speed is very sensitive to quality of regalloc. It would be interesting

(and useful anyway) to implement zero-offset-shadow

(https://code.google.com/p/address-sanitizer/wiki/ZeroBasedShadow)

and see how much it helps with performance. 

If more than clang's 5% -- we have issues with regalloc, otherwise see #1


[Bug sanitizer/55309] gcc's address-sanitizer 66% slower than clang's

2013-02-06 Thread dodji at gcc dot gnu.org


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55309



Dodji Seketeli  changed:



   What|Removed |Added



  Attachment #29366|0   |1

is obsolete||



--- Comment #22 from Dodji Seketeli  2013-02-06 
15:02:44 UTC ---

Created attachment 29370

  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=29370

Candidate patch to avoid duplicated intra bb instrumentation



> Trying this patch: 

> % cat inc.cc

> void foo(int *a) {

>   (*a)++;

> }

> % gcc -fsanitize=address -O2 inc.cc -S -o - | grep __asan_report

> call    __asan_report_load4

> call    __asan_report_store4

> % clang -fsanitize=address -O2 inc.cc -S -o - | grep __asan_report 

> callq   __asan_report_load4

> % 

> 

> Is this test expected to work (have one __asan_error call) with this patch?



The patch indeed (naively) considers read and write accesses as being

different, you are right.  I am attaching a patch that does not, and

that generates just one __asan_report call here.



It'd be nice to know if that makes any change to ...



> First results with the patch (c-only tests, train data):

>                      orig   patched  patched/orig
>       401.bzip2,    89.60,    90.10,         1.01
>         429.mcf,    23.50,    23.90,         1.02
>       456.hmmer,   181.00,   145.00,         0.80
>  462.libquantum,     1.64,     1.64,         1.00
>     464.h264ref,   249.00,   249.00,         1.00
>        433.milc,    20.10,    20.00,         1.00
>         470.lbm,    37.20,    37.20,         1.00
>     482.sphinx3,    17.50,    17.50,         1.00

> 

> significant speedup on 456.hmmer, no difference elsewhere. 



... this.  Hopefully, now that subsequent instrumentations of the same
location in the same BB are considered redundant whether they are reads or
writes, we should see a speed difference on more tests.



> 3 benchmarks fail to build: 



> Error: 1x403.gcc 1x445.gobmk 1x458.sjeng

> resource.c:431:1: internal compiler error: in

> update_mem_ref_hash_table, at

> asan.c:460



The updated patch hopefully addresses that too.



Thank you for doing this!
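The behavior the updated patch aims for on inc.cc (one check covering both the load and the store of *a) can be modeled on a toy basic block. This is a hypothetical sketch, not Dodji's patch: it keys already-checked accesses on an address-expression id and a size, ignores whether an access is a read or a write, and, as an assumption borrowed from the rule Jakub describes in comment #24, forgets everything at a call since the call might free memory. All types and names are invented for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a basic block: each statement is either a memory
   access or a call. */
enum stmt_kind { ACCESS, CALL };

struct stmt {
  enum stmt_kind kind;
  int base;      /* illustrative id of the address expression, e.g. "*a" */
  int size;      /* access size in bytes */
  bool is_write; /* recorded, but deliberately ignored below */
};

#define MAX_SEEN 64

/* Sets needs_check[i] for every access that still needs an inline check
   and returns how many checks are kept. */
static size_t dedup_block(const struct stmt *bb, size_t n, bool *needs_check) {
  struct { int base; int size; } seen[MAX_SEEN];
  size_t nseen = 0, kept = 0;

  for (size_t i = 0; i < n; i++) {
    needs_check[i] = false;
    if (bb[i].kind == CALL) {
      nseen = 0; /* assumption: a call may free memory, forget all checks */
      continue;
    }
    bool covered = false;
    for (size_t j = 0; j < nseen && !covered; j++)
      /* Same address and an earlier check at least as wide; read vs write
         does not matter because the shadow does not encode it. */
      covered = seen[j].base == bb[i].base && seen[j].size >= bb[i].size;
    if (!covered) {
      needs_check[i] = true;
      kept++;
      assert(nseen < MAX_SEEN);
      seen[nseen].base = bb[i].base;
      seen[nseen].size = bb[i].size;
      nseen++;
    }
  }
  return kept;
}
```

For inc.cc this keeps only the first (load) check; putting a call between the two accesses brings the second check back, and a wider later access is still checked.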


[Bug sanitizer/55309] gcc's address-sanitizer 66% slower than clang's

2013-02-06 Thread jakub at gcc dot gnu.org


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55309



--- Comment #21 from Jakub Jelinek  2013-02-06 
12:48:39 UTC ---

As the shadow memory doesn't have information about which locations are
read-only, it only records whether the relevant bytes are valid, invalid,
or partially valid, plus, for fully invalid bytes, a few magic values used
for more detailed reporting.  So, if you have an RMW statement, then without
any asan optimization it will first check the read, and would already fail on
the read; even with the optimization, if you just check the read and not the
write, the user-visible behavior will be exactly the same.
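The point is visible in the shadow check itself: it takes an address and a size, but not the access kind. Below is a minimal simulation, assuming the usual asan shadow encoding (0 means all 8 bytes of a granule are addressable, 1..7 means only the first k bytes are, negative values mark the whole granule invalid) and a tiny array standing in for real shadow memory; shadow and access_is_bad are illustrative names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy shadow for a tiny simulated address space: one shadow byte per
   8-byte granule, indexed directly by addr >> 3 (no offset needed here). */
#define GRANULES 4
static int8_t shadow[GRANULES];

/* Returns true if an access of `size` bytes at `addr` touches an invalid
   byte. There is no is_write parameter: the shadow cannot tell. */
static bool access_is_bad(uintptr_t addr, unsigned size) {
  assert(size == 1 || size == 2 || size == 4 || size == 8);
  assert((addr >> 3) < GRANULES);
  assert((addr & 7) + size <= 8); /* assume the access stays in one granule */
  int8_t k = shadow[addr >> 3];
  if (k == 0)
    return false; /* all 8 bytes of the granule are valid */
  /* Only the first k bytes are valid (k < 0: none are), so the access is
     bad if its last byte sits at granule offset >= k. */
  return (int)(addr & 7) + (int)size - 1 >= k;
}
```

Since the answer depends only on the address, the size and the shadow contents, a store to a location whose load just passed the check will pass too, as long as the shadow has not changed in between.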


[Bug sanitizer/55309] gcc's address-sanitizer 66% slower than clang's

2013-02-06 Thread kcc at gcc dot gnu.org


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55309



--- Comment #20 from Kostya Serebryany  2013-02-06 
12:43:09 UTC ---

> The clang variant looks incorrect to me - if asan distinguishes between

> loads and stores

It doesn't.

The only reason why we have two callbacks is that asan
prints a message containing "READ" or "WRITE".
In this case we can report either a bad read or a bad write -- it doesn't matter.


[Bug sanitizer/55309] gcc's address-sanitizer 66% slower than clang's

2013-02-06 Thread rguenth at gcc dot gnu.org


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55309



--- Comment #19 from Richard Biener  2013-02-06 
12:39:21 UTC ---

(In reply to comment #17)

> Trying this patch: 

> % cat inc.cc

> void foo(int *a) {

>   (*a)++;

> }

> % gcc -fsanitize=address -O2 inc.cc -S -o - | grep __asan_report

> call    __asan_report_load4

> call    __asan_report_store4

> % clang -fsanitize=address -O2 inc.cc -S -o - | grep __asan_report 

> callq   __asan_report_load4

> % 



The clang variant looks incorrect to me - if asan distinguishes between

loads and stores the __asan_report_load4 should have been promoted to

a __asan_report_store4.  Consider the case where 'a' points to read-only memory.

Or rather asan would need a __asan_report_load_store4 to be really correct.



> Is this test expected to work (have one __asan_error call) with this patch?

> 

> (I've checked that the patch is applied correctly, on 

> gcc/testsuite/c-c++-common/asan/no-redundant-instrumentation-1.c 

> it reduces the number of calls from 16 to 5)


[Bug sanitizer/55309] gcc's address-sanitizer 66% slower than clang's

2013-02-06 Thread kcc at gcc dot gnu.org


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55309



--- Comment #18 from Kostya Serebryany  2013-02-06 
12:24:51 UTC ---

First results with the patch (c-only tests, train data):

                      orig   patched  patched/orig
       401.bzip2,    89.60,    90.10,         1.01
         429.mcf,    23.50,    23.90,         1.02
       456.hmmer,   181.00,   145.00,         0.80
  462.libquantum,     1.64,     1.64,         1.00
     464.h264ref,   249.00,   249.00,         1.00
        433.milc,    20.10,    20.00,         1.00
         470.lbm,    37.20,    37.20,         1.00
     482.sphinx3,    17.50,    17.50,         1.00



significant speedup on 456.hmmer, no difference elsewhere. 

3 benchmarks fail to build: 

Error: 1x403.gcc 1x445.gobmk 1x458.sjeng

resource.c:431:1: internal compiler error: in update_mem_ref_hash_table, at

asan.c:460

 find_dead_or_set_registers (target, res, jump_target, jump_count, set, needed)

 ^

0x7d0c74 update_mem_ref_hash_table

../../gcc/gcc/asan.c:460

0x7d15ab maybe_instrument_assignment

../../gcc/gcc/asan.c:1799

0x7d15ab transform_statements

../../gcc/gcc/asan.c:1870

0x7d15ab asan_instrument

../../gcc/gcc/asan.c:2209


[Bug sanitizer/55309] gcc's address-sanitizer 66% slower than clang's

2013-02-06 Thread kcc at gcc dot gnu.org


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55309



--- Comment #17 from Kostya Serebryany  2013-02-06 
11:18:28 UTC ---

Trying this patch: 

% cat inc.cc

void foo(int *a) {

  (*a)++;

}

% gcc -fsanitize=address -O2 inc.cc -S -o - | grep __asan_report

call    __asan_report_load4

call    __asan_report_store4

% clang -fsanitize=address -O2 inc.cc -S -o - | grep __asan_report 

callq   __asan_report_load4

% 



Is this test expected to work (have one __asan_error call) with this patch?



(I've checked that the patch is applied correctly, on 

gcc/testsuite/c-c++-common/asan/no-redundant-instrumentation-1.c 

it reduces the number of calls from 16 to 5)


[Bug sanitizer/55309] gcc's address-sanitizer 66% slower than clang's

2013-02-06 Thread dodji at gcc dot gnu.org


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55309



--- Comment #16 from Dodji Seketeli  2013-02-06 
10:55:38 UTC ---

Created attachment 29366

  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=29366

Candidate patch to avoid duplicated intra bb instrumentation



> As for Dodji's patch: can someone attach it here?



Here is the attachment of what I currently have.



> Let me benchmark it too,



Thank you, that would be very appreciated.



> although if that's just optimizing within one BB I don't expect more

> than 5% difference (based on my experiments in llvm).



That would be what I'd expect too, based on my experiments on GCC.

But then I'd be very curious to hear about your findings.


[Bug sanitizer/55309] gcc's address-sanitizer 66% slower than clang's

2013-02-05 Thread kcc at gcc dot gnu.org


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55309



--- Comment #15 from Kostya Serebryany  2013-02-05 
12:22:56 UTC ---

Well, I of course can change the SPEC code

 464.h264ref,  1271.00,1879.00,1.47



As for Dodji's patch: can someone attach it here? 

Let me benchmark it too, although if that's just optimizing within one BB

I don't expect more than 5% difference (based on my experiments in llvm). 



Dodji, what are your numbers?


[Bug sanitizer/55309] gcc's address-sanitizer 66% slower than clang's

2013-02-05 Thread jakub at gcc dot gnu.org


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55309



--- Comment #14 from Jakub Jelinek  2013-02-05 
11:26:05 UTC ---

(In reply to comment #11)

> bug in SPEC, it would be much better to just report it to SPEC and hope they

> fix it up.  Though given http://www.spec.org/cpu2006/Docs/faq.html#Run.05 I

> don't have much hope they will (when they even don't see it as C89 violation).



Ah, in this case it is actually the same bug.


[Bug sanitizer/55309] gcc's address-sanitizer 66% slower than clang's

2013-02-05 Thread jakub at gcc dot gnu.org


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55309



--- Comment #13 from Jakub Jelinek  2013-02-05 
11:24:23 UTC ---

Please, let's not make this PR into a general gcc vs. clang compile time

comparison (see e.g. Vlad Makarov's mails on this topic, if you care more about

compile time than runtime, supposedly e.g. -O1 might be better than -O2), for

this particular PR I think it matters what relative slowdown -fsanitize=address

causes on compile time and runtime for both compilers, and whether Dodji's

changes help here.  If not, it is time to look at testcases and figure out what

is going on.  Without Dodji's patch we know what's going on and what could make

the difference.


[Bug sanitizer/55309] gcc's address-sanitizer 66% slower than clang's

2013-02-05 Thread markus at trippelsdorf dot de


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55309



--- Comment #12 from Markus Trippelsdorf  
2013-02-05 11:17:42 UTC ---

(In reply to comment #9)

> > And, for compile time, you want to be testing with --enable-checking=release

> Thanks! 

> With --enable-checking=release gcc's compile time drops to 374 seconds.

> That's much better, but still 50% slower than clang (built with asserts)



Hmm, that means gcc is 35% slower (374 vs. 243). That is exactly the

slowdown that I see in all my tests. (So switching to clang is like

moving from a 4-core to a 6-core machine from a compile time perspective.)
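The "50% slower" in comment #9 and the "35% slower" here describe the same pair of build times (374 s for gcc, 243 s for clang) measured against different baselines. The arithmetic below is my own restatement, not from the thread; it also shows that the 4-core to 6-core analogy matches the 1.54 ratio.

```latex
\frac{t_{\text{gcc}}}{t_{\text{clang}}} = \frac{374\,\text{s}}{243\,\text{s}} \approx 1.54,
\qquad
1 - \frac{t_{\text{clang}}}{t_{\text{gcc}}} = 1 - \frac{243}{374} \approx 0.35,
\qquad
\frac{6\ \text{cores}}{4\ \text{cores}} = 1.5
```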


[Bug sanitizer/55309] gcc's address-sanitizer 66% slower than clang's

2013-02-05 Thread jakub at gcc dot gnu.org


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55309



--- Comment #11 from Jakub Jelinek  2013-02-05 
10:54:46 UTC ---

I really don't like the blacklist hack, such changes belong to the source, not

outside of it.  If you want to disable instrumentation of SATD, I think

modification of the source is preferable, or I guess you can use

cat > buggy-spec-workarounds.h <<\EOF
extern int SATD (int *, int) __attribute__((__no_address_safety_analysis__));
EOF

and use -include .../buggy-spec-workarounds.h, though of course if it is a real

bug in SPEC, it would be much better to just report it to SPEC and hope they

fix it up.  Though given http://www.spec.org/cpu2006/Docs/faq.html#Run.05 I

don't have much hope they will (when they don't even see it as a C89 violation).


[Bug sanitizer/55309] gcc's address-sanitizer 66% slower than clang's

2013-02-05 Thread kcc at gcc dot gnu.org


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55309



--- Comment #10 from Kostya Serebryany  2013-02-05 
10:41:20 UTC ---

(In reply to comment #8)

> "464.h264ref with gcc loops forever, I did not investigate why."

> is PR53073 , you can use -fno-aggressive-loop-optimizations to workaround the

> invalid code in SPEC.

Thanks. But then again we hit another bug in 464.h264ref.

So, if we want to run h264ref and perlbench w/o changing sources under gcc+asan

we need to have the blacklist functionality. 

(see the same links as above). 



==8720== ERROR: AddressSanitizer: stack-buffer-overflow on address 0x7fff625736a0 at pc 0x4e2a98 bp 0x7fff62573600 sp 0x7fff625735f8
READ of size 4 at 0x7fff625736a0 thread T0
    #0 0x4e2a97 in SATD (benchspec/CPU2006/464.h264ref/run/run_base_ref_z./h264ref_base.z+0x4e2a97)
    #1 0x4e47c0 in SubPelBlockMotionSearch (benchspec/CPU2006/464.h264ref/run/run_base_ref_z./h264ref_base.z+0x4e47c0)
    ...
Address 0x7fff625736a0 is located at offset 96 in frame  of T0's stack:
  This frame has 1 object(s):
    [32, 96) 'd'
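The report pins down the overflow: 'd' spans frame offsets [32, 96), i.e. 64 bytes or 16 four-byte elements, so a 4-byte read at offset 96 is d[16], one element past the end. A loop shaped like "for (dd = d[k = 0]; k < 16; dd = d[++k])" would do exactly that in its final increment, and, as I understand PR53073, that same undefined access is what lets gcc's aggressive loop optimizations assume k < 16 always holds, hence the endless loop. That loop shape is an assumption for illustration, not quoted from SPEC. The sketch shows a variant that only reads d[0..15], plus a hypothetical report_index helper that turns the report's offsets into an element index; it assumes 4-byte int.

```c
#include <assert.h>
#include <stddef.h>

#define N 16 /* 64 bytes of 'd' / 4-byte int, an assumption of this sketch */

/* Over-reading shape (illustrative, do not use):
     for (dd = d[k = 0]; k < 16; dd = d[++k]) sum += abs(dd);
   On the last iteration ++k makes k == 16 and d[16] is loaded before the
   k < 16 test can end the loop. */

/* Sum of absolute values that never touches d[N]: the element is loaded
   inside the body, after k < N has been checked. */
static int sum_abs(const int d[N]) {
  int sum = 0;
  for (size_t k = 0; k < N; k++) {
    int dd = d[k];
    sum += dd < 0 ? -dd : dd;
  }
  return sum;
}

/* Hypothetical helper: element index an asan frame report points at, given
   the object's start offset, the access offset and the element size. */
static size_t report_index(size_t obj_begin, size_t access_off, size_t elem_size) {
  assert(access_off >= obj_begin && elem_size > 0);
  return (access_off - obj_begin) / elem_size;
}
```

report_index(32, 96, 4) gives 16, one past the last valid index 15.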


[Bug sanitizer/55309] gcc's address-sanitizer 66% slower than clang's

2013-02-05 Thread kcc at gcc dot gnu.org


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55309



--- Comment #9 from Kostya Serebryany  2013-02-05 
10:30:16 UTC ---

> And, for compile time, you want to be testing with --enable-checking=release

Thanks! 

With --enable-checking=release gcc's compile time drops to 374 seconds.

That's much better, but still 50% slower than clang (built with asserts)


[Bug sanitizer/55309] gcc's address-sanitizer 66% slower than clang's

2013-02-05 Thread jakub at gcc dot gnu.org


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55309



--- Comment #8 from Jakub Jelinek  2013-02-05 
09:56:17 UTC ---

"464.h264ref with gcc loops forever, I did not investigate why."

is PR53073; you can use -fno-aggressive-loop-optimizations to work around the

invalid code in SPEC.

As for runtime performance of gcc -fsanitize=address code, it would be

interesting to also try Dodji's patchset and see how much it improves things.



And, for compile time, you want to be testing with --enable-checking=release

built gcc, that is what people will actually use if they aren't developing gcc.


[Bug sanitizer/55309] gcc's address-sanitizer 66% slower than clang's

2013-02-05 Thread kcc at gcc dot gnu.org


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55309



--- Comment #7 from Kostya Serebryany  2013-02-05 
09:43:11 UTC ---

If we are talking about compile time, I observe 2x difference in favor of

clang: 

building 483.xalancbmk

gcc+asan+O2:   564 seconds

clang+asan+O2: 243 seconds



gcc is built with default options

clang is built with -DCMAKE_BUILD_TYPE=Release -DLLVM_ENABLE_ASSERTIONS=ON


[Bug sanitizer/55309] gcc's address-sanitizer 66% slower than clang's

2013-02-05 Thread kcc at gcc dot gnu.org


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55309



--- Comment #6 from Kostya Serebryany  2013-02-05 
09:21:59 UTC ---

I am slightly confused. Are we discussing compile time or test-run-time? 

I've just built SPEC 2006 with -fsanitize=address -O2

gcc: r195706

clang: r174324

Measured on Intel(R) Xeon(R) CPU W3690  @ 3.47GHz



                     clang       gcc     gcc/clang
   400.perlbench,  1209.00,    -1.00,        -0.00
       401.bzip2,   885.00,  1187.00,         1.34
         403.gcc,   739.00,   756.00,         1.02
         429.mcf,   602.00,   612.00,         1.02
       445.gobmk,   840.00,  1191.00,         1.42
       456.hmmer,  1304.00,  1838.00,         1.41
       458.sjeng,   923.00,  1326.00,         1.44
  462.libquantum,   543.00,   481.00,         0.89
     464.h264ref,  1271.00,    -1.00,        -0.00
     471.omnetpp,   631.00,   624.00,         0.99
       473.astar,   672.00,   765.00,         1.14
   483.xalancbmk,   500.00,   521.00,         1.04
        433.milc,   710.00,   629.00,         0.89
        444.namd,   637.00,   539.00,         0.85
      447.dealII,   650.00,   714.00,         1.10
      450.soplex,   389.00,   419.00,         1.08
      453.povray,   459.00,   432.00,         0.94
         470.lbm,   388.00,   409.00,         1.05
     482.sphinx3,   998.00,  1335.00,         1.34





400.perlbench fails with a real asan-ish warning 

(clang can use a blacklist file to disable instrumentation for the buggy

function.

See https://code.google.com/p/address-sanitizer/wiki/FoundBugs#Spec_CPU_2006

and 

https://code.google.com/p/address-sanitizer/wiki/AddressSanitizer#Turning_off_instrumentation)



464.h264ref with gcc loops forever, I did not investigate why. 



So, on average clang+asan is faster than gcc+asan (up to 40%!),

but in some cases (mostly FP code) gcc is faster (up to 15%).
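SPEC itself summarizes results with geometric means, so one way to put a single number on this comparison is the geometric mean of the gcc/clang time ratios. Below is a rough sketch of my own, using the 17 benchmarks from the table above where both compilers produced a result (400.perlbench and 464.h264ref dropped) and recomputing the ratios from the raw times; the bisection root is only there to avoid depending on libm.

```c
#include <assert.h>
#include <stddef.h>

/* Run times from the table above, in table order; 400.perlbench and
   464.h264ref are left out because gcc has no result (-1.00) for them. */
static const double clang_t[] = {885, 739, 602, 840, 1304, 923, 543, 631, 672,
                                 500, 710, 637, 650, 389, 459, 388, 998};
static const double gcc_t[] = {1187, 756, 612, 1191, 1838, 1326, 481, 624, 765,
                               521, 629, 539, 714, 419, 432, 409, 1335};
#define NBENCH (sizeof clang_t / sizeof clang_t[0])

/* x^n for a small non-negative integer n. */
static double ipow(double x, size_t n) {
  double r = 1.0;
  while (n-- > 0)
    r *= x;
  return r;
}

/* Geometric mean of the gcc/clang ratios, i.e. the NBENCH-th root of their
   product; the root is found by bisection so no libm is needed. */
static double geomean_ratio(void) {
  double prod = 1.0;
  for (size_t i = 0; i < NBENCH; i++) {
    assert(clang_t[i] > 0 && gcc_t[i] > 0);
    prod *= gcc_t[i] / clang_t[i];
  }
  /* The root lies in [0, max(prod, 1)] and x^n is increasing there. */
  double lo = 0.0, hi = prod > 1.0 ? prod : 1.0;
  for (int it = 0; it < 200; it++) {
    double mid = 0.5 * (lo + hi);
    if (ipow(mid, NBENCH) < prod)
      lo = mid;
    else
      hi = mid;
  }
  return 0.5 * (lo + hi);
}
```

With these numbers the geometric mean comes out at roughly 1.1, a smaller average gap than the largest individual ratios (about 1.44 for 458.sjeng) suggest.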