Re: [PATCH v4] PR middle-end/60281

2014-03-03 Thread lin zuojian
Hi Jakub,
Any comments on this patch?
--
Regards
lin zuojian

On Tue, Feb 25, 2014 at 03:28:14PM +0800, lin zuojian wrote:
 Sorry, I forgot to set the other shadow_mem's alignment, and many strb
 instructions showed up.
 Here is patch v4 (the last one contained HTML; damn Thunderbird).
 
 --
 Without aligning the asan stack base, the base will only be 64-bit aligned
 on ARM machines, but asan requires a 256-bit aligned base because:
 1. the right shift assumes the low ASAN_SHADOW_SHIFT (which is 3) bits are
 zero;
 2. store-multiple/load-multiple instructions require the next 2 bits to be
 zero.
 
 Adding up, the lowest 5 bits must be zero, which means 32-byte (256-bit)
 alignment.
 
 * asan.c (asan_emit_stack_protection): Force the base to be aligned to 256
 bits if STRICT_ALIGNMENT.  Set shadow_mem alignment to 256 bits if
 STRICT_ALIGNMENT.
 * cfgexpand.c (expand_stack_vars): Set base_align appropriately when asan
 is on.
 (expand_used_vars): Leave space in the stack frame for alignment if
 STRICT_ALIGNMENT.
 ---
 gcc/asan.c      | 14 ++++++++++++++
 gcc/cfgexpand.c | 13 ++++++++++++-
 2 files changed, 26 insertions(+), 1 deletion(-)
 
 diff --git a/gcc/asan.c b/gcc/asan.c
 index 53992a8..64898cd 100644
 --- a/gcc/asan.c
 +++ b/gcc/asan.c
 @@ -1017,8 +1017,16 @@ asan_emit_stack_protection (rtx base, rtx pbase, unsigned int alignb,
        base_align_bias = ((asan_frame_size + alignb - 1)
 			  & ~(alignb - HOST_WIDE_INT_1)) - asan_frame_size;
      }
 +  /* Align base if target is STRICT_ALIGNMENT.  */
 +  if (STRICT_ALIGNMENT)
 +    base = expand_binop (Pmode, and_optab, base,
 +			 gen_int_mode (-((GET_MODE_ALIGNMENT (SImode) << ASAN_SHADOW_SHIFT) / BITS_PER_UNIT), Pmode),
 +			 NULL_RTX, 1, OPTAB_DIRECT);
 +
    if (use_after_return_class == -1 && pbase)
      emit_move_insn (pbase, base);
 +
 +
    base = expand_binop (Pmode, add_optab, base,
 		       gen_int_mode (base_offset - base_align_bias, Pmode),
 		       NULL_RTX, 1, OPTAB_DIRECT);
 @@ -1097,6 +1105,8 @@ asan_emit_stack_protection (rtx base, rtx pbase, unsigned int alignb,
 	      && (ASAN_RED_ZONE_SIZE >> ASAN_SHADOW_SHIFT) == 4);
    shadow_mem = gen_rtx_MEM (SImode, shadow_base);
    set_mem_alias_set (shadow_mem, asan_shadow_set);
 +  if (STRICT_ALIGNMENT)
 +    set_mem_align (shadow_mem, GET_MODE_ALIGNMENT (SImode));
    prev_offset = base_offset;
    for (l = length; l; l -= 2)
      {
 @@ -1186,6 +1196,10 @@ asan_emit_stack_protection (rtx base, rtx pbase, unsigned int alignb,
 
    shadow_mem = gen_rtx_MEM (BLKmode, shadow_base);
    set_mem_alias_set (shadow_mem, asan_shadow_set);
 +
 +  if (STRICT_ALIGNMENT)
 +    set_mem_align (shadow_mem, GET_MODE_ALIGNMENT (SImode));
 +
    prev_offset = base_offset;
    last_offset = base_offset;
    last_size = 0;
 diff --git a/gcc/cfgexpand.c b/gcc/cfgexpand.c
 index 06d494c..14fd1c2 100644
 --- a/gcc/cfgexpand.c
 +++ b/gcc/cfgexpand.c
 @@ -1013,10 +1013,18 @@ expand_stack_vars (bool (*pred) (size_t), struct stack_vars_data *data)
 	  if (data->asan_base == NULL)
 	    data->asan_base = gen_reg_rtx (Pmode);
 	  base = data->asan_base;
 +
 +	  if (!STRICT_ALIGNMENT)
 +	    base_align = crtl->max_used_stack_slot_alignment;
 +	  else
 +	    base_align = MAX (crtl->max_used_stack_slot_alignment,
 +			      GET_MODE_ALIGNMENT (SImode) << ASAN_SHADOW_SHIFT);
 	}
        else
 +	{
 	  offset = alloc_stack_frame_space (stack_vars[i].size, alignb);
 -	  base_align = crtl->max_used_stack_slot_alignment;
 +	  base_align = crtl->max_used_stack_slot_alignment;
 +	}
      }
    else
      {
 @@ -1843,6 +1851,9 @@ expand_used_vars (void)
 	    = alloc_stack_frame_space (redzonesz, ASAN_RED_ZONE_SIZE);
 	  data.asan_vec.safe_push (prev_offset);
 	  data.asan_vec.safe_push (offset);
 +	  /* Leave a space for alignment if STRICT_ALIGNMENT.  */
 +	  if (STRICT_ALIGNMENT)
 +	    alloc_stack_frame_space ((GET_MODE_ALIGNMENT (SImode) << ASAN_SHADOW_SHIFT) / BITS_PER_UNIT, 1);
 
 	  var_end_seq
 	    = asan_emit_stack_protection (virtual_stack_vars_rtx,
 -- 
 1.8.3.2
 


Re: [PATCH v4] PR middle-end/60281

2014-03-03 Thread Jakub Jelinek
On Fri, Feb 28, 2014 at 08:47:56AM +0100, Bernd Edlinger wrote:
 I see the problem too.
 
 But I think it is not necessary to change the stack alignment
 to solve the problem.
 
 It appears to me that the code in asan_emit_stack_protection
 is just wrong. It uses SImode when the memory is not aligned
 enough for that mode. This would not happen if that code
 is rewritten to use get_best_mode, and by the way, even on
 x86_64 the emitted code is not optimal, because that target
 could work with DImode more efficiently.

No, the use of SImode on x86_64 is very much intentional; movabsq + movq
are generally slower.

 So, to fix that, it would be better to concentrate on that function,
 and use word_mode instead of SImode, and let get_best_mode
 choose the required mode.

No.  As I wrote earlier, the alternative is to use unaligned stores for ARM,
I've asked Lin to benchmark that compared to his patch, but haven't seen
that done yet.

Jakub


Re: [PATCH v4] PR middle-end/60281

2014-03-03 Thread Jakub Jelinek
On Sat, Mar 01, 2014 at 08:22:32PM +0100, Bernd Edlinger wrote:
 So, that is what I mean: this patch makes the stack grow by
 32 bytes, just because the emit_stack_protection uses SImode,
 with unaligned addresses which is not possible for ARM, and
 not optimal for X86_64.

Incorrect, for this case it is optimal on x86-64.

Jakub


Re: [PATCH v4] PR middle-end/60281

2014-03-03 Thread Jakub Jelinek
On Mon, Mar 03, 2014 at 04:51:20PM +0800, lin zuojian wrote:
 Hi Jakub,
 Any comments on this patch?

Can you please repost the patch (+ ChangeLog entry) as attachment?
Or use saner MUA?  When all tabs are eaten and other whitespace crippled,
it is impossible to look at the formatting.

Jakub


Re: [PATCH v4] PR middle-end/60281

2014-03-03 Thread lin zuojian
Hi Jakub,

 No.  As I wrote earlier, the alternative is to use unaligned stores for ARM,
 I've asked Lin to benchmark that compared to his patch, but haven't seen
 that done yet.
 
   Jakub
I have not benchmarked yet, but according to what I hear from an ARM
engineer at Huawei, unaligned accesses are usually slow and are not
recommended for heavy use.
-- 
Regards
lin zuojian


Re: [PATCH v4] PR middle-end/60281

2014-03-03 Thread Jakub Jelinek
On Mon, Mar 03, 2014 at 05:04:45PM +0800, lin zuojian wrote:
  No.  As I wrote earlier, the alternative is to use unaligned stores for ARM,
  I've asked Lin to benchmark that compared to his patch, but haven't seen
  that done yet.

 I have not benchmarked yet, but according to what I hear from an ARM
 engineer at Huawei, unaligned accesses are usually slow and are not
 recommended for heavy use.

It is expected that it will not be as fast as an aligned store; the question
is whether an unaligned 32-bit store is faster than 4 8-bit stores, and/or
whether the cost of the unaligned stores is bad enough (note, usually it is
just a few stores in the prologue and epilogue) to offset the penalties
introduced by realigning the stack (typically one extra register that has to
be live, plus the cost of the realignment itself).

Jakub


Re: [PATCH v4] PR middle-end/60281

2014-03-03 Thread lin zuojian
Okay, I will use mutt as my MUA.
--
Regards
lin zuojian

On Mon, Mar 03, 2014 at 09:58:59AM +0100, Jakub Jelinek wrote:
 On Mon, Mar 03, 2014 at 04:51:20PM +0800, lin zuojian wrote:
  Hi Jakub,
  Any comments on this patch?
 
 Can you please repost the patch (+ ChangeLog entry) as attachment?
 Or use saner MUA?  When all tabs are eaten and other whitespace crippled,
 it is impossible to look at the formatting.
 
   Jakub


Re: [PATCH v4] PR middle-end/60281

2014-03-03 Thread lin zuojian
Hi Jakub,

On Mon, Mar 03, 2014 at 10:08:53AM +0100, Jakub Jelinek wrote:
 On Mon, Mar 03, 2014 at 05:04:45PM +0800, lin zuojian wrote:
   No.  As I wrote earlier, the alternative is to use unaligned stores for 
   ARM,
   I've asked Lin to benchmark that compared to his patch, but haven't seen
   that done yet.
 
  I have not benchmarked yet, but according to what I hear from an ARM
  engineer at Huawei, unaligned accesses are usually slow and are not
  recommended for heavy use.
 
 It is expected it will not be as fast as aligned store, the question is if
 an unaligned 32-bit store is faster than 4 8-bit stores, and/or if the cost of
 the unaligned stores is bad enough (note, usually it is just a few stores in
 the prologue and epilogue) to offset for the penalties introduced by
 realigning the stack (typically one extra register that has to be live, plus
 the cost of the realignment itself).
Shall I run some microbenchmarks? Something like:

char *a = new char[100];
a++;
int *b = reinterpret_cast<int *> (a);
b[0] = xxx;
b[4] = xxx;
b[1] = xxx;
b[2] = xxx;
b[3] = xxx;
...

I don't know if this is accurate.
 
   Jakub


Re: [PATCH v4] PR middle-end/60281

2014-03-02 Thread lin zuojian
Hi Bernd,
I have been thinking about whether the best mode is so important for x86_64.
There is no mov r/m64, imm64 (see Intel Software Developer's Manual
Volume 2A, 3-505); imm32 is the biggest immediate, and asan clears the
stack using immediates.
And even if a mov r/m64, imm64 appears in the future, the gcc peephole
pass will do its job: these constant moves will get combined.
So maybe the best mode is not so important.
--
Regards
lin zuojian


RE: [PATCH v4] PR middle-end/60281

2014-03-01 Thread Bernd Edlinger
Hi Lin,

On Fri, 28 Feb 2014 19:14:11, lin zuojian wrote:

 On 2014-02-28 15:58, lin zuojian wrote:
 Hi Bernd,
 I agree with you about the mode problem.

 And I have not changed the stack alignment. What I changed is the virtual
 register base's alignment.

I tried your patch on this test case:

gcc -O2 -fsanitize=address -S gcc/testsuite/c-c++-common/asan/pr59063-1.c

and compared the stack usage with and without patch:

without patch:

    @ Function supports interworking.
    @ args = 0, pretend = 0, frame = 96
    @ frame_needed = 0, uses_anonymous_args = 0
    stmfd   sp!, {r4, r5, r6, r7, r8, lr}
    ldr r3, .L12
    ldr r3, [r3]
    sub sp, sp, #96
    cmp r3, #0
    mov r5, sp
    mov r8, sp

and with patch:

    @ Function supports interworking.
    @ args = 0, pretend = 0, frame = 128
    @ frame_needed = 0, uses_anonymous_args = 0
    stmfd   sp!, {r4, r5, r6, r7, r8, lr}
    ldr r3, .L12
    ldr r3, [r3]
    sub sp, sp, #128
    cmp r3, #0
    add r3, sp, #128
    bic r4, r3, #31
    sub r4, r4, #96
    mov r8, r4

So, that is what I mean: this patch makes the stack grow by
32 bytes, just because asan_emit_stack_protection uses SImode
with unaligned addresses, which is not possible on ARM and
not optimal on x86_64.

Why not use the true alignment value in set_mem_align?
And simply let get_best_mode() choose the right mode to
use for that alignment.

IMHO it should not be necessary to use STRICT_ALIGNMENT
everywhere, because get_best_mode should know what mode
is appropriate for which alignment value.


Regards
Bernd.

 Realignment must be done on a !STRICT_ALIGNMENT machine, or emitting
 efficient code is impossible.
 Sorry, it should be: realignment must be done on a STRICT_ALIGNMENT machine.
 For example, 4 (set (mem:QI X) (reg:QI Y)) insns will not be combined into
 one (set (mem:SI X1) (reg:SI Y1)) if X is not marked as SImode aligned.
 To make sure X is SImode aligned, the virtual register base must be
 realigned.

 For this patch, I only intend to make it right. Making it optimal is the
 next task.
 --
 Regards
 lin zuojian.

 On 2014-02-28 15:47, Bernd Edlinger wrote:
 Hi,

 I see the problem too.

 But I think it is not necessary to change the stack alignment
 to solve the problem.

 It appears to me that the code in asan_emit_stack_protection
 is just wrong. It uses SImode when the memory is not aligned
 enough for that mode. This would not happen if that code
 is rewritten to use get_best_mode, and by the way, even on
 x86_64 the emitted code is not optimal, because that target
 could work with DImode more efficiently.

 So, to fix that, it would be better to concentrate on that function,
 and use word_mode instead of SImode, and let get_best_mode
 choose the required mode.


 Regards
 Bernd Edlinger.
 

Re: [PATCH v4] PR middle-end/60281

2014-03-01 Thread lin zuojian
Hi Bernd,

set_mem_align does not work like magic.

set_mem_align just sets the alignment of a memory rtx, and you must only do
so when you are sure the rtx really IS aligned like that.
For ARM machines, the base of the virtual registers is aligned to 8 bytes.
You can't just set_mem_align to whatever get_best_mode() would like, because
that is not the fact; if you do, the unaligned accesses are still there.
My realignment code makes sure the base of the virtual registers is aligned
for SImode, and so I can use that fact with set_mem_align.

In a word, set_mem_align does not generate realignment code for
STRICT_ALIGNMENT machines.

As you say, that's not the best mode on all machines, but let's fix one bug
at a time. You are focusing too much on that mode.
 
 So, that is what I mean: this patch makes the stack grow by
 32 bytes, just because the emit_stack_protection uses SImode,
 with unaligned addresses which is not possible for ARM, and
 not optimal for X86_64.
It's because of the realignment, as my comment said. I have to make room
for the virtual registers' realignment, or they (or the spilled registers)
will get damaged by other subroutines.


--
Regards
lin zuojian 


Re: [PATCH v4] PR middle-end/60281

2014-03-01 Thread lin zuojian
Hi Bernd,

You may send a patch too; your idea will be clearer that way.
--
Regards
lin zuojian

On Sun, Mar 02, 2014 at 10:24:52AM +0800, lin zuojian wrote:
 Hi Bernd,
 
 set_mem_align does not work like magic.
 
 set_mem_align just sets the alignment of a memory rtx, and you must only do
 so when you are sure the rtx really IS aligned like that.
 For ARM machines, the base of the virtual registers is aligned to 8 bytes.
 You can't just set_mem_align to whatever get_best_mode() would like, because
 that is not the fact; if you do, the unaligned accesses are still there.
 My realignment code makes sure the base of the virtual registers is aligned
 for SImode, and so I can use that fact with set_mem_align.
 
 In a word, set_mem_align does not generate realignment code for
 STRICT_ALIGNMENT machines.
 
 As you say, that's not the best mode on all machines, but let's fix one bug
 at a time. You are focusing too much on that mode.
  
  So, that is what I mean: this patch makes the stack grow by
  32 bytes, just because the emit_stack_protection uses SImode,
  with unaligned addresses which is not possible for ARM, and
  not optimal for X86_64.
 It's because of the realignment, as my comment said. I have to make room
 for the virtual registers' realignment, or they (or the spilled registers)
 will get damaged by other subroutines.
 
 
 --
 Regards
 lin zuojian 


Re: [PATCH v4] PR middle-end/60281

2014-02-28 Thread lin zuojian
On 2014-02-28 15:58, lin zuojian wrote:
 Hi Bernd,
 I agree with you about the mode problem.

 And I have not changed the stack alignment. What I changed is the virtual
 register base's alignment.
 Realignment must be done on a !STRICT_ALIGNMENT machine, or emitting
 efficient code is impossible.
Sorry, it should be: realignment must be done on a STRICT_ALIGNMENT machine.
 For example, 4 (set (mem:QI X) (reg:QI Y)) insns will not be combined into
 one (set (mem:SI X1) (reg:SI Y1)) if X is not marked as SImode aligned.
 To make sure X is SImode aligned, the virtual register base must be
 realigned.

 For this patch, I only intend to make it right. Making it optimal is the
 next task.
 --
 Regards
 lin zuojian.

 On 2014-02-28 15:47, Bernd Edlinger wrote:
 Hi,

 I see the problem too.

 But I think it is not necessary to change the stack alignment
 to solve the problem.

 It appears to me that the code in asan_emit_stack_protection
 is just wrong. It uses SImode when the memory is not aligned
 enough for that mode. This would not happen if that code
 is rewritten to use get_best_mode, and by the way, even on
 x86_64 the emitted code is not optimal, because that target
 could work with DImode more efficiently.

 So, to fix that, it would be better to concentrate on that function,
 and use word_mode instead of SImode, and let get_best_mode
 choose the required mode.


 Regards
 Bernd Edlinger.



RE: [PATCH v4] PR middle-end/60281

2014-02-27 Thread Bernd Edlinger
Hi,

I see the problem too.

But I think it is not necessary to change the stack alignment
to solve the problem.

It appears to me that the code in asan_emit_stack_protection
is just wrong. It uses SImode when the memory is not aligned
enough for that mode. This would not happen if that code
is rewritten to use get_best_mode, and by the way, even on
x86_64 the emitted code is not optimal, because that target
could work with DImode more efficiently.

So, to fix that, it would be better to concentrate on that function,
and use word_mode instead of SImode, and let get_best_mode
choose the required mode.


Regards
Bernd Edlinger.   

Re: [PATCH v4] PR middle-end/60281

2014-02-27 Thread lin zuojian
Hi Bernd,
I agree with you about the mode problem.

And I have not changed the stack alignment. What I changed is the virtual
register base's alignment.
Realignment must be done on a !STRICT_ALIGNMENT machine, or emitting
efficient code is impossible.
For example, 4 (set (mem:QI X) (reg:QI Y)) insns will not be combined into
one (set (mem:SI X1) (reg:SI Y1)) if X is not marked as SImode aligned.
To make sure X is SImode aligned, the virtual register base must be
realigned.

For this patch, I only intend to make it right. Making it optimal is the
next task.
--
Regards
lin zuojian.

On 2014-02-28 15:47, Bernd Edlinger wrote:
 Hi,

 I see the problem too.

 But I think it is not necessary to change the stack alignment
 to solve the problem.

 It appears to me that the code in asan_emit_stack_protection
 is just wrong. It uses SImode when the memory is not aligned
 enough for that mode. This would not happen if that code
 is rewritten to use get_best_mode, and by the way, even on
 x86_64 the emitted code is not optimal, because that target
 could work with DImode more efficiently.

 So, to fix that, it would be better to concentrate on that function,
 and use word_mode instead of SImode, and let get_best_mode
 choose the required mode.


 Regards
 Bernd Edlinger. 



[PATCH v4] PR middle-end/60281

2014-02-24 Thread lin zuojian
Sorry, I forgot to set the other shadow_mem's alignment, and many strb
instructions showed up.
Here is patch v4 (the last one contained HTML; damn Thunderbird).

--
Without aligning the asan stack base, the base will only be 64-bit aligned
on ARM machines, but asan requires a 256-bit aligned base because:
1. the right shift assumes the low ASAN_SHADOW_SHIFT (which is 3) bits are
zero;
2. store-multiple/load-multiple instructions require the next 2 bits to be
zero.

Adding up, the lowest 5 bits must be zero, which means 32-byte (256-bit)
alignment.

* asan.c (asan_emit_stack_protection): Force the base to be aligned to 256
bits if STRICT_ALIGNMENT.  Set shadow_mem alignment to 256 bits if
STRICT_ALIGNMENT.
* cfgexpand.c (expand_stack_vars): Set base_align appropriately when asan
is on.
(expand_used_vars): Leave space in the stack frame for alignment if
STRICT_ALIGNMENT.
---
gcc/asan.c      | 14 ++++++++++++++
gcc/cfgexpand.c | 13 ++++++++++++-
2 files changed, 26 insertions(+), 1 deletion(-)

diff --git a/gcc/asan.c b/gcc/asan.c
index 53992a8..64898cd 100644
--- a/gcc/asan.c
+++ b/gcc/asan.c
@@ -1017,8 +1017,16 @@ asan_emit_stack_protection (rtx base, rtx pbase, unsigned int alignb,
       base_align_bias = ((asan_frame_size + alignb - 1)
			  & ~(alignb - HOST_WIDE_INT_1)) - asan_frame_size;
     }
+  /* Align base if target is STRICT_ALIGNMENT.  */
+  if (STRICT_ALIGNMENT)
+    base = expand_binop (Pmode, and_optab, base,
+			 gen_int_mode (-((GET_MODE_ALIGNMENT (SImode) << ASAN_SHADOW_SHIFT) / BITS_PER_UNIT), Pmode),
+			 NULL_RTX, 1, OPTAB_DIRECT);
+
   if (use_after_return_class == -1 && pbase)
     emit_move_insn (pbase, base);
+
+
   base = expand_binop (Pmode, add_optab, base,
		       gen_int_mode (base_offset - base_align_bias, Pmode),
		       NULL_RTX, 1, OPTAB_DIRECT);
@@ -1097,6 +1105,8 @@ asan_emit_stack_protection (rtx base, rtx pbase, unsigned int alignb,
	      && (ASAN_RED_ZONE_SIZE >> ASAN_SHADOW_SHIFT) == 4);
   shadow_mem = gen_rtx_MEM (SImode, shadow_base);
   set_mem_alias_set (shadow_mem, asan_shadow_set);
+  if (STRICT_ALIGNMENT)
+    set_mem_align (shadow_mem, GET_MODE_ALIGNMENT (SImode));
   prev_offset = base_offset;
   for (l = length; l; l -= 2)
     {
@@ -1186,6 +1196,10 @@ asan_emit_stack_protection (rtx base, rtx pbase, unsigned int alignb,

   shadow_mem = gen_rtx_MEM (BLKmode, shadow_base);
   set_mem_alias_set (shadow_mem, asan_shadow_set);
+
+  if (STRICT_ALIGNMENT)
+    set_mem_align (shadow_mem, GET_MODE_ALIGNMENT (SImode));
+
   prev_offset = base_offset;
   last_offset = base_offset;
   last_size = 0;
diff --git a/gcc/cfgexpand.c b/gcc/cfgexpand.c
index 06d494c..14fd1c2 100644
--- a/gcc/cfgexpand.c
+++ b/gcc/cfgexpand.c
@@ -1013,10 +1013,18 @@ expand_stack_vars (bool (*pred) (size_t), struct stack_vars_data *data)
	  if (data->asan_base == NULL)
	    data->asan_base = gen_reg_rtx (Pmode);
	  base = data->asan_base;
+
+	  if (!STRICT_ALIGNMENT)
+	    base_align = crtl->max_used_stack_slot_alignment;
+	  else
+	    base_align = MAX (crtl->max_used_stack_slot_alignment,
+			      GET_MODE_ALIGNMENT (SImode) << ASAN_SHADOW_SHIFT);
	}
       else
+	{
	  offset = alloc_stack_frame_space (stack_vars[i].size, alignb);
-	  base_align = crtl->max_used_stack_slot_alignment;
+	  base_align = crtl->max_used_stack_slot_alignment;
+	}
     }
   else
     {
@@ -1843,6 +1851,9 @@ expand_used_vars (void)
	    = alloc_stack_frame_space (redzonesz, ASAN_RED_ZONE_SIZE);
	  data.asan_vec.safe_push (prev_offset);
	  data.asan_vec.safe_push (offset);
+	  /* Leave a space for alignment if STRICT_ALIGNMENT.  */
+	  if (STRICT_ALIGNMENT)
+	    alloc_stack_frame_space ((GET_MODE_ALIGNMENT (SImode) << ASAN_SHADOW_SHIFT) / BITS_PER_UNIT, 1);

	  var_end_seq
	    = asan_emit_stack_protection (virtual_stack_vars_rtx,
-- 
1.8.3.2