arm memcpy of aligned data

2015-05-28 Thread Mike Stump
So, the arm memcpy code of aligned data isn’t as good as it can be.

void *memcpy(void *dest, const void *src, unsigned int n);

void foo(char *dst, int i) {
  memcpy (dst, &i, sizeof (i));
}

generates horrible code, but if we are willing to notice that the src or the
destination is aligned, we can do much better:

$ ./cc1 -fschedule-fusion -fdump-tree-all-all -da -march=armv7ve 
-mcpu=cortex-m4 -fomit-frame-pointer -quiet -O2 /tmp/t.c -o t.s
$ cat t.s
[ … ]
foo:
        @ args = 0, pretend = 0, frame = 4
        @ frame_needed = 0, uses_anonymous_args = 0
        @ link register save eliminated.
        sub     sp, sp, #4
        str     r1, [r0]        @ unaligned
        add     sp, sp, #4

Index: gcc/config/arm/arm.c
===================================================================
--- gcc/config/arm/arm.c  (revision 223842)
+++ gcc/config/arm/arm.c  (working copy)
@@ -14376,7 +14376,10 @@ arm_block_move_unaligned_straight (rtx d
                                     srcoffset + j * UNITS_PER_WORD - src_autoinc);
               mem = adjust_automodify_address (srcbase, SImode, addr,
                                                srcoffset + j * UNITS_PER_WORD);
-              emit_insn (gen_unaligned_loadsi (regs[j], mem));
+              if (src_aligned)
+                emit_move_insn (regs[j], mem);
+              else
+                emit_insn (gen_unaligned_loadsi (regs[j], mem));
             }
           srcoffset += words * UNITS_PER_WORD;
         }
@@ -14395,7 +14398,10 @@ arm_block_move_unaligned_straight (rtx d
                                     dstoffset + j * UNITS_PER_WORD - dst_autoinc);
               mem = adjust_automodify_address (dstbase, SImode, addr,
                                                dstoffset + j * UNITS_PER_WORD);
-              emit_insn (gen_unaligned_storesi (mem, regs[j]));
+              if (dst_aligned)
+                emit_move_insn (mem, regs[j]);
+              else
+                emit_insn (gen_unaligned_storesi (mem, regs[j]));
             }
           dstoffset += words * UNITS_PER_WORD;
         }


Ok?

Can someone spin this through an ARM test suite run for me? I was doing this by
inspection and cross-compilation on a system with no ARM bits.  Bonus points if
you can check it in with the test case above marked up as appropriate.



Re: arm memcpy of aligned data

2015-05-28 Thread Oleg Endo

On 28 May 2015, at 23:15, Mike Stump  wrote:

> So, the arm memcpy code of aligned data isn’t as good as it can be.
> 
> void *memcpy(void *dest, const void *src, unsigned int n);
> 
> void foo(char *dst, int i) {
>  memcpy (dst, &i, sizeof (i));
> }
> 
> generates horrible code, but if we are willing to notice that the src or the
> destination is aligned, we can do much better:
> 

This looks like PR 50417, doesn't it?

Cheers,
Oleg








Re: arm memcpy of aligned data

2015-05-29 Thread Kyrill Tkachov

Hi Mike,

On 28/05/15 22:15, Mike Stump wrote:

> So, the arm memcpy code of aligned data isn’t as good as it can be.
>
> void *memcpy(void *dest, const void *src, unsigned int n);
>
> void foo(char *dst, int i) {
>   memcpy (dst, &i, sizeof (i));
> }
>
> generates horrible code, but if we are willing to notice that the src or the
> destination is aligned, we can do much better:
>
> $ ./cc1 -fschedule-fusion -fdump-tree-all-all -da -march=armv7ve
> -mcpu=cortex-m4 -fomit-frame-pointer -quiet -O2 /tmp/t.c -o t.s
> $ cat t.s
> [ … ]
> foo:
>   @ args = 0, pretend = 0, frame = 4
>   @ frame_needed = 0, uses_anonymous_args = 0
>   @ link register save eliminated.
>   sub sp, sp, #4
>   str r1, [r0]  @ unaligned
>   add sp, sp, #4


I think there's something to do with cpu tuning here as well.
For the code you've given compiled with -O2 -mcpu=cortex-a53 I get:
        sub     sp, sp, #8
        mov     r2, r0
        add     r3, sp, #8
        str     r1, [r3, #-4]!
        ldr     r0, [r3]        @ unaligned
        str     r0, [r2]        @ unaligned
        add     sp, sp, #8
        @ sp needed
        bx      lr

whereas for -O2 -mcpu=cortex-a57 I get the much better:
        sub     sp, sp, #8
        str     r1, [r0]        @ unaligned
        add     sp, sp, #8
        @ sp needed
        bx      lr

Kyrill









Re: arm memcpy of aligned data

2015-05-29 Thread Kyrill Tkachov


On 29/05/15 10:08, Kyrill Tkachov wrote:

> Hi Mike,
>
> On 28/05/15 22:15, Mike Stump wrote:
>> So, the arm memcpy code of aligned data isn’t as good as it can be.
>>
>> void *memcpy(void *dest, const void *src, unsigned int n);
>>
>> void foo(char *dst, int i) {
>>   memcpy (dst, &i, sizeof (i));
>> }
>>
>> generates horrible code, but if we are willing to notice that the src or the
>> destination is aligned, we can do much better:
>>
>> $ ./cc1 -fschedule-fusion -fdump-tree-all-all -da -march=armv7ve
>> -mcpu=cortex-m4 -fomit-frame-pointer -quiet -O2 /tmp/t.c -o t.s
>> $ cat t.s
>> [ … ]
>> foo:
>>   @ args = 0, pretend = 0, frame = 4
>>   @ frame_needed = 0, uses_anonymous_args = 0
>>   @ link register save eliminated.
>>   sub sp, sp, #4
>>   str r1, [r0]  @ unaligned
>>   add sp, sp, #4
>
> I think there's something to do with cpu tuning here as well.


That being said, I do think this is a good idea.
I'll give it a test.

Kyrill







Re: arm memcpy of aligned data

2015-06-15 Thread Kyrill Tkachov


On 29/05/15 11:15, Kyrill Tkachov wrote:

> On 29/05/15 10:08, Kyrill Tkachov wrote:
>> Hi Mike,
>>
>> On 28/05/15 22:15, Mike Stump wrote:
>>> So, the arm memcpy code of aligned data isn’t as good as it can be.
>>>
>>> void *memcpy(void *dest, const void *src, unsigned int n);
>>>
>>> void foo(char *dst, int i) {
>>>   memcpy (dst, &i, sizeof (i));
>>> }
>>>
>>> generates horrible code, but if we are willing to notice that the src or the
>>> destination is aligned, we can do much better:
>>>
>>> $ ./cc1 -fschedule-fusion -fdump-tree-all-all -da -march=armv7ve
>>> -mcpu=cortex-m4 -fomit-frame-pointer -quiet -O2 /tmp/t.c -o t.s
>>> $ cat t.s
>>> [ … ]
>>> foo:
>>>   @ args = 0, pretend = 0, frame = 4
>>>   @ frame_needed = 0, uses_anonymous_args = 0
>>>   @ link register save eliminated.
>>>   sub sp, sp, #4
>>>   str r1, [r0]  @ unaligned
>>>   add sp, sp, #4
>>
>> I think there's something to do with cpu tuning here as well.
>
> That being said, I do think this is a good idea.
> I'll give it a test.


The patch passes bootstrap and testing ok and I've seen it
improve codegen in a few places in SPEC.
I've added a testcase all marked up.

Mike, I'll commit the attached patch in 24 hours unless somebody objects.

Thanks,
Kyrill

2015-06-15  Mike Stump

        * config/arm/arm.c (arm_block_move_unaligned_straight):
        Emit normal move instead of unaligned load when source or destination
        are appropriately aligned.

2015-06-15  Mike Stump
            Kyrylo Tkachov

        * gcc.target/arm/memcpy-aligned-1.c: New test.






commit 77191f4224c8729d014a9150bd9364f95ff704b0
Author: Kyrylo Tkachov 
Date:   Fri May 29 10:44:21 2015 +0100

    [ARM] arm memcpy of aligned data

diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 638d659..3a33c26 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -14283,7 +14283,10 @@ arm_block_move_unaligned_straight (rtx dstbase, rtx srcbase,
                                     srcoffset + j * UNITS_PER_WORD - src_autoinc);
               mem = adjust_automodify_address (srcbase, SImode, addr,
                                                srcoffset + j * UNITS_PER_WORD);
-              emit_insn (gen_unaligned_loadsi (regs[j], mem));
+              if (src_aligned)
+                emit_move_insn (regs[j], mem);
+              else
+                emit_insn (gen_unaligned_loadsi (regs[j], mem));
             }
           srcoffset += words * UNITS_PER_WORD;
         }
@@ -14302,7 +14305,10 @@ arm_block_move_unaligned_straight (rtx dstbase, rtx srcbase,
                                     dstoffset + j * UNITS_PER_WORD - dst_autoinc);
               mem = adjust_automodify_address (dstbase, SImode, addr,
                                                dstoffset + j * UNITS_PER_WORD);
-              emit_insn (gen_unaligned_storesi (mem, regs[j]));
+              if (dst_aligned)
+                emit_move_insn (mem, regs[j]);
+              else
+                emit_insn (gen_unaligned_storesi (mem, regs[j]));
             }
           dstoffset += words * UNITS_PER_WORD;
         }
diff --git a/gcc/testsuite/gcc.target/arm/memcpy-aligned-1.c b/gcc/testsuite/gcc.target/ar

Re: arm memcpy of aligned data

2015-06-15 Thread Richard Earnshaw
On 15/06/15 15:30, Kyrill Tkachov wrote:
> 
> On 29/05/15 11:15, Kyrill Tkachov wrote:
>> On 29/05/15 10:08, Kyrill Tkachov wrote:
>>> Hi Mike,
>>>
>>> On 28/05/15 22:15, Mike Stump wrote:
>>>> So, the arm memcpy code of aligned data isn’t as good as it can be.
>>>>
>>>> void *memcpy(void *dest, const void *src, unsigned int n);
>>>>
>>>> void foo(char *dst, int i) {
>>>>  memcpy (dst, &i, sizeof (i));
>>>> }
>>>>
>>>> generates horrible code, but if we are willing to notice that the src or
>>>> the destination is aligned, we can do much better:
>>>>
>>>> $ ./cc1 -fschedule-fusion -fdump-tree-all-all -da -march=armv7ve
>>>> -mcpu=cortex-m4 -fomit-frame-pointer -quiet -O2 /tmp/t.c -o t.s
>>>> $ cat t.s
>>>> [ … ]
>>>> foo:
>>>>   @ args = 0, pretend = 0, frame = 4
>>>>   @ frame_needed = 0, uses_anonymous_args = 0
>>>>   @ link register save eliminated.
>>>>   sub sp, sp, #4
>>>>   str r1, [r0]  @ unaligned
>>>>   add sp, sp, #4
>>> I think there's something to do with cpu tuning here as well.
>> That being said, I do think this is a good idea.
>> I'll give it a test.
> 
> The patch passes bootstrap and testing ok and I've seen it
> improve codegen in a few places in SPEC.
> I've added a testcase all marked up.
> 
> Mike, I'll commit the attached patch in 24 hours unless somebody objects.
> 
> Thanks,
> Kyrill
> 
> 2015-06-15  Mike Stump  
> 
> * config/arm/arm.c (arm_block_move_unaligned_straight):
> Emit normal move instead of unaligned load when source or destination
> are appropriately aligned.
> 
> 2015-06-15 Mike Stump  
>Kyrylo Tkachov  
> 
> * gcc.target/arm/memcpy-aligned-1.c: New test.
> 

My only question would be whether this should be pushed down into
gen_unaligned_{load|store}si, so that all callers would benefit?

R.


Re: arm memcpy of aligned data

2015-08-16 Thread Mike Stump
On Jun 15, 2015, at 7:30 AM, Kyrill Tkachov  wrote:
> 
> On 29/05/15 11:15, Kyrill Tkachov wrote:
>> On 29/05/15 10:08, Kyrill Tkachov wrote:
>>> Hi Mike,
>>> 
>>> On 28/05/15 22:15, Mike Stump wrote:
>>>> So, the arm memcpy code of aligned data isn’t as good as it can be.
>>>>
>>>> void *memcpy(void *dest, const void *src, unsigned int n);
>>>>
>>>> void foo(char *dst, int i) {
>>>>   memcpy (dst, &i, sizeof (i));
>>>> }
>>>>
>>>> generates horrible code, but if we are willing to notice that the src or the
>>>> destination is aligned, we can do much better:
>>>>
>>>> $ ./cc1 -fschedule-fusion -fdump-tree-all-all -da -march=armv7ve
>>>> -mcpu=cortex-m4 -fomit-frame-pointer -quiet -O2 /tmp/t.c -o t.s
>>>> $ cat t.s
>>>> [ … ]
>>>> foo:
>>>>   @ args = 0, pretend = 0, frame = 4
>>>>   @ frame_needed = 0, uses_anonymous_args = 0
>>>>   @ link register save eliminated.
>>>>   sub sp, sp, #4
>>>>   str r1, [r0]  @ unaligned
>>>>   add sp, sp, #4
>>> I think there's something to do with cpu tuning here as well.
>> That being said, I do think this is a good idea.
>> I'll give it a test.
> 
> The patch passes bootstrap and testing ok and I've seen it
> improve codegen in a few places in SPEC.
> I've added a testcase all marked up.
> 
> Mike, I'll commit the attached patch in 24 hours unless somebody objects.

Was this ever applied?

Re: arm memcpy of aligned data

2015-08-17 Thread Kyrill Tkachov


On 16/08/15 20:01, Mike Stump wrote:

> On Jun 15, 2015, at 7:30 AM, Kyrill Tkachov wrote:
>> On 29/05/15 11:15, Kyrill Tkachov wrote:
>>> On 29/05/15 10:08, Kyrill Tkachov wrote:
>>>> Hi Mike,
>>>>
>>>> On 28/05/15 22:15, Mike Stump wrote:
>>>>> So, the arm memcpy code of aligned data isn’t as good as it can be.
>>>>>
>>>>> void *memcpy(void *dest, const void *src, unsigned int n);
>>>>>
>>>>> void foo(char *dst, int i) {
>>>>>   memcpy (dst, &i, sizeof (i));
>>>>> }
>>>>>
>>>>> generates horrible code, but if we are willing to notice that the src or the
>>>>> destination is aligned, we can do much better:
>>>>>
>>>>> $ ./cc1 -fschedule-fusion -fdump-tree-all-all -da -march=armv7ve
>>>>> -mcpu=cortex-m4 -fomit-frame-pointer -quiet -O2 /tmp/t.c -o t.s
>>>>> $ cat t.s
>>>>> [ … ]
>>>>> foo:
>>>>>   @ args = 0, pretend = 0, frame = 4
>>>>>   @ frame_needed = 0, uses_anonymous_args = 0
>>>>>   @ link register save eliminated.
>>>>>   sub sp, sp, #4
>>>>>   str r1, [r0]  @ unaligned
>>>>>   add sp, sp, #4
>>>> I think there's something to do with cpu tuning here as well.
>>> That being said, I do think this is a good idea.
>>> I'll give it a test.
>>
>> The patch passes bootstrap and testing ok and I've seen it
>> improve codegen in a few places in SPEC.
>> I've added a testcase all marked up.
>>
>> Mike, I'll commit the attached patch in 24 hours unless somebody objects.
>
> Was this ever applied?


Sorry, this slipped through the cracks.
Committed as r226935.

Thanks,
Kyrill