subject:"arm memcpy of aligned data"

arm memcpy of aligned data

2015-05-28 Thread Mike Stump

So, the arm memcpy code of aligned data isn’t as good as it can be.

void *memcpy(void *dest, const void *src, unsigned int n);

void foo(char *dst, int i) {
  memcpy (dst, &i, sizeof (i));
}

generates horrible code, but, it we are willing to notice the src or the 
destination are aligned, we can do much better:

$ ./cc1 -fschedule-fusion -fdump-tree-all-all -da -march=armv7ve 
-mcpu=cortex-m4 -fomit-frame-pointer -quiet -O2 /tmp/t.c -o t.s
$ cat t.s
[ … ]
foo:
@ args = 0, pretend = 0, frame = 4
@ frame_needed = 0, uses_anonymous_args = 0
@ link register save eliminated.
sub sp, sp, #4
str r1, [r0]@ unaligned
add sp, sp, #4

Index: gcc/config/arm/arm.c
===
--- gcc/config/arm/arm.c(revision 223842)
+++ gcc/config/arm/arm.c(working copy)
@@ -14376,7 +14376,10 @@ arm_block_move_unaligned_straight (rtx d
srcoffset + j * UNITS_PER_WORD - src_autoinc);
  mem = adjust_automodify_address (srcbase, SImode, addr,
   srcoffset + j * UNITS_PER_WORD);
- emit_insn (gen_unaligned_loadsi (regs[j], mem));
+ if (src_aligned)
+   emit_move_insn (regs[j], mem);
+ else
+   emit_insn (gen_unaligned_loadsi (regs[j], mem));
}
   srcoffset += words * UNITS_PER_WORD;
 }
@@ -14395,7 +14398,10 @@ arm_block_move_unaligned_straight (rtx d
dstoffset + j * UNITS_PER_WORD - dst_autoinc);
  mem = adjust_automodify_address (dstbase, SImode, addr,
   dstoffset + j * UNITS_PER_WORD);
- emit_insn (gen_unaligned_storesi (mem, regs[j]));
+ if (dst_aligned)
+   emit_move_insn (mem, regs[j]);
+ else
+   emit_insn (gen_unaligned_storesi (mem, regs[j]));
}
   dstoffset += words * UNITS_PER_WORD;
 }


Ok?

Can someone spin this through an arm test suite run for me, I was doing this by 
inspection and cross compile on a system with no arm bits.  Bonus points if you 
can check it in with the test case above marked up as appropriate.

Index: gcc/config/arm/arm.c
===
--- gcc/config/arm/arm.c(revision 223842)
+++ gcc/config/arm/arm.c(working copy)
@@ -14376,7 +14376,10 @@ arm_block_move_unaligned_straight (rtx d
srcoffset + j * UNITS_PER_WORD - src_autoinc);
  mem = adjust_automodify_address (srcbase, SImode, addr,
   srcoffset + j * UNITS_PER_WORD);
- emit_insn (gen_unaligned_loadsi (regs[j], mem));
+ if (src_aligned)
+   emit_move_insn (regs[j], mem);
+ else
+   emit_insn (gen_unaligned_loadsi (regs[j], mem));
}
   srcoffset += words * UNITS_PER_WORD;
 }
@@ -14395,7 +14398,10 @@ arm_block_move_unaligned_straight (rtx d
dstoffset + j * UNITS_PER_WORD - dst_autoinc);
  mem = adjust_automodify_address (dstbase, SImode, addr,
   dstoffset + j * UNITS_PER_WORD);
- emit_insn (gen_unaligned_storesi (mem, regs[j]));
+ if (dst_aligned)
+   emit_move_insn (mem, regs[j]);
+ else
+   emit_insn (gen_unaligned_storesi (mem, regs[j]));
}
   dstoffset += words * UNITS_PER_WORD;
 }

Re: arm memcpy of aligned data

2015-05-28 Thread Oleg Endo


On 28 May 2015, at 23:15, Mike Stump  wrote:

> So, the arm memcpy code of aligned data isn’t as good as it can be.
> 
> void *memcpy(void *dest, const void *src, unsigned int n);
> 
> void foo(char *dst, int i) {
>  memcpy (dst, &i, sizeof (i));
> }
> 
> generates horrible code, but, it we are willing to notice the src or the 
> destination are aligned, we can do much better:
> 

This looks like PR 50417, doesn't it?

Cheers,
Oleg





> $ ./cc1 -fschedule-fusion -fdump-tree-all-all -da -march=armv7ve 
> -mcpu=cortex-m4 -fomit-frame-pointer -quiet -O2 /tmp/t.c -o t.s
> $ cat t.s
> [ … ]
> foo:
>   @ args = 0, pretend = 0, frame = 4
>   @ frame_needed = 0, uses_anonymous_args = 0
>   @ link register save eliminated.
>   sub sp, sp, #4
>   str r1, [r0]@ unaligned
>   add sp, sp, #4
> 
> Index: gcc/config/arm/arm.c
> ===
> --- gcc/config/arm/arm.c  (revision 223842)
> +++ gcc/config/arm/arm.c  (working copy)
> @@ -14376,7 +14376,10 @@ arm_block_move_unaligned_straight (rtx d
>   srcoffset + j * UNITS_PER_WORD - src_autoinc);
> mem = adjust_automodify_address (srcbase, SImode, addr,
>  srcoffset + j * UNITS_PER_WORD);
> -   emit_insn (gen_unaligned_loadsi (regs[j], mem));
> +   if (src_aligned)
> + emit_move_insn (regs[j], mem);
> +   else
> + emit_insn (gen_unaligned_loadsi (regs[j], mem));
>   }
>   srcoffset += words * UNITS_PER_WORD;
> }
> @@ -14395,7 +14398,10 @@ arm_block_move_unaligned_straight (rtx d
>   dstoffset + j * UNITS_PER_WORD - dst_autoinc);
> mem = adjust_automodify_address (dstbase, SImode, addr,
>  dstoffset + j * UNITS_PER_WORD);
> -   emit_insn (gen_unaligned_storesi (mem, regs[j]));
> +   if (dst_aligned)
> + emit_move_insn (mem, regs[j]);
> +   else
> + emit_insn (gen_unaligned_storesi (mem, regs[j]));
>   }
>   dstoffset += words * UNITS_PER_WORD;
> }
> 
> 
> Ok?
> 
> Can someone spin this through an arm test suite run for me, I was doing this 
> by inspection and cross compile on a system with no arm bits.  Bonus points 
> if you can check it in with the test case above marked up as appropriate.
> 
>

Re: arm memcpy of aligned data

2015-05-29 Thread Kyrill Tkachov


Hi Mike,

On 28/05/15 22:15, Mike Stump wrote:

So, the arm memcpy code of aligned data isn’t as good as it can be.

void *memcpy(void *dest, const void *src, unsigned int n);

void foo(char *dst, int i) {
   memcpy (dst, &i, sizeof (i));
}

generates horrible code, but, it we are willing to notice the src or the 
destination are aligned, we can do much better:

$ ./cc1 -fschedule-fusion -fdump-tree-all-all -da -march=armv7ve 
-mcpu=cortex-m4 -fomit-frame-pointer -quiet -O2 /tmp/t.c -o t.s
$ cat t.s
[ … ]
foo:
@ args = 0, pretend = 0, frame = 4
@ frame_needed = 0, uses_anonymous_args = 0
@ link register save eliminated.
sub sp, sp, #4
str r1, [r0]@ unaligned
add sp, sp, #4


I think there's something to do with cpu tuning here as well.
For the code you've given compiled with -O2 -mcpu=cortex-a53 I get:
sub sp, sp, #8
mov r2, r0
add r3, sp, #8
str r1, [r3, #-4]!
ldr r0, [r3]@ unaligned
str r0, [r2]@ unaligned
add sp, sp, #8
@ sp needed
bx  lr

whereas for -O2 -mcpu=cortex-a57 I get the much better:
sub sp, sp, #8
str r1, [r0]@ unaligned
add sp, sp, #8
@ sp needed
bx  lr

Kyrill




Index: gcc/config/arm/arm.c
===
--- gcc/config/arm/arm.c(revision 223842)
+++ gcc/config/arm/arm.c(working copy)
@@ -14376,7 +14376,10 @@ arm_block_move_unaligned_straight (rtx d
srcoffset + j * UNITS_PER_WORD - src_autoinc);
  mem = adjust_automodify_address (srcbase, SImode, addr,
   srcoffset + j * UNITS_PER_WORD);
- emit_insn (gen_unaligned_loadsi (regs[j], mem));
+ if (src_aligned)
+   emit_move_insn (regs[j], mem);
+ else
+   emit_insn (gen_unaligned_loadsi (regs[j], mem));
}
srcoffset += words * UNITS_PER_WORD;
  }
@@ -14395,7 +14398,10 @@ arm_block_move_unaligned_straight (rtx d
dstoffset + j * UNITS_PER_WORD - dst_autoinc);
  mem = adjust_automodify_address (dstbase, SImode, addr,
   dstoffset + j * UNITS_PER_WORD);
- emit_insn (gen_unaligned_storesi (mem, regs[j]));
+ if (dst_aligned)
+   emit_move_insn (mem, regs[j]);
+ else
+   emit_insn (gen_unaligned_storesi (mem, regs[j]));
}
dstoffset += words * UNITS_PER_WORD;
  }


Ok?

Can someone spin this through an arm test suite run for me, I was doing this by 
inspection and cross compile on a system with no arm bits.  Bonus points if you 
can check it in with the test case above marked up as appropriate.

Re: arm memcpy of aligned data

2015-05-29 Thread Kyrill Tkachov



On 29/05/15 10:08, Kyrill Tkachov wrote:

Hi Mike,

On 28/05/15 22:15, Mike Stump wrote:

So, the arm memcpy code of aligned data isn’t as good as it can be.

void *memcpy(void *dest, const void *src, unsigned int n);

void foo(char *dst, int i) {
memcpy (dst, &i, sizeof (i));
}

generates horrible code, but, it we are willing to notice the src or the 
destination are aligned, we can do much better:

$ ./cc1 -fschedule-fusion -fdump-tree-all-all -da -march=armv7ve 
-mcpu=cortex-m4 -fomit-frame-pointer -quiet -O2 /tmp/t.c -o t.s
$ cat t.s
[ … ]
foo:
@ args = 0, pretend = 0, frame = 4
@ frame_needed = 0, uses_anonymous_args = 0
@ link register save eliminated.
sub sp, sp, #4
str r1, [r0]@ unaligned
add sp, sp, #4

I think there's something to do with cpu tuning here as well.


That being said, I do think this is a good idea.
I'll give it a test.

Kyrill


For the code you've given compiled with -O2 -mcpu=cortex-a53 I get:
  sub sp, sp, #8
  mov r2, r0
  add r3, sp, #8
  str r1, [r3, #-4]!
  ldr r0, [r3]@ unaligned
  str r0, [r2]@ unaligned
  add sp, sp, #8
  @ sp needed
  bx  lr

whereas for -O2 -mcpu=cortex-a57 I get the much better:
  sub sp, sp, #8
  str r1, [r0]@ unaligned
  add sp, sp, #8
  @ sp needed
  bx  lr

Kyrill



Index: gcc/config/arm/arm.c
===
--- gcc/config/arm/arm.c(revision 223842)
+++ gcc/config/arm/arm.c(working copy)
@@ -14376,7 +14376,10 @@ arm_block_move_unaligned_straight (rtx d
srcoffset + j * UNITS_PER_WORD - src_autoinc);
  mem = adjust_automodify_address (srcbase, SImode, addr,
   srcoffset + j * UNITS_PER_WORD);
- emit_insn (gen_unaligned_loadsi (regs[j], mem));
+ if (src_aligned)
+   emit_move_insn (regs[j], mem);
+ else
+   emit_insn (gen_unaligned_loadsi (regs[j], mem));
}
 srcoffset += words * UNITS_PER_WORD;
   }
@@ -14395,7 +14398,10 @@ arm_block_move_unaligned_straight (rtx d
dstoffset + j * UNITS_PER_WORD - dst_autoinc);
  mem = adjust_automodify_address (dstbase, SImode, addr,
   dstoffset + j * UNITS_PER_WORD);
- emit_insn (gen_unaligned_storesi (mem, regs[j]));
+ if (dst_aligned)
+   emit_move_insn (mem, regs[j]);
+ else
+   emit_insn (gen_unaligned_storesi (mem, regs[j]));
}
 dstoffset += words * UNITS_PER_WORD;
   }


Ok?

Can someone spin this through an arm test suite run for me, I was doing this by 
inspection and cross compile on a system with no arm bits.  Bonus points if you 
can check it in with the test case above marked up as appropriate.

Re: arm memcpy of aligned data

2015-06-15 Thread Kyrill Tkachov



On 29/05/15 11:15, Kyrill Tkachov wrote:

On 29/05/15 10:08, Kyrill Tkachov wrote:

Hi Mike,

On 28/05/15 22:15, Mike Stump wrote:

So, the arm memcpy code of aligned data isn’t as good as it can be.

void *memcpy(void *dest, const void *src, unsigned int n);

void foo(char *dst, int i) {
 memcpy (dst, &i, sizeof (i));
}

generates horrible code, but, it we are willing to notice the src or the 
destination are aligned, we can do much better:

$ ./cc1 -fschedule-fusion -fdump-tree-all-all -da -march=armv7ve 
-mcpu=cortex-m4 -fomit-frame-pointer -quiet -O2 /tmp/t.c -o t.s
$ cat t.s
[ … ]
foo:
@ args = 0, pretend = 0, frame = 4
@ frame_needed = 0, uses_anonymous_args = 0
@ link register save eliminated.
sub sp, sp, #4
str r1, [r0]@ unaligned
add sp, sp, #4

I think there's something to do with cpu tuning here as well.

That being said, I do think this is a good idea.
I'll give it a test.


The patch passes bootstrap and testing ok and I've seen it
improve codegen in a few places in SPEC.
I've added a testcase all marked up.

Mike, I'll commit the attached patch in 24 hours unless somebody objects.

Thanks,
Kyrill

2015-06-15  Mike Stump  

* config/arm/arm.c (arm_block_move_unaligned_straight):
Emit normal move instead of unaligned load when source or destination
are appropriately aligned.

2015-06-15 Mike Stump  
   Kyrylo Tkachov  

* gcc.target/arm/memcpy-aligned-1.c: New test.



Kyrill


For the code you've given compiled with -O2 -mcpu=cortex-a53 I get:
   sub sp, sp, #8
   mov r2, r0
   add r3, sp, #8
   str r1, [r3, #-4]!
   ldr r0, [r3]@ unaligned
   str r0, [r2]@ unaligned
   add sp, sp, #8
   @ sp needed
   bx  lr

whereas for -O2 -mcpu=cortex-a57 I get the much better:
   sub sp, sp, #8
   str r1, [r0]@ unaligned
   add sp, sp, #8
   @ sp needed
   bx  lr

Kyrill



Index: gcc/config/arm/arm.c
===
--- gcc/config/arm/arm.c(revision 223842)
+++ gcc/config/arm/arm.c(working copy)
@@ -14376,7 +14376,10 @@ arm_block_move_unaligned_straight (rtx d
srcoffset + j * UNITS_PER_WORD - src_autoinc);
  mem = adjust_automodify_address (srcbase, SImode, addr,
   srcoffset + j * UNITS_PER_WORD);
- emit_insn (gen_unaligned_loadsi (regs[j], mem));
+ if (src_aligned)
+   emit_move_insn (regs[j], mem);
+ else
+   emit_insn (gen_unaligned_loadsi (regs[j], mem));
}
  srcoffset += words * UNITS_PER_WORD;
}
@@ -14395,7 +14398,10 @@ arm_block_move_unaligned_straight (rtx d
dstoffset + j * UNITS_PER_WORD - dst_autoinc);
  mem = adjust_automodify_address (dstbase, SImode, addr,
   dstoffset + j * UNITS_PER_WORD);
- emit_insn (gen_unaligned_storesi (mem, regs[j]));
+ if (dst_aligned)
+   emit_move_insn (mem, regs[j]);
+ else
+   emit_insn (gen_unaligned_storesi (mem, regs[j]));
}
  dstoffset += words * UNITS_PER_WORD;
}


Ok?

Can someone spin this through an arm test suite run for me, I was doing this by 
inspection and cross compile on a system with no arm bits.  Bonus points if you 
can check it in with the test case above marked up as appropriate.



commit 77191f4224c8729d014a9150bd9364f95ff704b0
Author: Kyrylo Tkachov 
Date:   Fri May 29 10:44:21 2015 +0100

    [ARM] arm memcpy of aligned data

diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 638d659..3a33c26 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -14283,7 +14283,10 @@ arm_block_move_unaligned_straight (rtx dstbase, rtx srcbase,
 srcoffset + j * UNITS_PER_WORD - src_autoinc);
 	  mem = adjust_automodify_address (srcbase, SImode, addr,
 	   srcoffset + j * UNITS_PER_WORD);
-	  emit_insn (gen_unaligned_loadsi (regs[j], mem));
+	  if (src_aligned)
+	emit_move_insn (regs[j], mem);
+	  else
+	emit_insn (gen_unaligned_loadsi (regs[j], mem));
 	}
   srcoffset += words * UNITS_PER_WORD;
 }
@@ -14302,7 +14305,10 @@ arm_block_move_unaligned_straight (rtx dstbase, rtx srcbase,
 dstoffset + j * UNITS_PER_WORD - dst_autoinc);
 	  mem = adjust_automodify_address (dstbase, SImode, addr,
 	   dstoffset + j * UNITS_PER_WORD);
-	  emit_insn (gen_unaligned_storesi (mem, regs[j]));
+	  if (dst_aligned)
+	emit_move_insn (mem, regs[j]);
+	  else
+	emit_insn (gen_unaligned_storesi (mem, regs[j]));
 	}
   dstoffset += words * UNITS_PER_WORD;
 }
diff --git a/gcc/testsuite/gcc.target/arm/memcpy-aligned-1.c b/gcc/testsuite/gcc.target/ar

Re: arm memcpy of aligned data

2015-06-15 Thread Richard Earnshaw

On 15/06/15 15:30, Kyrill Tkachov wrote:
> 
> On 29/05/15 11:15, Kyrill Tkachov wrote:
>> On 29/05/15 10:08, Kyrill Tkachov wrote:
>>> Hi Mike,
>>>
>>> On 28/05/15 22:15, Mike Stump wrote:
>>>> So, the arm memcpy code of aligned data isn’t as good as it can be.
>>>>
>>>> void *memcpy(void *dest, const void *src, unsigned int n);
>>>>
>>>> void foo(char *dst, int i) {
>>>>  memcpy (dst, &i, sizeof (i));
>>>> }
>>>>
>>>> generates horrible code, but, it we are willing to notice the src or
>>>> the destination are aligned, we can do much better:
>>>>
>>>> $ ./cc1 -fschedule-fusion -fdump-tree-all-all -da -march=armv7ve
>>>> -mcpu=cortex-m4 -fomit-frame-pointer -quiet -O2 /tmp/t.c -o t.s
>>>> $ cat t.s
>>>> [ … ]
>>>> foo:
>>>> @ args = 0, pretend = 0, frame = 4
>>>> @ frame_needed = 0, uses_anonymous_args = 0
>>>> @ link register save eliminated.
>>>> subsp, sp, #4
>>>> strr1, [r0]@ unaligned
>>>> addsp, sp, #4
>>> I think there's something to do with cpu tuning here as well.
>> That being said, I do think this is a good idea.
>> I'll give it a test.
> 
> The patch passes bootstrap and testing ok and I've seen it
> improve codegen in a few places in SPEC.
> I've added a testcase all marked up.
> 
> Mike, I'll commit the attached patch in 24 hours unless somebody objects.
> 
> Thanks,
> Kyrill
> 
> 2015-06-15  Mike Stump  
> 
> * config/arm/arm.c (arm_block_move_unaligned_straight):
> Emit normal move instead of unaligned load when source or destination
> are appropriately aligned.
> 
> 2015-06-15 Mike Stump  
>Kyrylo Tkachov  
> 
> * gcc.target/arm/memcpy-aligned-1.c: New test.
> 

My only question would be whether this should be pushed down into
gen_unaligned_{load|store}si, so that all callers would benefit?

R.

>>
>> Kyrill
>>
>>> For the code you've given compiled with -O2 -mcpu=cortex-a53 I get:
>>>sub sp, sp, #8
>>>mov r2, r0
>>>add r3, sp, #8
>>>str r1, [r3, #-4]!
>>>ldr r0, [r3]@ unaligned
>>>str r0, [r2]@ unaligned
>>>add sp, sp, #8
>>>@ sp needed
>>>bx  lr
>>>
>>> whereas for -O2 -mcpu=cortex-a57 I get the much better:
>>>sub sp, sp, #8
>>>str r1, [r0]@ unaligned
>>>add sp, sp, #8
>>>@ sp needed
>>>bx  lr
>>>
>>> Kyrill
>>>
>>>
>>>> Index: gcc/config/arm/arm.c
>>>> ===
>>>> --- gcc/config/arm/arm.c(revision 223842)
>>>> +++ gcc/config/arm/arm.c(working copy)
>>>> @@ -14376,7 +14376,10 @@ arm_block_move_unaligned_straight (rtx d
>>>> srcoffset + j * UNITS_PER_WORD - src_autoinc);
>>>>   mem = adjust_automodify_address (srcbase, SImode, addr,
>>>>srcoffset + j * UNITS_PER_WORD);
>>>> -  emit_insn (gen_unaligned_loadsi (regs[j], mem));
>>>> +  if (src_aligned)
>>>> +emit_move_insn (regs[j], mem);
>>>> +  else
>>>> +emit_insn (gen_unaligned_loadsi (regs[j], mem));
>>>> }
>>>>   srcoffset += words * UNITS_PER_WORD;
>>>> }
>>>> @@ -14395,7 +14398,10 @@ arm_block_move_unaligned_straight (rtx d
>>>> dstoffset + j * UNITS_PER_WORD - dst_autoinc);
>>>>   mem = adjust_automodify_address (dstbase, SImode, addr,
>>>>dstoffset + j * UNITS_PER_WORD);
>>>> -  emit_insn (gen_unaligned_storesi (mem, regs[j]));
>>>> +  if (dst_aligned)
>>>> +emit_move_insn (mem, regs[j]);
>>>> +  else
>>>> +emit_insn (gen_unaligned_storesi (mem, regs[j]));
>>>> }
>>>>   dstoffset += words * UNITS_PER_WORD;
>>>> }
>>>>
>>>>
>>>> Ok?
>>>>
>>>> Can someone spin this through an arm test suite run for me, I was
>>>> doi

Re: arm memcpy of aligned data

2015-08-16 Thread Mike Stump

On Jun 15, 2015, at 7:30 AM, Kyrill Tkachov  wrote:
> 
> On 29/05/15 11:15, Kyrill Tkachov wrote:
>> On 29/05/15 10:08, Kyrill Tkachov wrote:
>>> Hi Mike,
>>> 
>>> On 28/05/15 22:15, Mike Stump wrote:
 So, the arm memcpy code of aligned data isn’t as good as it can be.
 
 void *memcpy(void *dest, const void *src, unsigned int n);
 
 void foo(char *dst, int i) {
 memcpy (dst, &i, sizeof (i));
 }
 
 generates horrible code, but, it we are willing to notice the src or the 
 destination are aligned, we can do much better:
 
 $ ./cc1 -fschedule-fusion -fdump-tree-all-all -da -march=armv7ve 
 -mcpu=cortex-m4 -fomit-frame-pointer -quiet -O2 /tmp/t.c -o t.s
 $ cat t.s
 [ … ]
 foo:
@ args = 0, pretend = 0, frame = 4
@ frame_needed = 0, uses_anonymous_args = 0
@ link register save eliminated.
sub sp, sp, #4
str r1, [r0]@ unaligned
add sp, sp, #4
>>> I think there's something to do with cpu tuning here as well.
>> That being said, I do think this is a good idea.
>> I'll give it a test.
> 
> The patch passes bootstrap and testing ok and I've seen it
> improve codegen in a few places in SPEC.
> I've added a testcase all marked up.
> 
> Mike, I'll commit the attached patch in 24 hours unless somebody objects.

Was this ever applied?

Re: arm memcpy of aligned data

2015-08-17 Thread Kyrill Tkachov



On 16/08/15 20:01, Mike Stump wrote:

On Jun 15, 2015, at 7:30 AM, Kyrill Tkachov  wrote:

On 29/05/15 11:15, Kyrill Tkachov wrote:

On 29/05/15 10:08, Kyrill Tkachov wrote:

Hi Mike,

On 28/05/15 22:15, Mike Stump wrote:

So, the arm memcpy code of aligned data isn’t as good as it can be.

void *memcpy(void *dest, const void *src, unsigned int n);

void foo(char *dst, int i) {
 memcpy (dst, &i, sizeof (i));
}

generates horrible code, but, it we are willing to notice the src or the 
destination are aligned, we can do much better:

$ ./cc1 -fschedule-fusion -fdump-tree-all-all -da -march=armv7ve 
-mcpu=cortex-m4 -fomit-frame-pointer -quiet -O2 /tmp/t.c -o t.s
$ cat t.s
[ … ]
foo:
@ args = 0, pretend = 0, frame = 4
@ frame_needed = 0, uses_anonymous_args = 0
@ link register save eliminated.
sub sp, sp, #4
str r1, [r0]@ unaligned
add sp, sp, #4

I think there's something to do with cpu tuning here as well.

That being said, I do think this is a good idea.
I'll give it a test.

The patch passes bootstrap and testing ok and I've seen it
improve codegen in a few places in SPEC.
I've added a testcase all marked up.

Mike, I'll commit the attached patch in 24 hours unless somebody objects.

Was this ever applied?


Sorry, slipped through the cracks.
Committed with r226935.

Thanks,
Kyrill

arm memcpy of aligned data

Re: arm memcpy of aligned data

Re: arm memcpy of aligned data

Re: arm memcpy of aligned data

Re: arm memcpy of aligned data

Re: arm memcpy of aligned data

Re: arm memcpy of aligned data

Re: arm memcpy of aligned data

8 matches

Site Navigation

Mail list logo

Footer information