On TILE-Gx, I'm observing a degradation in inlined memcpy/memset in gcc 4.6 and later versus gcc 4.4. Although I found the problem on TILE-Gx, I think it affects any architecture with SLOW_UNALIGNED_ACCESS set to 1.

Consider the following program:

    #include <string.h>

    struct foo {
      int x;
    };

    void copy(struct foo* f0, struct foo* f1)
    {
      memcpy (f0, f1, sizeof(struct foo));
    }

In gcc 4.4, I get the desired inline memcpy:

    copy:
            ld4s    r1, r1
            st4     r0, r1
            jrp     lr

In gcc 4.7, however, I get inlined byte-by-byte copies:

    copy:
            ld1u_add r10, r1, 1
            st1_add  r0, r10, 1
            ld1u_add r10, r1, 1
            st1_add  r0, r10, 1
            ld1u_add r10, r1, 1
            st1_add  r0, r10, 1
            ld1u     r10, r1
            st1      r0, r10
            jrp      lr

The inlining of memcpy is done in expand_builtin_memcpy in builtins.c. Tracing through that, I see that src_align and dest_align, which are computed by get_pointer_alignment, have degraded: in gcc 4.4 they are 32 bits, but in gcc 4.7 they are 8 bits. This causes the loads and stores generated by the inlined memcpy to be per-byte instead of per-4-byte.

Looking further, gcc 4.7 uses the "align" field in "struct ptr_info_def" to compute the alignment. This field appears to be initialized in get_ptr_info in tree-ssanames.c, but it is always initialized to 1 byte and does not appear to change. gcc 4.4 computes its alignment information differently.

I get the same byte-copies with gcc 4.8 and gcc 4.6.

I see a couple of related open PRs: 50417, 53535, but no suggested fixes for them yet.

Can anyone advise on how this can be fixed? Should I file a new bug, or add this info to one of the existing PRs?

Thanks,

Walter
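
P.S. For anyone who wants to reproduce this, here is a self-contained sketch of the test case. copy() is the function from above. copy_assume_aligned() is a hypothetical variant, not something measured in this report: it uses __builtin_assume_aligned (available from gcc 4.7) to state the pointer alignment explicitly, which might help confirm whether the lost alignment information is what drives the byte-wise expansion. The -O2 flag in the comment is an assumption; the report above does not say which options were used.

```c
/* Build to assembly with something like: gcc -O2 -S copy.c
   (the optimization level is an assumption) and compare the
   code generated for the two functions. */
#include <string.h>

struct foo {
  int x;
};

/* Original test case: the alignment of f0/f1 is only implied
   by the pointer type. */
void copy(struct foo *f0, struct foo *f1)
{
  memcpy(f0, f1, sizeof(struct foo));
}

/* Hypothetical variant: tell the compiler explicitly that both
   pointers have the natural alignment of struct foo. Whether this
   restores the word-sized copy on a given gcc version is untested. */
void copy_assume_aligned(struct foo *f0, struct foo *f1)
{
  struct foo *d = __builtin_assume_aligned(f0, __alignof__(struct foo));
  struct foo *s = __builtin_assume_aligned(f1, __alignof__(struct foo));
  memcpy(d, s, sizeof(struct foo));
}
```

Both functions should behave identically at run time; only the generated load/store widths are of interest.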