unaligned memory access

James Greenhalgh Wed, 30 Aug 2017 08:56:48 -0700

Hi Tamar,

I think the AArch64 parts of this patch can be substantially simplified.


On Mon, Jul 03, 2017 at 07:17:51AM +0100, Tamar Christina wrote:
> diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
> index 
> ab1bdc0233afe7a3c41501cb724a5c4c719123b8..1cf6c4b891571745f740d7dbd03754bbba9264a6
>  100644
> --- a/gcc/config/aarch64/aarch64.c
> +++ b/gcc/config/aarch64/aarch64.c
> @@ -13466,7 +13466,6 @@ aarch64_copy_one_block_and_progress_pointers (rtx 
> *src, rtx *dst,
>  
>  /* Expand movmem, as if from a __builtin_memcpy.  Return true if
>     we succeed, otherwise return false.  */
> -
>  bool
>  aarch64_expand_movmem (rtx *operands)
>  {
> @@ -13498,16 +13497,55 @@ aarch64_expand_movmem (rtx *operands)
>    base = copy_to_mode_reg (Pmode, XEXP (src, 0));
>    src = adjust_automodify_address (src, VOIDmode, base, 0);
>  
> +  /* Optimize routines for MEM to REG copies.
> +     This particular function only handles copying to two
> +     destination types: 1) a regular register and 2) the stack.
> +     When writing to a regular register we may end up writting too much in 
> cases
> +     where the register already contains a live value or when no data 
> padding is
> +     happening so we disallow it.  */
> +  if (n < 8 && !REG_P (XEXP (operands[0], 0)))

I don't understand how this comment lines up with the condition on this
code path. On the one hand you say you permit regular registers, but on the
other hand you say you disallow regular registers because you may overwrite.


> +   {
> +     machine_mode mov_mode, dest_mode
> +       = smallest_mode_for_size (n * BITS_PER_UNIT, MODE_INT);
> +     rtx result = gen_reg_rtx (dest_mode);
> +     emit_insn (gen_move_insn (result, GEN_INT (0)));
> +
> +     unsigned int shift_cnt = 0;

Any reason not to just initialise the shift_cnt in place here?

> +     for (; n > shift_cnt; shift_cnt += GET_MODE_SIZE (mov_mode))


     for (unsigned int shift_cnt = 0;
          shift_cnt <=  n;
          shift_cnt += GET_MODE_SIZE (mov_mode))

Having the loop counter first in the comparison is personal preference.

mov_mode looks uninitialised, but I guess by the time you check the loop
condition you have actually initialized it.

> +       {
> +      int nearest = 0;

This isn't required, we could do the loop below with one 

> +      /* Find the mode to use.  */
> +      for (unsigned max = 1; max <= (n - shift_cnt); max *= 2)
> +           nearest = max;

Wrong formatting.

So, when this loop terminates for the first time (shift_cnt == 0) nearest
is the first power of two greater than n.

> +
> +       mov_mode = smallest_mode_for_size (nearest * BITS_PER_UNIT, MODE_INT);

That means this is always a mode of at least the right size, which in turn
means that we can't execute round the loop again (GET_MODE_SIZE (mov_mode)
will always be greater than n). So, we can be sure this loop only executes
once, and we can fold mov_mode and dest_mode to be equal.

> +       rtx reg = gen_reg_rtx (mov_mode);
> +
> +       src = adjust_address (src, mov_mode, 0);
> +       emit_insn (gen_move_insn (reg, src));
> +       src = aarch64_progress_pointer (src);
> +
> +       reg = gen_rtx_ZERO_EXTEND (dest_mode, reg);
> +       reg = gen_rtx_ASHIFT (dest_mode, reg,
> +                             GEN_INT (shift_cnt * BITS_PER_UNIT));
> +       result = gen_rtx_IOR (dest_mode, reg, result);
> +       }
> +
> +    dst = adjust_address (dst, dest_mode, 0);
> +    emit_insn (gen_move_insn (dst, result));
> +    return true;
> +  }
> +
>    /* Simple cases.  Copy 0-3 bytes, as (if applicable) a 2-byte, then a
>       1-byte chunk.  */
>    if (n < 4)
>      {
>        if (n >= 2)
> -     {
> -       aarch64_copy_one_block_and_progress_pointers (&src, &dst, HImode);
> -       n -= 2;
> -     }
>  
> +      {
> +     aarch64_copy_one_block_and_progress_pointers (&src, &dst, HImode);
> +     n -= 2;
> +      }

These formatting changes leave a blank newline between if (n >= 2) and the
remainder of this block. Why?

>        if (n == 1)
>       aarch64_copy_one_block_and_progress_pointers (&src, &dst, QImode);
>  

Thanks,
James

Re: [PATCH][GCC][AARCH64]Bad code-gen for structure/block/unaligned memory access

Reply via email to