Ping: Some remodelling of the ARM vld and vst patterns

2011-04-12 Thread Richard Sandiford
Ping for this change to the NEON vldN and vstN patterns:

http://gcc.gnu.org/ml/gcc-patches/2011-03/msg01996.html

Thanks,
Richard


Re: Ping: Some remodelling of the ARM vld and vst patterns

2011-04-12 Thread Nick Clifton

Hi Richard,

 gcc/
* config/arm/arm.c (arm_print_operand): Use MEM_SIZE to get the
size of a '%A' memory reference.
(T_DREG, T_QREG): New neon_builtin_type_bits.
(arm_init_neon_builtins): Assert that the load and store operands
are neon_struct_operands.
(locate_neon_builtin_icode): Provide the neon_builtin_type_bits.
(NEON_ARG_MEMORY): New builtin_arg.
(neon_dereference_pointer): New function.
(arm_expand_neon_args): Add a neon_builtin_type_bits argument.
Handle NEON_ARG_MEMORY.
(arm_expand_neon_builtin): Update after above interface changes.
Use NEON_ARG_MEMORY for loads and stores.
* config/arm/predicates.md (neon_struct_operand): New predicate.
* config/arm/iterators.md (V_two_elem): Tweak formatting.
(V_three_elem): Use BLKmode for accesses that have no associated mode.
(V_four_elem): Tweak formatting.
* config/arm/neon.md (neon_vld1<mode>, neon_vld1_dup<mode>)
(neon_vst1_lane<mode>, neon_vst1<mode>, neon_vld2<mode>)
(neon_vld2_lane<mode>, neon_vld2_dup<mode>, neon_vst2<mode>)
(neon_vst2_lane<mode>, neon_vld3<mode>, neon_vld3_lane<mode>)
(neon_vld3_dup<mode>, neon_vst3<mode>, neon_vst3_lane<mode>)
(neon_vld4<mode>, neon_vld4_lane<mode>, neon_vld4_dup<mode>)
(neon_vst4<mode>): Replace pointer operand with a memory operand.
Use %A in the output template.
(neon_vld3qa<mode>, neon_vld3qb<mode>, neon_vst3qa<mode>)
(neon_vst3qb<mode>, neon_vld4qa<mode>, neon_vld4qb<mode>)
(neon_vst4qa<mode>, neon_vst4qb<mode>): Likewise, but halve
the width of the memory access.  Remove post-increment.
* config/arm/neon-testgen.ml: Allow addresses to have an alignment.

 gcc/testsuite/
* gcc.target/arm/neon-vld3-1.c: New test.
* gcc.target/arm/neon-vst3-1.c: New test.
* gcc.target/arm/neon/v*.c: Regenerate.

Approved - please apply.

Cheers
  Nick


Re: Some remodelling of the ARM vld and vst patterns

2011-03-30 Thread Richard Sandiford
Richard Sandiford richard.sandif...@linaro.org writes:
   The ??? is saying that the V8QI-derived MEM is really a 3-byte access,
   not a 4-byte (SI) access, and so on.  The comment makes the mode sound
   like a representational nicety, but really, there's no such thing as
   a conservatively wrong memory size here.  If a store's mode is too
   small, dependent loads could be deleted as dead.  If it's too big,
   unrelated live loads could be deleted as dead.

In case it isn't obvious, I meant unrelated live stores.
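
As a rough illustration of the corrected claim, here is a toy model in
plain C.  It is not GCC code: store_range, later_kills_earlier and
check_store_model are names made up for this sketch, the byte offsets
are arbitrary, and the "dead store" test is a deliberate simplification
of what a real pass does (it assumes no read happens between the two
stores).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a store as the byte range [off, off + size).  */
struct store_range
{
  size_t off;
  size_t size;
};

/* Toy dead-store test: an earlier store is considered dead if a later
   store overwrites every byte of it.  */
static bool
later_kills_earlier (struct store_range earlier, struct store_range later)
{
  return (later.off <= earlier.off
          && earlier.off + earlier.size <= later.off + later.size);
}

/* A one-byte store to byte 3, followed by a three-byte structure store
   to bytes 0-2 whose MEM claims to be four bytes wide (SImode).  */
static void
check_store_model (void)
{
  struct store_range unrelated = { 3, 1 };
  struct store_range actual = { 0, 3 };    /* bytes really written */
  struct store_range recorded = { 0, 4 };  /* size claimed by SImode */

  /* With the exact size, the earlier store is correctly kept.  */
  assert (!later_kills_earlier (unrelated, actual));

  /* With the oversized mode, the model treats the unrelated live
     store as dead.  */
  assert (later_kills_earlier (unrelated, recorded));
}
```

In this model the only thing that differs between the two checks is the
recorded size, which is the sense in which a too-big size is not a
conservative choice.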

Richard


Some remodelling of the ARM vld and vst patterns

2011-03-29 Thread Richard Sandiford
The patterns for the Neon vld and vst intrinsics use the following sort
of construct to refer to memory:

(mem:FOO (match_operand:SI X "register_operand" "r"))

This patch changes them to use:

(match_operand:FOO' X "neon_struct_operand" "(=)Um")

instead.  This has some performance benefits:

- It allows the loads to use post-increment addresses as well
  as bare registers.

- If:

  /* FIXME: vld1 allows register post-modify.  */

  were fixed, it would allow register post-modify addresses too.

- It allows alignment hints to be generated.

It also more closely matches the form that future autovectorisation
optabs would have.

There are a couple of correctness fixes too:

- The old v{ld,st}{3,4}q patterns generated two individual instructions,
  each post-incrementing the address.  The problem is the expander passed
  the original register input operand to both patterns, instead of passing
  a temporary register.  We could therefore end up post-incrementing a live
  register variable.  E.g. for:

void __attribute__((noinline))
foo (uint32_t *a)
{
  uint32x4x3_t x;

  x = vld3q_u32 (a);
  x.val[0] = vaddq_u32 (x.val[0], x.val[1]);
  vst3q_u32 (a, x);
}

  the vld3q_u32 advances the register holding "a" by 12 elements, so
  the vst3q_u32 stores to the wrong address.

  After the above change, we don't need to encode the post-increment
  directly.  We can just leave the auto-inc-dec pass to figure out
  a good sequence (which it does seem to do in practice).

  [tested by neon-vld3-1.c]

- At the moment, we use this mode attribute to set the modes of
  three-element loads and stores:

;; Similar, for three elements.
;; ??? Should we define extra modes so that sizes of all three-element
;; accesses can be accurately represented?
(define_mode_attr V_three_elem [(V8QI "SI")   (V16QI "SI")
                                (V4HI "V4HI") (V8HI "V4HI")
                                (V2SI "V4SI") (V4SI "V4SI")
                                (V2SF "V4SF") (V4SF "V4SF")
                                (DI "EI")     (V2DI "EI")])

  The ??? is saying that the V8QI-derived MEM is really a 3-byte access,
  not a 4-byte (SI) access, and so on.  The comment makes the mode sound
  like a representational nicety, but really, there's no such thing as
  a conservatively wrong memory size here.  If a store's mode is too
  small, dependent loads could be deleted as dead.  If it's too big,
  unrelated live stores could be deleted as dead.

  The approach taken in the patch means that we can use BLKmode here,
  and rely on MEM_SIZE to specify the size of the access.

  One problem with using BLKmode is that it stops pre- and
  post-modifications being used.  Seeing as that wasn't possible
  before the patch either, I'd like to leave it as future work.

  [tested by neon-vst3-1.c]
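
To make the first of these fixes (the post-incremented live register)
more concrete, here is a small model in portable C.  It is only a
sketch: load_half_postinc, load_half, old_vld3q, new_vld3q and
pointer_drift are hypothetical names, the de-interleaving done by the
real instructions is ignored, and the "new" variant is just one way to
picture addressing the second half at a fixed offset instead of
post-incrementing.  The pointer passed by reference stands in for the
register that holds "a".

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Model of one half of the old sequence: read 6 elements through the
   register, then post-increment that same register.  */
static void
load_half_postinc (const uint32_t **reg, uint32_t *dst)
{
  for (size_t i = 0; i < 6; i++)
    dst[i] = (*reg)[i];
  *reg += 6;
}

/* Model of one half without any address side effect.  */
static void
load_half (const uint32_t *addr, uint32_t *dst)
{
  for (size_t i = 0; i < 6; i++)
    dst[i] = addr[i];
}

/* Old behaviour: both halves are handed the live register itself.
   Returns the value left in that register afterwards.  */
static const uint32_t *
old_vld3q (const uint32_t *a, uint32_t out[12])
{
  load_half_postinc (&a, out);
  load_half_postinc (&a, out + 6);
  return a;
}

/* Modelled new behaviour: the second half uses a + 6 and the register
   holding "a" is never written.  */
static const uint32_t *
new_vld3q (const uint32_t *a, uint32_t out[12])
{
  load_half (a, out);
  load_half (a + 6, out + 6);
  return a;
}

/* Number of elements by which the caller's pointer has drifted after
   the load.  */
static ptrdiff_t
pointer_drift (int use_old)
{
  uint32_t buf[12];
  uint32_t out[12];

  for (size_t i = 0; i < 12; i++)
    buf[i] = (uint32_t) i;

  const uint32_t *after
    = use_old ? old_vld3q (buf, out) : new_vld3q (buf, out);

  /* Both variants load the same data; only the register differs.  */
  assert (out[11] == 11);
  return after - buf;
}
```

In this model pointer_drift (1) is 12 and pointer_drift (0) is 0, which
matches the 12-element displacement seen by the vst3q_u32 in the
example.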

At the moment, it isn't safe to use the natural alias set, because
arm_neon.h uses the same built-in function for both signed and
unsigned operations.  If this patch is OK, we could in principle
go further and add separate signed and unsigned built-in functions.
It all depends on whether uses of the API implemented by arm_neon.h
are expected to be alias-safe or not.

The patch applies on top of:

  http://gcc.gnu.org/ml/gcc-patches/2011-03/msg01634.html

(unreviewed).

Tested on arm-linux-gnueabi.  OK to install?

Richard


gcc/
* config/arm/arm.c (arm_print_operand): Use MEM_SIZE to get the
size of a '%A' memory reference.
(T_DREG, T_QREG): New neon_builtin_type_bits.
(arm_init_neon_builtins): Assert that the load and store operands
are neon_struct_operands.
(locate_neon_builtin_icode): Provide the neon_builtin_type_bits.
(NEON_ARG_MEMORY): New builtin_arg.
(neon_dereference_pointer): New function.
(arm_expand_neon_args): Add a neon_builtin_type_bits argument.
Handle NEON_ARG_MEMORY.
(arm_expand_neon_builtin): Update after above interface changes.
Use NEON_ARG_MEMORY for loads and stores.
* config/arm/predicates.md (neon_struct_operand): New predicate.
* config/arm/iterators.md (V_two_elem): Tweak formatting.
(V_three_elem): Use BLKmode for accesses that have no associated mode.
(V_four_elem): Tweak formatting.
* config/arm/neon.md (neon_vld1<mode>, neon_vld1_dup<mode>)
(neon_vst1_lane<mode>, neon_vst1<mode>, neon_vld2<mode>)
(neon_vld2_lane<mode>, neon_vld2_dup<mode>, neon_vst2<mode>)
(neon_vst2_lane<mode>, neon_vld3<mode>, neon_vld3_lane<mode>)
(neon_vld3_dup<mode>, neon_vst3<mode>, neon_vst3_lane<mode>)
(neon_vld4<mode>, neon_vld4_lane<mode>, neon_vld4_dup<mode>)
(neon_vst4<mode>): Replace pointer operand with a memory operand.
Use %A in the output template.
(neon_vld3qa<mode>, neon_vld3qb<mode>, neon_vst3qa<mode>)
(neon_vst3qb<mode>, neon_vld4qa<mode>, neon_vld4qb<mode>)
(neon_vst4qa<mode>, neon_vst4qb<mode>): Likewise, but halve
the width of the memory access.