[PATCH, ARM] Constant vector permute for the Neon vext insn

2012-08-24 Thread Christophe Lyon
Hi,

The patch below enables GCC for ARM to implement relevant constant
vector permutations using the Neon vext instruction, by extending the
support currently in place for vrev, vzip, vunzip and vtrn.

For the cases where vext and vrev would lead to the same result, I
have chosen to keep using vrev to avoid updating the testsuite when
both are equivalent (1 cycle) or when vrev is faster (1 cycle when
operating on Qn vs 2 cycles for vext).

Tested with qemu-arm on arm-none-linux-gnueabi.

Christophe.

2012-08-23  Christophe Lyon  

gcc/
* config/arm/arm.c (arm_evpc_neon_vext): New
function.
(arm_expand_vec_perm_const_1): Add call to
arm_evpc_neon_vext.

gcc/testsuite/
* gcc.target/arm/neon-vext.c: New tests.


gcc-vec-permute-vext.patch
Description: Binary data


Re: [PATCH, ARM] Constant vector permute for the Neon vext insn

2012-08-24 Thread Richard Earnshaw
On 24/08/12 08:45, Christophe Lyon wrote:
> Hi,
> 
> The patch below enables GCC for ARM to implement relevant constant
> vector permutations using the Neon vext instruction, by extending the
> support currently in place for vrev, vzip, vunzip and vtrn.
> 
> For the cases where vext and vrev would lead to the same result, I
> have chosen to keep using vrev to avoid updating the testsuite when
> both are equivalent (1 cycle) or when vrev is faster (1 cycle when
> operating on Qn vs 2 cycles for vext).
> 
> Tested with qemu-arm on arm-none-linux-gnueabi.
> 
> Christophe.
> 
> 2012-08-23  Christophe Lyon  
> 
> gcc/
> * config/arm/arm.c (arm_evpc_neon_vext): New
> function.
> (arm_expand_vec_perm_const_1): Add call to
> arm_evpc_neon_vext.
> 
> gcc/testsuite/
> * gcc.target/arm/neon-vext.c: New tests.=
> 

Has this been tested for big-endian?

R.






Re: [PATCH, ARM] Constant vector permute for the Neon vext insn

2012-08-24 Thread Christophe Lyon
On 24 August 2012 10:40, Richard Earnshaw  wrote:
>
> Has this been tested for big-endian?
>
> R.

No. I'll give a look at it and let you know.

Christophe.


Re: [PATCH, ARM] Constant vector permute for the Neon vext insn

2012-08-27 Thread Christophe Lyon
[ Richard, sorry for the duplicate message where I omitted the mailing-list]

On 24 August 2012 10:40, Richard Earnshaw  wrote:
>
> Has this been tested for big-endian?
>

Hi,
While improving my tests and trying to turn them into execution tests,
I realized that vector initialization such as:
  uint16x4_t vec1 = {0x1234, 0x5678, 0x9abc, 0xdef0};
is endianness dependent.

However, I have noticed that other tests (such as neon-vrev.c,
neon-vset_lanes8.c, pr48252) do use such constructs and the last
two ones fail at execution in big-endian mode (the 1st one is only
compiled).

I guess that the 'right' (portable) was of initializing a vector is to
load it from an array, right?

Thanks,

Christophe.


Re: [PATCH, ARM] Constant vector permute for the Neon vext insn

2012-08-27 Thread Janis Johnson
On 08/27/2012 08:02 AM, Christophe Lyon wrote:
> [ Richard, sorry for the duplicate message where I omitted the mailing-list]
> 
> On 24 August 2012 10:40, Richard Earnshaw  wrote:
>>
>> Has this been tested for big-endian?
>>
> 
> Hi,
> While improving my tests and trying to turn them into execution tests,
> I realized that vector initialization such as:
>   uint16x4_t vec1 = {0x1234, 0x5678, 0x9abc, 0xdef0};
> is endianness dependent.
> 
> However, I have noticed that other tests (such as neon-vrev.c,
> neon-vset_lanes8.c, pr48252) do use such constructs and the last
> two ones fail at execution in big-endian mode (the 1st one is only
> compiled).
> 
> I guess that the 'right' (portable) was of initializing a vector is to
> load it from an array, right?

See http://gcc.gnu.org/ml/gcc-patches/2011-10/msg01114.html for Richard
Earnshaw's suggestion on how to fix neon-vset_lanes8.c, and an alternate
suggestion for changing the compiler.

Janis


Re: [PATCH, ARM] Constant vector permute for the Neon vext insn

2012-08-28 Thread Christophe Lyon
On 27 August 2012 21:29, Janis Johnson  wrote:
> On 08/27/2012 08:02 AM, Christophe Lyon wrote:
>> I guess that the 'right' (portable) was of initializing a vector is to
>> load it from an array, right?
>
> See http://gcc.gnu.org/ml/gcc-patches/2011-10/msg01114.html for Richard
> Earnshaw's suggestion on how to fix neon-vset_lanes8.c, and an alternate
> suggestion for changing the compiler.
>

Thanks for the pointer, which confirms it is much more complex than I
anticipated.

However, I still need to initialize the mask vector with GCC-vector
notation, not using vld1 otherwise the compiler cannot detect if the
mask vector is contant (and therefore suitable for an optimization).

This makes writing exhaustive, portable (big and little endian),
executable tests a painful task.

Thanks


Re: [PATCH, ARM] Constant vector permute for the Neon vext insn

2012-08-28 Thread Christophe Lyon
On 28 August 2012 16:20, Christophe Lyon  wrote:
> This makes writing exhaustive, portable (big and little endian),
> executable tests a painful task.
>
For instance, considering the attached sample code, I obtain a
different result in big-endian vs little-endian, while the input
values are the same.
Indeed, in both cases, the program prints:
__a[0] = 0 . __a[7] = 7
__b[0] = 8  __b[7] = 15
__mask1[0] = 2  __mask1[7] = 9
but in Little-endian, the result of builtin_shuffle(__a, __b, __mask1) is
mem[0] = 2, mem[1] = 3  mem[7] = 9
while in big-endian it is:
mem[0] =5, mem[1] = 4,  mem[5] = 0, mem[6] = 15, mem[7] = 14

What am I missing?

Thanks,

Christophe.
/* { dg-do run } */
/* { dg-require-effective-target arm_neon_ok } */
/* { dg-options "-O2" } */
/* { dg-add-options arm_neon } */

#include 
#include 
#include 

uint8x8_t
tst_vext_u8 (uint8x8_t __a, uint8x8_t __b)
{
#ifdef __ARMEL__
  uint8x8_t __mask1 = {2, 3, 4, 5, 6, 7, 8, 9};
#else
  uint8x8_t __mask1 = {9, 8, 7, 6, 5, 4, 3, 2};
#endif

  union {uint8x8_t v; uint8_t buf[8];} mem_u8x8;
  int i;

  vst1_u8(mem_u8x8.buf, __a);
  for(i=0; i<8; i++) {
fprintf(stderr, "__a[%d]=%d\n", i, mem_u8x8.buf[i]);
  }
  vst1_u8(mem_u8x8.buf, __b);
  for(i=0; i<8; i++) {
fprintf(stderr, "__b[%d]=%d\n", i, mem_u8x8.buf[i]);
  }
  vst1_u8(mem_u8x8.buf, __mask1);
  for(i=0; i<8; i++) {
fprintf(stderr, "__mask1[%d]=%d\n", i, mem_u8x8.buf[i]);
  }
  return __builtin_shuffle ( __a, __b, __mask1) ;
}

int main(void)
{
  uint8_t arr_u8x8[] = {0, 1, 2, 3, 4, 5, 6, 7};
  uint8_t arr2_u8x8[] = {8, 9, 10, 11, 12, 13, 14, 15};

  uint8x8_t vec_u8x8 = vld1_u8(arr_u8x8);
  uint8x8_t vec2_u8x8 = vld1_u8(arr2_u8x8);

  uint8x8_t result_u8x8;

  union {uint8x8_t v; uint8_t buf[8];} mem_u8x8;

  int i;

  result_u8x8 = tst_vext_u8 (vec_u8x8, vec2_u8x8);
  vst1_u8(mem_u8x8.buf, result_u8x8);

  for (i=0; i<8; i++) {
printf("mem_u8x8[%d]=%d\n", i, mem_u8x8.buf[i]);
  }

  return 0;
}


Re: [PATCH, ARM] Constant vector permute for the Neon vext insn

2012-08-31 Thread Christophe Lyon
On 24 August 2012 10:54, Christophe Lyon  wrote:
> On 24 August 2012 10:40, Richard Earnshaw  wrote:
>>
>> Has this been tested for big-endian?
>>
>> R.
>
> No. I'll give a look at it and let you know.
>
> Christophe.

Here is an updated patch, which now does no optimization in the
big-endian case. Given the current status of big-endian + neon
support, I guess it is not a serious problem.

I have also added runtime tests.

I will later post an additional patch (on top of this one), which
enhances the tests so that they can be run in big-endian mode too.

Christophe.

2012-08-31  Christophe Lyon  

gcc/
* config/arm/arm.c (arm_evpc_neon_vext): New
function.
(arm_expand_vec_perm_const_1): Add call to
arm_evpc_neon_vext.


gcc/testsuite/
* gcc.target/arm/neon-vext.c
gcc.target/arm/neon-vext-execute.c:
New tests.


gcc-vec-permute-vext.changelog
Description: Binary data


Re: [PATCH, ARM] Constant vector permute for the Neon vext insn

2012-08-31 Thread Christophe Lyon
This time with the actual patch attached.



On 31 August 2012 15:23, Christophe Lyon  wrote:
> On 24 August 2012 10:54, Christophe Lyon  wrote:
>> On 24 August 2012 10:40, Richard Earnshaw  wrote:
>>>
>>> Has this been tested for big-endian?
>>>
>>> R.
>>
>> No. I'll give a look at it and let you know.
>>
>> Christophe.
>
> Here is an updated patch, which now does no optimization in the
> big-endian case. Given the current status of big-endian + neon
> support, I guess it is not a serious problem.
>
> I have also added runtime tests.
>
> I will later post an additional patch (on top of this one), which
> enhances the tests so that they can be run in big-endian mode too.
>
> Christophe.
>
> 2012-08-31  Christophe Lyon  
>
> gcc/
> * config/arm/arm.c (arm_evpc_neon_vext): New
> function.
> (arm_expand_vec_perm_const_1): Add call to
> arm_evpc_neon_vext.
>
>
> gcc/testsuite/
> * gcc.target/arm/neon-vext.c
> gcc.target/arm/neon-vext-execute.c:
> New tests.


gcc-vec-permute-vext.patch
Description: Binary data


Re: [PATCH, ARM] Constant vector permute for the Neon vext insn

2012-08-31 Thread Richard Henderson
On 2012-08-31 07:25, Christophe Lyon wrote:
> +  offset = gen_rtx_CONST_INT (VOIDmode, location);

Never call gen_rtx_CONST_INT directly.  Use GEN_INT.


r~


Re: [PATCH, ARM] Constant vector permute for the Neon vext insn

2012-09-03 Thread Christophe Lyon
On 31 August 2012 17:59, Richard Henderson  wrote:
> On 2012-08-31 07:25, Christophe Lyon wrote:
>> +  offset = gen_rtx_CONST_INT (VOIDmode, location);
>
> Never call gen_rtx_CONST_INT directly.  Use GEN_INT.
>
>
Here is an updated patch with that small change.
For the record, there are quite a few existing calls to
gen_rtx_CONST_INT, maybe a cleanup pass is needed?

Thanks,

Christophe.

2012-09-03  Christophe Lyon  

gcc/
* config/arm/arm.c (arm_evpc_neon_vext): New
function.
(arm_expand_vec_perm_const_1): Add call to
arm_evpc_neon_vext.


gcc/testsuite/
* gcc.target/arm/neon-vext.c
gcc.target/arm/neon-vext-execute.c:
New tests.


gcc-vec-permute-vext.patch
Description: Binary data


Re: [PATCH, ARM] Constant vector permute for the Neon vext insn

2012-09-03 Thread Ramana Radhakrishnan

On 09/03/12 09:59, Christophe Lyon wrote:

On 31 August 2012 17:59, Richard Henderson  wrote:

On 2012-08-31 07:25, Christophe Lyon wrote:

+  offset = gen_rtx_CONST_INT (VOIDmode, location);


Never call gen_rtx_CONST_INT directly.  Use GEN_INT.



Here is an updated patch with that small change.
For the record, there are quite a few existing calls to
gen_rtx_CONST_INT, maybe a cleanup pass is needed?


A set of cleanup patches are welcome.

This looks OK - thanks.

Ramana