Re: ACLE intrinsics: BFloat16 load intrinsics for AArch32

2020-03-09 Thread Christophe Lyon
On Fri, 6 Mar 2020 at 11:46, Kyrill Tkachov  wrote:
>
> Hi Delia,
>
> On 3/5/20 4:38 PM, Delia Burduv wrote:
> > Hi,
> >
> > This is the latest version of the patch. I am forcing -mfloat-abi=hard
> > because the generated code differs slightly depending on the
> > float-abi used.
>
>
> Thanks, I've pushed it with an updated ChangeLog.
>
> 2020-03-06  Delia Burduv  
>
>  * config/arm/arm_neon.h (vld2_bf16): New.
>  (vld2q_bf16): New.
>  (vld3_bf16): New.
>  (vld3q_bf16): New.
>  (vld4_bf16): New.
>  (vld4q_bf16): New.
>  (vld2_dup_bf16): New.
>  (vld2q_dup_bf16): New.
>  (vld3_dup_bf16): New.
>  (vld3q_dup_bf16): New.
>  (vld4_dup_bf16): New.
>  (vld4q_dup_bf16): New.
>  * config/arm/arm_neon_builtins.def
>  (vld2): Changed to VAR13 and added v4bf, v8bf
>  (vld2_dup): Changed to VAR8 and added v4bf, v8bf
>  (vld3): Changed to VAR13 and added v4bf, v8bf
>  (vld3_dup): Changed to VAR8 and added v4bf, v8bf
>  (vld4): Changed to VAR13 and added v4bf, v8bf
>  (vld4_dup): Changed to VAR8 and added v4bf, v8bf
>  * config/arm/iterators.md (VDXBF2): New iterator.
>  * config/arm/neon.md (neon_vld2): Use new iterators.
>  (neon_vld2_dup): Likewise.
>  (neon_vld3): Likewise.
>  (neon_vld3qa): Likewise.
>  (neon_vld3qb): Likewise.
>  (neon_vld3_dup): Likewise.
>  (neon_vld4): Likewise.
>  (neon_vld4qa): Likewise.
>  (neon_vld4qb): Likewise.
>  (neon_vld4_dup): Likewise.
>  (neon_vld2_dupv8bf): New.
>  (neon_vld3_dupv8bf): Likewise.
>  (neon_vld4_dupv8bf): Likewise.
>
> Kyrill

Hi!

There's a problem with the arm_neon.h update.
On arm-none-linux-gnueabihf, there is a regression on
g++.dg/other/pr54300.C and g++.dg/other/pr55073.C, because:
FAIL: g++.dg/other/pr54300.C  -std=gnu++98 (test for excess errors)
Excess errors:
/aci-gcc-fsf/builds/gcc-fsf-gccsrc/obj-arm-none-linux-gnueabihf/gcc3/gcc/include/arm_neon.h:19565:39:
error: cannot convert 'const short int*' to 'const __bf16*'
/aci-gcc-fsf/builds/gcc-fsf-gccsrc/obj-arm-none-linux-gnueabihf/gcc3/gcc/include/arm_neon.h:19574:39:
error: cannot convert 'const short int*' to 'const __bf16*'
[]

The same problem causes many (~365) tests to become unsupported on
arm-none-linux-gnueabi:
g++.dg/abi/mangle-arm-crypto.C
g++.dg/abi/mangle-neon.C

Can you fix it?

Thanks

Christophe

>
>
> >
> > Thanks,
> > Delia
> >
> > On 3/4/20 5:20 PM, Kyrill Tkachov wrote:
> >> Hi Delia,
> >>
> >> On 3/4/20 2:05 PM, Delia Burduv wrote:
> >>> Hi,
> >>>
> >>> The previous version of this patch shared part of its code with the
> >>> store intrinsics patch
> >>> (https://gcc.gnu.org/ml/gcc-patches/2020-03/msg00145.html) so I removed
> >>> any duplicated code. This patch now depends on the previously mentioned
> >>> store intrinsics patch.
> >>>
> >>> Here is the latest version and the updated ChangeLog.
> >>>
> >>> gcc/ChangeLog:
> >>>
> >>> 2019-03-04  Delia Burduv  
> >>>
> >>> * config/arm/arm_neon.h (bfloat16_t): New typedef.
> >>>  (vld2_bf16): New.
> >>> (vld2q_bf16): New.
> >>> (vld3_bf16): New.
> >>> (vld3q_bf16): New.
> >>> (vld4_bf16): New.
> >>> (vld4q_bf16): New.
> >>> (vld2_dup_bf16): New.
> >>> (vld2q_dup_bf16): New.
> >>>  (vld3_dup_bf16): New.
> >>> (vld3q_dup_bf16): New.
> >>> (vld4_dup_bf16): New.
> >>> (vld4q_dup_bf16): New.
> >>>  * config/arm/arm_neon_builtins.def
> >>>  (vld2): Changed to VAR13 and added v4bf, v8bf
> >>>  (vld2_dup): Changed to VAR8 and added v4bf, v8bf
> >>>  (vld3): Changed to VAR13 and added v4bf, v8bf
> >>>  (vld3_dup): Changed to VAR8 and added v4bf, v8bf
> >>>  (vld4): Changed to VAR13 and added v4bf, v8bf
> >>>  (vld4_dup): Changed to VAR8 and added v4bf, v8bf
> >>>  * config/arm/iterators.md (VDXBF): New iterator.
> >>>  (VQ2BF): New iterator.
> > >>>  * config/arm/neon.md (vld2): Used new iterators.
> >>>  (vld2_dup): Used new iterators.
> >>>  (vld2_dupv8bf): New.
> >>>  (vst3): Used new iterators.
> >>>  (vst3qa): Used new iterators.
> >>>  (vst3qb): Used new iterators.
> >>>  (vld3_dup): Used new iterators.
> >>>  (vld3_dupv8bf): New.
> >>>  (vst4): Used new iterators.
> >>>  (vst4qa): Used new iterators.
> >>>  (vst4qb): Used new iterators.
> >>>  (vld4_dup): Used new iterators.
> >>>  (vld4_dupv8bf): New.
> >>>
> >>>
> >>> gcc/testsuite/ChangeLog:
> >>>
> >>> 2019-03-04  Delia Burduv  
> >>>
> >>> * gcc.target/arm/simd/bf16_vldn_1.c: New test.
> >>>
> >>> Thanks,
> >>> Delia
> >>>
> >>> On 2/19/20 5:25 PM, Delia Burduv wrote:
> >>> >
> >>> > Hi,
> >>> >
> >>> > Here is the latest version of the patch. It just has some minor
> >>> > formatting changes that were brought up by Richard Sandiford in the
> >>> > AArch64 patches
> >>> >
> >>> > 

Re: ACLE intrinsics: BFloat16 load intrinsics for AArch32

2020-03-06 Thread Kyrill Tkachov

Hi Delia,

On 3/5/20 4:38 PM, Delia Burduv wrote:

Hi,

This is the latest version of the patch. I am forcing -mfloat-abi=hard
because the generated code differs slightly depending on the
float-abi used.



Thanks, I've pushed it with an updated ChangeLog.

2020-03-06  Delia Burduv  

    * config/arm/arm_neon.h (vld2_bf16): New.
    (vld2q_bf16): New.
    (vld3_bf16): New.
    (vld3q_bf16): New.
    (vld4_bf16): New.
    (vld4q_bf16): New.
    (vld2_dup_bf16): New.
    (vld2q_dup_bf16): New.
    (vld3_dup_bf16): New.
    (vld3q_dup_bf16): New.
    (vld4_dup_bf16): New.
    (vld4q_dup_bf16): New.
    * config/arm/arm_neon_builtins.def
    (vld2): Changed to VAR13 and added v4bf, v8bf
    (vld2_dup): Changed to VAR8 and added v4bf, v8bf
    (vld3): Changed to VAR13 and added v4bf, v8bf
    (vld3_dup): Changed to VAR8 and added v4bf, v8bf
    (vld4): Changed to VAR13 and added v4bf, v8bf
    (vld4_dup): Changed to VAR8 and added v4bf, v8bf
    * config/arm/iterators.md (VDXBF2): New iterator.
    * config/arm/neon.md (neon_vld2): Use new iterators.
    (neon_vld2_dup): Likewise.
    (neon_vld3): Likewise.
    (neon_vld3qa): Likewise.
    (neon_vld3qb): Likewise.
    (neon_vld3_dup): Likewise.
    (neon_vld4): Likewise.
    (neon_vld4qa): Likewise.
    (neon_vld4qb): Likewise.
    (neon_vld4_dup): Likewise.
    (neon_vld2_dupv8bf): New.
    (neon_vld3_dupv8bf): Likewise.
    (neon_vld4_dupv8bf): Likewise.

Kyrill




Thanks,
Delia

On 3/4/20 5:20 PM, Kyrill Tkachov wrote:

Hi Delia,

On 3/4/20 2:05 PM, Delia Burduv wrote:

Hi,

The previous version of this patch shared part of its code with the
store intrinsics patch
(https://gcc.gnu.org/ml/gcc-patches/2020-03/msg00145.html) so I removed
any duplicated code. This patch now depends on the previously mentioned
store intrinsics patch.

Here is the latest version and the updated ChangeLog.

gcc/ChangeLog:

2019-03-04  Delia Burduv  

    * config/arm/arm_neon.h (bfloat16_t): New typedef.
 (vld2_bf16): New.
    (vld2q_bf16): New.
    (vld3_bf16): New.
    (vld3q_bf16): New.
    (vld4_bf16): New.
    (vld4q_bf16): New.
    (vld2_dup_bf16): New.
    (vld2q_dup_bf16): New.
 (vld3_dup_bf16): New.
    (vld3q_dup_bf16): New.
    (vld4_dup_bf16): New.
    (vld4q_dup_bf16): New.
 * config/arm/arm_neon_builtins.def
 (vld2): Changed to VAR13 and added v4bf, v8bf
 (vld2_dup): Changed to VAR8 and added v4bf, v8bf
 (vld3): Changed to VAR13 and added v4bf, v8bf
 (vld3_dup): Changed to VAR8 and added v4bf, v8bf
 (vld4): Changed to VAR13 and added v4bf, v8bf
 (vld4_dup): Changed to VAR8 and added v4bf, v8bf
 * config/arm/iterators.md (VDXBF): New iterator.
 (VQ2BF): New iterator.
 * config/arm/neon.md (vld2): Used new iterators.
 (vld2_dup): Used new iterators.
 (vld2_dupv8bf): New.
 (vst3): Used new iterators.
 (vst3qa): Used new iterators.
 (vst3qb): Used new iterators.
 (vld3_dup): Used new iterators.
 (vld3_dupv8bf): New.
 (vst4): Used new iterators.
 (vst4qa): Used new iterators.
 (vst4qb): Used new iterators.
 (vld4_dup): Used new iterators.
 (vld4_dupv8bf): New.


gcc/testsuite/ChangeLog:

2019-03-04  Delia Burduv  

    * gcc.target/arm/simd/bf16_vldn_1.c: New test.

Thanks,
Delia

On 2/19/20 5:25 PM, Delia Burduv wrote:
>
> Hi,
>
> Here is the latest version of the patch. It just has some minor
> formatting changes that were brought up by Richard Sandiford in the
> AArch64 patches
>
> Thanks,
> Delia
>
> On 1/22/20 5:31 PM, Delia Burduv wrote:
>> Ping.
>>
>> I will change the tests to use the exact input and output registers as
>> Richard Sandiford suggested for the AArch64 patches.
>>
>> On 12/20/19 6:48 PM, Delia Burduv wrote:
>>> This patch adds the ARMv8.6 ACLE BFloat16 load intrinsics
>>> vld{q}_bf16 as part of the BFloat16 extension.
>>> (https://developer.arm.com/architectures/instruction-sets/simd-isas/neon/intrinsics)
>>>
>>> The intrinsics are declared in arm_neon.h.
>>> A new test is added to check assembler output.
>>>
>>> This patch depends on the Arm back-end patch.
>>> (https://gcc.gnu.org/ml/gcc-patches/2019-12/msg01448.html)
>>>
>>> Tested for regression on arm-none-eabi and armeb-none-eabi. I don't
>>> have commit rights, so if this is ok can someone please commit it for
>>> me?
>>>
>>> gcc/ChangeLog:
>>>
>>> 2019-11-14  Delia Burduv 
>>>
>>>  * config/arm/arm_neon.h (bfloat16_t): New typedef.
>>>  (bfloat16x4x2_t): New typedef.
>>>  (bfloat16x8x2_t): New typedef.
>>>  (bfloat16x4x3_t): New typedef.
>>>  (bfloat16x8x3_t): New typedef.
>>>  (bfloat16x4x4_t): New typedef.
>>>  (bfloat16x8x4_t): New typedef.
>>>  (vld2_bf16): New.
>>>  (vld2q_bf16): New.
>>>  (vld3_bf16): New.
>>>  (vld3q_bf16): New.
>>>  (vld4_bf16): New.
>>>  (vld4q_bf16): New.
>>> 

Re: ACLE intrinsics: BFloat16 load intrinsics for AArch32

2020-03-05 Thread Delia Burduv

Hi,

This is the latest version of the patch. I am forcing -mfloat-abi=hard
because the generated code differs slightly depending on the
float-abi used.


Thanks,
Delia

On 3/4/20 5:20 PM, Kyrill Tkachov wrote:

Hi Delia,

On 3/4/20 2:05 PM, Delia Burduv wrote:

Hi,

The previous version of this patch shared part of its code with the
store intrinsics patch
(https://gcc.gnu.org/ml/gcc-patches/2020-03/msg00145.html) so I removed
any duplicated code. This patch now depends on the previously mentioned
store intrinsics patch.

Here is the latest version and the updated ChangeLog.

gcc/ChangeLog:

2019-03-04  Delia Burduv  

    * config/arm/arm_neon.h (bfloat16_t): New typedef.
 (vld2_bf16): New.
    (vld2q_bf16): New.
    (vld3_bf16): New.
    (vld3q_bf16): New.
    (vld4_bf16): New.
    (vld4q_bf16): New.
    (vld2_dup_bf16): New.
    (vld2q_dup_bf16): New.
 (vld3_dup_bf16): New.
    (vld3q_dup_bf16): New.
    (vld4_dup_bf16): New.
    (vld4q_dup_bf16): New.
 * config/arm/arm_neon_builtins.def
 (vld2): Changed to VAR13 and added v4bf, v8bf
 (vld2_dup): Changed to VAR8 and added v4bf, v8bf
 (vld3): Changed to VAR13 and added v4bf, v8bf
 (vld3_dup): Changed to VAR8 and added v4bf, v8bf
 (vld4): Changed to VAR13 and added v4bf, v8bf
 (vld4_dup): Changed to VAR8 and added v4bf, v8bf
 * config/arm/iterators.md (VDXBF): New iterator.
 (VQ2BF): New iterator.
 * config/arm/neon.md (vld2): Used new iterators.
 (vld2_dup): Used new iterators.
 (vld2_dupv8bf): New.
 (vst3): Used new iterators.
 (vst3qa): Used new iterators.
 (vst3qb): Used new iterators.
 (vld3_dup): Used new iterators.
 (vld3_dupv8bf): New.
 (vst4): Used new iterators.
 (vst4qa): Used new iterators.
 (vst4qb): Used new iterators.
 (vld4_dup): Used new iterators.
 (vld4_dupv8bf): New.


gcc/testsuite/ChangeLog:

2019-03-04  Delia Burduv  

    * gcc.target/arm/simd/bf16_vldn_1.c: New test.

Thanks,
Delia

On 2/19/20 5:25 PM, Delia Burduv wrote:
>
> Hi,
>
> Here is the latest version of the patch. It just has some minor
> formatting changes that were brought up by Richard Sandiford in the
> AArch64 patches
>
> Thanks,
> Delia
>
> On 1/22/20 5:31 PM, Delia Burduv wrote:
>> Ping.
>>
>> I will change the tests to use the exact input and output registers as
>> Richard Sandiford suggested for the AArch64 patches.
>>
>> On 12/20/19 6:48 PM, Delia Burduv wrote:
>>> This patch adds the ARMv8.6 ACLE BFloat16 load intrinsics
>>> vld{q}_bf16 as part of the BFloat16 extension.
>>> (https://developer.arm.com/architectures/instruction-sets/simd-isas/neon/intrinsics)
>>>
>>> The intrinsics are declared in arm_neon.h.
>>> A new test is added to check assembler output.
>>>
>>> This patch depends on the Arm back-end patch.
>>> (https://gcc.gnu.org/ml/gcc-patches/2019-12/msg01448.html)
>>>
>>> Tested for regression on arm-none-eabi and armeb-none-eabi. I don't
>>> have commit rights, so if this is ok can someone please commit it for
>>> me?
>>>
>>> gcc/ChangeLog:
>>>
>>> 2019-11-14  Delia Burduv 
>>>
>>>  * config/arm/arm_neon.h (bfloat16_t): New typedef.
>>>  (bfloat16x4x2_t): New typedef.
>>>  (bfloat16x8x2_t): New typedef.
>>>  (bfloat16x4x3_t): New typedef.
>>>  (bfloat16x8x3_t): New typedef.
>>>  (bfloat16x4x4_t): New typedef.
>>>  (bfloat16x8x4_t): New typedef.
>>>  (vld2_bf16): New.
>>>  (vld2q_bf16): New.
>>>  (vld3_bf16): New.
>>>  (vld3q_bf16): New.
>>>  (vld4_bf16): New.
>>>  (vld4q_bf16): New.
>>>  (vld2_dup_bf16): New.
>>>  (vld2q_dup_bf16): New.
>>>   (vld3_dup_bf16): New.
>>>  (vld3q_dup_bf16): New.
>>>  (vld4_dup_bf16): New.
>>>  (vld4q_dup_bf16): New.
>>>  * config/arm/arm-builtins.c (E_V2BFmode): New mode.
>>>  (VAR13): New.
>>>  (arm_simd_types[Bfloat16x2_t]): New type.
>>>  * config/arm/arm-modes.def (V2BF): New mode.
>>>  * config/arm/arm-simd-builtin-types.def
>>>  (Bfloat16x2_t): New entry.
>>>  * config/arm/arm_neon_builtins.def
>>>  (vld2): Changed to VAR13 and added v4bf, v8bf
>>>  (vld2_dup): Changed to VAR8 and added v4bf, v8bf
>>>  (vld3): Changed to VAR13 and added v4bf, v8bf
>>>  (vld3_dup): Changed to VAR8 and added v4bf, v8bf
>>>  (vld4): Changed to VAR13 and added v4bf, v8bf
>>>  (vld4_dup): Changed to VAR8 and added v4bf, v8bf
>>>  * config/arm/iterators.md (VDXBF): New iterator.
>>>  (VQ2BF): New iterator.
>>>  (V_elem): Added V4BF, V8BF.
>>>  (V_sz_elem): Added V4BF, V8BF.
>>>  (V_mode_nunits): Added V4BF, V8BF.
>>>  (q): Added V4BF, V8BF.
>>>  * config/arm/neon.md (vld2): Used new iterators.
>>>  (vld2_dup): Used new ite

Re: ACLE intrinsics: BFloat16 load intrinsics for AArch32

2020-03-04 Thread Kyrill Tkachov

Hi Delia,

On 3/4/20 2:05 PM, Delia Burduv wrote:

Hi,

The previous version of this patch shared part of its code with the
store intrinsics patch
(https://gcc.gnu.org/ml/gcc-patches/2020-03/msg00145.html) so I removed
any duplicated code. This patch now depends on the previously mentioned
store intrinsics patch.

Here is the latest version and the updated ChangeLog.

gcc/ChangeLog:

2019-03-04  Delia Burduv  

    * config/arm/arm_neon.h (bfloat16_t): New typedef.
 (vld2_bf16): New.
    (vld2q_bf16): New.
    (vld3_bf16): New.
    (vld3q_bf16): New.
    (vld4_bf16): New.
    (vld4q_bf16): New.
    (vld2_dup_bf16): New.
    (vld2q_dup_bf16): New.
 (vld3_dup_bf16): New.
    (vld3q_dup_bf16): New.
    (vld4_dup_bf16): New.
    (vld4q_dup_bf16): New.
 * config/arm/arm_neon_builtins.def
 (vld2): Changed to VAR13 and added v4bf, v8bf
 (vld2_dup): Changed to VAR8 and added v4bf, v8bf
 (vld3): Changed to VAR13 and added v4bf, v8bf
 (vld3_dup): Changed to VAR8 and added v4bf, v8bf
 (vld4): Changed to VAR13 and added v4bf, v8bf
 (vld4_dup): Changed to VAR8 and added v4bf, v8bf
 * config/arm/iterators.md (VDXBF): New iterator.
 (VQ2BF): New iterator.
 * config/arm/neon.md (vld2): Used new iterators.
 (vld2_dup): Used new iterators.
 (vld2_dupv8bf): New.
 (vst3): Used new iterators.
 (vst3qa): Used new iterators.
 (vst3qb): Used new iterators.
 (vld3_dup): Used new iterators.
 (vld3_dupv8bf): New.
 (vst4): Used new iterators.
 (vst4qa): Used new iterators.
 (vst4qb): Used new iterators.
 (vld4_dup): Used new iterators.
 (vld4_dupv8bf): New.


gcc/testsuite/ChangeLog:

2019-03-04  Delia Burduv  

    * gcc.target/arm/simd/bf16_vldn_1.c: New test.

Thanks,
Delia

On 2/19/20 5:25 PM, Delia Burduv wrote:
>
> Hi,
>
> Here is the latest version of the patch. It just has some minor
> formatting changes that were brought up by Richard Sandiford in the
> AArch64 patches
>
> Thanks,
> Delia
>
> On 1/22/20 5:31 PM, Delia Burduv wrote:
>> Ping.
>>
>> I will change the tests to use the exact input and output registers as
>> Richard Sandiford suggested for the AArch64 patches.
>>
>> On 12/20/19 6:48 PM, Delia Burduv wrote:
>>> This patch adds the ARMv8.6 ACLE BFloat16 load intrinsics
>>> vld{q}_bf16 as part of the BFloat16 extension.
>>> (https://developer.arm.com/architectures/instruction-sets/simd-isas/neon/intrinsics)
>>>
>>> The intrinsics are declared in arm_neon.h.
>>> A new test is added to check assembler output.
>>>
>>> This patch depends on the Arm back-end patch.
>>> (https://gcc.gnu.org/ml/gcc-patches/2019-12/msg01448.html)
>>>
>>> Tested for regression on arm-none-eabi and armeb-none-eabi. I don't
>>> have commit rights, so if this is ok can someone please commit it for
>>> me?
>>>
>>> gcc/ChangeLog:
>>>
>>> 2019-11-14  Delia Burduv 
>>>
>>>  * config/arm/arm_neon.h (bfloat16_t): New typedef.
>>>  (bfloat16x4x2_t): New typedef.
>>>  (bfloat16x8x2_t): New typedef.
>>>  (bfloat16x4x3_t): New typedef.
>>>  (bfloat16x8x3_t): New typedef.
>>>  (bfloat16x4x4_t): New typedef.
>>>  (bfloat16x8x4_t): New typedef.
>>>  (vld2_bf16): New.
>>>  (vld2q_bf16): New.
>>>  (vld3_bf16): New.
>>>  (vld3q_bf16): New.
>>>  (vld4_bf16): New.
>>>  (vld4q_bf16): New.
>>>  (vld2_dup_bf16): New.
>>>  (vld2q_dup_bf16): New.
>>>   (vld3_dup_bf16): New.
>>>  (vld3q_dup_bf16): New.
>>>  (vld4_dup_bf16): New.
>>>  (vld4q_dup_bf16): New.
>>>  * config/arm/arm-builtins.c (E_V2BFmode): New mode.
>>>  (VAR13): New.
>>>  (arm_simd_types[Bfloat16x2_t]): New type.
>>>  * config/arm/arm-modes.def (V2BF): New mode.
>>>  * config/arm/arm-simd-builtin-types.def
>>>  (Bfloat16x2_t): New entry.
>>>  * config/arm/arm_neon_builtins.def
>>>  (vld2): Changed to VAR13 and added v4bf, v8bf
>>>  (vld2_dup): Changed to VAR8 and added v4bf, v8bf
>>>  (vld3): Changed to VAR13 and added v4bf, v8bf
>>>  (vld3_dup): Changed to VAR8 and added v4bf, v8bf
>>>  (vld4): Changed to VAR13 and added v4bf, v8bf
>>>  (vld4_dup): Changed to VAR8 and added v4bf, v8bf
>>>  * config/arm/iterators.md (VDXBF): New iterator.
>>>  (VQ2BF): New iterator.
>>>  (V_elem): Added V4BF, V8BF.
>>>  (V_sz_elem): Added V4BF, V8BF.
>>>  (V_mode_nunits): Added V4BF, V8BF.
>>>  (q): Added V4BF, V8BF.
>>>  * config/arm/neon.md (vld2): Used new iterators.
>>>  (vld2_dup): Used new iterators.
>>>  (vld2_dupv8bf): New.
>>>  (vst3): Used new iterators.
>>>  (vst3qa): Used new iterators.
>>>  (vst3qb): Used new iterators.
>>>  (vld3_dup): Used new iterators.
>>>    

Re: ACLE intrinsics: BFloat16 load intrinsics for AArch32

2020-03-04 Thread Delia Burduv

Hi,

The previous version of this patch shared part of its code with the 
store intrinsics patch 
(https://gcc.gnu.org/ml/gcc-patches/2020-03/msg00145.html) so I removed 
any duplicated code. This patch now depends on the previously mentioned 
store intrinsics patch.


Here is the latest version and the updated ChangeLog.

gcc/ChangeLog:

2019-03-04  Delia Burduv  

* config/arm/arm_neon.h (bfloat16_t): New typedef.
(vld2_bf16): New.
(vld2q_bf16): New.
(vld3_bf16): New.
(vld3q_bf16): New.
(vld4_bf16): New.
(vld4q_bf16): New.
(vld2_dup_bf16): New.
(vld2q_dup_bf16): New.
(vld3_dup_bf16): New.
(vld3q_dup_bf16): New.
(vld4_dup_bf16): New.
(vld4q_dup_bf16): New.
* config/arm/arm_neon_builtins.def
(vld2): Changed to VAR13 and added v4bf, v8bf
(vld2_dup): Changed to VAR8 and added v4bf, v8bf
(vld3): Changed to VAR13 and added v4bf, v8bf
(vld3_dup): Changed to VAR8 and added v4bf, v8bf
(vld4): Changed to VAR13 and added v4bf, v8bf
(vld4_dup): Changed to VAR8 and added v4bf, v8bf
* config/arm/iterators.md (VDXBF): New iterator.
(VQ2BF): New iterator.
* config/arm/neon.md (vld2): Used new iterators.
(vld2_dup): Used new iterators.
(vld2_dupv8bf): New.
(vst3): Used new iterators.
(vst3qa): Used new iterators.
(vst3qb): Used new iterators.
(vld3_dup): Used new iterators.
(vld3_dupv8bf): New.
(vst4): Used new iterators.
(vst4qa): Used new iterators.
(vst4qb): Used new iterators.
(vld4_dup): Used new iterators.
(vld4_dupv8bf): New.


gcc/testsuite/ChangeLog:

2019-03-04  Delia Burduv  

* gcc.target/arm/simd/bf16_vldn_1.c: New test.

Thanks,
Delia

On 2/19/20 5:25 PM, Delia Burduv wrote:


Hi,

Here is the latest version of the patch. It just has some minor 
formatting changes that were brought up by Richard Sandiford in the 
AArch64 patches


Thanks,
Delia

On 1/22/20 5:31 PM, Delia Burduv wrote:

Ping.

I will change the tests to use the exact input and output registers as 
Richard Sandiford suggested for the AArch64 patches.


On 12/20/19 6:48 PM, Delia Burduv wrote:
This patch adds the ARMv8.6 ACLE BFloat16 load intrinsics 
vld{q}_bf16 as part of the BFloat16 extension.
(https://developer.arm.com/architectures/instruction-sets/simd-isas/neon/intrinsics) 


The intrinsics are declared in arm_neon.h.
A new test is added to check assembler output.

This patch depends on the Arm back-end patch.
(https://gcc.gnu.org/ml/gcc-patches/2019-12/msg01448.html)


Tested for regression on arm-none-eabi and armeb-none-eabi. I don't 
have commit rights, so if this is ok can someone please commit it for 
me?


gcc/ChangeLog:

2019-11-14  Delia Burduv  

 * config/arm/arm_neon.h (bfloat16_t): New typedef.
 (bfloat16x4x2_t): New typedef.
 (bfloat16x8x2_t): New typedef.
 (bfloat16x4x3_t): New typedef.
 (bfloat16x8x3_t): New typedef.
 (bfloat16x4x4_t): New typedef.
 (bfloat16x8x4_t): New typedef.
 (vld2_bf16): New.
 (vld2q_bf16): New.
 (vld3_bf16): New.
 (vld3q_bf16): New.
 (vld4_bf16): New.
 (vld4q_bf16): New.
 (vld2_dup_bf16): New.
 (vld2q_dup_bf16): New.
  (vld3_dup_bf16): New.
 (vld3q_dup_bf16): New.
 (vld4_dup_bf16): New.
 (vld4q_dup_bf16): New.
 * config/arm/arm-builtins.c (E_V2BFmode): New mode.
 (VAR13): New.
 (arm_simd_types[Bfloat16x2_t]): New type.
 * config/arm/arm-modes.def (V2BF): New mode.
 * config/arm/arm-simd-builtin-types.def
 (Bfloat16x2_t): New entry.
 * config/arm/arm_neon_builtins.def
 (vld2): Changed to VAR13 and added v4bf, v8bf
 (vld2_dup): Changed to VAR8 and added v4bf, v8bf
 (vld3): Changed to VAR13 and added v4bf, v8bf
 (vld3_dup): Changed to VAR8 and added v4bf, v8bf
 (vld4): Changed to VAR13 and added v4bf, v8bf
 (vld4_dup): Changed to VAR8 and added v4bf, v8bf
 * config/arm/iterators.md (VDXBF): New iterator.
 (VQ2BF): New iterator.
 (V_elem): Added V4BF, V8BF.
 (V_sz_elem): Added V4BF, V8BF.
 (V_mode_nunits): Added V4BF, V8BF.
 (q): Added V4BF, V8BF.
 * config/arm/neon.md (vld2): Used new iterators.
 (vld2_dup): Used new iterators.
 (vld2_dupv8bf): New.
 (vst3): Used new iterators.
 (vst3qa): Used new iterators.
 (vst3qb): Used new iterators.
 (vld3_dup): Used new iterators.
 (vld3_dupv8bf): New.
 (vst4): Used new iterators.
 (vst4qa): Used new iterators.
 (vst4qb): Used new iterators.
 (vld4_dup): Used new iterators.
 (vld4_dupv8bf): New.


gcc/testsuite/ChangeLog:

2019-11-14  Delia Burduv  

 * gcc.target/arm/simd/bf16_vldn_1.c: New test.
diff --git a/gcc/conf

Re: ACLE intrinsics: BFloat16 load intrinsics for AArch32

2020-02-19 Thread Delia Burduv


Hi,

Here is the latest version of the patch. It just has some minor 
formatting changes that were brought up by Richard Sandiford in the 
AArch64 patches


Thanks,
Delia

On 1/22/20 5:31 PM, Delia Burduv wrote:

Ping.

I will change the tests to use the exact input and output registers as 
Richard Sandiford suggested for the AArch64 patches.


On 12/20/19 6:48 PM, Delia Burduv wrote:
This patch adds the ARMv8.6 ACLE BFloat16 load intrinsics 
vld{q}_bf16 as part of the BFloat16 extension.
(https://developer.arm.com/architectures/instruction-sets/simd-isas/neon/intrinsics) 


The intrinsics are declared in arm_neon.h.
A new test is added to check assembler output.
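
For readers unfamiliar with the vldN family, the following is a rough host-side sketch of what the vld2 and vld2_dup forms do to memory, under the usual NEON semantics (vld2 de-interleaves, vld2_dup broadcasts). It is not the arm_neon.h code from this patch: bf16_bits, bf16x4_sketch, bf16x4x2_sketch, sketch_vld2, sketch_vld2_dup and sketch_selfcheck are hypothetical stand-ins, and bfloat16 values are held as raw 16-bit patterns so the sketch builds without a BFloat16-capable target.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: raw 16-bit storage stands in for bfloat16_t (__bf16).  */
typedef uint16_t bf16_bits;

/* Hypothetical stand-ins for bfloat16x4_t and bfloat16x4x2_t.  */
typedef struct { bf16_bits lane[4]; } bf16x4_sketch;
typedef struct { bf16x4_sketch val[2]; } bf16x4x2_sketch;

/* Models vld2_bf16: read 8 consecutive elements and de-interleave them,
   so val[0] holds elements 0,2,4,6 and val[1] holds elements 1,3,5,7.  */
static bf16x4x2_sketch
sketch_vld2 (const bf16_bits *ptr)
{
  bf16x4x2_sketch r;
  for (int i = 0; i < 4; i++)
    {
      r.val[0].lane[i] = ptr[2 * i];
      r.val[1].lane[i] = ptr[2 * i + 1];
    }
  return r;
}

/* Models vld2_dup_bf16: read 2 elements and broadcast each one to every
   lane of its own vector.  */
static bf16x4x2_sketch
sketch_vld2_dup (const bf16_bits *ptr)
{
  bf16x4x2_sketch r;
  for (int i = 0; i < 4; i++)
    {
      r.val[0].lane[i] = ptr[0];
      r.val[1].lane[i] = ptr[1];
    }
  return r;
}

/* Checks both models against a buffer whose values equal their indices.  */
static void
sketch_selfcheck (void)
{
  const bf16_bits mem[8] = { 0, 1, 2, 3, 4, 5, 6, 7 };
  bf16x4x2_sketch a = sketch_vld2 (mem);
  bf16x4x2_sketch b = sketch_vld2_dup (mem);
  for (int i = 0; i < 4; i++)
    {
      assert (a.val[0].lane[i] == 2 * i);
      assert (a.val[1].lane[i] == 2 * i + 1);
      assert (b.val[0].lane[i] == 0);
      assert (b.val[1].lane[i] == 1);
    }
}
```

The q variants would follow the same pattern with 8 lanes per vector, and the vld3/vld4 forms with 3 or 4 result vectors. With the real intrinsics the pointer argument would be a const bfloat16_t * rather than the integer stand-in used here.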

This patch depends on the Arm back-end patch.
(https://gcc.gnu.org/ml/gcc-patches/2019-12/msg01448.html)


Tested for regression on arm-none-eabi and armeb-none-eabi. I don't 
have commit rights, so if this is ok can someone please commit it for me?


gcc/ChangeLog:

2019-11-14  Delia Burduv  

 * config/arm/arm_neon.h (bfloat16_t): New typedef.
 (bfloat16x4x2_t): New typedef.
 (bfloat16x8x2_t): New typedef.
 (bfloat16x4x3_t): New typedef.
 (bfloat16x8x3_t): New typedef.
 (bfloat16x4x4_t): New typedef.
 (bfloat16x8x4_t): New typedef.
 (vld2_bf16): New.
 (vld2q_bf16): New.
 (vld3_bf16): New.
 (vld3q_bf16): New.
 (vld4_bf16): New.
 (vld4q_bf16): New.
 (vld2_dup_bf16): New.
 (vld2q_dup_bf16): New.
  (vld3_dup_bf16): New.
 (vld3q_dup_bf16): New.
 (vld4_dup_bf16): New.
 (vld4q_dup_bf16): New.
 * config/arm/arm-builtins.c (E_V2BFmode): New mode.
 (VAR13): New.
 (arm_simd_types[Bfloat16x2_t]): New type.
 * config/arm/arm-modes.def (V2BF): New mode.
 * config/arm/arm-simd-builtin-types.def
 (Bfloat16x2_t): New entry.
 * config/arm/arm_neon_builtins.def
 (vld2): Changed to VAR13 and added v4bf, v8bf
 (vld2_dup): Changed to VAR8 and added v4bf, v8bf
 (vld3): Changed to VAR13 and added v4bf, v8bf
 (vld3_dup): Changed to VAR8 and added v4bf, v8bf
 (vld4): Changed to VAR13 and added v4bf, v8bf
 (vld4_dup): Changed to VAR8 and added v4bf, v8bf
 * config/arm/iterators.md (VDXBF): New iterator.
 (VQ2BF): New iterator.
 (V_elem): Added V4BF, V8BF.
 (V_sz_elem): Added V4BF, V8BF.
 (V_mode_nunits): Added V4BF, V8BF.
 (q): Added V4BF, V8BF.
 * config/arm/neon.md (vld2): Used new iterators.
 (vld2_dup): Used new iterators.
 (vld2_dupv8bf): New.
 (vst3): Used new iterators.
 (vst3qa): Used new iterators.
 (vst3qb): Used new iterators.
 (vld3_dup): Used new iterators.
 (vld3_dupv8bf): New.
 (vst4): Used new iterators.
 (vst4qa): Used new iterators.
 (vst4qb): Used new iterators.
 (vld4_dup): Used new iterators.
 (vld4_dupv8bf): New.


gcc/testsuite/ChangeLog:

2019-11-14  Delia Burduv  

 * gcc.target/arm/simd/bf16_vldn_1.c: New test.
diff --git a/gcc/config/arm/arm-builtins.c b/gcc/config/arm/arm-builtins.c
index 7f279cca6688c6f11948159666ee647ae533c61d..44c6f46fd63d5eaa1c3c84340d9acd017bb663e4 100644
--- a/gcc/config/arm/arm-builtins.c
+++ b/gcc/config/arm/arm-builtins.c
@@ -318,6 +318,7 @@ arm_set_sat_qualifiers[SIMD_MAX_BUILTIN_ARGS]
 #define v4bf_UP  E_V4BFmode
 #define v2si_UP  E_V2SImode
 #define v2sf_UP  E_V2SFmode
+#define v2bf_UP  E_V2BFmode
 #define di_UP    E_DImode
 #define v16qi_UP E_V16QImode
 #define v8hi_UP  E_V8HImode
@@ -381,6 +382,9 @@ typedef struct {
 #define VAR12(T, N, A, B, C, D, E, F, G, H, I, J, K, L) \
   VAR11 (T, N, A, B, C, D, E, F, G, H, I, J, K) \
   VAR1 (T, N, L)
+#define VAR13(T, N, A, B, C, D, E, F, G, H, I, J, K, L, M) \
+  VAR12 (T, N, A, B, C, D, E, F, G, H, I, J, K, L) \
+  VAR1 (T, N, M)
 
 /* The builtin data can be found in arm_neon_builtins.def, arm_vfp_builtins.def
    and arm_acle_builtins.def.  The entries in arm_neon_builtins.def require
@@ -1013,6 +1017,7 @@ arm_init_simd_builtin_types (void)
   arm_simd_types[Float32x4_t].eltype = float_type_node;
 
   /* Init Bfloat vector types with underlying __bf16 scalar type.  */
+  arm_simd_types[Bfloat16x2_t].eltype = arm_bf16_type_node;
   arm_simd_types[Bfloat16x4_t].eltype = arm_bf16_type_node;
   arm_simd_types[Bfloat16x8_t].eltype = arm_bf16_type_node;
 
diff --git a/gcc/config/arm/arm-modes.def b/gcc/config/arm/arm-modes.def
index ea92ef35723f979c8bb1f6bfb4fbeb6cd1e4b6e9..6e48223b63d98fcbe38960700dd0949d74629f7f 100644
--- a/gcc/config/arm/arm-modes.def
+++ b/gcc/config/arm/arm-modes.def
@@ -80,6 +80,7 @@ VECTOR_MODE (FLOAT, HF, 2);   /* V2HF */
 
 FLOAT_MODE (BF, 2, 0);
 ADJUST_FLOAT_FORMAT (BF, &arm_bfloat_half_format);
+VECTOR_MODE (FLOAT, BF, 2);   /* V2BF.  */
 VECTOR_MODE (FLOAT, BF, 4);   /*		 V4BF.  */
 VECTOR_MODE (FLOAT, BF, 8);   /*		 V8BF.  */
 
diff --git a/gcc/config
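
The VAR13 addition in the arm-builtins.c hunk above extends a chain in which each VARn expands to VAR(n-1) plus one more VAR1. Below is a small standalone sketch of that pattern. Only the VAR12/VAR13 shape is taken from the diff. The VAR1 body (which here just builds a string), the names vld2_entries, count_entries and sketch_check_var13, and the exact mode list are assumptions for illustration. In GCC itself VAR1 is defined differently depending on what is being generated.

```c
#include <assert.h>
#include <string.h>

/* Assumption: VAR1 emits "name_mode " as a string literal so the
   expansion can be inspected at run time.  T is unused in this sketch.  */
#define VAR1(T, N, X) #N "_" #X " "
#define VAR2(T, N, A, B) VAR1 (T, N, A) VAR1 (T, N, B)
#define VAR3(T, N, A, B, C) VAR2 (T, N, A, B) VAR1 (T, N, C)
#define VAR4(T, N, A, B, C, D) VAR3 (T, N, A, B, C) VAR1 (T, N, D)
#define VAR5(T, N, A, B, C, D, E) VAR4 (T, N, A, B, C, D) VAR1 (T, N, E)
#define VAR6(T, N, A, B, C, D, E, F) \
  VAR5 (T, N, A, B, C, D, E) VAR1 (T, N, F)
#define VAR7(T, N, A, B, C, D, E, F, G) \
  VAR6 (T, N, A, B, C, D, E, F) VAR1 (T, N, G)
#define VAR8(T, N, A, B, C, D, E, F, G, H) \
  VAR7 (T, N, A, B, C, D, E, F, G) VAR1 (T, N, H)
#define VAR9(T, N, A, B, C, D, E, F, G, H, I) \
  VAR8 (T, N, A, B, C, D, E, F, G, H) VAR1 (T, N, I)
#define VAR10(T, N, A, B, C, D, E, F, G, H, I, J) \
  VAR9 (T, N, A, B, C, D, E, F, G, H, I) VAR1 (T, N, J)
#define VAR11(T, N, A, B, C, D, E, F, G, H, I, J, K) \
  VAR10 (T, N, A, B, C, D, E, F, G, H, I, J) VAR1 (T, N, K)
#define VAR12(T, N, A, B, C, D, E, F, G, H, I, J, K, L) \
  VAR11 (T, N, A, B, C, D, E, F, G, H, I, J, K) VAR1 (T, N, L)
#define VAR13(T, N, A, B, C, D, E, F, G, H, I, J, K, L, M) \
  VAR12 (T, N, A, B, C, D, E, F, G, H, I, J, K, L) VAR1 (T, N, M)

/* Illustrative mode list: eleven existing modes plus v4bf and v8bf.  */
static const char vld2_entries[] =
  VAR13 (LOAD1, vld2, v8qi, v4hi, v4hf, v2si, v2sf, di,
	 v16qi, v8hi, v8hf, v4si, v4sf, v4bf, v8bf);

/* Counts entries by counting the space that terminates each one.  */
static int
count_entries (const char *s)
{
  int n = 0;
  for (; *s; s++)
    if (*s == ' ')
      n++;
  return n;
}

/* Checks that the chain produced one entry per mode.  */
static void
sketch_check_var13 (void)
{
  assert (count_entries (vld2_entries) == 13);
  assert (strstr (vld2_entries, "vld2_v4bf ") != NULL);
  assert (strstr (vld2_entries, "vld2_v8bf ") != NULL);
}
```

This is why adding the two BFloat16 modes to a builtin that previously used VAR11 needs a VAR13 definition: the macro name has to match the number of modes passed.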

Re: ACLE intrinsics: BFloat16 load intrinsics for AArch32

2020-01-28 Thread Delia Burduv
Ping.

From: Delia Burduv 
Sent: 22 January 2020 17:31
To: gcc-patches@gcc.gnu.org 
Cc: ni...@redhat.com; Richard Earnshaw; Kyrylo Tkachov; Ramana Radhakrishnan
Subject: Re: ACLE intrinsics: BFloat16 load intrinsics for AArch32

Ping.

I will change the tests to use the exact input and output registers as
Richard Sandiford suggested for the AArch64 patches.

On 12/20/19 6:48 PM, Delia Burduv wrote:
> This patch adds the ARMv8.6 ACLE BFloat16 load intrinsics vld{q}_bf16
> as part of the BFloat16 extension.
> (https://developer.arm.com/architectures/instruction-sets/simd-isas/neon/intrinsics)
>
> The intrinsics are declared in arm_neon.h.
> A new test is added to check assembler output.
>
> This patch depends on the Arm back-end patch.
> (https://gcc.gnu.org/ml/gcc-patches/2019-12/msg01448.html)
>
> Tested for regression on arm-none-eabi and armeb-none-eabi. I don't have
> commit rights, so if this is ok can someone please commit it for me?
>
> gcc/ChangeLog:
>
> 2019-11-14  Delia Burduv  
>
>  * config/arm/arm_neon.h (bfloat16_t): New typedef.
>  (bfloat16x4x2_t): New typedef.
>  (bfloat16x8x2_t): New typedef.
>  (bfloat16x4x3_t): New typedef.
>  (bfloat16x8x3_t): New typedef.
>  (bfloat16x4x4_t): New typedef.
>  (bfloat16x8x4_t): New typedef.
>  (vld2_bf16): New.
>  (vld2q_bf16): New.
>  (vld3_bf16): New.
>  (vld3q_bf16): New.
>  (vld4_bf16): New.
>  (vld4q_bf16): New.
>  (vld2_dup_bf16): New.
>  (vld2q_dup_bf16): New.
>   (vld3_dup_bf16): New.
>  (vld3q_dup_bf16): New.
>  (vld4_dup_bf16): New.
>  (vld4q_dup_bf16): New.
>  * config/arm/arm-builtins.c (E_V2BFmode): New mode.
>  (VAR13): New.
>  (arm_simd_types[Bfloat16x2_t]): New type.
>  * config/arm/arm-modes.def (V2BF): New mode.
>  * config/arm/arm-simd-builtin-types.def
>  (Bfloat16x2_t): New entry.
>  * config/arm/arm_neon_builtins.def
>  (vld2): Changed to VAR13 and added v4bf, v8bf
>  (vld2_dup): Changed to VAR8 and added v4bf, v8bf
>  (vld3): Changed to VAR13 and added v4bf, v8bf
>  (vld3_dup): Changed to VAR8 and added v4bf, v8bf
>  (vld4): Changed to VAR13 and added v4bf, v8bf
>  (vld4_dup): Changed to VAR8 and added v4bf, v8bf
>  * config/arm/iterators.md (VDXBF): New iterator.
>  (VQ2BF): New iterator.
>  (V_elem): Added V4BF, V8BF.
>  (V_sz_elem): Added V4BF, V8BF.
>  (V_mode_nunits): Added V4BF, V8BF.
>  (q): Added V4BF, V8BF.
>  * config/arm/neon.md (vld2): Used new iterators.
>  (vld2_dup): Used new iterators.
>  (vld2_dupv8bf): New.
>  (vst3): Used new iterators.
>  (vst3qa): Used new iterators.
>  (vst3qb): Used new iterators.
>  (vld3_dup): Used new iterators.
>  (vld3_dupv8bf): New.
>  (vst4): Used new iterators.
>  (vst4qa): Used new iterators.
>  (vst4qb): Used new iterators.
>  (vld4_dup): Used new iterators.
>  (vld4_dupv8bf): New.
>
>
> gcc/testsuite/ChangeLog:
>
> 2019-11-14  Delia Burduv  
>
>  * gcc.target/arm/simd/bf16_vldn_1.c: New test.


Re: ACLE intrinsics: BFloat16 load intrinsics for AArch32

2020-01-22 Thread Delia Burduv
Ping.

I will change the tests to use the exact input and output registers as 
Richard Sandiford suggested for the AArch64 patches.

On 12/20/19 6:48 PM, Delia Burduv wrote:
> This patch adds the ARMv8.6 ACLE BFloat16 load intrinsics vld{q}_bf16 
> as part of the BFloat16 extension.
> (https://developer.arm.com/architectures/instruction-sets/simd-isas/neon/intrinsics)
>  
> 
> The intrinsics are declared in arm_neon.h.
> A new test is added to check assembler output.
>
> This patch depends on the Arm back-end patch.
> (https://gcc.gnu.org/ml/gcc-patches/2019-12/msg01448.html)
> 
> Tested for regression on arm-none-eabi and armeb-none-eabi. I don't have 
> commit rights, so if this is ok can someone please commit it for me?
> 
> gcc/ChangeLog:
> 
> 2019-11-14  Delia Burduv  
> 
>  * config/arm/arm_neon.h (bfloat16_t): New typedef.
>      (bfloat16x4x2_t): New typedef.
>      (bfloat16x8x2_t): New typedef.
>      (bfloat16x4x3_t): New typedef.
>      (bfloat16x8x3_t): New typedef.
>      (bfloat16x4x4_t): New typedef.
>      (bfloat16x8x4_t): New typedef.
>      (vld2_bf16): New.
>  (vld2q_bf16): New.
>  (vld3_bf16): New.
>  (vld3q_bf16): New.
>  (vld4_bf16): New.
>  (vld4q_bf16): New.
>  (vld2_dup_bf16): New.
>  (vld2q_dup_bf16): New.
>   (vld3_dup_bf16): New.
>  (vld3q_dup_bf16): New.
>  (vld4_dup_bf16): New.
>  (vld4q_dup_bf16): New.
>      * config/arm/arm-builtins.c (E_V2BFmode): New mode.
>      (VAR13): New.
>      (arm_simd_types[Bfloat16x2_t]): New type.
>      * config/arm/arm-modes.def (V2BF): New mode.
>      * config/arm/arm-simd-builtin-types.def
>      (Bfloat16x2_t): New entry.
>      * config/arm/arm_neon_builtins.def
>      (vld2): Changed to VAR13 and added v4bf, v8bf
>      (vld2_dup): Changed to VAR8 and added v4bf, v8bf
>      (vld3): Changed to VAR13 and added v4bf, v8bf
>      (vld3_dup): Changed to VAR8 and added v4bf, v8bf
>      (vld4): Changed to VAR13 and added v4bf, v8bf
>      (vld4_dup): Changed to VAR8 and added v4bf, v8bf
>      * config/arm/iterators.md (VDXBF): New iterator.
>      (VQ2BF): New iterator.
>      (V_elem): Added V4BF, V8BF.
>      (V_sz_elem): Added V4BF, V8BF.
>      (V_mode_nunits): Added V4BF, V8BF.
>      (q): Added V4BF, V8BF.
>      * config/arm/neon.md (vld2): Used new iterators.
>      (vld2_dup): Used new iterators.
>      (vld2_dupv8bf): New.
>      (vst3): Used new iterators.
>      (vst3qa): Used new iterators.
>      (vst3qb): Used new iterators.
>      (vld3_dup): Used new iterators.
>      (vld3_dupv8bf): New.
>      (vst4): Used new iterators.
>      (vst4qa): Used new iterators.
>      (vst4qb): Used new iterators.
>      (vld4_dup): Used new iterators.
>      (vld4_dupv8bf): New.
> 
> 
> gcc/testsuite/ChangeLog:
> 
> 2019-11-14  Delia Burduv  
> 
>  * gcc.target/arm/simd/bf16_vldn_1.c: New test.