Re: [AArch64] Fix vqtb[lx][234] on big-endian

2015-11-07 Thread James Greenhalgh
On Fri, Nov 06, 2015 at 09:37:17PM +0100, Christophe Lyon wrote:
> On 6 November 2015 at 18:03, James Greenhalgh  
> wrote:
> > On Fri, Nov 06, 2015 at 02:49:38PM +0100, Christophe Lyon wrote:
> >> Hi,
> >>
> >> As mentioned by James a few weeks ago, the vqtbl[lx][234] intrinsics
> >> are failing on aarch64_be.
> >>
> >> The attached patch fixes them, and rewrites them using new builtins
> >> instead of inline assembly.
> >>
> >> I wondered about the names of the new builtins, I hope I got them
> >> right: qtbl3, qtbl4, qtbx3, qtbx4 with v8qi and v16qi modes.
> >>
> >> I have modified the existing aarch64_tbl3v8qi and aarch64_tbx4v8qi to
> >> use  and share the code with the v16qi variants.
> >>
> >> In arm_neon.h, I moved the rewritten intrinsics to the bottom of the
> >> file, in alphabetical order, although the comment says "Start of
> >> optimal implementations in approved order": the previous ones really
> >> seem to be in alphabetical order.
> >>
> >> And I added a new testcase, skipped for arm* targets.
> >>
> >> This has been tested on aarch64-none-elf and aarch64_be-none-elf
> >> targets, using the Foundation model.
> >>
> >> OK?
> >
> > Hi Christophe,
> >
> > Thanks for this. With this patch I think we can finally say that
> > aarch64_be Neon intrinsics are in as good a state as aarch64 Neon
> > intrinsics. On our internal testsuite the pass rate is now equivalent
> > between the two. I'm very grateful for your work in this area!
> 
> Thanks for the quick review, committed as r229886.
> 
> We are still missing many tests for most of the armv8 intrinsics.
> A significant effort, apparently not worth it since you say your
> internal testsuite is now clean.

The internal testsuiite is of no use to the rest community and is
unlikely to be feasible to submit upstream, so I wouldn't write off
extending the (excellent) set of GCC tests you've been adding so far
as "not worth it". Certainly they were a big help for the big-endian
work.

> Actually, you say the pass rate is equivalent on little and
> big-endian: does it mean that it not 100%?

Yes, I picked my words carefully :-)

The remaining failures are missing intrinsics and conformance issues when
the intrinsics are combined and folded. For an idea of what is missing,
take a look at the LLVM test-suite I pointed you at a few weeks ago:


/SingleSource/UnitTests/Vector/AArch64/aarch64_neon_intrinsics.c

I'll try to get some of the "folding" examples in to the upstream
bugzilla - generally they are issues where the semantics of the
intrinsic are well defined for signed overflow, but our use of C
constructs means the midend considers signed overflow undefined, and
performs more aggressive optimisation.

Thanks,
James

> >
> > This patch is OK for trunk.
> >
> > Thanks again,
> > James
> >
> >>
> >> Christophe.
> >
> >> 2015-11-06  Christophe Lyon  
> >>
> >>   gcc/testsuite/
> >>   * gcc.target/aarch64/advsimd-intrinsics/vqtbX.c: New test.
> >>
> >>   gcc/
> >>   * config/aarch64/aarch64-simd-builtins.def: Update builtins
> >>   tables: add tbl3v16qi, qtbl[34]*, tbx4v16qi, qtbx[34]*.
> >>   * config/aarch64/aarch64-simd.md (aarch64_tbl3v8qi): Rename to...
> >>   (aarch64_tbl3) ... this, which supports v16qi too.
> >>   (aarch64_tbx4v8qi): Rename to...
> >>   aarch64_tbx4): ... this.
> >>   (aarch64_qtbl3): New pattern.
> >>   (aarch64_qtbx3): New pattern.
> >>   (aarch64_qtbl4): New pattern.
> >>   (aarch64_qtbx4): New pattern.
> >>   * config/aarch64/arm_neon.h (vqtbl2_s8, vqtbl2_u8, vqtbl2_p8)
> >>   (vqtbl2q_s8, vqtbl2q_u8, vqtbl2q_p8, vqtbl3_s8, vqtbl3_u8)
> >>   (vqtbl3_p8, vqtbl3q_s8, vqtbl3q_u8, vqtbl3q_p8, vqtbl4_s8)
> >>   (vqtbl4_u8, vqtbl4_p8, vqtbl4q_s8, vqtbl4q_u8, vqtbl4q_p8)
> >>   (vqtbx2_s8, vqtbx2_u8, vqtbx2_p8, vqtbx2q_s8, vqtbx2q_u8)
> >>   (vqtbx2q_p8, vqtbx3_s8, vqtbx3_u8, vqtbx3_p8, vqtbx3q_s8)
> >>   (vqtbx3q_u8, vqtbx3q_p8, vqtbx4_s8, vqtbx4_u8, vqtbx4_p8)
> >>   (vqtbx4q_s8, vqtbx4q_u8, vqtbx4q_p8): Rewrite using builtin
> >>   functions.
> >
> 


Re: [AArch64] Fix vqtb[lx][234] on big-endian

2015-11-06 Thread Christophe Lyon
On 6 November 2015 at 18:03, James Greenhalgh  wrote:
> On Fri, Nov 06, 2015 at 02:49:38PM +0100, Christophe Lyon wrote:
>> Hi,
>>
>> As mentioned by James a few weeks ago, the vqtbl[lx][234] intrinsics
>> are failing on aarch64_be.
>>
>> The attached patch fixes them, and rewrites them using new builtins
>> instead of inline assembly.
>>
>> I wondered about the names of the new builtins, I hope I got them
>> right: qtbl3, qtbl4, qtbx3, qtbx4 with v8qi and v16qi modes.
>>
>> I have modified the existing aarch64_tbl3v8qi and aarch64_tbx4v8qi to
>> use  and share the code with the v16qi variants.
>>
>> In arm_neon.h, I moved the rewritten intrinsics to the bottom of the
>> file, in alphabetical order, although the comment says "Start of
>> optimal implementations in approved order": the previous ones really
>> seem to be in alphabetical order.
>>
>> And I added a new testcase, skipped for arm* targets.
>>
>> This has been tested on aarch64-none-elf and aarch64_be-none-elf
>> targets, using the Foundation model.
>>
>> OK?
>
> Hi Christophe,
>
> Thanks for this. With this patch I think we can finally say that
> aarch64_be Neon intrinsics are in as good a state as aarch64 Neon
> intrinsics. On our internal testsuite the pass rate is now equivalent
> between the two. I'm very grateful for your work in this area!

Thanks for the quick review, committed as r229886.

We are still missing many tests for most of the armv8 intrinsics.
A significant effort, apparently not worth it since you say your
internal testsuite is now clean.

Actually, you say the pass rate is equivalent on little and
big-endian: does it mean that it not 100%?


>
> This patch is OK for trunk.
>
> Thanks again,
> James
>
>>
>> Christophe.
>
>> 2015-11-06  Christophe Lyon  
>>
>>   gcc/testsuite/
>>   * gcc.target/aarch64/advsimd-intrinsics/vqtbX.c: New test.
>>
>>   gcc/
>>   * config/aarch64/aarch64-simd-builtins.def: Update builtins
>>   tables: add tbl3v16qi, qtbl[34]*, tbx4v16qi, qtbx[34]*.
>>   * config/aarch64/aarch64-simd.md (aarch64_tbl3v8qi): Rename to...
>>   (aarch64_tbl3) ... this, which supports v16qi too.
>>   (aarch64_tbx4v8qi): Rename to...
>>   aarch64_tbx4): ... this.
>>   (aarch64_qtbl3): New pattern.
>>   (aarch64_qtbx3): New pattern.
>>   (aarch64_qtbl4): New pattern.
>>   (aarch64_qtbx4): New pattern.
>>   * config/aarch64/arm_neon.h (vqtbl2_s8, vqtbl2_u8, vqtbl2_p8)
>>   (vqtbl2q_s8, vqtbl2q_u8, vqtbl2q_p8, vqtbl3_s8, vqtbl3_u8)
>>   (vqtbl3_p8, vqtbl3q_s8, vqtbl3q_u8, vqtbl3q_p8, vqtbl4_s8)
>>   (vqtbl4_u8, vqtbl4_p8, vqtbl4q_s8, vqtbl4q_u8, vqtbl4q_p8)
>>   (vqtbx2_s8, vqtbx2_u8, vqtbx2_p8, vqtbx2q_s8, vqtbx2q_u8)
>>   (vqtbx2q_p8, vqtbx3_s8, vqtbx3_u8, vqtbx3_p8, vqtbx3q_s8)
>>   (vqtbx3q_u8, vqtbx3q_p8, vqtbx4_s8, vqtbx4_u8, vqtbx4_p8)
>>   (vqtbx4q_s8, vqtbx4q_u8, vqtbx4q_p8): Rewrite using builtin
>>   functions.
>


Re: [AArch64] Fix vqtb[lx][234] on big-endian

2015-11-06 Thread James Greenhalgh
On Fri, Nov 06, 2015 at 02:49:38PM +0100, Christophe Lyon wrote:
> Hi,
> 
> As mentioned by James a few weeks ago, the vqtbl[lx][234] intrinsics
> are failing on aarch64_be.
> 
> The attached patch fixes them, and rewrites them using new builtins
> instead of inline assembly.
> 
> I wondered about the names of the new builtins, I hope I got them
> right: qtbl3, qtbl4, qtbx3, qtbx4 with v8qi and v16qi modes.
> 
> I have modified the existing aarch64_tbl3v8qi and aarch64_tbx4v8qi to
> use  and share the code with the v16qi variants.
> 
> In arm_neon.h, I moved the rewritten intrinsics to the bottom of the
> file, in alphabetical order, although the comment says "Start of
> optimal implementations in approved order": the previous ones really
> seem to be in alphabetical order.
> 
> And I added a new testcase, skipped for arm* targets.
> 
> This has been tested on aarch64-none-elf and aarch64_be-none-elf
> targets, using the Foundation model.
> 
> OK?

Hi Christophe,

Thanks for this. With this patch I think we can finally say that
aarch64_be Neon intrinsics are in as good a state as aarch64 Neon
intrinsics. On our internal testsuite the pass rate is now equivalent
between the two. I'm very grateful for your work in this area!

This patch is OK for trunk.

Thanks again,
James

> 
> Christophe.

> 2015-11-06  Christophe Lyon  
> 
>   gcc/testsuite/
>   * gcc.target/aarch64/advsimd-intrinsics/vqtbX.c: New test.
> 
>   gcc/
>   * config/aarch64/aarch64-simd-builtins.def: Update builtins
>   tables: add tbl3v16qi, qtbl[34]*, tbx4v16qi, qtbx[34]*.
>   * config/aarch64/aarch64-simd.md (aarch64_tbl3v8qi): Rename to...
>   (aarch64_tbl3) ... this, which supports v16qi too.
>   (aarch64_tbx4v8qi): Rename to...
>   aarch64_tbx4): ... this.
>   (aarch64_qtbl3): New pattern.
>   (aarch64_qtbx3): New pattern.
>   (aarch64_qtbl4): New pattern.
>   (aarch64_qtbx4): New pattern.
>   * config/aarch64/arm_neon.h (vqtbl2_s8, vqtbl2_u8, vqtbl2_p8)
>   (vqtbl2q_s8, vqtbl2q_u8, vqtbl2q_p8, vqtbl3_s8, vqtbl3_u8)
>   (vqtbl3_p8, vqtbl3q_s8, vqtbl3q_u8, vqtbl3q_p8, vqtbl4_s8)
>   (vqtbl4_u8, vqtbl4_p8, vqtbl4q_s8, vqtbl4q_u8, vqtbl4q_p8)
>   (vqtbx2_s8, vqtbx2_u8, vqtbx2_p8, vqtbx2q_s8, vqtbx2q_u8)
>   (vqtbx2q_p8, vqtbx3_s8, vqtbx3_u8, vqtbx3_p8, vqtbx3q_s8)
>   (vqtbx3q_u8, vqtbx3q_p8, vqtbx4_s8, vqtbx4_u8, vqtbx4_p8)
>   (vqtbx4q_s8, vqtbx4q_u8, vqtbx4q_p8): Rewrite using builtin
>   functions.