Re: [PATCH] [AArch64] Implement popcount pattern

2017-02-03 Thread James Greenhalgh
On Fri, Feb 03, 2017 at 06:57:54AM +, Hurugalawadi, Naveen wrote:
> Hi Andrew,
> 
> Thanks for clearing the confusion.
> 
> > I don't understand this comment and how it relates to your updated patch
> 
> foo, foo1 and foo2 generates calls to "popcountdi2" which should have
> been "popcountsi2" for foo1. When Kyrill commented on using the
> popcountsi2; I was confused :).
> 
> Hence, the testcase generally checks for the absence of call to "popcount"
> and the presence of "cnt" instruction instead.
> 
> >> Now of course what should change still is the argument 
> >> types to foo1/foo2 
> 
> The arguments to foo1 and foo2 are modified as required.
> 
> Bootstrapped and regression tested on aarch64-linux-gnu with no regressions.
> 
> Please let us know if its okay for stage 1?

This looks OK to me now. The change is self contained, looks safe, and
has only been held up for the testcase issue.

But, please give Richard/Marcus a reasonable time to object (say, end of day
Tuesday) before committing.

Thanks,
James

> 
> Thanks,
> Naveen

> diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
> index a693a3b..684a833 100644
> --- a/gcc/config/aarch64/aarch64.md
> +++ b/gcc/config/aarch64/aarch64.md
> @@ -3778,6 +3778,39 @@
>}
>  )
>  
> +;; Pop count be done via the "CNT" instruction in AdvSIMD.
> +;;
> +;; MOV   v.1d, x0
> +;; CNT   v1.8b, v.8b
> +;; ADDV b2, v1.8b
> +;; MOV   w0, v2.b[0]
> +
> +(define_expand "popcount2"
> +  [(match_operand:GPI 0 "register_operand")
> +   (match_operand:GPI 1 "register_operand")]
> +  "TARGET_SIMD"
> +{
> +  rtx v = gen_reg_rtx (V8QImode);
> +  rtx v1 = gen_reg_rtx (V8QImode);
> +  rtx r = gen_reg_rtx (QImode);
> +  rtx in = operands[1];
> +  rtx out = operands[0];
> +  if(mode == SImode)
> +{
> +  rtx tmp;
> +  tmp = gen_reg_rtx (DImode);
> +  /* If we have SImode, zero extend to DImode, pop count does
> + not change if we have extra zeros. */
> +  emit_insn (gen_zero_extendsidi2 (tmp, in));
> +  in = tmp;
> +}
> +  emit_move_insn (v, gen_lowpart (V8QImode, in));
> +  emit_insn (gen_popcountv8qi2 (v1, v));
> +  emit_insn (gen_reduc_plus_scal_v8qi (r, v1));
> +  emit_insn (gen_zero_extendqi2 (out, r));
> +  DONE;
> +})
> +
>  (define_insn "clrsb2"
>[(set (match_operand:GPI 0 "register_operand" "=r")
>  (clrsb:GPI (match_operand:GPI 1 "register_operand" "r")))]
> diff --git a/gcc/testsuite/gcc.target/aarch64/popcnt.c 
> b/gcc/testsuite/gcc.target/aarch64/popcnt.c
> new file mode 100644
> index 000..7e95796
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/popcnt.c
> @@ -0,0 +1,23 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2" } */
> +
> +int
> +foo (int x)
> +{
> +  return __builtin_popcount (x);
> +}
> +
> +long
> +foo1 (long x)
> +{
> +  return __builtin_popcountl (x);
> +}
> +
> +long long
> +foo2 (long long x)
> +{
> +  return __builtin_popcountll (x);
> +}
> +
> +/* { dg-final { scan-assembler-not "popcount" } } */
> +/* { dg-final { scan-assembler-times "cnt\t" 3 } } */




Re: [PATCH] [AArch64] Implement popcount pattern

2017-02-03 Thread Kyrill Tkachov


On 03/02/17 06:57, Hurugalawadi, Naveen wrote:

Hi Andrew,

Thanks for clearing the confusion.


I don't understand this comment and how it relates to your updated patch

foo, foo1 and foo2 generates calls to "popcountdi2" which should have
been "popcountsi2" for foo1. When Kyrill commented on using the
popcountsi2; I was confused :).

Hence, the testcase generally checks for the absence of call to "popcount"
and the presence of "cnt" instruction instead.



Sorry for the confusion, I should have been more explicit that I was talking
about the optab.


Now of course what should change still is the argument
types to foo1/foo2

The arguments to foo1 and foo2 are modified as required.

Bootstrapped and regression tested on aarch64-linux-gnu with no regressions.

Please let us know if its okay for stage 1?


This looks good to me, but you'll need James (or another aarch64 maintainer) to 
approve.

Thanks,
Kyrill


Thanks,
Naveen




Re: [PATCH] [AArch64] Implement popcount pattern

2017-02-02 Thread Hurugalawadi, Naveen
Hi Andrew,

Thanks for clearing the confusion.

> I don't understand this comment and how it relates to your updated patch

foo, foo1 and foo2 generates calls to "popcountdi2" which should have
been "popcountsi2" for foo1. When Kyrill commented on using the
popcountsi2; I was confused :).

Hence, the testcase generally checks for the absence of call to "popcount"
and the presence of "cnt" instruction instead.

>> Now of course what should change still is the argument 
>> types to foo1/foo2 

The arguments to foo1 and foo2 are modified as required.

Bootstrapped and regression tested on aarch64-linux-gnu with no regressions.

Please let us know if its okay for stage 1?

Thanks,
Naveendiff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index a693a3b..684a833 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -3778,6 +3778,39 @@
   }
 )
 
+;; Pop count be done via the "CNT" instruction in AdvSIMD.
+;;
+;; MOV	v.1d, x0
+;; CNT	v1.8b, v.8b
+;; ADDV b2, v1.8b
+;; MOV	w0, v2.b[0]
+
+(define_expand "popcount2"
+  [(match_operand:GPI 0 "register_operand")
+   (match_operand:GPI 1 "register_operand")]
+  "TARGET_SIMD"
+{
+  rtx v = gen_reg_rtx (V8QImode);
+  rtx v1 = gen_reg_rtx (V8QImode);
+  rtx r = gen_reg_rtx (QImode);
+  rtx in = operands[1];
+  rtx out = operands[0];
+  if(mode == SImode)
+{
+  rtx tmp;
+  tmp = gen_reg_rtx (DImode);
+  /* If we have SImode, zero extend to DImode, pop count does
+ not change if we have extra zeros. */
+  emit_insn (gen_zero_extendsidi2 (tmp, in));
+  in = tmp;
+}
+  emit_move_insn (v, gen_lowpart (V8QImode, in));
+  emit_insn (gen_popcountv8qi2 (v1, v));
+  emit_insn (gen_reduc_plus_scal_v8qi (r, v1));
+  emit_insn (gen_zero_extendqi2 (out, r));
+  DONE;
+})
+
 (define_insn "clrsb2"
   [(set (match_operand:GPI 0 "register_operand" "=r")
 (clrsb:GPI (match_operand:GPI 1 "register_operand" "r")))]
diff --git a/gcc/testsuite/gcc.target/aarch64/popcnt.c b/gcc/testsuite/gcc.target/aarch64/popcnt.c
new file mode 100644
index 000..7e95796
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/popcnt.c
@@ -0,0 +1,23 @@
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+
+int
+foo (int x)
+{
+  return __builtin_popcount (x);
+}
+
+long
+foo1 (long x)
+{
+  return __builtin_popcountl (x);
+}
+
+long long
+foo2 (long long x)
+{
+  return __builtin_popcountll (x);
+}
+
+/* { dg-final { scan-assembler-not "popcount" } } */
+/* { dg-final { scan-assembler-times "cnt\t" 3 } } */


Re: [PATCH] [AArch64] Implement popcount pattern

2017-02-02 Thread Andrew Pinski
On Thu, Feb 2, 2017 at 3:55 AM, James Greenhalgh
 wrote:
> On Thu, Feb 02, 2017 at 04:03:42AM +, Hurugalawadi, Naveen wrote:
>> Hi James and Kyrill,
>>
>> Thanks for the review and comments on the patch.
>>
>> >> On ILP32 systems this would still use the SImode patterns,
>> >> so I suggest you use __builtin_popcountll and
>> >> an unsigned long long return type to ensure you always exercise the 
>> >> 64-bit code.
>>
>> Sorry for not commenting on this part.
>> The issue is that code generates "__popcountdi2" for all the codes in 
>> testcase
>> for LP64 and ILP32 variants.
>> __builtin_popcount, __builtin_popcountl and __builtin_popcount.
>>
>> Hence, modified the patch to check for "popcount".
>
> I don't understand this comment and how it relates to your updated
> patch (which still checks for CNT).

Now I understand Kyrill's comments as Naveen and I did not understand
before.  What Naveen was trying to test was that there was no longer
any call to __popcountdi2 or __popcountsi2 any more for all three:
__builtin_popcount, __builtin_popcountl and __builtin_popcountll.
Kyrill was asking him to change the test for __builtin_popcountl to
__builtin_popcountll even though there was already one test in that
file already.  Now of course what should change still is the argument
types to foo1/foo2 so they take long and long long respectively.  He
will post a new version of the patch soon.

>
>> Bootstrapped and regression tested on AArch64-Thunderx-Linux machine.
>>
>> Please find attached the modified patch and let us know if its okay?
>
> Could you please clarify what you meant above? The patch looks fine (though
> possibly not for Stage 4), but I'm not sure of your intent.

What he meant was he tested on aarch64-linux-gnu with no regressions.
Is it ok for when stage 1 opens with the change of the testcase?

Thanks,
Andrew

>
> Thanks,
> James


Re: [PATCH] [AArch64] Implement popcount pattern

2017-02-02 Thread James Greenhalgh
On Thu, Feb 02, 2017 at 04:03:42AM +, Hurugalawadi, Naveen wrote:
> Hi James and Kyrill,
> 
> Thanks for the review and comments on the patch.
> 
> >> On ILP32 systems this would still use the SImode patterns, 
> >> so I suggest you use __builtin_popcountll and
> >> an unsigned long long return type to ensure you always exercise the 64-bit 
> >> code.
> 
> Sorry for not commenting on this part.
> The issue is that code generates "__popcountdi2" for all the codes in testcase
> for LP64 and ILP32 variants.
> __builtin_popcount, __builtin_popcountl and __builtin_popcount.
> 
> Hence, modified the patch to check for "popcount".

I don't understand this comment and how it relates to your updated
patch (which still checks for CNT).

> Bootstrapped and regression tested on AArch64-Thunderx-Linux machine.
> 
> Please find attached the modified patch and let us know if its okay?

Could you please clarify what you meant above? The patch looks fine (though
possibly not for Stage 4), but I'm not sure of your intent.

Thanks,
James


Re: [PATCH] [AArch64] Implement popcount pattern

2017-02-01 Thread Hurugalawadi, Naveen
Hi James and Kyrill,

Thanks for the review and comments on the patch.

>> On ILP32 systems this would still use the SImode patterns, 
>> so I suggest you use __builtin_popcountll and
>> an unsigned long long return type to ensure you always exercise the 64-bit 
>> code.

Sorry for not commenting on this part.
The issue is that code generates "__popcountdi2" for all the codes in testcase
for LP64 and ILP32 variants.
__builtin_popcount, __builtin_popcountl and __builtin_popcount.

Hence, modified the patch to check for "popcount".

Bootstrapped and regression tested on AArch64-Thunderx-Linux machine.

Please find attached the modified patch and let us know if its okay?

Thanks,
Naveen



popcount-2.patch
Description: popcount-2.patch


Re: [PATCH] [AArch64] Implement popcount pattern

2017-02-01 Thread James Greenhalgh
On Tue, Dec 13, 2016 at 11:59:36AM +, Kyrill Tkachov wrote:
> Hi Naveen,
> 
> On 13/12/16 11:51, Hurugalawadi, Naveen wrote:
> >Hi Kyrill,
> >
> >Thanks for reviewing the patch and your useful comments.
> >
> >>>looks good to me if it has gone through the normal required
> >>>bootstrap and testing, but I can't approve.
> >Bootstrapped and Regression Tested on aarch64-thunderx-linux.
> >
> >>>The rest of the MD file uses the term AdvSIMD. Also, the instrurction
> >>>is CNT rather than "pop count".
> >Done.
> >
> >>>__builtin_popcount takes an unsigned int, so this should be
> >>>scanning for absence of popcountsi2 instead?
> >Done.
> >
> >Please find attached the modified patch as per review comments
> >and let me know if its okay for Stage-1 or current branch.
> 
> This looks much better, thanks.
> I still have a minor nit about the testcase.
> 
> +long
> +foo1 (int x)
> +{
> +  return __builtin_popcountl (x);
> +}
> 
> On ILP32 systems this would still use the SImode patterns, so I suggest you 
> use __builtin_popcountll and
> an unsigned long long return type to ensure you always exercise the 64-bit 
> code.
> 
> +
> +/* { dg-final { scan-assembler-not "popcount" } } */
> 
> 
> This looks ok to me otherwise, but you'll need an ok from the aarch64 folk.

I didn't see a follow-up patch posted with Kyrill's comments addressed?

Thanks,
James



Re: [PATCH] [AArch64] Implement popcount pattern

2016-12-13 Thread Kyrill Tkachov

Hi Naveen,

On 13/12/16 11:51, Hurugalawadi, Naveen wrote:

Hi Kyrill,

Thanks for reviewing the patch and your useful comments.


looks good to me if it has gone through the normal required
bootstrap and testing, but I can't approve.

Bootstrapped and Regression Tested on aarch64-thunderx-linux.


The rest of the MD file uses the term AdvSIMD. Also, the instrurction
is CNT rather than "pop count".

Done.


__builtin_popcount takes an unsigned int, so this should be
scanning for absence of popcountsi2 instead?

Done.

Please find attached the modified patch as per review comments
and let me know if its okay for Stage-1 or current branch.


This looks much better, thanks.
I still have a minor nit about the testcase.

+long
+foo1 (int x)
+{
+  return __builtin_popcountl (x);
+}

On ILP32 systems this would still use the SImode patterns, so I suggest you use 
__builtin_popcountll and
an unsigned long long return type to ensure you always exercise the 64-bit code.

+
+/* { dg-final { scan-assembler-not "popcount" } } */


This looks ok to me otherwise, but you'll need an ok from the aarch64 folk.
Kyrill


Thanks,
Naveen




Re: [PATCH] [AArch64] Implement popcount pattern

2016-12-13 Thread Hurugalawadi, Naveen
Hi Kyrill,

Thanks for reviewing the patch and your useful comments.

>> looks good to me if it has gone through the normal required
>> bootstrap and testing, but I can't approve.

Bootstrapped and Regression Tested on aarch64-thunderx-linux.

>> The rest of the MD file uses the term AdvSIMD. Also, the instrurction
>> is CNT rather than "pop count".

Done.

>> __builtin_popcount takes an unsigned int, so this should be 
>> scanning for absence of popcountsi2 instead?

Done.

Please find attached the modified patch as per review comments
and let me know if its okay for Stage-1 or current branch.

Thanks,
Naveendiff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index 65eb326..0acb3f0 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -3785,6 +3785,39 @@
   }
 )
 
+;; Pop count be done via the "CNT" instruction in AdvSIMD.
+;;
+;; MOV	v.1d, x0
+;; CNT	v1.8b, v.8b
+;; ADDV b2, v1.8b
+;; MOV	w0, v2.b[0]
+
+(define_expand "popcount2"
+  [(match_operand:GPI 0 "register_operand")
+   (match_operand:GPI 1 "register_operand")]
+  "TARGET_SIMD"
+{
+  rtx v = gen_reg_rtx (V8QImode);
+  rtx v1 = gen_reg_rtx (V8QImode);
+  rtx r = gen_reg_rtx (QImode);
+  rtx in = operands[1];
+  rtx out = operands[0];
+  if(mode == SImode)
+{
+  rtx tmp;
+  tmp = gen_reg_rtx (DImode);
+  /* If we have SImode, zero extend to DImode, pop count does
+ not change if we have extra zeros. */
+  emit_insn (gen_zero_extendsidi2 (tmp, in));
+  in = tmp;
+}
+  emit_move_insn (v, gen_lowpart (V8QImode, in));
+  emit_insn (gen_popcountv8qi2 (v1, v));
+  emit_insn (gen_reduc_plus_scal_v8qi (r, v1));
+  emit_insn (gen_zero_extendqi2 (out, r));
+  DONE;
+})
+
 (define_insn "clrsb2"
   [(set (match_operand:GPI 0 "register_operand" "=r")
 (clrsb:GPI (match_operand:GPI 1 "register_operand" "r")))]
diff --git a/gcc/testsuite/gcc.target/aarch64/popcnt.c b/gcc/testsuite/gcc.target/aarch64/popcnt.c
new file mode 100644
index 000..37cf4b9
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/popcnt.c
@@ -0,0 +1,17 @@
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+
+int
+foo (int x)
+{
+  return __builtin_popcount (x);
+}
+
+long
+foo1 (int x)
+{
+  return __builtin_popcountl (x);
+}
+
+/* { dg-final { scan-assembler-not "popcount" } } */
+/* { dg-final { scan-assembler-times "cnt\t" 2 } } */


Re: [PATCH] [AArch64] Implement popcount pattern

2016-12-12 Thread Kyrill Tkachov

Hi Naveen,

On 12/12/16 03:16, Hurugalawadi, Naveen wrote:

Hi,

Please find attached the patch that implements the support for popcount
patterns in AArch64.

The implementation improves OVS-DPDK on ThunderX by 3%. It would have a
similar effect on other AArch64 targets.

Please review the patch and let us know if its okay?


Besides a few comments below this looks good to me if it has gone through
the normal required bootstrap and testing, but I can't approve.
It's at the discretion of the target maintainers/reviewers if they want to
accept it at this stage.


2016-12-12  Andrew Pinski  

gcc
* config/aarch64/aarch64.md (popcount2): New pattern.

gcc/testsuite
* gcc.target/aarch64/popcount.c : New Testcase.


diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index 65eb326..c688ddc 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -3785,6 +3785,39 @@
   }
 )
 
+/* Pop count be done via the pop count instruction in NEON. */


The rest of the MD file uses the term AdvSIMD. Also, the instrurction
is CNT rather than "pop count".

+/*
+  mov v.1d, x0
+  Cnt v1.8b, v.8b
+  Addv b2, v1.8b
+  Mov w0, v2.b[0]
+*/

Minor nit, but please use semicolons for MD file comments. The convention (at 
least in the aarch64 md files)
is to use ";;" at the beginning of the line. Also, please use uppercase letters 
for the assembly instructions
in the comment.

+(define_expand "popcount2"
+  [(match_operand:GPI 0 "register_operand")
+   (match_operand:GPI 1 "register_operand")]
+  "TARGET_SIMD"
+{
+  rtx v = gen_reg_rtx (V8QImode);
+  rtx v1 = gen_reg_rtx (V8QImode);
+  rtx r = gen_reg_rtx (QImode);
+  rtx in = operands[1];
+  rtx out = operands[0];
+  if(mode == SImode)
+{
+  rtx tmp;
+  tmp = gen_reg_rtx (DImode);
+  /* If we have SImode, zero extend to DImode, pop count does
+ not change if we have extra zeros. */
+  emit_insn (gen_zero_extendsidi2 (tmp, in));
+  in = tmp;
+}
+  emit_move_insn (v, gen_lowpart (V8QImode, in));
+  emit_insn (gen_popcountv8qi2 (v1, v));
+  emit_insn (gen_reduc_plus_scal_v8qi (r, v1));
+  emit_insn (gen_zero_extendqi2 (out, r));
+  DONE;
+})
+
 (define_insn "clrsb2"
   [(set (match_operand:GPI 0 "register_operand" "=r")
 (clrsb:GPI (match_operand:GPI 1 "register_operand" "r")))]
diff --git a/gcc/testsuite/gcc.target/aarch64/popcount.c 
b/gcc/testsuite/gcc.target/aarch64/popcount.c
new file mode 100644
index 000..2d71168
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/popcount.c
@@ -0,0 +1,10 @@
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+
+int foo(int x)
+{
+  return __builtin_popcount(x);
+}
+
+/* { dg-final { scan-assembler-not "popcountdi2" } } */

__builtin_popcount takes an unsigned int, so this should be scanning for 
absence of popcountsi2 instead?
I suggest you also add a test for popcountdi2 using__builtin_popcountll.
Thanks,
Kyrill

+/* { dg-final { scan-assembler "cnt\t" } } */



[PATCH] [AArch64] Implement popcount pattern

2016-12-11 Thread Hurugalawadi, Naveen
Hi,

Please find attached the patch that implements the support for popcount
patterns in AArch64.

The implementation improves OVS-DPDK on ThunderX by 3%. It would have a
similar effect on other AArch64 targets.

Please review the patch and let us know if its okay?

2016-12-12  Andrew Pinski  

gcc
* config/aarch64/aarch64.md (popcount2): New pattern.

gcc/testsuite
* gcc.target/aarch64/popcount.c : New Testcase.diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index 65eb326..c688ddc 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -3785,6 +3785,39 @@
   }
 )
 
+/* Pop count be done via the pop count instruction in NEON. */
+/*
+  mov v.1d, x0
+  Cnt v1.8b, v.8b
+  Addv b2, v1.8b
+  Mov w0, v2.b[0]
+*/
+(define_expand "popcount2"
+  [(match_operand:GPI 0 "register_operand")
+   (match_operand:GPI 1 "register_operand")]
+  "TARGET_SIMD"
+{
+  rtx v = gen_reg_rtx (V8QImode);
+  rtx v1 = gen_reg_rtx (V8QImode);
+  rtx r = gen_reg_rtx (QImode);
+  rtx in = operands[1];
+  rtx out = operands[0];
+  if(mode == SImode)
+{
+  rtx tmp;
+  tmp = gen_reg_rtx (DImode);
+  /* If we have SImode, zero extend to DImode, pop count does
+ not change if we have extra zeros. */
+  emit_insn (gen_zero_extendsidi2 (tmp, in));
+  in = tmp;
+}
+  emit_move_insn (v, gen_lowpart (V8QImode, in));
+  emit_insn (gen_popcountv8qi2 (v1, v));
+  emit_insn (gen_reduc_plus_scal_v8qi (r, v1));
+  emit_insn (gen_zero_extendqi2 (out, r));
+  DONE;
+})
+
 (define_insn "clrsb2"
   [(set (match_operand:GPI 0 "register_operand" "=r")
 (clrsb:GPI (match_operand:GPI 1 "register_operand" "r")))]
diff --git a/gcc/testsuite/gcc.target/aarch64/popcount.c b/gcc/testsuite/gcc.target/aarch64/popcount.c
new file mode 100644
index 000..2d71168
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/popcount.c
@@ -0,0 +1,10 @@
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+
+int foo(int x)
+{
+  return __builtin_popcount(x);
+}
+
+/* { dg-final { scan-assembler-not "popcountdi2" } } */
+/* { dg-final { scan-assembler "cnt\t" } } */