On Thu, Sep 5, 2019 at 5:35 PM Dmitrij Pochepko
<dmitrij.poche...@bell-sw.com> wrote:
>
> This patch adds matching for Hamming weight (popcount) implementation. The 
> following sources:
>
> int
> foo64 (unsigned long long a)
> {
>     unsigned long long b = a;
>     b -= ((b>>1) & 0x5555555555555555ULL);
>     b = ((b>>2) & 0x3333333333333333ULL) + (b & 0x3333333333333333ULL);
>     b = ((b>>4) + b) & 0x0F0F0F0F0F0F0F0FULL;
>     b *= 0x0101010101010101ULL;
>     return (int)(b >> 56);
> }
>
> and
>
> int
> foo32 (unsigned int a)
> {
>     unsigned long b = a;
>     b -= ((b>>1) & 0x55555555UL);
>     b = ((b>>2) & 0x33333333UL) + (b & 0x33333333UL);
>     b = ((b>>4) + b) & 0x0F0F0F0FUL;
>     b *= 0x01010101UL;
>     return (int)(b >> 24);
> }
>
> and equivalents are now recognized as popcount for platforms with hw popcount 
> support. Bootstrapped and tested on x86_64-pc-linux-gnu and aarch64-linux-gnu 
> systems with no regressions.
>
> (I have no write access to repo)

+(simplify
+  (convert
+    (rshift
+      (mult

is the outer convert really necessary?  That is, if we change
the simplification result to

 (convert (BUILT_IN_POPCOUNT @0))

wouldn't that be correct as well?

Is the Hamming weight popcount
faster than the libgcc table-based approach?  I wonder if we really
need to restrict this conversion to the case where the target
has an expander.

+      (mult
+       (bit_and:c

this doesn't need :c (second operand is a constant).

+       int shift = TYPE_PRECISION (long_long_unsigned_type_node) - prec;
+       const unsigned long long c1 = 0x0101010101010101ULL >> shift,

I think this mixes host and target properties.  I guess intead of
'const unsigned long long' you want to use 'const uint64_t' and
instead of TYPE_PRECISION (long_long_unsigned_type_node) 64?
Since you are later comparing with unsigned HOST_WIDE_INT
eventually unsigned HOST_WIDE_INT is better (that's always 64bit as well).

You are using tree_to_uhwi but nowhere verifying if @0 is unsigned.
What happens if 'prec' is > 64?  (__int128 ...).  Ah, I guess the
final selection will simply select nothing...

Otherwise the patch looks reasonable, even if the pattern
is a bit unwieldly... ;)

Does it work for targets where 'unsigned int' is smaller than 32bit?

Thanks,
Richard.
>
> Thanks,
> Dmitrij
>
>
> gcc/ChangeLog:
>
>         PR tree-optimization/90836
>
>         * gcc/match.pd (popcount): New pattern.
>
> gcc/testsuite/ChangeLog:
>
>         PR tree-optimization/90836
>
>         * lib/target-supports.exp (check_effective_target_popcount)
>         (check_effective_target_popcountll): New effective targets.
>         * gcc.dg/tree-ssa/popcount4.c: New test.
>         * gcc.dg/tree-ssa/popcount4l.c: New test.
>         * gcc.dg/tree-ssa/popcount4ll.c: New test.

Reply via email to