On Thu, Sep 5, 2019 at 5:35 PM Dmitrij Pochepko <dmitrij.poche...@bell-sw.com> wrote: > > This patch adds matching for Hamming weight (popcount) implementation. The > following sources: > > int > foo64 (unsigned long long a) > { > unsigned long long b = a; > b -= ((b>>1) & 0x5555555555555555ULL); > b = ((b>>2) & 0x3333333333333333ULL) + (b & 0x3333333333333333ULL); > b = ((b>>4) + b) & 0x0F0F0F0F0F0F0F0FULL; > b *= 0x0101010101010101ULL; > return (int)(b >> 56); > } > > and > > int > foo32 (unsigned int a) > { > unsigned long b = a; > b -= ((b>>1) & 0x55555555UL); > b = ((b>>2) & 0x33333333UL) + (b & 0x33333333UL); > b = ((b>>4) + b) & 0x0F0F0F0FUL; > b *= 0x01010101UL; > return (int)(b >> 24); > } > > and equivalents are now recognized as popcount for platforms with hw popcount > support. Bootstrapped and tested on x86_64-pc-linux-gnu and aarch64-linux-gnu > systems with no regressions. > > (I have no write access to repo)
+(simplify + (convert + (rshift + (mult is the outer convert really necessary? That is, if we change the simplification result to (convert (BUILT_IN_POPCOUNT @0)) wouldn't that be correct as well? Is the Hamming weight popcount faster than the libgcc table-based approach? I wonder if we really need to restrict this conversion to the case where the target has an expander. + (mult + (bit_and:c this doesn't need :c (second operand is a constant). + int shift = TYPE_PRECISION (long_long_unsigned_type_node) - prec; + const unsigned long long c1 = 0x0101010101010101ULL >> shift, I think this mixes host and target properties. I guess intead of 'const unsigned long long' you want to use 'const uint64_t' and instead of TYPE_PRECISION (long_long_unsigned_type_node) 64? Since you are later comparing with unsigned HOST_WIDE_INT eventually unsigned HOST_WIDE_INT is better (that's always 64bit as well). You are using tree_to_uhwi but nowhere verifying if @0 is unsigned. What happens if 'prec' is > 64? (__int128 ...). Ah, I guess the final selection will simply select nothing... Otherwise the patch looks reasonable, even if the pattern is a bit unwieldly... ;) Does it work for targets where 'unsigned int' is smaller than 32bit? Thanks, Richard. > > Thanks, > Dmitrij > > > gcc/ChangeLog: > > PR tree-optimization/90836 > > * gcc/match.pd (popcount): New pattern. > > gcc/testsuite/ChangeLog: > > PR tree-optimization/90836 > > * lib/target-supports.exp (check_effective_target_popcount) > (check_effective_target_popcountll): New effective targets. > * gcc.dg/tree-ssa/popcount4.c: New test. > * gcc.dg/tree-ssa/popcount4l.c: New test. > * gcc.dg/tree-ssa/popcount4ll.c: New test.