https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86544
--- Comment #1 from kugan at gcc dot gnu.org ---
(In reply to ktkachov from comment #0)
> Great to see that GCC now detects the popcount loop in PR 82479!
> I am seeing some curious differences between gcc and g++ though.
> int
> pc (unsigned long long b)
> {
>   int c = 0;
>
>   while (b) {
>     b &= b - 1;
>     c++;
>   }
>
>   return c;
> }
>
> If compiled with gcc -O3 on aarch64 this gives:
> pc:
>         fmov    d0, x0
>         cnt     v0.8b, v0.8b
>         addv    b0, v0.8b
>         umov    w0, v0.b[0]
>         ret
>
> whereas if compiled with g++ -O3 it gives:
> _Z2pcy:
> .LFB0:
>         .cfi_startproc
>         fmov    d0, x0
>         cmp     x0, 0
>         cnt     v0.8b, v0.8b
>         addv    b0, v0.8b
>         umov    w0, v0.b[0]
>         and     x0, x0, 255
>         csel    w0, w0, wzr, ne
>         ret
>
> which is suboptimal. It seems that phiopt3 manages to optimise the C version
> better. The GIMPLE dumps just before the phiopt pass are:
> For the C (good version):
>
>   int c;
>   int _7;
>
>   <bb 2> [local count: 118111601]:
>   if (b_4(D) != 0)
>     goto <bb 3>; [89.00%]
>   else
>     goto <bb 4>; [11.00%]
>
>   <bb 3> [local count: 105119324]:
>   _7 = __builtin_popcountl (b_4(D));
>
>   <bb 4> [local count: 118111601]:
>   # c_12 = PHI <0(2), _7(3)>
>   return c_12;
>
>
> For the C++ (bad version):
>
>   int c;
>   int _7;
>
>   <bb 2> [local count: 118111601]:
>   if (b_4(D) == 0)
>     goto <bb 4>; [11.00%]
>   else
>     goto <bb 3>; [89.00%]
>
>   <bb 3> [local count: 105119324]:
>   _7 = __builtin_popcountl (b_4(D));
>
>   <bb 4> [local count: 118111601]:
>   # c_12 = PHI <0(2), _7(3)>
>   return c_12;
>
> As you can see the order of the gotos and the jump conditions is inverted.
>
> It seems to me that the two are equivalent and GCC could be doing a better
> job of optimising.
>
> Can we improve phiopt to handle this more effectively?

Thanks for the test case. I will look at it.
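The reason the compare and csel in the g++ output are redundant is that popcount of 0 is already 0, so the PHI <0(2), _7(3)> yields the same value whichever edge is taken, and the guard on b can be dropped regardless of how the branch is laid out. A minimal C sketch of that source-level equivalence follows; the names pc_loop, pc_guarded, pc_unguarded and check_equivalent are hypothetical, chosen only for illustration, and the code does not exercise phiopt itself. It assumes a compiler providing the GCC/Clang builtin __builtin_popcountll.

```c
#include <assert.h>

/* The loop from the report: each iteration clears the lowest set bit
   of b, so the body runs once per set bit.  */
int pc_loop (unsigned long long b)
{
  int c = 0;
  while (b) {
    b &= b - 1;
    c++;
  }
  return c;
}

/* Hypothetical model of the GIMPLE before phiopt: popcount is only
   computed on the b != 0 path, otherwise the PHI selects 0.  */
int pc_guarded (unsigned long long b)
{
  return b != 0 ? __builtin_popcountll (b) : 0;
}

/* Hypothetical model of the good C output: no guard at all, which is
   valid because __builtin_popcountll (0) == 0.  */
int pc_unguarded (unsigned long long b)
{
  return __builtin_popcountll (b);
}

/* Check that all three shapes agree for a given input.  */
void check_equivalent (unsigned long long b)
{
  int expected = pc_loop (b);
  assert (pc_guarded (b) == expected);
  assert (pc_unguarded (b) == expected);
}
```

Calling check_equivalent on 0 covers the only input where the guard could matter; any nonzero value takes the popcount path in both guarded variants anyway.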