On Mon, Jan 26, 2026 at 10:41 PM Nathan Bossart <[email protected]> wrote: > > On Thu, Jan 22, 2026 at 11:50:38AM -0600, Nathan Bossart wrote: > > On Thu, Jan 22, 2026 at 04:50:26PM +0700, John Naylor wrote: > >> 1) Nowadays, the only global call sites of the word-sized functions > >> are select_best_grantor() and in bitmapsets. The latter calls the > >> word-sized functions in a loop (could be just one word). It may be > >> more efficient to calculate the size in bytes and call pg_popcount(). > > > > Yeah, these seem like obvious places to use pg_popcount(). Note that > > bms_member_index() does a final popcount on a masked version of the last > > word. We could swap that with pg_popcount(), too, but it might be slower > > than just calling the word-sized function. However, it could be hard to > > tell the difference, as we'd be trading a function or function pointer call > > with an inlined loop over pg_number_of_ones. And even if it is slower, I'm > > not sure it matters all that much in the grand scheme of things. > > I added a 0003 that swaps that final popcount with pg_popcount().
I'm not sure either if this part matters much, but it makes more sense to me to continue using single word functions for that last part. Since they have very few call sites anymore, we can make them inline without bloating the binary on x86. > >> Then we could get rid of all the pointer indirection for the > >> word-sized functions. > > > > Do you mean that we'd just keep the portable ones around? I see some code > > in pgvector that might be negatively impacted by that, but if I understand > > correctly it would require an unusual setup. > > I added a 0004 that removes the architecture-specific word-sized functions. Right, just the portable ones. Here, too, inlining them everywhere would mitigate any impact. Further, -int -pg_popcount64(uint64 word) +static inline int +pg_popcount64_neon(uint64 word) ...if they were inlined from the header, I think we wouldn't need this separate neon function in this file at all. Currently, we rely on __builtin_popcountl for the portable function outside this file. We could either keep using that or switch to neon if there's a portability difference. 0001, 2, and 4 look good to me otherwise. Looking at commit a8c09daa8bb1, there are some planner benchmarks in the discussion thread that stress bitmapsets, but I'm not sure if they hit the paths affected by 0001 at all. -- John Naylor Amazon Web Services
