On Mon, Jan 26, 2026 at 10:41 PM Nathan Bossart
<[email protected]> wrote:
>
> On Thu, Jan 22, 2026 at 11:50:38AM -0600, Nathan Bossart wrote:
> > On Thu, Jan 22, 2026 at 04:50:26PM +0700, John Naylor wrote:
> >> 1) Nowadays, the only global call sites of the word-sized functions
> >> are select_best_grantor() and in bitmapsets. The latter calls the
> >> word-sized functions in a loop (could be just one word). It may be
> >> more efficient to calculate the size in bytes and call pg_popcount().
> >
> > Yeah, these seem like obvious places to use pg_popcount().  Note that
> > bms_member_index() does a final popcount on a masked version of the last
> > word.  We could swap that with pg_popcount(), too, but it might be slower
> > than just calling the word-sized function.  However, it could be hard to
> > tell the difference, as we'd be trading a function or function pointer call
> > with an inlined loop over pg_number_of_ones.  And even if it is slower, I'm
> > not sure it matters all that much in the grand scheme of things.
>
> I added a 0003 that swaps that final popcount with pg_popcount().

I'm not sure either if this part matters much, but it makes more sense
to me to continue using single word functions for that last part.
Since they have very few call sites anymore, we can make them inline
without bloating the binary on x86.

> >> Then we could get rid of all the pointer indirection for the
> >> word-sized functions.
> >
> > Do you mean that we'd just keep the portable ones around?  I see some code
> > in pgvector that might be negatively impacted by that, but if I understand
> > correctly it would require an unusual setup.
>
> I added a 0004 that removes the architecture-specific word-sized functions.

Right, just the portable ones. Here, too, inlining them everywhere
would mitigate any impact. Further,

-int
-pg_popcount64(uint64 word)
+static inline int
+pg_popcount64_neon(uint64 word)

...if they were inlined from the header, I think we wouldn't need this
separate neon function in this file at all. Currently, we rely on
__builtin_popcountl for the portable function outside this file. We
could either keep using that or switch to neon if there's a
portability difference.

0001, 2, and 4 look good to me otherwise. Looking at commit
a8c09daa8bb1, there are some planner benchmarks in the discussion
thread that stress bitmapsets, but I'm not sure if they hit the paths
affected by 0001 at all.

--
John Naylor
Amazon Web Services


Reply via email to