On Sat, Aug 24, 2019 at 01:01:02PM +0300, Denis Efremov wrote:
> This patch open codes the bitmap_weight() call. The direct
> invocation of hweight_long() allows to remove the BUG_ON and
> excessive "longs to bits, bits to longs" conversion.

Honestly, that's not the problem with this function.  Take a look
at https://danluu.com/assembly-intrinsics/ for a _benchmarked_
set of problems with popcnt.

> BUG_ON was required to check that bitmap_weight() will return
> a correct value, i.e. the computed weight will fit the int type
> of the return value.

What?  No.  Look at the _arguments_ of bitmap_weight():

static __always_inline int bitmap_weight(const unsigned long *src, unsigned int 
nbits)

> With this patch memweight() controls the
> computation directly with size_t type everywhere. Thus, the BUG_ON
> becomes unnecessary.

Why are you bothering?  How are you allocating half a gigabyte of memory?
Why are you calling memweight() on half a gigabyte of memory?

>       if (longs) {
> -             BUG_ON(longs >= INT_MAX / BITS_PER_LONG);
> -             ret += bitmap_weight((unsigned long *)bitmap,
> -                             longs * BITS_PER_LONG);
> +             const unsigned long *bitmap_long =
> +                     (const unsigned long *)bitmap;
> +
>               bytes -= longs * sizeof(long);
> -             bitmap += longs * sizeof(long);
> +             for (; longs > 0; longs--, bitmap_long++)
> +                     ret += hweight_long(*bitmap_long);
> +             bitmap = (const unsigned char *)bitmap_long;
>       }

If you really must change anything, I'd rather see this turned into a
loop:

        while (longs) {
                unsigned int nbits;

                if (longs >= INT_MAX / BITS_PER_LONG)
                        nbits = INT_MAX + 1;
                else
                        nbits = longs * BITS_PER_LONG;

                ret += bitmap_weight((unsigned long *)bitmap, sz);
                bytes -= nbits / 8;
                bitmap += nbits / 8;
                longs -= nbits / BITS_PER_LONG;
        }

then we only have to use Dan Luu's optimisation in bitmap_weight()
and not in memweight() as well.

Also, why does the trailer do this:

        for (; bytes > 0; bytes--, bitmap++)
                ret += hweight8(*bitmap);

instead of calling hweight_long on *bitmap & mask?

Reply via email to