Re: [PATCH 1/5] clump_bits: Introduce the for_each_set_clump macro

2020-12-28 Thread William Breathitt Gray
On Sun, Dec 27, 2020 at 11:03:06PM +0100, Arnd Bergmann wrote:
> On Sat, Dec 26, 2020 at 7:42 AM Syed Nayyar Waris wrote:
> >
> > This macro iterates over each group of bits (clump) that has set bits
> > within a bitmap memory region. On each iteration, "start" is set to
> > the bit offset of the found clump, and the clump value is stored to
> > the location pointed to by "clump". Additionally, the
> > bitmap_get_value() and bitmap_set_value() functions are introduced to
> > get and set, respectively, an n-bit value in a bitmap memory region.
> > The value can be any width from 1 to BITS_PER_LONG bits; a width less
> > than 1 or greater than BITS_PER_LONG causes undefined behaviour.
> > When setting an n-bit value that would cross a word boundary, the
> > value is split: its lower-order bits are stored in the current word
> > and the remaining bits in the next higher word. Retrieving such a
> > value from the bitmap works the same way.
> >
> > GCC emits a warning in bitmap_set_value(): https://godbolt.org/z/rjx34r
> > Add an explicit check that the value being written into the bitmap
> > does not fall outside the bitmap.
> > Writing outside the bitmap should never happen in practice, because
> > the boundaries are required to be correct before the function is
> > called; the caller is responsible for ensuring that they are.
> > The change exists only to silence the GCC warning, because GCC cannot
> > know that the boundaries have already been checked.
> > As such, we're better off using __builtin_unreachable() here, because
> > it avoids the latency of the conditional check entirely.
> 
> Didn't the __builtin_unreachable() end up leading to an objtool
> warning about incorrect stack frames for the code path that leads
> into the undefined behavior? I thought I saw a message from the 0day
> build bot about that and didn't expect to see it again after that.
> 
> Can you actually measure any performance difference compared
> to BUG_ON() that avoids the undefined behavior? Practically
> all CPUs from the past 20 years have branch predictors that should
> completely hide measurable overhead from this.
> 
>   Arnd

When I initially recommended using __builtin_unreachable(), I was
anticipating the use of bitmap_set_value() in the kernel at large, so the
possible performance hit from a conditional check was a concern for me.
However, now that we're restricting the scope of bitmap_set_value() to
only the GPIO subsystem, I feel such optimization is no longer a major
concern: gpio-xilinx is the only driver utilizing bitmap_set_value(),
and we know it won't be called in a loop, so whatever hypothetical
performance hit there might be is inconsequential in the end.

Instead, we should focus on code clarity now. Given the new scope of
this function, I believe it makes sense to revert to the earlier
suggestion of passing in and checking the boundary explicitly, and to
remove the __builtin_unreachable() call for now. If bitmap_set_value()
becomes available to the rest of the kernel in the future, we can
reconsider whether or not to use __builtin_unreachable().

William Breathitt Gray
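
The approach suggested above, passing the bitmap size in and checking the boundary explicitly instead of calling __builtin_unreachable(), might look roughly like the following userspace sketch. It is not code from this series: set_value_checked() and its return convention (-1 for an out-of-range write, with the map left untouched) are hypothetical, and BITS_PER_LONG, BIT_WORD() and low_mask() are local stand-ins assumed to behave like the kernel helpers.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Userspace stand-ins for kernel helpers (assumed equivalents). */
#define BITS_PER_LONG (CHAR_BIT * sizeof(unsigned long))
#define BIT_WORD(nr) ((nr) / BITS_PER_LONG)

/* Mask with the low n bits set; valid for 1 <= n <= BITS_PER_LONG. */
static unsigned long low_mask(unsigned long n)
{
	return ~0UL >> (BITS_PER_LONG - n);
}

/*
 * Hypothetical checked variant: write the low 'width' bits of 'value' at
 * bit offset 'start' of a map holding 'nbits' bits. The boundary is passed
 * in and checked explicitly; returns 0 on success, or -1 (map untouched)
 * if the value would not fit.
 */
static int set_value_checked(unsigned long *map, unsigned long nbits,
			     unsigned long value, unsigned long width,
			     unsigned long start)
{
	const size_t index = BIT_WORD(start);
	const unsigned long offset = start % BITS_PER_LONG;
	const unsigned long space = BITS_PER_LONG - offset; /* bits left in word */

	assert(width >= 1 && width <= BITS_PER_LONG);
	/* Overflow-safe form of "start + width > nbits". */
	if (start > nbits || width > nbits - start)
		return -1;

	value &= low_mask(width);
	if (space >= width) {
		map[index] &= ~(low_mask(width) << offset);
		map[index] |= value << offset;
		return 0;
	}

	/* Straddles a word boundary; the check above guarantees that
	 * map[index + 1] lies inside the map. */
	map[index] &= low_mask(offset);		/* keep bits below 'offset' */
	map[index] |= value << offset;		/* low-order bits of value */
	map[index + 1] &= ~low_mask(width - space);
	map[index + 1] |= value >> space;	/* remaining high-order bits */
	return 0;
}
```

Because the whole range from start to start + width is validated up front, the word-straddling branch needs no second check on index + 1, which is the branch that __builtin_unreachable() was originally meant to cover.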




Re: [PATCH 1/5] clump_bits: Introduce the for_each_set_clump macro

2020-12-27 Thread Arnd Bergmann
On Sat, Dec 26, 2020 at 7:42 AM Syed Nayyar Waris wrote:
>
> This macro iterates over each group of bits (clump) that has set bits
> within a bitmap memory region. On each iteration, "start" is set to
> the bit offset of the found clump, and the clump value is stored to
> the location pointed to by "clump". Additionally, the
> bitmap_get_value() and bitmap_set_value() functions are introduced to
> get and set, respectively, an n-bit value in a bitmap memory region.
> The value can be any width from 1 to BITS_PER_LONG bits; a width less
> than 1 or greater than BITS_PER_LONG causes undefined behaviour.
> When setting an n-bit value that would cross a word boundary, the
> value is split: its lower-order bits are stored in the current word
> and the remaining bits in the next higher word. Retrieving such a
> value from the bitmap works the same way.
>
> GCC emits a warning in bitmap_set_value(): https://godbolt.org/z/rjx34r
> Add an explicit check that the value being written into the bitmap
> does not fall outside the bitmap.
> Writing outside the bitmap should never happen in practice, because
> the boundaries are required to be correct before the function is
> called; the caller is responsible for ensuring that they are.
> The change exists only to silence the GCC warning, because GCC cannot
> know that the boundaries have already been checked.
> As such, we're better off using __builtin_unreachable() here, because
> it avoids the latency of the conditional check entirely.

Didn't the __builtin_unreachable() end up leading to an objtool
warning about incorrect stack frames for the code path that leads
into the undefined behavior? I thought I saw a message from the 0day
build bot about that and didn't expect to see it again after that.

Can you actually measure any performance difference compared
to BUG_ON() that avoids the undefined behavior? Practically
all CPUs from the past 20 years have branch predictors that should
completely hide measurable overhead from this.

  Arnd


[PATCH 1/5] clump_bits: Introduce the for_each_set_clump macro

2020-12-25 Thread Syed Nayyar Waris
This macro iterates over each group of bits (clump) that has set bits
within a bitmap memory region. On each iteration, "start" is set to
the bit offset of the found clump, and the clump value is stored to
the location pointed to by "clump". Additionally, the
bitmap_get_value() and bitmap_set_value() functions are introduced to
get and set, respectively, an n-bit value in a bitmap memory region.
The value can be any width from 1 to BITS_PER_LONG bits; a width less
than 1 or greater than BITS_PER_LONG causes undefined behaviour.
When setting an n-bit value that would cross a word boundary, the
value is split: its lower-order bits are stored in the current word
and the remaining bits in the next higher word. Retrieving such a
value from the bitmap works the same way.
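
The word splitting and clump iteration described above can be illustrated with a minimal userspace sketch. It is not the code from this patch: get_value(), next_set_bit(), next_clump(), low_mask() and for_each_set_clump_sketch() are hypothetical names, BITS_PER_LONG and BIT_WORD() are redefined locally as assumed equivalents of the kernel macros, and the bit scan is a naive loop rather than find_next_bit(). It also assumes the bitmap size is a multiple of the clump size, so no read runs past the end of the map.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Userspace stand-ins for kernel helpers (assumed equivalents). */
#define BITS_PER_LONG (CHAR_BIT * sizeof(unsigned long))
#define BIT_WORD(nr) ((nr) / BITS_PER_LONG)

/* Mask with the low n bits set; valid for 1 <= n <= BITS_PER_LONG. */
static unsigned long low_mask(unsigned long n)
{
	return ~0UL >> (BITS_PER_LONG - n);
}

/* Read an nbits-wide value at bit offset 'start'; it may span two words. */
static unsigned long get_value(const unsigned long *map, unsigned long start,
			       unsigned long nbits)
{
	const size_t index = BIT_WORD(start);
	const unsigned long offset = start % BITS_PER_LONG;
	const unsigned long space = BITS_PER_LONG - offset; /* bits left in word */

	assert(nbits >= 1 && nbits <= BITS_PER_LONG);
	if (space >= nbits)
		return (map[index] >> offset) & low_mask(nbits);

	/* Low-order bits come from the top of map[index], the rest from
	 * the bottom of map[index + 1]. */
	return (map[index] >> offset) |
	       ((map[index + 1] & low_mask(nbits - space)) << space);
}

/* Naive linear scan for the next set bit at or after 'offset'. */
static unsigned long next_set_bit(const unsigned long *map, unsigned long size,
				  unsigned long offset)
{
	for (; offset < size; offset++)
		if (map[BIT_WORD(offset)] & (1UL << (offset % BITS_PER_LONG)))
			return offset;
	return size;
}

/* Find the next clump containing a set bit and copy its value to *clump.
 * Returns the clump's bit offset, or 'size' if there are no more. */
static unsigned long next_clump(unsigned long *clump, const unsigned long *map,
				unsigned long size, unsigned long offset,
				unsigned long clump_size)
{
	offset = next_set_bit(map, size, offset);
	if (offset == size)
		return size;
	offset -= offset % clump_size; /* round down to the clump boundary */
	*clump = get_value(map, offset, clump_size);
	return offset;
}

#define for_each_set_clump_sketch(start, clump, map, size, clump_size)	\
	for ((start) = next_clump(&(clump), (map), (size), 0, (clump_size)); \
	     (start) < (size);						\
	     (start) = next_clump(&(clump), (map), (size),		\
				  (start) + (clump_size), (clump_size)))
```

With map[0] = 0x5UL << (BITS_PER_LONG - 4) and map[1] = 0xA, an 8-bit read at bit BITS_PER_LONG - 4 returns 0xA5: the low nibble comes from the top of the first word and the high nibble from the bottom of the second. Iterating with 8-bit clumps visits only the two clumps that contain set bits, at offsets BITS_PER_LONG - 8 (value 0x50) and BITS_PER_LONG (value 0x0A).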

GCC emits a warning in bitmap_set_value(): https://godbolt.org/z/rjx34r
Add an explicit check that the value being written into the bitmap
does not fall outside the bitmap.
Writing outside the bitmap should never happen in practice, because
the boundaries are required to be correct before the function is
called; the caller is responsible for ensuring that they are.
The change exists only to silence the GCC warning, because GCC cannot
know that the boundaries have already been checked.
As such, we're better off using __builtin_unreachable() here, because
it avoids the latency of the conditional check entirely.
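
The two guard styles under discussion in this thread can be compared in a small userspace sketch. Both functions are hypothetical, __builtin_unreachable() assumes GCC or Clang, and abort() stands in for the kernel's BUG_ON(); whether the checked variant is measurably slower in practice would have to be benchmarked rather than assumed.

```c
#include <assert.h>
#include <stdlib.h>

/*
 * Variant A: promise the compiler that the index is always in range.
 * The compiler may drop the comparison entirely and treat the
 * out-of-range path as impossible; if the promise is ever broken, the
 * behaviour is undefined.
 */
static void store_word_hinted(unsigned long *map, size_t length,
			      size_t index, unsigned long word)
{
	if (index >= length)
		__builtin_unreachable();
	map[index] = word;
}

/*
 * Variant B: an explicit check that stops the program, standing in for
 * BUG_ON(). It costs one compare and a normally well-predicted branch,
 * but a bad caller fails loudly instead of corrupting memory.
 */
static void store_word_checked(unsigned long *map, size_t length,
			       size_t index, unsigned long word)
{
	if (index >= length)
		abort();
	map[index] = word;
}
```

Both variants give the compiler a bound on the index, which is what silences the out-of-bounds warning; they differ only in what happens when a caller violates the contract.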

Cc: Linus Walleij 
Cc: Arnd Bergmann 
Cc: William Breathitt Gray 
Cc: Andy Shevchenko 
Signed-off-by: Syed Nayyar Waris 
---
 drivers/gpio/clump_bits.h | 101 ++
 1 file changed, 101 insertions(+)
 create mode 100644 drivers/gpio/clump_bits.h

diff --git a/drivers/gpio/clump_bits.h b/drivers/gpio/clump_bits.h
new file mode 100644
index ..72ef772b83c8
--- /dev/null
+++ b/drivers/gpio/clump_bits.h
@@ -0,0 +1,101 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+
+#ifndef __CLUMP_BITS_H
+#define __CLUMP_BITS_H
+
+/**
+ * find_next_clump - find next clump with set bits in a memory region
+ * @clump: location to store copy of found clump
+ * @addr: address to base the search on
+ * @size: bitmap size in number of bits
+ * @offset: bit offset at which to start searching
+ * @clump_size: clump size in bits
+ *
+ * Returns the bit offset for the next set clump; the found clump value is
+ * copied to the location pointed by @clump. If no bits are set, returns @size.
+ */
+extern unsigned long find_next_clump(unsigned long *clump,
+				     const unsigned long *addr,
+				     unsigned long size, unsigned long offset,
+				     unsigned long clump_size);
+
+#define find_first_clump(clump, bits, size, clump_size) \
+	find_next_clump((clump), (bits), (size), 0, (clump_size))
+
+/**
+ * bitmap_get_value - get a value of n-bits from the memory region
+ * @map: address to the bitmap memory region
+ * @start: bit offset of the n-bit value
+ * @nbits: size of value in bits (must be between 1 and BITS_PER_LONG inclusive).
+ *
+ * Returns value of nbits located at the @start bit offset within the @map
+ * memory region.
+ */
+static inline unsigned long bitmap_get_value(const unsigned long *map,
+					     unsigned long start,
+					     unsigned long nbits)
+{
+	const size_t index = BIT_WORD(start);
+	const unsigned long offset = start % BITS_PER_LONG;
+	const unsigned long ceiling = round_up(start + 1, BITS_PER_LONG);
+	const unsigned long space = ceiling - start;
+	unsigned long value_low, value_high;
+
+	if (space >= nbits)
+		return (map[index] >> offset) & GENMASK(nbits - 1, 0);
+	else {
+		value_low = map[index] & BITMAP_FIRST_WORD_MASK(start);
+		value_high = map[index + 1] & BITMAP_LAST_WORD_MASK(start + nbits);
+		return (value_low >> offset) | (value_high << space);
+	}
+}
+
+/**
+ * bitmap_set_value - set value within a memory region
+ * @map: address to the bitmap memory region
+ * @nbits: size of map in bits
+ * @value: value of clump
+ * @value_width: size of value in bits (must be between 1 and BITS_PER_LONG inclusive)
+ * @start: bit offset of the value
+ */
+static inline void bitmap_set_value(unsigned long *map, unsigned long nbits,
+				    unsigned long value, unsigned long value_width,
+				    unsigned long start)
+{
+	const unsigned long index = BIT_WORD(start);
+	const unsigned long length = BIT_WORD(nbits);
+	const unsigned long offset = start % BITS_PER_LONG;
+	const unsigned long ceiling = round_up(start + 1, BITS_PER_LONG);
+	const unsigned long space = ceiling - start;
+
+   value &= GENMASK(v