On Wed, Feb 13, 2019 at 7:16 AM Frederic Weisbecker <frede...@kernel.org> wrote: > > > > If "vectors" only has the high hit set, you end up with "fs" having > > the value "64". > > > > And then "vectors >>= fs" is undefined and won't actually do anything > > at all on x86. > > Oh! ok didn't know that...
So in general, shift counts >= width of the type (or negative) are undefined. They can sometimes happen to work (that's the "undefined" part ;), but it's not reliable or portable. It's why you occasionally see things like drivers/block/sx8.c: tmp = (blk_rq_pos(rq) >> 16) >> 16; to get the upper 32 bits of the value. It is written with that odd double shift, rather than being written as ">> 32". That way it works even if the sector type happens to be 32-bit (and the compiler will just end up turning it into a zero if it's an unsigned 32-bit type since it's compile-time obvious). > I see, perhaps I should use for_each_set_bit() that should take care about > those > details? That would _work_, but don't do that. "for_each_set_bit()" works on bitmaps in memory, and is slow for a simple word case. In addition to being slow, it uses the Linux tradition of working on bitmaps that are comprised of "unsigned long". So it has byte order issues too. So for_each_set_bit() is useful when you have real arrays of bits and are using the "set_bit()" etc interfaces. When you're actually working on just a single variable, your "__ffs()" model works fine, you just need to be careful to _not_ do the "+1" and then use it for shifts. Also, it actually turns out that if you want to be really clever, you can play tricks if you don't care about the exact bit *number*. For example, this expression: v = a & (a-1); will remove the lowest bit set from 'a' very cheaply. So what you can do is something like this: void for_each_bit_in_mask(u64 mask) { while (mask) { u64 newmask = mask & (mask-1); u64 onebit = mask ^ newmask; mask = newmask; do_something_with(onebit); } } to do some operation on each bit set, and quite efficiently and without any undefined behavior or expensive shifts. But the above trick does require that you are happy to just see the bit *mask*, not the bit *number*. I'm not sure any of your cases match that. Linus