On Sun, Feb 18, 2018 at 05:05:06AM -0800, Tejun Heo wrote:
> sbitmap is used to allocate tags. The free and alloc paths use a
> memory ordering scheme similar to the one used by waitqueue, where the
> waiter and waker synchronize around set_current_state().
>
> This doesn't seem sufficient for sbitmap given that a tag may get
> released and re-acquired without the allocator blocking at all. Once
> the bit for the tag is cleared, the tag may be reused immediately and
> due to the lack of memory ordering around bit clearing, its memory
> accesses may race against the ones from before the clearing.
>
> Given that the bits are the primary synchronization mechanism, they
> should be ordered memory-wise. This patch replaces waitqueue-style
> memory barriers with clear_bit_unlock() in sbitmap_clear_bit() and
> test_and_set_bit_lock() in __sbitmap_get_word().
>
> Signed-off-by: Tejun Heo <[email protected]>
> ---
> Hello,
>
> Spotted this while verifying the timeout fix. I didn't check the
> whole code, so although unlikely it's possible that the removed mb's
> are needed from elsewhere, so the RFC. Only boot tested.
>
> Thanks.
>
> include/linux/sbitmap.h | 3 ++-
> lib/sbitmap.c | 17 ++++++-----------
> 2 files changed, 8 insertions(+), 12 deletions(-)
>
> --- a/include/linux/sbitmap.h
> +++ b/include/linux/sbitmap.h
> @@ -297,7 +297,8 @@ static inline void sbitmap_set_bit(struc
>
> static inline void sbitmap_clear_bit(struct sbitmap *sb, unsigned int bitnr)
> {
> - clear_bit(SB_NR_TO_BIT(sb, bitnr), __sbitmap_word(sb, bitnr));
> + /* paired with test_and_set_bit_lock() in __sbitmap_get_word() */
> + clear_bit_unlock(SB_NR_TO_BIT(sb, bitnr), __sbitmap_word(sb, bitnr));
> }
I agree that we want this, but for a different reason than you described
in your changelog: a sbitmap can be used without a sbitmap_queue, so it
should provide the proper memory ordering. For the sbitmap_queue case,
the compiler/processor could also reorder something after the
clear_bit() and before the smp_mb__after_atomic(), which is also wrong.
> static inline int sbitmap_test_bit(struct sbitmap *sb, unsigned int bitnr)
> --- a/lib/sbitmap.c
> +++ b/lib/sbitmap.c
> @@ -100,7 +100,8 @@ static int __sbitmap_get_word(unsigned l
> return -1;
> }
>
> - if (!test_and_set_bit(nr, word))
> + /* paired with clear_bit_unlock() in sbitmap_clear_bit() */
> + if (!test_and_set_bit_lock(nr, word))
> break;
test_and_set_bit_lock() is an ACQUIRE operation which has weaker
guarantees than the full barrier implied by test_and_set_bit(), but
ACQUIRE is enough here, so I agree with this part, too.
> hint = nr + 1;
> @@ -432,14 +433,9 @@ static void sbq_wake_up(struct sbitmap_q
> int wait_cnt;
>
> /*
> - * Pairs with the memory barrier in set_current_state() to ensure the
> - * proper ordering of clear_bit()/waitqueue_active() in the waker and
> - * test_and_set_bit()/prepare_to_wait()/finish_wait() in the waiter. See
> - * the comment on waitqueue_active(). This is __after_atomic because we
> - * just did clear_bit() in the caller.
> + * Memory ordering is handled by sbitmap_clear_bit() and
> + * __sbitmap_get_word().
> */
> - smp_mb__after_atomic();
> -
This, I'm not convinced that we can get rid of. clear_bit_unlock() is a
RELEASE operation, not a full barrier, so the waitqueue_active() read
can be reordered before the clear_bit() store. Imagine we get this
interleaving:
waitqueue_active() -> false |
| /* bitmap is full, allocation fails */
| prepare_to_wait()
clear_bit_unlock() |
We should've woken up the waiter, but we didn't.
> ws = sbq_wake_ptr(sbq);
> if (!ws)
> return;
> @@ -481,10 +477,9 @@ void sbitmap_queue_wake_all(struct sbitm
> int i, wake_index;
>
> /*
> - * Pairs with the memory barrier in set_current_state() like in
> - * sbq_wake_up().
> + * Memory ordering is handled by sbitmap_clear_bit() and
> + * __sbitmap_get_word().
> */
> - smp_mb();
Same idea here.
> wake_index = atomic_read(&sbq->wake_index);
> for (i = 0; i < SBQ_WAIT_QUEUES; i++) {
> struct sbq_wait_state *ws = &sbq->ws[wake_index];
So I think we want a patch for the test_and_set_bit_lock() and
clear_bit_unlock(), but the rest should stay as-is.