On Thu, 2026-06-18 at 20:26 -0400, Tamir Duberstein wrote:
> After consuming the last visible record, ringbuf_process_ring()
> publishes the consumer position and checks the producer position. These
> operations lack a full StoreLoad barrier. A producer can therefore
> commit a new record but read the old consumer position while the
> consumer reads the old producer position. The producer sends no
> notification and the consumer waits despite a queued record.
>
> Insert a full barrier between publishing a consumer position and the
> next producer position load. When a record bound or callback ends the
> current invocation first, execute the barrier before returning so the
> load in a later invocation completes the same handshake.
>
> Add an edge-triggered epoll test that drains one record per call while a
> concurrent producer fills the ring. Without the barrier, a missed
> notification leaves the producer dropping records from a full ring while
> the consumer times out. Document that bounded consumers and callbacks
> that terminate consumption must drain before waiting again.
>
> Fixes: bf99c936f947 ("libbpf: Add BPF ring buffer support")
> Reported-by: Andrew Werner <[email protected]>
> Reported-by: Sashiko <[email protected]>
> Closes:
> https://lore.kernel.org/bpf/[email protected]/
> Assisted-by: Codex:gpt-5.5
> Signed-off-by: Tamir Duberstein <[email protected]>
> ---
Took me a while, but I agree there is a race here.
FWIW, here is the description of the race as far as I understand it.
Simplified pseudo-code for the consumer (ringbuf_process_ring)
cons_pos = load_acquire(consumer_pos); // op#1 [orders before 2,3,4,5]
do {
got_new_data = false;
prod_pos = load_acquire(producer_pos); // op#2 [orders before 3,4,5]
while (cons_pos < prod_pos) {
len_ptr = ... cons_pos ...;
len = load_acquire(len_ptr); // op#3 [orders before 4,5]
got_new_data = true;
callback(... cons_pos ...); // op#4
cons_pos += len;
store_release(consumer_pos, cons_pos); // op#5 [orders after 2,3,4;
ordering relative to 2' not defined]
}
} while (got_new_data);
The patch fixes the issue with op#5, store_release operation is
ordered after operations 2,3,4, but it's ordering relative to op#2
from the next outer loop iteration (denoted as 2') is not defined.
Simplified pseudo-code for the producer (bpf_ringbuf_{reserve,commit})
// reserve
store_release(producer_pos, ...);
// commit
xchg(... clear BUSY_BIT ...);
cons_pos = load_acquire(consumer_pos);
if (cons_pos == rec_pos)
schedule_wakeup();
A possible sequence of events
Producer:
// submits a record such that:
consumer_pos = 0 (initial state)
producer_pos = 10
Consumer:
cons_pos = load_acquire(consumer_pos) -> cons_pos = 0
prod_pos = load_acquire(producer_pos) -> prod_pos = 10
len = load_acquire(len_ptr) -> len = 10
callback(...) -> (record at pos 0 consumed)
cons_pos += len -> cons_pos = 10
store_release(consumer_pos, cons_pos) -> consumer_pos = 10 (issued;
global visibility deferred)
(cons_pos < prod_pos) 10 < 10 -> false, exit inner loop
while (got_new_data) -> true, re-enter outer loop
prod_pos = load_acquire(producer_pos) -> prod_pos = 10 (reads
current value; record at pos 10 not published yet)
(cons_pos < prod_pos) 10 < 10 -> false, skip inner loop
while (got_new_data) -> false; consumer returns 0 and
enters an epoll wait
Producer (reserves and submits a new record):
store_release(producer_pos, 20) -> producer_pos = 20 (record at
pos 10 reserved)
xchg(... clear BUSY_BIT ...) -> (record at pos 10 committed)
cons_pos = load_acquire(consumer_pos) -> cons_pos = 0
(consumer's store of 10 not visible yet)
rec_pos = 10; cons_pos == rec_pos -> 0 == 10 false -> NO WAKEUP
Consumer:
(deferred store becomes visible) -> consumer_pos = 10 (too late)
---
Consumer exits while a new record is available in the buffer,
and the producer fails to notify the consumer via epoll.
The added barrier prevents such sequence of events.
For the fix:
Reviewed-by: Eduard Zingerman <[email protected]>
Please move the test to a separate commit, I'll review it a bit later.
[...]