On 6/4/26 15:34, Peter Xu wrote:
The race condition was not reported in any real bug, but I found it while
reviewing some relevant chnages from Akihiko [1]. It's not clear if
there's any way to reproduce it, so far it's only theoretical. Hence
there's also no Fixes tag attached. There's also no need to copy stable
until we have a solid reproducer.
Currently, mru_block might still points to ramblocks that are removed if
below race condition happens:
Reader A Writer
-------- ------
rcu_read_lock()
walk list, find block X
QLIST_REMOVE_RCU(X)
qatomic_set(&mru_block, NULL)
call_rcu(X, reclaim_ramblock)
qatomic_set(&mru_block, X) <----------- overwrites NULL
rcu_read_unlock()
grace period ends, X freed
Reader C
--------
rcu_read_lock()
qatomic_rcu_read(&mru_block) -> X
Read X's block->offset ... <----------- UAF
To fix it, we can introduce a nested RCU free for reset of the mru_block
field, and only free the ramblock until the 2nd RCU call.
There is a version field in RAMList. Maybe we need mru_version too?
- reader A reads version before starting the list walk, then writes
mru_version and mru_block
- writers update version and do not even have to clear mru_block
- new readers ignore mru_block if mru_version != version
This works, but there is a snag:
reader A (MRU parts elided, assuming a miss)
--------------------------------------------
load_acquire version // load version before walking list
walk list
...
writer B
-----------------------
// like seqcount_write_begin
store version++
smp_wmb();
// ----
QLIST_REMOVE_RCU
store release version++ // like seqcount_write_end
// these two must be atomic!
store release mru_version = version & ~1, mru_block = block
reader C (MRU parts shown)
--------------------
load_acquire mru_version // like seqcount_read_begin
load mru_block
// like seqcount_read_retry
smp_rmb()
load_acquire version
// ----
if block == NULL || version != mru_version || block mismatch
walk list
In other words, the seqcount retry check compares the current version of
the list against the one *stored by the previous reader*.
The snag is that torn writes of the (version, block) pair are a problem,
so the store of mru_version and mru_block must be atomic and that
requires CONFIG_ATOMIC128. Maybe avoid MRU altogether if it's not
available, and if it is use qatomic_read__nocheck/qatomic_set__nocheck?
Paolo