On 27/03/2026 13:34, Simon Horman wrote:
On Wed, Mar 25, 2026 at 08:24:38PM -0700, Xiang Mei wrote:
br_mrp_start_test() and br_mrp_start_in_test() accept the user-supplied
interval value from netlink without validation. When interval is 0,
usecs_to_jiffies(0) yields 0, causing the delayed work
(br_mrp_test_work_expired / br_mrp_in_test_work_expired) to reschedule
itself with zero delay. This creates a tight loop on system_percpu_wq
that allocates and transmits MRP test frames at maximum rate, exhausting
all system memory and causing a kernel panic via OOM deadlock.
I would suspect the primary outcome of this problem is high CPU consumption
rather than memory exhaustion. Is there a reason to expect that
the transmitted frames can't be consumed as fast as they are created?
+1
More so, with CAP_NET_ADMIN you can cause all sorts of OOM and high-CPU-usage
conditions. This is a configuration error, and OOM doesn't lead to a panic unless
instructed to. I don't think this is worth changing at all.
The same zero-interval issue applies to br_mrp_start_in_test_parse()
for interconnect test frames.
Use NLA_POLICY_MIN(NLA_U32, 1) in the nla_policy tables for both
IFLA_BRIDGE_MRP_START_TEST_INTERVAL and
IFLA_BRIDGE_MRP_START_IN_TEST_INTERVAL, so zero is rejected at the
netlink attribute parsing layer before the value ever reaches the
workqueue scheduling code. This is consistent with how other bridge
subsystems (br_fdb, br_mst) enforce range constraints on netlink
attributes.
Fixes: 7ab1748e4ce6 ("bridge: mrp: Extend MRP netlink interface for configuring MRP interconnect")
I think you also want
Fixes: 20f6a05ef635 ("bridge: mrp: Rework the MRP netlink interface")
As highlighted by AI review.
Reported-by: Weiming Shi <[email protected]>
Signed-off-by: Xiang Mei <[email protected]>
...