> -----Original Message-----
> From: Stephen Hemminger <[email protected]>
> Sent: Wednesday, May 27, 2026 3:38 AM
> To: Wei Hu <[email protected]>
> Cc: [email protected]; Long Li <[email protected]>; Wei Hu
> <[email protected]>
> Subject: [EXTERNAL] Re: [PATCH v3 0/2] net/mana: add device reset support
>
>
> Warnings
>
> 3. Secondary process: qsbr does not actually quiesce secondary lcores.
>
> rte_rcu_qsbr_thread_register is only called from
> mana_dev_configure, which the secondary never runs. The
> secondary's rx_burst/tx_burst still call thread_online/offline
> against the shared qsv, but the secondary tids are unregistered,
> so they are invisible to rte_rcu_qsbr_check in the primary, and
> the secondary MP handler (mana_mp_reset_enter) does not call
> qsbr_check at all -- it just sets db_page = NULL and munmaps.
>
> The dev_state check at the top of secondary tx_burst is racy:
> the page can be munmapped after the in-loop read of db_page but
> before the doorbell write at the bottom. The "All secondary
> threads are quiescent" log line in mana_mp_reset_enter is not
> true.
>
> The secondary needs a real reader-side primitive -- its own
> qsbr with secondary lcore registration, or an rwlock the MP
> handler takes before munmap.
>
Thanks for the v3 review, @Stephen. I will send out v4 to incorporate most
of the review comments except for this one.
The review on this point is not correct. Here I am providing analysis from
AI and my own test results to show why.
The concern is that "rte_rcu_qsbr_thread_register is only called
from mana_dev_configure, which the secondary never runs", so
secondary tids are unregistered and invisible to rte_rcu_qsbr_check.
This is incorrect. The RCU thread IDs are per-queue, not per-process.
The tid used in rte_rcu_qsbr_thread_online/offline is:
- RX path: tid = rxq->rxq_idx (0 to num_queues-1)
- TX path: tid = num_queues + txq_idx (num_queues to 2*num_queues-1)
These tids are registered once by primary in mana_dev_configure
(mana.c:117), which calls rte_rcu_qsbr_thread_register for IDs
0..2*num_queues-1. The QSV (priv->dev_state_qsv) is allocated
with rte_zmalloc_socket in shared hugepage memory (mana.c:2207).
The queue structures (rxq, txq) also live in shared memory via
dev->data->rx_queues[] / tx_queues[].
When a secondary lcore polls a queue, it calls
rte_rcu_qsbr_thread_online(qsv, tid) with the SAME shared QSV
and SAME queue-based tid. DPDK's fundamental model requires that
a given queue is polled by exactly one lcore at a time (queues
are not thread-safe), so the tid maps 1:1 to a lcore regardless
of which process that lcore belongs to. The tid IS registered ―
registration is a property of the QSV structure, not of the process.
The reset synchronization flow (mana_reset_enter, mana.c:1458-1488):
1. Primary: dev_state = MANA_DEV_RESET_ENTER (release store)
2. Primary: rte_rcu_qsbr_start + rte_rcu_qsbr_check ― blocks
until ALL registered tids have passed through thread_offline
3. Secondary lcores in rx_burst/tx_burst eventually call
rte_rcu_qsbr_thread_offline, and on re-entry see
dev_state != ACTIVE → return 0 without touching db_page
4. Only after QSBR check passes (ALL tids quiescent, including
secondary's) does primary proceed to mana_dev_stop (line 1472)
5. Primary sends MANA_MP_REQ_RESET_ENTER IPC (line 1481)
6. Secondary MP handler: sets db_page = NULL, munmaps
The alleged race ― "db_page can be munmapped after the in-loop
read but before the doorbell write" ― cannot happen because:
- db_page is munmapped at step 6, which is AFTER step 2 (QSBR).
- After QSBR passes, any secondary re-entry into tx_burst/rx_burst
reads dev_state with acquire ordering (tx.c:214, rx.c:468),
sees RESET_ENTER, and returns immediately without using the
local db_page copy.
- The memory ordering is sound: primary's release store of
dev_state (step 1) happens-before QSBR start (step 2).
Secondary's thread_offline (release) synchronizes-with
primary's QSBR check (acquire). On secondary re-entry,
thread_online + acquire load of dev_state sees the update.
The secondary MP handler does NOT need its own QSBR check ―
the primary's check at step 2 already covers all data path
threads including secondary's.
Note: The log "All secondary threads are quiescent" in
mana_mp_reset_enter is misleading and has been removed.
The actual quiescence is enforced by the primary's QSBR
check before IPC is sent.
Followings are from my own test. I added the debug log in mana_tx_burst() call.
diff --git a/drivers/net/mana/mana.h b/drivers/net/mana/mana.h
index 115bc722f4..038a3d9e00 100644
--- a/drivers/net/mana/mana.h
+++ b/drivers/net/mana/mana.h
@@ -454,6 +454,9 @@ struct mana_txq {
struct mana_stats stats;
unsigned int socket;
unsigned int txq_idx;
+#if 1
+ unsigned int call_cnt;
+#endif
};
struct mana_rxq {
diff --git a/drivers/net/mana/tx.c b/drivers/net/mana/tx.c
index 3b07a8b9a6..3ae825190e 100644
--- a/drivers/net/mana/tx.c
+++ b/drivers/net/mana/tx.c
@@ -210,6 +210,14 @@ mana_tx_burst(void *dpdk_txq, struct rte_mbuf **tx_pkts,
uint16_t nb_pkts)
}
rte_rcu_qsbr_thread_online(dstate_qsv, tid);
+#if 1
+ if (txq->call_cnt++ % 1000000 == 0) {
+ if (rte_eal_process_type() == RTE_PROC_SECONDARY)
+ DP_LOG(ERR, "secondary tid %u online", tid);
+ else
+ DP_LOG(ERR, "primary tid %u online", tid);
+ }
+#endif
if (unlikely(rte_atomic_load_explicit(&priv->dev_state,
rte_memory_order_acquire) != MANA_DEV_ACTIVE ||
!db_page)) {
Primary command:
dpdk-testpmd --log-level='.*,8' -l 2-4 ... --proc-type=primary -- --nb-cores 2
--forward-mode=txonly ... --txq=4 --rxq=4 ... --num-procs=2 --proc-id 0
Secondary command:
dpdk-testpmd --log-level='.*,8' -l 6-8 ... --proc-type=secondary -- --nb-cores
2
--forward-mode=txonly ... --txq=4 --rxq=4 ... --num-procs=2 --proc-id 1
Here in both commands, I specify the number of txq and rxq to be 4.
After primary start, the log shows 8 threads total were registered in
mana_dev_configure
and their tids:
Configuring Port 0 (socket 0)
MANA_DRIVER: mana_dev_configure(): priv 0x1003cdc80, port 0, dev port 1,
num_queues: 4
MANA_DRIVER: mana_dev_configure(): Register thread 0x0 for priv 0x1003cdc80,
port 0
MANA_DRIVER: mana_dev_configure(): Register thread 0x1 for priv 0x1003cdc80,
port 0
MANA_DRIVER: mana_dev_configure(): Register thread 0x2 for priv 0x1003cdc80,
port 0
MANA_DRIVER: mana_dev_configure(): Register thread 0x3 for priv 0x1003cdc80,
port 0
MANA_DRIVER: mana_dev_configure(): Register thread 0x4 for priv 0x1003cdc80,
port 0
MANA_DRIVER: mana_dev_configure(): Register thread 0x5 for priv 0x1003cdc80,
port 0
MANA_DRIVER: mana_dev_configure(): Register thread 0x6 for priv 0x1003cdc80,
port 0
MANA_DRIVER: mana_dev_configure(): Register thread 0x7 for priv 0x1003cdc80,
port 0
ETHDEV: Port 0 Rx offload RSS_HASH is not requested but enabled
Note tid 0x0 to 0x3 are rx threads, 0x4 to 0x7 are tx threads. Then the primary
only printed out the tid 4 and 5 sere online:
MANA_DRIVER: primary tid 4 online
MANA_DRIVER: primary tid 5 online
Then I started the secondary and saw tid 6 and 7 were online in its
mana_tx_burst call:
MANA_DRIVER: secondary tid 7 online
MANA_DRIVER: secondary tid 6 online
MANA_DRIVER: secondary tid 7 online
This simply means primary has registered all threads which would be up in both
primary and secondary in mana_ddev_configure. Primary only starts half of the
threads since --num-procs=2 passed in already tells primary that there will be
a secondary to start.
When the secondary starts, it takes the rest two tx and rx threads. All these
threads
have already been registered in mana_dev_configure() by calling
rte_rcu_qsbr_thread_register() in the primary.
So, the review comments on this point is incorrect. All threads are covered and
there is no race at all.
Thanks,
Wei