On 02/01/2026 08.17, Leon Hwang wrote:
Introduce a new tracepoint to track stalled page pool releases,
providing better observability for page pool lifecycle issues.
In general I like/support adding this tracepoint for the debuggability of
page pool lifecycle issues.
For "observability", @Kuba added a netlink scheme[1][2] for page_pool[3],
which gives us the ability to get events and list page_pools from userspace.
I've not used this myself (yet), so I'd like input on whether others have
been using it for page pool lifecycle issues.
I also need input from @Kuba/others, as "page-pool-get"[4] states that "Only
Page Pools associated with a net_device can be listed". Don't we want
the ability to list "invisible" page_pools to allow debugging such issues?
[1] https://docs.kernel.org/userspace-api/netlink/intro-specs.html
[2] https://docs.kernel.org/userspace-api/netlink/index.html
[3] https://docs.kernel.org/netlink/specs/netdev.html
[4] https://docs.kernel.org/netlink/specs/netdev.html#page-pool-get
Looking at the code, I see that the NETDEV_CMD_PAGE_POOL_CHANGE_NTF netlink
notification is only generated once (in page_pool_destroy) and not on each
retry in page_pool_release_retry (which this patch hooks). In that sense,
this patch/tracepoint catches something more than netlink provides.
At first I thought we could add a netlink notification instead, but I can
imagine cases where this could generate too many netlink messages, e.g. a
netdev with 128 RX queues emitting one every second for every RX queue.
I guess I've talked myself into liking this change; what do other
maintainers think? (e.g. the balance between the netlink scheme and debugging)
Problem:
Currently, when a page pool shutdown is stalled due to inflight pages,
the kernel only logs a warning message via pr_warn(). This has several
limitations:
1. The warning floods the kernel log after the initial DEFER_WARN_INTERVAL,
making it difficult to track the progression of stalled releases
2. There's no structured way to monitor or analyze these events
3. Debugging tools cannot easily capture and correlate stalled pool
events with other network activity
Solution:
Add a new tracepoint, page_pool_release_stalled, that fires when a page
pool shutdown is stalled. The tracepoint captures:
- pool: pointer to the stalled page_pool
- inflight: number of pages still in flight
- sec: seconds since the release was deferred
The implementation also modifies the logging behavior:
- pr_warn() is only emitted during the first warning interval
(DEFER_WARN_INTERVAL to DEFER_WARN_INTERVAL*2)
- The tracepoint always fires, reducing log noise while still
allowing monitoring tools to track the issue
This allows developers and system administrators to:
- Use tools like perf, ftrace, or eBPF to monitor stalled releases
- Correlate page pool issues with network driver behavior
- Analyze patterns without parsing kernel logs
- Track the progression of inflight page counts over time
Signed-off-by: Leon Huang Fu <[email protected]>
Signed-off-by: Leon Hwang <[email protected]>
---
v2 -> v3:
- Print id using '%u'.
- https://lore.kernel.org/netdev/[email protected]/
v1 -> v2:
- Drop RFC.
- Store 'pool->user.id' to '__entry->id' (per Steven Rostedt).
- https://lore.kernel.org/netdev/[email protected]/
---
include/trace/events/page_pool.h | 24 ++++++++++++++++++++++++
net/core/page_pool.c | 6 ++++--
2 files changed, 28 insertions(+), 2 deletions(-)
diff --git a/include/trace/events/page_pool.h b/include/trace/events/page_pool.h
index 31825ed30032..a851e0f6a384 100644
--- a/include/trace/events/page_pool.h
+++ b/include/trace/events/page_pool.h
@@ -113,6 +113,30 @@ TRACE_EVENT(page_pool_update_nid,
 		  __entry->pool, __entry->pool_nid, __entry->new_nid)
 );
 
+TRACE_EVENT(page_pool_release_stalled,
+
+ TP_PROTO(const struct page_pool *pool, int inflight, int sec),
+
+ TP_ARGS(pool, inflight, sec),
+
+ TP_STRUCT__entry(
+ __field(const struct page_pool *, pool)
+ __field(u32, id)
+ __field(int, inflight)
+ __field(int, sec)
+ ),
+
+ TP_fast_assign(
+ __entry->pool = pool;
+ __entry->id = pool->user.id;
+ __entry->inflight = inflight;
+ __entry->sec = sec;
+ ),
+
+ TP_printk("page_pool=%p id=%u inflight=%d sec=%d",
+ __entry->pool, __entry->id, __entry->inflight, __entry->sec)
+);
+
#endif /* _TRACE_PAGE_POOL_H */
/* This part must be outside protection */
diff --git a/net/core/page_pool.c b/net/core/page_pool.c
index 265a729431bb..01564aa84c89 100644
--- a/net/core/page_pool.c
+++ b/net/core/page_pool.c
@@ -1222,8 +1222,10 @@ static void page_pool_release_retry(struct work_struct *wq)
 	    (!netdev || netdev == NET_PTR_POISON)) {
 		int sec = (s32)((u32)jiffies - (u32)pool->defer_start) / HZ;
 
-		pr_warn("%s() stalled pool shutdown: id %u, %d inflight %d sec\n",
-			__func__, pool->user.id, inflight, sec);
+		if (sec >= DEFER_WARN_INTERVAL / HZ && sec < DEFER_WARN_INTERVAL * 2 / HZ)
+			pr_warn("%s() stalled pool shutdown: id %u, %d inflight %d sec\n",
+				__func__, pool->user.id, inflight, sec);
+		trace_page_pool_release_stalled(pool, inflight, sec);
 
 		pool->defer_warn = jiffies + DEFER_WARN_INTERVAL;
 	}