On 2024/2/21 15:44, Michal Hocko wrote:
> It would be really helpful to have more details on why we need those
> trace points. It is my understanding that you would like to have more
> fine-grained numbers for the time duration of different parts of the
> reclaim process. I can imagine this could be useful in some cases but is
> it useful enough and for a wider variety of workloads? Is that worth
> dedicated static tracepoints? Why are ad-hoc dynamic tracepoints or BPF
> for a very special situation not sufficient? In other words, tell us
> more about the usecases and why this is generally useful.
Thank you for your reply. I'm sorry that I forgot to describe the
detailed reasons.
Memory reclamation usually occurs under high memory pressure (or low
free memory) and is performed by kswapd. In embedded systems, CPU
resources are limited, and it is common for kswapd to compete for the
CPU with critical processes (which typically require a large amount of
memory and themselves trigger memory reclamation). This delays the
critical process, increasing its execution time and causing lags, such
as dropped frames or slower startup times in mobile games.
Currently, with kernel trace events or tools like Perfetto, we can only
see that kswapd is competing for the CPU and how often memory
reclamation is triggered. We have no detailed information or metrics
about the reclamation itself, such as the duration and amount of each
reclamation, or who is releasing memory (super_cache, f2fs, ext4, etc.).
This makes it impossible to locate the problems described above.
This patch has already helped us solve two real performance problems
(kswapd preempting the CPU and causing game lag):
1. Increased memory allocation in the game (across different versions)
led to degraded kswapd performance.
This was found by calculating the total amount of Reclaim(page) during
the game startup phase.
2. Adopting a different file system in a new system version resulted in
a slower reclamation rate.
This was discovered through a change in OBJ_NAME, for example from
super_cache_scan to ext4_es_scan.
In addition, the memory reclamation rate can be calculated to evaluate
the memory performance of different versions.
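The calculations mentioned above (total reclaimed pages, reclamation rate, per-shrinker breakdown) could be done in post-processing roughly as in the sketch below. It assumes the trace output has already been parsed into one record per shrinker invocation; the `ReclaimEvent` type and its fields (`obj_name`, `nr_reclaimed`, `duration_ns`) are hypothetical stand-ins, since the real fields are whatever the proposed tracepoints emit. "Rate" here is taken as pages reclaimed per second spent reclaiming, which is only one possible definition, and the sample values are made up.

```python
from collections import defaultdict
from dataclasses import dataclass


# Hypothetical record: one per shrinker invocation, as it might be
# extracted from the trace buffer. Field names are assumptions.
@dataclass
class ReclaimEvent:
    obj_name: str       # e.g. "super_cache_scan" or "ext4_es_scan"
    nr_reclaimed: int   # pages freed by this invocation
    duration_ns: int    # time spent in this invocation


def total_reclaimed(events):
    """Total pages reclaimed over the window (e.g. game startup)."""
    return sum(e.nr_reclaimed for e in events)


def reclaim_rate(events):
    """Pages reclaimed per second of time spent reclaiming."""
    busy_ns = sum(e.duration_ns for e in events)
    if busy_ns == 0:
        return 0.0
    return total_reclaimed(events) * 1_000_000_000 / busy_ns


def reclaimed_by_obj(events):
    """Pages reclaimed per shrinker, to spot a change in OBJ_NAME."""
    per_obj = defaultdict(int)
    for e in events:
        per_obj[e.obj_name] += e.nr_reclaimed
    return dict(per_obj)


if __name__ == "__main__":
    # Made-up sample values, for illustration only.
    sample = [
        ReclaimEvent("super_cache_scan", 128, 2_000_000),
        ReclaimEvent("ext4_es_scan", 32, 1_000_000),
        ReclaimEvent("ext4_es_scan", 64, 1_000_000),
    ]
    print(total_reclaimed(sample))   # 224
    print(reclaim_rate(sample))      # 56000.0 (224 pages in 4 ms)
    print(reclaimed_by_obj(sample))  # {'super_cache_scan': 128, 'ext4_es_scan': 96}
```

Comparing `reclaim_rate()` and `reclaimed_by_obj()` across two system versions is the kind of comparison that would surface a shift from super_cache_scan to ext4_es_scan together with a lower rate.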
The main reasons for adding static tracepoints are:
1. To break down the time spent in the shrinker->count_objects() and
shrinker->scan_objects() callbacks within do_shrink_slab(). With a BPF
kprobe, we can only measure the time spent in do_shrink_slab() as a
whole.
2. When tracing frequently called functions, static tracepoints (BPF
tp/tracepoint) have lower overhead than dynamic tracepoints (BPF
kprobe).
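To illustrate point 1, the following user-space analogy (not kernel code) mimics the shape of do_shrink_slab(): one count callback followed by batched scan callbacks. Timing only the outer function yields `total_ns`, which corresponds to what a probe on do_shrink_slab() entry/exit measures; timestamps taken around each callback, roughly what tracepoints inside the function would provide, additionally yield `count_ns` and `scan_ns`. The `Shrinker` class, `do_shrink_slab_timed()` and the simplified batching are hypothetical; the real kernel function has a different signature and more involved batching logic.

```python
import time
from dataclasses import dataclass
from typing import Callable


@dataclass
class Shrinker:
    # Hypothetical stand-in for the kernel's count/scan callback pair.
    count_objects: Callable[[], int]
    scan_objects: Callable[[int], int]
    batch: int = 128


def do_shrink_slab_timed(shrinker):
    """Run one count + scan pass; return (freed, per-phase timings in ns)."""
    t0 = time.perf_counter_ns()
    freeable = shrinker.count_objects()
    t1 = time.perf_counter_ns()

    # Scan in batches until the counted objects have been processed.
    freed = 0
    remaining = freeable
    while remaining > 0:
        nr = min(remaining, shrinker.batch)
        freed += shrinker.scan_objects(nr)
        remaining -= nr
    t2 = time.perf_counter_ns()

    timings = {
        "count_ns": t1 - t0,  # needs a probe point inside the function
        "scan_ns": t2 - t1,   # likewise
        "total_ns": t2 - t0,  # what an entry/exit probe on the whole function sees
    }
    return freed, timings


if __name__ == "__main__":
    cache = {"objects": 300}

    def count():
        return cache["objects"]

    def scan(nr):
        n = min(nr, cache["objects"])
        cache["objects"] -= n
        return n

    freed, t = do_shrink_slab_timed(Shrinker(count, scan))
    print(freed, t)
```

With only `total_ns`, a slow count callback and a slow scan callback look identical; the split is what distinguishes them.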
Thanks
Bixuan Cui