From: Namhyung Kim <namhy...@google.com> The kernel can allocate a lot of struct perf_event when profiling. For example, 256 cpu x 8 events x 20 cgroups = 40K instances of the struct would be allocated on a large system.
The size of struct perf_event in my setup is 1152 byte. As it's allocated by kmalloc, the actual allocation size would be rounded up to 2K. Then there's 896 byte (~43%) of waste per instance resulting in total ~35MB with 40K instances. We can create a dedicated kmem_cache to avoid such a big unnecessary memory consumption. With this change, I can see below (note this machine has 112 cpus). # grep perf_event /proc/slabinfo perf_event 224 784 1152 7 2 : tunables 24 12 8 : slabdata 112 112 0 The sixth column is pages-per-slab which is 2, and the fifth column is obj-per-slab which is 7. Thus actually it can use 1152 x 7 = 8064 byte in the 8K, and wasted memory is (8192 - 8064) / 7 = ~18 byte per instance. Signed-off-by: Namhyung Kim <namhy...@kernel.org> --- kernel/events/core.c | 9 ++++++--- 1 file changed, 6 insertions(+), 3 deletions(-) diff --git a/kernel/events/core.c b/kernel/events/core.c index 5206097d4d3d..10f2548211d0 100644 --- a/kernel/events/core.c +++ b/kernel/events/core.c @@ -402,6 +402,7 @@ static LIST_HEAD(pmus); static DEFINE_MUTEX(pmus_lock); static struct srcu_struct pmus_srcu; static cpumask_var_t perf_online_mask; +static struct kmem_cache *perf_event_cache; /* * perf event paranoia level: @@ -4591,7 +4592,7 @@ static void free_event_rcu(struct rcu_head *head) if (event->ns) put_pid_ns(event->ns); perf_event_free_filter(event); - kfree(event); + kmem_cache_free(perf_event_cache, event); } static void ring_buffer_attach(struct perf_event *event, @@ -11251,7 +11252,7 @@ perf_event_alloc(struct perf_event_attr *attr, int cpu, return ERR_PTR(-EINVAL); } - event = kzalloc(sizeof(*event), GFP_KERNEL); + event = kmem_cache_zalloc(perf_event_cache, GFP_KERNEL); if (!event) return ERR_PTR(-ENOMEM); @@ -11455,7 +11456,7 @@ perf_event_alloc(struct perf_event_attr *attr, int cpu, put_pid_ns(event->ns); if (event->hw.target) put_task_struct(event->hw.target); - kfree(event); + kmem_cache_free(perf_event_cache, event); return ERR_PTR(err); } @@ -13087,6 +13088,8 @@ void __init perf_event_init(void) ret = init_hw_breakpoint(); WARN(ret, "hw_breakpoint initialization failed with: %d", ret); + perf_event_cache = KMEM_CACHE(perf_event, SLAB_PANIC); + /* * Build time assertion that we keep the data_head at the intended * location. IOW, validation we got the __reserved[] size right. -- 2.31.0.rc2.261.g7f71774620-goog