From: Kan Liang <kan.li...@linux.intel.com>

On systems with very high context switch rates between cgroups, cgroup perf
monitoring incurs high overhead.

The current code has two issues:

- System-wide events are mistakenly switched during the cgroup context
  switch. This miscounts system-wide events and adds avoidable overhead.
  Patch 1 fixes the issue.

- The cgroup context switch sched_in path is inefficient. All cgroup events
  share the same per-cpu pinned/flexible groups, and the RB trees for the
  pinned/flexible groups do not understand cgroups, so the current code has
  to traverse all events and rely on event_filter_match() to filter out the
  events that do not belong to the scheduled-in cgroup. Patches 2-4 add a
  fast path for the cgroup context switch sched_in by teaching the RB tree
  about cgroups, which avoids the extra filtering. A rough sketch of the
  idea is shown below.
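The following is only an illustrative sketch of the idea, not the actual
patch content: it assumes the per-cpu RB-tree ordering (in the spirit of
the existing perf_event_groups_less() comparison by cpu and group_index)
is extended with a hypothetical cgroup ID field, here named cgrp_id, so
that all events of one cgroup sit next to each other in the tree.

/*
 * Sketch only: order events by {cpu, cgroup id, insertion index} so that
 * events of the same cgroup are adjacent in the RB tree.  The cgrp_id
 * field is hypothetical (assumed to be 0 for events without a cgroup);
 * the actual patches may key the tree differently.
 */
static bool perf_event_groups_less(struct perf_event *left,
				   struct perf_event *right)
{
	if (left->cpu < right->cpu)
		return true;
	if (left->cpu > right->cpu)
		return false;

#ifdef CONFIG_CGROUP_PERF
	if (left->cgrp_id < right->cgrp_id)
		return true;
	if (left->cgrp_id > right->cgrp_id)
		return false;
#endif

	if (left->group_index < right->group_index)
		return true;

	return false;
}

With such an ordering, the cgroup sched_in fast path could look up the
subtree for {cpu, cgrp_id} and iterate only the events of the cgroup being
scheduled in, instead of walking every event and filtering each one with
event_filter_match().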
Here is a test with 6 cgroups running. Each cgroup has a specjbb benchmark
running. The perf command is as below.

  perf stat -e cycles,instructions -e cycles,instructions
            -e cycles,instructions -e cycles,instructions
            -e cycles,instructions -e cycles,instructions
            -G cgroup1,cgroup1,cgroup2,cgroup2,cgroup3,cgroup3
            -G cgroup4,cgroup4,cgroup5,cgroup5,cgroup6,cgroup6
            -a -e cycles,instructions -I 1000

The average RT (Response Time) reported by specjbb is used as the key
performance metric. (The lower the better.)

                                         RT(us)      Overhead
  Baseline (no perf stat):               4286.9
  Use cgroup perf, no patches:           4483.6      4.6%
  Use cgroup perf, apply patch 1:        4369.2      1.9%
  Use cgroup perf, apply all patches:    4335.3      1.1%

Kan Liang (4):
  perf: Fix system-wide events miscounting during cgroup monitoring
  perf: Add filter_match() as a parameter for pinned/flexible_sched_in()
  perf cgroup: Add cgroup ID as a key of RB tree
  perf cgroup: Add fast path for cgroup switch

 include/linux/perf_event.h |   7 ++
 kernel/events/core.c       | 171 +++++++++++++++++++++++++++++++++++++++------
 2 files changed, 157 insertions(+), 21 deletions(-)

-- 
2.7.4