Re: [PATCH] perf evsel: Get group fd from CPU0 for system wide event

Jin, Yao Thu, 14 May 2020 23:05:24 -0700

Hi Jiri,

On 5/9/2020 3:37 PM, Jin, Yao wrote:

Hi Jiri,


On 5/5/2020 8:03 AM, Jiri Olsa wrote:

On Sat, May 02, 2020 at 10:33:59AM +0800, Jin, Yao wrote:

SNIP

@@ -1461,6 +1461,9 @@ static int get_group_fd(struct evsel *evsel, int cpu, int thread)
       BUG_ON(!leader->core.fd);
       fd = FD(leader, cpu, thread);
+    if (fd == -1 && leader->core.system_wide)


fd does not need to be -1 in here.. in my setup cstate_pkg/c2-residency/
has cpumask 0, so other cpus never get open and are 0, and the whole thing
ends up with:

    sys_perf_event_open: pid -1  cpu 1  group_fd 0  flags 0
    sys_perf_event_open failed, error -9

I actually thought we put -1 to fd array but couldn't find it.. perhaps we
should do that


I have tested on two platforms. On a KBL desktop fd is 0 for this case, but on
a cascadelakex server fd is -1, so the BUG_ON(fd == -1) is triggered.

+        fd = FD(leader, 0, thread);
+


so how do we group following events?

    cstate_pkg/c2-residency/ - cpumask 0
    msr/tsc/                 - all cpus


Not sure if it's enough to only use cpumask 0 because
cstate_pkg/c2-residency/ should be per-socket.

cpu 0 is fine.. the rest I have no idea ;-)


Perhaps we directly remove the BUG_ON(fd == -1) assertion?


I think we need to make clear how to deal with grouping over
events that come from different cpus

    so how do we group following events?

       cstate_pkg/c2-residency/ - cpumask 0
       msr/tsc/                 - all cpus


what's the reason/expected output of groups with above events?
seems to make sense only if we limit msr/tsc/ to cpumask 0 as well

jirka

On a 2-socket machine (e.g. cascadelakex), "cstate_pkg/c2-residency/" is a per-socket event and the cpumask is 0 and 24.


root@lkp-csl-2sp5 /sys/devices/cstate_pkg# cat cpumask
0,24

We can't limit it to cpumask 0. It should be programmed on CPU0 and CPU24 (the first CPU on each socket).

The "msr/tsc" event is a per-cpu event, so it should be programmed on all cpus. So I don't think we can limit msr/tsc to cpumask 0.
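To make the mismatch concrete, here is a minimal standalone sketch that checks whether a given CPU appears in a cpulist string such as the "0,24" shown above. The helper name cpulist_has_cpu is hypothetical and this is not how perf parses cpu maps; it only illustrates that a member event iterating over all cpus will ask for CPUs the leader was never opened on.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/*
 * Hypothetical helper, for illustration only (perf has its own cpu map code).
 * Returns true if 'cpu' appears in a cpulist string such as "0,24" or "0-3,8".
 */
static bool cpulist_has_cpu(const char *list, int cpu)
{
	const char *p = list;

	assert(list != NULL);
	while (*p) {
		char *end;
		long lo = strtol(p, &end, 10);
		long hi = lo;

		if (end == p)
			return false;           /* malformed: no digits found */
		if (*end == '-') {              /* range entry such as "0-3" */
			const char *q = end + 1;

			hi = strtol(q, &end, 10);
			if (end == q)
				return false;   /* malformed: "N-" without an end */
		}
		if (cpu >= lo && cpu <= hi)
			return true;
		p = end;
		if (*p != ',')
			break;                  /* end of list, newline, or junk */
		p++;                            /* skip the comma, parse next entry */
	}
	return false;
}
```

With the cascadelakex mask, cpulist_has_cpu("0,24", 0) and cpulist_has_cpu("0,24", 24) are true, while cpulist_has_cpu("0,24", 1) is false, which is exactly the CPU1 case where the leader has no fd.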


The issue is how we deal with get_group_fd().

static int get_group_fd(struct evsel *evsel, int cpu, int thread)
{
         struct evsel *leader = evsel->leader;
         int fd;

         if (evsel__is_group_leader(evsel))
                 return -1;

         /*
          * Leader must be already processed/open,
          * if not it's a bug.
          */
         BUG_ON(!leader->core.fd);

         fd = FD(leader, cpu, thread);
         BUG_ON(fd == -1);

         return fd;
}

When evsel is "msr/tsc/",

FD(leader, 0, 0) is 3 (3 is the fd of "cstate_pkg/c2-residency/" on CPU0)
FD(leader, 1, 0) is -1
BUG_ON asserted.

If we just return group_fd(-1) for "msr/tsc", it looks like it's not a problem, 
is it?

Thanks
Jin Yao


I think I have found the root cause. It looks like a serious bug in get_group_fd:
an access violation!

For a group that mixes a system-wide event with a per-core event, where the group leader is the system-wide event, an access violation will happen.


perf_evsel__alloc_fd allocates one FD member for system-wide event (only 
FD(evsel, 0, 0) is valid).

But for a per-core event, perf_evsel__alloc_fd allocates N FD members (N = ncpus). For example, when ncpus is 8, FD(evsel, 0, 0) to FD(evsel, 7, 0) are valid.


static int get_group_fd(struct evsel *evsel, int cpu, int thread)
{
    struct evsel *leader = evsel->leader;
    int fd;

    /* ... */
    fd = FD(leader, cpu, thread);    /* access violation may happen here */
    /* ... */
}

If leader is system-wide event, only the FD(leader, 0, 0) is valid.

When get_group_fd accesses FD(leader, 1, 0), access violation happens.
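As a rough illustration of the sizing problem described above, here is a standalone sketch with a bounds-checked lookup. struct fd_table and its helpers are hypothetical names invented for this example; they are not perf's xyarray or the FD() macro, and the table sizes simply follow the description in this mail.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for an evsel's fd table; not perf's xyarray. */
struct fd_table {
	int ncpus;      /* rows actually allocated */
	int nthreads;   /* columns actually allocated */
	int *fds;       /* ncpus * nthreads entries, -1 means "not open" */
};

/* Allocate the table and mark every slot as not open. Returns 0 or -1. */
static int fd_table_init(struct fd_table *t, int ncpus, int nthreads)
{
	int i;

	assert(ncpus > 0 && nthreads > 0);
	t->ncpus = ncpus;
	t->nthreads = nthreads;
	t->fds = malloc(sizeof(*t->fds) * (size_t)ncpus * (size_t)nthreads);
	if (!t->fds)
		return -1;
	for (i = 0; i < ncpus * nthreads; i++)
		t->fds[i] = -1;
	return 0;
}

static void fd_table_free(struct fd_table *t)
{
	free(t->fds);
	t->fds = NULL;
}

/*
 * Bounds-checked lookup. Returns 0 and stores the fd in *fd, or -1 when
 * (cpu, thread) lies outside the table. An unchecked lookup would read
 * past the allocation in that case.
 */
static int fd_table_get(const struct fd_table *t, int cpu, int thread, int *fd)
{
	if (cpu < 0 || cpu >= t->ncpus || thread < 0 || thread >= t->nthreads)
		return -1;
	*fd = t->fds[cpu * t->nthreads + thread];
	return 0;
}
```

With a leader table initialised as fd_table_init(&leader, 1, 1) and slot (0, 0) set to 3, a lookup for (0, 0) succeeds, while a lookup for (1, 0) is rejected instead of reading memory that was never allocated.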

My fix is:

diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c
index 28683b0eb738..db05b8a1e1a8 100644
--- a/tools/perf/util/evsel.c
+++ b/tools/perf/util/evsel.c
@@ -1440,6 +1440,9 @@ static int get_group_fd(struct evsel *evsel, int cpu, int thread)
        if (evsel__is_group_leader(evsel))
                return -1;

+       if (leader->core.system_wide && !evsel->core.system_wide)
+               return -2;
+
        /*
         * Leader must be already processed/open,
         * if not it's a bug.
@@ -1665,6 +1668,11 @@ static int evsel__open_cpu(struct evsel *evsel, struct perf_cpu_map *cpus,
                                pid = perf_thread_map__pid(threads, thread);

                        group_fd = get_group_fd(evsel, cpu, thread);
+                       if (group_fd == -2) {
+                               errno = EINVAL;
+                               err = -EINVAL;
+                               goto out_close;
+                       }
 retry_open:
                        test_attr__ready();

Returning -EINVAL here enables perf_evlist__reset_weak_group, and in the second_pass (in __run_perf_stat) the events will be opened successfully.
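The control flow of the fix can be sketched in isolation as follows. This is a simplified model under stated assumptions, not the perf implementation: struct toy_evsel and the toy_* functions are hypothetical, the fd assignment stands in for sys_perf_event_open(), and the fallback only approximates what the weak-group reset does by detaching the member from its leader and reopening it alone.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified event; not perf's struct evsel. */
struct toy_evsel {
	struct toy_evsel *leader;  /* points to itself for a group leader */
	bool system_wide;
	int fd;                    /* -1 while the event is not open */
};

/* Same decision order as the patch: leader -> -1, scope mismatch -> -2. */
static int toy_get_group_fd(const struct toy_evsel *evsel)
{
	const struct toy_evsel *leader = evsel->leader;

	assert(leader != NULL);
	if (leader == evsel)
		return -1;
	if (leader->system_wide && !evsel->system_wide)
		return -2;
	return leader->fd;
}

/* Returns 0 on success, or -EINVAL when the event cannot join its group. */
static int toy_open(struct toy_evsel *evsel, int next_fd)
{
	int group_fd = toy_get_group_fd(evsel);

	if (group_fd == -2)
		return -EINVAL;
	evsel->fd = next_fd;       /* stand-in for sys_perf_event_open() */
	return 0;
}

/* Approximates the fallback: on -EINVAL, drop the grouping and retry. */
static int toy_open_with_fallback(struct toy_evsel *evsel, int next_fd)
{
	int err = toy_open(evsel, next_fd);

	if (err == -EINVAL) {
		evsel->leader = evsel;
		err = toy_open(evsel, next_fd);
	}
	return err;
}
```

For a system-wide leader and a per-cpu member, toy_open() on the member fails with -EINVAL, and toy_open_with_fallback() then opens the member as its own leader, which mirrors the second pass succeeding.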


I have tested this fix on cascadelakex and it works.

Thanks
Jin Yao
