On 03/15/2015 01:15 AM, Elazar Leibovich wrote:
> Hi,
>
> Not an expert, but my understanding is that it's just technical
> difficulty. Performance metrics are being saved in per-cpu buffer.
> Having pid==-1 and cpu==-1 means that something would aggregate all
> buffers in multiple CPUs to a single buffer. That code must exist,
> either in userspace or in the kernel.
>
> The kernel preferred that this code would be in userspace.

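For reference, the userspace side of that aggregation looks roughly like
the sketch below. It is only an illustration, not code taken from perf:
open_all(), read_and_sum() and close_all() are hypothetical helper names,
and counting CPU cycles is an arbitrary choice of event. It opens one
counter per cpu with pid==-1 and adds up the values in userspace. Opening
cpu-wide counters typically needs root or a permissive perf_event_paranoid
setting, so a failed open is skipped rather than treated as fatal.

```c
#define _GNU_SOURCE
#include <linux/perf_event.h>
#include <stdint.h>
#include <string.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Open one CPU-cycles counter per cpu (pid == -1, cpu == i).
 * fds[i] is set to -1 when the open fails (offline cpu, no permission,
 * no PMU). Returns the number of counters that were opened.
 */
int open_all(int ncpus, int *fds)
{
    struct perf_event_attr attr;
    int opened = 0;

    memset(&attr, 0, sizeof(attr));
    attr.type = PERF_TYPE_HARDWARE;
    attr.size = sizeof(attr);
    attr.config = PERF_COUNT_HW_CPU_CYCLES;

    for (int cpu = 0; cpu < ncpus; cpu++) {
        /* glibc has no wrapper for perf_event_open, so use syscall(2). */
        fds[cpu] = (int)syscall(SYS_perf_event_open, &attr,
                                -1 /* pid */, cpu,
                                -1 /* group_fd */, 0UL /* flags */);
        if (fds[cpu] >= 0)
            opened++;
    }
    return opened;
}

/*
 * Read one u64 from every valid descriptor and return the sum.
 * With read_format == 0 a counter read yields a single 64-bit value.
 */
uint64_t read_and_sum(const int *fds, int ncpus)
{
    uint64_t total = 0;

    for (int cpu = 0; cpu < ncpus; cpu++) {
        uint64_t count = 0;

        if (fds[cpu] < 0)
            continue;
        if (read(fds[cpu], &count, sizeof(count)) == (ssize_t)sizeof(count))
            total += count;
    }
    return total;
}

/* Close every descriptor that open_all() managed to open. */
void close_all(const int *fds, int ncpus)
{
    for (int cpu = 0; cpu < ncpus; cpu++)
        if (fds[cpu] >= 0)
            close(fds[cpu]);
}
```

A caller would take ncpus from sysconf(_SC_NPROCESSORS_CONF), call
open_all(), sleep for the measurement interval, call read_and_sum() and
then close_all(). That is one open and one read per cpu for a single
event.
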
Hi Elazar,

I suspected the reasoning was something along those lines. I was hoping
that someone could point to archived email threads with earlier
discussions showing the complications that would arise from having
system-wide perf event setup and reading handled in the kernel. Looking
through the earlier versions of perf, I see that pid==-1 and cpu==-1 were
not allowed in the very early proposed patches
(http://thread.gmane.org/gmane.linux.kernel.cross-arch/2578). However,
there is not much explanation of the design tradeoffs in there.

Making user-space set up performance events for each cpu certainly
simplifies the kernel code for system-wide monitoring. The cgroup support
is essentially system-wide monitoring with additional filtering on the
cgroup. Things get more complicated with the perf cgroup support when the
cgroups are not pinned to a particular processor: every cgroup then needs
an event on every cpu, which means O(cgroups*cpus) opens and reads. If the
number of cgroups scales up at the same rate as the number of cpus, this
becomes O(cpus^2).

I am wondering whether handling the system-wide case (pid==-1 and cpu==-1)
in the kernel would make cgroup and system-wide monitoring more efficient,
or whether the complications in the kernel are just too much.

-Will

>
> On Fri, Mar 13, 2015 at 8:49 PM, William Cohen <[email protected]> wrote:
>> Hi All,
>>
>> I have a design question about the linux kernel perf support. A number of
>> /proc statistics aggregate data across all the cpus in the system. Why
>> does perf require the user-space application to enumerate all the processors
>> and do a perf_event_open syscall for each of the processors? Why not have a
>> perf_event_open with pid=-1 and cpu=-1 mean system-wide event and aggregate
>> it in the kernel when the value is read? The line below from design.txt
>> specifically says it is invalid.
>>
>> (Note: the combination of 'pid == -1' and 'cpu == -1' is not valid.)
>>
>> -Will
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-perf-users" in
>> the body of a message to [email protected]
>> More majordomo info at http://vger.kernel.org/majordomo-info.html
