On 1/23/2018 5:10 AM, Mathieu Poirier wrote:
> On 22 January 2018 at 15:15, Jin Yao <[email protected]> wrote:
>> Mathieu Poirier reports issue in commit ("73c0ca1eee3d perf thread_map:
>> Enumerate all threads from /proc") that it has negative impact on
>> 'perf record --per-thread'. It has the effect of creating a kernel event
>> for each thread in the system for 'perf record --per-thread'.
>>
>> Mathieu Poirier's patch ("perf util: Do not reuse target->per_thread flag")
>> can fix this issue by creating a new target->all_threads flag.
>>
>> This patch is based on Mathieu Poirier's patch but it doesn't use a new
>> target->all_threads flag. This patch just uses
>> 'target->per_thread && target->system_wide' as a condition to check
>> for all threads case.
>>
>> Signed-off-by: Jin Yao <[email protected]>
>> ---
>>  tools/perf/util/evlist.c     | 2 +-
>>  tools/perf/util/thread_map.c | 4 ++--
>>  tools/perf/util/thread_map.h | 2 +-
>>  3 files changed, 4 insertions(+), 4 deletions(-)
>>
>> diff --git a/tools/perf/util/evlist.c b/tools/perf/util/evlist.c
>> index 120efd8..9dff74a 100644
>> --- a/tools/perf/util/evlist.c
>> +++ b/tools/perf/util/evlist.c
>> @@ -1106,7 +1106,7 @@ int perf_evlist__create_maps(struct perf_evlist *evlist, struct target *target)
>>  	struct thread_map *threads;
>>  	threads = thread_map__new_str(target->pid, target->tid, target->uid,
>> -				      target->per_thread);
>> +				      target->per_thread && target->system_wide);
>
> At first glance I thought your solution would do the trick but perf
> record does use target->system_wide when the '-a' switch is used.
> Moreover specifying the '-a' switch doesn't prevent the '--per-thread'
> option from being used as well, making both target->per_thread and
> target->system_wide equal to true (and that is not good).
>
> Although not a fan of adding more to struct target, the advantage of
> having target->all_threads is that we are guaranteed that it isn't
> used anywhere else.
>
> Let me know what you think,
> Mathieu

If both '-a' and '--per-thread' are passed to perf record, perf record overrides '--per-thread'. So target->per_thread = false and target->system_wide = true.
If only '--per-thread' is passed to perf record, target->per_thread = true and target->system_wide = false.
So in either case, 'target->per_thread && target->system_wide' is false.
Since that argument is false, thread_map__new_str() will not call thread_map__new_all_cpus(), so the previous behavior of perf record is unchanged.
perf stat, on the other hand, allows target->per_thread and target->system_wide to both be true. That means we want to collect system-wide per-thread metrics.
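
To make the cases above concrete, here is a minimal standalone sketch, not the actual perf code. The struct target_flags type and the record_flags(), stat_flags() and all_threads() helpers are hypothetical names used only for illustration; the sketch assumes perf record drops '--per-thread' when '-a' is also given, as described above, and that perf stat keeps both flags.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the two relevant fields of perf's struct target. */
struct target_flags {
	bool per_thread;
	bool system_wide;
};

/*
 * Assumed perf record behaviour (as described in this mail):
 * '-a' overrides '--per-thread', so both flags are never true together.
 */
struct target_flags record_flags(bool opt_a, bool opt_per_thread)
{
	struct target_flags t = {
		.per_thread  = opt_per_thread && !opt_a,
		.system_wide = opt_a,
	};
	return t;
}

/* Assumed perf stat behaviour: both flags may be set at the same time. */
struct target_flags stat_flags(bool opt_a, bool opt_per_thread)
{
	struct target_flags t = {
		.per_thread  = opt_per_thread,
		.system_wide = opt_a,
	};
	return t;
}

/* The condition the patch passes as the last argument of thread_map__new_str(). */
bool all_threads(struct target_flags t)
{
	return t.per_thread && t.system_wide;
}
```

Under these assumptions all_threads() is false for every perf record combination, and true only for perf stat with both '-a' and '--per-thread'.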
That's my current thinking.

Thanks
Jin Yao

