> On Jun 18, 2020, at 2:09 AM, Athira Rajeev <atraj...@linux.vnet.ibm.com> 
> wrote:
> 
> 
> 
>> On 02-May-2020, at 5:51 AM, Song Liu <songliubrav...@fb.com> wrote:
>> 
>> This patch tries to enable PMU sharing. When multiple perf_events are
>> counting the same metric, they can share the hardware PMU counter. We
>> call these events as "compatible events".
>> 
>> The PMU sharing are limited to events within the same perf_event_context
>> (ctx). When a event is installed or enabled, search the ctx for compatible
>> events. This is implemented in perf_event_setup_dup(). One of these
>> compatible events are picked as the master (stored in event->dup_master).
>> Similarly, when the event is removed or disabled, perf_event_remove_dup()
>> is used to clean up sharing.
>> 
>> If the master event is a cgroup event or a event in a group, it is
>> possible that some slave events are ACTIVE, but the master event is not.
>> To handle this scenario, we introduced PERF_EVENT_STATE_ENABLED. Also,
>> since PMU drivers write into event->count, master event needs another
>> variable (master_count) for the reading of this event.
>> 
>> On the critical paths (add, del read), sharing PMU counters doesn't
>> increase the complexity. Helper functions event_pmu_[add|del|read]() are
>> introduced to cover these cases. All these functions have O(1) time
>> complexity.
>> 
>> Cc: Peter Zijlstra <pet...@infradead.org>
>> Cc: Arnaldo Carvalho de Melo <a...@redhat.com>
>> Cc: Jiri Olsa <jo...@kernel.org>
>> Cc: Alexey Budankov <alexey.budan...@linux.intel.com>
>> Cc: Namhyung Kim <namhy...@kernel.org>
>> Cc: Tejun Heo <t...@kernel.org>
>> Reported-by: kernel test robot <rong.a.c...@intel.com>
>> Signed-off-by: Song Liu <songliubrav...@fb.com>
>> 
>> ---
>> Changes in v13:
>> Fix memory ordering in perf_event_exit_dup_master and perf_event_count.
>> Remove test code in perf_event_can_share().
>> 
>> Changes in v12:
>> Add perf_event->dup_active_list to fix rdpmc. (kernel test robot)
>> Fix event reset. (kernel test robot)
>> Do not consider sharing when the perf event is disabled.
>> 
>> Changes in v11:
>> Fold in major fixes by Peter.
>> In perf_event_remove_dup(), del() old master before add() new master.
>> 
>> Changes in v10:
>> Simplify logic that calls perf_event_setup_dup() and
>> perf_event_remove_dup(). (Peter)
>> Other small fixes. (Peter)
>> 
>> Changes in v9:
>> Avoid ctx_resched() on remove/disable event (Peter).
>> Compare the whole perf_event_attr in perf_event_compatible().
>> Small fixes/improvements (Peter).
>> 
>> Changes in v8:
>> Fix issues with task event (Jiri).
>> Fix issues with event inherit.
>> Fix mmap'ed events, i.e. perf test 4 (kernel test bot).
>> 
>> Changes in v7:
>> Major rewrite to avoid allocating extra master event.
>> (cherry picked from commit 8b98afc0ec100b017a5e80c184809939d4f01f8d)
>> 
>> fix perf_event_count()
>> ---
>> arch/x86/events/core.c     |   7 +
>> include/linux/perf_event.h |  22 +-
>> kernel/events/core.c       | 422 ++++++++++++++++++++++++++++++++++---
>> 3 files changed, 416 insertions(+), 35 deletions(-)
>> 
>> diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c
>> index 9e63ee50b19a..a245cac31a19 100644
>> --- a/arch/x86/events/core.c
>> +++ b/arch/x86/events/core.c
>> @@ -2345,6 +2345,11 @@ static int x86_pmu_aux_output_match(struct perf_event 
>> *event)
>>      return 0;
>> }
>> 
>> +static void x86_copy_hw_config(struct perf_event *old, struct perf_event 
>> *new)
>> +{
>> +    new->hw.idx = old->hw.idx;
>> +}
>> +
>> static struct pmu pmu = {
>>      .pmu_enable             = x86_pmu_enable,
>>      .pmu_disable            = x86_pmu_disable,
>> @@ -2373,6 +2378,8 @@ static struct pmu pmu = {
>>      .check_period           = x86_pmu_check_period,
>> 
>>      .aux_output_match       = x86_pmu_aux_output_match,
>> +
>> +    .copy_hw_config         = x86_copy_hw_config,
>> };
>> 
>> void arch_perf_update_userpage(struct perf_event *event,
>> diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
>> index 87e21681759c..1306fbd920fd 100644
>> --- a/include/linux/perf_event.h
>> +++ b/include/linux/perf_event.h
>> @@ -503,6 +503,13 @@ struct pmu {
>>       * Check period value for PERF_EVENT_IOC_PERIOD ioctl.
>>       */
>>      int (*check_period)             (struct perf_event *event, u64 value); 
>> /* optional */
>> +
>> +    /*
>> +     * Copy hw configuration from one event to another. This is used
>> +     * to make switching master faster in PMC sharing.
>> +     */
>> +    void (*copy_hw_config)          (struct perf_event *old,
>> +                                     struct perf_event *new); /* optional */
>> };
>> 
>> enum perf_addr_filter_action_t {
>> @@ -552,6 +559,10 @@ struct perf_addr_filter_range {
>> 
>> /**
>>  * enum perf_event_state - the states of an event:
>> + *
>> + * PERF_EVENT_STATE_ENABLED:        Special state for PMC sharing: the hw 
>> PMC
>> + *                          is enabled, but this event is not counting.
>> + *                          See perf_event_init_dup_master().
>>  */
>> enum perf_event_state {
>>      PERF_EVENT_STATE_DEAD           = -4,
>> @@ -559,7 +570,8 @@ enum perf_event_state {
>>      PERF_EVENT_STATE_ERROR          = -2,
>>      PERF_EVENT_STATE_OFF            = -1,
>>      PERF_EVENT_STATE_INACTIVE       =  0,
>> -    PERF_EVENT_STATE_ACTIVE         =  1,
>> +    PERF_EVENT_STATE_ENABLED        =  1,
>> +    PERF_EVENT_STATE_ACTIVE         =  2,
>> };
>> 
>> struct file;
>> @@ -645,6 +657,7 @@ struct perf_event {
>>      int                             group_caps;
>> 
>>      struct perf_event               *group_leader;
>> +    struct perf_event               *dup_master;  /* for PMU sharing */
>>      struct pmu                      *pmu;
>>      void                            *pmu_private;
>> 
>> @@ -762,6 +775,13 @@ struct perf_event {
>>      void *security;
>> #endif
>>      struct list_head                sb_list;
>> +
>> +    int                             dup_active;
>> +    /* See event_pmu_read_dup() */
>> +    local64_t                       dup_count;
>> +    /* See perf_event_init_dup_master() */
>> +    local64_t                       master_count;
>> +    struct list_head                dup_active_list;
>> #endif /* CONFIG_PERF_EVENTS */
>> };
>> 
>> diff --git a/kernel/events/core.c b/kernel/events/core.c
>> index 206dd29c3c6c..40c8b15407b7 100644
>> --- a/kernel/events/core.c
>> +++ b/kernel/events/core.c
>> @@ -1002,7 +1002,6 @@ perf_cgroup_event_enable(struct perf_event *event, 
>> struct perf_event_context *ct
>>       */
>>      cpuctx = container_of(ctx, struct perf_cpu_context, ctx);
>> 
>> -    pr_info("%s cpu %d\n", __func__, smp_processor_id());
>>      /*
>>       * Since setting cpuctx->cgrp is conditional on the current @cgrp
>>       * matching the event's cgroup, we must do this for every new event,
>> @@ -1012,8 +1011,6 @@ perf_cgroup_event_enable(struct perf_event *event, 
>> struct perf_event_context *ct
>>      if (ctx->is_active && !cpuctx->cgrp) {
>>              struct perf_cgroup *cgrp = perf_cgroup_from_task(current, ctx);
>> 
>> -            pr_info("%s cpu %d cgrp %px\n", __func__, smp_processor_id(),
>> -                    cgrp);
> 
> Hi Song Liu,
> 
> I had  problem in trying to apply this patch on latest upstream. Example, the 
> `pr_info` in perf_cgroup_event_enable. Looks like there were some local 
> changes on top of which this patch was created. Could you please confirm to 
> make sure there are no other changes missing.

Sorry for the confusion. You are right that those two pr_info() are from a local
commit. I pushed latest version to 

https://git.kernel.org/pub/scm/linux/kernel/git/song/linux.git/log/?h=pmu-share

Please let me know if you have problem with it. 

Thanks,
Song

Reply via email to