Hi Peter,

The patch is a month old. I checked that it still applies to the current tip.
Could you please take a look?

Thanks,
Kan

> 
> From: Kan Liang <kan.li...@intel.com>
> 
> In perf record frequency mode, the initial sample_period is 1. That is
> because perf does not know what period should be set, so it uses the
> minimum period of 1 as the first period. An interrupt is triggered soon
> afterwards, and from then on there is enough data to calculate the period
> for the given frequency. However, too many very short periods like 1 can
> cause various problems and increase the overhead. It is better to limit
> the period of 1 to just the first few period settings.
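> 
> Roughly speaking, once samples start arriving, the period for a target
> frequency is derived from the observed event rate. A simplified sketch of
> what perf_calculate_period() effectively computes (ignoring the overflow
> and rounding handling in the real code) is:
> 
>       period = (count * NSEC_PER_SEC) / (nsec * sample_freq);
> 
> where count and nsec are the events and elapsed time since the last
> adjustment.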
> 
> However, for some workloads a period of 1 is set frequently. For example,
> run perf record on a busy loop for 10 seconds:
> 
> perf record ./finity_busy_loop.sh 10
> 
> while [ "A" != "B" ]
> do
> date > /dev/null
> done
> 
> The period was changed 150503 times in 10 seconds, and 22.5% (33861 times)
> of those changes set the period to 1. That is because, in inherit_event(),
> the period for a child event is inherited from its parent's parent event,
> which usually still carries the default sample_period of 1. Each child
> event then has to recalculate the period from 1 every time, which brings
> high overhead.
> 
> This patch keeps a running average of the sample period in the parent
> event, so that each new child event can use it as its initial sample
> period.
> A new field, avg_sample_period, is introduced in struct perf_event to
> track the average sample period. The information is kept in the parent
> event, since it is the only node that all children know about. To reduce
> contention on the parent event, the value is only updated once per tick.
> We also want to get rid of the period of 1 as early as possible, so the
> value is also updated in the first period adjustment.
> For each new child event, the average sample period of the parent event is
> assigned to the child as its first sample period.
> Each new child event takes a reference on the parent event, so the parent
> will not go away until all children have gone away. It is therefore safe
> to access the parent.
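> 
> As a rough illustration of the scheme (the helper names below are made up
> for this description only; the actual changes are in perf_adjust_period()
> and perf_event_alloc()/inherit_event() in the diff that follows):
> 
>       /* In the parent event: fold the latest period into a running
>        * average, on the first adjustment and then once per tick.
>        */
>       static void update_avg_period(struct perf_event *parent, s64 period)
>       {
>               s64 avg = atomic64_read(&parent->avg_sample_period);
> 
>               atomic64_set(&parent->avg_sample_period, (avg + period) / 2);
>       }
> 
>       /* For a new child event: start from the parent's average rather
>        * than from the default period of 1.
>        */
>       static void init_child_period(struct perf_event *child,
>                                     struct perf_event *parent)
>       {
>               child->hw.sample_period =
>                       atomic64_read(&parent->avg_sample_period);
>               child->hw.last_period = child->hw.sample_period;
>       }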
> 
> After applying this patch, the rate of periods set to 1 drops to 0.1%.
> 
> Signed-off-by: Kan Liang <kan.li...@intel.com>
> 
> ---
> 
> Changes since V1
>  - The average sample period will be kept in parent event
>  - Use atomic64_t to replace local64_t
>  - Only update the AVG every tick or the first time
> 
>  include/linux/perf_event.h |  2 ++
>  kernel/events/core.c       | 21 +++++++++++++++++++--
>  2 files changed, 21 insertions(+), 2 deletions(-)
> 
> diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
> index 486e84c..bb886f0 100644
> --- a/include/linux/perf_event.h
> +++ b/include/linux/perf_event.h
> @@ -403,6 +403,8 @@ struct perf_event {
>       struct list_head                child_list;
>       struct perf_event               *parent;
> 
> +     atomic64_t                      avg_sample_period;
> +
>       int                             oncpu;
>       int                             cpu;
> 
> diff --git a/kernel/events/core.c b/kernel/events/core.c
> index af0a5ba..3404d52 100644
> --- a/kernel/events/core.c
> +++ b/kernel/events/core.c
> @@ -2796,6 +2796,7 @@ static void perf_adjust_period(struct perf_event *event, u64 nsec, u64 count, bool disable)
>       struct hw_perf_event *hwc = &event->hw;
>       s64 period, sample_period;
>       s64 delta;
> +     struct perf_event *head_event = (event->parent != NULL) ? event->parent : event;
> 
>       period = perf_calculate_period(event, nsec, count);
> 
> @@ -2809,6 +2810,17 @@ static void perf_adjust_period(struct perf_event *event, u64 nsec, u64 count, bool disable)
> 
>       hwc->sample_period = sample_period;
> 
> +     /*
> +      * Update the average sample period in the first period adjustment,
> +      * then update it only once per tick to reduce contention.
> +      */
> +     if (!disable || (atomic64_read(&head_event->avg_sample_period) == 1)) {
> +             s64 avg_period;
> +
> +             avg_period = (atomic64_read(&head_event->avg_sample_period) + sample_period) / 2;
> +             atomic64_set(&head_event->avg_sample_period, avg_period);
> +     }
> +
>       if (local64_read(&hwc->period_left) > 8*sample_period) {
>               if (disable)
>                       event->pmu->stop(event, PERF_EF_UPDATE);
> @@ -7030,8 +7042,13 @@ perf_event_alloc(struct perf_event_attr *attr, int cpu,
> 
>       hwc = &event->hw;
>       hwc->sample_period = attr->sample_period;
> -     if (attr->freq && attr->sample_freq)
> +     if (attr->freq && attr->sample_freq) {
>               hwc->sample_period = 1;
> +             if (parent_event)
> +                     hwc->sample_period = atomic64_read(&parent_event->avg_sample_period);
> +             else
> +                     atomic64_set(&event->avg_sample_period, hwc->sample_period);
> +     }
>       hwc->last_period = hwc->sample_period;
> 
>       local64_set(&hwc->period_left, hwc->sample_period);
> @@ -7902,7 +7919,7 @@ inherit_event(struct perf_event *parent_event,
>               child_event->state = PERF_EVENT_STATE_OFF;
> 
>       if (parent_event->attr.freq) {
> -             u64 sample_period = parent_event->hw.sample_period;
> +             u64 sample_period = atomic64_read(&parent_event->avg_sample_period);
>               struct hw_perf_event *hwc = &child_event->hw;
> 
>               hwc->sample_period = sample_period;
> --
> 1.8.3.2
