On Sun, Jan 07, 2018 at 05:03:47PM +0100, Jiri Olsa wrote:
> Currently we use perf_event_context::task_ctx_data to save
> and restore the LBR status when the task is scheduled out
> and in.
> 
> We don't allocate it for child contexts, which results in a
> shorter LBR call stack for the task, because we don't save the
> history from the previous run and instead start over every time
> the task is scheduled in.
> 
> I wrote a test that generates samples with LBR call stacks
> and got higher counts at the bigger chain depths:
> 
>                             before:     after:
>   LBR call chain: nr: 1       60561     498127
>   LBR call chain: nr: 2           0          0
>   LBR call chain: nr: 3      107030       2172
>   LBR call chain: nr: 4      466685      62758
>   LBR call chain: nr: 5     2307319     878046
>   LBR call chain: nr: 6       48713     495218
>   LBR call chain: nr: 7        1040       4551
>   LBR call chain: nr: 8         481        172
>   LBR call chain: nr: 9         878        120
>   LBR call chain: nr: 10       2377       6698
>   LBR call chain: nr: 11      28830     151487
>   LBR call chain: nr: 12      29347     339867
>   LBR call chain: nr: 13          4         22
>   LBR call chain: nr: 14          3         53

Acked-by: Peter Zijlstra (Intel) <[email protected]>

Fixes: 4af57ef28c2c ("perf: Add pmu specific data for perf task context")

> Cc: Andi Kleen <[email protected]>
> Signed-off-by: Jiri Olsa <[email protected]>
> ---
>  kernel/events/core.c | 14 ++++++++++++++
>  1 file changed, 14 insertions(+)
> 
> diff --git a/kernel/events/core.c b/kernel/events/core.c
> index 4df5b695bf0d..55fb648a32b0 100644
> --- a/kernel/events/core.c
> +++ b/kernel/events/core.c
> @@ -10703,6 +10703,19 @@ inherit_event(struct perf_event *parent_event,
>       if (IS_ERR(child_event))
>               return child_event;
>  
> +
> +     if ((child_event->attach_state & PERF_ATTACH_TASK_DATA) &&
> +         !child_ctx->task_ctx_data) {
> +             struct pmu *pmu = child_event->pmu;
> +
> +             child_ctx->task_ctx_data = kzalloc(pmu->task_ctx_size,
> +                                                GFP_KERNEL);
> +             if (!child_ctx->task_ctx_data) {
> +                     free_event(child_event);
> +                     return NULL;
> +             }
> +     }
> +
>       /*
>        * is_orphaned_event() and list_add_tail(&parent_event->child_list)
>        * must be under the same lock in order to serialize against
> @@ -10713,6 +10726,7 @@ inherit_event(struct perf_event *parent_event,
>       if (is_orphaned_event(parent_event) ||
>           !atomic_long_inc_not_zero(&parent_event->refcount)) {
>               mutex_unlock(&parent_event->child_mutex);
> +             /* task_ctx_data is freed with child_ctx */
>               free_event(child_event);
>               return NULL;
>       }
> -- 
> 2.13.6
> 