cqm: Add support for monitoring task and cgroup together

Thomas Gleixner Tue, 17 Jan 2017 10:15:57 -0800

On Fri, 6 Jan 2017, Vikas Shivappa wrote:

> From: Vikas Shivappa<vikas.shiva...@intel.com>
> 
> This patch adds support to monitor a cgroup x and a task p1
> when p1 is part of cgroup x. Since we cannot write two RMIDs during
> sched in the driver handles this.


Again you explain WHAT not WHY....

> This patch introduces a u32 *rmid in the task_struck which keeps track
> of the RMIDs associated with the task.  There is also a list in the
> arch_info of perf_cgroup called taskmon_list which keeps track of tasks
> in the cgroup that are monitored.
> 
> The taskmon_list is modified in 2 scenarios.
> - at event_init of task p1 which is part of a cgroup, add the p1 to the
> cgroup->tskmon_list. At event_destroy delete the task from the list.
> - at the time of task move from cgrp x to a cgp y, if task was monitored
> remove the task from the cgrp x tskmon_list and add it the cgrp y
> tskmon_list.
> 
> sched in: When the task p1 is scheduled in, we write the task RMID in
> the PQR_ASSOC MSR

Great information.

> read(for task p1): As any other cqm task event
> 
> read(for the cgroup x): When counting for cgroup, the taskmon list is
> traversed and the corresponding RMID counts are added.
> 
> Tests: Monitoring a cgroup x and a task with in the cgroup x should
> work.

Emphasis on should.

> +static inline int add_cgrp_tskmon_entry(u32 *rmid, struct list_head *l)
> +{
> +     struct tsk_rmid_entry *entry;
> +
> +     entry = kzalloc(sizeof(struct tsk_rmid_entry), GFP_KERNEL);
> +     if (!entry)
> +             return -ENOMEM;
> +
> +     INIT_LIST_HEAD(&entry->list);
> +     entry->rmid = rmid;
> +
> +     list_add_tail(&entry->list, l);
> +
> +     return 0;

And this function has a return value because the return value is never
evaluated at any call site.

> +}
> +
> +static inline void del_cgrp_tskmon_entry(u32 *rmid, struct list_head *l)
> +{
> +     struct tsk_rmid_entry *entry = NULL, *tmp1;

And where is *tmp2? What is wrong with simply *tmp?

> +
> +     list_for_each_entry_safe(entry, tmp1, l, list) {
> +             if (entry->rmid == rmid) {
> +
> +                     list_del(&entry->list);
> +                     kfree(entry);
> +                     break;
> +             }
> +     }
> +}
> +
>  #ifdef CONFIG_CGROUP_PERF
>  struct cgrp_cqm_info *cqminfo_from_tsk(struct task_struct *tsk)
>  {
> @@ -380,6 +410,49 @@ struct cgrp_cqm_info *cqminfo_from_tsk(struct 
> task_struct *tsk)
>  }
>  #endif
>  
> +static inline void
> +     cgrp_tskmon_update(struct task_struct *tsk, u32 *rmid, bool ena)

Sigh

> +{
> +     struct cgrp_cqm_info *ccinfo = NULL;
> +
> +#ifdef CONFIG_CGROUP_PERF
> +     ccinfo = cqminfo_from_tsk(tsk);
> +#endif
> +     if (!ccinfo)
> +             return;
> +
> +     if (ena)
> +             add_cgrp_tskmon_entry(rmid, &ccinfo->tskmon_rlist);
> +     else
> +             del_cgrp_tskmon_entry(rmid, &ccinfo->tskmon_rlist);
> +}
> +
> +static int cqm_assign_task_rmid(struct perf_event *event, u32 *rmid)
> +{
> +     struct task_struct *tsk;
> +     int ret = 0;
> +
> +     rcu_read_lock();
> +     tsk = event->hw.target;
> +     if (pid_alive(tsk)) {
> +             get_task_struct(tsk);

This works because after the pid_alive() check the task cannot be
released before issuing get_task_struct(), right?

That's voodoo protection. How would a non alive task end up as event
target?

> +
> +             if (rmid != NULL)
> +                     cgrp_tskmon_update(tsk, rmid, true);
> +             else
> +                     cgrp_tskmon_update(tsk, tsk->rmid, false);
> +
> +             tsk->rmid = rmid;
> +
> +             put_task_struct(tsk);
> +     } else {
> +             ret = -EINVAL;
> +     }
> +     rcu_read_unlock();
> +
> +     return ret;
> +}
> +
>  static inline void cqm_enable_mon(struct cgrp_cqm_info *cqm_info, u32 *rmid)
>  {
>       if (rmid != NULL) {
> @@ -429,8 +502,12 @@ static void cqm_assign_hier_rmid(struct 
> cgroup_subsys_state *rcss, u32 *rmid)
>  
>  static int cqm_assign_rmid(struct perf_event *event, u32 *rmid)
>  {
> +     if (is_task_event(event)) {
> +             if (cqm_assign_task_rmid(event, rmid))
> +                     return -EINVAL;
> +     }
>  #ifdef CONFIG_CGROUP_PERF
> -     if (is_cgroup_event(event)) {
> +     else if (is_cgroup_event(event)) {
>               cqm_assign_hier_rmid(&event->cgrp->css, rmid);
>       }

So you keep adding stuff to cqm_assign_rmid() which handles enable and
disable. But the only call site is in cqm_event_free_rmid() which calls
that function with rmid = NULL, i.e. disable.

Can you finally explain how this is supposed to work and how all of this
has been tested and validated?

If you had used the already known 'Tests: Same as before' line to the
changelog, then we would have known that it's broken as before w/o looking
at the patch.

So the new variant of 'broken' is: Bla should work ....

Thanks,

        tglx

Re: [PATCH 08/12] x86/cqm: Add support for monitoring task and cgroup together

Reply via email to