On Fri, 6 Jan 2017, Vikas Shivappa wrote: > From: Vikas Shivappa<vikas.shiva...@intel.com> > > This patch adds support to monitor a cgroup x and a task p1 > when p1 is part of cgroup x. Since we cannot write two RMIDs during > sched in the driver handles this.
Again you explain WHAT not WHY.... > This patch introduces a u32 *rmid in the task_struck which keeps track > of the RMIDs associated with the task. There is also a list in the > arch_info of perf_cgroup called taskmon_list which keeps track of tasks > in the cgroup that are monitored. > > The taskmon_list is modified in 2 scenarios. > - at event_init of task p1 which is part of a cgroup, add the p1 to the > cgroup->tskmon_list. At event_destroy delete the task from the list. > - at the time of task move from cgrp x to a cgp y, if task was monitored > remove the task from the cgrp x tskmon_list and add it the cgrp y > tskmon_list. > > sched in: When the task p1 is scheduled in, we write the task RMID in > the PQR_ASSOC MSR Great information. > read(for task p1): As any other cqm task event > > read(for the cgroup x): When counting for cgroup, the taskmon list is > traversed and the corresponding RMID counts are added. > > Tests: Monitoring a cgroup x and a task with in the cgroup x should > work. Emphasis on should. > +static inline int add_cgrp_tskmon_entry(u32 *rmid, struct list_head *l) > +{ > + struct tsk_rmid_entry *entry; > + > + entry = kzalloc(sizeof(struct tsk_rmid_entry), GFP_KERNEL); > + if (!entry) > + return -ENOMEM; > + > + INIT_LIST_HEAD(&entry->list); > + entry->rmid = rmid; > + > + list_add_tail(&entry->list, l); > + > + return 0; And this function has a return value because the return value is never evaluated at any call site. > +} > + > +static inline void del_cgrp_tskmon_entry(u32 *rmid, struct list_head *l) > +{ > + struct tsk_rmid_entry *entry = NULL, *tmp1; And where is *tmp2? What is wrong with simply *tmp? > + > + list_for_each_entry_safe(entry, tmp1, l, list) { > + if (entry->rmid == rmid) { > + > + list_del(&entry->list); > + kfree(entry); > + break; > + } > + } > +} > + > #ifdef CONFIG_CGROUP_PERF > struct cgrp_cqm_info *cqminfo_from_tsk(struct task_struct *tsk) > { > @@ -380,6 +410,49 @@ struct cgrp_cqm_info *cqminfo_from_tsk(struct > task_struct *tsk) > } > #endif > > +static inline void > + cgrp_tskmon_update(struct task_struct *tsk, u32 *rmid, bool ena) Sigh > +{ > + struct cgrp_cqm_info *ccinfo = NULL; > + > +#ifdef CONFIG_CGROUP_PERF > + ccinfo = cqminfo_from_tsk(tsk); > +#endif > + if (!ccinfo) > + return; > + > + if (ena) > + add_cgrp_tskmon_entry(rmid, &ccinfo->tskmon_rlist); > + else > + del_cgrp_tskmon_entry(rmid, &ccinfo->tskmon_rlist); > +} > + > +static int cqm_assign_task_rmid(struct perf_event *event, u32 *rmid) > +{ > + struct task_struct *tsk; > + int ret = 0; > + > + rcu_read_lock(); > + tsk = event->hw.target; > + if (pid_alive(tsk)) { > + get_task_struct(tsk); This works because after the pid_alive() check the task cannot be released before issuing get_task_struct(), right? That's voodoo protection. How would a non alive task end up as event target? > + > + if (rmid != NULL) > + cgrp_tskmon_update(tsk, rmid, true); > + else > + cgrp_tskmon_update(tsk, tsk->rmid, false); > + > + tsk->rmid = rmid; > + > + put_task_struct(tsk); > + } else { > + ret = -EINVAL; > + } > + rcu_read_unlock(); > + > + return ret; > +} > + > static inline void cqm_enable_mon(struct cgrp_cqm_info *cqm_info, u32 *rmid) > { > if (rmid != NULL) { > @@ -429,8 +502,12 @@ static void cqm_assign_hier_rmid(struct > cgroup_subsys_state *rcss, u32 *rmid) > > static int cqm_assign_rmid(struct perf_event *event, u32 *rmid) > { > + if (is_task_event(event)) { > + if (cqm_assign_task_rmid(event, rmid)) > + return -EINVAL; > + } > #ifdef CONFIG_CGROUP_PERF > - if (is_cgroup_event(event)) { > + else if (is_cgroup_event(event)) { > cqm_assign_hier_rmid(&event->cgrp->css, rmid); > } So you keep adding stuff to cqm_assign_rmid() which handles enable and disable. But the only call site is in cqm_event_free_rmid() which calls that function with rmid = NULL, i.e. disable. Can you finally explain how this is supposed to work and how all of this has been tested and validated? If you had used the already known 'Tests: Same as before' line to the changelog, then we would have known that it's broken as before w/o looking at the patch. So the new variant of 'broken' is: Bla should work .... Thanks, tglx