On Tue, Jun 14, 2016 at 04:10:41PM +0100, Mark Rutland wrote:
> However, pmu::filter_match is only called for the leader of each event
> group. When the leader is a SW event, we do not filter the groups, and
> may fail at pmu::add time, and when this happens we'll give up on
> scheduling any event groups later in the list until they are rotated
> ahead of the failing group.
Ha! indeed.

> I've tried to find a better way of handling this (without needing to
> walk the siblings list), but so far I'm at a loss. At least it's "only"
> O(n) in the size of the sibling list we were going to walk anyway.
>
> I suspect that at a more fundamental level, I need to stop sharing a
> perf_hw_context between HW PMUs (i.e. replace
> task_struct::perf_event_ctxp with something that can handle multiple
> HW PMUs). From previous attempts I'm not sure if that's going to be
> possible.
>
> Any ideas appreciated!

So I think I have half-cooked ideas.

One of the problems I've been wanting to solve for a long time is that
the per-cpu flexible list has priority over the per-task flexible list.

I would like them to rotate together.

One of the ways I was looking at getting that done is a virtual runtime
scheduler (just like cfs). The tricky point is merging two virtual
runtime trees. But I think that should be doable if we sort the trees on
lag.

In any case, the relevance to your question is that once we have a tree,
we can play games with order; that is, if we first order on PMU-id and
only second on lag, we get whole subtree clusters specific for a PMU.

Lots of details missing in that picture, but I think something along
those lines might get us what we want.
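To make the ordering idea a bit more concrete, here is a rough userspace
sketch, not kernel code. Everything in it is an assumption made for
illustration: struct group_key, the integer pmu_id, the signed lag value,
and the choice to put the largest lag first within a PMU. It uses a sorted
array in place of a tree, which is enough to show how a two-level key
leaves each PMU's groups contiguous.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for a schedulable event group; not a kernel type. */
struct group_key {
	int pmu_id;	/* assumed: small integer identifying the PMU */
	int64_t lag;	/* assumed: larger lag means the group is owed more time */
};

/* Order on PMU id first, and only second on lag (largest lag first). */
int group_key_cmp(const void *a, const void *b)
{
	const struct group_key *x = a;
	const struct group_key *y = b;

	if (x->pmu_id != y->pmu_id)
		return x->pmu_id < y->pmu_id ? -1 : 1;
	if (x->lag != y->lag)
		return x->lag > y->lag ? -1 : 1;
	return 0;
}

/* Sort the groups so that each PMU's groups form one contiguous cluster. */
void sort_groups(struct group_key *g, size_t n)
{
	qsort(g, n, sizeof(*g), group_key_cmp);
}

/*
 * Binary-search a sorted array for the start of one PMU's cluster.
 * Returns the index of the first entry with this pmu_id, or n if none.
 */
size_t first_for_pmu(const struct group_key *g, size_t n, int pmu_id)
{
	size_t lo = 0, hi = n;

	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;

		if (g[mid].pmu_id < pmu_id)
			lo = mid + 1;
		else
			hi = mid;
	}
	return (lo < n && g[lo].pmu_id == pmu_id) ? lo : n;
}
```

With a key like this, a scheduler could jump straight to the cluster for
the PMU it cares about and walk only that cluster, rather than filtering a
mixed list. How the merge of two such trees would work is not covered here.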