On Tue, Feb 07, 2017 at 12:08:09AM -0800, Stephane Eranian wrote:
> Hi,
>
> I wanted to take a few steps back and look at the overall goals for
> cache monitoring.
> From the various threads and discussions, my understanding is as follows.
>
> I think the design must ensure that the following usage models can be
> monitored:
>  - the allocations in your CAT partitions
>  - the allocations from a task (inclusive of children tasks)
>  - the allocations from a group of tasks (inclusive of children tasks)
>  - the allocations from a CPU
>  - the allocations from a group of CPUs
>
> All cases but the first one (CAT) are natural usage, so I want to
> describe the CAT case in more detail.
> The goal, as I understand it, is to monitor what is going on inside
> the CAT partition to detect whether it saturates or if it has room
> to "breathe". Let's take a simple example.
By "natural usage" you mean "like perf(1) provides for other events"?
But we are trying to figure out requirements here: what data do people
need to manage caches and memory bandwidth. From that perspective,
monitoring a CAT group is a natural first choice: did we provision this
group with too much cache, or too little?

From that starting point I can see that a possible next step, on finding
that a CAT group has too small a cache, is to drill down to find out how
the tasks in the group are using the cache. Armed with that information
you could move tasks that hog too much cache (and are believed to be
streaming through memory) into a different CAT group.

What I'm not seeing is how drilling down to CPUs helps you. Say you have
CPUs=CPU0,CPU1 in the CAT group and you collect data showing that 75% of
the cache occupancy is attributed to CPU0 and only 25% to CPU1. What can
you do with this information to improve things?

If it is deemed too complex (from a kernel code perspective) to implement
per-CPU reporting, how bad a loss would that be?

-Tony
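P.S. For concreteness, here is a sketch (mine, not from the thread) of how
each monitoring target in the list above might be expressed with perf(1),
using the intel_cqm "llc_occupancy" event from the v4.x CQM perf driver.
$PID and $CGROUP are placeholders, and the script only prints the candidate
command lines rather than running them, since the event exists only on
CQM-capable hardware:

```shell
# Sketch only: map each usage model onto a perf-stat command line.
# intel_cqm/llc_occupancy/ is the LLC-occupancy event exposed by the
# v4.x CQM perf driver; $PID and $CGROUP are placeholders.
EVENT="intel_cqm/llc_occupancy/"

CMDS=""
for target in "-p \$PID" "-G \$CGROUP" "-C 0 -a" "-C 0,1 -a"; do
    # -p: one task (children included via perf's inherit behavior)
    # -G: a cgroup of tasks
    # -C with -a: one CPU, or a set of CPUs, monitored system-wide
    line="perf stat -I 1000 -e $EVENT $target -- sleep 10"
    echo "$line"
    CMDS="$CMDS$line
"
done
```

Note there is no command line for the first usage model (a CAT partition):
perf has no way to name a CAT group as a monitoring target, which is part
of what this thread is trying to settle.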

