On Thu, 30 Jul 2015 02:44:45 +0200 Frederic Weisbecker <fweis...@gmail.com> wrote:
> On Wed, Jul 29, 2015 at 01:24:16PM -0400, Chris Metcalf wrote:
> > On 07/29/2015 09:23 AM, Frederic Weisbecker wrote:
> > >>At a higher level, is the posix-cpu-timers code here really providing
> > >>the right semantics? It seems like before, the code was checking a
> > >>struct task-specific state, and now you are setting a global state such
> > >>that if ANY task anywhere in the system (even on housekeeping cores) has
> > >>a pending posix cpu timer, then nothing can go into nohz_full mode.
> > >>
> > >>Perhaps what is needed is a task_struct->tick_dependency to go along
> > >>with the system-wide and per-cpu flag words?
> > >That's an excellent point! Indeed the tick dependency check on
> > >posix-cpu-timers was made on task granularity before and now it's a
> > >global dependency.
> > >
> > >Which means that if any task in the system has a posix-cpu-timer
> > >enqueued, it prevents all CPUs from shutting down the tick. I need to
> > >mention that in the changelog.
> > >
> > >Now here is the rationale: I expect that nohz full users are not
> > >interested in posix cpu timers at all. The only chance for one to run
> > >without breaking the isolation is on housekeeping CPUs. So perhaps there
> > >is a corner case somewhere but I assume there isn't until somebody
> > >reports an issue.
> > >
> > >Keeping a task level dependency check means that we need to update it on
> > >context switch. Plus it's not only about task but also process. So that
> > >means two states to update on context switch and to check from
> > >interrupts. I don't think it's worth the effort if there is no user at
> > >all.
> >
> > I really worry about this! The vision EZchip offers our customers is
> > that they can run whatever they want on the slow path housekeeping
> > cores, i.e. random control-plane code. Then, on the fast-path cores,
> > they run their nohz_full stuff without interruption. Often they don't
> > even know what the hell is running on their control plane cores - SNMP
> > or random third-party crap or god knows what. And there is a decent
> > likelihood that some posix cpu timer code might sneak in.

I share this thinking. We do exactly the same thing for KVM-RT and I
wouldn't be surprised at all if a posix timer pops up in the housekeeping
CPUs.

> I see. But note that installing a posix cpu timer ends up triggering an
> IPI to all nohz full CPUs. That's how nohz full has always behaved.
> So users running posix timers on nohz should already suffer issues anyway.

I haven't checked how this would affect us, but it seems a lot less serious
than not having nohz at all.

>
> >
> > You mentioned needing two fields, for task and for process, but in
> > fact let's just add the one field to the one thing that needs it and
> > not worry about additional possible future needs. And note that it's
> > the task_struct->signal where we need to add the field for posix cpu
> > timers (the signal_struct) since that's where the sharing occurs, and
> > given CLONE_SIGHAND I imagine it could be different from the general
> > "process" model anyway.
>
> Well, posix cpu timers can be installed per process (signal struct) or
> per thread (task struct).
>
> But we can certainly simplify that with a per process flag and expand
> the thread dependency to the process scope.
>
> Still there is the issue of telling the CPUs where a process runs when
> a posix timer is installed there. There is no process-like tsk->cpus_allowed.
> Either we send an IPI everywhere like we do now or we iterate through all
> threads in the process to OR all their cpumasks in order to send that IPI.
>
> >
> > In any case it seems like we don't need to do work at context switch.
> > Updates to the task's tick_dependency are just done as normal in the
> > task context via "current->signal->". When we are returning to user
> > space and we want to check the tick, again, we can just read via
> > "current->signal->". Why would we need to copy the value around at
> > task switch time? That's only necessary if you want to do something
> > like read/write the task tick_dependency via the cpu index, I would think.
>
> Yeah you're right, at least the context switch should be fine.
>
> Thanks.
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/
>
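
To make the per-process idea discussed above concrete, here is a rough
userspace sketch in C. It is not kernel code and not the actual patch: the
types and the names process_cpus_allowed(), arm_posix_cpu_timer() and
can_stop_tick() are hypothetical, and a 64-bit word stands in for a real
cpumask. It only illustrates the two points raised in the thread: keeping the
flag per process (as the signal_struct would), and ORing the threads'
affinity masks to decide which nohz_full CPUs would need an IPI.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a kernel cpumask: one bit per CPU, max 64 CPUs. */
typedef uint64_t cpumask_t;

/* Hypothetical per-thread state: only the affinity mask matters here. */
struct thread {
    cpumask_t cpus_allowed;
};

/* Hypothetical per-process state, playing the role of the signal_struct. */
struct process {
    bool tick_dependency;          /* set while a posix cpu timer is armed */
    const struct thread *threads;
    size_t nr_threads;
};

/* OR together the affinity masks of every thread in the process. */
static cpumask_t process_cpus_allowed(const struct process *p)
{
    cpumask_t mask = 0;

    assert(p->threads != NULL || p->nr_threads == 0);
    for (size_t i = 0; i < p->nr_threads; i++)
        mask |= p->threads[i].cpus_allowed;
    return mask;
}

/*
 * Mark the process as needing the tick and return the set of nohz_full
 * CPUs it can run on, i.e. the CPUs that would have to be sent an IPI.
 */
static cpumask_t arm_posix_cpu_timer(struct process *p, cpumask_t nohz_full)
{
    p->tick_dependency = true;
    return process_cpus_allowed(p) & nohz_full;
}

/* Per-process check: only the flag of the process being run is consulted. */
static bool can_stop_tick(const struct process *current_proc)
{
    return !current_proc->tick_dependency;
}
```

With this model, a control-plane process confined to housekeeping CPUs can
arm a timer without producing any IPI targets among the nohz_full CPUs, and a
fast-path process that has no timer of its own still passes can_stop_tick().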