On Thu, Feb 19, 2015 at 09:40:36PM +0100, Vojtech Pavlik wrote:
> On Thu, Feb 19, 2015 at 11:32:55AM -0600, Josh Poimboeuf wrote:
> > On Thu, Feb 19, 2015 at 06:19:29PM +0100, Vojtech Pavlik wrote:
> > > On Thu, Feb 19, 2015 at 11:03:53AM -0600, Josh Poimboeuf wrote:
> > > > On Thu, Feb 19, 2015 at 05:33:59PM +0100, Vojtech Pavlik wrote:
> > > > > On Thu, Feb 19, 2015 at 10:24:29AM -0600, Josh Poimboeuf wrote:
> > > > > >
> > > > > > > No, these tasks will _never_ make syscalls. So you need to
> > > > > > > guarantee they don't accidentally enter the kernel while you
> > > > > > > flip them. Something like so should do.
> > > > > > >
> > > > > > > You set TIF_ENTER_WAIT on them, check they're still in
> > > > > > > userspace, flip them then clear TIF_ENTER_WAIT.
> > > > > >
> > > > > > Ah, that's a good idea. But how do we check if they're in user
> > > > > > space?
> > > > >
> > > > > I don't see the benefit in holding them in a loop - you can just
> > > > > as well flip them from the syscall code as kGraft does.
> > > >
> > > > But we were talking specifically about HPC tasks which never make
> > > > syscalls.
> > >
> > > Yes. I'm saying that rather than guaranteeing they don't enter the
> > > kernel (by having them spin) you can flip them in case they try to
> > > do that instead. That solves the race condition just as well.
> >
> > Ok, gotcha.
> >
> > We'd still need a safe way to check if they're in user space though.
>
> Having a safe way would be very nice and actually quite useful in other
> cases, too.
>
> For this specific purpose, however, we don't need a very safe way,
> though. We don't require atomicity in any way, we don't mind even if it
> creates false negatives, only false positives would be bad.
>
> kGraft looks at the stacktrace of CPU hogs and if it finds no kernel
> addresses there, it assumes userspace. Not very nice, but does the job.
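If I'm reading that description right, the heuristic would look
something like the sketch below (illustrative only -- not the actual
kGraft code; the function name, the fixed entry count, and the explicit
kernel_text_address() check are all made up here):

	#include <linux/kernel.h>
	#include <linux/sched.h>
	#include <linux/stacktrace.h>

	/*
	 * Take a stack trace of the task and assume it's running in
	 * userspace if no kernel text addresses show up.  A sketch of
	 * the heuristic described above, with exactly the problems
	 * discussed below.
	 */
	static bool task_probably_in_userspace(struct task_struct *task)
	{
		unsigned long entries[8];
		struct stack_trace trace = {
			.max_entries	= ARRAY_SIZE(entries),
			.entries	= entries,
		};
		unsigned int i;

		save_stack_trace_tsk(task, &trace);

		for (i = 0; i < trace.nr_entries; i++) {
			if (trace.entries[i] == ULONG_MAX)
				break;		/* end-of-stack marker */
			if (kernel_text_address(trace.entries[i]))
				return false;	/* kernel frame found */
		}

		/* no kernel addresses seen: assume userspace */
		return true;
	}
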
So I've looked at kgr_needs_lazy_migration(), but I still have no idea
how it works.

First of all, I think reading the stack while it's being written to
could give you some garbage values, and a completely wrong nr_entries
value from save_stack_trace_tsk().

But also, how would you walk a stack without knowing its stack pointer?
That function relies on the saved stack pointer in
task_struct.thread.sp, which, AFAICT, was last saved during the last
call to schedule(). Since then, the stack could have been completely
rewritten, with different-sized stack frames, before the task exited
the kernel.

Am I missing something?
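To illustrate the thread.sp dependence: the x86 stack walker picks its
starting point with logic roughly like the below (paraphrased from
memory from dump_trace(), so treat it as a sketch rather than a
verbatim quote; the helper form and its name are mine):

	#include <linux/ptrace.h>
	#include <linux/sched.h>

	/*
	 * Roughly where the stack walk starts.  For a task that isn't
	 * current and has no saved pt_regs handy, the only candidate
	 * is task->thread.sp, which the context switch code stored at
	 * the task's last pass through schedule() -- stale by the time
	 * we look at it if the task has been running since.
	 */
	static unsigned long *walk_start(struct task_struct *task,
					 struct pt_regs *regs,
					 unsigned long *stack,
					 unsigned long *dummy)
	{
		if (stack)
			return stack;	/* caller supplied a start */
		if (regs)
			return (unsigned long *)regs->sp;
		if (task != current)
			return (unsigned long *)task->thread.sp;
		return dummy;		/* current: start right here */
	}

--
Josh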