On 08/14/2014 12:12 PM, Oleg Nesterov wrote:
> On 08/14, Rik van Riel wrote:
>>
>> On 08/14/2014 10:24 AM, Oleg Nesterov wrote:
>>> On 08/13, Rik van Riel wrote:
>>>>
>>>> @@ -862,11 +862,9 @@ void do_sys_times(struct tms *tms)
>>>>  {
>>>>      cputime_t tgutime, tgstime, cutime, cstime;
>>>>
>>>> -    spin_lock_irq(&current->sighand->siglock);
>>>>      thread_group_cputime_adjusted(current, &tgutime, &tgstime);
>>>>      cutime = current->signal->cutime;
>>>>      cstime = current->signal->cstime;
>>>> -    spin_unlock_irq(&current->sighand->siglock);
>>>
>>> Ah, wait, there is another problem afaics...
>>
>> Last night I worked on another problem with this code.
>>
>> After propagating the stats from a dying task to the signal struct,
>> we need to make sure that that task's stats are not counted twice.
>
> Heh indeed ;) Can't understand how I missed that.
>
>> This requires zeroing the stats under the write_seqlock, which was
>> easy enough to add.
>
> Or you can expand the scope of write_seqlock/write_sequnlock, so that
> __unhash_process is called from inside the critical section. This looks
> simpler at first glance.

The problem with that is that wait_task_zombie() calls
thread_group_cputime_adjusted() in that if() branch, and
that code ends up taking the seqlock for read...

However, in __exit_signal that approach should work.

> Hmm, wait, it seems there is yet another problem ;) Afaics, you also
> need to modify __exit_signal() so that ->sum_sched_runtime/etc are
> accounted unconditionally, even if the group leader exits.
>
> Probably this is not a big problem, and sys_times() or clock_gettime()
> do not care at all because they use current.
>
> But without this change thread_group_cputime(reaped_zombie) won't look
> at this task_struct at all, this can lead to non-monotonic result if
> it was previously called when this task was alive (non-reaped).

You mean this whole block needs to run regardless of whether
the group is dead?

        task_cputime(tsk, &utime, &stime);
        write_seqlock(&sig->stats_lock);
        sig->utime += utime;
        sig->stime += stime;
        sig->gtime += task_gtime(tsk);
        sig->min_flt += tsk->min_flt;
        sig->maj_flt += tsk->maj_flt;
        sig->nvcsw += tsk->nvcsw;
        sig->nivcsw += tsk->nivcsw;
        sig->inblock += task_io_get_inblock(tsk);
        sig->oublock += task_io_get_oublock(tsk);
        task_io_accounting_add(&sig->ioac, &tsk->ioac);
        sig->sum_sched_runtime += tsk->se.sum_exec_runtime;

How does that square with wait_task_zombie reaping the
statistics of the whole group with thread_group_cputime_adjusted()
when the group leader is exiting?

Could that lead to things being double-counted?

Or do you mean ONLY ->sum_sched_runtime is unconditionally
accounted in __exit_signal(), because wait_task_zombie() seems
to be missing that one?
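
To make the double-counting concern concrete, here is a rough,
single-threaded userspace model. It is not kernel code: toy_task,
toy_signal and every toy_* function are hypothetical stand-ins, only
utime is modelled, and the seqlock is left out entirely. It only shows
why folding a dying task's stats into the signal struct must be paired
with either zeroing the task's stats or unhashing the task before the
next reader sums the group.

```c
#include <stddef.h>

/* Hypothetical stand-in for task_struct; only utime is modelled. */
struct toy_task {
    unsigned long utime;
    int hashed;             /* 1 while the task is still on the thread list */
};

/* Hypothetical stand-in for signal_struct. */
struct toy_signal {
    unsigned long utime;    /* time folded in from exited threads */
};

/* Model of a group-wide read: accumulated total plus all listed threads. */
unsigned long toy_group_utime(const struct toy_signal *sig,
                              const struct toy_task *tasks, size_t n)
{
    unsigned long total = sig->utime;

    for (size_t i = 0; i < n; i++)
        if (tasks[i].hashed)
            total += tasks[i].utime;
    return total;
}

/* Fold only: the task stays listed with its stats, so it is counted twice. */
void toy_exit_fold_only(struct toy_signal *sig, struct toy_task *t)
{
    sig->utime += t->utime;
}

/* Fold and zero the task's stats (the "zeroing" variant in this thread). */
void toy_exit_fold_and_zero(struct toy_signal *sig, struct toy_task *t)
{
    sig->utime += t->utime;
    t->utime = 0;
}

/* Fold and unhash in one step (the "wider critical section" variant). */
void toy_exit_fold_and_unhash(struct toy_signal *sig, struct toy_task *t)
{
    sig->utime += t->utime;
    t->hashed = 0;
}
```

With two threads at utime 30 and 12, the group total is 42. After
toy_exit_fold_only() on the second thread a reader sees 54, while
either of the other two variants keeps the total at 42. In the model
the fold and the zero/unhash are trivially atomic; in the kernel that
atomicity is what the write side of the seqlock would have to provide.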